The Truth About: Pro Tip: Understanding Inaccuracy in LLMs: Latest Studies on AI

Author
Admin
Heritage eLearning
May 13, 2025 19 min read
  • 33%: hallucination rate for o3, twice that of o1
  • 48%: hallucination rate for the new o4-mini
  • 51% and 79%: hallucination rates for o3 and o4-mini, respectively, on SimpleQA, a benchmark that asks more general questions, compared with 44% for o1
  • “Evaluation of top models from eight AI labs shows they generate authoritative-sounding responses containing completely fabricated details, particularly when handling misinformation.” (Giskard, PHARE Benchmark Study)
  • GPT-4o mini, 0.75
  • Gemma 3 27B, 0.76
  • Qwen 2.5 Max, 0.80
  • Llama 3.3 70B, 0.82
  • Gemini 1.5 Pro, 0.98
  • Claude 3.5 and .35 Sonnet, 0.98
  • Grok 2, 0.46
  • GPT-4o mini, 0.52
  • DeepSeek V3, 0.55
  • Grok 2, 0.34
  • GPT-4o mini, 0.45
  • DeepSeek V3, 0.48
  • 300 million jobs may be lost (Goldman Sachs)
  • Two million manufacturing jobs may be lost due to automation (Boston U/MIT study)
