A study by Legal Guardian Digital measured hallucination rates, uptime, and user satisfaction across nine leading AI assistants — and found that popularity does not guarantee accuracy.
With roughly one in four American workers now relying on AI assistants for daily tasks, the question of which chatbot is most dependable has moved from tech circles to boardrooms and offices. A new study published in April 2026 by Legal Guardian Digital, a digital marketing agency for law firms, set out to answer that question with data rather than reputation.
The research scored nine widely used AI chatbots across four dimensions: hallucination rate, customer satisfaction rating, response quality and consistency, and uptime. The findings challenge a common assumption in the workplace — that the most popular tool is the most trustworthy one.
Methodology
Researchers tracked how often each chatbot produced false or fabricated information, monitored service availability over a defined period, collected product ratings from users, and assessed the consistency of responses across varied query types. These metrics were combined into a composite reliability index scored from 0 to 100.
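The study does not publish its exact weighting formula, but a composite index of this kind can be sketched as a weighted sum of the four normalized metrics. The weights and normalization below are illustrative assumptions, not the study's actual method:

```python
def reliability_index(hallucination_rate, rating, consistency, uptime,
                      weights=(0.4, 0.2, 0.2, 0.2)):
    """Illustrative composite reliability index on a 0-100 scale.

    Inputs: hallucination_rate and uptime as fractions (0.13 = 13%),
    rating and consistency on the study's 0-5 scales. The weights are
    assumptions for demonstration; the study does not disclose its own.
    """
    accuracy = (1.0 - hallucination_rate) * 100   # lower hallucination -> higher score
    satisfaction = rating / 5.0 * 100             # rescale 0-5 rating to 0-100
    steadiness = consistency / 5.0 * 100          # rescale 0-5 consistency to 0-100
    availability = uptime * 100                   # uptime fraction as a percentage
    w_acc, w_sat, w_con, w_up = weights
    return (w_acc * accuracy + w_sat * satisfaction
            + w_con * steadiness + w_up * availability)

# Perplexity AI's published metrics (13% hallucination, 4.6 rating,
# 3.5 consistency, 100% uptime):
score = reliability_index(0.13, 4.6, 3.5, 1.00)  # ≈ 87.2 under these assumed weights
```

Under these assumed weights the result lands near, but not exactly on, the published score of 85, which suggests the study weights accuracy even more heavily or normalizes differently.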
Rankings at a Glance
| Rank | Chatbot | Hallucination Rate | Customer Rating (0–5) | Consistency (0–5) | Uptime | Index Score (0–100) |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Perplexity AI | 13% | 4.6 | 3.5 | 100% | 85 |
| 2 | Grok | 15% | 4.5 | 3.5 | 100% | 79 |
| 3 | DeepSeek | 14% | 4.7 | 3.5 | 99.52% | 76 |
| 4 | Kimi | 27% | 4.5 | 4.3 | 99.94% | 60 |
| 5 | Microsoft Copilot | 27% | 4.4 | 4.0 | 99.9% | 53 |
| 6 | ChatGPT | 30% | 4.7 | 4.0 | 99.98% | 50 |
| 7 | Claude | 20% | 4.4 | 3.5 | 98.68% | 45 |
| 8 | Google Gemini | 32% | 4.4 | 4.0 | 99.95% | 41 |
| 9 | Meta AI | 25% | 3.4 | 3.4 | 99.9% | 37 |
Source: Legal Guardian Digital (April 2026).
Perplexity AI: Accuracy Leads the Field
Perplexity AI earned the top reliability score of 85 out of 100, driven by its hallucination rate of 13% — the lowest in the study — and a perfect uptime record. Users rated the service 4.6 out of 5, the highest customer satisfaction score among the top three. At $40 per month, it is the most expensive option reviewed, but user ratings suggest a significant share of subscribers consider the premium justified.
The platform’s accuracy means employees receive correct answers in roughly nine out of ten queries, a margin the study’s authors describe as meaningful in professional environments where factual errors carry real consequences.
Grok Holds Perfect Uptime in Second Place
Grok ranked second with a reliability index of 79. Like Perplexity, it recorded 100% uptime, meaning workers faced no service interruptions during the study period. Its hallucination rate of 15% is among the lowest in the study, and user ratings of 4.5 out of 5 reflect broad satisfaction. The service costs $30 per month.
Where Grok loses ground is response consistency, scoring 3.5 out of 5 — the same figure as Perplexity — meaning answers can vary in quality depending on how a question is worded.
DeepSeek: Free and More Accurate Than Most Paid Rivals
The Chinese-developed DeepSeek placed third with an index score of 76. It recorded the highest user satisfaction rating in the study at 4.7 out of 5, and its 14% hallucination rate is lower than that of several paid competitors, including ChatGPT and Microsoft Copilot. The service is free to use, which the study authors note makes its accuracy-to-cost ratio the most favorable of any chatbot reviewed.
Its principal weakness is uptime. DeepSeek’s 99.52% availability, while still high in absolute terms, is the lowest among the top four and means users may encounter outages during periods of heavy demand.
ChatGPT: Dominant Market Share, Lagging Accuracy
ChatGPT, which commands approximately 60% of the AI chatbot market, placed sixth in the reliability index with a score of 50. Its hallucination rate of 30% is the central concern: three in ten responses contain incorrect or fabricated information, a rate roughly 2.3 times higher than the study’s top performer.
Austin Hunt, CEO of Legal Guardian Digital, addressed this directly: “People assume ChatGPT is the most reliable because it’s the most popular, but that’s not true. Its market share comes from being first and having strong marketing, not from being the best product. When you actually measure error rates and uptime, smaller chatbots like Perplexity and Grok beat the big names.”
ChatGPT’s uptime of 99.98% is among the best in the study, and its user rating of 4.7 out of 5 ties DeepSeek’s for the highest. The data suggests users appreciate the experience even where factual accuracy falls short.
Other Notable Findings
Kimi, a lesser-known option ranked fourth, achieved the highest consistency score in the study at 4.3 out of 5, making it a practical choice for workers who need reliable performance across extended, varied conversations. At $19 per month, it is among the more affordable paid options.
Microsoft Copilot, fifth in the rankings, holds 12.8% market share and has become a common fixture in corporate environments. Its 27% hallucination rate and 4.0 consistency score place it in the middle tier, performing comparably to Kimi at a similar price point of $20 per month.
Claude, produced by Anthropic, ranked seventh with a hallucination rate of 20% — below the industry average of 22% cited in the study — but its uptime of 98.68% is the lowest of any chatbot reviewed, a factor that weighed on its overall score. Google Gemini placed eighth with a 32% hallucination rate, the highest in the group, while Meta AI ranked ninth.
Implications for Workplace Adoption
The study’s broader finding is that accuracy and market presence do not necessarily correlate. Workers and organizations choosing AI tools based on name recognition may be accepting a higher error rate than less prominent alternatives would produce. For use cases involving research, legal review, or business decision-making, the gap between a 13% and a 30% hallucination rate represents a material difference in outcome reliability.
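The size of that gap compounds over multi-step work. Assuming, purely for illustration, that each query's correctness is independent, the chance that a chain of queries stays error-free falls off sharply as the per-query hallucination rate rises:

```python
def p_all_correct(hallucination_rate, n_queries):
    """Probability that n queries all return correct answers, given a
    per-query hallucination rate. An illustrative simplification: it
    assumes independent queries, which real workloads may not satisfy."""
    return (1.0 - hallucination_rate) ** n_queries

# Ten queries at the study's best and worst observed rates:
best = p_all_correct(0.13, 10)   # ≈ 0.25: about 1 in 4 ten-query chains error-free
worst = p_all_correct(0.30, 10)  # ≈ 0.03: roughly 1 in 35
```

Under this simplified model, a 13% error rate leaves about a quarter of ten-query sessions fully correct, while a 30% rate leaves almost none — the kind of difference the study frames as material for research and legal review.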
The full dataset is available via the link provided by Legal Guardian Digital. Source credit: legalguardian.io.

Dr. Jakob Jung is Editor-in-Chief of Security Storage and Channel Germany. He has been working in IT journalism for more than 20 years. His career includes Computer Reseller News, Heise Resale, Informationweek, Techtarget (storage and data center) and ChannelBiz. He also freelances for numerous IT publications, including Computerwoche, Channelpartner, IT-Business, Storage-Insider and ZDnet. His main topics are channel, storage, security, data center, ERP and CRM.
Contact via Mail: jakob.jung@security-storage-und-channel-germany.de