Hallucinations Case Database - FAQ
Because I believe in DRY
As some of you know, I maintain an AI Hallucinations Cases database on my personal website. Since launching it, I have received weekly requests to talk about it from other academics, journalists, and legal practitioners (it seems American journalists in particular love nothing more than writing stories making fun of lawyers).
In light of the Don’t Repeat Yourself principle, here are the main questions I have been asked and answered too many times already.
When did you start the database, and why? April 2025. I was teaching this course to Sciences Po Law School students, discussing in particular the “Limits and Potentials” of LLMs in the legal domain, when we started touching on the phenomenon of hallucinations. The seminal Mata v. Avianca suggested itself easily, but, ever the empirically minded person, I looked for actual data. Since there was none, and the zeitgeist is very much about agency / “you can just do things”, I figured, “hey, let me tally the cases myself”. As it happened, this coincided with the moment the numbers started to surge.
What do you mean by “surge”? I mean that pre-2025 we maybe had two or three cases a month; now that is the daily average. From April to July 2025, it very much looked like an exponential curve. As of this writing, the acceleration seems to be tapering off, but the pace remains high. [Edit, Feb 2026: it did not taper off at the time, but maybe it is doing so now. At least I hope so; the average is now five cases per day.]
Why the acceleration then? A combination of factors, including the lag in judicial timelines (decisions we see today stem from pleadings filed months ago), the increased availability of LLMs (e.g., Copilot is everywhere on Windows, if anyone actually cares to use it), and growing public awareness of these AI tools (hard to believe, but surveys show that many, many people still have not heard of ChatGPT). This matches the profile of most cases: either self-represented litigants, or lawyers surprised that an existing tool suddenly has an AI component that hallucinates.
How do you find the cases? A mix of referrals from benevolent randos on the internet (I am very grateful to them all - you can also read the NYT story about it), dedicated scrapers and bots that automatically monitor some data sources (recycled from my side job as a journalist here), and good old database searches with keywords; a toy sketch of the latter follows.
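For the curious, here is roughly what keyword monitoring can look like. This is a minimal sketch, not my actual pipeline: it assumes CourtListener’s public search API, and the endpoint, parameters, and field names reflect my reading of their documentation, so double-check against the current docs before relying on it.

```python
# Toy keyword monitor against CourtListener's public search API.
# Illustrative only: endpoint, parameters, and field names reflect my
# reading of the docs (https://www.courtlistener.com/help/api/) and
# should be verified; an API token may be required at volume.
import requests

SEARCH_URL = "https://www.courtlistener.com/api/rest/v4/search/"
KEYWORDS = [
    '"fabricated citation"',
    '"hallucinated case"',
    '"nonexistent authority"',
]

def recent_hits(query: str) -> list[dict]:
    """Return recent court opinions matching a keyword query."""
    params = {
        "q": query,
        "type": "o",                  # "o" = opinions
        "order_by": "dateFiled desc", # newest first
    }
    resp = requests.get(SEARCH_URL, params=params, timeout=30)
    resp.raise_for_status()
    return resp.json().get("results", [])

for kw in KEYWORDS:
    for hit in recent_hits(kw):
        print(hit.get("dateFiled"), hit.get("caseName"), hit.get("absolute_url"))
```

Every hit still goes through a human (and court-confirmed, see the next question) filter before it becomes a database row.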
But then, how do you know a case actually involves a hallucination? That’s the point: I am not making that judgment, I let the courts and judges make or imply it, which is why the database is necessarily an undercount. It is also why I refrain from adding rows for cases where hallucinations are only alleged (some parties sometimes try to enlist me in this strategy - I refuse to engage with that). That being said, I think there is a misconception behind the question: by their nature, most hallucinations are very obvious; making up a case name or a false quote is not, and cannot be, a human mistake (we do have some examples of pre-LLM fraudulent fabrications, but that’s another story). Even when it comes to misrepresented cases, the misrepresentation is typically evident: this is not your typical lawyer fudging the law or stretching a precedent. As such, there is no need to second-guess here. [Edit, Feb 2026: I should add an exception, for cases where the alleged hallucination comes from a judge: absent an appeal court decision or an official retraction, I am bound to make a judgment call.]
Why do so many entries come from the USA? ‘Murica is blessed - and I mean it - with an excellent legal data ecosystem. PACER, CourtListener, and other data providers are a godsend. In many countries, especially European ones, legal data - though supposedly public - is hoarded by legal publishers, or subject to many artificial frictions (don’t get me started on anonymisation) that prevent easy access for researchers. At the same time, I don’t think it is an anomaly to have the USA top the list: the rate of AI adoption is typically higher there, and it is a very litigious society with many avenues for self-represented litigants to participate. I also suspect US judges are more likely to call out bad behaviour from counsel or parties; civil law judges would likely prefer to ignore the matter entirely.
Any national peculiarities? You see different styles of dealing with hallucinations, and different actors involved. I am rather fond of the Australian practice (also adopted by some US judges) of not reproducing the hallucinated citations - a good prophylactic move against epistemic pollution.
Any other trends in the data? Maybe not trends, but one rather evident divisio stands between pure mistakes (the vast majority of cases, which should eventually go away as awareness of AI tools expands, though that is not a given) and the substantial minority of records that involve, for lack of a better word, “bad” actors. By this I mean either vexatious litigants, who have been further empowered by AI, or sloppy lawyers, who were reckless and incompetent to begin with. If you filter the database by monetary and professional sanctions in particular, you will often find that these are cases where the hallucination is just the tip of the iceberg: people are rarely sanctioned merely because they erred in using an AI tool, but because, when caught out, they refused to own up to it, doubled down, made up stories, or blamed the intern. AI hallucinations are shedding light on this entire side of the litigation world.
When do you intend to stop? Unclear. I currently have a rather efficient pipeline for processing new entries - which, in fact, involves the use of AI, though I am careful to check that it does not hallucinate. Still, that is a few hours a week I might want to free up eventually.
What’s the point of the database, ultimately? Intrinsic value: practitioners use it to find out which cases are relevant in their own jurisdiction. I also know people conduct data analyses over it, which is wonderful (great example here); a sketch of how simple that can be follows. And then all these hallucinated cases can serve as benchmarks to fix the issue.
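To illustrate how low the barrier is, here is a minimal analysis sketch. The file name and column names (“date”, “jurisdiction”, “party_type”) are hypothetical assumptions, not the actual schema; adapt them to whatever export of the database you work from.

```python
# Minimal trend analysis over a hypothetical CSV export of the database.
# "hallucination_cases.csv" and the column names ("date", "jurisdiction",
# "party_type") are assumptions, not the actual schema.
import pandas as pd

df = pd.read_csv("hallucination_cases.csv", parse_dates=["date"])

# Cases per month - the "surge" discussed above, in one line.
print(df.set_index("date").resample("MS").size().tail(12))

# Which jurisdictions dominate (see the USA question above).
print(df["jurisdiction"].value_counts().head(10))

# Share of self-represented litigants vs. lawyers.
print(df["party_type"].value_counts(normalize=True))
```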
Because you expect this to be fixed? A complicated question, but likely not at the model stage, no - I do not really buy the claimed reductions in hallucinations for newer models, or at least I do not think we can ever bring the rate to zero under the existing paradigm. And even if the best models are better in this respect, many people will rely on cheap models that remain terrible. Yet I am certain tools will help to better check outputs - I am marketing one such tool, PelAIkan, with the idea that it will be incumbent on both producers and recipients of legal outputs to check them (incentives are there on both sides), so that hallucinations can be caught before they enter (and rot) the legal domain. A toy illustration of the checking idea follows.
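For US material, the checking idea can be as simple as resolving every citation against an authoritative index. A minimal sketch, assuming CourtListener’s citation-lookup endpoint (announced in response to the hallucination wave); this is illustrative only, is not how PelAIkan works internally, and the endpoint and field names should be verified against the current API docs.

```python
# Toy citation check via CourtListener's citation-lookup endpoint.
# Illustrative only - not how PelAIkan works internally. Endpoint and
# field names reflect my reading of the docs; verify before relying on it.
import requests

LOOKUP_URL = "https://www.courtlistener.com/api/rest/v3/citation-lookup/"
TOKEN = "..."  # your CourtListener API token

def check_citations(text: str) -> None:
    """Ask CourtListener which citations in `text` resolve to real cases."""
    resp = requests.post(
        LOOKUP_URL,
        data={"text": text},
        headers={"Authorization": f"Token {TOKEN}"},
        timeout=30,
    )
    resp.raise_for_status()
    for cite in resp.json():
        found = cite.get("status") == 200  # 404 = no matching case found
        print(cite.get("citation"), "->", "found" if found else "NOT FOUND")

check_citations("See Mata v. Avianca, Inc., 678 F. Supp. 3d 443 (S.D.N.Y. 2023).")
```

A check like this catches invented case names outright; misquoted or misrepresented (but real) cases require comparing the cited text against the retrieved opinion, which is where the harder work lies.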
Anything else? In academic writing and corporate presentations, I have made the point that hallucinations are fascinating for what they tell us about the theory of the law (the chain of authorities we have always relied on) and its practice (the time-worn habit of copying and pasting strings of citations without checking them). For years, we (me included) have cited without reading; now the costs of that habit have become explicit. In other words, hallucinations expose the epistemic hygiene the legal profession has long lacked, and that is precisely why they deserve to be studied.
Of course, if you have any further questions, feel free to contact me.


I also shared this work on my LinkedIn feed: https://www.linkedin.com/posts/activity-7392244382315212800-vMBq?utm_source=share&utm_medium=member_desktop&rcm=ACoAABW5pB0BRDTgxtNoNOPeS2-G194yZ_rLSb4