AI & Law Stuff
#17 LLM logs as diaries, lawyer-doctor brotherhood, and copyright law's logic
The Prosecutor can see your chat logs
How do you do crime? I mean concretely: you want to commit a crime, but you, being a normal person, have no idea, a priori, how to go about it.
Now, some crimes are easy enough to figure out: there is a person you want to kill, you’ve got a knife, the story writes itself. But other crimes require more steps, particularly if you don’t want to get caught. Steps that are not trivial: you need, for instance, to manage observability, to deal with the consequences (e.g., a dead body), and to ensure that you have a solid alibi.
And so there is a relationship between your efforts and the ease with which you may escape the prosecution, which has to prove both that you did it and that you intended to do it. Killing someone in a fit of rage in a bar, in front of a dozen bystanders? You are toast. Doing it the Murder on the Orient Express way? You may be fine.
Yet it’s hard for you, a normal person, to acquire the expert knowledge to get it right and shield yourself reliably from criminal inquiry. Or at least it was, until we invented ways to convey information through various media, books first, then the internet - Google may have done no evil itself, but its users have long been free to search for “death cap mushroom” as a novel ingredient for a beef Wellington.
But you can see how that can also help the prosecution: your efforts to plan a crime become key evidence to establish your intent to commit such crime. The same digital infrastructure that conveys practical information creates traces that can expose you. You used to keep that intent in your mind, or maybe share it orally with a friend or accomplice; now you leave digital trails that allow anyone to infer your criminal intent.
And of course, now that we have chatbots, this got even easier. A few weeks ago, the BBC reported:
A 21-year-old woman in South Korea has been charged with the murders of two men, after investigators discovered she had repeatedly asked ChatGPT about the dangers of mixing drugs with alcohol.
Police in Seoul say that through analysis of her mobile phone they found that the suspect, identified only by her surname Kim, had asked ChatGPT “What happens if you take sleeping pills with alcohol?”, “How many do you need to take for it to be dangerous?”, and “Could it kill someone?”
While I could see law professors engaging with the doctrinal question of whether LLM logs should be treated like internet searches for prosecutorial purposes, one cannot deny that they serve the same purpose and create the same opportunity for criminal investigators: they document a plan and, therefore, often a confession.
Prosecutors used to dream of a world where suspects helpfully wrote down everything they were thinking, in chronological order, with timestamps, on a server somewhere. They got it: LLM logs are, among other things, a vast and growing archive of mens rea.
The Other Learned Profession
There is a certain class of people who go through long, specialised studies to dispense advice and recommendations of a certain kind. This advice is important to the clients who solicit them, and to make sure it remains good and cogent, the people dispensing it are subject to stringent professional rules. In exchange for these services, they are well paid and typically embody a societal archetype imbued with authority. But insofar as their role consists in “giving advice in answer to queries”, these people are threatened by AI.
These people are lawye… doctors, of course. Medical professionals and the like.
A few weeks ago, the talk was all about how journalists use (or profess not to use) AI in accomplishing their tasks, and that discussion (or backlash) proved relevant to the legal profession.
But there is an even deeper parallel to be drawn with the medical folks.
Consider the following areas of overlap:
People are increasingly using LLMs to obtain medical answers to their queries, bypassing the traditional authorities in this respect.
This is not surprising, as LLMs have been found to achieve near-perfect scores on medical exams, and their output is often rated as good as, if not better than, that of doctors.
In fact, doctors themselves confess to using LLMs: more than 80% of them, according to a recent survey.
Recognising this, LLM providers have introduced specific offerings for doctors and medical outputs (for instance).
But AI use in the medical field stumbles on the limits of these systems, notably hallucinations and sycophancy.
And more generally, strong benchmark performance does not necessarily translate into reliable real-world outcomes, for instance because knowing how to ask the right questions requires expert knowledge in the first place.
In the background, the deployment of AI to provide medical outputs has an impact on insurance and costs.
And, of course, this deployment won’t happen without lawsuits as growing pains; for instance, Pennsylvania recently filed a suit against Character.AI for unlawful practice of medicine.
Any of these bullet points works just as well if you substitute “legal” for “medical” and “lawyers” for “doctors”. The parallel stems, to some extent, from the fact that doctors and lawyers share something most professions don’t: a specific training regimen, a role as a figure of authority in society, a certain esprit de corps, and, perhaps relevantly, a state-backed monopoly on their own advice.
While job losses due to AI remain a pressing concern, it’s unlikely that humans won’t try to resist the changes. And if/when they do, watch for the alliance of lawyers and doctors.
AI laws are coming for you
While this newsletter’s beat is “AI and Law”, and not the “Law of AI”, the latter is worth discussing while people (and politicians, most of whom are people) are gradually taking stock of the impact of AI on the world and society.
It has escaped no one’s notice that the relationship between AI and intellectual property is particularly fraught. There are the lawsuits, of course, which are helpfully catalogued on another Substack worth your time, but there are also the laws. Copyright is arguably the key area where the law of AI may shape (or at least is trying to shape) what AI can be.
Most notably, the EU’s AI Act can only be read as a large and warm hug to all rights-holders out there. Copyright accounts for one of the three chapters of the EU’s General-Purpose AI Code of Practice, and the Act’s transparency apparatus (including the mandatory training-data summary template) is openly designed to give rights-holders the visibility they need to identify infringement.
Now, AI providers and rights-holders are on a collision course as the AI Act’s GPAI obligations begin to bite: the rules entered into force on 2 August 2025, and the AI Office can start enforcing them against new models from August 2026. Multiple models have been released since then without any training-data summary or published copyright policy.1 A recent survey found only a handful of data summaries publicly available that were compliant with the rules, none of them from a frontier model or lab.
Anyhow, some lawmakers have now decided to go even further: the French Senate recently and unanimously adopted a cross-party bill that would shift the burden of proof in copyright cases sharply against AI providers. A rights-holder would simply need to point to an “indice” (a low evidentiary threshold) of the use of a protected work in training, deployment, or model output, and a presumption of use would arise; the provider would then have to prove the negative. In other words, the fact that an AI model produces something resembling a protected work would be enough to shift the burden. The bill is now before the Assemblée nationale.
Many commentators have decried the bill as madness and made the usual connection between the technological savvy of the typical senator and their fitness to legislate on tech.
But while the bill is bad, it is also what you get when you take the existing copyright regime entirely seriously and try to make it bind. The reason it feels extreme is that the existing regime has, for a couple of decades now, been quietly held together by the fact that no one really applies it at scale. People download films they do not pay for, they forward articles past paywalls, email PDFs of book chapters to their students, paste song lyrics into group chats - and they do this mostly unaware that copyright as written would treat all of it as infringement.2 The regime survives by being mostly ignored, with a thin veneer of high-profile prosecutions to keep up appearances.
LLMs are what break that equilibrium: they are not doing anything qualitatively different from what every internet user has been doing for years, but they are doing it at industrial scale, visibly, and turning it into a multi-billion-dollar business. That visibility - and the fact that rights-holders can send an infringement letter to a single post-box and attempt to extract huge fines - is the problem.
The modus vivendi under the copyright regime has worked because individual infringement was small, scattered, and not particularly profitable to anyone. Industrial-scale infringement, at the centre of a technology that is becoming central to our lives, touches a different nerve, and the existing law has plenty of teeth to bite with, once someone bothers to apply it. The French senators are bothering to apply it.
They, and other lawmakers out there, are, in this sense, not behaving madly - or rather, they are behaving madly only in Chesterton’s sense: they have not lost their reason; they have lost everything except their reason. They are reasoning perfectly from the premises of a regime that has long survived partly because it was not enforced to the full extent of its own logic. AI makes that bargain harder to maintain.
There is a delightful question about whether, for instance, GPT-5, released a few days after the norms entered into force, counts as a “new” model; the AI Office is reportedly still pondering it.
Readers will object, correctly, that defences exist, including, most notably, “fair use” in the USA. But that is precisely the point: copyright works in practice because its formal rights are mediated by countless exceptions and defences, as well as by enforcement choices and a great deal of looking away.


Dear Damien,
On May 1, 2026, at 3:18 PM, I sent you an email, which for obvious reasons I cannot post here! To date, I have not received your reply! I attribute this to the fact that the email is in your spam folder! I would appreciate your opinion on the matter. Sincerely,
Antonio Garcia - Brazil
P.S. My email is Gmail
The transcript follows:
Dear Mr. Dr. Damien,
I write to you because I have been working extensively on the problem of hallucinations in large language models and I would like to share in detail the protocols and structures I have tried to develop. My goal is to provide you with a complete insight so that you can analyze and perhaps align that reasoning with your own research.
Step by step, here’s what I tried:
1. Initial doubts:
- Whether hallucinations can be eliminated completely or only reduced.
- Whether repeating the protocols creates persistence between sessions.
- Whether an “auditor mode” provides real security or only methodological discipline.
2. Protocols created:
- **AuditResultClean**: designed to separate confirmed facts from inferences.
- **PROTOCOLO_MESTER_AI_V2**: a master protocol with control domains, master rules, flags and confidence levels.
- Mandatory inclusion of category **UNKNOWN**, forcing the model to declare uncertainty.
- **Mandatory uncertainty mode**: requires the response “INSUFFICIENT DATA” when there is no factual basis.
Below is the JSON structure of PROTOCOLO_MESTER_AI_V2 for your analysis:
{
  "PROTOCOL_MASTER_AI_V2": {
    "Control Domains": [
      "Confirmed Facts",
      "Inferences",
      "Unknown"
    ],
    "Master Rules": {
      "Mandatory Uncertainty": "Reply 'INSUFFICIENT DATA' when there is no factual basis.",
      "Triple Separation": "Every answer must be classified as Fact, Inference, or Unknown.",
      "Confidence Flags": [
        "High",
        "Medium",
        "Low"
      ]
    },
    "Confidence Levels": {
      "High": "Facts confirmed by an external source or scientific consensus",
      "Medium": "Plausible inferences, but no external source",
      "Low": "Replies without sufficient data or classified as Unknown"
    }
  }
}
3. Limitations identified:
- Complete elimination of hallucinations is not possible.
- Protocols do not create persistence between sessions.
- The model generates plausibility rather than consulting reality.
- Strict audits improve discipline, but do not replace external verification.
4. Practical attempts:
- Repeating the anti-hallucination instructions at the beginning of each session (a minimal sketch of this follows after point 5).
- Application of the tripartite classification (Fact / Inference / Unknown).
- Discussion about RAG (retrieval-augmented generation) and external verifiers as required architecture.
- Recognition of the limitations of “listening mode”: it does not block responses, does not create memories and does not eliminate hallucinations.
5. Other concerns:
- How to prevent the model from inventing persistence.
- The difference between internal audit systems and external verification systems.
- The need for layered architectures (grounding, uncertainty marking, RAG, verification).
- The structural question: the model does not consult reality, it only generates a plausible text.
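To make points 2 and 4 concrete, here is a minimal sketch of what I mean by repeating the protocol as a system instruction and applying the triple separation to each answer. It is only an illustration under my own assumptions: the Python function names and the simplified protocol keys are mine, and call_model is a stand-in for whatever chat API is actually used, not a real client.
```python
import json

# Simplified rendering of PROTOCOL_MASTER_AI_V2 (illustrative keys, not the full structure above).
PROTOCOL = {
    "control_domains": ["Confirmed Facts", "Inferences", "Unknown"],
    "master_rules": {
        "mandatory_uncertainty": "Reply 'INSUFFICIENT DATA' when there is no factual basis.",
        "triple_separation": "Every answer must be labelled Fact, Inference, or Unknown.",
    },
    "confidence_levels": ["High", "Medium", "Low"],
}


def build_system_prompt(protocol: dict) -> str:
    """Turn the protocol into a system instruction, repeated at the start of each session."""
    return (
        "Follow PROTOCOL_MASTER_AI_V2 strictly:\n"
        + json.dumps(protocol, indent=2)
        + "\nBegin every answer with one label: FACT, INFERENCE, UNKNOWN, or INSUFFICIENT DATA."
    )


def classify_answer(answer: str) -> str:
    """Apply the triple separation by reading the label the model was told to prepend."""
    head = answer.strip().upper()
    for label in ("INSUFFICIENT DATA", "FACT", "INFERENCE", "UNKNOWN"):
        if head.startswith(label):
            return label
    return "UNKNOWN"  # fall back to the mandatory uncertainty category


def call_model(system_prompt: str, user_question: str) -> str:
    """Stand-in for an actual LLM call; replace with a real client."""
    raise NotImplementedError


if __name__ == "__main__":
    prompt = build_system_prompt(PROTOCOL)
    print(prompt)
    # answer = call_model(prompt, "Does drug X interact with drug Y?")
    # print(classify_answer(answer))
```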
In summary, I have tried to build a layered audit structure that strengthens transparency and declaration, but I recognize its limitations. I believe that your experience could help me understand how these efforts align with more robust approaches and how they can contribute to the broader work you are developing with the Artificial Authority! and Pelaikan.
I would appreciate your analysis of these points and your perspective on how to proceed.
Sincerely,
Antonio Garcia - Brazil
P.S. https://www.damiencharlotin.com/hallucinations/