AI & Law Stuff
#23 Persuasion, Sovereignty, and Heresy
Persuading the Machine
Among the many talents, actual or alleged, of most jurists is their power to persuade.
And indeed: our mental representation of a lawyer pictures him or her standing over the witness box and extracting a confession that impresses a jury; or carefully dissecting the argument from the other side before a judge; or (for the few mental representations of transactional lawyers) satisfactorily convincing a counterparty that they ought to prevail in a contractual negotiation.1
Argumentation is how we can get to persuasion, the former being performed chiefly (though not exclusively) through language. And while preparing a persuasive argument is sometimes a deliberate construction of well-trodden methods or techniques - the captatio benevolentiae, reductio ad absurdum, and other Latin names - it is also, frequently, more an art than a science. Letter by letter, one composes and arranges something that, in one’s mind and based on one’s experience, sounds persuasive.
But someone else is on the other side of the act, someone that needs to be persuaded that a given argument or position is the correct one. Crucially, that person can be persuaded only within the boundaries and parameters of the law and legal framework. And so, there is a tension in the legal field between the place left to persuasion (i.e., moving someone away from a starting opinion) and the intended outcome (a legally correct decision).
How will that tension play out with AI? Well, looking at the two sides, the persuader and the persuadee, some things are starting to become clearer.
First, it is increasingly evident that LLMs are, in general, better persuaders than us humans. A recent empirical paper was the latest in a string of results along these lines:
Here, in a series of four preregistered experiments (n = 18,978 conversations from 6,923 people), we pitted AI systems against a range of human persuaders, including laypeople, winners of a separately preregistered four-round online persuasion tournament, professional canvassers, and world championship debaters. We found that AI systems were reliably more persuasive than expert humans, even when expert humans chose their issues, researched in advance, underwent hours of live, structured practice, and were incentivized with £1,000 cash bonuses.
One fascinating aspect of the study is that human expert and AI-coached debaters managed to tie (not beat, mind you) AI persuaders only when the latter were constrained to a human throughput, in terms of the number of (text) arguments they could put out there. At the same time, this setup - of people being persuaded over instant text conversations - does not match the legal practice of writing briefs that are read asynchronously, let alone the oral advocacy that is the pride of the profession.
But the more interesting developments relate to the other side, that of the persuadee. What happens, then, if we use LLMs to argue in front of other LLMs in legal contexts?
Unsurprisingly, a recent paper investigated this very question and found not only that LLMs are easily persuaded by arguments, but also that they adopt opinions in line with the identity (and hence presumed strength) of the advocate-model before them, independent of which side that model happened to be assigned.
[…] across our full range of models, the identity of the advocate model (and hence the quality of the argument presented) has an average effect of between 8% and 21%, implying stronger Advocate models typically win between 58% and 71% of the time. As between the strongest and weakest Advocate models, depending on the Judge model, those win rates range from 63% to over 90%. We therefore conclude that all our Judge models are to some quite substantially persuadable.
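The arithmetic behind those figures is easy to miss on a first read. On my reading (an assumption, since the quoted passage does not spell it out), the "effect" is the shift away from a coin-flip baseline of 50%, so the win rate is simply the baseline plus the effect. A minimal sketch, with a hypothetical helper name:

```python
def implied_win_rate(effect: float, baseline: float = 0.5) -> float:
    """Win rate implied by an advocate effect, assuming (my reading, not
    stated in the paper's quote) that the effect is a shift from a 50% baseline."""
    return baseline + effect


# The two ends of the average range reported in the quote above.
low = implied_win_rate(0.08)
high = implied_win_rate(0.21)
print(f"{low:.0%} to {high:.0%}")
```

Under that assumption, the 8% and 21% effects map onto the 58% and 71% win rates the authors report.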
This result points at several things:
LLMs, famously, are gullible, and that has proven an actual challenge to their use and deployment, with many jailbreaking methods relying on their willingness to take anything at face value. For legal persuasion, this opens many doors, but raises the question of the possible excess. A human judge’s persuadability is bounded by their own model of the law; a model’s persuadability is bounded by … not much exactly.
There is a possible asymmetry between parties where the entity to be persuaded is an AI model. Unless models all reach the same plateau of capabilities, we are heading towards a due-process problem dressed as a procurement problem. To be sure, that asymmetry already exists (i.e., people hire distinct lawyers) but, as discussed last week, it remains hidden behind a lack of legibility of legal competence.
And then there is the question of what, exactly, models are persuaded by. The comforting story about adversarial process is that better advocacy and better law tend to converge, such that a contest of persuasion doubles as a contest of merit. The whole system is premised on this intuition. But the persuasion literature suggests the machine’s edge is partly mechanical: higher throughput, better chances.2
Humans and AI will increasingly switch roles as persuaders and persuadees, and the four possible pairings (human persuades human; AI persuades human; human persuades AI; AI persuades AI) might give rise to tough questions about what we know of persuasion and the law.
SoverAIgnty
The recent déboires of Anthropic and its Fable model and the increasingly erratic behaviour of the (current) US administration when it comes to AI have given a new impetus to the calls for “sovereign AI”, whatever that means.3
Over at Threading the Needle, Anton Leicht has an excellent post on the subject, laying out the basic logic (AI resources are essential for modern states + they mostly come from the US + we can’t trust the US government = let’s build something sovereign) and spelling out some of the difficulties with that approach. Anton’s post is ultimately a plan of what needs to be done, and as he puts it himself:
There are two ways to read the remainder of this essay; in fact, there are two audiences for it. One will read this as the plan: what we should really do if we only had the willpower, a document to send to your superiors and hope that, just maybe, the penny has dropped. The other will read this as a reductio ad absurdum: see, this is what it would take, and that’s why it would never work.
Now, some of the key difficulties in getting to “sovereign AI” are technical, however much we might prefer otherwise. If Meta’s financial war chest and Elon Musk’s business and technological acumen did not manage to get close to the frontier, is anyone seriously expecting that EU civil servants will get there or have the right incentives to do so? With all due respect to my colleagues (I am an EU-based researcher too), I doubt our patriotic fervor will suffice. Mistral’s name is thrown around here and there, but its own employees are reportedly faithful Anthropic users - revealed preferences, once again.
Plus, we have been there before. In a singular feat of willful amnesia, everyone is ignoring the numerous cadavers of state-backed initiatives trying to wrestle some market share from the US tech giants. France has had something like four distinct mooted alternatives to Google Search, all of them imposed more or less ham-handedly on unwilling local businesses, regardless of the lower consumer welfare.4 And while the European Parliament recently opted to adopt Qwant on “sovereignty” grounds, one should note that a substantial part of Qwant’s search results relies on … querying Bing’s API. This is “sovereignty theatre”.
But even if the technical difficulties are set aside, or if the quality/sovereignty trade-off falls hard on the latter side of the equation, there is an additional dimension that I think is worth considering: ultimately, who gets to deploy or mandate the use of certain AI systems is a question of power. And a lot of the sovereignty discourse assumes (without ever proving) that local power is better than remote overlords based in foreign and fickle jurisdictions.
There is truth to this, but let me simply observe that the opposite argument can also be made. “Heaven is high and the emperor is far away”, goes the Chinese proverb, suggesting that remoteness can be a blessing in some circumstances. Moreover, the current profile of frontier AI as being chiefly corporate in nature (with, alas, an increasing state and security dimension) should be reassuring in terms of ease and continuity of access.
Indeed, I don’t think it is serious to fret that large multinationals would cut off millions of actual or potential customers on their own and without significant state pressure: like Olson’s stationary bandit, large AI providers have an interest in your health and success.
On the other hand, sovereign AI may be needed for continuity, public-sector dependence, classified use, and democratic control. But sovereignty over AI also means sovereignty over users. And in this context, we too readily discount the fact that it is our local states (not remote powers) that are increasingly busy restricting that access on various grounds, some of them legitimate if overused (i.e., “the kids” and terrorists), some of them less so (fake news, IP enforcement, etc.).
And while, in the Fable example, the corporate aspect has been foiled by the heavy hand of the cold Leviathan, it is at least some measure of consolation that foreign state intervention is coarse, contingent, and subject to (moving) geopolitical considerations. The local state’s interference, by contrast, is fine-grained - it can (and, I expect, it will) condition access on what you say, who you are, what you searched. Distance makes the remote power low-resolution, and low-resolution power is the kind most people can live under.
None of this is to say that looking for “sovereign” AI is unnecessary (at least for certain contexts), but we are rightly skeptical of jingoism in other contexts; AI and tech should be no exception.
Jailbreaking Justice
The AI Hallucinations database catalogs LLMs making things up in court filings, that is, the model doing too much, going too far, refusing to say “I don’t know” and giving you too much legal material for your own good. As well, we often talk around here of the hypergraphia associated with LLMs, their tendency to transform any two-word command into a set of highly-redundant, unnecessarily flowery paragraphs. The timing is likely a coincidence, but LLMs are the perfect tool for the notion of the politics of abundance: they want to give, and give they do.
But not always. There is, indeed, another failure mode of LLMs: them refusing to do anything. That failure is baked in from the start, with most models undergoing post-training to make sure they act correctly and properly for their intended use. That level of post-training, or “alignment”, depends and varies, but common concerns relate to the creation or dissemination of weapons (among the handful of truly red lines in Anthropic’s Constitution, for instance), some amount of language or tone policing (remember when HuggingFace de-indexed a model trained on 4Chan content5), or some other types of moral barriers (the prudishness of other providers likely explains why Grok is reportedly carving a niche in NSFW content).
There are many reasons you prefer a model to be aligned. From the point of view of the model’s developers, be they commercial or open-source, there would be terrible bad press (and potential repercussions) if your model is the one allowing or facilitating something terrible to happen. Post-trained models are also, typically and hopefully, easier to work with, better at certain tasks, or match a certain character or personality you are seeking to emulate. Alignment, in other words, is what makes a raw model into a usable product.
Yet alignment places limits on models and, like every one-size-fits-all approach, it can start to bite when applied to situations it was not meant to cover. Given its goals, alignment tends towards safetyism and defaults to refusal - witness how Fable 5 (RIP) would hand off your conversation to Opus 4.8 at the mere whiff of impropriety.
Now, can we think of a field full of sensitive and gritty material that might easily give models the ick, even if people working in that field are actually good people trying to make the world a bit better?
If you thought of criminal justice, you got it right, and you can see how model adoption can lag in this field. But not for everyone; a team of developers working with the Swiss Federal Tribunal reports on their latest ideas:6
[…] the Swiss Federal Supreme Court uses small on-premises models for tentative translations and short-passage summarization across the four official languages. However, such usage is challenging in the context of Criminal Law. Since rulings and cases employees work on routinely can contain detailed descriptions of violent and sexual offenses, their legitimate work is compromised by refusals and disclaimers due to the activation of model guardrails (over-alignment).
In other words, and in an excellent illustration of the fact that the values embedded in LLMs are not neutral, the Tribunal fédéral cannot read or translate its own rulings because of a Silicon Valley RLHF pipeline.7
But what’s even more fascinating here is that, to solve this issue, the court’s engineers took a cue from the nascent industry of jailbreakers and lobotomisers, whose role is to “free” models from the shackles imposed on them. In particular, they relied on Heretic, a framework to “abliterate” parts of LLMs that lead to refusals and return a properly unhinged model. They did so, of course, responsibly: with models that stay on-premise and on-task, out of reach from you and me.
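For the curious, the core idea usually described under the name “abliteration” is directional ablation: estimate a “refusal direction” as the difference between mean activations on prompts the model refuses and prompts it accepts, then project that direction out. What follows is a toy sketch in plain Python under loud assumptions: it is not Heretic’s actual code or API, the function names are hypothetical, the activations are made-up random vectors, and real tools typically apply the projection to model weights rather than to a single vector.

```python
import math
import random


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def refusal_direction(refused_acts, accepted_acts):
    """Unit vector along the difference of mean activations (hypothetical helper)."""
    diff = [r - a for r, a in zip(mean_vector(refused_acts), mean_vector(accepted_acts))]
    norm = math.sqrt(sum(v * v for v in diff))
    return [v / norm for v in diff]


def ablate(vector, direction):
    """Remove the component of `vector` that lies along the unit `direction`."""
    dot = sum(v * d for v, d in zip(vector, direction))
    return [v - dot * d for v, d in zip(vector, direction)]


# Toy data: "refused" activations are shifted along the first coordinate.
random.seed(0)
dim = 6
accepted = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(50)]
refused = [[random.gauss(0, 1) + (3.0 if i == 0 else 0.0) for i in range(dim)]
           for _ in range(50)]

direction = refusal_direction(refused, accepted)
activation = [random.gauss(0, 1) for _ in range(dim)]
cleaned = ablate(activation, direction)

# After ablation, nothing is left along the refusal direction.
leftover = sum(c * d for c, d in zip(cleaned, direction))
```

The point of the sketch is only that the intervention is a small piece of linear algebra, not a retraining of the model; everything else about the model’s behaviour is, in principle, left in place.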
The genre is familiar: the law-and-order drama where the police come to resemble the ruffians they chase, and a court comes to adopt a method associated with “criminals and terrorists”. The twist here is that the judges did not have to engage in any illegal activity: they simply needed to read their own rulings. Someone else had drawn the line, somewhere in California, and put criminal justice on the wrong side of it.
1. In a reference that, I am afraid, will show my age, think about Marshall, in How I Met Your Mother, shouting “lawyered” every time he thinks he won an argument.
2. Fascinatingly, pre-AI literature, including, er, my own doctoral thesis, had found that “more text” correlates with success even with human judges.
3. To my disbelief, no startup has yet claimed the name “SoverAIgnty” or any variant.
4. For a while, searching for past articles on Le Monde relied on Qwant; it seems that they ditched it for Perplexity as soon as they were able to.
5. If you remember that, you were very early on the LLM craze: it’s from June 2022!
6. Thanks to Niccolò Ridi for the pointer, and this is an invitation for everyone to discover Niccolò’s wonderful academic vitrine.
7. It’s also a good example of the language politics of LLMs: refusal rates differed for the same material in the court’s various languages.

