"I cannot and would not deny that tools, be they legal-tech specific or off-the-shelf models, are much better now than they used to be, including with respect to hallucinated legal material." If a flawed architecture conveys greater confidence, is it better or more dangerous? A predictive text generator as a basis for legal research is akin to building a house framed with plastic drinking straws. The homebuilder can save money on wood and claim that by erecting a crane next to the home that is able to support its full weight, it is a good technological choice. The Stanford study you cited was carefully constructed measurement that was completely irrelevant based on its own section 6.2... "These vulnerabilities remain problematic for AI adoption in a profession that requires precision, clarity, and fidelity."
It's indeed a key question, and the confession I cite in the introduction is very relevant in this respect, because the attorney basically stated that he got tired of checking accuracy after all his checks proved right, until they did not.
Still, with the proper harnesses and lower base rates, there is hope that hallucinations can simply become manageable.
Why hope? Is it a foregone conclusion that we must make the best of whatever big tech gives us? This is quite the opposite of "Liberté, égalité, fraternité," no? And my question is not to you personally; it's rhetorical. Thank you for your writing and leadership, Damien.
I think the larger point is that almost everyone affiliated with LLM AI has a horrible track record of over-promising and under-delivering. And people who point that out, and discuss the many technical limitations of LLMs, get labeled as Luddites and dismissed. At this point, no public claim made by a frontier lab can be taken at face value as a realistic assessment of capabilities. Hacking benchmarks is built into their business model, whether the product is aimed at a legal services customer or just a general business.
"I cannot and would not deny that tools, be they legal-tech specific or off-the-shelf models, are much better now than they used to be, including with respect to hallucinated legal material." If a flawed architecture conveys greater confidence, is it better or more dangerous? A predictive text generator as a basis for legal research is akin to building a house framed with plastic drinking straws. The homebuilder can save money on wood and claim that by erecting a crane next to the home that is able to support its full weight, it is a good technological choice. The Stanford study you cited was carefully constructed measurement that was completely irrelevant based on its own section 6.2... "These vulnerabilities remain problematic for AI adoption in a profession that requires precision, clarity, and fidelity."
It's indeed a key question, and the confession I cite in introduction is very relevant in this respect, because the attorney basically stated that he got tired of checking accuracy after all his checks proved right - until they did not.
Still, with the proper harnesses and lower baserates, there is hope hallucinations can simply become manageable.
Why hope? Is it a foregone conclusion that we must make the best of whatever big tech gives us? This is quite the opposite of "Liberté, égalité, fraternité," no? And my question is not to you personally, it's rhetorical. Thank you for your writing and leadership Damien.
I think the larger point is that almost everyone affiliated with LLM AI has a horrible track record of over-promising and underdelivering- And people who point that out, and discuss the many technical limitations of llms, get labeled as luddites and dismissed. At this point, no public claim made by a frontier lab can be taken at face value as a realistic assessment of capabilities. Hacking benchmarks is built into their business model, whether that product is aimed at a legal services, customer or just a general business.