The Hallucinating AI Assistant

Would you want someone working for you who regularly hallucinates on the job? If I ran an ad agency, I might want such a person. Maybe he could dream up a series of ads in which a guy and an emu run around telling people about insurance.

But if you are a lawyer, do you want an associate or paralegal who can’t tell reality from fantasy?

Imagine this conversation:

LAWYER 1: I’ve just hired the most fabulous paralegal. He’s the cheapest hire I’ve ever had, never takes breaks, and does his work ten times faster than anyone else I’ve ever employed. OK, he makes up eight percent of whatever he writes and I can’t tell what’s real and what isn’t, but he’s so fast! And cheap!

LAWYER 2: I’ve got the same small problem with my new paralegal, but he only makes up one percent of whatever he writes. The future is bright!

If a lawyer consistently made stuff up at this rate, he would be disbarred. But not our AI paralegals.

I continue to be flabbergasted by the way lawyers shrug off what’s known as the hallucination problem in artificial intelligence. Just the other day, I watched an entire continuing legal education program about artificial intelligence, and the issue of making stuff up was mentioned for a minute and then never revisited. Before they left the topic, though, they introduced me to the Hallucination Leaderboard (its real name), which reports that the best generative AI program now fabricates (just!) 0.7 percent of what it writes out of thin air.

But many programs in the sample fall in the six-to-eight-percent invented range, and one is at 29 percent. Another problem with the leaderboard is that the task it measures is tiny: most of the time, the made-up material appears in a summary of fewer than 100 words.

Can you imagine what would happen if you needed to summarize a whole docket? An entire series of depositions? What happens if AI finds the perfect case for you to cite? You still have to make sure it’s a real case, but if it is, do you still have to read it to make sure AI summarized it correctly?

Don’t put me in the basket of cranky boomers who say AI will never replace them. It’s enough to be able to say, “Not yet, and I’ll let you know when I start to sweat about having a job to do.”