ChatGPT not that great at bar exam after all

When GPT-4 was released, one of the hype lines was that it passed the bar exam at the 90th percentile. "GPT-4 Passes the Bar Exam: What That Means for Artificial Intelligence Tools in the Legal Profession," explainered Stanford Law School. The claim is in the first paragraph of OpenAI's announcement.

The claim has since been re-examined, and the results aren't so impressive:

Perhaps the most widely touted of GPT-4's at-launch, zero-shot capabilities has been its reported 90th-percentile performance on the Uniform Bar Exam. …

examining official NCBE data and using several conservative statistical assumptions, GPT-4's performance against first-time test takers is estimated to be 62nd percentile, including 42nd percentile on essays. Fourth, when examining only those who passed the exam (i.e. licensed or license-pending attorneys), GPT-4's performance is estimated to drop to 48th percentile overall, and 15th percentile on essays.

15th percentile! It's not clear what happened, but maybe a too-obvious first guess is that the "bar exam" questions were already in GPT-4's training data. You might call it… *swipes on sunglasses* … Cold Diffusion.