
Aug 29, 2024 I was a panelist at Princeton Language and Intelligence’s Workshop on Useful and Reliable Agents, discussing our experience maintaining the LM Evaluation Harness and considerations for evaluating LM agents.
Jul 22, 2024 I gave an ICML 2024 tutorial with Lintang Sutawika on “Challenges in LM Evaluation”! ICML attendees can find the recording on the ICML website, and the slides are uploaded here. Thank you to all who attended!
Jun 22, 2024 I gave a talk on “Lessons Learned on Effective and Reproducible Evaluations of LLMs” at Cohere For AI’s NLP community group. Thanks for having me!
Jun 11, 2024 I gave a talk on “A Deep Dive on LM Evaluation” for Maven and Parlance Labs’ LLM Fine-Tuning Conference. Thanks to all who attended. Slides can be found here.
Jun 06, 2024 New preprint released: “Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?”