ISG Talks are sponsored by Couchbase.

Michal Shmueli-Scheuer (IBM Research): GenAI Benchmarking and Evaluation

July 16 @ 11:00 am - 12:00 pm

Title: GenAI Benchmarking and Evaluation
Wednesday (July 16), 11 am – noon, DBH 6011, UC Irvine

Abstract:
In this seminar, I will present a practical perspective on benchmarking in the era of generative AI. We will begin with an overview of goals and motivations, followed by key definitions such as benchmarks and leaderboards. I’ll review prominent existing benchmarks and highlight several core challenges that limit their effectiveness in evaluating today’s models. In response, I’ll introduce a new methodology for benchmarking that focuses on flexibility, reuse, and deeper insights. This includes a closer look at ToRR, a new table reasoning benchmark, and Unitxt, an open-source tool designed to simplify the creation and management of benchmarks—accompanied by a brief demo. Finally, I will outline emerging directions in agentic evaluation, pointing to what’s next in assessing increasingly complex AI systems.

Bio:
Michal Shmueli-Scheuer is a Distinguished Engineer for AI Benchmarking and Evaluation at IBM Research. Her expertise spans Natural Language Generation (NLG) and Natural Language Processing (NLP), with a focus on evaluation, data-to-text generation, conversational bots, scientific document summarization, and affective computing.

She has authored over 40 academic papers in top NLP and AI conferences and journals. Michal is a co-organizer of the GEM-squared workshop on evaluation (ACL) and recently delivered the “Navigating the Modern Evaluation Landscape” tutorial at LREC-COLING. She will also deliver a tutorial on “Evaluating LLM-based Agents: Foundations, Best Practices, and Open Challenges” at IJCAI.

Actively involved in the NLP community, she has organized multiple workshops and shared tasks. Michal earned her Ph.D. in Computer Science from the University of California, Irvine, in 2009.
