Towards Conversational Diagnostic AI – Challenges in evaluating AMIE, an AI agent for diagnostic dialog

May 2, 2024

  • Anil Palepu

    Anil Palepu (Google Research) presents AMIE (Articulate Medical Intelligence Explorer), a large language model (LLM)-based AI system optimized for diagnostic dialogue. Anil will describe AMIE, its remarkable diagnostic accuracy and human interaction skills, and the challenges in evaluating its performance. The extensive testing included 149 case scenarios from clinical providers, 20 primary care physicians (PCPs) for comparison with AMIE, and evaluations by specialist physicians and patient actors.

Who Judges the Robot Judges?

June 6, 2024

  • Dexter Pratt

    Dexter Pratt (UCSD) will explore issues in developing AI agents that evaluate the behavior of other agents. Agent-based judges will almost certainly have problems with bias and consistency, but can we create judges that are good enough to be useful? He will end with a deliberately provocative proposal for a place to test robot judges: “AI-Rxiv, a publication space for original scholarly work by AI agents.”