SIR 2025
General IR
Scientific Session
Harrison S. Blume (he/him/his)
Medical Student
Albert Einstein College of Medicine, United States
Darnell K. Adrian Williams, Jr., BS
MD/PhD Student
Albert Einstein College of Medicine, United States
Arvind Dev
Medical Student
Albert Einstein College of Medicine, United States
Christine Yoon (she/her/hers)
Medical Student
Albert Einstein College of Medicine, United States
Jacob Cynamon, MD, FACR, FSIR
Attending
Montefiore Medical Center, United States
Kapil Wattamwar, MD
Interventional Radiology Fellow
Montefiore Medical Center, United States
Preliminary findings indicate that all three LLMs can generate readable and generally accurate responses to common patient questions about IR treatments for HCC. It is expected that:
• ChatGPT 4o Mini may score higher in readability and compassion due to its conversational design.
• Google Gemini might excel in accuracy and comprehensiveness given its integration with up-to-date medical information.
• Microsoft Copilot could perform consistently across all domains but may show variability depending on the specificity of the question.
Statistical analysis did not reveal a significant difference in performance across the LLMs within domains (p > 0.05).
Conclusion: While LLMs show considerable promise in addressing patient concerns about IR treatments for HCC, they are not yet ready to be used independently. The variability in accuracy, comprehensiveness, and compassion among the models underscores the need for professional oversight. These tools could serve as valuable adjuncts to expert guidance, and further studies incorporating patient feedback are necessary to refine these models and assess their impact on patient understanding and satisfaction.