4 Comments

Patrick McGuinness

Need to add this coda and warning: I was so impressed by o3-mini helping me deeply research a health issue that I've gone down a rabbit hole of asking it many detailed medical queries about it. It's been going great … until o3-mini said this in its own reasoning trail:

“However, I haven’t accessed real results yet, so I’ll simulate references like Xu et al. (2015), where rapamycin helped a patient with refractory warm AIHA.”

WHAT? It will just invent references when it doesn’t have them? (And yes, its final answer reported this fake reference using a link to a website that had no reference to a “Xu et al” paper.)

I'd give o3-mini a 95% grade on answering detailed technical medical Qs, but the 5% is a doozy. Trust but verify, folks.
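
The "verify" step can be partly automated. Below is a minimal sketch that checks a suspect citation against the Crossref API (a real, public endpoint). The loose title-matching heuristic and the query string are just illustrations built from this thread's example, not a vetted pipeline; a miss means "keep digging", not "definitely fabricated".

```python
import requests

def citation_exists(query: str, year: int | None = None) -> bool:
    """Check whether a bibliographic query matches anything in Crossref.

    A hit does not prove the citation is right, and a miss does not
    prove it is fabricated -- but a miss is a strong signal to verify
    by hand before trusting the reference.
    """
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": query, "rows": 5},
        timeout=10,
    )
    resp.raise_for_status()
    for item in resp.json()["message"]["items"]:
        title = " ".join(item.get("title", [])).lower()
        published = item.get("issued", {}).get("date-parts", [[None]])[0][0]
        # Loose heuristic: one of the first few query terms appears in the
        # title, and the publication year matches if one was supplied.
        if any(word in title for word in query.lower().split()[:3]):
            if year is None or published == year:
                return True
    return False

# The suspect reference from the comment above:
print(citation_exists("rapamycin refractory warm autoimmune hemolytic anemia", 2015))
```

When the model supplies a DOI, checking it directly (Crossref's `/works/{doi}` route) is even more decisive than a bibliographic search.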

Patrick McGuinness

What's your experience with o3-mini, DeepSeek R1, or the Gemini 2.0 models? Is it as positive as mine? Will you be expanding how and what you use AI for?

Tedd Hadley

> I’m beyond impressed. This is really useful. It has clarified and explained a situation I have been facing for a year now.

Great to hear this! I had a medical diagnosis of my own pan out (with lowly o1 at that point), later confirmed by a doctor. Medical diagnosis seems to play to LLMs' strengths without requiring much planning or long-term analysis (where LLMs are weakest so far).

Patrick McGuinness

Good point that medical diagnosis requires a certain level of analysis but not *too much*. It's mostly about finding and using relevant knowledge, so what matters is the model's ability to retrieve relevant information and apply it in its reasoning. It feels like o3-mini has reached a level where some skills are "cracked", and this medical use is one of them. Caveat: YMMV.
