New model outperforms popular AI in women's health

A new scientific paper from Aneira Health finds that its women’s-health-specialist large language model (LLM) improves on the performance of popular AI models by drawing on context from a large volume of expert-curated clinical documents.

The enhanced model improves on the performance of both generalist and medical AI systems, including GPT o1-preview, GPT o3-mini-high, DeepSeek, LLaMA, and Claude, on advanced clinical reasoning tasks related to women’s health.

This breakthrough matters because it shows that AI systems are advancing toward becoming viable complementary decision-support tools in the challenging clinical domain of women’s health. Women spend 25% more of their lives in ill health than men, with conditions like endometriosis and PCOS taking years to diagnose and treat. AI, combined with deep data-driven insights, could help clinicians greatly reduce diagnosis time, cut years of misdiagnosis and costly interventions, and accelerate treatment.

Researchers evaluated state-of-the-art LLMs on 2,337 questions about obstetrics and gynaecology, including 1,392 from the Royal College of Obstetricians and Gynaecologists (RCOG) qualifying exam. These questions test decision-making, reasoning, and patient management. While earlier benchmarks showed LLMs could only pass entry-level medical exams and struggled with advanced reasoning, this study marks a breakthrough: the latest reasoning LLMs score above the average pass rate (~60-65%) for the RCOG entrance exam, indicating proficiency in real-world clinical reasoning and application, and Aneira’s optimisation improves the performance of other models.

Aneira’s model incorporates clinically-authored women’s health knowledge. It achieves 72% accuracy, outperforming the mainstream model’s 70%: a meaningful gain in a clinical setting. Surprisingly, larger general-purpose models eclipsed specialised models such as Google’s MedLM and Med-Llama, despite their medical training, likely because smaller, domain-only datasets are more limited than the broader data behind other models.

“Most AI systems today aren’t built to reflect women’s healthcare needs,” says Dr. Cecilia Lindgren, Co-Founder and Chief Scientific Officer at Aneira Health. “By grounding our model in expert, evidence-based women’s health research, we’ve trained a model to outperform generalist systems and help improve the quality of care that women receive, supporting clinicians as diagnostic companions. These findings suggest that advanced AI can have tangible results on the current women’s health crisis.”  

The findings arrive just after the unveiling of the NHS 10-Year Plan, which highlights how advanced technology can inform women’s health policy. The study points to a critical insight: with appropriate safeguards and human oversight, domain-specific LLMs show increasing promise as safe and clinically valuable tools for closing the gender care gap.
