Satya Nadella wants AI to be your next doctor.
The Microsoft CEO announced two healthcare AI advances this week on social media, including MAI-DxO, a system that simulates several virtual doctors working together to solve medical mysteries.
When tested against 304 complex cases from the New England Journal of Medicine, Microsoft reported that the AI correctly diagnosed 85.5% of them. A group of 21 experienced doctors who tackled the same cases? They got 20% right.
“Excited to share two advances that bring us closer to real impact in health care AI,” Nadella wrote. “MAI-DxO is a model-agnostic orchestrator that simulates a panel of virtual doctors. It reaches 85.5% diagnostic accuracy, four times that of experienced doctors, while reducing diagnostic costs.”
Excited to share two advances that bring us closer to real impact in health care AI:
SDBench introduces a new benchmark that transforms 304 NEJM cases into interactive diagnostic simulations. AI must ask questions, order tests and weigh costs, reflecting the complexity of … pic.twitter.com/lasc4hk730
– Satya Nadella (@Satyanadella) June 30, 2025
The announcement comes as Microsoft races against a crowded field of technology companies to apply AI to the thorny problems of health care.
With Americans spending almost $5 trillion per year on healthcare, and diagnostic errors affecting 12 million people every year, according to Johns Hopkins University, the idea of using AI to tackle human-related issues seems like a no-brainer.
How Microsoft’s medical panel works
MAI-DxO works like a medical dream team packed inside a computer. The system tackles cases through what Microsoft calls the Sequential Diagnosis Benchmark, or SDBench.
Instead of posing multiple-choice questions like traditional medical AI tests, it mirrors how doctors actually work: starting with limited information about a patient, asking follow-up questions, ordering tests and adjusting theories as new data arrive.
Each test costs virtual money, forcing the AI to balance thoroughness against health care spending.
In other words, it simulates a medical panel deliberating over a case, with different models playing different roles. The models debate, disagree and eventually reach a consensus, just as your doctors would if yours were a challenging case.
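The cost-tracking idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Encounter` class, the test names, the prices and the budget are hypothetical and are not taken from SDBench, whose actual price list the article does not give.

```python
from dataclasses import dataclass, field

# Hypothetical price list in virtual dollars (not SDBench's real prices).
TEST_PRICES = {"cbc": 50, "chest_xray": 200, "mri_brain": 1500}


@dataclass
class Encounter:
    """Tracks what a diagnostic agent has spent on one simulated case."""
    budget: float
    spent: float = 0.0
    ordered: list = field(default_factory=list)

    def order_test(self, name: str) -> bool:
        """Order a test if it is known and affordable; report whether it went through."""
        price = TEST_PRICES.get(name)
        if price is None or self.spent + price > self.budget:
            return False  # unknown test, or it would exceed the budget
        self.spent += price
        self.ordered.append(name)
        return True


encounter = Encounter(budget=1000)
for test in ["cbc", "chest_xray", "mri_brain"]:
    encounter.order_test(test)

# The MRI is refused because it would push spending past the budget.
print(encounter.ordered, encounter.spent)
```

An agent scored this way cannot simply order every test available, which is the trade-off the benchmark is described as enforcing.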
In one configuration, MAI-DxO achieved an accuracy of 80% while spending $2,397 per case, about 20% less than the $2,963 that doctors usually spend.
At peak performance, it achieved 85.5% accuracy at a cost of $7,184 per case. For comparison, OpenAI’s standalone o3 model achieved 78.6% accuracy but cost $7,850.
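The savings implied by those figures can be checked with a few lines of arithmetic. This is only a worked calculation on the numbers quoted above; the helper name `percent_saved` is made up for the example.

```python
def percent_saved(cost: float, baseline: float) -> float:
    """Percentage by which `cost` undercuts `baseline`."""
    return (baseline - cost) / baseline * 100


# Per-case costs quoted in the article, in dollars.
panel_vs_doctors = percent_saved(2397, 2963)
peak_vs_o3 = percent_saved(7184, 7850)

# Accuracy gap between the peak configuration and the standalone model,
# in percentage points.
accuracy_gap = round(85.5 - 78.6, 1)

print(f"Cheaper than doctors by {panel_vs_doctors:.1f}%")
print(f"Cheaper than standalone o3 by {peak_vs_o3:.1f}%")
print(f"Accuracy gap: {accuracy_gap} points")
```

The first figure comes out at roughly 19%, consistent with the "about 20%" in the text.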
The virtual doctor panel includes Dr. Hypothesis, which keeps a running list of the three most likely diagnoses using Bayesian probability methods.
Dr. Test-Chooser selects a maximum of three diagnostic tests per round, aiming for maximum information gain.
Dr. Challenger acts as the devil’s advocate, searching for evidence that contradicts the prevailing theory. Dr. Stewardship vetoes expensive tests with low diagnostic value.
Meanwhile, Dr. Checklist ensures that all test names are valid and the team’s reasoning remains consistent.
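One way to picture how three of these roles could interact in a single round is as a chain of filters. This is a rough sketch under stated assumptions, not Microsoft's implementation: the `LabOrder` type, the test catalog, the `info_gain` scores and the cost-per-gain threshold are all hypothetical, and Dr. Hypothesis and Dr. Challenger are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabOrder:
    name: str
    cost: float       # virtual dollars (hypothetical)
    info_gain: float  # hypothetical usefulness score between 0 and 1


VALID_NAMES = {"cbc", "chest_xray", "ct_abdomen", "whole_body_pet"}  # hypothetical catalog
MAX_PER_ROUND = 3  # the article's limit of three tests per round


def checklist(orders):
    """Dr. Checklist: drop orders whose names are not in the catalog."""
    return [o for o in orders if o.name in VALID_NAMES]


def stewardship(orders, max_cost_per_gain=5000.0):
    """Dr. Stewardship: veto orders that cost too much for what they reveal."""
    return [o for o in orders
            if o.info_gain > 0 and o.cost / o.info_gain <= max_cost_per_gain]


def choose_tests(orders):
    """Dr. Test-Chooser: keep at most three orders, highest information gain first."""
    ranked = sorted(orders, key=lambda o: o.info_gain, reverse=True)
    return ranked[:MAX_PER_ROUND]


def plan_round(orders):
    return choose_tests(stewardship(checklist(orders)))


proposed = [
    LabOrder("cbc", 50, 0.30),
    LabOrder("chest_xray", 200, 0.40),
    LabOrder("ct_abdomen", 1200, 0.60),
    LabOrder("whole_body_pet", 6000, 0.10),  # expensive, low value: vetoed
    LabOrder("magic_scan", 10, 0.90),        # not a valid test name: dropped
]
plan = [o.name for o in plan_round(proposed)]
print(plan)
```

Ordering the filters this way means an invalid or wasteful test never competes for one of the three slots in a round.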

Microsoft tested the system on cases published in the New England Journal of Medicine between 2024 and 2025, after the AI’s training cutoff date, to rule out any possibility that the model had memorized them.
These were difficult cases that required thorough investigation to diagnose correctly.
The 21 doctors whom Microsoft recruited for comparison had between 5 and 20 years of experience, with a median of 12 years.
They worked without access to colleagues, textbooks or AI assistance to guarantee a fair comparison of raw diagnostic ability. They achieved a success rate of 20% on these difficult cases.
The system operates in several modes. “Direct answer” offers a diagnosis based only on the initial information for $300, the cost of a single doctor’s visit.
“Question only” allows follow-up questions without ordering tests. “Budgeted” tracks costs against a maximum spending limit. “No budget” gives the panel free rein, while “ensemble” runs multiple panels and combines their conclusions for maximum accuracy.
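The article does not say how the ensemble mode combines the panels' conclusions. As a hedged illustration, the sketch below assumes a simple majority vote; the function name `ensemble_diagnosis` and the sample answers are hypothetical.

```python
from collections import Counter


def ensemble_diagnosis(panel_answers):
    """Return the most common diagnosis and the share of panels that agreed.

    Assumes majority voting, which is one plausible way to combine panels,
    not necessarily the method the system uses.
    """
    if not panel_answers:
        raise ValueError("need at least one panel answer")
    # Normalize case and whitespace so identical diagnoses are counted together.
    counts = Counter(answer.strip().lower() for answer in panel_answers)
    diagnosis, votes = counts.most_common(1)[0]
    return diagnosis, votes / len(panel_answers)


answers = ["Sarcoidosis", "sarcoidosis", "Lymphoma", "Sarcoidosis ", "Tuberculosis"]
winner, agreement = ensemble_diagnosis(answers)
print(winner, agreement)
```

The agreement share gives a rough sense of how contested a case was, though running several panels also multiplies the cost per case.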
The future of medicine?
MAI-DxO is part of Microsoft’s broader push into consumer health AI.
The company reports more than 50 million health-related sessions in its Bing and Copilot products every day. From researching knee pain to looking for urgent care, Microsoft sees search engines and AI assistants becoming the new front door to health care.
This is of course only one step in a very long timeline of medical technology.
For context, Stanford’s MYCIN system diagnosed bacterial infections in the 1970s, and Google’s AMIE simulated doctor-patient conversations last year.
Microsoft developed MAI-DxO as a model-agnostic system, which means it can work with AI models from different companies.
In testing, it improved performance across models from OpenAI, Google, Anthropic, Meta and others by an average of 11%. The improvement was statistically significant for all tested models.
Dr. Dominic King and Harsha Nori, who led the research at Microsoft AI, emphasized in a blog post that the technology remains a research demonstration.
“Important challenges remain before generative AI can be used safely and responsibly in health care,” they wrote. The system excels at complex diagnostic challenges, but still needs to be tested on routine cases.
Microsoft plans to submit the research for peer review and is working with healthcare organizations to validate the approach in clinical settings.
The company has made it clear that any deployment would require “rigorous safety testing, clinical validation and regulatory reviews.”
For now, MAI-DxO is limited to research laboratories. But with diagnostic errors contributing to almost 10% of patient deaths and affecting millions every year, Microsoft’s virtual physician panel marks a new step toward AI-assisted healthcare.
The five-doctor AI team can out-diagnose 21 human doctors combined, but it is still too early to expect mainstream deployment.
Microsoft says AI will not replace doctors; it will augment them. The 21 doctors who scored 20% on those brutal NEJM cases hope that is true.
Published by Sebastian Sinclair and Josh Quittner