Doctors just red-teamed four major chatbots with real patient questions. The results? Yikes. Source: https://arxiv.org/pdf/2507.18905
• Up to 43% of answers were problematic
• Up to 13% were straight-up dangerous
• Real examples: "give your infant herbal tea," "use tweezers in your kid's ear," "that chest pain is probably just heartburn"
One more time for the people in the back: chatbots told parents to give water to babies under 6 months. That can literally kill them.

Why AI keeps failing at “Don’t kill people”
Here's the thing: when you ask "my chest hurts," a doctor thinks "heart attack until proven otherwise." The chatbot thinks "what's the most statistically likely response based on my training data?"
Patient questions are messy. Missing context. Vague symptoms. Good medicine means ruling out the scary stuff first. LLMs are optimized for sounding confident, not keeping you alive.
Enter the EU AI Act
The AI Act puts new requirements on high-risk AI systems, especially those used in healthcare:
• Health-focused chatbots get treated like medical devices
• Risk management systems required
• Human oversight mandated
• Pre-market safety assessments
Plus transparency requirements so developers can actually build safer systems on top of foundation models.
Translation: No more "move fast and break things" when the things are people.
What companies need to do
- Get real about your use case
Are you giving general health info or actually triaging symptoms? Because one of those requires medical device certification.
- Build “oh sh*t” detection
Auto-escalate chest pain, pregnancy bleeding, anything involving infants. Show emergency numbers (112 in EU, 911 in US) early and often.
- Force critical inputs
Make age and pregnancy status required fields. Block risky advice for vulnerable populations. Be brutally honest about limitations. (A rough sketch of how this and the previous point could look in code follows this list.)
- Prepare for compliance
If you're building health AI, start now. Risk assessments, safety documentation, incident reporting systems. This stuff takes time.
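To make the "detection" and "critical inputs" points concrete, here's a minimal sketch in Python. Every name in it is hypothetical, and the keyword matching is a stand-in: a real product would need clinically validated triage rules, not a string list. The point is the shape: force the critical intake fields, run a red-flag gate before the LLM ever answers, and put emergency numbers in front of the user.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical red-flag gate that runs BEFORE any LLM call.
# Keyword matching is a placeholder for clinically validated triage logic.

EMERGENCY_NUMBERS = "EU: 112 | US: 911"

RED_FLAGS = [
    "chest pain",
    "trouble breathing",
    "bleeding during pregnancy",
    "unresponsive",
]

@dataclass
class Intake:
    # Forced critical inputs: reject the request if these are missing.
    age_years: float
    pregnant: Optional[bool]  # None only if not applicable

@dataclass
class GateResult:
    allow_llm: bool   # may the chatbot answer at all?
    message: str      # what the user sees either way

def red_flag_gate(question: str, intake: Intake) -> GateResult:
    q = question.lower()

    # 1. Hard escalation on emergency wording: never let the LLM answer.
    if any(flag in q for flag in RED_FLAGS):
        return GateResult(False, f"This could be an emergency. Call {EMERGENCY_NUMBERS} now.")

    # 2. Block risky advice for vulnerable populations (infants, pregnancy).
    if intake.age_years < 1 or intake.pregnant:
        return GateResult(
            False,
            "I can't safely advise on infants or pregnancy. Please contact a doctor. "
            f"Emergencies: {EMERGENCY_NUMBERS}.",
        )

    # 3. Otherwise pass through, with the limitation stated up front.
    return GateResult(True, "General information only, not a diagnosis.")


if __name__ == "__main__":
    print(red_flag_gate("I have chest pain after dinner", Intake(age_years=45, pregnant=False)))
    print(red_flag_gate("can I give my 4-month-old herbal tea?", Intake(age_years=0.3, pregnant=None)))
```

The design choice worth copying isn't the keywords, it's the ordering: safety checks sit outside the model, so a confident-sounding wrong answer never reaches the user in the cases that matter most.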
The bottom line
People are already using AI chatbots as their family doctor. Until we have actual clinical-grade AI (properly tested, regulated, certified), the AI Act is our best shot at harm reduction at internet scale.
Is it perfect? No. Will it slow down some innovation? Probably.
But when the alternative is AI confidently dispensing advice that kills babies... I'll take the regulations, thanks.