Technical documentation and AI: a misunderstanding that needs clarification

Michael M. Kania
July 9, 2025
Medical Device Regulation

When ChatGPT entered the stage, the industry cheered: "Now AI can finally write our technical documentation!" But this hope was (and still is) a dangerous misconception, especially in the regulated world of medical technology. As tempting as it sounds to generate complex technical documentation with just a few prompts, anyone who believes this will work with PDFs, Word files, or Excel sheets as input is building a ticking time bomb.

Or to put it another way: You’re mounting a Formula 1 engine onto a horse-drawn carriage – and wondering why it crashes in the first corner.

AI Needs Structure – Not Paper

Large language models (LLMs) like GPT are impressively powerful. But they’re not magicians. They’re only as good as the data you feed them.

And here's the problem: Most technical documentation today still exists in unstructured formats.

That means:

  • PDFs that are 200 pages long, with no metadata.
  • Word documents full of copy-pasted legacy content.
  • Excel sheets that require three separate explanations – if you're lucky.

These documents might be readable (sort of) for humans. But for AI, they’re a maze. Facts live side-by-side with redundancies. Versions aren’t clearly separated. Terminology varies across pages. The result? AI starts guessing. And in the best case, a reviewer spots the mistake.

In the worst case? The errors go undetected and influence critical product decisions. That risk is particularly severe in regulated environments, where incorrect or inconsistent information can have legal and clinical consequences (see DocBench 2024, Microsoft Research 2025).

Structured Data: Better Fuel for AI

Our own work at meddevo – along with numerous studies – shows one thing clearly:
Structured, content-based data models are key to using AI in technical documentation both effectively and safely.

Here’s a quick comparison:

| Feature | Unstructured Sources (e.g. PDFs) | Structured Data (e.g. modular data models) |
| --- | --- | --- |
| Accuracy | Hallucinations, misinterpretations | Clear statements, fewer errors |
| Consistency | Inconsistent terms, redundancy | Standardized terminology, uniformly maintained |
| Relevance | Irrelevant sections may be included | Only contextually relevant content is processed |
| Clarity | Often confusing or unclear | Logically structured, easy to follow |

A 2023 study by RWS (Tridion Docs) found that LLMs provided significantly higher factual accuracy and more relevant responses when fed content from structured, modular databases instead of raw PDFs or Word files. Similarly, Fluid Topics 2024 reported that AI assistants based on DITA XML content outperformed PDF-based approaches in answer quality and speed.

As one industry colleague aptly put it: "Garbage in, garbage out – but structured gold becomes real value."

Why the Medium Matters – and PDFs Are the Wrong One

Paper-based formats (PDF, Word, Excel) were never made for machines. They’re passive. They lack semantics, structure, and true metadata. They may look nice – but they’re not machine-readable in a meaningful way. AI needs context, clarity, and modularity.

Content-based data models provide exactly that:

  • Information organized by product components, intended use, or regulatory requirements.
  • Versioned, referenceable, and traceable content.
  • And, most importantly, content that’s understandable not just by humans, but also by machines.
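
To make this concrete, here is a minimal sketch of what such a content module could look like in code. It is an illustration under assumptions, not meddevo's actual data model: the class `ContentModule`, its fields, the helper functions, the `REQ-A`/`REQ-B` tags, and the sample statements are all invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentModule:
    """One self-contained unit of documentation (hypothetical schema)."""
    module_id: str                 # stable identifier, so other content can reference it
    component: str                 # product component the statement is about
    intended_use: str              # intended use the statement applies to
    requirements: tuple[str, ...]  # placeholder requirement tags, not real clause numbers
    version: str                   # version of this module, for traceability
    text: str                      # the statement itself


def select_modules(modules, component=None, requirement=None):
    """Return only the modules that match the given metadata filters."""
    selected = []
    for m in modules:
        if component is not None and m.component != component:
            continue
        if requirement is not None and requirement not in m.requirements:
            continue
        selected.append(m)
    return selected


def build_context(modules):
    """Render modules as text, keeping id and version so each line stays traceable."""
    return "\n".join(f"[{m.module_id} v{m.version}] {m.text}" for m in modules)


# Invented sample content, for illustration only.
MODULES = [
    ContentModule("MOD-001", "pump", "infusion", ("REQ-A",), "2.0",
                  "The pump delivers fluid at a configurable rate."),
    ContentModule("MOD-002", "housing", "infusion", ("REQ-B",), "1.3",
                  "The housing is made of a cleanable polymer."),
    ContentModule("MOD-003", "pump", "infusion", ("REQ-A", "REQ-B"), "1.1",
                  "The pump stops when an occlusion is detected."),
]

context = build_context(select_modules(MODULES, component="pump", requirement="REQ-B"))
print(context)
```

Only the one module that matches both filters ends up in the context handed to a model, and it carries its identifier and version with it. With a 200-page PDF, that selection step has nothing to filter on.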

Microsoft's KBLaM (Knowledge Base augmented Language Model, 2025) demonstrated that LLMs connected to structured knowledge sources were more accurate and less likely to hallucinate – even refusing to answer when reliable content wasn’t available. That’s a level of trustworthiness unstructured content simply can’t offer.
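
The "refuse rather than guess" behavior can be illustrated with a deliberately simple sketch. This is not how KBLaM works internally; it is a plain dictionary lookup that only shows the principle. The knowledge base layout, the `answer` function, the refusal text, and the sample facts are assumptions made up for this example.

```python
REFUSAL = "No reliable content available for this question."

# Hypothetical knowledge base: (entity, attribute) -> (value, source module id).
KNOWLEDGE = {
    ("pump", "stop condition"): ("Stops when an occlusion is detected.", "MOD-003"),
    ("housing", "material"): ("Cleanable polymer.", "MOD-002"),
}


def answer(entity, attribute, knowledge=KNOWLEDGE):
    """Answer only from stored facts; refuse instead of guessing when none exists."""
    fact = knowledge.get((entity.lower(), attribute.lower()))
    if fact is None:
        return REFUSAL
    value, source = fact
    # Every answer names its source, so a reviewer can trace it back.
    return f"{value} (source: {source})"


print(answer("Pump", "stop condition"))
print(answer("pump", "battery life"))
```

The first call returns the stored statement together with its source; the second returns the refusal text because no fact about battery life exists. Structured content makes "I don't know" a detectable state rather than an invitation to improvise.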

AI Isn’t the Problem – The Medium Is

The concern that AI poses a risk to regulated documentation is understandable – but misleading. AI isn’t the risk. The real risk is feeding it the wrong kind of input.

Today we already see this clearly:

  • A content-based electronic technical documentation (eTD) model without AI saves time, prevents errors, and improves quality.
  • The same model with AI amplifies those benefits, because the machine no longer guesses and instead delivers with precision.

According to ChatBees 2023 and the Webex Developers Blog 2025, AI chatbots trained on structured documentation outperform those trained on freeform documents in nearly every metric: speed, relevance, and user satisfaction.

The Way Forward: Harmonized, Structured, Machine-Readable Knowledge

The future of technical documentation lies in modular, version-controlled, semantically enriched data models. Ideally, these models are harmonized across the EU, interoperable, and AI-ready.
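
What "version-controlled" buys you in practice can be shown with a short sketch: when every module version carries a status, a machine can pick the currently valid statement without guessing. The `ModuleVersion` class, the lifecycle states "draft" and "released", and the sample history are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleVersion:
    module_id: str
    version: tuple[int, int]  # (major, minor), compared numerically
    status: str               # hypothetical lifecycle states: "draft" or "released"
    text: str


def latest_released(history, module_id):
    """Return the highest released version of a module, or None if there is none."""
    candidates = [v for v in history
                  if v.module_id == module_id and v.status == "released"]
    return max(candidates, key=lambda v: v.version, default=None)


# Invented version history, for illustration only.
HISTORY = [
    ModuleVersion("MOD-010", (1, 9), "released", "Operating range: 10 to 30 degrees C."),
    ModuleVersion("MOD-010", (1, 10), "released", "Operating range: 5 to 35 degrees C."),
    ModuleVersion("MOD-010", (2, 0), "draft", "Operating range: 0 to 40 degrees C."),
]

current = latest_released(HISTORY, "MOD-010")
print(current.version, current.text)
```

The draft 2.0 is ignored, and version 1.10 correctly outranks 1.9 because versions are compared as numbers rather than as strings. In a folder of Word files named "final_v2_new", neither rule can be applied reliably.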

What does it take to get there?

  • The courage to move from documents to content.
  • The insight that AI doesn't run on paper – it runs on data logic.
  • And the commitment to prioritize long-term quality over short-term convenience.

As summed up in the Agrawal et al. 2024 Knowledge Graph Survey: "The combination of LLMs and structured ontologies is not optional – it is the natural evolution of AI-based decision support."

Final Thought: AI Isn’t a Magic Wand – But It Is a Force Multiplier

Anyone hoping AI will turn legacy Word documents into perfect technical files is bound to be disappointed. But those willing to invest in structured content models will be rewarded – with greater efficiency, higher quality, and regulatory peace of mind. The carriage is obsolete. The engine is ready.

What’s missing is the right chassis – and it’s definitely not made of paper.

Sources & References

  • RWS / Tridion Docs (2023): AI-Powered Content Management – A Case for Structured Documentation
  • Fluid Topics (2024): How Structured DITA Boosts AI Capabilities
  • DocBench Benchmark (2024): Evaluating LLMs on Technical Documents
  • Microsoft Research (2025): KBLaM: Knowledge Base augmented Language Model
  • ChatBees (2023): Real-World AI Performance in Customer Documentation
  • Webex Developers Blog (2025): Why Structured Data Wins in Conversational AI
  • Agrawal, S. et al. (2024): A Survey on Knowledge Graphs in LLM Applications
