Sensible Use of AI in Safety-Critical Applications
- Andrei Biswas

- Dec 26, 2025
- 5 min read
The Promise and Peril of AI in Engineering
Artificial Intelligence (AI) is transforming how engineers work. From automating routine documentation to analyzing complex datasets, AI tools - particularly Large Language Models (LLMs) - offer remarkable productivity gains. According to industry surveys, over 40% of automotive professionals now use AI in some capacity for vehicle design, and this number continues to grow.
However, for functional safety engineers, the question isn't whether AI is impressive - it's whether AI can be trusted in contexts where errors could lead to injury or loss of life. The answer requires understanding both the risks and the emerging strategies to mitigate them.
The Hallucination Problem
LLMs like ChatGPT, Claude, and Gemini generate responses by predicting the most likely next words based on patterns learned during training. This approach produces remarkably fluent and coherent text, but it also means these models can confidently generate information that is entirely fabricated - a phenomenon known as "hallucination".
Research published in the Journal of Medical Internet Research found hallucination rates ranging from 28% to over 90% depending on the model and task. In a safety-critical context, even a single fabricated failure mode, incorrect failure rate, or invented technical specification could cascade into flawed safety analyses and ultimately unsafe products.
Consider an engineer using an LLM to help identify potential failure modes for a battery management system. If the model confidently suggests a failure mode that doesn't exist, or, worse, omits a critical one, the resulting FMEA could give false confidence in a system that hasn't been properly analyzed.
The Knowledge Problem
Beyond hallucinations, there is a more fundamental limitation: domain knowledge gaps. Current LLMs may lack deep expertise in specialized engineering fields or emerging technologies. While they excel at processing general engineering knowledge, they often miss subtle information specific to particular industries, manufacturing processes, or cutting-edge components. For example, an LLM can identify commonly occurring failure modes for a LiDAR sensor, but a complete and defensible list depends on the specific LiDAR type, its manufacturing processes, and the quality characteristics of individual components from specific suppliers. This kind of nuanced, context-dependent knowledge is rarely captured in general training data, which means engineers cannot rely on LLMs alone to produce comprehensive safety analyses for specialized or novel systems.
Why Traditional AI Approaches Fall Short
Standard functional safety frameworks like ISO 26262 were designed for deterministic systems, i.e., systems where the same input always produces the same output. AI models, by contrast, are inherently probabilistic. They may generate different responses to the same question, and their behavior can be difficult to predict or explain.
This creates several challenges for safety-critical applications. First, there's the issue of traceability: functional safety standards require clear documentation of how decisions were made, but an LLM's reasoning process is opaque. Second, there's the validation problem: how do you verify that an AI tool will behave correctly across all possible inputs when even small changes in phrasing can produce dramatically different outputs?
These challenges don't mean AI has no place in safety engineering, but they do mean we need to be thoughtful about how we use it.
Retrieval-Augmented Generation (RAG): Grounding AI in Authoritative Data
One of the most promising approaches for using AI sensibly in safety-critical contexts is Retrieval-Augmented Generation, or RAG. Rather than relying solely on what an LLM learned during training, RAG systems first retrieve relevant information from a curated, authoritative knowledge base, then use that retrieved context to generate responses.
Here's how it works: when an engineer asks a question, the system searches a database of trusted documents such as component datasheets, reliability handbooks like MIL-HDBK-217 or IEC 62380, previous FMEA records, or internal engineering standards. The most relevant passages are then provided to the LLM along with the original question, and the model generates a response grounded in that specific context.
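To make that flow concrete, here is a minimal retrieve-then-generate sketch in Python. It is an illustration under simplifying assumptions, not a production design: the knowledge-base contents, the keyword-overlap scoring, and the llm_generate() stub are placeholders for a real vector index and LLM API, but the shape of the pipeline is the same: search trusted documents, attach the best matches to the prompt, then generate.

```python
# Minimal retrieve-then-generate sketch (illustrative only).
# The knowledge base, the scoring function, and llm_generate() are
# stand-ins for a real vector store and LLM API.
from dataclasses import dataclass


@dataclass
class Passage:
    source: str  # e.g. datasheet name, FMEA record ID, standard clause
    text: str


# Curated, authoritative knowledge base (placeholder content).
KNOWLEDGE_BASE = [
    Passage("BMS-FMEA-2023.xlsx#row42", "Cell voltage sense line open circuit ..."),
    Passage("IEC-62380 Table 16", "Base failure rate for ceramic capacitors ..."),
    Passage("LiDAR-datasheet-rev3.pdf p.12", "Operating temperature range ..."),
]


def score(query: str, passage: Passage) -> int:
    """Crude keyword-overlap relevance score; a real system would use
    dense embeddings and a vector index instead."""
    return len(set(query.lower().split()) & set(passage.text.lower().split()))


def retrieve(query: str, k: int = 2) -> list[Passage]:
    """Return the k most relevant passages from the trusted knowledge base."""
    return sorted(KNOWLEDGE_BASE, key=lambda p: score(query, p), reverse=True)[:k]


def llm_generate(prompt: str) -> str:
    """Stand-in for a call to whichever LLM the team uses; echoes the
    prompt so the sketch runs without external services."""
    return "(model response grounded in)\n" + prompt


def answer(query: str) -> str:
    """Embed the retrieved context in the prompt so the model answers
    from trusted sources and can cite them."""
    passages = retrieve(query)
    context = "\n".join(f"[{p.source}] {p.text}" for p in passages)
    prompt = (
        "Answer using ONLY the context below and cite the source of each claim.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return llm_generate(prompt)


print(answer("What is the failure rate of ceramic capacitors?"))
```

Because every passage carries its source identifier into the prompt, the same mechanism that grounds the answer also provides the audit trail.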
This architecture offers several advantages for safety applications. The LLM's responses are anchored to verified source material rather than potentially outdated or incorrect training data. Engineers can trace exactly which documents informed a given response. And the knowledge base can be updated continuously without retraining the entire model.
RAG for FMEA and FMEDA
Failure Mode and Effects Analysis (FMEA) and Failure Modes, Effects, and Diagnostic Analysis (FMEDA) are cornerstone activities in functional safety engineering. They require engineers to systematically identify potential failure modes, assess their effects, and determine appropriate detection and mitigation strategies. These analyses are time-consuming, knowledge-intensive, and critically dependent on access to accurate technical information.
Recent research published in the Journal of Intelligent Manufacturing demonstrates how RAG architectures can enhance FMEA workflows. The researchers developed a knowledge graph-enhanced RAG system that allows engineers to query FMEA databases using natural language. Instead of manually searching through spreadsheets, an engineer can ask questions like "What are the highest-risk failure modes for the welding process?" or "What detection methods have we used for similar defects in the past?"
The system retrieves relevant entries from historical FMEA records and presents them with full context, enabling faster identification of risk mitigation strategies. Importantly, every response links back to the source data, maintaining the traceability that functional safety standards require.
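The underlying idea can be shown with a small sketch. The example below is not the published system; it simply illustrates, with invented record IDs and ratings, how historical FMEA entries might be ranked for a given process while keeping a traceable link to each source record. In a knowledge graph-enhanced RAG tool, the natural-language question would first be translated into this kind of structured query and the retrieved entries handed to the LLM as context.

```python
# Illustrative only: ranking historical FMEA entries for a process while
# keeping a traceable link to each source record. Record IDs, processes,
# and ratings below are invented examples.
from dataclasses import dataclass


@dataclass
class FmeaEntry:
    record_id: str  # link back to the source row or document
    process: str
    failure_mode: str
    severity: int    # 1-10
    occurrence: int  # 1-10
    detection: int   # 1-10

    @property
    def rpn(self) -> int:
        """Risk Priority Number = severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


HISTORY = [
    FmeaEntry("FMEA-2022-117", "welding", "Incomplete weld penetration", 8, 4, 5),
    FmeaEntry("FMEA-2021-034", "welding", "Porosity in weld seam", 6, 5, 3),
    FmeaEntry("FMEA-2023-201", "coating", "Uneven paint thickness", 3, 6, 2),
]


def highest_risk(process: str, top_n: int = 5) -> list[FmeaEntry]:
    """Return the highest-RPN entries for a process; each entry keeps its
    record_id so any AI-generated summary can cite the underlying row."""
    matches = [e for e in HISTORY if e.process == process]
    return sorted(matches, key=lambda e: e.rpn, reverse=True)[:top_n]


for entry in highest_risk("welding"):
    print(f"{entry.record_id}: {entry.failure_mode} (RPN {entry.rpn})")
```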
For FMEDA specifically, RAG systems can connect to component reliability databases, automatically retrieving failure rates and failure mode distributions when engineers are analyzing new circuit designs. Rather than manually looking up each component's failure characteristics, the engineer can focus on the higher-level analysis while the AI handles information retrieval.
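A sketch of that lookup step is shown below. The reliability database, part numbers, FIT values, and failure-mode splits are invented placeholders rather than handbook data; the point is only that the tool can expand a bill of materials into per-failure-mode rows with apportioned failure rates, leaving the engineer to judge safety mechanisms and diagnostic coverage.

```python
# Illustrative only: expanding a bill of materials into FMEDA rows using a
# curated reliability database. Part numbers, FIT values, and failure-mode
# distributions are placeholders, not real handbook data.
RELIABILITY_DB = {
    "RES-0603-10k": {"fit": 0.1, "modes": {"open": 0.6, "short": 0.1, "drift": 0.3}},
    "CAP-X7R-100n": {"fit": 0.5, "modes": {"short": 0.5, "open": 0.3, "drift": 0.2}},
    "MCU-LOCKSTEP": {"fit": 50.0, "modes": {"functional failure": 1.0}},
}


def fmeda_rows(bill_of_materials: list[str]) -> list[dict]:
    """Create one row per component failure mode, apportioning each
    component's failure rate (FIT = failures per 1e9 hours) across its
    failure-mode distribution."""
    rows = []
    for part in bill_of_materials:
        entry = RELIABILITY_DB[part]
        for mode, share in entry["modes"].items():
            rows.append({"part": part, "failure_mode": mode, "fit": entry["fit"] * share})
    return rows


rows = fmeda_rows(["RES-0603-10k", "CAP-X7R-100n", "MCU-LOCKSTEP"])
print(f"{len(rows)} failure modes, {sum(r['fit'] for r in rows):.2f} FIT total")
```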
Best Practices for Sensible AI Use
Based on current research and industry experience, several principles emerge for using AI responsibly in safety-critical engineering:
AI as assistant, not authority. AI tools should augment human judgment, not replace it. Every AI-generated output in a safety context should be reviewed by a qualified engineer before being incorporated into official documentation or decisions.
Curate your knowledge base carefully. The quality of RAG outputs depends entirely on the quality of the underlying data. Use authoritative sources: recognized standards, verified component datasheets, and validated historical records. Establish clear processes for keeping this knowledge base current and accurate.
Maintain traceability. Any AI system used in safety workflows should clearly indicate which sources informed its responses. This enables verification and supports the documentation requirements of standards like ISO 26262.
Define clear boundaries. AI may be appropriate for information retrieval, draft generation, and pattern recognition across large datasets. It is not appropriate for making final safety determinations or replacing the engineering judgment required by functional safety standards.
Validate before deployment. Before integrating AI tools into safety workflows, establish evaluation criteria and test the system's performance on representative tasks. Monitor for hallucinations, particularly in edge cases.
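As a rough illustration of what such a pre-deployment check might look like, the sketch below runs a small set of representative questions through the tool and verifies that each answer cites the expected source document. The questions, source identifiers, and the ai_answer callable are assumptions standing in for a team's own test set and RAG pipeline.

```python
# Illustrative pre-deployment check: run representative questions through
# the AI tool and confirm each answer cites the expected source. The test
# questions, source identifiers, and ai_answer callable are placeholders.
TEST_CASES = [
    {"question": "What is the base failure rate for X7R capacitors?",
     "must_cite": "IEC-62380 Table 16"},
    {"question": "Which detection method covers cell-voltage sense faults?",
     "must_cite": "BMS-FMEA-2023.xlsx#row42"},
]


def evaluate(ai_answer) -> float:
    """Return the fraction of test questions whose answer cites the
    required source document; failures are printed for review."""
    passed = 0
    for case in TEST_CASES:
        response = ai_answer(case["question"])
        if case["must_cite"] in response:
            passed += 1
        else:
            print(f"FAILED: {case['question']!r} did not cite {case['must_cite']}")
    return passed / len(TEST_CASES)


# Usage: evaluate(answer), where answer() is the team's RAG entry point.
```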
Conclusion
AI offers genuine value for functional safety engineers, accelerating information retrieval, helping identify patterns in complex datasets, and reducing the manual burden of documentation. However, the probabilistic nature of LLMs and their tendency to hallucinate make them unsuitable for unsupervised use in safety-critical decisions.
Retrieval-Augmented Generation represents a sensible middle path: harnessing AI's capabilities while grounding its outputs in authoritative, traceable data. For activities like FMEA and FMEDA, where engineers must synthesize information from multiple sources, RAG-based tools can meaningfully improve efficiency without compromising the rigor that functional safety demands.
The key is approaching AI with clear eyes - recognizing both its power and its limitations - and designing workflows that leverage the former while protecting against the latter.


