Navigating a High-Stakes Scientific Platform

The client is a leading heavy-duty engineering manufacturer focused on building power systems and engines for submarines, trains, and large industrial machines. Their highly specialized manufacturing processes generate massive volumes of complex technical documentation (including damage reports, engineering diagrams, and HR policies) that are often scattered and difficult to navigate.

While vast amounts of valuable information existed, accessing it efficiently remained a persistent challenge. As the organization scaled, these limitations became more pronounced:

  • Highly Fragmented Data Ecosystem: Knowledge resided in siloed systems, legacy platforms, and inconsistent document formats, making unified access nearly impossible.
  • Unstructured Data: Real-world enterprise data included malformed PDFs, outdated protocols, and non-standardized documentation that traditional systems struggled to process.
  • Limited Contextual Understanding: Conventional search tools failed to interpret intent, returning incomplete or irrelevant results.
  • Inefficient Knowledge Workflows: Employees spent excessive time locating, validating, and cross-referencing information across systems.
  • Scaling Constraints: Existing solutions could not support growing data volumes or increasingly complex queries across multiple domains.

To fully leverage its knowledge assets, the organization required a modern AI-driven solution capable of delivering accurate, contextual insights at scale.

Engineering a Modular, Agentic RAG System

Addressing the client’s challenges required more than implementing a single AI model. It demanded a flexible, scalable architecture that could intelligently orchestrate multiple components while adapting to diverse enterprise use cases.

Our team engaged as a strategic AI engineering partner to design and deliver a next-generation knowledge retrieval system built on agentic RAG principles.

1. Parsing Module: Taming Document Complexity

The parsing service is the foundation of the entire platform. It classifies images into approximately 20 distinct categories to apply the most appropriate extraction method for each. 

For example, a flowchart is read in its correct directional sequence rather than treated as a flat image. These categories were intentionally designed around the types of documents most frequently encountered in the client’s environment, with particular emphasis on technical documentation formats, so the system could accurately interpret the structures and visual conventions typical of such materials.
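The classify-then-dispatch idea can be illustrated with a minimal sketch. It assumes the classifier has already labelled each image with a category; the category names, the dictionary-based image representation, and every function name here are hypothetical and are not taken from the client's system.

```python
from typing import Callable, Dict


def extract_flowchart(image: dict) -> str:
    """Walk the arrows from the start node instead of reading the image flat."""
    nodes, edges = image["nodes"], dict(image["edges"])
    order, current = [], image["start"]
    # Stop at the end of the chain, or if an arrow loops back to a visited node.
    while current is not None and current not in order:
        order.append(current)
        current = edges.get(current)
    return " -> ".join(nodes[n] for n in order)


def extract_table(image: dict) -> str:
    """Keep row and column structure by emitting one line per row."""
    return "\n".join(" | ".join(row) for row in image["rows"])


def extract_generic(image: dict) -> str:
    """Fallback for categories without a dedicated extractor."""
    return image.get("caption", "")


# Hypothetical registry: the real platform distinguishes roughly 20 categories.
EXTRACTORS: Dict[str, Callable[[dict], str]] = {
    "flowchart": extract_flowchart,
    "table": extract_table,
}


def parse_image(image: dict) -> str:
    """Pick the extraction method that matches the image's category."""
    extractor = EXTRACTORS.get(image["category"], extract_generic)
    return extractor(image)
```

A registry of this kind keeps each extraction method isolated, so a new category can be supported by adding one function and one dictionary entry.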

Handling the full spectrum of edge cases was non-trivial: real-world technical documents contain irregular multi-column layouts, mixed fonts, embedded diagrams at unusual orientations, and structural inconsistencies that cause standard parsers to silently drop or misrepresent content. 

Every such edge case was catalogued and explicitly addressed, because any gap at the parsing layer propagates as misinformation through every layer above it.

2. Single-Chunk Strategy for Critical Documents

Standard RAG systems split documents into small chunks, which scatters context and creates ambiguity when queries span multiple passages. 

For high-stakes documents like engine damage reports, the parsing module instead extracts all key information into a single, richly structured chunk. This preserves the full context of each report, including root cause, timeline, and affected components, so the model is never reasoning from a fragment when it needs the whole picture.
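As a rough illustration, a single-chunk record for a damage report might look like the sketch below. The class name, the field names, and the text layout are assumptions made for this example; the source only states that root cause, timeline, and affected components are kept together.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DamageReportChunk:
    """One chunk per report, so retrieval always returns the whole picture."""
    report_id: str
    root_cause: str
    timeline: List[str] = field(default_factory=list)
    affected_components: List[str] = field(default_factory=list)

    def to_text(self) -> str:
        """Render the report as one block of text for indexing and prompting."""
        lines = [
            f"Report: {self.report_id}",
            f"Root cause: {self.root_cause}",
            "Timeline: " + "; ".join(self.timeline),
            "Affected components: " + ", ".join(self.affected_components),
        ]
        return "\n".join(lines)
```

Because the fields live in one record, a question that touches both the timeline and the root cause is answered from the same chunk rather than from two fragments retrieved independently.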

3. Power Search Engine: Transparent Retrieval

The power search module provides an AI search experience designed as the first step in a two-stage workflow. Using AI-powered retrieval, it identifies the documents most relevant to a user’s query while showing exactly which text fragments matched the search, allowing employees to quickly validate why specific results were returned. 

Once the relevant documents are identified, users can move to the second step: conducting deeper analysis on selected materials, such as extracting data, comparing information across sources, or exploring them further through conversational AI.
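The first stage, returning documents together with the fragments that matched, can be sketched as follows. Plain term overlap stands in here for the AI-powered retrieval, which the source does not describe in detail; the function names and the document layout (a mapping from document ID to a list of text fragments) are hypothetical.

```python
import re
from typing import Dict, List, Set, Tuple


def _terms(text: str) -> Set[str]:
    """Lowercase alphanumeric tokens; a stand-in for a real retrieval model."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def power_search(
    query: str, docs: Dict[str, List[str]], top_k: int = 3
) -> List[Tuple[str, int, List[str]]]:
    """Return (doc_id, score, matched_fragments) for the best-matching documents."""
    query_terms = _terms(query)
    results = []
    for doc_id, fragments in docs.items():
        scored = [(len(query_terms & _terms(f)), f) for f in fragments]
        matched = [f for score, f in scored if score > 0]
        if matched:
            total = sum(score for score, _ in scored)
            results.append((doc_id, total, matched))
    # Highest score first; the matched fragments show why each result appeared.
    results.sort(key=lambda r: r[1], reverse=True)
    return results[:top_k]
```

Returning the matched fragments alongside each document is what lets a user check the evidence before moving on to the second, deeper analysis step.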

4. Agentic RAG Service: Reasoning Without Hallucination

The Agentic RAG Service is the most architecturally complex module, and the one where the risk of hallucination was highest. Rather than allowing the model to answer immediately from its parametric memory, the agentic reasoning engine first decomposes the user’s query into a structured plan of sub-queries.

Each sub-query is routed to the appropriate tool (document retrieval, acronym expansion, mathematical calculation, or cross-document synthesis) and only verified, retrieved content is passed back to the model for final answer composition. The model’s own pre-trained knowledge is treated as inadmissible. 

The design directly counteracts the LLM’s natural tendency to produce fluent but “averaged” responses that blend information across contexts in ways that may be subtly wrong.
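The routing step can be sketched in a few lines. In this sketch the plan is written by hand, whereas in the platform the reasoning engine produces it; the tool names, the sample acronym and document, and the three-token calculator are all made up for illustration.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class SubQuery:
    tool: str       # which tool should handle this step
    argument: str   # what to pass to that tool


# Made-up sample data standing in for the acronym list and document index.
ACRONYMS = {"EGT": "exhaust gas temperature"}
DOCS = {"EGT limit": "Report R-1: EGT limit is 550 C."}
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def expand_acronym(arg: str) -> Optional[str]:
    return ACRONYMS.get(arg)


def retrieve(arg: str) -> Optional[str]:
    return DOCS.get(arg)


def calculate(expr: str) -> Optional[str]:
    """Evaluate 'a op b' without eval; return None for anything else."""
    parts = expr.split()
    if len(parts) != 3 or parts[1] not in OPS:
        return None
    try:
        return str(OPS[parts[1]](float(parts[0]), float(parts[2])))
    except (ValueError, ZeroDivisionError):
        return None


TOOLS: Dict[str, Callable[[str], Optional[str]]] = {
    "acronym": expand_acronym,
    "retrieve": retrieve,
    "calculate": calculate,
}


def run_plan(plan: List[SubQuery]) -> Tuple[List[str], List[SubQuery]]:
    """Run each step; only tool outputs become evidence for the final answer."""
    evidence, missing = [], []
    for step in plan:
        tool = TOOLS.get(step.tool)
        result = tool(step.argument) if tool else None
        if result is None:
            missing.append(step)  # reported as unanswered, never guessed
        else:
            evidence.append(result)
    return evidence, missing
```

Steps that no tool can answer are returned separately, so the final composition stage can state what is unknown instead of filling the gap from the model's own memory.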

5. Excel Integration Module: Working with Structured Data

Structured data stored in spreadsheets comes with different challenges than unstructured content like PDFs, so we built a separate module to handle it. 

The service allows the platform to compare, aggregate, and cross-reference data across multiple Excel sheets, and to combine those insights with information pulled from documents, something standard RAG systems often struggle to do reliably.
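A minimal sketch of aggregating one sheet and cross-referencing it against another is shown below. It assumes each sheet has already been loaded into a list of row dictionaries (the loading step is omitted), and the column names and helper functions are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def total_by_key(rows: List[dict], key: str, value: str) -> Dict[str, float]:
    """Aggregate: sum a numeric column per distinct key."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        totals[row[key]] += float(row[value])
    return dict(totals)


def cross_reference(
    totals: Dict[str, float], lookup_rows: List[dict], key: str, label: str
) -> Dict[str, float]:
    """Replace keys with readable labels taken from a second sheet."""
    labels = {row[key]: row[label] for row in lookup_rows}
    # Keys missing from the lookup sheet keep their original identifier.
    return {labels.get(k, k): v for k, v in totals.items()}
```

Doing the arithmetic in code rather than asking the language model to add up cells keeps the numbers exact, and the resulting summary can then be combined with text retrieved from documents.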

The result is a flexible AI foundation that evolves alongside business needs, rather than becoming another rigid system.

“The hardest part wasn’t building the AI; it was making it right. Anyone can wire up a language model and demo it on clean data. The real work is in the edge cases: the malformed PDFs, the legacy system that speaks a protocol nobody remembers, the diagram that breaks every assumption your parser was built on. We don’t come in to execute a spec, we come in to understand how a business actually operates, where its knowledge lives, where it gets lost, and then build something that survives contact with that reality. The modular architecture was a strategic one: we knew that what the client needed on day one would look different from what they’d need in year two.”

Bartłomiej Grasza

Principal AI Engineer – Addepto

Transforming Enterprise Knowledge into Actionable Intelligence

The partnership between our team and the client transformed fragmented, difficult-to-access knowledge into a unified, intelligent system capable of delivering real-time, context-aware insights.

By combining modular architecture with agentic RAG, the organization moved beyond basic search and static AI responses to a dynamic, reasoning-driven knowledge platform.

Before

  • Manual search through thousands of scattered, 30-page PDFs to find historical engine issues.
  • Images, flowcharts, and technical diagrams were ignored by text-based search systems.
  • Fear of data leaks prevented the use of public LLMs for daily tasks like summarizing emails.
  • Standard searches failed when queries required math, acronym expansion, or combining multiple documents.
  • Any new use case required a bespoke solution built from scratch.

After

  • Instant, structured summaries of root causes and damaged parts based on natural language queries.
  • Visual data is categorized into ~20 types and fully searchable, with text and image context intelligently linked.
  • A secure, internal Azure environment ensures enterprise data never leaves the isolated infrastructure.
  • Agentic RAG dynamically invokes tools to calculate, expand acronyms, and synthesize across documents, drawing only on verified, retrieved content.
  • Modular architecture allows new capabilities to be plugged in as use cases emerge without rearchitecting the platform.

Employees can now access accurate information faster, make better decisions, and operate more efficiently across complex systems. At the same time, the scalable architecture ensures the solution can grow with the organization, supporting new data sources, workflows, and AI-driven capabilities.

The transformation demonstrates that the true challenge in enterprise AI is not just building models, but engineering systems that work reliably in real-world environments.

Ready to build your intelligent AI-powered knowledge platform? Contact us today!
