LangChain-based AI Retrieval Integration Guide

Safetronic AI Trust Gateway integrates into LangChain-based AI retrieval systems to enforce cryptographic provenance, dataset authority, and runtime integrity verification across the Retrieval-Augmented Generation (RAG) lifecycle.

Modern AI retrieval pipelines allow language models to reason over large bodies of external knowledge stored in vector databases. While this improves accuracy and domain relevance, it also introduces a fundamental security problem: the model cannot inherently distinguish authorised knowledge from unauthorised or tampered content.

The Safetronic AI Trust Gateway addresses this problem by introducing a chain-of-trust architecture around the RAG knowledge lifecycle.

Through cryptographic signing, dataset admission controls, and runtime verification, the AI Trust Gateway ensures that AI systems reason only over knowledge that has been explicitly authorised, cryptographically verified, and provenance-traceable.

Rather than trusting the vector database implicitly, trust is asserted and verified at each boundary where knowledge enters or is consumed by the system.

Where does Safetronic Integrate

Integration occurs directly inside the LangChain retrieval pipeline.

After documents are retrieved from the vector database, the Safetronic verification layer performs a series of trust checks before the information is passed to the language model.

Verification of dataset authority
Validation of cryptographic signatures on retrieved content
Confirmation that the document originates from an authorised dataset
Integrity verification of the underlying source files

Only content that successfully passes these checks is admitted into the LLM context window.

This admission step ensures that the language model operates only on authorised knowledge rather than merely retrieved knowledge.

Purpose of this Guide

LangChain and LangGraph are used in this guide as reference frameworks for illustrating the integration points within a Retrieval-Augmented Generation pipeline. Safetronic AI Trust Gateway is not limited to LangChain-based implementations and can be integrated with other orchestration frameworks, custom retrieval pipelines, or enterprise AI platforms that follow similar retrieval and inference patterns.

This document provides technical guidance for integrating Safetronic AI Trust Gateway into a customer-controlled AI retrieval environment built using LangChain technologies such as LangGraph.

It describes:

Where trust controls are applied across the RAG lifecycle
How Safetronic signing and verification operations are integrated
How dataset authority and ingestion admission are enforced
How runtime verification prevents unauthorised knowledge from influencing model output

This guide focuses exclusively on trust enforcement within the retrieval pipeline.

It does not cover model selection, embedding strategies, vector database configuration, or application-specific prompt design.

The intended audience is AI architects, platform engineers, and security engineers responsible for designing retrieval systems in regulated or sovereign environments where knowledge provenance, authorisation, and auditability are required.

What This Integration Demonstrates

Selected integration excerpts are included to demonstrate how signing, verification, and dataset authority checks are introduced within the retrieval pipeline.

The integration introduces three trust boundaries within the AI retrieval lifecycle:

Dataset Authority — Knowledge datasets are signed by a designated data custodian before they are admitted into the AI knowledge domain.
Curator Admission Control — A curator verifies dataset authority and signs the ingestion event before knowledge is admitted to the vector database.
Runtime Knowledge Verification — Each retrieved content chunk is verified cryptographically before it is passed to the language model.

This establishes a complete end-to-end chain of trust from source system export through to AI reasoning.

LangChain Execution Graph with Safetronic Verification

In LangChain-based AI retrieval systems orchestrated with LangGraph, the execution pipeline is defined as a graph of processing nodes. Safetronic introduces verification nodes within this graph to enforce trust boundaries before retrieved knowledge is admitted into the model context window.

LangGraph Node Integration

Safetronic verification is introduced into a LangChain-based AI retrieval system by inserting trust enforcement nodes directly into the LangGraph execution graph.

To integrate Safetronic AI Trust Gateway into a LangGraph workflow, the retrieval pipeline must include two additional trust enforcement nodes: verify and chain_of_trust.

The verify node performs dataset authority validation, cryptographic signature verification, and document integrity checks on retrieved content. The chain_of_trust node then generates a verifiable provenance report describing the authorised knowledge used to produce the AI response.

retrieve

verify

chain_of_trust

llm

verify and chain_of_trust are the Safetronic trust enforcement nodes.

Runtime Knowledge Verification
verify node

The verify node validates retrieved knowledge before it is admitted into the model context window.

Within the verify node implementation, each retrieved document chunk is validated against the authorised dataset manifest and its Safetronic cryptographic signature is verified. Only chunks that pass these checks are admitted into the retrieval context; any chunk with an invalid signature, unauthorised origin, or integrity mismatch is rejected and excluded from the AI response.

The verification process also ensures that each retrieved chunk originates from a file explicitly listed in the dataset manifest. Chunks referencing unauthorised or unknown source files are rejected before further verification.

By enforcing admission at retrieval time, the verify node prevents tampered or unauthorised content from influencing the language model, providing a deterministic mitigation against RAG poisoning and knowledge manipulation.

In the code examples, the fields chunk_sig and kid are stored as metadata alongside each document chunk in the vector database during ingestion.

During ingestion, each chunk of source content is cryptographically signed by the RAG Curator using Safetronic. The resulting signature (chunk_sig) and the identifier of the curator’s signing key (kid) are stored as metadata alongside the chunk in the vector database.

When the verify node executes at runtime, these metadata fields are retrieved together with the document chunk and used to validate the signature through Safetronic verification services.

Before signature verification, the system recomputes a hash of the canonicalised chunk content and compares it to the stored content_hash value recorded during ingestion. This ensures that the retrieved content has not been altered prior to cryptographic verification.

In the code examples, the helper function safetronic_verify invokes the Safetronic signature verification API, which internally calls the endpoint POST /safetronic/signatureVerification/doCscRaw to verify the raw signature against the supplied content.

/safetronic/signatureVerification/doCscRaw

Input Parameter	Integration Context
transactionIdentifier	Identifier generated by the RAG pipeline to track the verification operation.
content	Base64 encoding of the canonicalised chunk content retrieved from the vector database.
contentIsDigest	Set to false because the original chunk content (not a digest) is provided.
signature	Retrieved from vector DB metadata as `chunk_sig`. Created during ingestion when the RAG Curator signs the chunk.
keyLabel	Retrieved from vector DB metadata as `kid`. Identifies the curator signing key.
signatureType	Set to `(4) RAW_SHA256_RSA`.

Output Parameter	Integration Context
status	Verification result returned by Safetronic. GOOD → chunk admitted ERROR → chunk rejected REVOKED → chunk rejected

The Safetronic verification service validates the RSA signature over the supplied content using the authorised signing key associated with the curator.

The request includes the retrieved chunk content, the stored signature (chunk_sig), and the identifier of the signing key (kid). Safetronic verifies the signature using the authorised signing key and returns the verification result.

Chunks with valid signatures are admitted into the retrieval context, while any chunk that fails verification is rejected and excluded from the AI response.

In addition to verifying individual chunk signatures, the verification process also validates the integrity of the underlying source documents. The original file is reloaded and its SHA-256 hash is compared against the value recorded in the dataset manifest to ensure that the source content has not been modified since ingestion.

If all retrieved chunks fail verification, the system raises an error and does not proceed to response generation. This prevents the language model from operating without authorised context.

Provenance Trust Reporting
chain_of_trust node

The chain_of_trust node generates a verifiable report describing the authority and provenance of the knowledge used to produce the AI response. Rather than returning an answer alone, the system can also produce a structured trust record showing the source system from which the knowledge originated, the dataset authority that approved its use, the curator admission event during ingestion, and the results of runtime verification for each retrieved document chunk.

This report provides auditable evidence of the information that influenced the model output, allowing organisations to verify that responses were generated only from authorised knowledge. The example on the right illustrates a typical Chain-of-Trust report produced by a LangChain-based retrieval system integrated with Safetronic AI Trust Gateway.

Constructing the
Chain-of-Trust Report

The chain_of_trust node constructs the report by consolidating verification evidence from multiple stages of the RAG lifecycle, including dataset authority established during ingestion, curator admission of the dataset into the AI knowledge domain, and runtime verification results produced by the verify node during retrieval.

The dataset identifier used to locate these manifests is obtained from the metadata attached to the verified document chunks returned by the verify node. This metadata includes the datasetId and ingestion event identifier recorded when the dataset was admitted into the AI knowledge domain.

To assemble the report, the node needs to load the dataset manifests created during ingestion and verifies their signatures using Safetronic services. These manifests provide the authoritative record of which source documents were approved for inclusion in the AI knowledge domain.

Source system information included in the report (system name, environment, owner, export timestamp, and export reference) is read from the origin section of the dataset manifest created during ingestion.

dataset_manifest = load_json("./data/<datasetId>/dataset_manifest.json") curator_manifest = load_json("./data/<datasetId>/curator_manifest.json")

The dataset manifest contains the dataset origin metadata captured during ingestion, including the source system, environment, and export reference. This information is read from the origin section of the dataset manifest and does not require loading a separate origin file at runtime.

In the examples shown in this guide these manifests are loaded from the local dataset directory used during ingestion. In production deployments these artefacts would typically be stored in a secure repository or object store and retrieved by the service at runtime.

The signatures attached to these manifests are validated using the Safetronic verification API. Internally this invokes the endpoint /safetronic/signatureVerification/doCscRaw to confirm that the manifest content matches the signatures generated during dataset approval and curator admission.

safetronic_verify( content_bytes = dataset_manifest_bytes, signature_b64 = dataset_manifest["signature"]["signature"], kid = dataset_manifest["signature"]["kid"], trans_id = "verify-dataset-authority" )

After the manifest signatures are validated, the node resolves the signing certificates associated with each Safetronic key. This step retrieves the certificate subject and serial number so the report can identify the dataset custodian and RAG curator responsible for approving the knowledge.

/safetronic/signature/doGetCertificate

Input Parameter	Integration Context
keyIdentifier	Identifier of the Safetronic signing key whose certificate should be resolved. In the examples shown in this guide this corresponds to the `kid` value stored with signed artefacts such as dataset manifests or curator manifests.
extraData	Optional metadata passed to the Safetronic service.

Output Parameter	Integration Context
certificateSubjectDN	Distinguished Name of the certificate associated with the signing key. Included in the Chain-of-Trust report to identify the approving authority (e.g. dataset custodian or RAG curator).
certificateSerial	Serial number of the signing certificate used during approval. Recorded in the report to provide a verifiable reference to the signing credential.

The response returns certificate attributes such as the subject DN and certificate serial number. These values are included in the report to identify the dataset custodian and RAG curator responsible for approving the knowledge.

The retrieved chunk identifiers included in the report are obtained from the verification results produced by the verify node. These correspond to the chunk indexes assigned during dataset ingestion and allow the report to identify exactly which sections of a document influenced the AI response.

The resulting Chain-of-Trust report links the AI response to the authorised dataset, the approving custodians and curators, and the verified documents that influenced the model output, providing an auditable provenance record for the answer.

Dataset Ingestion &
Admission

Before knowledge can be used by the AI retrieval system, it must first be admitted into the AI knowledge domain through the ingestion process. This process establishes dataset authority, records curator admission, and cryptographically signs each retrieval chunk so that it can be verified later during runtime execution.

The ingestion workflow produces three key artefacts that form the foundation of the Chain-of-Trust model:

Dataset authority manifest approved by the dataset custodian
Curator admission manifest recording dataset ingestion
Cryptographically signed retrieval chunks stored in the vector database

Dataset Manifest Creation

The ingestion process begins by constructing a dataset manifest describing the documents included in the dataset and their cryptographic hashes. This manifest defines the authoritative set of files that are permitted to enter the AI knowledge domain.

dataset_manifest = { "datasetId": DATASET_ID, "generatedAt": timestamp(), "origin": dataset_origin, "files": [] }

The origin section records the provenance of the dataset at the point of export. These fields are embedded within the dataset manifest and are later used to populate the source system context in the Chain-of-Trust report.

"origin": { "sourceSystem": "Enterprise Knowledge Repo", "environment": "Production", "systemOwner": "Policy Management", "exportReference": "EXPORT_JOB_20260227_0458" }

Each document in the dataset directory is hashed and recorded in the manifest. These hashes are later used during retrieval to ensure that retrieved content matches the authorised dataset files.

The dataset origin metadata is loaded from a dataset_origin.json file located in the dataset directory. This record describes the system from which the knowledge was exported and provides the source system provenance later displayed in the SOURCE SYSTEM section of the Chain-of-Trust report.

Dataset Authority &
Custodian Approval

Once the dataset manifest has been created, it must be approved by the dataset custodian before the dataset can be admitted into the AI knowledge domain. This approval is performed through the Safetronic Mobile MFA authorisation process.

The ingestion system submits the manifest to the Safetronic authorisation API. The custodian receives a Mobile MFA approval request showing a summary of the dataset origin and manifest contents. The dataset is only admitted if the custodian explicitly approves the request.

Ingest

doAuthorise

Mobile MFA Approval

{
  datasetId: KB_2026_02,
  generatedAt: 2026-02-27 07:39,
origin: {
  sourceSystem: Enterprise Repo,
  environment: Production,
  systemOwner: Policy Management,
  exportReference: 20260227
},
files: [
  path: SCB221_enterprise_policy..pdf,
  sha256: d43a4e27...
  path: SCB221_operational_stand..pdf,
  sha256: cf3cc492...

Decline

Signed Manifest

The ingestion process submits the dataset manifest to Safetronic for custodian approval via the doAuthorise API. The dataset custodian receives a Mobile MFA request showing a summary of the manifest and source system metadata. Only after the custodian approves the request does Safetronic sign the manifest and return the dataset authority record.

/safetronic/signature/doAuthorise

Input Parameter	Integration Context
transID	Identifier generated by the ingestion process to track the dataset approval transaction. In the examples shown in this guide this corresponds to the dataset authorisation request (e.g. `dataset-<datasetId>`).
content	Base64 encoding of the dataset manifest submitted for approval. This manifest describes the authorised dataset files and their cryptographic hashes.
keyLabel	Identifier of the Safetronic signing key used by the dataset custodian. In the ingestion workflow this is the `dataset-custodian` key reference.
extraData	Optional metadata supplied with the approval request.

Output Parameter	Integration Context
status	Indicates the current state of the authorisation request. Typical values include `PENDING`, `APPROVED`, or `REJECTED`.
transactionIdentifier	Identifier returned when the authorisation request is submitted. This value is used to retrieve the final approval result once the Mobile MFA approval is completed.
signature	Cryptographic signature generated by Safetronic after the dataset custodian approves the manifest via Mobile MFA. Present only when the request status is `APPROVED`.
kid	Identifier of the Safetronic signing key used to sign the approved dataset manifest.
approvalMethod	Indicates the approval mechanism used for the authorisation request. In the examples shown in this guide the approval method is `MOBILE_MFA`.
approvedAt	Timestamp recorded when the dataset custodian approved the request.
approvalId	Unique identifier assigned to the Mobile MFA approval event.
tokenSerial	Serial number of the authentication token used during the Mobile MFA approval process.

The examples in this guide present the doAuthorise operation as a synchronous call for simplicity. In production environments, Mobile MFA approval is typically asynchronous. The authorisation request returns a transaction identifier, and the system retrieves the final approval result once the custodian completes the Safetronic Mobile MFA approval.

After the custodian approves the request via Safetronic Mobile MFA, Safetronic signs the dataset manifest using the custodian’s signing key and returns a signature record describing the approval event.

sign_result = safetronic_authorise( data_bytes = manifest_bytes, trans_id = "dataset-<datasetId>", key_label = "dataset-custodian" )

The resulting signature record contains the signature, signing key identifier, approval timestamp, approval identifier, and the token serial used during the Mobile MFA approval.

This signature record is embedded directly inside the dataset manifest and forms the dataset authority record. In addition to the signature itself, it includes the Safetronic signing key identifier, Mobile MFA approval metadata, and the token serial associated with the custodian approval event.

Before continuing with ingestion, the curator verifies the custodian signature on the dataset manifest using the Safetronic verification API. This step confirms that the manifest approved via Mobile MFA has not been modified and that the signature corresponds to the authorised custodian key. If the verification fails, the ingestion process aborts and the dataset is not admitted into the AI knowledge domain.

Curator Dataset Admission

After the dataset authority has been established, the RAG Curator admits the dataset into the AI knowledge domain. This step records the ingestion event and cryptographically binds the dataset admission to the dataset manifest.

The curator manifest contains the hash of the approved dataset manifest (datasetManifestHash), ensuring that the curator admission is cryptographically bound to the exact dataset approved by the custodian.

curator_manifest = { "datasetId": DATASET_ID, "datasetManifestHash": sha256(dataset_manifest), "ingestEventId": INGEST_EVENT_ID, "generatedAt": timestamp() }

The curator manifest also records the total number of chunks generated during ingestion, providing additional context for audit and validation purposes.

The curator manifest is then signed using Safetronic. This signature confirms that an authorised curator admitted the dataset into the AI retrieval system.

curator_signature = safetronic_sign( data_bytes = curator_manifest_bytes, trans_id = "curator-" + INGEST_EVENT_ID, key_label = "rag-curator-signing-key" )

/safetronic/signature/doSignature

Input Parameter	Integration Context
transID	Identifier generated by the ingestion process to track the signing operation. For curator admission this corresponds to the ingestion event identifier (e.g. `curator-<INGEST_EVENT_ID>`).
dts	Base64 encoding of the content to be signed. In the ingestion workflow this corresponds to either the curator manifest or the canonicalised retrieval chunk content.
signatureType	Signature algorithm used for signing. In the example shown in this guide this is `RAW_SHA256_RSA (4)`.
isDigest	Indicates whether the supplied data is a digest or the original content. In the ingestion example the original content is provided, so this value is `false`.
extraData.keyLabel	Identifier of the Safetronic signing key used to perform the operation. For curator admission this is `rag-curator-signing-key`.

Output Parameter	Integration Context
signature	Cryptographic signature generated by Safetronic for the supplied data. This signature is stored in the signed artefact (e.g. curator manifest or chunk metadata).
kid	Identifier of the Safetronic signing key used to produce the signature. This value is stored with the signed artefact and later used during runtime verification.
certificateSubjectDN	Distinguished Name of the signing certificate associated with the curator key. This information later appears in the Chain-of-Trust report to identify the approving authority.
certificateSerial	Serial number of the certificate used for signing. Included in the Chain-of-Trust report for auditability.

Chunk Signing &
Vector Database Metadata

After admission, each document is split into retrieval chunks. The canonical text of each chunk is signed by the RAG Curator using Safetronic.

Before signing, the chunk content is canonicalised to ensure that insignificant formatting differences such as newline variations do not change the resulting signature. This guarantees that signatures remain stable across different environments and ingestion pipelines.

In the examples shown in this guide canonicalisation normalises text by converting all line endings to \n and removing leading and trailing whitespace before computing the signature. This ensures that identical content produces identical signatures regardless of the source document parser or operating environment.

sign_result = safetronic_sign( data_bytes = canonical_chunk, trans_id = "chunk-" + content_hash, key_label = "rag-curator-signing-key" )

The resulting signature and signing key identifier are stored as metadata alongside the chunk when it is inserted into the vector database.

metadata = { "content_hash": content_hash, "chunk_sig": sign_result["signature"], "kid": sign_result["kid"], "chunk_index": i + 1, "datasetId": DATASET_ID, "ingestEventId": INGEST_EVENT_ID }

After the chunks have been signed and their metadata recorded, embeddings are generated and the documents are inserted into the vector database. The signed chunk metadata remains attached to each vector entry so that retrieved chunks can later be validated by the verify node during runtime execution.

vectorstore = Chroma.from_documents( documents = signed_docs, embedding = embeddings, persist_directory = CHROMA_DIR, collection_name = COLLECTION_NAME )

This metadata creates the cryptographic link between ingestion and runtime verification. When a chunk is later retrieved during a RAG query, the verify node uses the stored signature (chunk_sig) and key identifier (kid) to validate the chunk using the Safetronic verification API.

Ingestion Trust Chain

Together the dataset origin record, custodian-approved dataset manifest, and curator admission manifest form the ingestion trust chain. These artefacts are later validated by the verify and chain_of_trust nodes during runtime execution to reconstruct the complete provenance record for each AI response.

Together these steps establish the trust chain for the dataset:

dataset_origin ↓ dataset_manifest (custodian Mobile MFA approval) ↓ curator_manifest (dataset admission) ↓ chunk signing ↓ vector database metadata ↓ runtime verification ↓ Chain-of-Trust report