Sovereign Document Intelligence Platform

Your documents. Your servers. Your intelligence.

SDIP ingests any document — PDFs, Word files, transcripts, code, configs, research — and transforms it into a searchable, graph-mapped knowledge base with automatic sensitivity classification. Every byte stays on your hardware.

Zero Data Leakage — By Architecture, Not Policy

SDIP runs entirely on your infrastructure. Document chunking, sensitivity scanning, and LLM classification all happen locally — no API calls to OpenAI, Google, or any third party. Your competitive intelligence, client data, legal documents, and proprietary research never leave your building. This isn't a toggle in a settings menu. It's how the system is built.


01 — Pipeline

Five layers, from raw file to actionable intelligence.

Each layer operates independently and can be run on its own schedule. The pipeline is idempotent — re-running any stage only processes what's changed.

1. Ingest

Walk any directory or vault. Detect file type, compute content hash, register in PostgreSQL with full metadata. Incremental mode skips unchanged files. Supports Markdown, JSON, HTML, Python, shell, SQL, YAML, DOCX, and 20+ formats.
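A minimal sketch of hash-based incremental ingest, using only the Python standard library. The names `content_hash`, `ingest`, and `registry` are hypothetical, and a plain dict stands in for the PostgreSQL table; SDIP's actual schema and code are not shown in the source.

```python
import hashlib
from pathlib import Path


def content_hash(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 64 KiB blocks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def ingest(root: Path, registry: dict) -> list:
    """Walk root and return the relative paths that are new or changed.

    registry maps relative path -> last seen hash. It is a stand-in for
    the database table (an assumption) and is updated in place.
    """
    changed = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        key = path.relative_to(root).as_posix()
        digest = content_hash(path)
        if registry.get(key) != digest:
            # New file or changed content: record the hash and report it.
            registry[key] = digest
            changed.append(key)
    return changed
```

Running `ingest` twice on an unchanged directory returns an empty list the second time, which is the behavior incremental mode describes.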

2. Chunk

Intelligent semantic chunking — not arbitrary character splits. Markdown splits on headers, then paragraphs, then sentences. Python splits on function and class definitions. JSON splits on top-level keys. Small files stay whole. Every chunk gets a parent heading, word count, and position index.
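As an illustration only, header-based Markdown chunking might look like the sketch below. It covers just the first split level (headers), not the paragraph and sentence fallbacks, and it does not skip `#` lines inside fenced code. The `Chunk` fields mirror the metadata listed above; the `small_file_words` threshold of 50 is an assumed value, not a documented SDIP setting.

```python
import re
from dataclasses import dataclass
from typing import Optional

HEADER = re.compile(r"^#{1,6}\s+(.*)$")


@dataclass
class Chunk:
    position: int
    parent_heading: Optional[str]
    text: str
    word_count: int


def chunk_markdown(source: str, small_file_words: int = 50) -> list:
    """Split Markdown at header lines; keep small files whole."""
    total_words = len(source.split())
    if total_words <= small_file_words:
        return [Chunk(0, None, source.strip(), total_words)]

    # Each section is (heading, body lines); text before the first
    # header lands in a section with no heading.
    sections = [(None, [])]
    for line in source.splitlines():
        match = HEADER.match(line)
        if match:
            sections.append((match.group(1).strip(), []))
        else:
            sections[-1][1].append(line)

    chunks = []
    for heading, lines in sections:
        text = "\n".join(lines).strip()
        if not text:
            continue  # a header with no body produces no chunk
        chunks.append(Chunk(len(chunks), heading, text, len(text.split())))
    return chunks
```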

3. Classify

Two-layer sensitivity scanning. Layer 1: regex patterns for SSNs, API keys, credentials, financial data, medical terms — with false-positive filtering. Layer 2: local LLM classification for context-dependent sensitivity. Every finding recorded with method, confidence score, and redacted sample.
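Layer 1 can be pictured roughly as follows. This is a sketch under stated assumptions: the two patterns, the fixed confidence of 0.9, and the "keep the last four characters" redaction are illustrative choices, not SDIP's actual rules. The false-positive filter uses the published SSN constraints (no area 000, 666, or 900 to 999; no group 00; no serial 0000).

```python
import re
from dataclasses import dataclass

# Illustrative patterns only; a real deployment would carry many more.
PATTERNS = {
    "ssn": re.compile(r"\b(\d{3})-(\d{2})-(\d{4})\b"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


@dataclass
class Finding:
    kind: str
    method: str
    confidence: float
    redacted_sample: str


def plausible_ssn(match: re.Match) -> bool:
    """Reject number shapes that can never be a valid SSN."""
    area, group, serial = match.groups()
    if area in ("000", "666") or area.startswith("9"):
        return False
    return group != "00" and serial != "0000"


def redact(value: str, keep: int = 4) -> str:
    """Mask everything except the last `keep` characters."""
    return "*" * (len(value) - keep) + value[-keep:]


def scan(text: str) -> list:
    """Return one Finding per regex hit that survives filtering."""
    findings = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            if kind == "ssn" and not plausible_ssn(match):
                continue  # false-positive filter
            # 0.9 is an assumed placeholder confidence for regex hits.
            findings.append(Finding(kind, "regex", 0.9, redact(match.group(0))))
    return findings
```

Anything the patterns cannot decide would be handed to the local LLM in Layer 2, which is not sketched here.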

4. Graph

Neo4j knowledge graph. Documents become nodes. Topics are extracted and linked. Systems are identified and connected. Cross-references between documents become edges. Sensitivity propagates through the graph: if a restricted document references one of yours, that relationship is tracked.
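One way to picture sensitivity propagation, sketched in plain Python rather than Neo4j. The source does not say whether a referenced document's tier is raised or the relationship is only flagged; this sketch assumes an "effective tier" that is raised along reference edges. The function name `propagate` and the dict-based graph are hypothetical.

```python
from collections import deque

TIERS = ["PUBLIC", "INTERNAL", "SENSITIVE", "RESTRICTED"]
RANK = {tier: rank for rank, tier in enumerate(TIERS)}


def propagate(own_tier: dict, references: dict) -> dict:
    """Return each document's effective tier.

    own_tier maps document -> its own tier. references maps a document
    to the documents it references. A document's effective tier is the
    highest tier among itself and everything that references it,
    directly or transitively (an assumed propagation rule).
    """
    effective = dict(own_tier)
    queue = deque(own_tier)
    while queue:
        source = queue.popleft()
        for target in references.get(source, []):
            # Documents with no recorded tier are treated as PUBLIC.
            effective.setdefault(target, "PUBLIC")
            if RANK[effective[source]] > RANK[effective[target]]:
                effective[target] = effective[source]
                queue.append(target)  # re-check what the target references
    return effective
```

Tiers only ever increase and there are four of them, so the loop terminates even when documents reference each other in a cycle.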

5. Serve

FastAPI membrane with clearance-based access control. Four tiers: PUBLIC, INTERNAL, SENSITIVE, RESTRICTED. Requests without sufficient clearance receive automatically redacted content. Natural language query endpoint. Full audit log on every content access: who, what, when, served or blocked.
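The clearance check and audit trail can be sketched without any web framework. Everything named here (`serve_chunk`, `AuditEntry`, the redaction placeholder text) is hypothetical; in SDIP this logic would sit behind the FastAPI layer, whose routes are not described in the source.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

TIERS = ["PUBLIC", "INTERNAL", "SENSITIVE", "RESTRICTED"]
RANK = {tier: rank for rank, tier in enumerate(TIERS)}


@dataclass
class AuditEntry:
    who: str
    what: str
    when: datetime
    outcome: str  # "served" or "blocked"


def serve_chunk(user: str, clearance: str, chunk_id: str,
                chunk_tier: str, text: str, audit_log: list) -> str:
    """Return the chunk text, or a redaction marker if clearance is too low.

    Every call appends an AuditEntry, whether content was served or not.
    """
    allowed = RANK[clearance] >= RANK[chunk_tier]
    audit_log.append(
        AuditEntry(
            who=user,
            what=chunk_id,
            when=datetime.now(timezone.utc),
            outcome="served" if allowed else "blocked",
        )
    )
    if allowed:
        return text
    # Placeholder wording is an assumption, not SDIP's actual output.
    return f"[REDACTED: requires {chunk_tier}]"
```

Writing the audit entry before returning means blocked requests are recorded as reliably as served ones.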


02 — Technology

What it runs on.

Production-grade open source. Every component self-hosted, every dependency auditable, every data path on your network.

Storage: PostgreSQL
Vectors: pgvector
Graph: Neo4j
LLM: Ollama (local)
API: FastAPI
Console: Textual TUI
Embeddings: 384-dim vectors
Cloud calls: Zero

03 — Applications

Who this is for.

🜃

Law Firms & Compliance

Client documents, contracts, and case files searchable without leaving your network. Automatic PII detection. Audit trail on every access. Sensitivity propagation catches indirect exposure.

🜂

Media & Content Companies

Research archives, interview transcripts, source documents — indexed, cross-referenced, and queryable in natural language. Know what you have before you duplicate the work.

🜁

Engineering Teams

Codebases, architecture docs, runbooks, post-mortems — chunked and graph-mapped so tribal knowledge becomes searchable institutional knowledge. New hires find answers instead of asking.

🜀

Research Organizations

Papers, datasets, field notes, grant applications — topic extraction surfaces connections across projects. The graph sees relationships that keyword search misses.


04 — Comparison

SDIP vs. the alternatives.

Capability | Notion AI / ChatGPT | Enterprise RAG (cloud) | SDIP
Data stays on your hardware | No | No | Always
Zero cloud API calls | No | No | By design
Automatic sensitivity detection | No | Partial | Regex + LLM
Knowledge graph mapping | No | Rare | Neo4j topology
Clearance-based redaction | No | Some | 4 tiers + audit
Sensitivity propagation | No | No | Graph-based
Cross-document references | No | Limited | Auto-detected
Runs on a single server | No | No | Full stack
Per-seat pricing | $10–20/mo | $$$ | None (you own it)


05 — Next Step

Your documents deserve better than a cloud folder.

SDIP is deployed on your infrastructure, configured for your document ecosystem, and handed over with full documentation. No ongoing license. No per-seat fees. The system is yours.

Start a Conversation