AI.DI by imkore  ·  The Document Intelligence Framework  ·  Built ground up in the AI era, for the AI era
Documents are the common denominator across every system, workflow, and transaction.
Yet they remain one of the least governed and least intelligent assets inside the enterprise.
We built AI.DI to change that permanently.
Every other document platform is a dumping ground. Every revision, every draft, every outdated attachment, every half finished version your team never circled back to. The documents that actually run your business are buried in the noise. AI.DI is built for the documents that matter. The contracts. The certificates. The disclosures. The agreements. The records that are the currency of your organization. We ingest them with intent, certify what is real, and turn the intelligence inside every document into something your people, your systems, and your AI agents can actually use. Past, present, and future. Yes, we clean up the mess you already have. More importantly, we make sure the mess never forms again.
6 Integrated Engines · Any Doc Types · 30 ML Engines · 0 Lines of Legacy Code · Any Industry · Any Scale
Sentry Document Assurance · Abstract.DI Extraction · AI.DI Document Warehouse · AI Orchestration & Agent Gateway · Document Gateway · Millennia FileStar · Continuous Transaction Readiness
Overview · Tab 01
Built for the Documents That Matter. Not as Another Dumping Ground.
Box, SharePoint, M-Files, Egnyte. Every vendor in the market is building a bigger dumping ground. Every revision, every draft, every half finished version, every outdated attachment your team forgot about, all of it saved, all of it searchable, none of it distinguished from the documents that actually run your business. AI.DI is built on a different principle. Documents are the currency of your organization. The contracts that define your deals. The certificates that prove your insurance. The disclosures that keep your regulators satisfied. Those documents deserve to be read, certified, and made available to your people and your systems as open intelligence. Not buried as attachments in someone else's folder tree. This platform ingests them with intent, certifies what is real, extracts the intelligence inside every one of them, and distributes that intelligence where it needs to go. Past, present, and future.
Why We Built AI.DI

Other systems organize documents. AI.DI yields the intelligence inside them. Every field, every date, every party, every obligation, every number buried in every document becomes structured, queryable, and routable data your business can actually act on. We clean up the mess already sitting in your storage platforms. More importantly, we prevent the mess from ever forming again. The documents that matter stop being closed interpretations and start being open intelligence.

The Full Document Intelligence Platform
DOCUMENT SOURCES: Box / Egnyte · SharePoint · ERP / HRIS · SAP / Workday · Email / Outlook · DocuSign · Salesforce · Windows Share · Any REST API
ADVISORY SERVICE: imkore Blueprint. Document Intelligence Audit + Readiness Roadmap · $50K to $150K · 60–90 days
TRUST & GOVERNANCE LAYER
TRUST ENGINE: Sentry Document Assurance. Fingerprint · Deduplicate · Certify · Search. Zero doc storage · Fingerprints flow to Warehouse
GOVERNANCE ENGINE: Millennia FileStar. Document Fabric · Workflow · Compliance. Governs docs · Syncs with Warehouse
OPERATIONAL ENGINES (RUN ON THE WAREHOUSE)
EXTRACTION ENGINE: Abstract.DI. Any doc · Any industry · 100K batch. Extracts intelligence → Stores in Warehouse
EXCHANGE ENGINE: Document Gateway. Ingest · Validate · Distribute · Track. Certifies docs · Routes to Warehouse
AI AGENT ENGINE: AI Orchestration. MCP · Agent Gateway · Q&A · RAG. Queries Warehouse · Returns certified answers
DATA & ANALYTICS: Snowflake · Databricks · Power BI / BigQuery
LAYER 1 (FOUNDATION INFRASTRUCTURE): AI.DI Document Warehouse. Documents · Extracted Data · Metadata · Fingerprints · Audit Trail · CTR Score. The hub all engines connect through · Queryable by any AI or analytics system
INTEGRATIONS: REST API · JDBC · MCP · SDK · Webhooks · Snowflake
AI & ANALYTICS CONSUMERS: Snowflake / BigQuery · Copilot / ChatGPT · Claude / Gemini · Power BI / Tableau · Custom AI Agents · MCP Clients / SDK
DEPLOYMENT: Azure Cloud · AWS · On Premise · Hybrid · Single Tenant · Multitenant
Any File Type · Any Industry · Any Org Size
Every engine has standalone value · Modular adoption · No rip and replace required
Why Storage Vendors Cannot Become Intelligence Platforms
The Architecture Difference
Storage Vendors Bolted AI On. AI.DI Was Built as Intelligence From Day One.

Box, SharePoint Copilot, M-Files, and Hyland all share the same starting point. A file storage system built in a different era, with AI features layered on top. Layered AI produces generic summaries. Native AI produces structured intelligence from every field of every document. The kind you can actually query, export, certify, and act on.

The AI.DI Document Warehouse cannot be retrofitted onto a file system. It is not a feature. It is the foundation. Building it requires starting over. AI.DI did. That is the entire difference.

The Structure Difference
AI.DI Brings Structure, Discipline, and Document Culture. The Others Enable Chaos.

Every other platform is a dumping ground. Anyone can upload anything, name it whatever they want, and create another folder no one else can find. Multiply that across thousands of users for ten years and you get the document chaos every organization is living with right now.

AI.DI is the opposite. It handles only the documents that matter. Every document is classified, structured, certified, controlled, and branded. Access is governed at the database. Discipline is enforced by the platform. Document culture replaces document chaos. This is what a real Document Intelligence framework looks like.

The Trust Difference
AI Reasoning Over Uncertified Documents Is Just Faster Hallucination.

Every enterprise AI deployment runs into the same wall. The documents feeding the model are unverified, duplicated, mislabeled, and structurally inconsistent. Copilot hallucinates because the underlying SharePoint is untrustworthy. The model is not the problem. The data is.

AI.DI certifies every document before it ever reaches an AI pipeline. Sentry fingerprints make any change to a certified document mathematically detectable, so a falsified version cannot pass as certified. Every response is traceable to a certified version with a confidence score. This is what trustworthy enterprise AI looks like.

The Intelligence Flywheel. Why It Compounds.
Value That Grows the Longer You Use It

Documents flow into Document Gateway. Abstract.DI extracts intelligence from every one. Sentry fingerprints and certifies them. The Document Warehouse stores all of it as structured, queryable data. The Warehouse makes Abstract.DI more accurate. Better accuracy strengthens Sentry signals. Better signals make Document Gateway more valuable. More value drives more documents. After eighteen months you have an intelligence asset that no incumbent platform can offer at any price, because none of them are built to produce it.

The Complete Platform. Every Engine. Every Capability.
DG
Document Gateway — Exchange & Distribution Engine
Check-In Studio · Distribution Studio · Transaction Rooms · ML Learning Studio · 200+ components · 29 edge functions
Core OS · Any industry
The central operating system for every document. Replaces Box, SharePoint, Egnyte as the primary system of record while connecting to any of them as migration sources. React/TypeScript/Vite + Supabase + Deno Edge Functions + Cloudflare R2 storage.
Check-In Studio
AI powered document intake with AbstractIQ auto classification, batch template mode, session history, rejection/resubmission pipeline, and external submitter portal.
Distribution Studio
Unified hub replacing 6 legacy distribution workflows. Standing distributions, serialized delivery, access tracking, client branding engine, full audit trail.
ML Learning Studio
30 self improving AI engines across 6 capability tiers. Org specific model weights. 4 map views. Continuous accuracy improvement.
AB
Abstract.DI — AI Extraction Engine
Any doc type · 94% day one confidence · 100K batch chunks · GPU OCR · Anomaly detection · Custom schema builder
AI native · Intelligence engine
Reads every document and converts it into structured, queryable intelligence. Multi pass pipeline: OCR → classification → extraction → confidence scoring → anomaly detection → warehouse write. Different models optimized per document type.
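The multi pass pipeline above can be pictured as a chain of small functions, each one enriching the same record. This is a minimal sketch, not Abstract.DI's implementation: the `DocumentRecord` shape, every function name, the toy "Key: value" extraction rule, and the list standing in for the Warehouse are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentRecord:
    """Working state handed from one pass to the next (hypothetical shape)."""
    raw_bytes: bytes
    text: str = ""
    doc_type: str = "unknown"
    fields: dict = field(default_factory=dict)
    confidence: float = 0.0
    anomalies: list = field(default_factory=list)


def ocr(doc: DocumentRecord) -> DocumentRecord:
    # Stand-in for real OCR: treat the bytes as UTF-8 text.
    doc.text = doc.raw_bytes.decode("utf-8", errors="replace")
    return doc


def classify(doc: DocumentRecord) -> DocumentRecord:
    # Toy rule: anything mentioning "certificate" is a certificate.
    doc.doc_type = "certificate" if "certificate" in doc.text.lower() else "contract"
    return doc


def extract(doc: DocumentRecord) -> DocumentRecord:
    # Toy extraction: every "Key: value" line becomes a field.
    for line in doc.text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            doc.fields[key.strip().lower()] = value.strip()
    return doc


def score_confidence(doc: DocumentRecord) -> DocumentRecord:
    # Share of the expected fields that were actually found.
    expected = {"party", "expires"}
    doc.confidence = len(expected & set(doc.fields)) / len(expected)
    return doc


def detect_anomalies(doc: DocumentRecord) -> DocumentRecord:
    if "expires" not in doc.fields:
        doc.anomalies.append("missing expiry date")
    return doc


# Pass order mirrors the text: OCR, classification, extraction,
# confidence scoring, anomaly detection, then the warehouse write.
PIPELINE = [ocr, classify, extract, score_confidence, detect_anomalies]


def run_pipeline(raw_bytes: bytes, warehouse: list) -> DocumentRecord:
    doc = DocumentRecord(raw_bytes=raw_bytes)
    for step in PIPELINE:
        doc = step(doc)
    warehouse.append(doc)  # stand-in for the warehouse write
    return doc
```

Keeping each pass as its own function is one way to swap in a different model per document type without touching the rest of the chain.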
Batch Engine
Process entire archives in 100K-document chunks. ZIP, Box, SharePoint, S3. Output to Excel, JSON, CSV, or Warehouse.
Any Document Type
Custom schema builder for proprietary types in hours, not months. Prebuilt schemas for legal, financial, compliance, healthcare, HR, and government.
GPU OCR
DocTR engine. CPU and GPU. 10x to 50x speedup on GPU. Selective OCR for maximum cost efficiency.
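The Batch Engine's chunking can be sketched with a small generator. The 100,000 figure comes from the text above; the function name and the choice to stream from any iterable are assumptions for illustration, not the product's API.

```python
from itertools import islice
from typing import Iterable, Iterator

CHUNK_SIZE = 100_000  # chunk size quoted in the text


def chunked(items: Iterable, size: int = CHUNK_SIZE) -> Iterator[list]:
    """Yield successive lists of at most `size` items without loading everything at once."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk
```

Because the generator pulls from an iterator, an archive listing of any length can be processed one chunk at a time.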
SE
Sentry Document Assurance. Trust, Certification, and Universal Search.
Connects every document system you operate · Zero document storage · GDPR/HIPAA/SEC by architecture · Find any document and every version in seconds
Trust · Compliance
Deterministic mathematical fingerprinting. Zero document storage — only immutable fingerprints. Three types: Document Content, Document Data, Trusted Data Fingerprints (unique in market — fingerprint individual database rows, find every document referencing that entity).
Duplicate Elimination
40%+ industry average duplicate rate = 40% wasted AI spend. 30 to 50% LLM cost reduction immediately.
Cross System Search
Search SharePoint, OneDrive, Windows Share, email archives, ERP, FileStar simultaneously. No tags. No training.
PII Redaction
Auto-detects and redacts SSNs, financial IDs, tax IDs before fingerprint storage. GDPR data minimization by mathematics.
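A minimal sketch of how redaction, fingerprinting, and duplicate detection can fit together. It assumes SHA-256 as the fingerprint function, a single US SSN pattern, and whitespace normalization before hashing; none of these choices is confirmed as Sentry's actual method, and every name here is hypothetical.

```python
import hashlib
import re

# Hypothetical pattern: US SSNs written as 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact(text: str) -> str:
    """Replace SSN-shaped values before anything is hashed or stored."""
    return SSN_PATTERN.sub("[REDACTED-SSN]", text)


def fingerprint(text: str) -> str:
    """Deterministic content fingerprint: redact, normalize whitespace, hash."""
    normalized = " ".join(redact(text).split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicates(documents: dict) -> dict:
    """Group document names by fingerprint and keep only groups with more than one member."""
    groups = {}
    for name, text in documents.items():
        groups.setdefault(fingerprint(text), []).append(name)
    return {fp: names for fp, names in groups.items() if len(names) > 1}
```

Only the hex digest needs to be kept, which is the idea behind zero document storage. One consequence of redacting first is that two documents differing only in a redacted value share a fingerprint, so a real design has to decide whether that is acceptable.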
DW
AI.DI Document Warehouse — Structured Intelligence Layer
PostgreSQL · SQL/GraphQL/REST · Snowflake · Databricks · MCP · Vector embeddings · 6 query views
Live data layer · BI connectors
Your documents become a living database. Every document Abstract.DI processes becomes structured rows in PostgreSQL. Every extracted field is queryable data. Every AI signal is persisted as a structured record. Your entire document estate finally answers questions instead of just sitting in folders.
6 Query Views
List · Library · Cube (pivot) · Time series · Schema · Scientist mode. Every dimension instantly explorable.
BI Connectors
Snowflake Data Share, Databricks, Tableau, Power BI, dbt, BigQuery, Python SDK. Zero ETL overhead.
Event Streaming
Webhook Manager fires events on every platform action. Real time pipeline triggers for any downstream system.
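To make "every extracted field is queryable data" concrete, here is a small sketch of an expiry query. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, and the `extracted_fields` table, its columns, and the sample rows are assumptions for illustration, not the Warehouse's actual schema.

```python
import sqlite3


def build_demo_warehouse() -> sqlite3.Connection:
    """Create an in-memory table of extracted fields (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE extracted_fields ("
        " document_id TEXT, doc_type TEXT, party TEXT, expires_on TEXT)"
    )
    conn.executemany(
        "INSERT INTO extracted_fields VALUES (?, ?, ?, ?)",
        [
            ("doc-001", "certificate", "Acme Corp", "2025-07-10"),
            ("doc-002", "contract", "Globex", "2025-12-01"),
            ("doc-003", "certificate", "Initech", "2025-07-25"),
        ],
    )
    return conn


def expiring_between(conn: sqlite3.Connection, start: str, end: str) -> list:
    """Return (document_id, party, expires_on) rows expiring in the inclusive date range."""
    # ISO 8601 date strings sort the same way as the dates they represent.
    return conn.execute(
        "SELECT document_id, party, expires_on FROM extracted_fields "
        "WHERE expires_on BETWEEN ? AND ? ORDER BY expires_on",
        (start, end),
    ).fetchall()
```

The same question asked of a folder tree would mean opening every file. Asked of rows, it is one parameterized query.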
OA
AI Orchestration & Agent Gateway
LLM agnostic · MCP server · RAG foundation · OAuth2/OIDC · Zero hallucination
AI agents · RAG substrate
The AI layer that makes every enterprise LLM investment actually work. Not competing with LLMs — the prerequisite. Works with Copilot, GPT-4, Claude, Gemini, Llama, or any custom LLM. MCP server callable from Claude, Cursor, LangChain, AutoGen.
MCP Server
Available. Callable from any MCP compatible environment. No custom integration layer required.
LLM-Agnostic
Client chooses the AI model; AI.DI provides the trusted foundation. No vendor lock-in to any LLM.
RAG Foundation
Every chunk provenance tracked, every answer traceable to a specific certified document version. Zero hallucination.
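Chunk level provenance can be sketched as follows. It assumes each chunk carries the document id, version, and fingerprint of the certified version it came from, and it uses naive keyword overlap in place of a real retriever; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    document_id: str
    version: int
    fingerprint: str  # fingerprint of the certified version this chunk came from


def retrieve(chunks: list, question: str, certified: set) -> list:
    """Rank chunks by keyword overlap, skipping any chunk whose source is not certified."""
    terms = set(question.lower().split())
    scored = []
    for chunk in chunks:
        if chunk.fingerprint not in certified:
            continue  # uncertified or superseded versions never reach the model
        overlap = len(terms & set(chunk.text.lower().split()))
        if overlap:
            scored.append((overlap, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored]


def answer_with_sources(chunks: list, question: str, certified: set) -> dict:
    """Return the retrieved context together with the exact versions it came from."""
    hits = retrieve(chunks, question, certified)
    return {
        "context": [c.text for c in hits],
        "sources": [(c.document_id, c.version, c.fingerprint) for c in hits],
    }
```

Filtering on the certified set before ranking is the design choice that matters here: whatever the model says, the cited sources can only be certified versions.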
FS
Millennia FileStar — Document Governance & Fabric
Founded 1996 · Trusted by enterprise clients · SSAE 18 certified · AI.DI integration pathway
Installed base · On ramp
The governance engine powering the AI.DI platform. FileStar governs document lifecycle and syncs all metadata to the AI.DI Warehouse — turning every FileStar deployment into a warm on-ramp to the full AI.DI platform.
Document Fabric
Structured lifecycle governance for any document type. Configurable routing, approval chains, escalation paths.
AI.DI Integration
FileStar governs. The Warehouse stores. Sentry certifies. Abstract.DI extracts. All of it happens automatically on every FileStar document.
Installed Base
8–10 Phase 1 upgrade targets. 20+ year institutional trust that no competitor can replicate regardless of funding.
How You Adopt. Start Anywhere. Grow Into Everything.
DG alone
Day one structure. Day one discipline.
Replace the dumping ground with a real document home. Hierarchy that matches your organization, roles that govern access, and lifecycle rules that finally bring discipline to who can do what. Live in thirty days. No implementation project.
"For the first time, the documents that actually matter live in a structured, controlled, branded environment. No more chaos. No more guessing where things are."
DG+Abstract.DI
The intelligence upgrade.
Every document you upload becomes structured, queryable, searchable data the moment it lands. No manual tagging. No metadata projects. 94% accuracy on day one.
"Every document tells us what it says, who it is about, and when it expires. Without anyone lifting a finger."
DG+Sentry
The trust and compliance layer.
Keep your existing storage if you want. Sentry wraps it with continuous compliance monitoring, certification, and tamper detection. You finally know what is missing, what has expired, what has been altered, and what is real.
"We did not have to move a single file. The trust layer just appeared on top of everything we already had."
All six engines
The full Document Intelligence framework.
All six engines run together. The flywheel turns. Your documents become a continuously certified, structured, AI ready intelligence asset that gets more valuable every day you use it.
"This is no longer a document system. It is the most strategic data asset in our organization."
"Every legacy platform started as storage and is trying to add intelligence on top. AI.DI is the first platform built as intelligence from the very first line of code. That is the difference you feel from the very first document."
— The AI.DI design principle
Overview · Tab 02
The Document Intelligence Framework. Every Document That Matters, Treated Like It Matters.
Every document that crosses your desk carries value your business needs to access. The contract that defines the deal. The certificate that proves the insurance. The disclosure that keeps the regulator satisfied. The agreement that captures the terms your team negotiated for six months. Document Intelligence is the discipline of treating every one of those documents the way it deserves to be treated. Ingested with intent. Certified as real. Read and structured. Distributed where it needs to go. The intelligence inside becomes an asset your people, your systems, and your AI agents can use. Past, present, and future. This page defines the Framework. AI.DI is its first complete implementation.
The Definition
Category Definition

Document Intelligence is the discipline of treating every document your organization depends on as a certified, extracted, queryable, and routable asset. It replaces the thirty year old Store, Organize, Search model with a continuous loop of trust, structure, and distribution built for the way your AI agents actually work. The documents that run your business stop being closed files. They become open intelligence your people, your systems, and your AI can act on.

Certified
Known to be Real
Every document is cryptographically fingerprinted at ingest. Tampering is mathematically detectable. The document you pulled last week is the document you pull today. Trust is deterministic, not implied.
Extracted
Read by the Platform
Every field, date, party, obligation, and table cell is pulled from the document at day one accuracy, without training. The document becomes structured data the moment it arrives.
Queryable
Joined to the Enterprise
Every extracted field becomes a row in a structured warehouse. Joinable to ERP, HRIS, CRM, and BI. Accessible to AI agents via a native MCP server. A document estate becomes a queryable asset.
Routable
Moves with Intent
Secure transaction rooms. Signed distribution. Workflow across every system of record. Documents go where they need to, when they need to, under the governance they demand. The asset works, instead of sitting.
The Four Layers Every Document Intelligence Platform Must Ship
Evaluation Criteria

Any vendor that claims to deliver Document Intelligence must ship all four layers. Today, almost none do. IDP vendors ship Layer 1. Storage platforms ship a partial Layer 4. Nobody else ships Layers 2 and 3. AI.DI ships all four, standalone or integrated.

Layer | Capability | What It Delivers | AI.DI Engine
04 Orchestration | Secure routing · transaction rooms · system connectors | Documents move with governance across every system of record. The asset works instead of sitting. | Document Gateway
03 Warehouse | Extracted fields · Postgres · Snowflake zero copy · MCP | Every extracted field becomes a queryable row. Joinable to the enterprise. Accessible to agents. | AI.DI Warehouse
02 Certification | Deterministic fingerprinting · zero storage vault · blockchain anchoring | Every document is provably real. Tamper detection is deterministic, not probabilistic. | Sentry
01 Extraction | Classify · extract fields · confidence score · batch at scale | The document is read on arrival. Fields, dates, parties, obligations, and tables become structured data. | Abstract.DI
Standalone and Integrated. The Modular Adoption Principle.
Why Every Layer Has Standalone Value

Every layer of the Framework is a complete product in its own right. Enterprises rarely buy all four on day one. Every enterprise realizes it needs all four by year two. The compounding value of Document Intelligence only arrives when the four layers run as one. Extraction you can trust. Certification tied to structured data. Data accessible to agents. Routing that carries certified records to every system that needs them.

Modular Entry
Start With One Layer
Buy extraction only. Buy certification only. Buy the warehouse only. Buy routing only. Every engine ships ready to run as a standalone product with its own ROI. You do not need to commit to the full Framework on day one.
Compounded Value
Full Power at Integration
A certified extraction is stronger than an uncertified one. A queryable certified extraction is stronger than a siloed one. A routed queryable certified extraction is the actual enterprise asset. Each additional layer multiplies the value of the ones you already have.
Inevitable Expansion
The Four Layer Conclusion
Every enterprise that starts at Layer 1 asks for Layer 2 within twelve months. Every enterprise at Layer 2 asks for Layer 3 within eighteen. The modular entry is how customers arrive. The full Framework is where they end up.
AI.DI is the First Complete Implementation
The Framework exists independent of any vendor. It is a definition of the discipline, not a product. But as of today, AI.DI is the only platform that delivers every layer. Other vendors implement one, sometimes two. None implement all four. Evaluate any competitor on its own merits, then ask the same four questions of each.
Vendor | Layer 1 · Extraction | Layer 2 · Certification | Layer 3 · Warehouse | Layer 4 · Orchestration | Complete Framework?
AI.DI | Abstract.DI | Sentry | AI.DI Warehouse | Document Gateway | Yes, all four
Abbyy | Core product | None | None | None | Layer 1 only
Hyperscience | Core product | None | None | None | Layer 1 only
Datamatics | Core product | None | None | None | Layer 1 only
Iron Mountain | Services led | None | None | Records management | Partial 1 and 4
Box | None | None | None | Storage + workflow | Partial 4 only
SharePoint / M365 | Copilot wrapper | None | Graph (partial) | Teams integration | Partial 1, 3, 4
M-Files | Aino (training) | None | None | Workflow | Partial 1 and 4
Egnyte | None | None | None | Hybrid storage | Partial 4 only
Ten Questions to Evaluate a Document Intelligence Vendor
Any RFP against the Framework should include these ten questions. A vendor that cannot answer yes to at least eight is not delivering Document Intelligence. They are selling one layer and calling it the whole.
Question 01 · Extraction
Can you extract fields from any document type on day one without supervised training? Anything less is a months long ML project, not a product.
Question 02 · Certification
Do you cryptographically certify every document at ingest with a deterministic fingerprint? Without this, the extraction is a claim — not a fact.
Question 03 · Tamper Detection
Can you detect a single character change after ingest, mathematically? Probabilistic tamper detection is not detection.
Question 04 · Queryability
Is every extracted field queryable as a row in a relational database? CSV and JSON exports are not a warehouse.
Question 05 · Data Joinability
Can the extracted data be joined to Snowflake or equivalent without ETL? Zero copy share is the bar. Everything else is a new pipeline.
Question 06 · Agent Access
Do AI agents access your intelligence via a native MCP server? If not, the AI stack has to build a wrapper around your product.
Question 07 · Routing
Does the extracted document route into my ERP, HRIS, and DMS natively? Partner ecosystem integration is not native routing.
Question 08 · Audit Trail
Is your audit trail immutable at the infrastructure level, not just the application level? If admins can rewrite it, it is not an audit trail.
Question 09 · Operations
Do I need an in house ML team to operate the platform? If yes, this is a framework masquerading as a product.
Question 10 · Deployment
Is production deployment measured in weeks, not months? A 6 month implementation is a category smell, not a feature.
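Questions 03 and 08 both come down to whether a change can be detected mathematically. One common technique is a hash chain, where every audit entry commits to the one before it. This is a sketch of that general technique under stated assumptions (SHA-256, JSON serialized events, hypothetical function names), not a description of how AI.DI stores its audit trail.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def append_entry(log: list, event: dict) -> None:
    """Append an event whose hash covers both the event and the previous entry's hash."""
    prev_hash = log[-1]["entry_hash"] if log else GENESIS
    payload = json.dumps(event, sort_keys=True)
    log.append({
        "event": event,
        "prev_hash": prev_hash,
        "entry_hash": digest((prev_hash + payload).encode("utf-8")),
    })


def verify(log: list) -> bool:
    """Recompute every hash in order; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        payload = json.dumps(entry["event"], sort_keys=True)
        expected = digest((prev_hash + payload).encode("utf-8"))
        if entry["prev_hash"] != prev_hash or entry["entry_hash"] != expected:
            return False
        prev_hash = entry["entry_hash"]
    return True
```

A chain like this makes a rewrite detectable, not impossible. Someone who can recompute every hash can still forge the whole log, which is why the latest hash is usually anchored somewhere the administrator does not control.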
The Analyst Categories Have Not Caught Up
Gartner IDP Magic Quadrant

Ranks extraction vendors. Covers Layer 1 only. The MQ is useful for extraction RFPs and Abstract.DI belongs on that shortlist. It does not rank certification, queryability, or routing, so it cannot evaluate a Document Intelligence Framework.

Gartner Content Services Platforms MQ

Ranks storage and collaboration platforms. Covers Layer 4 partially and nothing else. No CSP vendor ships Layers 1, 2, or 3 as a native capability. Microsoft 365 has the closest partial pattern. Copilot touches Layer 1 and Graph touches Layer 3. Neither is a complete Document Intelligence engine.

The Missing Magic Quadrant

There is no Gartner, Forrester, or IDC ranking for the complete Document Intelligence Framework, because the category has not yet been formally drawn by the analysts. AI.DI is the first platform built to deliver it. If you are evaluating document platforms today, the ten questions above give you a cleaner evaluation than any analyst category currently can. The ones that eventually arrive will rank the same four layers this page defines.

"The documents that matter to your business do not belong in a folder with every draft and every revision from the last decade. They deserve to be read, certified, and made available to the people and the systems that run your organization. That is what AI.DI does. For every document that matters. From the day you sign it forward, and for every one already buried behind you."
— The AI.DI operating principle
Overview · Tab 03
AI Intelligence. Built Ground Up. Built for AI. Built by AI.
Every other document platform bolted AI on top of a data model designed before anyone had heard of an LLM. AI.DI started over. It was architected in the AI era, with AI as the primary actor inside the platform from the very first line of code. 27 AI engines running today. 30 self improving ML models. An MCP server that plugs into any LLM you operate. This is not a roadmap. This is what is running for customers right now.
What You Get That Nobody Else Can Offer

Other vendors talk about AI on slides. AI.DI shipped it. Right now you can use 200+ React and TypeScript components, 29 live serverless edge functions, an ML Learning Studio with 30 self improving engines, a production MCP server, an AI Agent Gateway connecting to Claude, Copilot, GPT, and Gemini, and the AI.DI Studio running 27 active AI engines at the same time. This is the framework you have been waiting for. It exists. It is running. It is yours from day one.

The AI.DI Studio. 27 Active AI Engines.
documentgateway.ai
AI.DI Studio — Real Time Intelligence Infrastructure
AI
AI.DI Studio. The Intelligence Operating Room.
AI Intelligence · 27 Active Engines · 5 Capability Domains
Step inside the engine room of Document Intelligence. Twenty seven AI engines run together, in real time, across five capability domains. AI Core classifies, extracts, scores confidence, and feeds the self improvement loop that makes the platform smarter every day. Intelligence handles deep comprehension, cross document validation, expiry detection, and routes every new document to the right schema automatically. Process runs OCR, obligation extraction, approval routing, distribution rules, and workflow management with no human in the middle. Trust and Security enforces immutability through blockchain anchoring, continuous tamper detection, and database level access control. Data and Integration keeps the warehouse, API gateway, registry, and storage optimization humming at any scale. Every node shows live metrics. Documents processed. Classifications made. Conflicts detected. Connections active. This is what an AI native Document Intelligence framework actually looks like in production.
HITL Reduction. The Self Improving Loop.
The Self Improvement Loop, Working for You

The HITL Reduction engine watches every other AI engine in the platform and quietly promotes classifications to auto approve as confidence climbs above your configured thresholds. Standard document types trend toward zero human review at twelve months. Edge cases and novel documents still surface to a human, because the goal is not zero humans. The goal is the right humans reviewing only the documents that actually need them.

Legacy Platform HITL at 12 Months
Standard contracts: 65%
Insurance certificates: 55%
Financial statements: 70%
Fixed models. No learning from your data. The same cost and the same error rate at month twelve as on day one.
AI.DI HITL at 12 Months
Standard contracts: 8%
Insurance certificates: 5%
Financial statements: 12%
Continuous learning from your real production data. Every correction retrains the model automatically. No ML engineers required.
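The threshold idea behind HITL reduction can be sketched in a few lines. The threshold values, document type names, and function names below are assumptions for illustration. The text only says that classifications are promoted to auto approve once confidence climbs above your configured thresholds.

```python
# Hypothetical per-type thresholds; in practice these would be configured per organization.
THRESHOLDS = {
    "standard_contract": 0.95,
    "insurance_certificate": 0.90,
}
DEFAULT_THRESHOLD = 0.99  # novel document types need near certainty before auto approval


def route(doc_type: str, confidence: float) -> str:
    """Auto approve when confidence meets the type's threshold, else send to a human."""
    threshold = THRESHOLDS.get(doc_type, DEFAULT_THRESHOLD)
    return "auto_approve" if confidence >= threshold else "human_review"


def hitl_rate(decisions: list) -> float:
    """Share of decisions that still required a human."""
    if not decisions:
        return 0.0
    return decisions.count("human_review") / len(decisions)
```

A strict default for unknown types is what keeps edge cases and novel documents in front of a reviewer while standard types trend toward auto approval.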
Blockchain Engine. Immutable Audit Trail.
AI.DI Studio — Blockchain Engine · On-Chain Document Integrity
AI · SENTRY
Blockchain Engine. Cryptographic Proof for Every Document.
AI Intelligence · Trust Engine · Ethereum / Hedera / Polygon
Every document that matters can be anchored on chain. Thousands already have been. Each one generates a Merkle tree hash committed to Ethereum, Hedera, or Polygon, creating immutable mathematical proof that the content has not changed since a specific moment in time. The engine runs 100% automated, because anchoring is deterministic. If the fingerprint matches, it anchors. No human judgment required. The Pages Powered By This Engine panel shows exactly where these certifications surface in the platform. Document Vault. Asset Vault. Verification Portal. This is more than a compliance checkbox. It is the infrastructure that lets you hand any document to a regulator, counterparty, or auditor with cryptographic certainty that what they see is what was signed. No other document platform ships this in the box.
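A Merkle root is what lets one short hash stand in for an entire document. The sketch below assumes SHA-256, leaves made from pages or chunks of the document, and the common convention of duplicating the last node on odd levels. How the Blockchain Engine actually builds its tree is not stated in the text, and the on-chain commit step is left out.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list) -> str:
    """Hash each leaf, then hash pairs upward until a single root remains."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumption: duplicate the last node on odd levels
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0].hex()
```

Only the root would need to be committed on chain. Changing any byte of any leaf produces a different root, which is what makes the later comparison deterministic.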
Integration Studio. Connect Any AI Agent.
Integration Studio — Live AI Agent Gateway
AI · ORCHESTRATION
Integration Studio. The Live AI Agent Gateway.
AI Orchestration · MCP Server + 3 Connected AI Systems
This is the moment enterprise AI deployment finally becomes real. Three AI systems are already connected. Claude. ChatGPT. FileStar. Each one has read only, row level secured access to your entire certified document corpus through six production tools. The MCP Server URL is a published endpoint. Connect Claude, Cursor, LangChain, or any MCP compatible environment and the AI instantly gains the ability to search certified documents, check compliance status, retrieve obligations, query the warehouse, navigate the hierarchy, and pull signed document links. Keys are tenant scoped and revocable in one click. Microsoft Copilot, Gemini, and Grok are listed and ready to plug in. Your entire AI vendor portfolio finally answers questions from the same trusted Document Intelligence foundation. This is the infrastructure that makes every LLM investment your organization has made actually work.
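For readers who have not seen an MCP call, a tool invocation is a JSON-RPC 2.0 message with the method `tools/call`. The sketch below only builds that message. The tool name `search_certified_documents` and its arguments are hypothetical, since the text lists what the six tools do but not what they are called, and transport and authentication are omitted.

```python
import json
from itertools import count

_request_ids = count(1)  # simple incrementing JSON-RPC request ids


def tool_call_request(tool_name: str, arguments: dict) -> dict:
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool name and arguments, for illustration only.
request = tool_call_request(
    "search_certified_documents",
    {"query": "insurance certificates expiring in 30 days", "limit": 5},
)
body = json.dumps(request)  # the serialized message an MCP client would send
```

In practice an MCP client such as Claude or Cursor builds these messages itself once it is pointed at the server URL. The point of the sketch is that no custom integration layer sits in between.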
Integration Ecosystem. 28 Connectors.
Integration Studio — 28 Connectors Across Every Enterprise System
AI · ORCHESTRATION
Twenty Eight Connectors. Every System You Already Run.
AI Orchestration · Full Connector Ecosystem
"We already use that" is no longer an objection. AI.DI connects to all of them at the same time. Claude and ChatGPT are integrated today. Copilot, Gemini, and Grok are ready to plug in. Your ERPs push operational data, financial reports, and contracts directly into the ingestion pipeline on a schedule or by event. No manual exports. No batch jobs. SharePoint, Google Drive, Box, OneDrive, and Dropbox connect as source systems, so AI.DI reads, certifies, and extracts from where your documents already live without you moving a single file. Your CRM pushes agreements and correspondence as structured records. Your data warehouses (Snowflake, Databricks, BigQuery, Redshift) receive extracted intelligence on the schedule you choose. Every connector is configured through a guided AI wizard. No IT project. No professional services. No custom code. AI.DI fits into your stack as it actually exists today.
Document IQ. Conversational AI Over Your Certified Corpus.
Document IQ — AI Powered Document Intelligence Assistant
AI · ORCHESTRATION
Document IQ. The AI That Actually Knows Your Documents.
AI Orchestration · Conversational AI · Portfolio-Wide Access
This is what AI feels like when it has trusted data underneath it instead of raw PDFs. Ask "what is missing from the vault" and every gap across every asset surfaces in seconds. Ask "show critical risk items" and every violation flag and every expiry warning across the portfolio comes back ranked and ready to act on. Ask "what is expiring in the next thirty days" and you get a precise structured answer, not a keyword guess. Upload any file and Document IQ cross references it against your live vault in real time. Drop in a financial extract and it tells you which records are missing, which figures do not reconcile, and which documents need attention. This is not a chatbot bolted onto a document system. This is conversational AI with a real Document Intelligence framework underneath it.
ML Learning Studio. 30 Engines. 6 Tiers.
The Self Improvement Architecture

Every legacy document system runs on fixed classification models that require expensive, time consuming retraining projects. The AI.DI ML Learning Studio is the opposite. Thirty engines improve continuously from your own production data, automatically, with zero engineering intervention required. AI.DI gets cheaper and more accurate the longer you use it. The longer you wait, the bigger the lead becomes.

Tier | Focus | Example Engines | HITL Trajectory
Tier 1 · Foundation | Document type classification | Enterprise Type Classifier, PE Type Classifier, Legal Type Classifier | Near-zero for covered types
Tier 2 · Entity | Named entity extraction | Party Extractor, Property Identifier, Fund/Entity Linker | 5–15% at 6 months
Tier 3 · Date & Validity | Temporal signal extraction | Expiration Detector, Effective Date Parser, Renewal Classifier | Near-zero for standard formats
Tier 4 · Financial | Financial data extraction | Loan Terms Extractor, Critical Data Extract Parser, Appraisal Value Extractor | 10–20% at 6 months
Tier 5 · Compliance | Compliance validation | Coverage Gap Detector, Compliance Flag Engine, Signature Validator | 15–25%. Domain expertise retained.
Tier 6 · Cross-Document | Cross-document consistency | Portfolio Benchmark Engine, Anomaly Correlator, Reconciliation Engine | Complex analysis. Strategic HITL.
"We didn't build a document platform and add AI. We built an AI platform that happens to manage documents. The difference is not semantic. It is architectural. And architecture determines destiny."
— AI.DI platform design principle
Overview · Tab 04
The Honest Comparison. Why the Difference Matters to You.
AI.DI does not pretend to win on every line. Storage vendors are very good at storage. What you need to know is what the other platforms can never give you, no matter how much they spend on AI. A clean data model built for structured extraction. A code base without twenty years of compatibility baggage. An AI engine designed into the product from the first line of code. A framework that ties trust, extraction, query, and distribution together as one platform. Those differences show up in everything you experience as a customer, from day one to year ten.
What This Means for the Platforms You Already Run

Every platform in this matrix was built to hold documents, not to understand them. Box, SharePoint, M-Files, and Egnyte all share the same constraint. Their data model was drawn before anyone had heard of an LLM, and they cannot rewrite it without breaking every customer already running on the old model. Every dollar they spend on AI is spent on top of that constraint. Every dollar you spend with AI.DI is spent free of it. That is why the gap between what a document platform can do and what your business actually needs keeps widening.

Full Capability Matrix
Capability | AI.DI Platform | Box | SharePoint | M-Files | Egnyte

Architecture & Philosophy
AI native architecture (built for AI, not adapted) | Win: 2024-2025. Zero compromise. AI is core, not a wrapper. | Bolt-on | Copilot wrapper | Aino, improving but bolted on | Minimal
Zero legacy technical debt | Win: No codebase older than 18 months. | 2005 origin | 2001 origin | 2003 origin | 2009 origin
Edge compute architecture | Win: All compute at edge. Scale to zero or infinity. | None | Azure Functions (partial) | None | None
Modular adoption (standalone or full suite) | Win: Every engine has standalone value. | Partial | Module-based but complex | Partial | Partial

AI & Document Intelligence
Structured data extraction from documents | Win: Abstract.DI. Any type, 94% day one, 100K batch. | None | Basic Copilot extraction | Aino, requires training | None
Day one extraction accuracy (no training) | Win: 94%+ on prebuilt schemas. No training required. | N/A | N/A | Months of training | N/A
GPU accelerated OCR pipeline | Win: DocTR. 10 to 50x speedup on GPU. | None | Azure OCR (limited) | Basic OCR | Basic OCR
Batch processing (100K+ archives) | Win: 100K-chunk batch. ZIP, Box, SharePoint, S3. | None | None | Limited batch | None
30 self improving ML engines | Win: Continuous production learning. No ML engineers. | None | Generic Copilot | Limited self-learning | None
HITL Reduction AI (autonomous meta-engine) | Win: Autonomous promotion of high-confidence classifications. | None | None | None | None

Trust, Compliance & Security
Document fingerprinting (deterministic mathematical proof) | Win: Thousands of prebuilt fingerprints. Zero document storage required. | None | None | None | None
Zero document storage compliance model | Win: Only fingerprints stored. GDPR minimization by math. | Full storage | Full storage | Full storage | Full storage
PII auto detection and redaction pipeline | Win: Tokenization pipeline auto redacts at ingestion. | None | Purview (partial) | None | DLP (partial)
Fraud / document manipulation detection | Win: Deterministic. Single character change detectable. | None | None | None | None
Blockchain audit trail | Win: On chain anchoring. 2,814+ documents on chain. | None | None | None | None

Data & AI Infrastructure
Structured document intelligence warehouse | Win: Every extracted field is a queryable row. Unique. | None | None | None | None
Snowflake Data Share (zero ETL) | Win: Zero-copy. Join doc intelligence with financial data. | None | None | None | None
MCP server for AI agents | Win: Production MCP. Claude, Cursor, LangChain. No wrapper. | None | None | None | None
Vector embeddings on certified chunks | Win: Tied to certified versions. pg_vector native. | None | Azure AI Search (partial) | None | None
CTR Score (Continuous Transaction Readiness) | Win: Live composite readiness score. Portfolio-wide. | None | None | None | None
27 active AI engines in production | Win: AI.DI Studio. Live engine map with real time status. | None | None | None | None

Deployment & Integration
Unlimited hierarchy depth (any org structure) | Win: Enterprise → Group → Entity → Asset → Unit. Any depth. | Folders only | Sites/subsites | Metadata based | Folders/workspaces
30 day deployment (no implementation project) | Win: 30 days from contract to live. M-Files runs 3–6 months. | Weeks–months | Months–years | 3–6 months typical | Weeks–months
Installed base / existing trust relationships | Win: 45 FileStar enterprise clients. 20+ year relationships. Zero CAC. | Large (hard to access) | Large (bundled) | Existing clients | Existing clients
How AI.DI Compares to the Platforms You Already Run
Compared to Box
"Box stores it. AI.DI understands it."
Box is a fine place to store files. It will never tell you what is in them. Drop a folder of contracts into AI.DI and a structured workbook of fields, dates, parties, and obligations comes back the same day. You do not have to leave Box. You finally have to stop guessing what is inside it.
Compared to SharePoint
"Keep SharePoint. Add the intelligence layer."
SharePoint runs collaboration. AI.DI runs the intelligence layer on top of it. Extraction, certification, readiness scoring, and AI agent access are added without changing anything underneath. You start with zero disruption and immediate intelligence.
Compared to M-Files
"Day one intelligence, not month six."
M-Files Aino is a months long training project. Abstract.DI ships prebuilt schemas and delivers 90%+ confidence the first time you upload. You bring the documents. AI.DI brings the intelligence on day one.
Compared to Egnyte
"Egnyte knows where the files are. AI.DI knows what they say."
Egnyte is solid hybrid storage. It has no awareness of what is inside any document. AI.DI plugs directly into Egnyte as an intelligence overlay. The chaos slowly turns into a structured, certified, queryable Document Intelligence asset on top of the storage you already trust.
"The question has never been which storage platform to use. The question is whether the documents that actually run your business get the attention they deserve. AI.DI is the first platform built to give them that attention. Automatically. Continuously. At any scale."
— The AI.DI customer promise
Overview · Tab 05
Beyond IDP. Extraction Is Only the First Layer.
Intelligent Document Processing is the fastest growing segment of enterprise AI. A 30 to 40 percent CAGR market racing toward 10 billion dollars by 2030. If you are a CFO or a CIO, your organization is almost certainly evaluating an IDP vendor right now. What your team may not realize is that IDP solves only the first of the four layers the Framework defines. Extraction is the floor. Certification, query, and distribution are the three layers every business eventually needs and almost no IDP vendor ships. AI.DI delivers all four. One platform. One contract. One implementation.
The IDP Market in One Slide
Market Size
$10B+ by 2030
IDP reached roughly $2B in 2024 and is compounding at a 30 to 40 percent CAGR, according to analyst reports including Grand View Research and Research and Markets. Every enterprise is buying document AI. The question is what they actually get for the spend.
What Buyers Get Today
Fields in a Spreadsheet
Traditional IDP delivers extracted data. It does not certify the document. It does not make the fields queryable at scale. It does not route the document anywhere. That is three missing layers between extraction and business outcome.
Where AI.DI Lands
IDP + Three Missing Layers
Abstract.DI competes head to head inside the IDP market. Sentry adds deterministic certification. The AI.DI Warehouse adds queryability. Document Gateway adds routing. One platform, not four vendors. One contract, not four integrations.
The Four Layer Stack. IDP Owns Layer 1.
What the IDP Leaders Actually Ship

Abbyy, Hyperscience, Datamatics, Iron Mountain and the rest of the IDP Magic Quadrant do one thing: they extract fields from documents. Some do it well. None of them ship a trust layer. None of them ship a warehouse layer. None of them ship a routing layer. If you need those — and every enterprise eventually does — you buy three more vendors and wire them together yourself.

Layer | Capability | Traditional IDP | AI.DI Platform
04 | Orchestration & Routing: Transaction rooms · secure distribution · 28+ system connectors | Not provided | Document Gateway
03 | Queryable Structured Warehouse: Every extracted field a row · Postgres · Snowflake zero copy · MCP for agents | CSV / JSON export | AI.DI Warehouse
02 | Certification & Provenance: Deterministic fingerprinting · zero storage vaulting · fraud detection · blockchain anchoring | Not provided | Sentry
01 | Extraction (OCR + IDP): Classify · extract fields · confidence score · batch process at scale | Their whole product | Abstract.DI
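The Layer 2 claim that a single changed character is detectable, while only a fingerprint is stored, can be illustrated with a cryptographic hash. The source does not say which algorithm Sentry uses, so SHA-256 here is a stand-in, and the function names and sample text are hypothetical.

```python
import hashlib

def fingerprint(document_bytes: bytes) -> str:
    """Deterministic fingerprint: the same bytes always produce the same digest."""
    return hashlib.sha256(document_bytes).hexdigest()

def is_unaltered(document_bytes: bytes, certified_fingerprint: str) -> bool:
    """Compare a presented document against the fingerprint recorded at certification."""
    return fingerprint(document_bytes) == certified_fingerprint

# Hypothetical documents that differ by one character (12,500 vs 12,600).
original = b"Lease term: 60 months. Monthly rent: $12,500."
tampered = b"Lease term: 60 months. Monthly rent: $12,600."

# Only this digest needs to be stored, not the document itself.
certified = fingerprint(original)
```

Checking `is_unaltered(tampered, certified)` returns `False`, while the original passes. The document itself never has to be retained to make that check.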
Head to Head · AI.DI vs. the IDP Leaders
Capability | AI.DI Platform | Abbyy | Hyperscience | Datamatics | Iron Mountain IDP | Egnyte

Layer 1 · Extraction (IDP Baseline)
Day one accuracy without training | Win: 94%+ on prebuilt schemas. | Document skills library, tuning required | Supervised ML training required | Template + ML training | Professional services project | Not an IDP product
Any document type | Win: Ground up on prebuilt schemas. | Win: Mature catalog | Win: Supervised | Win: Template based | Structured / semi structured | None
GPU accelerated OCR (10–50x) | Win: DocTR native · edge compute. | CPU pipeline | CPU pipeline | CPU pipeline | Varies by engagement | Basic OCR
Batch scale (100K+ archives) | Win: 100K chunk batch native. | Win: Enterprise scale | Win: Volume proven | Partner delivered | Service engagement | None
Time to first extraction | Win: 30 days from contract to live. | 3–6 months | 3–9 months training | 2–4 months | Service project | N/A

Layer 2 · Certification & Provenance (The Missing Trust Layer)
Deterministic document fingerprinting | Win: Sentry · mathematical proof. | None | None | None | None | None
Row level Trusted Data Fingerprint | Win: Unique in market. | None | None | None | None | None
Zero storage vaulting / GDPR minimization | Win: Only fingerprints stored. | Full storage | Full storage | Full storage | Full storage | Full storage
Fraud / tamper detection | Win: Single character change detected. | None | None | None | None | None
Blockchain anchored audit trail | Win: 2,814+ docs on chain. | None | None | None | None | None

Layer 3 · Queryable Intelligence Warehouse (The Missing Data Layer)
Extracted fields queryable in Postgres | Win: Every field is a row. | CSV / JSON export | CSV / JSON export | CSV / JSON export | CSV / JSON export | None
Snowflake zero copy data share | Win: Zero ETL join with financial data. | None | None | None | None | None
BI connectors (Tableau, Power BI, Databricks, dbt) | Win: Native via MCP. | Via export | Via export | Via export | Via export | None
MCP server for AI agents | Win: Production MCP. | None | None | None | None | None

Layer 4 · Orchestration & Routing (The Missing Distribution Layer)
Transaction rooms / secure distribution | Win: Native. | None | None | None | Records management only | Hybrid storage
Multi system connectors (SharePoint, Box, ERP) | Win: 28+ connectors. | Partner ecosystem | Partner ecosystem | Partner ecosystem | Win: Services integration | Win: Storage integrations
Continuous readiness scoring (CTR) | Win: Portfolio wide live score. | None | None | None | None | None

Strategic Fit
Category | Document Intelligence Platform | IDP Leader | IDP Leader | IDP + BPM Services | Records + IDP Services | Hybrid Storage
Core go to market | Certified, queryable document asset | Enterprise extraction | ML led high volume extraction | Managed services + tooling | Services led IDP + physical records | Content collaboration
In house ML team required | No. Day one prebuilt. | Often | Yes | Partner led | Partner led | N/A
Where AI.DI Sits in the Gartner Magic Quadrant
Abstract.DI plays inside the IDP Magic Quadrant

Abstract.DI competes head to head with Gartner MQ Leaders like Abbyy, Hyperscience, and the top IDP challengers. On day one accuracy without training, GPU accelerated OCR, and batch scale, Abstract.DI delivers Leaders class execution. On vision, prebuilt schemas that deliver 94 percent day one accuracy on any document type redefine what day one IDP should look like. If the only box you need filled is the IDP box, Abstract.DI belongs on your shortlist.

The Full AI.DI Platform Exceeds the IDP MQ

The Magic Quadrant ranks extraction. It does not rank certification, queryability, or routing, because no existing IDP vendor delivers those layers. AI.DI delivers every one of them as one platform. The IDP MQ cannot rank what the category has not yet drawn. Enterprises buying from today's MQ will still need three more vendors by next year. Enterprises buying AI.DI already have all four.

What This Means for a CIO or CFO
If you need only extraction
Any MQ Leader works
Abbyy and Hyperscience are proven at scale. Abstract.DI is price competitive and deploys in 30 days against their 3 to 9 month timelines. On a pure extraction RFP, Abstract.DI belongs on every shortlist.
If you feed AI with documents
IDP cannot certify the input
Sentry is the only deterministic fingerprinting layer on the market. Uncertified extraction plus an LLM equals faster hallucination. Any enterprise running agents or copilots against its documents needs Layer 2 before the AI output is trustworthy.
If documents drive business outcomes
You need Layers 2, 3, and 4
IDP tools export fields. AI.DI makes every field a row in your warehouse, joinable in Snowflake, queryable by Power BI and agents alike, and routed through Document Gateway. A queryable, certified, routed document asset is what the next decade of enterprise looks like.
"Extraction on its own is not enough. A document you cannot trust is not intelligence. A field you cannot query is not data. A record you cannot route is not an asset. AI.DI is built so every document you depend on becomes all four of those things automatically. The extraction you need. The certification that proves it. The query that delivers it. The distribution that puts it to work."
— The AI.DI four layer promise
Products · Tab 07
Document Gateway. Document Intelligence Infrastructure.
Document Gateway is not a document management system. It is not a data room tool. It is the infrastructure layer that finally makes the documents your organization holds work for you. Every document that enters gets read, understood, certified, governed, and acted upon. Automatically. Two hundred React and TypeScript components. Twenty nine live serverless edge functions. Six engines running together. One framework that finally replaces the Box, SharePoint, and Egnyte dumping ground with a real Document Intelligence backbone for your organization.
200+
Production Components
29
Live Edge Functions
30 days
Average Deployment
5 tier
Org Hierarchy
30
Self Improving ML Engines
Live
Standing Distribution Rules
The Problem Document Gateway Solves

Every transaction ready organization has the same invisible problem. Thousands of critical documents that no one has truly read. Leases. Loan agreements. Operating agreements. Amendments. Certificates. They sit in folders, filed correctly, but essentially opaque. When a lender asks "is the lease fully executed," someone has to open it and look. When a deal team needs all expiring leases for a portfolio review, someone spends a weekend building a spreadsheet. When a counterparty receives a document package, they get files, not understanding. Document Gateway changes this. We built a platform where every document that enters your organization gets read, understood, and acted upon, automatically. Compliance monitors itself. Distributions assemble themselves. Counterparties receive context, not just files. The scramble is over.

The Capability Nobody Else Has

Most platforms that touch documents store them. A few extract surface metadata. None of them connect extracted intelligence to automated distribution and continuous compliance at this depth. The combination of amendment chain recognition, expiry date extraction from real document content, execution status tracking across the portfolio, recipient facing intelligence, and rules driven autonomous distribution is not available anywhere else. It is the difference between a document repository and a Document Intelligence framework. A system where every document that enters is understood, and that understanding drives everything downstream.

How Document Gateway Works. Define. Collect. Distribute.
Three steps. One framework. The chaos of document scramble is replaced by a continuous, intelligent process that just works in the background of your organization.
Step 01 · The Steward
Define What Matters.

The Steward defines the documents that matter. What is required. Who must submit. When it is due. What rules apply.

  • Required document types per entity
  • Submitters and submission deadlines
  • Compliance obligations and CTR weights
  • Branded portal experience

Packaged as a Submission Package

Step 02 · The Submitter
Collect Continuously.

Submitters open one branded portal designed for them. Drag and drop. Nothing more. The framework handles everything else, automatically and intelligently.

  • One portal link, no account needed
  • Drag and drop interface
  • Abstract.DI extracts, indexes, and names every file
  • CTR Score updates live as documents arrive

CTR Score · Always Live
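The text describes the CTR Score as a live composite driven by Steward-defined weights, but it does not publish the formula. As a rough sketch only, assume the score is the weighted share of required document types currently fulfilled, on a 0 to 100 scale. The function name, weights, and document types below are hypothetical.

```python
def ctr_score(weights: dict, fulfilled: set) -> float:
    """Weighted percentage of required document types that are currently fulfilled."""
    total = sum(weights.values())
    if total == 0:
        return 0.0  # nothing is required, so there is nothing to be ready for
    met = sum(w for doc_type, w in weights.items() if doc_type in fulfilled)
    return round(100 * met / total, 1)

# Hypothetical Submission Package: heavier weight means more critical to readiness.
weights = {"lease": 3.0, "certificate_of_insurance": 2.0, "operating_agreement": 1.0}

before = ctr_score(weights, {"lease"})
after = ctr_score(weights, {"lease", "certificate_of_insurance"})
```

Recomputing the score each time a document arrives is what makes it "live": in this example the score moves from 50.0 to 83.3 when the certificate of insurance lands.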

Step 03 · The Counterparty
Distribute With Total Control.

Open a Transaction Room. Share a link. The document never leaves your vault. The counterparty sees the live, certified, current version. Always.

  • Shared link, not a copied file
  • Time limited access, fully tracked
  • View only, in place, always current
  • Engagement signals captured for every interaction

Link, Not a Copy

"When a transaction surfaces, the documents are already there. Because you never stopped collecting them."
Continuous Transaction Readiness
The Three Parties. Each Sees Only Their World.
Document Gateway is built around three roles, each with a purpose built experience. The Steward runs the platform. The Submitter delivers documents through one frictionless portal. The Counterparty reviews through a controlled, time limited Transaction Room. The framework holds everything else.
Party 01 · Steward
The Nerve Center.

The Steward owns every document requirement, every submitter assignment, every distribution decision. Full visibility. Full control. Full intelligence at the fingertips.

  • Owns every document requirement
  • Assigns and monitors submissions
  • Controls who views and for how long
  • Sees CTR Score live, organization wide

Access · Full Platform

Party 02 · Submitter
The Document Supplier.

The Submitter receives a single purpose portal link. They see exactly what is required of them. Nothing more. No account. No training. No friction.

  • Receives a single purpose portal link
  • Sees only what is required of them
  • No account, no training, no friction
  • Branded to your organization, not ours

Access · Portal Only

Party 03 · Counterparty
The Deal Reviewer.

The Counterparty receives a link, not a file. They view documents in place, always current, always certified. Their access is time limited and every interaction is tracked.

  • Receives a link, not a copied file
  • Views in place, always current
  • Access is time limited and tracked
  • Every page view becomes signal

Access · Transaction Room

"Each party sees exactly their world. And only their world. The platform holds everything else."
Document Gateway Role Architecture
The Document Journey. From Requirement to Transaction Room.
One platform. One vault. Always live. Every document follows the same intelligent journey from the moment it is required to the moment it is reviewed by a counterparty. Nothing falls through the cracks. Nothing is sent. Everything is governed.
Stage 01 · Define
Submission Package
The Steward defines what documents are required. Twelve doc types. Three submitters. Four hierarchy nodes. Configured once. Active forever.
Stage 02 · Collect
Submission Portal
Submitters arrive at one branded portal. Drag. Drop. Done. No account needed. The framework handles everything from here.
Stage 03 · Process
AbstractIQ Engine
Every document is read, classified, indexed, named, and certified the moment it arrives. Fourteen structured fields per document at 94% confidence.
Stage 04 · Store
Asset Vault
A single source of truth for every document that matters. Documents never leave the vault. CTR Score updates automatically and stays live forever.
Stage 05 · Distribute
Transaction Room
Counterparties receive a link, not a file. View only. Time limited. Fully tracked. Control never leaves you. Documents never leave the vault.
Define Once. Collect Continuously.
The Steward sets requirements in a Submission Package one time. Submitters fulfill them on an ongoing basis. The vault stays current without anyone chasing documents. Document discipline becomes the default state of the organization.
AI Works Between Upload and Access.
Abstract.DI extracts, names, certifies, and indexes every document the moment it arrives. No manual tagging. No data entry. Fourteen structured fields per document at 94% confidence. Documents become structured intelligence the second they land.
Access Is Granted, Never Sent.
Counterparties view documents in place through a time limited link. Files stay in the vault. Always current. Always under your control. The CTR Score reflects reality at all times. Distribution stops being a risk surface and becomes an intelligence surface.
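One common way to grant time limited access without sending a file is a signed, expiring token embedded in the link. The sketch below shows that general technique with an HMAC signature. It is not a description of how Document Gateway implements Transaction Room links, and the secret, token layout, and function names are assumptions.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET = b"demo-secret-not-for-production"  # hypothetical signing key

def issue_link_token(doc_id: str, recipient: str, ttl_seconds: int,
                     now: Optional[float] = None) -> str:
    """Create a token that names the document, the recipient, and an expiry time."""
    expires = int((time.time() if now is None else now) + ttl_seconds)
    payload = f"{doc_id}|{recipient}|{expires}"
    signature = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}|{signature}"

def check_link_token(token: str, now: Optional[float] = None) -> bool:
    """Accept the token only if the signature is valid and the expiry has not passed."""
    payload, _, signature = token.rpartition("|")
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return False  # token was altered or not issued by us
    expires = int(payload.rsplit("|", 1)[1])
    return (time.time() if now is None else now) < expires

# Issue a one hour token at a fixed, hypothetical clock value of t=1000.
token = issue_link_token("lease-001", "lender@example.com", 3600, now=1000)
```

Because the document stays in the vault and only the token travels, revoking or expiring access does not depend on the recipient deleting anything.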
"Documents do not move. What moves is access. And only the access you choose to grant."
The Document Journey
Check-In Studio. AI Powered Document Ingestion.
documentgateway.ai
Check-In Studio — Intelligent Document Intake
Document Gateway · The AI Intake Engine
Every file dropped here enters a multi-stage AI pipeline that runs entirely without human instruction. AbstractIQ classifies the document by type, extracts key fields, scores confidence, checks for duplicates, detects anomalies, and routes to the correct steward queue — all before a human sees it. Required documents are surfaced as named cards organized by packet template, so a steward's view is not a list of files but a structured set of obligations: what's needed, what's fulfilled, what's outstanding, and what was rejected with AI identified reasons. The HITL Reduction AI continuously monitors which document types consistently reach auto certify confidence and promotes them to bypass human review entirely. As your document corpus grows, the percentage of documents requiring human attention trends toward zero for standard types. This is not document management — it is an autonomous compliance engine that happens to accept file uploads.
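The promotion behavior described above can be pictured as a rolling check per document type. This is a simplified sketch under stated assumptions: a type is promoted once its most recent `window` documents all met the auto certify confidence. The class name, the window size, and the 0.95 threshold are hypothetical, since the source does not give the actual criteria.

```python
from collections import defaultdict, deque

class PromotionTracker:
    """Track recent confidence scores per document type and decide on promotion."""

    def __init__(self, auto_certify: float = 0.95, window: int = 5):
        self.auto_certify = auto_certify
        self.window = window
        # Each document type keeps only its most recent `window` scores.
        self.recent = defaultdict(lambda: deque(maxlen=window))

    def record(self, doc_type: str, confidence: float) -> None:
        self.recent[doc_type].append(confidence)

    def promoted(self, doc_type: str) -> bool:
        """True when a full window of scores all met the auto certify threshold."""
        scores = self.recent[doc_type]
        return len(scores) == self.window and min(scores) >= self.auto_certify

tracker = PromotionTracker()
for score in (0.97, 0.98, 0.96, 0.99, 0.97):
    tracker.record("certificate_of_insurance", score)
```

A single low score drops the type back out of the promoted set until a full clean window accumulates again, which keeps the bypass conservative.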
Check-In Engine. AI Thresholds and Real Time Performance.
documentgateway.ai
Check-In Engine Settings — Configurable AI Per Tenant
Document Gateway · Per-Tenant ML Configuration
The thresholds in this panel decide exactly where human judgment enters the pipeline. And where the platform operates without it. The auto certify threshold means the majority of standard documents never reach a reviewer queue. They arrive, get classified, get extracted, get certified, and land in the vault without human contact. Documents in the middle band route to a reviewer with uncertain fields flagged and source passages highlighted. A reviewer corrects one field, not the whole document. Documents that score below the lower threshold are rejected automatically, with a clear, AI generated explanation of which extraction criteria fell short and why. The OCR engine is fully selectable between AI native and open source options without any change to the rest of the pipeline. The Performance panel shows live metrics. Auto classify rate. Average confidence. Reviewer corrections in the last thirty days. Every correction is permanently written as labeled training data against your live corpus. No ML engineer is required. The model improves from your real production use, continuously, without a retraining project.
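The three bands described above reduce to a small routing function. A minimal sketch, assuming confidence is a value between 0 and 1. The default threshold values and the function name are illustrative, since the real values are configured per tenant.

```python
def route(confidence: float, auto_certify: float = 0.95, reject_below: float = 0.60) -> str:
    """Map a confidence score to one of the three pipeline outcomes."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= auto_certify:
        return "auto_certify"   # lands in the vault without human contact
    if confidence < reject_below:
        return "reject"         # returned with an explanation of what fell short
    return "review"             # middle band: a reviewer checks the flagged fields

outcomes = [route(c) for c in (0.97, 0.80, 0.40)]
```

Raising `auto_certify` sends more documents to reviewers, and lowering `reject_below` sends fewer back to submitters. That is the trade-off the settings panel exposes.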
Check-In API and Webhook Integration.
documentgateway.ai
Check-In API — Full Programmatic Document Ingest
Document Gateway · Developer Interface
The same AI pipeline that powers the visual Check-In Studio is fully accessible through a clean REST API. Any internal system, any existing workflow, any document management tool can push files directly into AI.DI without a user interface. Submit a single file or a ZIP bundle of up to ten thousand documents in one call. The response returns a job ID instantly. The full pipeline runs asynchronously and fires webhook events at every stage. Classified. Extracted. Named. Certified. Review Required. Rejected. Each event carries the full payload. Document type. Confidence score. Extracted fields. Routing decision. Final document name. Anomaly flags. Your existing systems get notified in real time the moment a document reaches any status. AI.DI is not just a beautiful UI. It is a Document Intelligence API that happens to have a beautiful UI.
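On the receiving side, a webhook consumer typically parses each event and branches on its stage. The sketch below shows that pattern using the stage names listed above. The JSON key names (`event`, `job_id`, `document_type`, `document_name`) and the snake_case spelling of the stages are assumptions for illustration, not the published payload format.

```python
import json

# Stage names come from the text; their exact spelling on the wire is an assumption.
KNOWN_EVENTS = {"classified", "extracted", "named", "certified", "review_required", "rejected"}

def handle_webhook(raw_body: str) -> str:
    """Decide what a receiving system does with one webhook event."""
    event = json.loads(raw_body)
    kind = event.get("event")
    if kind not in KNOWN_EVENTS:
        return "ignored"
    if kind == "certified":
        return f"file '{event['document_name']}' as {event['document_type']}"
    if kind == "review_required":
        return f"open review task for job {event['job_id']}"
    if kind == "rejected":
        return f"notify submitter that job {event['job_id']} was rejected"
    return "logged"  # intermediate stages are recorded but need no action

# Hypothetical payload for a certified document.
certified_event = json.dumps({
    "event": "certified",
    "job_id": "job-123",
    "document_type": "Lease Agreement",
    "document_name": "2025-01-15_Lease_Acme.pdf",
    "confidence": 0.97,
    "extracted_fields": {"tenant": "Acme LLC"},
    "anomaly_flags": [],
})
action = handle_webhook(certified_event)
```

Because submission returns a job ID immediately and the stages arrive later as events, the calling system never has to poll or hold a connection open while the pipeline runs.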
Distribution Studio. The Operations View.
documentgateway.ai
Distribution Studio — The Unified Distribution Hub
Document Gateway · Transaction Rooms · Packages · Share Links
Distributed documents are the highest-risk surface in any organization. They leave your control the moment they are sent, and most platforms give you no visibility after that. Distribution Studio makes that surface observable, auditable, and permanently traceable. Every active Transaction Room shows CTR Score progress against the required document set, expiry countdown on time-sensitive items, counterparty engagement data by document, and phase completion in a single view. Document Packages show who received which version, when they opened it, and which sections they accessed. Standing Distributions show which recipients are on automatic schedules and what they last received. Share Links show whether the recipient clicked, when, and from which device. Every distribution event is immutable: timestamped, version-locked, recipient-specific, logged permanently, and fully auditable. The engagement data flowing from these rooms tells you more about your counterparty's interest level than any conversation: which documents they spent the most time on, which sections they returned to repeatedly, and which they never opened.
Distribution Studio. Builder and Templates.
documentgateway.ai
Distribution Builder — Three Wizard Modes
Document Gateway · Distribution Wizard
Configuring a document distribution incorrectly — wrong NDA gate, wrong counterparty visibility, wrong expiry date, wrong access scope — is a compliance event, not an inconvenience. Distribution Builder eliminates misconfiguration risk by making the setup structural rather than manual. Selecting Transaction Room launches a 7-step guided workflow that auto-configures deal type, counterparty hierarchy, phase-based document structure, section level access matrix, NDA gate behavior, QA threading, and CTR Score gap alerts based on a single selection. Selecting Document Package configures bundling, per-recipient watermarking, and custom cover letter generation in 3 steps. Selecting Share Documents produces a tracked, expiring link in 2 steps. The platform applies the correct configuration for each distribution type — you choose the context, it builds the controls. The output is not just a sent document. It is a governed, auditable distribution event with full recipient behavioral tracking from the moment it opens. No configuration guesswork. No asking what settings to use. The platform knows.
documentgateway.ai
Distribution Analytics — Counterparty Intelligence
Document Gateway · Deal Analytics
Counterparty intent has always been invisible — you send documents and wait for a phone call. Distribution Analytics ends that. Room engagement shows exactly how long each recipient spent on each document, which sections they returned to, and which they skipped entirely. A counterparty who spends 47 minutes on the indemnification schedule and ignores the financial statements is communicating something specific before any conversation happens. Phase completion rates surface where transactions stall across all active rooms simultaneously — giving teams an objective signal on process friction that no CRM captures. The document access heatmap shows which document types generate the most engagement per deal type, informing which materials to lead with in future transactions. When interest is concentrated in a document you expected to be routine, you know before you get on the call. This is behavioral intelligence over your entire distribution history — continuously updated, never requiring manual compilation.
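The engagement signals described above come down to aggregating view events per recipient and document. A minimal sketch, assuming each event is a hypothetical `(recipient, document, seconds_viewed)` tuple. The event shape and function names are illustrative, not the platform's data model.

```python
from collections import defaultdict

def time_per_document(view_events):
    """Total seconds each recipient spent on each document, most engaged first."""
    totals = defaultdict(int)
    for recipient, document, seconds in view_events:
        totals[(recipient, document)] += seconds
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

def never_opened(room_documents, view_events):
    """Documents in the room that no recipient has viewed."""
    viewed = {document for _, document, _ in view_events}
    return sorted(set(room_documents) - viewed)

# Hypothetical room: two visits to the indemnification schedule total 47 minutes.
room = ["Indemnification Schedule", "Financial Statements", "Title Report"]
events = [
    ("buyer@example.com", "Indemnification Schedule", 1500),
    ("buyer@example.com", "Title Report", 240),
    ("buyer@example.com", "Indemnification Schedule", 1320),
]
ranked = time_per_document(events)
skipped = never_opened(room, events)
```

Here the top entry is the indemnification schedule at 2,820 seconds, and the financial statements appear in the never opened list. That is the pattern the paragraph describes.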
Distribution Studio. Four Workspaces. One Distribution Framework.
Every distribution your organization runs lives in one place. Launch any kind of secure exchange in one click. See objective counterparty engagement the moment a recipient opens a document. Build from prebuilt templates that match how your industry actually works. And govern every aspect of who gets what, for how long, with what controls. The chaos of email attachments and ad hoc shares is finally over.
trusteddocs.ai
Distribution Studio launch
Launch Any Distribution. Three Powerful Modes.
Document Gateway · Distribution Studio · Studio View
One workspace. Three distribution modes. Launch a Transaction Room for the full data room experience with phased access and counterparty analytics. Build a Document Package as a curated, branded bundle for a specific recipient. Send a Share Document link, tracked and expiring, in two clicks. Every active distribution shows live status, recipient count, and engagement signal at a glance. Templates beneath the picker show your most used configurations ready to relaunch. Document distribution stops being an email scramble and becomes a governed organizational capability.
trusteddocs.ai
Distribution Studio analytics
Analytics. See What Matters Before They Tell You.
Document Gateway · Distribution Studio · Analytics
Distribution data becomes deal intelligence. The twelve month engagement timeline shows recipient activity across every distribution your team has ever run. Unique recipients. Documents accessed. Most active room. Top engagement metrics. Recipient distribution by organization. Live activity feed of who is opening what right now. The signal that used to require an awkward "did you have any questions" email finally lives in the platform automatically. You see counterparty interest the moment it appears.
trusteddocs.ai
Distribution Studio templates
Templates. Your Best Distributions, Ready to Relaunch.
Document Gateway · Distribution Studio · Templates
Every distribution your team runs becomes a launchable template. CRE refinancing. M&A and PE due diligence. Asset sale and disposition. LP investor reporting. JV and co investment. Quarterly financial packages. Lender compliance packages. Custom investor packages. Templates carry document type rules, recipient categories, access controls, NDA gates, and watermarking settings. The next time the same kind of deal comes up, the configuration is one click away. The platform remembers how your organization runs.
trusteddocs.ai
Distribution Studio governance
Governance. Total Control Over Every Distribution.
Document Gateway · Distribution Studio · Governance
Who can create Transaction Rooms. Who can require approval before a room goes live. How long max retention runs. Who can create Packages. Who can create Standing Distributions. Whether to auto include documents on verification. Every governance decision sits in one place, controlled by Admins, enforced for everyone. No back doors. No exceptions. No "someone in legal sent that without telling us." Distribution discipline is finally enforced by the platform itself, not by hope.
Standing Distribution Rules. Distributions That Run Themselves.
This is where Document Gateway stops being a system and becomes infrastructure. Define rules on a standing distribution. Pick the document types, execution status, completeness threshold, and expiry horizon that matter. Document Gateway evaluates those rules against your live Document Intelligence database every hour. New documents that match get added automatically. Scheduled deliveries fire automatically with the current matching documents. The distribution stays current as your portfolio evolves. Nobody has to update document lists when a new lease is executed or a renewal is filed. The platform knows. The distribution reflects it. Forever.
Define Rules
Rules That Match Real Document Intelligence.
Rules support execution status (equals or not equals), document type (exact or contains), document category, minimum completeness percentage, and expiry within N days extracted from real document content. Not metadata. The actual dates Abstract.DI read inside the documents themselves.
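The rule types listed above can be sketched as a small matcher. This is a minimal illustration under stated assumptions, not the platform's implementation: the `Doc` and `Rule` classes and all of their field names are hypothetical, and an unset rule field is assumed to place no constraint.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Doc:
    # Hypothetical record of what extraction produced for one document.
    doc_type: str
    category: str
    execution_status: str
    completeness_pct: float
    expiry: Optional[date]  # date read from the document content, if any


@dataclass
class Rule:
    # Every field is optional; an unset field places no constraint (assumption).
    status_equals: Optional[str] = None
    status_not_equals: Optional[str] = None
    type_exact: Optional[str] = None
    type_contains: Optional[str] = None
    category: Optional[str] = None
    min_completeness_pct: Optional[float] = None
    expiry_within_days: Optional[int] = None


def matches(doc: Doc, rule: Rule, today: date) -> bool:
    """Return True only if the document satisfies every constraint set on the rule."""
    if rule.status_equals is not None and doc.execution_status != rule.status_equals:
        return False
    if rule.status_not_equals is not None and doc.execution_status == rule.status_not_equals:
        return False
    if rule.type_exact is not None and doc.doc_type != rule.type_exact:
        return False
    if rule.type_contains is not None and rule.type_contains.lower() not in doc.doc_type.lower():
        return False
    if rule.category is not None and doc.category != rule.category:
        return False
    if rule.min_completeness_pct is not None and doc.completeness_pct < rule.min_completeness_pct:
        return False
    if rule.expiry_within_days is not None:
        # A document with no extracted expiry date cannot match an expiry rule.
        if doc.expiry is None:
            return False
        horizon = today + timedelta(days=rule.expiry_within_days)
        if not (today <= doc.expiry <= horizon):
            return False
    return True
```

A rule such as `Rule(status_equals="Fully Executed", type_contains="lease", expiry_within_days=90)` would then select executed leases whose extracted expiry date falls inside the next ninety days.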
Preview Matches
See What Will Be Distributed Before You Commit.
Preview Matches evaluates rules against the live database and shows exactly which documents match right now. The "Apply" action bulk adds matching documents to the room instantly. No surprises. No misfires. The deal team sees what the counterparty will see, before sending.
Autonomous Delivery
Delivery That Runs Itself, Hour After Hour.
The engine evaluates standing distributions every hour. New matches are added. Scheduled deliveries run with the current matching documents. Manifests are recorded. Next delivery dates advance. The Distribution Studio shows you the last run summary. A "Run Now" button triggers immediate evaluation any time.
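One evaluation pass of the cycle described above might look like the sketch below. It assumes a hypothetical `StandingDistribution` record and a precomputed set of matching document ids; how the real engine stores manifests or schedules runs is not described here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Set


@dataclass
class StandingDistribution:
    # Hypothetical state for one standing distribution.
    document_ids: Set[str]
    next_delivery: datetime
    delivery_interval: timedelta  # must be positive
    manifests: List[dict] = field(default_factory=list)


def run_cycle(dist: StandingDistribution, matching_ids: Set[str], now: datetime) -> dict:
    """One evaluation pass: add new matches, deliver if due, return a run summary."""
    added = matching_ids - dist.document_ids
    dist.document_ids |= added

    delivered = False
    if now >= dist.next_delivery:
        # Record what went out, then advance the schedule past `now`.
        dist.manifests.append({"sent_at": now, "documents": sorted(matching_ids)})
        while dist.next_delivery <= now:
            dist.next_delivery += dist.delivery_interval
        delivered = True

    return {"added": sorted(added), "delivered": delivered, "next_delivery": dist.next_delivery}
```

Calling `run_cycle` once per hour, or on demand for a "Run Now" style trigger, gives the same result either way because the function only depends on the current matches and the current time.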
Document Viewers. Where the Intelligence Meets the Document.
A document is not a static file. It is a living source of intelligence. The Document Gateway viewers reflect that. Every Steward, every internal user, and every counterparty experiences documents through purpose built viewers that show the document, the certification, the extracted intelligence, and the source citations together. Branded to your organization. Permission aware. Always current. Recipients no longer get files. They get understanding.
Counterparty viewer
GATEWAY
Counterparty Viewer. Branded to Your Organization.
Document Gateway · Counterparty Experience · Branded
When a counterparty opens a Transaction Room, this is what they see. A clean, branded, professional viewing experience that carries your organization's identity, not ours. A complete distribution package surfaces as a thumbnail grid of documents. The viewer shows time remaining on access. Download permissions are governed centrally. Dark and light modes are supported. Every page view, every document click, every minute of dwell time is captured automatically as engagement signal that flows back to your team. The counterparty experiences professionalism. You experience visibility.
Counterparty viewer with categories
GATEWAY
Categorized Viewer. Documents Organized for the Recipient.
Document Gateway · Counterparty Experience · Categorized
The counterparty viewer organizes the package by document category automatically. Bridge debt. Capital account. Quarterly reports. Portfolio summary. Categories are applied by the AI extraction the moment the documents land in the Vault. Recipients see exactly the structure they expect for the kind of package they received. They never wonder what is what. They never have to ask which file is the right one. The intelligence that organized the package is the same intelligence that classified every document in your corpus.
Certified document viewer
GATEWAY · SENTRY
The Certified Document Viewer.
Document Gateway · Sentry Certified Documents
Every document in your platform can be viewed inside a fully branded, certified document viewer. The thumbnail strip shows every page in the document at a glance. The viewer header shows the certification badge, the unique document identifier, the version, and a "View in Gateway" link for permitted users. Page numbers, classification labels, and certification proof are visible to every viewer. Counterparties stop receiving files and start receiving certified documents of record. Trust is no longer something they have to take on faith.
Document with intelligence panel
GATEWAY · ABSTRACT
Intelligence Beside the Document. Always.
Document Gateway · Document Viewer · Intelligence Panel
Open any document and the Intelligence panel slides in beside it. Real time KPIs. Live engagement charts. Document tracking. Activity history. The full Abstract.DI intelligence summary. The document is no longer just something you read. It is something you understand at a glance. Internal users see the full intelligence and analytics. Counterparties see the slice they are entitled to see, branded to your organization. Same viewer. Different audience. One source of truth.
Document viewer with extracted fields
GATEWAY · ABSTRACT
Every Field Extracted. Every Source Cited.
Document Gateway · Document Viewer · Abstract.DI Inline
Open a Lease Agreement and the viewer shows you the document on the left and every field Abstract.DI extracted on the right. Identity. Parties. Dates. Premises. Term. Rent. Escalations. Assignment. Default. Insurance. Every field is grouped, scored for confidence, and clickable. Click a field and the document jumps to the source page where the value was extracted. No more hunting through eighty pages to verify a number. The extraction and the source live together, side by side, always.
Document viewer with risk flags
GATEWAY · VALUE
Risk Flags. Surfaced Automatically. Color Coded.
Document Gateway · Document Viewer · Risk Intelligence
The same viewer surfaces the risks no human could possibly catch every time. Personal guarantees. Cross default clauses. Missing SNDA. Unusual terms. Buried obligations. The risk flags panel shows every flagged clause with severity color coding (red for high, amber for material, blue for medium). Each flag links directly to the source page where the language lives. Your legal, deal, and compliance teams stop spending weekends reading documents looking for problems. The platform finds them. Continuously. Forever.
Process Library. Prebuilt Transaction Workflows.
GATEWAY · ADMIN
Process Library — 11 Prebuilt Transaction Workflows
Document Gateway · Workflow Automation
Every complex document transaction follows a predictable phase structure — the documents required for a financing differ from those for an acquisition, a regulatory audit, or a counterparty onboarding, but each has a known sequence that most organizations rediscover from scratch every time. The Process Library ends that rediscovery cycle. A template defines the phases, the required documents per phase, the responsible roles, the estimated duration, and the distribution-ready package that assembles when all phases are complete. Launching a process creates a tracked instance with live phase completion, automatic CTR Score updates as each document is fulfilled, and stakeholder visibility throughout. Templates are versioned — when a better phase structure is identified, it becomes the new standard for all future instances immediately. A 7-phase template with 22 required documents and a 45-day estimated duration is not a checklist. It is institutional knowledge made repeatable, measurable, and improvable at organizational scale. Ad hoc document scramble becomes a managed, auditable workflow that gets faster every time the organization runs it.
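A template of phases and required documents, with live completion per phase, can be sketched as follows. The `Phase` class and the document names in the usage note are hypothetical; the sketch only illustrates the idea that the package assembles once every phase is complete.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Phase:
    name: str
    required_docs: List[str]


def phase_progress(phases: List[Phase], fulfilled: Set[str]) -> Dict[str, float]:
    """Fraction of required documents fulfilled in each phase (0.0 to 1.0)."""
    progress = {}
    for phase in phases:
        if not phase.required_docs:
            # A phase with no required documents counts as complete.
            progress[phase.name] = 1.0
            continue
        done = sum(1 for d in phase.required_docs if d in fulfilled)
        progress[phase.name] = done / len(phase.required_docs)
    return progress


def package_ready(phases: List[Phase], fulfilled: Set[str]) -> bool:
    """The package assembles only when every phase is complete."""
    return all(p == 1.0 for p in phase_progress(phases, fulfilled).values())
```

For example, a two-phase template with `Phase("Intake", ["Term Sheet", "Rent Roll"])` and `Phase("Closing", ["Title Policy"])` reports Intake as half complete once only the Term Sheet is on file.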
Document Type Studio and Hierarchy Studio.
GATEWAY · ADMIN
Document Type Studio — Complete Document Vocabulary
Document Gateway · Document Taxonomy
Document type is not metadata. It is the instruction set that governs everything else in the pipeline. The classification label on a document determines which extraction schema applies, which fields are required, which routing rules trigger, which compliance obligations are checked, and how the CTR Score is affected. The Document Type Studio manages this vocabulary for the entire organization. Essential types drive CTR Score calculations directly: a missing Essential document drops the score and surfaces the gap in the Command Center immediately. Elective types are extracted and tracked but do not penalize readiness scores. The AI Generate function analyzes your existing document corpus and suggests new types based on structural patterns it detects, including types your organization actually uses that aren't in the default catalog, so the taxonomy grows with your organization without manual taxonomy work. Each type carries a predefined extraction schema: the exact fields Abstract.DI will look for, the confidence thresholds required per field, and the anomaly detection rules that flag outliers against your established corpus patterns. The Diligence library alone contains 62 document types across Essential and Elective categories, prebuilt from years of real-world document intelligence deployments. The platform ships prebuilt taxonomies for every major industry, all configurable, extensible, and learnable from your own document patterns.
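The Essential versus Elective distinction can be illustrated with a toy readiness calculation. The actual CTR Score formula is not described here, so the scoring rule below (share of Essential types on file) is purely an assumption, as are the function and type names.

```python
from typing import Dict, List, Set, Tuple


def readiness(type_tiers: Dict[str, str], present: Set[str]) -> Tuple[float, List[str]]:
    """Return (score 0-100, missing Essential types). Elective types are ignored.

    Assumption: the score is the share of Essential types on file. The real
    CTR Score calculation may weigh documents very differently.
    """
    essential = [t for t, tier in type_tiers.items() if tier == "Essential"]
    if not essential:
        return 100.0, []
    missing = sorted(t for t in essential if t not in present)
    score = 100.0 * (len(essential) - len(missing)) / len(essential)
    return score, missing
```

Under this assumption a missing Elective type leaves the score unchanged, while each missing Essential type lowers it and appears in the returned gap list.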
GATEWAY · ADMIN
Hierarchy Studio — Any Org Structure, Any Depth
Document Gateway · Organization Architecture
Every organization has a structure, and every node in that structure carries a different document obligation, a different set of authorized users, a different CTR Score calculation, and a different AI extraction schema. Hierarchy Studio maps that structure precisely, without code, without professional services, without architectural constraints. Each node type carries its own configuration: required document libraries, role assignments, process templates, extraction schemas, and compliance obligations all attach at the node level. A user provisioned at a division node cannot see assets outside their scope. That boundary is enforced at the database layer, not the application layer, through PostgreSQL row-level security. Hierarchy nodes are first-class citizens in the Document Warehouse: every query, every CTR Score, every AI agent call resolves to the node hierarchy the authenticated user belongs to. Every node is queryable, scoreable, and connectable to any AI agent via the MCP server. Corporate entity trees, regulatory division structures, branch networks, fund hierarchies, agency organizations: any org structure configures without changing the platform architecture. Corporate legal departments, financial institutions, and government agencies all configure different hierarchies from the same studio.
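The scoping idea, that a user provisioned at a node sees that node and everything beneath it, can be sketched in a few lines. This only illustrates the visibility logic; the platform is described as enforcing it through PostgreSQL row-level security, which this Python sketch does not reproduce. The `parent_of` mapping and node names are hypothetical, and the structure is assumed to be a tree with no cycles.

```python
from typing import Dict, Optional, Set


def visible_nodes(parent_of: Dict[str, Optional[str]], user_node: str) -> Set[str]:
    """Nodes a user provisioned at `user_node` can see: that node and its descendants."""
    visible = set()
    for node in parent_of:
        current: Optional[str] = node
        # Walk up from each node; it is visible if the walk passes through user_node.
        while current is not None:
            if current == user_node:
                visible.add(node)
                break
            current = parent_of.get(current)
    return visible
```

With a tree such as `{"corp": None, "div_a": "corp", "div_b": "corp", "asset_1": "div_a", "asset_2": "div_b"}`, a user at `div_a` sees `div_a` and `asset_1` and nothing under `div_b`.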
Platform Configuration
GATEWAY · ADMIN
Platform Masters — Document Status Workflow Engine
Document Gateway · Status Configuration
Document statuses are not labels — they are workflow triggers. Each status in this table drives a specific system behavior: Submitted auto routes to review queue, Approved fires the Approval Engine, Expired triggers the Violation Engine, Sentry Certified records an immutable fingerprint in vault_records. The drag-to-reorder interface sets the logical default sequence, but the real power is in the terminal and certified flags — terminal statuses cannot be manually overridden, and certified statuses can only be assigned by the Sentry fingerprinting pipeline, never by a user. The AI Suggestions panel on the right uses your industry and document patterns to propose status additions — "Add AI Flagged for low-confidence classifications" appears because the system detected classifications below the review threshold that currently fall through to Needs Revision without a distinct routing path. This is the platform configuring itself.
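The terminal and certified flags described above amount to a guard on status changes. A minimal sketch, assuming a hypothetical `Status` record and two actor labels ("user" and "pipeline") that are not taken from the platform:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Status:
    name: str
    terminal: bool = False   # cannot be manually overridden once set
    certified: bool = False  # may only be assigned by the fingerprinting pipeline


def can_assign(current: Status, target: Status, actor: str) -> bool:
    """Check a status change. `actor` is "user" or "pipeline" (hypothetical labels)."""
    if target.certified and actor != "pipeline":
        return False
    if current.terminal and actor == "user":
        return False
    return True
```

Which statuses actually carry each flag is a configuration choice; the sketch only shows that the check depends on the flags, not on the status names.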
GATEWAY · ADMIN · DEVELOPERS
Roles & Permissions — 9 Roles × 138 Features
Document Gateway · Identity & Access
138 features. 9 roles. 4 tiers. This is enterprise access control with the granularity that regulated industries require. The Role Matrix shows exactly which features each role can access — filtered by Actions, Data, or Pages — with color coded permission states (full access, limited, read only, none). The 4-tier structure (System, Tenant, Hierarchy, Node) means a Steward at a specific hierarchy node can only see documents and actions relevant to their assigned assets — not portfolio wide. Row Level Security enforcement happens at the database layer via Supabase RLS, not at the application layer — which means even direct API access or MCP agent connections respect the same access boundaries. No orphaned permissions. No over-provisioned service accounts. Security is structural, not configured.
Storage Management. Intelligent Lifecycle Automation.
GATEWAY · ADMIN
Storage Manager — Automated Document Lifecycle Policies
Document Gateway · Storage Intelligence
Documents cost money to store, process, and query — and most organizations keep everything in hot storage indefinitely because moving things manually never happens. Storage Manager automates the entire lifecycle through policy rules that run on configurable schedules. Auto-Warm After Inactivity moves documents not accessed in 30 days from Hot to Warm storage automatically. Archive Certified Docs moves Sentry certified documents to Archive on an hourly schedule — certified documents are immutable by definition, so hot storage is wasteful. Hot Retention for Active keeps any document accessed in the last 7 days in Hot tier regardless of other rules. Each tier has a defined retention schedule: Hot is indefinite for active docs, Warm moves certified docs to Archive after 180 days, Archive retains compliance documents for 7 years minimum. The platform manages storage cost at scale without operational overhead.
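The tier rules above can be read as a single decision function. The sketch below is one possible interpretation, not the platform's policy engine: it assumes Hot Retention takes precedence inside the Hot tier, that the 30-day rule uses days since last access, and that the 180-day rule for certified documents counts days spent in Warm. Parameter names are hypothetical.

```python
def next_tier(tier: str, days_since_access: int, days_in_tier: int, certified: bool) -> str:
    """Return the tier a document should be in after one policy pass."""
    if tier == "hot":
        if days_since_access <= 7:
            return "hot"   # Hot Retention for Active wins over other rules
        if days_since_access >= 30:
            return "warm"  # Auto-Warm After Inactivity
    elif tier == "warm":
        if certified and days_in_tier >= 180:
            return "archive"  # certified documents leave Warm after 180 days
    return tier
```

Running the function on a schedule keeps each pass idempotent: a document that already sits in the right tier is returned unchanged.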
White Label Branding
GATEWAY · ADMIN
White Label Branding — Full Enterprise Identity Control
Document Gateway · Enterprise Branding
Every customer facing surface of the platform — Document Banner, Login Page, Email Templates, Certificates of Authenticity, Shared Viewer links — carries your organization's identity, not imkore's. Upload Master Logos once (light mode and dark mode variants) and they propagate automatically to all surfaces. Each surface can also be individually overridden with a custom logo if different contexts require different branding. The Logo Across Surfaces panel on the right shows a real time preview of exactly how your logo appears on each surface before you publish. For institutional clients sharing documents with investors, lenders, or regulatory bodies, the platform presents entirely as their own product. This is the infrastructure that allows any organization to present a Transaction Room to a counterparty with full institutional branding — no "Powered by imkore" anywhere in the counterparty experience.
Core Engines and Studios
Engine 01
Check-In Studio — AI Document Intake

Every file enters a multi-stage AI pipeline before a human sees it. Abstract.DI classifies the document type, extracts key fields, scores confidence at the field level, checks for duplicates, detects anomalies, and routes to the correct steward queue automatically.

  • Drag and drop, bulk upload, email ingestion, and API submission
  • ZIP auto-extract with recursive file processing
  • Required document templates show what is needed, fulfilled, outstanding, and rejected
  • HITL Reduction AI promotes document types to bypass human review once confidence thresholds are consistently met
  • Batch Template Manager for bulk ingestion workflows across multiple use cases
  • Rapid Review Mode for high-volume steward queues
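The HITL Reduction idea, promoting a document type past human review once confidence is consistently high, can be sketched as a rolling check. The threshold of 0.95 and the window of 50 documents are illustrative assumptions, not platform values, and the function name is hypothetical.

```python
from typing import Sequence


def can_bypass_review(recent_confidences: Sequence[float],
                      threshold: float = 0.95,
                      window: int = 50) -> bool:
    """True when the last `window` documents of a type all met the threshold."""
    if len(recent_confidences) < window:
        # Not enough history yet to call the results consistent.
        return False
    return all(c >= threshold for c in recent_confidences[-window:])
```

A single low-confidence document inside the window is enough to keep the type in the human review queue under this rule.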
Engine 02
Distribution Studio V5 — Governed Document Exchange

Three distribution modes with full audit trails and recipient access controls. Every document leaves the platform certified and tracked.

  • Shared Documents: individual files distributed to named recipients with expiry, watermarking, and view-only enforcement
  • Document Packages: curated sets of related documents delivered as a governed bundle with version locking
  • Transaction Rooms: fully white labeled deal rooms with custom branding, NDA gates, engagement analytics, and counterparty-facing views
  • Resend integration for transactional delivery receipts
  • Recipient access log with timestamps, IP, and engagement depth
Engine 03
Submitter Gateway — External Document Collection

A purpose built external submission portal that presents to counterparties as your own branded platform. No account creation required for submitters.

  • Invitation-only access via tokenized secure links
  • Required document checklists with real time status
  • Automatic routing into Check-In pipeline upon submission
  • Submission Packets define exactly what documents are expected per counterparty type
  • Notification system with automated reminders for outstanding items
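One common way to build invitation links that need no account is a signed, expiring token. The sketch below uses only the Python standard library and is an assumption about the general technique, not a description of how Submitter Gateway generates its links; `make_token`, `verify_token`, and the token layout are hypothetical.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional

# Placeholder key for the sketch; a real key would come from secure configuration.
SECRET_KEY = secrets.token_bytes(32)


def make_token(invitation_id: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Build "<invitation_id>.<expiry>.<signature>" with an HMAC-SHA256 signature."""
    expires = int((now if now is not None else time.time()) + ttl_seconds)
    payload = f"{invitation_id}.{expires}"
    sig = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{sig}"


def verify_token(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the invitation id if the token is authentic and unexpired, else None."""
    try:
        invitation_id, expires, sig = token.rsplit(".", 2)
        expires_at = int(expires)
    except ValueError:
        return None
    expected = hmac.new(SECRET_KEY, f"{invitation_id}.{expires}".encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(sig, expected):
        return None
    if (now if now is not None else time.time()) > expires_at:
        return None
    return invitation_id
```

An alternative design stores a random opaque token in the database and looks it up on each request, which makes individual links easy to revoke.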
Engine 04
Command Center — Portfolio Operations Dashboard

Real time operational view across the entire document corpus by entity, by division, by document type, or by compliance obligation.

  • CTR Score dashboard with per-entity readiness scores
  • Expiry tracking across all documents with escalation alerts
  • Outstanding obligation views by steward or entity owner
  • Anomaly feed showing AI-flagged discrepancies across the corpus
  • Executive reporting views with configurable KPIs
Engine 05
Document Vault — Governed Entity Repository

Five-tier organizational hierarchy providing structured, queryable document storage with role-enforced access at every level.

  • Configurable folder taxonomies per entity type and industry
  • Version control with full history on every document
  • Role-based access: Admin, Steward, Analyst, User, Viewer
  • Document Navigator for cross-entity search and bulk operations
  • Smart Folders with dynamic rule-based population
Engine 06
Approval Workflows — Governed Review Chains

Configurable multi-step approval chains for any document type or business process. Every workflow is auditable end to end.

  • Sequential and parallel approval routing with escalation paths
  • OnlyOffice JWT-enforced document review in-browser (DOCX, XLSX)
  • Annotation and comment threading per document version
  • Automated notifications at each workflow stage
  • Full audit trail on every approval, rejection, and comment action
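Sequential and parallel routing can be modeled as a list of steps, where steps run in order and the approvers inside a step act in parallel. This is a minimal sketch with hypothetical names; it assumes every approver in a step must approve and that any rejection ends the chain, and it leaves out escalation paths.

```python
from typing import Dict, List

# A chain is a list of steps; each step lists approvers who act in parallel.
Chain = List[List[str]]


def chain_state(chain: Chain, decisions: Dict[str, str]) -> str:
    """Return "approved", "rejected", or "pending at step N" (1-based)."""
    for index, approvers in enumerate(chain, start=1):
        outcomes = [decisions.get(a) for a in approvers]
        if "rejected" in outcomes:
            return "rejected"
        if any(o != "approved" for o in outcomes):
            return f"pending at step {index}"
    return "approved"
```

For a chain like `[["steward"], ["legal", "finance"], ["admin"]]`, the second step stays pending until both legal and finance have approved.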
29 Live Edge Functions. The Serverless Backbone.
Every Document Gateway operation is powered by a dedicated Supabase Deno edge function — deployed independently, versioned separately, and executable on demand. Zero shared infrastructure between functions. Each function enforces its own auth, rate limits, and error handling.
Document Processing
Ingestion and Extraction Functions
  • ingest-document — upload validation, storage routing, and pipeline trigger
  • abstract-document — Abstract.DI extraction orchestrator for any document type
  • ai-classify — standalone classification endpoint for document type inference
  • checkin-pipeline — full OCR → classify → extract → score → route pipeline
  • process-upload-link — handles tokenized external upload URLs for Submitter Gateway
  • quick-verify — fast Sentry fingerprint verification for incoming documents
  • parse-credentials — secure credential extraction from document headers and metadata
Intelligence and Query
AI and Warehouse Functions
  • document-qa — natural language question-answering against any document or corpus
  • warehouse-query — SQL and natural language query execution against extraction fields
  • warehouse-connector — sync manager for Snowflake, Databricks, BigQuery, and webhook targets
  • agent-gateway — AI agent request router with tool dispatch and row-level security
  • mcp-server — dual-protocol MCP (Claude) and REST/OpenAPI (ChatGPT) gateway with 17 tools
  • smart-folders — rule engine that dynamically populates folder views from extraction data
Operations and Delivery
Workflow, Notification, and Integration Functions
  • send-notification — transactional email via Resend for workflow steps and alerts
  • send-submitter-invitation — tokenized invitation emails for Submitter Gateway counterparties
  • create-invite-user — new user provisioning with role assignment and welcome email
  • erp-webhook — inbound event handler for ERPs, CRM, and enterprise platforms
  • submit-anchor — Submission Packet anchoring and counterparty session management
  • schedule-jobs — cron-triggered orchestration for batch pipeline runs
  • run-scheduled-reports — automated report generation and distribution
  • filestar-proxy — FileStar API bridge for existing Millennia Group clients
  • oo-jwt — OnlyOffice JWT token generation for in-browser document editing
  • deployment-health — infrastructure health monitoring and status reporting
  • migrate-infra — schema migration runner for incremental database updates
  • seed-demo-data / seed-pe-samples — demo corpus seeding for private equity verticals
  • generate-demo-blueprint — AI-generated Blueprint diagnostic reports for prospective clients
  • update-whitepapers — automated whitepaper content refresh pipeline
Role Architecture and Access Control
Five-Role System

Every user action in Document Gateway is governed by a five-role permission model enforced at both the application and database layers via Supabase row-level security policies.

  • Admin — full platform configuration, user management, integration setup, and AI engine settings. Advanced Check-In mode.
  • Steward — document review, certification, approval chain management, and AI override authority. Advanced Check-In mode.
  • Analyst — read access to all extraction data, Warehouse Studio, and reporting. Advanced Check-In mode.
  • User — document submission, basic search, and personal workflow tasks. Basic Check-In mode.
  • Viewer — read only access to shared documents and approved views. Basic Check-In mode.
Tech Stack and Infrastructure

Zero legacy code. An entirely 2024 to 2026 stack, designed for sub-30-day enterprise deployment.

  • Frontend: React 18 / TypeScript / Vite — 200+ components, dark and light theme, DM Sans / DM Mono typography
  • Backend: Supabase PostgreSQL with PostgREST, 29 Deno edge functions, row-level security on every table
  • Storage: Cloudflare R2 for all document binary storage — zero egress fees at scale
  • Document Editing: Native in-browser editing of DOCX and XLSX with full collaboration and access controls
  • Email: Resend for all transactional delivery with signed receipts
  • Deployment: Vercel auto-deploy — documentgateway.ai, trusteddocs.ai, imkore.ai
Deployment Model

Single-tenant, multi-tenant, Azure Cloud, AWS, on-premise, and hybrid deployments are all supported. Any file type. Any industry. Any org size. Average enterprise deployment: 30 days from contract to go-live. No professional services required for standard configurations.

Products · Tab 08
Abstract.DI. The Engine That Actually Reads Your Documents.
Every transaction ready organization has the same invisible problem. Thousands of critical documents that no human has truly read. Leases. Loan agreements. Contracts. Amendments. Certificates. They sit in folders, filed correctly, but essentially opaque. When a question comes up, someone has to open them and look. Abstract.DI changes this permanently. Give it a document. Any document. It reads the full text, understands what it is, extracts every meaningful field, recognizes which other documents it belongs with, and writes all of it into a structured database that the rest of your platform and your AI agents can query and act on. Any document. Any industry. 94%+ confidence the very first time you use it. No training required.
ANY
Document Type
94%+
Day One Confidence
4
Extraction Scopes
Auto
Document Families
Live
Standing Distribution Rules
Every
Surface in the Platform
Why Abstract.DI Was Built

Most platforms that touch documents store them. A few extract surface metadata. None of them close the loop between the intelligence inside a document and the actions that intelligence should drive across the rest of the organization. We built Abstract.DI to end that gap. Every document that enters your platform becomes structured intelligence the moment it arrives, and that intelligence flows everywhere it needs to go automatically. Approvals. Distributions. Compliance. AI agents. Counterparty experiences. Workflows. Everything downstream just works, because every document is finally understood.

Abstract.DI in Action. AI Powered Document Comprehension.
ABSTRACT · AI
Abstract.DI — Structured Field Extraction from Any Document
Abstract.DI · AI Extraction Engine · Any Document Type
What you are seeing here is a document — any document — being converted into structured, queryable intelligence in seconds. The left panel shows the original document exactly as it arrived. The right panel shows every field Abstract.DI extracted: parties, dates, financial terms, obligations, conditions, signatures, execution status — organized into typed field groups with individual confidence scores. Every highlighted passage in the document is a live link: click any extracted field and the document scrolls to the exact source text it was derived from. This is not an AI summary — it is a structured database record created from an unstructured document, with full provenance tracing from field value back to source text. The 94%+ confidence score is field level, not document level — you know exactly which fields the AI is certain about and which need review. All extracted data is immediately written to the Document Warehouse as queryable PostgreSQL rows, available to any BI tool, API consumer, or AI agent the moment extraction completes. This is the engine that turns a folder of PDFs into a structured database.
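Field-level confidence with provenance can be pictured as one record per extracted field. The sketch below is illustrative only: the `ExtractedField` shape and the use of 0.94 as a review threshold are assumptions, not the Document Warehouse schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExtractedField:
    group: str
    name: str
    value: str
    confidence: float   # 0.0 to 1.0, scored per field
    source_page: int    # page the value was read from


def needs_review(fields: List[ExtractedField], threshold: float = 0.94) -> List[ExtractedField]:
    """Fields below the threshold, least confident first."""
    flagged = [f for f in fields if f.confidence < threshold]
    return sorted(flagged, key=lambda f: f.confidence)
```

Because each record carries its own `source_page`, a reviewer working through the flagged list can be sent straight to the page where the value was read.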
Four Extraction Scopes. Match the Engine to the Stakes.
Not every document deserves the same level of attention. Abstract.DI lets you choose how deep to go on each one. Quick scans for high volume intake. Full extraction for the documents that matter. Maximum context for the dense and complex. And Expert mode for synthesizing entire amendment chains into one current state master abstract.
Scope 01 · Quick
Five Seconds. Half a Cent.
Fast scan for the key fields. Ideal for high volume intake where speed matters and a complete extraction is overkill. Run thousands of documents through Quick scope in minutes.
Scope 02 · Standard
Twenty Seconds. Full Extraction.
All fields. All groups. High fidelity. The default for most documents. This is what you run when you actually need to understand what the document says, end to end.
Scope 03 · Deep
Maximum Context. Highest Fidelity.
For dense, complex, or critical documents where every field matters and the document is too long for Standard scope to do justice. Maximum AI context window. Highest possible accuracy.
Scope 04 · Expert
Family Synthesis. Master Abstract.
Reads the entire amendment chain of a document family at once. Produces a single master abstract representing the current controlling state. Knows what changed in each amendment and what the relationship looks like today.
What Gets Extracted From Every Single Document.
Identity and Structure.
  • Document type and category, auto classified, no templates required
  • All parties and their roles
  • Document family key. Which amendment chain this document belongs to
  • Sequence role within the family. Original. Amendment. SNDA. Guaranty. Other
  • Source document, page numbers, and citations for every extracted field
Material Substance.
  • Every material date. Execution. Commencement. Expiration. Notice. Option. Renewal
  • Every financial term. Rent. Loan amounts. Interest rates. Guaranties. Escalations
  • Execution status. Fully Executed. Partially Executed. Unexecuted. Or Not Applicable
  • Key clauses extracted with source citations so users jump to the page in one click
  • Completeness score reflecting how thoroughly the document was extracted
Risk and Anomaly.
  • Severity rated risk flags. Personal guaranties. Cross default clauses. Missing SNDA
  • No cure provisions. Unusual terms. Outliers against your corpus patterns
  • Anomalies surfaced automatically against everything Abstract.DI has ever processed
  • Field corrections via simple hover edit. Stored as feedback, never overwriting the AI extraction
  • Every flag surfaces in the Portfolio Intelligence Dashboard, deduplicated portfolio wide
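Storing corrections as feedback rather than overwriting the extraction can be sketched as an overlay: the AI values stay untouched and corrections are applied on read. The function and field names are hypothetical, and corrections are assumed to be applied in the order given.

```python
from typing import Dict, List, Tuple


def effective_values(ai_values: Dict[str, str],
                     corrections: List[Tuple[str, str]]) -> Dict[str, str]:
    """Overlay corrections (field, value), in order, on a copy of the AI extraction."""
    merged = dict(ai_values)  # copy, so the original extraction is never mutated
    for field_name, corrected in corrections:
        merged[field_name] = corrected
    return merged
```

Keeping both layers means the original extraction and the human feedback remain available side by side for later comparison.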
Document Families. Amendment Chains, Recognized Automatically.
A lease, a work letter, three amendments, an SNDA, and a guaranty are not five documents. They are one relationship. Abstract.DI is the only engine in the market that recognizes when documents belong together, builds the amendment chain automatically, identifies the controlling document, and lets you synthesize the entire relationship into one current state master abstract. Portfolio diligence finally stops being a pile of files. It becomes a clean, current, intelligible relationship for every contract you have ever executed.
Family Recognition
Documents Group Themselves.
Abstract.DI recognizes when documents belong together and builds the amendment chain automatically. Sequence roles are assigned. The controlling document is identified. The visual timeline shows the full relationship at a glance. As new amendments are abstracted, they join the family without anyone lifting a finger.
Family Viewer
See the Full Relationship.
The Family Viewer shows the complete chain in chronological order. Each member's execution status, effective date, and key summary. The Amendment Diff tab does field by field comparison between any two versions. The Master Abstract tab consolidates all flags from all members into one current state view. The current controlling document is marked clearly.
Expert Synthesis
One Master Abstract for the Whole Family.
Run Expert scope on a family and Abstract.DI reads every member document at the same time. The output is a single master abstract that represents the entire relationship as it stands today. Not just the last amendment. The accumulated effect of every amendment on the original controlling terms.
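The "accumulated effect of every amendment" can be illustrated as a fold over the family in effective-date order. This is a deliberately simplified sketch: it assumes each member has already been reduced to the terms it sets or changes, and that a later document always wins. The real synthesis reads the documents themselves; `FamilyMember` and its fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class FamilyMember:
    role: str                 # e.g. "Original" or "Amendment"
    effective: date
    terms: Dict[str, str]     # only the terms this document sets or changes


def master_abstract(members: List[FamilyMember]) -> Dict[str, str]:
    """Apply each member's terms in effective-date order; later documents win."""
    current: Dict[str, str] = {}
    for member in sorted(members, key=lambda m: m.effective):
        current.update(member.terms)
    return current
```

Terms the amendments never touch carry through from the original, which is why the result differs from simply reading the last amendment.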
Where Intelligence Surfaces. Everywhere It Needs To.
Abstract.DI data does not live on an extraction page. It flows through every surface where documents appear in your organization. The same intelligence that powers a Steward approval also powers the Counterparty Shared Viewer, the Distribution Studio Smart Picker, the Portfolio Intelligence Dashboard, and the AI agents querying your data through the MCP server. One extraction. Every surface.
Document Navigator and Viewer
Select any document and the right panel shows execution status, completeness, scope used, AI classified type, and every extracted field with confidence and source page. Documents not yet abstracted show a clear "Re extract" prompt. The Global Viewer reads the user's preferred default scope from their settings on every open.
Steward Gateway Approvals
When a document arrives for Steward approval, Abstract.DI intelligence is already there. Document type. Execution status. Completeness. Top fields. High risk flag count. Stewards make informed approvals without ever opening the file. Approvals get faster. Decisions get better.
Check-In and Submitter Gateways
When a document is filed through Check-In, Abstract.DI runs automatically in the background. By the time the Steward reviews, the intelligence is already there. Submitters upload through their portal and the document is abstracted before the page reloads. Auto abstract is a single setting in user preferences.
Distribution Studio Smart Picker
Build a document package or a Transaction Room and every document in the picker shows its AI classified type and execution status. Filter to "Fully Executed only" and instantly see exactly what is signed. Deal teams build packages by what documents are, not just what they are named. Footer counts show abstracted and executed totals across the entire library.
Portfolio Intelligence Dashboard
A morning dashboard powered entirely by Abstract.DI. Five live KPIs. Eighteen month expiration timeline built from real document dates. Execution status distribution across the entire portfolio. AI classified document type breakdown. Deduplicated risk flags portfolio wide. An asset readiness table showing coverage and average completeness for every asset.
Shared Recipient View
When a counterparty opens a shared document link, they see Document and Intelligence tabs. The Intelligence tab shows them the document type, execution status, completeness, summary, key fields, and risk flags. Clean, branded to your organization, no login required. Your counterparties get context, not just files.
Standing Distribution Rules. Intelligence That Drives Action.
This is where Document Intelligence stops being a database and becomes infrastructure. You define rules on a standing distribution. "Include all Fully Executed Commercial Leases expiring within ninety days for this asset." Abstract.DI evaluates those rules against the live intelligence database every hour. When new documents are abstracted and match, they are added automatically. When the scheduled delivery date arrives, the distribution runs with the current matching documents, assembled by AI from your live portfolio. Distributions stay current as your portfolio evolves. Nobody has to update the document list when a new lease is executed or a renewal is filed. Abstract.DI knows. The distribution reflects it. Automatically. Forever.
Define Rules
Rules That Match What You Mean.
Rules support execution status (equals or not equals), document type (exact or contains), document category, minimum completeness percentage, and expiry within N days. Expiry is evaluated against dates extracted from real document content. Not metadata. The actual dates Abstract.DI read inside the documents themselves.
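A rule of this kind can be sketched as a small predicate over the extracted intelligence. Everything named here (`Abstract`, `Rule`, `matches`, and their fields) is hypothetical and covers only some of the rule types listed above; it shows the shape of the evaluation, not the product's rule engine.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Abstract:
    """Hypothetical slice of the intelligence stored for one document."""
    doc_type: str
    execution_status: str
    completeness: float          # 0 to 100
    expiry_date: Optional[date]  # date read from the document text, if any


@dataclass
class Rule:
    """Hypothetical standing distribution rule. None means 'do not filter on this'."""
    status_equals: Optional[str] = None
    type_contains: Optional[str] = None
    min_completeness: Optional[float] = None
    expires_within_days: Optional[int] = None


def matches(doc: Abstract, rule: Rule, today: date) -> bool:
    """Return True only when the document satisfies every condition the rule sets."""
    if rule.status_equals is not None and doc.execution_status != rule.status_equals:
        return False
    if rule.type_contains is not None and rule.type_contains.lower() not in doc.doc_type.lower():
        return False
    if rule.min_completeness is not None and doc.completeness < rule.min_completeness:
        return False
    if rule.expires_within_days is not None:
        if doc.expiry_date is None:
            return False
        window_end = today + timedelta(days=rule.expires_within_days)
        if not (today <= doc.expiry_date <= window_end):
            return False
    return True


rule = Rule(status_equals="Fully Executed", type_contains="Commercial Lease", expires_within_days=90)
lease = Abstract("Commercial Lease", "Fully Executed", 96.0, date(2025, 3, 1))
print(matches(lease, rule, today=date(2025, 1, 1)))
```

Running the same predicate on a schedule against newly abstracted documents is what would keep a standing distribution current without anyone editing the document list.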
Preview Matches
See What You Will Distribute Before You Commit.
Preview Matches evaluates all rules against the live intelligence database and shows you exactly which documents match right now, with execution status indicators and document type labels. "Apply" adds every matching document to the room in bulk, instantly. No surprises.
Autonomous Delivery
Delivery That Runs Itself.
Behind the scenes, the engine evaluates all standing distributions every hour. New matches are added. Scheduled deliveries run automatically with the current matching documents. Manifests are recorded. Next delivery dates advance. The Distribution Studio shows you the last run summary. A "Run Now" button triggers immediate evaluation for testing.
Export to Excel. Your AI Read Document. In One Spreadsheet.
Every abstract Abstract.DI produces can be exported to Excel in a single click. Every field. Every group. Every value. Every source page. Every confidence score. The same document that took your team a week to read by hand becomes a clean, structured Excel workbook your CFO can open in seconds. Your team's expertise is finally captured as data, not as Word memos and email threads.
trusteddocs.ai
Abstract.DI Excel export
Excel Abstract. Every Field. Every Source. Every Confidence.
Abstract.DI · Export · Excel
Open the export and you see the whole document, distilled. Document type. Execution status. Identity and Signatories. Lessor. Lessee. Premises and Boundaries. Term. Rent. Operating Expenses. Insurance. Default. Assignment. Every group. Every field. The actual extracted value. The source page reference. The confidence score. Every row is one extracted fact, traceable back to the original document. Hand this workbook to a CFO, an auditor, an asset manager, or a deal lead and they understand the entire document in minutes. The lease that took a paralegal four hours to summarize lives forever in this workbook. And in the Warehouse. And in your AI agent's reach.
Batch Abstraction. Onboard a Portfolio in One Click.
Select any folder or any asset in Document Navigator and click "Abstract All." A pre flight check shows which documents already have abstracts (those are skipped automatically). Choose your scope and see the time and cost estimate per document and for the whole batch. Sequential processing with a live status list. Cancel any time. Completion summary with a direct link to the Portfolio Intelligence Dashboard. A portfolio of three hundred documents can be fully abstracted in a single operation. This is what makes Abstract.DI practical for onboarding existing document sets, not just new documents coming in.
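The pre flight check described above amounts to a filter plus an estimate. A minimal sketch, assuming a flat per document time and cost for the chosen scope; the function name, parameters, and figures are hypothetical, not product values.

```python
def preflight(doc_ids, already_abstracted, seconds_per_doc, cost_per_doc):
    """Skip documents that already have abstracts and estimate the rest of the batch."""
    to_run = [d for d in doc_ids if d not in already_abstracted]
    return {
        "to_run": to_run,
        "skipped": len(doc_ids) - len(to_run),
        "est_seconds": len(to_run) * seconds_per_doc,
        "est_cost": len(to_run) * cost_per_doc,
    }


plan = preflight(
    doc_ids=["lease-001", "lease-002", "coi-003"],
    already_abstracted={"lease-002"},
    seconds_per_doc=20,   # assumed figure for illustration
    cost_per_doc=0.05,    # assumed figure for illustration
)
print(plan["to_run"], plan["skipped"], plan["est_seconds"])
```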
Multi Pass Extraction Pipeline.
Abstract.DI — Document to Intelligence Pipeline
Step 1
Document Ingestion
PDF, DOCX, XLSX, PPTX, MSG/EML, CSV, ZIP, JPEG/PNG/TIFF, DB records
Step 2
Selective OCR
DocTR engine · GPU 10 to 50x speedup · Multilingual · 8s timeout with fallback
Step 3
Classification
Any doc type · Claude Haiku inference · "AI.DI Named" badge · Confidence scoring
Step 4
Field Extraction
Type specific schemas · Dates · Parties · Amounts · Obligations · Conditions
Step 5
Anomaly Detection
Cross-document consistency · Version comparison · Portfolio baseline deviations
AI.DI Document Warehouse — Structured PostgreSQL Rows
All extracted fields stored as structured, queryable data. Available to BI tools, APIs, and AI agents instantly.
"Box shows you a file. We show you what the file says. Run both side by side. The demo closes itself."
— Abstract.DI positioning principle
How the Extraction Pipeline Works.
Step 01
Ingest
File received via upload, email, API, or Submitter Gateway. ZIP files auto-extracted recursively. File validated and stored.
Step 02
OCR
Selective OCR applied only when it improves text completeness. GPU-accelerated at 10x to 50x CPU speed. Duplicate copies skipped.
Step 03
Classify
Document type identified from 5,700+ taxonomy entries. 78% auto-classify rate on day one without custom training.
Step 04
Extract
All meaningful fields extracted with individual confidence scores. Parties, dates, financial terms, obligations, signatures, clauses.
Step 05
Score and Route
Confidence checked against thresholds. Auto-certify, steward review, or flag. Duplicate and anomaly detection run in parallel.
Step 06
Warehouse
All extracted fields written as structured rows to PostgreSQL. Instantly queryable by SQL, natural language, BI tools, and AI agents.
AI Confidence Architecture. Three Zones.
Zone 01 — Auto Certify
Confidence at or above the auto-certify threshold
Document passes through the pipeline without steward involvement. Default threshold is 85%. Configurable per document type or tenant. As the ML feedback loop accumulates corrections, more document types graduate to this zone. The HITL Reduction AI tracks which types are consistently above threshold and promotes them automatically. The percentage of documents requiring human review trends toward zero for standard document types over time.
Zone 02 — Steward Review
Confidence between review and auto-certify thresholds
Document routed to steward queue for field level review. Stewards see exactly which fields are uncertain, with the source text highlighted in the original document. A single correction (changing a wrong date or confirming a party name) is fed back into the ML model as a labeled training example. Default review band is 60% to 85%. Every correction makes the next extraction more accurate.
Zone 03 — Flagged
Confidence below the review threshold
Document flagged for full manual review and possible reingestion. Default flag threshold: below 60%. These are typically scanned documents with poor image quality, unusual layouts, or document types not yet in the training corpus. All flag events are tracked and used to prioritize which document types need additional training data. The flag rate declines as the corpus grows.
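The three zones reduce to two threshold comparisons. The sketch below uses the default thresholds stated above (85% and 60%); the function name and return labels are assumptions chosen for illustration.

```python
def route(confidence: float, auto_certify: float = 85.0, review: float = 60.0) -> str:
    """Map an extraction confidence (0 to 100) to one of the three zones."""
    if confidence >= auto_certify:
        return "auto_certify"      # Zone 01: no steward involvement
    if confidence >= review:
        return "steward_review"    # Zone 02: field level review
    return "flagged"               # Zone 03: full manual review


for score in (91.0, 85.0, 72.5, 59.9):
    print(score, route(score))
```

Because both thresholds are parameters, the same function can express a stricter setting for a particular document type or tenant, for example `route(score, auto_certify=95.0)`.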
OCR Engine Options
Default Engine
AI Native OCR.
  • Multilanguage architecture that handles any document, in any language, in any layout
  • GPU acceleration ten to fifty times faster than legacy alternatives
  • Optimized for complex layouts. Multi column. Rotated text. Tables. Handwritten annotations
  • Smart execution. OCR runs only when it materially improves the extraction
  • Deduplication aware. One file in a duplicate group is processed, every copy benefits
Visual Page Intelligence
Pages As Images, Not Just Text.
  • Scanned documents are detected automatically the moment they arrive
  • Pages are rendered as images and fed to the AI alongside the OCR text
  • Maximum extraction accuracy on non native PDFs and scanned originals
  • Signature blocks, stamps, and visual markers are captured, not just text
  • The same intelligence is applied whether the document was born digital or scanned from paper
Open Source Option
Engine Agnostic by Design.
  • Any OCR engine can be substituted to meet client contracts or compliance requirements
  • The pipeline output format is identical regardless of engine. Zero downstream changes
  • Open source engines are available out of the box for organizations that require them
  • You control the OCR. AI.DI controls the intelligence layer above it
What Gets Extracted. Field Schema by Category.
Core Document Fields

Extracted from every document type regardless of content: node_path, hierarchy_path, doc_type, workflow_status, added_at, original_name, storage_path, period. These fields form the backbone of the Document Warehouse schema and enable cross-entity search across the entire corpus.

AI Extracted Fields

Present for documents where Abstract.DI has completed extraction: ai_fields (JSONB), extraction_confidence (numeric 0 to 100), entity_party (primary counterparty), primary_value (lead financial figure), start_date, end_date. These fields are present across a standard deployment corpus of thousands of documents.

Financial Fields

Extracted from financial statements, loan documents, and operating reports: coverage ratios, loan-to-value metrics, net operating income, revenue, net income, return metrics, and performance multiples. All numeric fields stored with full precision for direct BI tool consumption without transformation.

Operational Fields

Extracted from contracts, utilization reports, and entity records: utilization_rate, total_units, primary_counterparty, anomaly_flag (boolean — AI detected discrepancy). The anomaly flag is computed by comparing extracted values against corpus patterns. A coverage ratio of 0.4 in a corpus where the median is 1.8 raises the flag automatically.
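The source does not say how "corpus patterns" are modeled. As one simple possibility, the flag could compare a value with the corpus median and fire when the two differ by more than a set ratio. The function name and the 3x ratio below are assumptions.

```python
from statistics import median


def anomaly_flag(value: float, corpus_values, max_ratio: float = 3.0) -> bool:
    """Flag a positive metric that sits more than max_ratio above or below the corpus median."""
    mid = median(corpus_values)
    if value <= 0 or mid <= 0:
        # This sketch only models positive ratios; treat anything else as anomalous.
        return True
    ratio = max(value / mid, mid / value)
    return ratio > max_ratio


corpus = [1.5, 1.7, 1.8, 1.9, 2.2]   # illustrative coverage ratios, median 1.8
print(anomaly_flag(0.4, corpus))     # 1.8 / 0.4 = 4.5, above the 3x limit
print(anomaly_flag(1.6, corpus))     # close to the median
```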

AI Model Performance and Learning
Per-Tenant Model Performance

Abstract.DI maintains per-tenant model statistics that improve continuously as stewards interact with the platform:

  • Auto-classify rate: 78% of documents classified without steward input on a standard tenant
  • Average confidence: 91% across all extracted fields on certified documents
  • Training corpus: 6,421 steward-reviewed documents feeding the active learning loop
  • Steward corrections (30 days): 47 field-level corrections generating new labeled training examples
  • Model retraining: triggered automatically when correction volume exceeds threshold
ML Feedback Loop — How the Model Improves

Every steward action is a labeled training example. The model does not require separate annotation workflows or data science involvement.

  • Steward accepts a field → positive signal for that extraction pattern on that document type
  • Steward corrects a field → negative signal plus the corrected value as ground truth
  • Steward rejects a document → classification correction that updates the type inference model
  • ML Learning Dashboard shows training corpus growth, accuracy trends, and retraining schedule
  • Pipeline settings control: minimum documents before auto-certification, review window days, rapid review mode
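The accept and correct signals above can be pictured as a conversion from a steward action into a labeled example. This is a sketch under assumed names (`LabeledExample`, `to_training_example`); how the platform actually stores and consumes training data is not described in this document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LabeledExample:
    """Hypothetical training record produced by one steward action on one field."""
    doc_type: str
    field_name: str
    predicted: str
    truth: str
    is_positive: bool  # True when the model's value was accepted as is


def to_training_example(doc_type: str, field_name: str, predicted: str,
                        action: str, corrected: Optional[str] = None) -> LabeledExample:
    if action == "accept":
        # Positive signal: the prediction itself is the ground truth.
        return LabeledExample(doc_type, field_name, predicted, predicted, True)
    if action == "correct":
        if corrected is None:
            raise ValueError("a correction needs the corrected value")
        # Negative signal plus the steward's value as ground truth.
        return LabeledExample(doc_type, field_name, predicted, corrected, False)
    raise ValueError(f"unknown steward action: {action}")


example = to_training_example("Commercial Lease", "start_date", "2024-01-01",
                              action="correct", corrected="2024-02-01")
print(example)
```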
Custom Schema Builder

Abstract.DI ships with prebuilt schemas for over 5,700 document types across industries. For document types outside the standard taxonomy, the Custom Schema Builder lets admins define extraction targets directly. Specify the fields you need, provide three to five example documents, and the model learns the pattern. No code. No data science team. New document type schemas are typically operational within one business day.

Products · Tab 09
Sentry Document Assurance. Find Any Document. Anywhere. Instantly.
Sentry connects to every document system you operate. SharePoint. OneDrive. Box. Dropbox. Google Drive. Windows shares. Email archives. ERPs. Legal hold systems. Network drives. Legacy repositories. The moment Sentry connects, your entire organization becomes one virtual Document Intelligence warehouse. If a user knows how to search Google, that same user can find any document, and every version of it, instantly, across every system in your organization. Zero document storage. Zero PII exposure. GDPR, HIPAA, SEC, and APA compliant by architecture, not by configuration.
All
Document Systems Connected
0
Documents Stored
Instant
Universal Search
30 to 50%
AI Cost Reduction
Every
Version Always Findable
Zero
User Training Required
Why Sentry Was Built

Every organization runs ten or twenty or fifty different systems that hold documents. Nobody knows what is real, what is current, what is duplicated, or what is missing. Asking a simple question like "where is the latest signed version of this contract" turns into a multi day scramble across teams and systems. Sentry was built to end that, permanently. Connect every system once. Search like Google forever. Sentry becomes the unified document intelligence overlay across your entire organization without you having to move a single file.

Search That Feels Like Google. For Every Document in Your Organization.
If a person can use Google to learn how to bake a cake, that same person can use Sentry to find any document and every version of that document, instantly, across every system in your organization. No training required. No new login to remember. No new interface to learn. One search box returns the answer, and it returns documents from SharePoint, Box, Google Drive, Outlook archives, network shares, and legacy systems all at the same time.
Universal Search
Find Any Document Across Every System.
Type a search. Sentry searches every connected system at once. Filenames. Document content. Metadata. Tags. Properties. The result list ranks by relevance the same way Google does. Click and the document opens from wherever it actually lives. Nothing has to be moved. Nothing has to be migrated. Your organization just becomes searchable.
Every Version, Always
See Every Version of Every Document.
Sentry groups every version of every document together automatically. The signed copy. The redlined draft. The countersigned amendment. The version someone emailed on Tuesday. The version saved to a personal drive. Every version of the document is found, ranked, and presented at the top of the result. You never lose track of the truth again.
Trusted Verification
Mathematical Proof of What Is Real.
Behind every search result, every document carries a deterministic mathematical fingerprint. Two identical documents always produce identical fingerprints. Any change at all produces a measurably different fingerprint. Zero false positives. So when Sentry tells you which version is signed, executed, or current, the answer is mathematically certain.
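The properties described here (identical input gives an identical fingerprint, any change gives a different one) are those of a cryptographic hash. The document does not name the algorithm Sentry uses, so SHA-256 in the sketch below is an assumption chosen for illustration.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Deterministic fingerprint of a document's bytes (SHA-256 assumed for this sketch)."""
    return hashlib.sha256(content).hexdigest()


signed = b"Lease agreement. Base rent: 45.00 per square foot."
copy = b"Lease agreement. Base rent: 45.00 per square foot."
altered = b"Lease agreement. Base rent: 54.00 per square foot."

print(fingerprint(signed) == fingerprint(copy))      # same bytes, same fingerprint
print(fingerprint(signed) == fingerprint(altered))   # one changed value, different fingerprint
```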
Sentry Cuts AI Cost by 30 to 50%, Immediately

Industry average: more than 40% of the files in any organization are duplicates. A document estate with 40% duplicates means 40% of every AI invoice computes the same content twice. Sentry identifies every duplicate across every connected system, consolidates them to canonical records, preserves all metadata from every duplicate instance, and quietly suppresses the duplicates from AI processing queues. AI compute costs drop 30 to 50% on day one. Your prompts do not change. Your models do not change. The duplication just goes away.
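The arithmetic behind the saving can be sketched with a toy queue: group files by fingerprint, send one canonical copy per group to AI processing, and keep a record of the suppressed copies. The paths, the SHA-256 fingerprint, and the function name are assumptions for illustration.

```python
import hashlib
from collections import defaultdict


def dedupe_queue(files):
    """files maps path -> bytes. Returns (canonical paths, duplicates grouped per canonical path)."""
    groups = defaultdict(list)
    for path in sorted(files):
        groups[hashlib.sha256(files[path]).hexdigest()].append(path)
    canonical = [paths[0] for paths in groups.values()]
    duplicates = {paths[0]: paths[1:] for paths in groups.values() if len(paths) > 1}
    return canonical, duplicates


files = {
    "box/lease.pdf": b"lease v1",
    "sharepoint/lease.pdf": b"lease v1",
    "email/lease-signed.pdf": b"lease v2 signed",
    "drive/coi.pdf": b"certificate",
    "share/coi-copy.pdf": b"certificate",
}

canonical, duplicates = dedupe_queue(files)
saved = 1 - len(canonical) / len(files)
print(canonical)
print(f"share of files suppressed from AI processing: {saved:.0%}")
```

In this toy estate two of five files are exact copies, so two fifths of the per file processing is avoided. The real saving depends on how processing cost scales with file size and on how many duplicates an estate actually holds.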

What Sentry Lets Your Organization Do.
For Decision Makers
Act on Document Intelligence, Not Document Volume.

Live analytics on documents across every system. By source. By format. By status. By owner. Sentry shows you exactly which documents matter and which do not, where remediation is needed, and where governance is failing.

Decision makers stop guessing. Compliance posture, data quality maturity, and risk exposure become measurable KPIs. Document intelligence becomes a board level metric, not a back office function.

For Operations
Eliminate the Duplication Crisis.

Sentry sees every duplicate across every connected system, all at once. Storage cost, compliance risk, and operational drag from redundant content become quantifiable. Defensible deduplication runs continuously without sacrificing auditability or chain of custody.

The redundancy that has been compounding for a decade gets resolved methodically, automatically, and safely.

For Records and Compliance
Total Cross System Transparency.

Unified discovery of every document across every silo. Sentry surfaces misplaced sensitive content, policy exceptions, and information governance gaps that no single system would ever expose on its own.

Strategic document migration, normalization, and records governance finally have the underlying intelligence to be done correctly, not blindly.

OCR That Actually Reads Every Document.
Sentry's OCR engine is the foundation that turns visual content into searchable, fingerprinted, structured intelligence. Pretrained AI models outperform traditional rule based OCR in accuracy, flexibility, and consistency on the complex document structures real organizations actually receive.
Open Architecture
Engine Agnostic by Design.
  • Multiple OCR technologies run simultaneously inside one pipeline
  • AI native engines prioritized for accuracy on complex layouts and multilingual content
  • GPU accelerated processing ten to fifty times faster than legacy alternatives
  • Selective OCR execution that reduces unnecessary processing and improves throughput
  • Any client preferred OCR engine can be substituted where contractually required
Smart Execution
Selective OCR That Saves You Money.
  • OCR runs only when it materially improves extracted text completeness
  • When one file in a duplicate group is processed, every remaining copy is skipped automatically
  • Selective OCR runs on embedded images even when native text already exists
  • Full document OCR is triggered automatically on scans and image dominant documents
  • Roughly 30% of total OCR compute is eliminated through smart execution alone
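The smart execution rules above can be read as a per document decision: skip, run selectively on embedded images, or run on the full document. The sketch below is one hedged interpretation; the `PageInfo` fields and both thresholds are assumptions, not Sentry's actual heuristics.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageInfo:
    native_chars: int        # characters found in the file's own text layer
    image_area_ratio: float  # share of the page covered by embedded images, 0 to 1


def ocr_plan(pages: List[PageInfo], duplicate_already_processed: bool = False,
             min_chars: int = 50, image_heavy: float = 0.5) -> str:
    """Return "skip", "selective", or "full" for one document."""
    if duplicate_already_processed or not pages:
        return "skip"  # reuse the processed copy's text, or nothing to read
    scan_like = sum(
        1 for p in pages
        if p.native_chars < min_chars and p.image_area_ratio >= image_heavy
    )
    if scan_like * 2 > len(pages):
        return "full"  # scans and image dominant documents
    if any(p.image_area_ratio > 0 for p in pages):
        return "selective"  # native text exists, but embedded images may hold more
    return "skip"  # a clean text layer gains nothing from OCR


print(ocr_plan([PageInfo(0, 1.0), PageInfo(10, 0.9)]))
print(ocr_plan([PageInfo(2000, 0.0), PageInfo(1800, 0.3)]))
print(ocr_plan([PageInfo(2000, 0.0)]))
```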
Multi Modal Intelligence
Pages As Images, Not Just Text.
  • Each page is also passed to the AI as a visual image, not only as extracted text
  • Signature blocks, stamps, seals, and visual markers are interpreted, not just transcribed
  • The AI sees the document the way a human would, with full visual context
  • Maximum extraction accuracy on non native PDFs, scanned originals, and historical paper
  • Born digital and scanned documents receive identical Document Intelligence
A Search Experience Anyone Can Use.
For Every User
Familiar. Fast. Universal.
  • The search experience your team already knows how to use. No training required
  • One unified search across every connected system and repository at the same time
  • Instantly searches titles, metadata, tags, filenames, and full document content
  • Database level queries return results in milliseconds even at enterprise scale
  • Filters and sorts that anyone can use to drill in across any metadata field
Find Every Version
Statistical Similarity and Version Discovery.
  • Documents are grouped by statistical similarity, not just keyword match
  • Matching versions, prior drafts, and comparable files all appear in one result set
  • Every historical version of the document is surfaced at the top of the results, automatically
  • Fraud detection becomes inherent. Near duplicates that differ only in critical fields are flagged immediately
  • The truth of which version is real is never in doubt again
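The document does not describe the similarity measure Sentry uses. As a stand-in, Python's standard `difflib.SequenceMatcher` shows the idea: texts that are almost identical score close to 1.0, so a pair that differs in only a few characters can be flagged for a closer look. The function name and the 0.9 threshold are assumptions.

```python
from difflib import SequenceMatcher


def compare_versions(a: str, b: str, near_threshold: float = 0.9) -> str:
    """Classify two document texts as identical, near duplicates, or different."""
    if a == b:
        return "identical"
    ratio = SequenceMatcher(None, a, b).ratio()
    return "near_duplicate" if ratio >= near_threshold else "different"


original = "Pay to Acme Corp the sum of 10,000 USD on 2024-01-15."
tampered = "Pay to Acme Corp the sum of 70,000 USD on 2024-01-15."
unrelated = "Certificate of insurance for the warehouse property."

print(compare_versions(original, original))
print(compare_versions(original, tampered))   # one digit changed in a critical field
print(compare_versions(original, unrelated))
```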
For Knowledge Industries
Industry Knowledge Bases Built In.

For life sciences and academic clients, Sentry has imported, fingerprinted, and organized roughly thirteen million PubMed abstracts, with hourly update capability against the full thirty nine million record National Library of Medicine corpus.

Industry knowledge bases like this can be added to any client deployment, making Sentry a unified entry point to the documents your organization holds and the published research that informs them.

Sentry plus Document Gateway. Continuous Document and Data Readiness.

Sentry registers, processes, and fingerprints every document flowing through Document Gateway in both directions. Documents distributed externally are certified before they leave. Documents received are verified on arrival. Combined with Sentry's universal search across every connected system, your entire document estate becomes one trusted, queryable, fully visible intelligence layer with readiness scores that update in real time. This is the foundation of Continuous Transaction Readiness.

"The fingerprint never lies. The document might. Sentry always tells you the difference."
Sentry Document Assurance design principle
Products · Tab 10
AI.DI Document Warehouse. Your Documents Become a Living Database.
Every document Abstract.DI processes becomes a structured row in your Document Warehouse. Every extracted field becomes a queryable column. Every AI signal is persisted as a structured record. This is the foundation that turns a folder of files into the most strategic data asset in your organization. Your CFO queries it. Your data scientists query it. Your AI agents query it. Your BI tools query it. Your records team queries it. They all query the same intelligence layer, with the same answers, in real time.
PostgreSQL
Open Standard Database
9
Outbound Connectors
Zero
ETL Required
9 Tool
MCP Server for AI Agents
7
Workspaces by Persona
Live
Data Lineage Map
Why the Document Warehouse Was Built

For thirty years, every enterprise ran a document system and a data warehouse in parallel. The document system stored files. The data warehouse stored numbers. The actual information inside the documents (the obligation term, the coverage ratio, the commitment date, the counterparty agreement) lived in neither. It was trapped in PDFs that nobody could query. The AI.DI Document Warehouse ends that separation permanently. Every document is read by Abstract.DI, every field becomes a column, and the warehouse becomes a continuously enriched, AI maintained structured database of everything your organization has ever received, produced, or executed.

Why It Is So Open

Most document platforms lock your data inside their walls. We do the opposite. The Warehouse runs on PostgreSQL, one of the most widely used open source databases in the world. It speaks SQL. It speaks REST. It speaks GraphQL. It speaks MCP. Every BI tool already knows how to talk to it. Every data scientist already knows how to query it. Every AI agent can connect through the published MCP server. Your Document Intelligence is not held hostage. It is yours, in your stack, accessible to every system you already operate. You can stand up a Snowflake share, push to Databricks, pull from Power BI, and run a Python notebook against the same data, simultaneously. We enable that. The competition does not.

What the Warehouse Lets You Do.
For the CFO and CIO
Run the Business on Document Truth.
Every executed contract, every expiring obligation, every committed financial term is a queryable row updated in real time. CFOs see covenants and commitments live. CIOs report on document risk to the board with hard numbers, not gut feel. Quarterly reviews, audits, and board reports stop being a fire drill and start being a query.
For Data Scientists
Build AI on Documents That Are Actually Trustworthy.
Notebook environments with native Python, SQL, and natural language query support. Field statistics, completeness scores, anomaly flags, and full provenance for every value. Train models on certified, deduplicated, structured intelligence with full chain of custody. The thirty year wait for usable enterprise document data is over.
For Records and Compliance
Total Visibility. Total Provenance.
The Data Lineage Map shows every document's full path from source through ingestion, deduplication, fingerprinting, extraction, certification, and storage to every downstream consumer. Stale or broken pipelines are flagged before anyone asks. Audit defense moves from weeks to minutes.
What the Warehouse Lets You Connect To.
The Warehouse does not just sit there. It pushes intelligence outward to every analytics, AI, and BI tool your organization runs. Most document platforms make you build pipelines. We deliver them as one click connectors with zero ETL.
Snowflake Data Share
Your Document Intelligence in Snowflake. Zero Copy. Zero ETL.
Push extracted fields to any Snowflake schema with upsert and append modes. Fifteen minute incremental sync. No pipeline to build. No ETL to maintain. Snowflake users join document intelligence with financial data in the same query. The most strategic data asset in your organization is finally available in the warehouse where the rest of your business intelligence already lives.
Databricks and Delta Lake
Stream Document Intelligence to Your ML Platform.
Native Databricks connector with Delta Lake streaming. Document Intelligence flows to your ML platform in real time, ready for model training, feature engineering, and downstream agent workflows. Data scientists work in the environment they already know with data they have never had access to before.
BigQuery, Redshift, BI Tools
Every BI Tool. Every Data Stack.
BigQuery export. Redshift connector. Native Tableau and Power BI connectors. dbt compatibility. Direct JDBC and ODBC access. REST API with full OpenAPI spec. Python SDK. Webhook event streaming. The Warehouse pushes outward to every tool your organization already runs. Without ETL projects. Without IT tickets. Without rebuilds.
What AI Agents Get From the Warehouse.
The Warehouse exposes a published MCP server that turns every Claude, Cursor, ChatGPT, LangChain, AutoGen, or custom AI agent into a fully informed Document Intelligence agent for your organization. Nine production tools available the moment you connect. Tenant scoped keys. Row level security enforced at the database. Every agent only sees what the connecting user is allowed to see.
Search and Query Tools.
  • search_documents. Full text search across the certified corpus
  • search_by_field. Structured query like "base rent above $50 per square foot"
  • natural_language_query. Ask the warehouse a question in plain English
  • query_extractions. Field value queries with confidence filtering
Family and Comparison Tools.
  • get_family. Pull the complete document family with all amendments
  • compare_documents. Identify field by field differences between any two versions
  • get_completeness. Portfolio wide data quality and completeness scoring
  • get_hierarchy. Navigate organizational structure programmatically
Risk and Forecast Tools.
  • find_anomalies. Anomaly detection with severity, status, and date filters
  • expiry_forecast. Thirty and ninety day expiration alerts portfolio wide
  • get_compliance_status. Real time compliance posture for any entity or program
  • get_obligations. Obligations and milestones extracted across the corpus
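In the Model Context Protocol, an agent invokes a server tool with a JSON-RPC 2.0 request whose method is `tools/call`. The sketch below builds such a request for the `expiry_forecast` tool named above. The argument name `days` is hypothetical, since this document lists tool names but not their input schemas.

```python
import json


def build_tool_call(tool_name: str, arguments: dict, request_id: int = 1) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })


# "days" is an assumed argument name, used here only for illustration.
request = build_tool_call("expiry_forecast", {"days": 90})
print(request)
```

In practice an MCP client library handles this framing, and the server's own tool listing is the authority on each tool's real arguments.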
Inside the Warehouse. Every Document. Every Field. Every Answer.
This is the Warehouse Studio. The single environment that turns your entire document estate into a queryable, structured, AI ready intelligence asset for every persona in your organization. CFOs see live dashboards. Data scientists run notebooks. SQL analysts write joins. Records managers see lineage. AI agents pull through the MCP server. They all work in the same Warehouse Studio, on the same data, in real time. Click any screen below to step inside.
Document Warehouse list
Every Document. In One Live Database.
Document Warehouse · Master Document List
This is your entire document estate as a single, queryable, structured database. Every document Abstract.DI has read sits here as a row, with extracted fields as columns, full text indexed, fingerprinted, certified, and ready for action. Filters, search, and document selection are all instant. Documents in any state (pending, certified, abstracted, archived, vaulted) live in one place. The right panel shows the inspector for any selected document. The chaos of files spread across ten systems collapses into one queryable, searchable, unified Document Intelligence asset.
Warehouse Studio engine map
The Intelligence Engine Map. See How It Works.
Warehouse Studio · Abstract.DI Intelligence Engine Map
A single view that shows you exactly how Document Intelligence works in your platform. Live volumes flow across each engine. Total documents abstracted. Total fields extracted. Average confidence. Risk flags surfaced. Document families recognized. Below the metrics: the four scope levels (Quick, Standard, Deep, Expert), the engine flow stages, the database tables that store the output, and the systems consuming the data downstream. Records managers, CIOs, and data scientists see the entire pipeline at a glance. Total transparency. No black box.
Portfolio intelligence dashboard
Portfolio Intelligence. Your Documents at a Glance.
Warehouse Studio · Portfolio Intelligence Dashboard
This is what your morning looks like with Document Intelligence. Documents abstracted. Fully executed count. Average completeness. Expiring within ninety days. High risk flag count. An eighteen month expiration timeline built from real document dates extracted by Abstract.DI. Execution status distribution across the entire portfolio. Document type breakdown. Deduplicated risk flags portfolio wide. Asset readiness scoring with extraction coverage. Every chart is real. Every number is live. Every CFO, COO, and Records Director sees the truth of their document estate before their first cup of coffee.
Query and Notebooks. Ask Anything. Get a Real Answer.
Three ways to ask the Warehouse a question. Plain English. SQL. Or a full data science notebook. Choose what fits the question. Get a real answer in seconds, not weeks. Every query runs against your live, certified, structured Document Intelligence corpus with full provenance. Every answer cites the source documents.
Natural language chat
Abstract.DI Chat. Ask in Plain English. Get the Answer.
Warehouse Studio · Natural Language Query
No SQL required. No training required. Just ask. "How many documents expire in the next ninety days?" "How many documents have full execution?" "Show me the documents with risk flags." "What is the total commitment value across the portfolio?" The chat understands your corpus and answers from your live data with full citations. The answer is structured. The answer is queryable. The answer is right. Every business user in your organization just gained an analyst on demand.
SQL query editor
SQL on Your Documents.
Warehouse Studio · Query Editor
Native SQL on your entire document corpus. Joins. Aggregations. Subqueries. Window functions. Anything PostgreSQL can do, you can do on your Document Intelligence database. Save queries. Share queries. Schedule queries. Export results to CSV, JSON, Parquet, or directly to Snowflake. Your SQL analyst already knows how to use this on day one. The thirty year wait for queryable enterprise documents is over.
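To illustrate the kind of question this makes possible, here is a minimal sketch in Python. It uses the standard library sqlite3 module as a stand-in for the PostgreSQL-backed Warehouse so it runs anywhere. The documents table, its column names, and the sample rows are hypothetical and are not the actual Warehouse schema.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical table and columns, for illustration only.
# An in-memory SQLite database stands in for the PostgreSQL-backed Warehouse.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents ("
    "id INTEGER PRIMARY KEY, doc_type TEXT, expiry_date TEXT, confidence REAL)"
)

today = date(2025, 1, 1)  # fixed date so the example is reproducible
rows = [
    (1, "lease", (today + timedelta(days=30)).isoformat(), 0.97),
    (2, "lease", (today + timedelta(days=200)).isoformat(), 0.97),
    (3, "insurance_certificate", (today + timedelta(days=60)).isoformat(), 0.95),
    (4, "insurance_certificate", (today + timedelta(days=45)).isoformat(), 0.70),
]
conn.executemany("INSERT INTO documents VALUES (?, ?, ?, ?)", rows)


def expiring_by_type(conn, as_of, days=90, min_confidence=0.9):
    """Count documents per type that expire within `days` of `as_of`."""
    cutoff = (as_of + timedelta(days=days)).isoformat()
    query = (
        "SELECT doc_type, COUNT(*) AS n FROM documents "
        "WHERE expiry_date BETWEEN ? AND ? AND confidence >= ? "
        "GROUP BY doc_type ORDER BY doc_type"
    )
    return conn.execute(query, (as_of.isoformat(), cutoff, min_confidence)).fetchall()


result = expiring_by_type(conn, today)
print(result)
```

Dates are stored as ISO 8601 strings here so that plain string comparison sorts them correctly. A PostgreSQL deployment would more likely use a native date column.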
Warehouse notebook analysis
WAREHOUSE · DEVELOPERS
Notebooks. For Analysts and Data Scientists.
Warehouse Studio · Query and Notebooks
A full notebook environment with SQL, Python, and visualization cells running side by side against your live document corpus. Build a quarterly analysis notebook once and rerun it forever. Mix data science queries against extracted Abstract.DI fields with executive narrative cells. Share the notebook with your CFO, your data team, your auditor. The Document Intelligence corpus finally has a notebook environment built for the people who actually do the analysis.
Warehouse Studio SQL
WAREHOUSE · DEVELOPERS
Saved Queries Library. Reusable Document Intelligence.
Warehouse Studio · Query Library
Every query your team writes becomes a reusable saved query. Build once, run forever. The right side panel shows the database schema, your saved queries library, and metadata for the active query. Share queries across the team. Schedule queries to run on a cadence. Pipe results to dashboards or downstream BI tools. Your organization stops doing the same analysis manually every month and starts running it on autopilot from a shared query library that grows over time.
Data Intelligence. Total Visibility From Source to Consumer.
Records managers, CIOs, and compliance officers need to see the full path of every piece of intelligence in the organization. The Data Intelligence suite delivers exactly that. Visual data lineage. Real time analytics. A unified schema registry. Live pipeline events. Field level extraction quality. Coverage profiling. Everything you need to govern your Document Intelligence estate, in one place.
Data Lineage Map
WAREHOUSE · ADMIN
Data Lineage Map. Provenance for Every Piece of Intelligence.
Warehouse Studio · Data Intelligence · Lineage
Every document, every field, and every AI signal has a complete, traceable lineage from source through ingestion, deduplication, fingerprinting, extraction, certification, storage, and on to every downstream consumer. The visual lineage map shows the entire flow as nodes and connections, color coded by tier. Stale or broken pipelines are flagged before anyone has to ask. Audit defense moves from weeks of evidence gathering to a single click. When a regulator asks "where did this answer come from," you show them the map.
Analytics dashboard
WAREHOUSE · VALUE
Live Analytics. The Health of Your Document Estate.
Warehouse Studio · Data Intelligence · Analytics
Total documents. Extraction value. AI confidence. Anomalies. Monthly volume by document type. Top extracted fields. Document type distribution. Real time analytics on the operational health of your Document Intelligence estate. Trends spotted before they become problems. Coverage gaps surfaced automatically. Quality metrics tracked continuously. The CIO finally has the document KPIs they could never get from the storage vendors.
Schema Registry
WAREHOUSE · ADMIN
Schema Registry. Every Schema. Every Document Type. One Catalog.
Warehouse Studio · Data Intelligence · Schema Registry
A unified registry of every extraction schema your organization uses. Lease agreements. Operating agreements. Insurance certificates. Title policies. Asset purchase agreements. Loan documents. Estoppels. Operating reports. Every schema shows version, document count, last updated date, status badge, and direct edit access. Add new document types in hours, not months. Edit a schema and Abstract.DI immediately starts using it. Records management finally has a real catalog of how the organization processes documents.
Per-field extraction quality
WAREHOUSE
Field Level Quality. Trust Every Number.
Warehouse Studio · Data Intelligence · Per Field Extraction Quality
Every extracted field has a measurable quality score. The Per Field Extraction Quality view shows instances, coverage, average confidence, and quality trend per field across the corpus. Field by field. Document type by document type. The CIO and the data scientist see exactly which fields are reliable, which need schema tuning, and which are improving over time. Data quality stops being a leap of faith and becomes a visible, governed metric.
Coverage and column profiler
WAREHOUSE
Coverage Heatmap. See Every Gap Instantly.
Warehouse Studio · Data Intelligence · Coverage
A coverage heatmap of every document type required by every entity in your hierarchy. Greens, ambers, and reds reveal where the corpus is complete and where the gaps are. The column profiler beneath shows uniqueness, density, anomaly rate, top values, and confidence per field. Records managers and compliance teams see the truth of their document estate at a glance. The questions you used to dread answering finally have answers, and they update in real time.
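The coloring logic behind a heatmap like this can be sketched in a few lines of Python. This is an illustrative sketch only: the required document types, the entity names, and the 50 percent amber threshold are assumptions, since the page does not state how the platform assigns green, amber, or red.

```python
from typing import Dict, Iterable


def coverage_ratio(required: Iterable[str], on_file: Iterable[str]) -> float:
    """Share of required document types that are present for one entity."""
    required_set = set(required)
    if not required_set:
        return 1.0  # nothing required means nothing is missing
    return len(required_set & set(on_file)) / len(required_set)


def status_color(ratio: float, amber_floor: float = 0.5) -> str:
    """Map a coverage ratio to a heatmap color (thresholds are assumed)."""
    if ratio >= 1.0:
        return "green"
    if ratio >= amber_floor:
        return "amber"
    return "red"


# Hypothetical requirements and holdings for three entities.
REQUIRED = ["lease", "insurance_certificate", "title_policy", "estoppel"]
ON_FILE: Dict[str, set] = {
    "Entity A": {"lease", "insurance_certificate", "title_policy", "estoppel"},
    "Entity B": {"lease", "title_policy"},
    "Entity C": {"lease"},
}

heatmap = {
    entity: status_color(coverage_ratio(REQUIRED, docs))
    for entity, docs in ON_FILE.items()
}
print(heatmap)
```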
Document Ingest Pipeline visual
WAREHOUSE · ADMIN
Document Ingest Pipeline. Visualized Live.
Warehouse Studio · Data Intelligence · Pipelines
The full document ingest pipeline visualized as a flow. Ingest Trigger. Document Queue. AI Engine. Type Detection. Validation. Family Detection. Extraction. Warehouse Load. Fanout to consumers. Each stage shows live status and throughput. The activity log at the bottom shows real time events as documents move through the pipeline. Operations teams finally see the heartbeat of their Document Intelligence framework, end to end, with no opaque steps.
Pipeline events log
WAREHOUSE · ADMIN
Live Pipeline Events. Every Document. Every Stage.
Warehouse Studio · Data Intelligence · Pipeline Events
Every document that has ever flowed through your Document Intelligence framework leaves a complete event trail. Document uploaded. Classified. Extracted. Certified. Stored. Indexed. Distributed. Each event timestamped, attributed, and traceable. The full operational history is queryable, auditable, and never lost. When something needs investigating, the answer is right there. When a regulator wants evidence, the answer is right there.
BI Connectors. Push Document Intelligence to Every Tool You Use.
Document Intelligence does not stop at the platform edge. Push it outward to every analytics and AI tool your organization already runs. Snowflake Data Share. Databricks. Tableau. Power BI. BigQuery. Redshift. Python SDK. Jupyter. ChatGPT and Claude. Webhooks. The Warehouse becomes the freshest, most complete, most structured Document Intelligence layer available across every tool you operate.
BI Connectors
WAREHOUSE · DEVELOPERS
Connect Anything. Every BI Tool. Every Data Stack.
Warehouse Studio · Connectors
Snowflake. Databricks. Tableau. Power BI. BigQuery. Redshift. dbt Cloud. Python SDK. Jupyter. Webhooks. ChatGPT GPT Actions. The connector dashboard shows every available outbound integration with one click configuration. Configure once. Stream Document Intelligence to every analytics, ML, and AI tool your organization already runs. Most platforms hold your data hostage. AI.DI does the opposite. Your Document Intelligence is yours, in your stack, accessible to everything you operate.
Document Warehouse. Live Query Interface.
documentgateway.ai
WAREHOUSE
AI.DI Document Warehouse — Your Entire Document Corpus as Structured Data
Warehouse · 9,857 Documents · Live Query Interface
What looks like a document list is actually a live query interface into a structured database. Each row is not a file reference. It is a record in PostgreSQL with typed columns for every field Abstract.DI extracted: classification type, confidence score, execution status, expiry date, party names, financial terms, compliance flags, and dozens more depending on document type. The Abstract.DI query bar at the top accepts natural language. "Show all agreements expiring this quarter where extraction confidence is above 90%" returns structured results because the query runs against structured data rather than a keyword search across unstructured text. The seven view modes (List, Gallery, Library, Cube, Time Series, Schema, Scientist) let you slice the same corpus as a compliance officer, a data analyst, an AI engineer, or a CFO, each seeing exactly the view that matches their workflow. This is the first document management system where the documents are a side effect of the real product: a continuously enriched, AI maintained structured database of everything your organization has ever received, produced, or executed.
Warehouse Studio. BI Connectors.
WAREHOUSE · DEVELOPERS
Warehouse Studio — Zero ETL Connections to Every Data Stack
Warehouse · Snowflake · Databricks · 9 Connectors
The Document Warehouse is not a destination — it is a source of truth that feeds every analytics system your organization already uses. Snowflake receives 891 rows on a 15-minute incremental sync via Data Share — zero copy, zero ETL, no pipeline to build or maintain. Databricks connects via Delta Lake for full and incremental refresh, enabling document intelligence to join with financial models, risk systems, and ML pipelines in the same compute environment. Webhooks fire on every document event — ingest, certify, extract, expire — enabling real time triggers to any downstream HTTP endpoint. BigQuery, Redshift, Tableau, Power BI, dbt Cloud, and a Python SDK are available with one-click configuration. The data model is fully documented — every extracted field, every confidence score, every audit event — so data engineers can join document intelligence with any other enterprise dataset without discovering the schema by trial and error. The Warehouse is not just queryable. It is the most current, most complete, most structured view of your document estate that has ever existed.
Data Intelligence. Full Data Lineage Observability.
WAREHOUSE · DEVELOPERS
Data Lineage Map — Complete Provenance from Source to Consumer
Warehouse · Data Intelligence · 15 Nodes · 17 Connections
Every piece of intelligence in the platform has a traceable origin. The Data Lineage Map visualizes the complete path a document takes from source system through ingestion, processing, warehouse storage, and consumer delivery — with live status on every node. PDF Documents (693 ingested) flow through the Ingest Pipeline (1,000 processed), Deduplication (exact + fuzzy matching), and Fingerprinting (3 fingerprint types) before Abstract.DI extracts 13 abstraction fields. Those fields become Extraction Fields (queryable warehouse layer), Document Metadata (cross asset search index), and Query Engine (PostgreSQL + custom SQL). Consumers — MCP Server, Snowflake, Webhooks — pull from the warehouse in real time. Stale nodes are visually flagged (Bridge Sync shows 0 fields synced, triggering immediate attention). This is not just observability — it is the audit trail that answers "where did this AI answer come from?" for every query, every extraction, and every alert the platform produces.
"The Finance director ran one query. Years of contract values from thousands of documents into Excel in 30 seconds. He looked up and said: 'We've been paying people to do this manually for twenty years.' That was the moment the platform sale closed itself."
— Warehouse Scientist Mode proof moment
Warehouse Studio. Eight Workspaces.
Warehouse Studio is an IDE-like environment for document intelligence. A left dock switches between eight purpose-built workspaces, each designed for a different user persona and workflow. One platform serves the compliance officer, the SQL analyst, the data scientist, the AI engineer, and the executive simultaneously.
Workspace 01
Overview — Portfolio Intelligence Dashboard
Real time metrics across the document corpus. Total documents, extraction coverage, anomaly count, compliance status by entity, and ingestion trends. Time period selector (7d, 30d, quarterly, annual, all time). Coverage heatmap shows document type completeness across the hierarchy. You instantly see which entities are transaction ready and which still have gaps.
Workspace 02
Query Studio — SQL and Natural Language
Full SQL editor with schema autocomplete against all extraction tables. Natural language mode converts plain English questions to SQL automatically. Saved queries, query history, and result export to CSV or JSON. PostgREST endpoint and custom SQL both available. Context panel shows live schema with field types and occurrence counts.
Workspace 03
Data Explorer — Structured Field Browser
Browse the extracted field dataset as a structured table with filtering, sorting, and column selection. Field profiler shows null percentage, distinct value count, AI confidence distribution, and mini-histogram for every field. The mv_document_universe materialized view provides a single queryable source across all document types and extraction fields simultaneously.
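The statistics a field profiler reports can be computed along these lines. This is a minimal sketch under stated assumptions: the function name, the sample values, and the choice of four equal width confidence bins are illustrative and do not describe the platform's implementation.

```python
from typing import Any, Dict, List, Optional, Sequence


def profile_field(
    values: Sequence[Optional[Any]],
    confidences: Sequence[float],
    bins: int = 4,
) -> Dict[str, Any]:
    """Compute null percentage, distinct count, and a confidence histogram."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    null_pct = 100.0 * (total - len(non_null)) / total if total else 0.0

    # Equal width bins over [0, 1]; a confidence of exactly 1.0 goes in the last bin.
    histogram: List[int] = [0] * bins
    for c in confidences:
        histogram[min(int(c * bins), bins - 1)] += 1

    return {
        "null_pct": null_pct,
        "distinct": len(set(non_null)),
        "confidence_histogram": histogram,
    }


# Hypothetical extracted values for one field across four documents.
expiry_values = ["2026-01-31", None, "2026-01-31", "2027-06-30"]
expiry_confidences = [0.98, 0.40, 0.91, 0.76]

profile = profile_field(expiry_values, expiry_confidences)
print(profile)
```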
Workspace 04
AbstractIQ Lab — Model Configuration and Testing
Interactive extraction testing environment. Drop any document, run it through the Abstract.DI pipeline, and inspect every extracted field with its confidence score and source text reference. Adjust confidence thresholds, toggle OCR engines, and compare extraction results across model versions. Schema builder for custom document type definitions.
Workspace 05
Notebooks — Persistent Analysis and Reporting
Jupyter-style analysis notebooks with SQL and Python cells. Save analysis work as persistent notebooks shared across the organization. Scheduled execution for recurring reports. Pre-built notebook templates for common analytics: contract data analysis, financial ratio monitoring, obligation expiry ladders, and coverage gap reports.
Workspace 06
Pipelines — Sync and Automation Management
Visual pipeline builder for data sync workflows. Define extraction-to-warehouse sync schedules, connector push cadences, and conditional routing rules. Pipeline status dashboard shows last run time, row counts, error rates, and next scheduled execution. Connects the schedule-jobs and run-scheduled-reports edge functions to a visual management interface.
Workspace 07
Connectors — BI and Data Stack Integration
Manage all outbound data connections from a single workspace. Configure, test, and monitor connections to Snowflake, Databricks, BigQuery, Redshift, Tableau, Power BI, dbt Cloud, webhook endpoints, and the Python SDK. Test connection modal with live credential validation. Sync log shows row counts, timestamps, and error details per connector.
Workspace 08
AbstractIQ Chat — Conversational Document Intelligence
Natural language chat interface that queries the entire document corpus. Ask any question about your documents and receive a structured answer with source citations. Powered by the MCP server layer with full row-level security enforcement. Every answer includes the document, field, and confidence score it was derived from.
Nine Outbound Connectors. Zero ETL Data Stack Integration.
Connected
Snowflake
Push extracted fields to any Snowflake schema. Upsert and append modes. 15 minute incremental sync cadence. Supports partitioned tables and schema on write patterns. Zero copy via Data Share. No pipeline to build or maintain.
Connected
Databricks
Delta Lake via Unity Catalog. Incremental and full refresh modes. Structured streaming support for real time pipeline integration. Used by data engineering teams embedding document intelligence into existing lakehouse workflows.
Connected
Webhooks
Push to any HTTPS endpoint on extraction events. Configurable event filters: document.ingested, document.extracted, anomaly.detected, compliance.updated, sync.completed. Retry logic with exponential backoff. Used for ERP integration and downstream automation.
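The delivery pattern described here, event filtering followed by retries with exponential backoff, can be sketched as follows. The event names come from the list above. Everything else is an assumption for illustration: the function names, the one second base delay, the sixty second cap, and the retry count are not documented values, and the send callable stands in for a real HTTPS POST.

```python
import time
from typing import Callable, Dict, List, Set


def should_deliver(event_type: str, subscribed: Set[str]) -> bool:
    """Apply the endpoint's event filter."""
    return event_type in subscribed


def backoff_delays(max_retries: int, base: float = 1.0, cap: float = 60.0) -> List[float]:
    """Delay before each retry: base, 2x base, 4x base, ... up to `cap`."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(max_retries)]


def deliver_with_retry(
    send: Callable[[Dict], bool],
    event: Dict,
    max_retries: int = 4,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Try once, then retry with exponential backoff. Returns attempts used."""
    delays = backoff_delays(max_retries)
    for attempt in range(max_retries + 1):
        if send(event):
            return attempt + 1
        if attempt < max_retries:
            sleep(delays[attempt])
    raise RuntimeError("delivery failed after all retries")


# Demo with a fake sender that fails twice, then succeeds. No network, no real sleeping.
calls: List[str] = []
slept: List[float] = []


def flaky_send(event: Dict) -> bool:
    calls.append(event["type"])
    return len(calls) >= 3


subscribed = {"document.extracted", "anomaly.detected"}
event = {"type": "document.extracted"}
attempts = deliver_with_retry(flaky_send, event, sleep=slept.append) if should_deliver(event["type"], subscribed) else 0
print(attempts, slept)
```

Passing sleep in as a parameter keeps the retry loop testable without waiting on a real clock.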
Available
BigQuery
Google BigQuery streaming or batch load. Supports partitioned tables. Service account authentication. For organizations running GCP-native analytics stacks.
Available
Redshift
Amazon Redshift direct connector. COPY and INSERT modes. IAM-based authentication. Designed for AWS-native data warehouse environments.
Available
Tableau and Power BI
Tableau: live query or extract data source, connect via Tableau Server or Cloud. Power BI: DirectQuery or import dataset. Both provide immediate visualization access to the full extraction field schema without ETL.
Available
dbt Cloud
Source node for dbt models. Automatically generates YAML schema files for all extraction tables. Enables data engineering teams to build transformation models directly on top of AI-extracted document intelligence.
Available
Python SDK
pip install document-gateway. Pandas-native output — client.query("SELECT * FROM documents") returns a DataFrame directly. Async support. Used in notebooks, data science workflows, and custom analysis scripts.
Data Lineage. Five Tier Pipeline Architecture.
Every document that enters the system has a complete, traceable lineage from source through every processing stage to every downstream consumer. The interactive Data Lineage Map shows this as a graph — click any node to see its connections, status, row count, and detail. Color coding by tier. Bezier connections highlight the path from any selected node.
Tier 01
Sources
PDF, Word, Excel, Image, and other document types. Counted by MIME type from the upload table. Multiple source nodes shown dynamically based on actual corpus composition.
Tier 02
Ingestion
Ingest Pipeline (validation and routing), Deduplication (SHA-256 exact plus MinHash fuzzy), Fingerprinting (Simhash, pHash, and MinHash — 3 fingerprint types). All counts wired to live data.
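The two deduplication checks named here work differently, and a short sketch makes the distinction concrete. SHA-256 catches byte for byte duplicates, while MinHash estimates how much two texts overlap. This is an illustrative implementation using only the standard library: the three word shingles, the 64 hash functions, and the sample sentences are assumptions, not the platform's actual parameters.

```python
import hashlib
from typing import List, Set


def sha256_hex(data: bytes) -> str:
    """Exact-match fingerprint: identical bytes give an identical digest."""
    return hashlib.sha256(data).hexdigest()


def shingles(text: str, k: int = 3) -> Set[str]:
    """Overlapping k-word windows used as the set representation of a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def minhash_signature(items: Set[str], num_hashes: int = 64) -> List[int]:
    """For each salted hash function, keep the minimum hash over all items."""
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(4, "big")
        signature.append(min(
            int.from_bytes(hashlib.sha256(salt + s.encode("utf-8")).digest()[:8], "big")
            for s in items
        ))
    return signature


def estimated_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """Fraction of matching signature slots approximates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


original = ("this lease agreement is made between the landlord and the tenant "
            "for the premises located at the address listed in schedule one")
near_dup = original.replace("tenant", "lessee")  # one word changed
unrelated = "certificate of liability insurance issued to the named insured party"

sig_original = minhash_signature(shingles(original))
sim_same = estimated_jaccard(sig_original, minhash_signature(shingles(original)))
sim_near = estimated_jaccard(sig_original, minhash_signature(shingles(near_dup)))
sim_far = estimated_jaccard(sig_original, minhash_signature(shingles(unrelated)))
print(sim_same, sim_near, sim_far)
```

A one word edit changes the SHA-256 digest completely, which is why the exact check alone would miss the near duplicate.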
Tier 03
AI Processing
Abstract.DI extraction orchestrator, Bridge Sync (JSONB to typed rows), Schema Engine (AI-learned plus user-defined field type inference). Processing count reflects completed abstractions.
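The Bridge Sync step, turning a JSON extraction payload into typed rows, can be pictured with a small sketch. The payload shape, the field names, and the type registry below are hypothetical, chosen only to show the conversion, and do not reflect the actual Abstract.DI output format.

```python
import json
from datetime import date
from decimal import Decimal
from typing import Any, Dict, List

# Hypothetical field type registry; unknown fields fall back to text.
FIELD_TYPES = {
    "expiry_date": "date",
    "commitment_value": "decimal",
    "fully_executed": "boolean",
    "party_name": "text",
}


def cast_value(raw: Any, field_type: str) -> Any:
    """Convert a raw JSON value into the Python type for its column."""
    if raw is None:
        return None
    if field_type == "date":
        return date.fromisoformat(raw)
    if field_type == "decimal":
        return Decimal(str(raw))
    if field_type == "boolean":
        return raw if isinstance(raw, bool) else str(raw).lower() in {"true", "yes", "1"}
    return str(raw)


def to_typed_rows(document_id: str, extraction_json: str) -> List[Dict[str, Any]]:
    """Flatten one JSON extraction payload into one typed row per field."""
    payload = json.loads(extraction_json)
    rows = []
    for name, entry in payload.items():
        field_type = FIELD_TYPES.get(name, "text")
        rows.append({
            "document_id": document_id,
            "field": name,
            "type": field_type,
            "value": cast_value(entry["value"], field_type),
            "confidence": float(entry["confidence"]),
        })
    return rows


sample = json.dumps({
    "expiry_date": {"value": "2026-01-31", "confidence": 0.98},
    "commitment_value": {"value": "1250000.50", "confidence": 0.93},
    "fully_executed": {"value": True, "confidence": 0.99},
    "party_name": {"value": "Example Holdings LLC", "confidence": 0.88},
})
typed = {row["field"]: row for row in to_typed_rows("doc-001", sample)}
print(typed["expiry_date"]["value"], typed["commitment_value"]["value"])
```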
Tier 04
Warehouse
Extraction Fields table (typed, indexed, queryable rows), Document Metadata materialized view (mv_document_universe), Query Engine (PostgREST plus custom SQL).
Tier 05
Consumers
MCP Server (17 tools for AI agents), Snowflake, Databricks, and Webhook connectors. Status indicators show active versus stale connections.
Why This Becomes Indispensable

Every other platform in the document space stores files. AI.DI stores intelligence. The gap between those two sentences is what makes AI.DI indispensable the longer you use it. A corpus of ten thousand documents with eighteen months of extraction history, anomaly signals, steward corrections, and financial time series data is not a folder of files anymore. It is your most strategic data asset. The deeper that asset goes, the harder it is to imagine running the organization without it.

Products · Tab 11
Millennia FileStar. The Document Warehouse of Record.
FileStar is a unified document warehouse of record that captures, verifies, and secures every critical document across all of your systems. Founded in 1996, imkore Millennia has delivered tailored document management solutions for complex enterprises across financial services, healthcare, pension administration, commercial real estate, government, and more — for nearly three decades. FileStar is the governance engine behind the AI.DI platform, providing the trusted document foundation that every other engine builds on.
The Command Center for All Your Essential Documents
A document warehouse is to documents what a data warehouse is to data — a governed, centralized environment where every critical document from across the enterprise is organized using a consistent structure, reliably searchable, and always ready for operational and analytical use. FileStar applies a deep, industry informed document taxonomy spanning more than 5,700 unique document types — covering HR, finance, legal, operations, compliance, and the full enterprise document lifecycle.
Component 01
FileStar ePort — Intelligent Document Capture

Capture documents from any application and classify them accurately. FileStar centralizes all document types, paper or electronic, into a unified system with required fields and built in approval workflows that guarantee consistent, accurate archiving.

  • Scan and add documents using your own devices or multifunction peripherals
  • Upload single or multiple files in virtually any format with drag and drop simplicity
  • Email documents directly into the system for seamless capture from any source
  • Every PDF automatically converted to searchable format via OCR on upload
  • Required fields enforce completeness — nothing moves forward without the right metadata
Component 02
FileStar Workflow — Governed Approval Processes

FileStar enforces stringent controls and compliance with precision and accountability at every step. Complex workflows can be modeled to exact requirements with sequential and parallel routing, escalation paths, and automated notifications.

  • Seamless DocuSign integration with digital signatures built directly into workflows
  • Customizable automation rules streamline approvals across any business process
  • Mobile friendly access — approvals and routing from any device, anywhere
  • Comprehensive logging and reporting with full audit trails for compliance
  • Handles: Contract Administration, Wire Transfers, Accounts Payable, Journal Entries, Vendor Contracts, Benefit Requests, Budget Approvals, and more
Component 03
FileStar Archive — Secure Centralized Repository

Protect your critical documents in a centralized repository with security and compliance built in from the foundation. The Archive is the governed system of record. Every version, every action, every access event is logged and preserved.

  • Powerful flexible search — locate documents by type, asset, process, date, or any metadata field
  • Version control with full document history ensures accuracy across all revisions
  • Secure external sharing via trackable links with customizable expiration dates
  • Two factor authentication and Single Sign On for enhanced access control
  • Detailed system logs tracking all document access and actions for complete auditability
Open API Framework. Integration Ready by Design.
FileStar is built to live inside your existing technology stack, not beside it. The open API framework integrates directly with leading enterprise platforms so your documents flow automatically with the transactions and workflows that create them — captured, structured, and linked to source data in real time.
Integration
Enterprise ERP and Property Management
FileStar APIs connect directly with major ERP, accounting, and property management platforms. Documents are captured, structured, and linked to source data in real time. The result is a continuously accurate, compliant, and retrievable document of record that is always in sync with the systems generating the underlying transactions.
Integration
DocuSign — Digital Signature Workflows
Native DocuSign integration embeds digital signature workflows directly into FileStar processes. Executed agreements arrive pre classified, pre validated, and automatically archived into the correct location. No manual routing. No version confusion. No broken audit trail between signature and storage.
Integration
SharePoint and Enterprise Content Platforms
The SharePoint plugin allows workflows to be initiated and files to be added directly from SharePoint, making FileStar the governance and intelligence layer over existing content stores without requiring a migration. Documents in SharePoint become governed assets without leaving their existing location.
Document Warehouse. Key Concepts.
Document Schema — The DNA of the Warehouse

A document schema is the structured framework that defines how documents are identified, categorized, and related to one another. Just as a data warehouse relies on a data schema to bring order to large volumes of information, a document warehouse uses a document schema to create clarity, consistency, and predictable organization across all documents.

FileStar's schema spans more than 5,700 unique document types — giving it deep understanding of documents that support acquisitions, operations, financings, compliance, and every stage of the enterprise lifecycle. The schema automatically knows what a document is, how it should be classified, where it belongs, and what a complete document chain should look like. Documents are no longer scattered or mislabeled — they are organized consistently across systems and ready for audit, operations, and enterprise-wide decision making.

Metadata Extraction — Documents Become Structured Intelligence

Metadata is to documents what structured fields are to data. FileStar identifies and extracts key attributes (document type, parties, dates, asset identifiers, and relationships), transforming unstructured files into structured intelligence. Without metadata, documents behave like raw data with no schema. With metadata, they become organized, trustworthy knowledge assets that support search, governance, compliance, and AI.

Every document in FileStar is a governed asset aligned with a consistent taxonomy and storage structure — searchable through clear logical pathways by type, entity, process, source system, date, or business function. Dynamic views and dashboards give teams visibility into entire document collections, not just isolated files.

Auditability — Complete Chain of Custody

Auditability is a defining characteristic of a document warehouse. FileStar records every interaction with every document and preserves the full lineage of a record from its originating system through every update and review. Auditors can see exactly where a document came from, how it has been handled, and whether it remains complete and accurate.

FileStar also captures the source system, timestamps, authorship, and movement of each document — creating a verified chain of custody. This transparency builds trust across the organization and satisfies regulatory requirements without additional documentation work.

Security and Compliance — Built In, Not Bolted On

FileStar operates within an SSAE 18 certified hosting facility with annual SOC 2 audits. Role-based access controls ensure only authorized users and groups can access specific documents. All protocols comply with HIPAA and SOX guidelines for PII and PHI.

Compliance becomes easier when documents follow a consistent structure and lifecycle. FileStar enforces rules for document retention, validation, storage, and access — providing real time visibility into document completeness, timeliness, and accuracy. This makes it simpler to prove adherence to regulations and internal policies, and reduces the risk associated with missing or misplaced documents.

Services. Document Optimization and System Transformation.
Service
Document Optimization
Alignment and optimization of document workflows to ensure seamless integration, robust control, and improved productivity throughout the organization. We transform fragmented document ecosystems into a unified, cohesive framework, merging critical document silos into one streamlined system that aligns with your business objectives.
Service
Document Conversion — I:S3 Smart Scanning
From contracts to full size drawings, whether 10,000 pages or 10 million, imkore Millennia is the trusted source for seamless document conversion. The I:S3 Smart Document Scanning Service captures the contents of boxes or file cabinets and helps organizations decide what to Shred, Store, or Scan, on site or at the secure Chicago service bureau.
Service
System Transformation
Tailored strategies and expert guidance to unify disconnected systems into a streamlined, cohesive framework. FileStar is woven into existing ecosystems, enhancing efficiency, control, and integration across all workflows without disrupting current operations. Engagements range from comprehensive assessments to implementing structured solutions, including data migrations, cleanup, and normalization.
The AI.DI Integration Pathway
FileStar as the AI.DI Governance Engine

FileStar governs documents. AI.DI makes them intelligent. FileStar managed documents automatically flow through Sentry certification and Abstract.DI extraction without any workflow change for existing users. All FileStar metadata syncs to the AI.DI Warehouse continuously.

Every FileStar client is one conversation away from the full AI.DI platform. No rip and replace. No migration project. No change management crisis. The upgrade path is a configuration change — the governance infrastructure is already in place.

Why imkore Millennia

imkore Millennia was founded in 1996 with a focus on tailored document solutions for complex requirements that standard document management software cannot easily meet. The combination of SaaS flexibility with customizable framework design means FileStar can be configured for specific industries, regulatory environments, and workflow structures without professional services for standard deployments.

  • SSAE 18 certified hosting facility with annual SOC 2 audit
  • HIPAA and SOX compliant protocols for PII and PHI handling
  • Pre-employment screening for all employees handling sensitive documents
  • Nearly three decades of enterprise document management expertise
  • Serving financial services, healthcare, pension administration, real estate, government, and more
"Our document processes were fragmented across multiple systems, making accessing information a constant challenge. With their unified framework, we now have one central platform — information is organized, accessible, and secure. Compliance has become much easier to manage, with everything traceable and stored in one place. imkore Millennia didn't just implement a solution — they transformed the way we work with our documents across the entire organization."
— Enterprise FileStar Client
Products · Tab 12
AI Orchestration and Agent Gateway. The Foundation That Makes Every LLM Actually Work.
AI.DI does not compete with LLMs. It is the layer they have always needed. Every organization deploying Copilot, GPT, Claude, or Gemini runs into the same wall. The AI is only as good as the documents it reasons from. Uncertified and unstructured documents produce hallucinations no matter how good the model is. AI.DI is the trusted Document Intelligence foundation that finally makes every LLM in your organization enterprise grade. Connect any AI agent. The agent immediately gains certified, structured, queryable access to every document that matters in your organization, with full provenance and zero hallucination.
Any
LLM. No Lock In.
14+
Production MCP Tools
Open
MCP and REST Protocols
RLS
Database Level Security
Zero
Hallucination Architecture
Live
Agent Provenance Tracking
Why AI Orchestration Was Built

Every CIO has seen the same pattern. The organization signs a deal for an AI assistant. The pilot goes well. The rollout begins. And then the AI starts hallucinating. Wrong contract terms. Made up dates. Confident answers that are factually false. The model is not the problem. The data is. The AI was reasoning over a chaotic, uncertified, duplicated mess of documents and producing exactly the answers that input deserves. AI Orchestration was built to solve this permanently. Every AI agent in your organization gets a single, trusted, certified, structured Document Intelligence foundation to reason from. Every answer is traceable to a specific certified document. Every fact has provenance. Every query respects access control. AI deployments finally graduate from pilot to production.

Why It Is So Open

Most AI vendors lock you into their model. We do not. Use Claude. Use Copilot. Use ChatGPT. Use Gemini. Use Grok. Use Llama. Use a model your team built. Use all of them at the same time. AI.DI is LLM agnostic by design. The published MCP server speaks Model Context Protocol for Claude, Cursor, and LangChain. The same endpoint serves REST and OpenAPI for ChatGPT GPT Actions and any HTTP capable agent framework. One endpoint. Two protocol surfaces. Every tool available on both. Your agents do not care which AI vendor your organization uses next.

What Your AI Agents Can Actually Do.
Connecting an AI agent to AI.DI is not a Q and A bot. It is the difference between a chatbot that pretends and an agent that knows. The agent can search certified documents, query extracted intelligence, find anomalies, forecast expirations, retrieve entire document families, compare versions, and answer questions in plain English with full provenance. Every action runs against your live, certified, structured Document Intelligence database.
Search and Discovery
Find Any Document. Answer Any Question.
  • search_documents. Full text search across the certified corpus
  • search_by_field. Find every document containing a specific extracted value
  • natural_language_query. Ask plain English questions, get structured answers
  • get_hierarchy. Navigate the entire org structure programmatically
Comparison and Family
Reason Over Document Relationships.
  • get_family. Pull the complete amendment chain for any document
  • compare_documents. Field by field comparison of any two versions
  • query_extractions. Field value queries with confidence filtering
  • get_completeness. Data quality scoring for any subset of the corpus
Risk and Compliance
Surface What Matters. Forecast What Is Coming.
  • find_anomalies. Anomaly detection with severity, status, and date filters
  • expiry_forecast. Thirty and ninety day expiration alerts
  • get_compliance_status. Real time compliance posture per entity
  • get_obligations. Live obligations and milestones extracted from documents
Security Built In, Not Bolted On

Every MCP key is tenant scoped. Every tool call is authenticated. Every query respects PostgreSQL row level security policies enforced at the database. An agent cannot access documents the connecting user is not authorized to see. Keys are revocable in one click. Every action is logged with full provenance. AI deployments meet enterprise security and audit requirements out of the box.

MCP Server. The AI Agent Gateway.
documentgateway.ai
Integration Studio — Live AI Agent Gateway
AI Orchestration · MCP Server + Connected AI Systems
This is the screen that enterprise AI teams have been waiting for. An MCP server exposes certified tools to any MCP compatible AI system — Claude, Cursor, LangChain, AutoGen, or any agent framework. The moment Claude.ai connects to this URL, it can search your certified document corpus, check compliance status on any asset, retrieve all obligations from any document set, run structured queries against the full Warehouse, navigate your org hierarchy, and retrieve signed access URLs for specific document versions. Every query enforces row level security at the database layer — the AI agent cannot access documents the connected user is not authorized to see. Keys are revocable instantly. Usage is logged. This is not middleware or a wrapper — it is a purpose built enterprise document intelligence API that treats your LLM as a trusted, auditable consumer of certified data rather than a summarizer of raw PDFs.
Twenty Eight Connectors. Every System You Already Use.
documentgateway.ai
Integration Studio — 28 Enterprise Connectors
AI Orchestration · Full Connector Ecosystem
The platform connects to every system an enterprise already runs, which means "we already use X" carries no weight as an objection. Enterprise ERPs push operational documents, financial reports, and contract records directly into the AI.DI ingestion pipeline on a configured schedule or in response to events. Contracts, invoices, compliance filings, and amendments arrive as first-class pipeline records rather than email attachments or manual uploads. Document management connectors (SharePoint, Google Drive, Box, OneDrive, Dropbox) make AI.DI additive: it reads, certifies, and extracts from existing storage without requiring a file migration. CRM platforms deliver agreements and correspondence as structured ingestion records. Observability and monitoring tools push operational documents as they are generated. Data warehouse connectors deliver extracted intelligence outbound to Snowflake, Databricks, BigQuery, and Redshift on configurable schedules. Every connection is configured through a guided wizard. No custom code, no IT project, no services engagement required.
Document IQ. AI Powered Portfolio Intelligence.
documentgateway.ai
Document IQ — Conversational AI Over a Certified Corpus
AI Orchestration · Portfolio-Wide AI Query Interface
What changes when AI reasons from a certified, structured corpus instead of raw files is not incremental. It is categorical. "What is missing from the vault?" is not a keyword search across folder names. It is a completeness calculation running against required document schemas for every entity in scope simultaneously, returning a ranked gap list with entity, document type, and days since last receipt. "Show critical risk items" is not a tag filter. It is an aggregation of violation flags, expiry warnings, anomaly detections, and compliance alerts across the entire corpus, sorted by risk magnitude, that condenses the portfolio into a single prioritized view that would take a compliance analyst days to compile manually. Upload any document (a counterparty critical data extract, a vendor certificate, a financial statement) and Document IQ cross-references it against vault records in real time: matching party names, flagging version discrepancies, identifying superseded agreements, surfacing every related obligation that touches the same entities, and exposing missing correlations and data conflicts without a human pulling comparison reports. The difference between this and asking a general-purpose AI to analyze your documents is the difference between querying a continuously maintained structured database and asking someone who once read that database to remember what was in it. Trusted structure underneath the model is what makes the difference. This is the AI experience that becomes the reason nobody opens a legacy platform again.
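The completeness calculation described above can be pictured with a small sketch. It is illustrative only and is not the Document IQ implementation. The `Gap` record, the `find_gaps` function, the input shapes, and the `max_age_days` freshness threshold are all assumptions introduced for this example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Gap:
    entity: str
    document_type: str
    days_since_last_receipt: Optional[int]  # None means never received


def find_gaps(
    required: Dict[str, Set[str]],
    last_valid_receipt: Dict[Tuple[str, str], date],
    today: date,
    max_age_days: int = 365,  # hypothetical freshness threshold
) -> List[Gap]:
    """Return required documents that are missing or stale, worst first."""
    gaps: List[Gap] = []
    for entity, doc_types in required.items():
        for doc_type in sorted(doc_types):
            received = last_valid_receipt.get((entity, doc_type))
            if received is None:
                gaps.append(Gap(entity, doc_type, None))
                continue
            age = (today - received).days
            if age > max_age_days:
                gaps.append(Gap(entity, doc_type, age))
    # Never-received documents rank first, then the oldest receipts.
    gaps.sort(
        key=lambda g: (
            g.days_since_last_receipt is not None,
            -(g.days_since_last_receipt or 0),
        )
    )
    return gaps


# Made-up sample data for illustration.
required_docs = {
    "Entity A": {"insurance_certificate", "lease"},
    "Entity B": {"insurance_certificate"},
}
receipts = {
    ("Entity A", "lease"): date(2024, 1, 1),
    ("Entity B", "insurance_certificate"): date(2025, 5, 1),
}
gap_list = find_gaps(required_docs, receipts, today=date(2025, 6, 1))
```

The point of the sketch is that the question is answered by comparing a requirement model against structured records, not by searching file names.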
Data Lineage. Full Provenance for Every AI Answer.
documentgateway.ai
Data Intelligence — Data Lineage Map
AI Orchestration · End to End Data Provenance
When an AI agent answers a question using AI.DI data, every element of that answer has a traceable origin. The Data Lineage Map shows the complete pipeline from source document to consumer — enabling any data engineer, compliance officer, or auditor to trace exactly how a specific piece of intelligence was produced, what transformations it passed through, and which source document it ultimately came from. This is the infrastructure that eliminates LLM hallucination risk: every answer the AI returns is backed by a certified document, a specific extraction, a confidence score, and a provenance chain. The stale node indicators (Bridge Sync showing 0 fields synced) surface data freshness issues proactively — you know before an AI answer is delivered whether the underlying data is current. Provenance is not an afterthought in AI.DI. It is the foundation.
MCP Server. Seventeen Certified AI Agent Tools.
The AI.DI MCP Server serves dual protocols simultaneously: the Model Context Protocol for Claude and Cursor, and REST/OpenAPI for ChatGPT GPT Actions and any HTTP-capable agent framework. A single endpoint. Two protocol surfaces. All 17 tools available on both. Row-level security enforced at the PostgreSQL layer — the AI agent cannot access documents the authenticated user is not authorized to see.
Tool Category 01
Document Search and Retrieval
  • search_documents — full-text and metadata search across the certified corpus with relevance scoring
  • get_document — retrieve a specific document record with all extraction fields and a signed storage URL
  • list_documents_by_type — return all documents of a given classification type, filtered by entity or date range
  • get_document_versions — retrieve complete version history for any document including diff metadata
  • find_similar_documents — Sentry similarity search returning documents ranked by fingerprint distance
Tool Category 02
Extraction and Compliance Queries
  • get_extracted_fields — return all AI-extracted fields for a document with confidence scores and source references
  • query_extraction_fields — structured query against the extraction fields table with filters on any typed column
  • get_compliance_status — return the compliance posture for any entity: required documents, present, missing, expired
  • get_anomaly_flags — list all AI-detected anomalies across a specified entity or corpus scope
  • check_document_expiry — return documents expiring within a specified time window across any scope
Tool Category 03
Portfolio and Warehouse Intelligence
  • get_asset_hierarchy — navigate the org hierarchy from enterprise to entity level
  • warehouse_query — execute arbitrary SQL against the document warehouse with result pagination
  • get_ctr_score — retrieve the Continuous Transaction Readiness score for any entity or portfolio
  • get_portfolio_summary — aggregate document intelligence across a division or portfolio scope
  • list_extraction_schema — return the full field schema with types and occurrence counts for any document type
  • get_data_lineage — return the processing history of any document from ingest through warehouse
  • get_warehouse_metrics — return ingestion counts, extraction rates, and anomaly statistics for any time period
Edge Functions Powering the Orchestration Layer
agent-gateway — Intelligent Request Router

The agent-gateway edge function receives all AI agent requests and dispatches them to the appropriate tool handlers. It enforces authentication, validates the requesting agent's access scope, applies row-level security policies, and logs every tool invocation for the audit trail.

Supports Bearer token authentication for API clients and session-based auth for browser-connected agents. Rate limiting per API key. Tool-level permission grants — a key can be scoped to read only document retrieval without access to warehouse queries or compliance data.
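Tool-level permission grants and per-key rate limiting could be sketched as follows. This is a minimal illustration under stated assumptions, not the agent-gateway code: the `ApiKeyScope` class, the fixed 60 second window, and the call limit are hypothetical.

```python
from dataclasses import dataclass, field
from typing import FrozenSet


@dataclass
class ApiKeyScope:
    allowed_tools: FrozenSet[str]
    max_calls_per_minute: int
    _window_start: float = field(default=0.0, repr=False)
    _calls_in_window: int = field(default=0, repr=False)

    def authorize(self, tool: str, now: float) -> bool:
        """Allow a call only if the tool is granted and the key is under its limit."""
        if tool not in self.allowed_tools:
            return False
        # Start a new fixed window once 60 seconds have elapsed.
        if now - self._window_start >= 60.0:
            self._window_start = now
            self._calls_in_window = 0
        if self._calls_in_window >= self.max_calls_per_minute:
            return False
        self._calls_in_window += 1
        return True


# A key scoped to read only document retrieval, limited to two calls per minute.
retrieval_key = ApiKeyScope(
    allowed_tools=frozenset({"search_documents", "get_document"}),
    max_calls_per_minute=2,
)
decisions = [
    retrieval_key.authorize("warehouse_query", now=0.0),   # tool not granted
    retrieval_key.authorize("search_documents", now=0.0),
    retrieval_key.authorize("get_document", now=1.0),
    retrieval_key.authorize("search_documents", now=2.0),  # over the limit
    retrieval_key.authorize("search_documents", now=61.0), # new window
]
```

A fixed window is the simplest choice for a sketch. A production gateway might prefer a token bucket or sliding window to smooth bursts at window boundaries.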

mcp-server — Dual Protocol Gateway

A single Supabase Deno edge function serving both the Model Context Protocol (SSE transport for Claude and Cursor) and a REST/OpenAPI interface (for ChatGPT GPT Actions, LangChain, AutoGen, and any HTTP agent).

The same tool definitions, the same security model, the same data — two protocol surfaces from one deployment. ChatGPT integration operational. Deployed with --no-verify-jwt to support custom Bearer token auth independent of Supabase session auth.
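One way to picture two protocol surfaces over one set of tool definitions is a shared registry with two thin adapters. The sketch below is an assumption-laden illustration in Python, not the Deno edge function itself. The `tools/call` method name and the `content` result shape follow the Model Context Protocol's JSON-RPC conventions. The `/tools/<name>` REST path, the registry, and the toy `search_documents` body are hypothetical.

```python
import json
from typing import Any, Callable, Dict

TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function once so both protocol adapters can reach it."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def search_documents(query: str) -> list:
    # Toy stand-in for a search over the certified corpus.
    corpus = {"doc-1": "master lease agreement", "doc-2": "insurance certificate"}
    return [doc_id for doc_id, text in corpus.items() if query.lower() in text]


def handle_mcp(message: str) -> str:
    """Adapter for an MCP style JSON-RPC 2.0 'tools/call' request."""
    request = json.loads(message)
    if request.get("method") != "tools/call":
        raise ValueError("only tools/call is handled in this sketch")
    params = request["params"]
    result = TOOLS[params["name"]](**params.get("arguments", {}))
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request["id"],
        "result": {"content": [{"type": "text", "text": json.dumps(result)}]},
    })


def handle_rest(path: str, body: dict) -> dict:
    """Adapter for a hypothetical POST /tools/<name> REST call."""
    name = path.rsplit("/", 1)[-1]
    return {"result": TOOLS[name](**body)}


mcp_reply = json.loads(handle_mcp(json.dumps({
    "jsonrpc": "2.0", "id": 1, "method": "tools/call",
    "params": {"name": "search_documents", "arguments": {"query": "lease"}},
})))
rest_reply = handle_rest("/tools/search_documents", {"query": "lease"})
```

Because both adapters resolve the same registry entry, a tool added once is available on both surfaces with identical behavior.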

erp-webhook — Inbound ERP Event Handler

Receives inbound webhook events from enterprise ERPs, CRM platforms, and any connected system. Validates payload signatures, routes events to the appropriate pipeline stage, and triggers document processing or metadata updates without human involvement.

When a contract is executed in an ERP, the erp-webhook fires the checkin-pipeline automatically — the document enters the AI extraction queue without anyone touching Document Gateway directly.

schedule-jobs + run-scheduled-reports — Autonomous Operations

Cron-triggered orchestration functions that run batch operations on a configurable schedule. Batch pipeline runs process large document queues during off-peak hours. Scheduled reports generate and distribute compliance summaries, expiry alerts, and portfolio intelligence reports automatically.

No human trigger required for ongoing operations. The platform monitors itself, processes new documents, updates CTR scores, and delivers reports on schedule — continuously.

Webhook Event Architecture
Event Type
document.ingested
Fires when any document completes the ingest pipeline. Validation passed. Stored. Queued for extraction. Payload includes document ID, file type, entity node, and submitter identity. Triggers downstream ERP updates or data warehouse prestaging.
Event Type
document.extracted
Fires when Abstract.DI completes field extraction on a document. Payload includes all extracted fields, confidence scores, and the document's workflow routing decision. Primary trigger for downstream analytics pipelines.
Event Type
anomaly.detected
Fires when Abstract.DI flags an extracted value as anomalous relative to corpus patterns. Payload includes the anomaly type, affected fields, expected range, and actual value. Used for real time alerting to portfolio managers or risk systems.
Event Type
compliance.updated
Fires when a document's compliance status changes. A new document arrives that completes a required set. A document expiry approaches threshold. An outstanding obligation is resolved. Triggers CTR score recalculation and stakeholder notifications.
Event Type
sync.completed
Fires when a connector sync run completes. Snowflake push. Databricks batch. Webhook batch delivery. Payload includes row counts, error counts, and sync duration. Used to confirm data freshness in downstream BI tools.
Config
Webhook Security Model
All outbound webhooks signed with HMAC-SHA256. Receiving endpoint validates signature before processing. Configurable per-event filtering. Retry logic with exponential backoff on 4xx and 5xx responses. Full delivery log available in the Connectors workspace.
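A receiving endpoint might validate the HMAC-SHA256 signature and reason about retry timing along these lines. This is a sketch using Python's standard library. The hex encoding of the signature, the shared secret handling, and the backoff parameters are assumptions, since the source does not specify them.

```python
import hashlib
import hmac
from typing import List


def sign(secret: bytes, body: bytes) -> str:
    """Compute a hex encoded HMAC-SHA256 signature over the raw payload."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Validate a signature with a constant time comparison before processing."""
    return hmac.compare_digest(sign(secret, body), signature_hex)


def backoff_delays(
    base_seconds: float = 1.0, factor: float = 2.0, max_attempts: int = 5
) -> List[float]:
    """Delay before each retry attempt under plain exponential backoff."""
    return [base_seconds * factor ** attempt for attempt in range(max_attempts)]


secret = b"shared-webhook-secret"  # hypothetical value
payload = b'{"event": "document.extracted", "document_id": "doc-1"}'
signature = sign(secret, payload)

valid = verify(secret, payload, signature)
tampered = verify(secret, payload + b" ", signature)
delays = backoff_delays()
```

Verifying against the raw request bytes, before any JSON parsing, avoids mismatches caused by re-serialization.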
Security, Audit, and Compliance Architecture
Row-Level Security — Database Enforced

Access control is not application layer middleware. Every Supabase table has PostgreSQL row level security policies that enforce which rows a given user can read, write, or delete, based on their role, their organization, and their specific entity permissions.

An AI agent authenticating with an API key receives exactly the same data access as the human user who created that key — not more, not less. Even if the agent constructs a warehouse query attempting to access data outside its scope, PostgreSQL silently returns only authorized rows. The restriction is invisible to the caller and unbypassable by any query construction.
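The behavior described above, where the caller's own filter can only narrow what the policy already allows, can be illustrated with a small in-memory model. This is a Python analogy of the semantics only. In PostgreSQL the equivalent is declared with row security policies on the table. The `ScopedTable` class and the `entity_id` column are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Set

Row = Dict[str, object]


class ScopedTable:
    """Applies the access policy before any caller supplied predicate runs."""

    def __init__(self, rows: Iterable[Row], allowed_entities: Set[str]) -> None:
        self._rows = list(rows)
        self._allowed = set(allowed_entities)

    def select(self, where: Callable[[Row], bool] = lambda row: True) -> List[Row]:
        # The policy filter is not optional and cannot be widened by `where`.
        visible = (r for r in self._rows if r["entity_id"] in self._allowed)
        return [r for r in visible if where(r)]


documents = [
    {"id": "doc-1", "entity_id": "entity-a"},
    {"id": "doc-2", "entity_id": "entity-b"},
]
agent_view = ScopedTable(documents, allowed_entities={"entity-a"})

everything = agent_view.select()
out_of_scope = agent_view.select(lambda r: r["entity_id"] == "entity-b")
```

The out-of-scope query does not raise an error. It simply returns no rows, which mirrors the silent filtering described in the text.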

API Keys, Audit Log, and Revocation

Every API key is scoped to a specific user, organization, and permission set at creation time. Keys can be restricted to specific tools, specific entities, or read only operations.

  • Instant revocation — key disabled at the database layer, all in-flight requests rejected immediately
  • Full audit log on every tool invocation: timestamp, user, tool name, parameters, result row count, latency
  • Usage analytics per key: call volume, top tools, error rates, and data volume
  • Key expiry with configurable TTL for time-limited integrations or contractor access
  • GDPR, HIPAA, SEC, and APA compliance maintained through architecture — no configuration required
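Key expiry, revocation, and the audit record fields listed above could be modeled roughly as follows. The `ApiKey` class and `audit_record` helper are hypothetical names for illustration. Only the audit fields themselves (timestamp, user, tool name, parameters, result row count, latency) come from the list above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class ApiKey:
    key_id: str
    user_id: str
    created_at: datetime
    ttl: timedelta
    revoked: bool = False

    def is_active(self, now: datetime) -> bool:
        """A key is usable only if it is neither revoked nor past its TTL."""
        return not self.revoked and now < self.created_at + self.ttl


def audit_record(key: ApiKey, tool: str, params: dict,
                 row_count: int, latency_ms: float, now: datetime) -> dict:
    """Build one audit log entry for a tool invocation."""
    return {
        "timestamp": now.isoformat(),
        "user": key.user_id,
        "tool": tool,
        "parameters": params,
        "result_row_count": row_count,
        "latency_ms": latency_ms,
    }


created = datetime(2025, 1, 1, tzinfo=timezone.utc)
contractor_key = ApiKey("key-1", "user-42", created, ttl=timedelta(days=30))

active_day_29 = contractor_key.is_active(created + timedelta(days=29))
active_day_30 = contractor_key.is_active(created + timedelta(days=30))
entry = audit_record(contractor_key, "search_documents", {"query": "lease"},
                     row_count=3, latency_ms=41.0, now=created)

contractor_key.revoked = True
active_after_revoke = contractor_key.is_active(created + timedelta(days=1))
```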
AI.DI Is Not a Competitor to LLMs — It Is Their Prerequisite

Every enterprise deploying Copilot, GPT-4, Claude, or Gemini on their documents faces the same problem: the AI is only as good as the data it reasons from. Uncertified documents produce hallucinated answers. Unstructured files produce generic summaries. AI.DI is the certified, structured document foundation that transforms any LLM from a document summarizer into a reliable enterprise intelligence system.

Value & Strategy · Tab 13
Continuous Transaction Readiness™. The Score That Defines a New Category.
CTR is not just a feature. It is a brand new way to think about your documents. Your organization is always ready to respond to a capital call, close an acquisition, satisfy a regulator, onboard a counterparty, or distribute to a stakeholder, because AI.DI monitors, scores, routes, and maintains your entire document estate continuously, automatically, and in real time. You stop scrambling when the moment arrives. You are already ready.
The Primary Value Statement

AI.DI gives your organization Continuous Transaction Readiness — the state where every document across every system is accessible, authentic, current, and actionable at all times. Organizations that achieve this state lower their cost of capital, reduce audit risk, accelerate transactions, deploy AI with confidence, and eliminate the document scramble that precedes every critical business event.

Why CTR Changes the Game.
The Legacy Problem
Document Management Platforms Are Reactive. CTR Is Proactive.

Every document management platform ever built (M-Files, Hyland, Box, SharePoint, Laserfiche, OpenText) operates on the same passive model. A human asks a question. The system returns a file. The documents do not know they are incomplete. The system does not know a transaction is approaching. No one is told what is missing until the moment it actually matters.

CTR inverts this model. The platform continuously monitors the entire document estate against a dynamic requirement model, scores readiness in real time, and surfaces gaps before they become crises. The difference between reactive retrieval and proactive readiness is the difference between document management and document intelligence.

The Structural Barrier
CTR Requires Intelligence That File Storage Systems Cannot Generate.

To calculate a CTR score, you need to know: which documents are required, which are present, which are valid, which are current, which have changed, and which are expired. A file storage system knows none of this. It knows filenames and folder paths.

AI.DI knows all of this because Abstract.DI has read every document, Sentry has fingerprinted and certified every document, and the Warehouse stores every extracted field — including expiry dates, version identifiers, compliance flags, and obligation terms — as queryable structured data. CTR is computed from that data continuously. No competitor has that data. None can build it without starting over.

The Market Opportunity
Every Organization Has a Transaction in Its Future. None of Them Are Ready.

Every organization faces recurring high stakes document events. Regulatory audits. Financing processes. M&A due diligence. Partner onboarding. Contract renewals. Compliance filings. Board reviews. In every case the weeks before the event are consumed by the same document scramble. Finding files. Verifying versions. Hunting for missing certificates. Correcting outdated records.

CTR eliminates that scramble permanently. The organization is ready before the event is announced. That is not an incremental improvement. It is a fundamentally different value proposition — one that no existing platform can match because none of them understand what their documents say.

How CTR Is Calculated. Five Weighted Dimensions.
Sample Entity CTR Score
84/100
Near Ready — Minor Gaps
23/26
Docs Present
2
Expiring Soon
1
Violation
4.2d
Avg Response
Five Weighted Dimensions
Document Completeness: 88/100
23 of 26 required document types present and valid across this entity
Document Validity & Freshness: 76/100
2 regulatory certification documents expire within 45 days; alerts dispatched
Compliance & Regulatory Status: 71/100
1 active violation: Compliance Certificate version mismatch detected by Sentry fingerprint comparison
Distribution Readiness: 92/100
Complete document package deliverable to any counterparty within 2 hours from current state
Access & Permissioning Health: 97/100
All role assignments current. No orphaned access detected. Every stakeholder sees exactly what they should.
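How five dimension scores roll up into one number can be sketched as a weighted average. The actual dimension weights are not stated here, so the equal weights below are purely an assumption for illustration, and the `ctr_score` function is hypothetical. With the sample dimension scores above, equal weighting gives 84.8. That is close to the sample 84/100 but is not claimed to reproduce the platform's calculation.

```python
from typing import Dict


def ctr_score(dimension_scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of dimension scores, normalized by total weight."""
    if set(dimension_scores) != set(weights):
        raise ValueError("scores and weights must cover the same dimensions")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted = sum(dimension_scores[name] * weights[name] for name in dimension_scores)
    return weighted / total_weight


sample_scores = {
    "completeness": 88,
    "validity_freshness": 76,
    "compliance_status": 71,
    "distribution_readiness": 92,
    "access_health": 97,
}
# Hypothetical equal weights; the real weighting is not published in this section.
equal_weights = {name: 1.0 for name in sample_scores}

score = ctr_score(sample_scores, equal_weights)
```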
Score Interpretation
Score | Status | Typical Situation | Time to Transact
90–100 | Transaction Ready | All documents present, current, and certified. No violations. Counterparty package deployable in hours. | 48 hours
75–89 | Near Ready | 1–3 documents missing or expiring. No active violations. Gaps identified and assigned. | 1–5 business days
55–74 | Attention Required | Multiple gaps or 1–2 violations. Transaction possible but counterparty will surface issues. | 2–4 weeks
35–54 | Not Ready | Significant document gaps. Will not survive regulatory or counterparty diligence in current state. | 30–60 days
0–34 | Critical | Severely incomplete or noncompliant. Immediate remediation required across multiple dimensions. | 90+ days
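The interpretation bands above translate directly into a lookup. The sketch below encodes the thresholds, statuses, and time estimates from the table. The `interpret` function name is hypothetical, and treating a fractional score such as 89.5 as belonging to the lower band is an assumption, since the table only lists whole numbers.

```python
from typing import List, Tuple

# (minimum score, status, time to transact), highest band first.
BANDS: List[Tuple[int, str, str]] = [
    (90, "Transaction Ready", "48 hours"),
    (75, "Near Ready", "1-5 business days"),
    (55, "Attention Required", "2-4 weeks"),
    (35, "Not Ready", "30-60 days"),
    (0, "Critical", "90+ days"),
]


def interpret(score: float) -> Tuple[str, str]:
    """Map a CTR score from 0 to 100 onto its status and time to transact."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for floor, status, time_to_transact in BANDS:
        if score >= floor:
            return status, time_to_transact
    raise AssertionError("unreachable: the lowest band starts at 0")


sample = interpret(84)
```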
What CTR Delivers. Real Organizational Outcomes.
Outcome 01
Transactions Close Faster
The typical document scramble before a financing, acquisition, or regulatory filing takes 3 to 6 weeks. Teams chase files across drives, email chains, and vendor portals. Half the documents retrieved are wrong versions. AI.DI eliminates this entirely. A CTR score of 90+ means the counterparty package is ready before the counterparty asks.
Outcome 02
Audit Risk Drops to Near Zero
Regulators and auditors request specific documents with specific version requirements. AI.DI maintains a continuous, certified audit trail on every document — version history, access log, fingerprint certification, and extraction record. When the auditor requests a document from 18 months ago, the platform produces it in seconds, certified, with full provenance chain.
Outcome 03
AI Deployments Actually Work
Every major enterprise AI deployment is failing for the same reason: the documents feeding the model are unverified, duplicated, and structurally inconsistent. AI.DI solves this permanently. When your LLM reasons from AI.DI-certified documents, every answer is backed by a fingerprint-verified, extraction-validated, version-controlled source.
Outcome 04
Compliance Is Continuous, Not Cyclical
Most organizations achieve compliance for a moment — the audit, the filing deadline, the renewal date — then drift back into gaps. AI.DI makes compliance a continuous state, not a periodic sprint. Expiry alerts fire 90, 60, and 30 days before a document lapses. The CTR score reflects the current compliance posture at all times.
Outcome 05
Cost of Capital Improves
Lenders, investors, and ratings agencies price risk based in part on how prepared an organization is to respond to information requests. Organizations that can deliver complete, certified, structured document packages in hours demonstrate operational maturity that translates directly into better terms. The CTR score is a quantified, auditable measure of that maturity.
Outcome 06
The Document Scramble Is Eliminated Permanently
Every organization knows the document scramble: the all-hands search that precedes every critical business event. It is expensive, error-prone, and entirely avoidable. AI.DI eliminates it by maintaining Continuous Transaction Readiness as a permanent operational state. The organization does not prepare for the transaction. The organization is always prepared.
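The 90, 60, and 30 day expiry alert cadence mentioned under Outcome 04 can be expressed in a few lines. This is a sketch, not the platform's scheduler. The function names are hypothetical, and the rule that alerts stop once the document has expired is an assumption.

```python
from datetime import date, timedelta
from typing import List, Sequence

ALERT_LEAD_DAYS = (90, 60, 30)


def alert_dates(expiry: date, lead_days: Sequence[int] = ALERT_LEAD_DAYS) -> List[date]:
    """Dates on which each alert should fire for a given expiry date."""
    return [expiry - timedelta(days=days) for days in lead_days]


def alerts_due(expiry: date, today: date,
               lead_days: Sequence[int] = ALERT_LEAD_DAYS) -> List[int]:
    """Lead times whose alert date has been reached while the document is still valid."""
    return [
        days for days in lead_days
        if expiry - timedelta(days=days) <= today < expiry
    ]


lease_expiry = date(2027, 3, 31)
schedule = alert_dates(lease_expiry)
due_on_feb_1 = alerts_due(lease_expiry, today=date(2027, 2, 1))
```

The important property is that the expiry date comes from an extracted field, so the schedule exists without anyone entering a reminder by hand.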
The Question Only AI.DI Can Answer Today.
Every legacy document management platform can be evaluated against a single question: can it tell you, right now, whether you are ready to transact? The answer is universally no — because transaction readiness requires knowing what your documents say, not just where they are stored.
Capability | M-Files / Hyland / OpenText | Box / SharePoint | AI.DI
Real time readiness score | None | None | CTR Score (continuous)
Automatic gap detection | Manual checklist | None | Continuous AI monitoring
Document content intelligence | Metadata tags only | None | Full field extraction
Expiry and validity tracking | Manual with reminders | None | Automated from extracted dates
Counterparty package readiness | Manual assembly | Manual assembly | Pre-assembled, certified
Compliance posture visibility | Periodic reports | None | Continuous, real time
AI-ready data foundation | Raw files only | Raw files only | Certified structured data
Version certification | Version numbers only | Version numbers only | Sentry fingerprint certified
"The question every organization needs to answer — and currently cannot — is: are we ready? AI.DI is the first platform that answers that question continuously, automatically, and with mathematical precision. CTR is not a score. It is proof that document intelligence has replaced document management."
— AI.DI platform design principle
Value & Strategy · Tab 14
For Data Scientists. The Document Intelligence Stack You Have Been Waiting For.
You have been asked to build AI on enterprise documents. You already know what that actually means. Unstructured PDFs. No provenance. Wrong versions. 40% duplicates. PII everywhere. No reliable way to trace any LLM answer back to a specific document. AI.DI is the infrastructure layer that solves every one of those problems through every interface you already use.
What You're Actually Getting

AI.DI is not a document management UI with an API bolted on. It is a document intelligence data platform: a PostgreSQL warehouse of structured document intelligence, an MCP server, a webhook event stream, a REST/GraphQL API, Snowflake Data Share, JDBC/ODBC direct access, vector embeddings on certified document chunks, and a 30-engine ML pipeline that improves continuously. Every document becomes structured, provenance tracked, certified data, available to any model, pipeline, or analytics tool you're running.

The Data Model You Are Querying.
Table | Contents | Key Fields | Primary Use
document_records | Every document processed | id, original_name, document_type, workflow_status, asset_id, classification_confidence, storage_path | Document inventory, classification analysis
extracted_fields | Structured extraction from Abstract.DI | document_id, field_name, field_value, confidence_score, extraction_model, extraction_timestamp | Contract analytics, financial extraction
sentry_fingerprints | Cryptographic fingerprint records | document_id, fingerprint_hash, fingerprint_type, certified_at, version_chain, similarity_scores | Certification, duplicate detection, fraud monitoring
hierarchy_nodes | Full org hierarchy | id, parent_id, node_type, node_name, industry, ctr_score, completeness_pct | Portfolio analytics, CTR aggregation
document_activity_log | Every action on every document | document_id, event_type, actor_id, actor_role, timestamp, metadata | Audit trail, access pattern analysis
vector_embeddings | Embeddings on certified chunks | document_id, chunk_id, certified_version_hash, embedding_vector, model_version | Semantic search, RAG retrieval, clustering
ctr_score_history | CTR Score time series | node_id, score, dimension_scores, calculated_at, delta_from_prior | Readiness trending, portfolio benchmarking
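A confidence-filtered query against the extracted_fields table might look like the following. The sketch uses an in-memory SQLite database purely as a stand-in so it runs anywhere, whereas the warehouse described here is PostgreSQL. The column names come from the data model above. The sample rows, the 0.90 threshold, and the ISO date string format for field_value are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE extracted_fields (
        document_id TEXT,
        field_name TEXT,
        field_value TEXT,
        confidence_score REAL
    )
    """
)
# Made-up rows: two high confidence extractions and one low confidence one.
conn.executemany(
    "INSERT INTO extracted_fields VALUES (?, ?, ?, ?)",
    [
        ("doc-1", "expiration_date", "2027-02-15", 0.97),
        ("doc-2", "expiration_date", "2027-08-01", 0.97),
        ("doc-3", "expiration_date", "2027-03-01", 0.52),
    ],
)

# Expirations in Q1 2027, keeping only extractions at or above the threshold.
rows = conn.execute(
    """
    SELECT document_id, field_value
    FROM extracted_fields
    WHERE field_name = ?
      AND field_value BETWEEN ? AND ?
      AND confidence_score >= ?
    ORDER BY field_value
    """,
    ("expiration_date", "2027-01-01", "2027-03-31", 0.90),
).fetchall()
conn.close()
```

Filtering on confidence_score in the query itself keeps low confidence extractions out of downstream analytics without a separate cleanup pass.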
Python SDK. Example Patterns.
from aidi import DocumentWarehouse

# Authenticate against your tenant (replace the placeholders with real credentials)
client = DocumentWarehouse(api_key="YOUR_KEY", tenant_id="YOUR_TENANT")

# Query all Q1 2027 lease expirations across a portfolio (certified docs only)
expirations = client.extractions.query(
    document_type="commercial_lease",
    field="expiration_date",
    date_range=("2027-01-01", "2027-03-31"),
    certified_only=True,
)

# Get version-locked embeddings for a RAG pipeline
embeddings = client.vectors.get_certified_chunks(
    document_ids=expirations.document_ids(),
    version_locked=True,
)

# Subscribe to certification events for real time model retraining.
# my_model is a placeholder for your own model object.
@client.events.on("document.certified", document_type="financial_statement")
async def on_new_financial_statement(event):
    extracted = await client.abstractions.get_fields(event.document_id)
    await my_model.retrain_incremental(extracted.to_feature_vector())
Value & Strategy · Tab 15
Wherever Documents Matter. AI.DI Matters.
Document Intelligence is industry agnostic. Every organization on earth is drowning in the same chaos. Critical documents scattered across systems. No one knows what is real, what is current, or what is missing until the moment it actually matters. AI.DI is the framework that ends that for every industry. The same platform that certifies a real estate portfolio runs a hospital compliance program, an insurance underwriting vault, a bank lending file, a law firm contract repository, or a manufacturing supplier corpus. The chaos is universal. So is the cure.
Built for Any Organization
Model your organization at any depth and any width. Enterprise. Division. Region. Entity. Program. Counterparty. A 500 asset fund, a 200 branch bank, a 50 hospital health system, a 15 plant manufacturer all map to the same hierarchy with zero configuration overhead.
Built for Any Volume
The same platform handles ten documents and ten million. Edge compute scales to zero when idle and to any volume on demand. No ops team. No provisioning. No performance cliffs as you grow into the framework.
Built for Any File Type
PDF, DOCX, XLSX, PPTX, MSG, EML, CSV, ZIP, JPEG, PNG, TIFF scans, and database records. No conversion. No preprocessing. A scanned fax and a native Word contract are processed with the same intelligence.
The Industries Living This Today.
Commercial Real Estate
Asset managers, sponsors, and operators across multifamily, office, industrial, retail, and mixed use
Live Industry
Document Types
  • Title policies and ALTA surveys
  • Lease abstracts and leases
  • Insurance certificates
  • Environmental studies (Phase I and II)
  • Appraisals and BPOs
  • Loan documents and notes
  • Certificates of occupancy
  • Property management agreements
Key Use Cases
  • Acquisition due diligence
  • Loan closing packages
  • Lender covenant compliance
  • Insurance renewal management
  • Portfolio disposition readiness
  • LP reporting distributions
CTR Impact
  • Diligence prep from six weeks to 48 hours
  • Insurance gap incidents eliminated
  • Assets pre qualified twelve months early
  • LP reports in one click, not two weeks
  • Refinancings closed in half the time
Private Equity and Asset Management
GPs, fund managers, and portfolio operations teams running fund and company level documentation
Live Industry
Document Types
  • Fund formation documents
  • LP subscription agreements
  • Cap tables and equity agreements
  • Material contracts
  • Audited financials
  • Board minutes and resolutions
  • Exit transaction documents
Key Use Cases
  • Portfolio company exit readiness
  • LP capital call packages
  • Annual audit preparation
  • Co investor reporting
  • Secondary transfer documents
CTR Impact
  • Exit prep begins eighteen months early
  • LP Q and A response under 24 hours
  • Audit cycle cut by 70%
  • Deal teams focus on diligence, not document hunting
Healthcare and Life Sciences
Health systems, hospitals, payers, and life sciences companies governing clinical, regulatory, and operational documents
High Impact Industry
Document Types
  • Provider credentialing files
  • Payer contracts and amendments
  • Clinical trial protocols and consents
  • HIPAA compliance documentation
  • FDA submissions and correspondence
  • Medical staff bylaws and policies
  • Vendor and BAA agreements
Key Use Cases
  • Joint Commission and CMS audit readiness
  • Provider credentialing turnaround
  • Clinical trial document control
  • Payer contract renewal cycles
  • HIPAA breach response evidence
CTR Impact
  • Audit binders assembled in hours, not weeks
  • Credentialing decisions in days, not months
  • Zero expired BAAs in production
  • Continuous regulatory readiness, not annual scramble
  • Every clinical document cryptographically certified
Insurance
Carriers, MGAs, brokers, and reinsurers handling underwriting, claims, and policy documentation at scale
High Impact Industry
Document Types
  • Policy applications and binders
  • Underwriting submissions
  • Claims files and adjuster reports
  • Loss runs and broker submissions
  • Reinsurance treaties
  • Producer agreements
  • Regulatory filings
Key Use Cases
  • Submission triage and routing
  • Claims documentation completeness
  • Loss run analysis at portfolio scale
  • Treaty renewal due diligence
  • State filing audit support
CTR Impact
  • Submission to quote cycle compressed by 60%
  • Claims fraud signals surfaced automatically
  • Treaty renewals start with full data, not blanks
  • Regulator examinations met with one click
  • Producer compliance monitored continuously
Banking and Financial Services
Commercial banks, credit unions, and capital markets firms managing lending, KYC, and regulatory documents
High Impact Industry
Document Types
  • Loan files and credit memos
  • KYC and CIP documentation
  • Customer onboarding packages
  • Collateral and lien documents
  • Regulatory filings (SEC, FINRA, OCC)
  • Trust and estate documents
  • Vendor due diligence files
Key Use Cases
  • Loan origination and underwriting
  • BSA and AML investigations
  • Examination preparation
  • Vendor risk reviews
  • Trust account documentation audits
CTR Impact
  • Loan packages assembled in hours
  • Exam responses delivered same day
  • KYC files always current and complete
  • Vendor reviews on schedule, every time
  • Continuous regulator readiness
Legal and Professional Services
Law firms, professional services firms, and corporate legal departments managing matter and contract repositories
High Impact Industry
Document Types
  • Contracts and amendments
  • Discovery productions
  • Matter files and pleadings
  • Engagement letters
  • Conflicts and intake documents
  • Closing binders
  • Privilege logs
Key Use Cases
  • Contract intelligence at portfolio scale
  • eDiscovery document review acceleration
  • Matter closing binder assembly
  • Conflicts checking
  • Engagement compliance monitoring
CTR Impact
  • Contract analysis in minutes, not weeks
  • Closing binders assembled automatically
  • Privilege calls supported by AI provenance
  • Matter intake friction eliminated
  • Every document searchable across the firm
Energy, Utilities, and Infrastructure
Oil and gas, renewables, utilities, and infrastructure operators managing land, regulatory, and asset documents
Industry Ready
Document Types
  • Land and lease agreements
  • Right of way and easement documents
  • Environmental permits and reports
  • Regulatory filings (FERC, EPA, state)
  • Asset and equipment records
  • Power purchase agreements
  • Title and division orders
Key Use Cases
  • Land position due diligence
  • Environmental compliance tracking
  • Permit renewal cycles
  • Asset transfer documentation
  • PPA and offtake contract management
CTR Impact
  • Land positions verifiable in real time
  • Permit lapses caught months in advance
  • Asset transfers close on schedule
  • Environmental audits answered with evidence
  • Continuous transaction readiness across the portfolio
Government and Public Sector
Federal, state, and local agencies governing records, FOIA responses, contracts, and regulatory filings
Industry Ready
Document Types
  • Procurement and contract files
  • FOIA request responses
  • Records management archives
  • Permit and license applications
  • Inspection and investigation reports
  • Grant documentation
  • Constituent correspondence
Key Use Cases
  • FOIA response acceleration
  • Records retention compliance
  • Procurement transparency
  • Inspector General readiness
  • Grant audit support
CTR Impact
  • FOIA responses delivered in days, not months
  • Audit findings cleared with documentation
  • Records retention enforced automatically
  • Public trust through cryptographic verification
  • Citizen response times transformed
Manufacturing and Supply Chain
Manufacturers, distributors, and supply chain operators governing supplier, quality, and compliance documentation
Industry Ready
Document Types
  • Supplier agreements and SOWs
  • Quality and inspection records
  • Certificates of origin and conformity
  • Material safety data sheets
  • ISO and regulatory certifications
  • Customs and trade documentation
  • Warranty and recall records
Key Use Cases
  • Supplier compliance monitoring
  • Quality audit preparation
  • Recall response coordination
  • Customs documentation accuracy
  • Tariff and trade compliance
CTR Impact
  • Supplier risk surfaced before disruption
  • Quality audits assembled automatically
  • Recall response time cut dramatically
  • Customs delays eliminated through clean docs
  • Continuous compliance across the supply chain
Higher Education and Research
Universities, research institutions, and academic medical centers governing student, research, and grant documentation
Industry Ready
Document Types
  • Student records and transcripts
  • Faculty and personnel files
  • Research grant documentation
  • IRB and research compliance files
  • Sponsored research agreements
  • Accreditation documentation
  • Donor and gift agreements
Key Use Cases
  • FERPA and accreditation readiness
  • Research grant audit preparation
  • IRB protocol documentation
  • Sponsored research compliance
  • Donor stewardship documentation
CTR Impact
  • Accreditation reviews answered with evidence
  • Grant audits closed in hours
  • Research integrity continuously certified
  • Donor reporting accurate and current
  • Institutional compliance never in doubt
Department Level Entry Points.
Department | Acute Pain | AI.DI Entry Product | Expansion Path
Legal / GC | Contract version disputes, discovery liability, GDPR compliance | Sentry certification + Document Gateway distribution | Full Document Warehouse for corporate legal corpus
Finance / Accounting | Audit prep fire drills, financial document reconciliation | Abstract.DI batch (financial extraction) + Blueprint audit | Sentry certification + Warehouse integration to ERP
Compliance / Risk | Regulatory filing tracking, compliance gaps, audit exposure | Sentry + Warehouse (compliance corpus) + CTR Score | Full platform across regulated document types
Transactions / Deal Team | Due diligence prep time, data room chaos | Document Gateway + Distribution Studio + Transaction Rooms | Abstract.DI batch for portfolio wide extraction
IT / Data Engineering | Unstructured data not in Snowflake; LLM hallucinations | Document Warehouse + Snowflake + MCP Server | Full platform as enterprise document intelligence backbone
Operations / HR | Employee records, policy tracking, onboarding compliance | FileStar lifecycle governance + Abstract.DI HR extraction | Sentry certification + Document Gateway policy distribution
Get Started · Tab 16
Start With One Department. Get the Whole Platform.
AI.DI is not a stripped down pilot program. From your very first document, you have access to the entire platform. Every engine. Every view. Every integration. Full capability from day one is how we earn your full commitment. Start small if you want. The same infrastructure runs the largest portfolios in the industry and a single department deployment with equal power.
The imkore Philosophy — Do Some or Do It All

The world's largest institutional real estate portfolios run on the same platform as a 12-asset regional operator starting their first compliance program. A single compliance officer in one department gets the same AI intelligence, the same CTR Score, the same Warehouse, the same MCP server as a 500-person investment management firm running 20 funds. We built for scale from day one — which means the smallest client gets the most powerful platform available at any price point. No feature tiers. No locked capabilities. No "upgrade to get the real thing."

Three Ways to Start. All Paths Lead to the Same Platform.
Entry Path 01
Start with One Document Type
Pick your most painful document type — insurance certificates, leases, vendor contracts, financial statements. Run Abstract.DI on everything you have. Get a CTR Score on that category in 72 hours. See exactly what's missing, expiring, or wrong. The rest of the platform is right there when you're ready.
"We started with just our COIs. In three days we knew which assets were exposed. We hadn't done that audit in two years." — Property Operations Director
Entry Path 02
Start with One Department
Give legal, compliance, finance, or your deal team a standalone deployment. They get the full platform — just scoped to their hierarchy node and document types. No IT project. No enterprise rollout required. One steward, one asset group, full capability. When they prove ROI, the next department asks to join.
"Legal started it. Then finance wanted in. Then the deal team. We never ran a rollout — it spread itself." — Chief Operating Officer, PE Firm
Entry Path 03
Start with One Asset or Fund
Run a complete AI.DI deployment on a single asset or fund as a proof of concept with real production data. CTR Score goes live in 72 hours. Abstract.DI processes your existing archive in the first week. Distribution Studio sends your first LP package before the end of month one.
"We ran one asset. The CTR Score told us things we didn't know. That asset closed three months faster. Then we did the whole portfolio." — Managing Director, Enterprise Client
imkore Blueprint. The Highest Confidence Entry Point.
Advisory Service · $50K to $150K · 60–90 Days
imkore Blueprint — Document Intelligence Audit & Readiness Roadmap

Blueprint evaluates your entire document ecosystem (every repository, every system, every process) and delivers a scored readiness assessment plus a prioritized AI.DI product roadmap. Blueprint always reveals exactly which products you need and why. The roadmap we deliver is the AI.DI implementation plan for your organization.

01
Discover
Map all document repositories across all systems
02
Assess
Evaluate governance, structure, and integrity
03-04
Classify + Validate
Standardize taxonomies, confirm authenticity, remove duplicates
05-06
Structure + Certify
Apply metadata conventions, establish certified records
07-08
Enable + Optimize
Prepare for AI and automation, maintain Continuous Transaction Readiness
Pricing.
The Simplest Pricing in the Industry

You get the full framework. You only pay for what you use. Every customer gets the entire AI.DI platform on day one. Every engine. Every studio. Every integration. The only thing you ever pay for is the AI consumption your organization actually generates.

Principle 01
Full Framework Always Included
Every customer gets the complete platform from the very first document. No tiers. No feature gates. No "enterprise unlock." A team of three uses the same intelligence as a global enterprise.

  • Document Gateway in full
  • Abstract.DI extraction on every document
  • Sentry certification and fingerprinting
  • AI.DI Warehouse and BI connectors
  • AI Orchestration and MCP server
  • FileStar governance and integration
  • CTR Score and ML Learning Studio
Principle 03
Pricing That Scales Honestly
The more documents AI.DI processes for your organization, the more value you get and the lower your per document cost. The framework gets smarter the more you use it. The pricing rewards you for using it.

  • Volume tiers that reduce unit cost
  • HITL Reduction lowers spend over time
  • Annual prepay options available
  • White label and API embedding pricing
  • Strategic partner programs
  • Enterprise support included
Talk to Partnerships
Frequently Asked Questions.

You get the full platform the moment you deploy. Every engine. Every view. Every integration. There are no feature gates, no capability tiers, and no "enterprise unlock" for core functionality. Your first document gets the same AI pipeline as document number one million. You see the full value of the framework immediately. You do not earn access to it through a ramp up process.

No migration is required. AI.DI layers over your existing infrastructure. Start with your highest priority document program or begin fresh with new documents. There is no requirement to migrate your entire historical archive before going live. The batch engine processes any legacy archive on its own timeline. You decide when and what to bring in.

Sentry generates a mathematical fingerprint, which is a unique hash derived from document content. Two identical documents always produce identical fingerprints. Any change produces a different fingerprint. The original document is never stored by Sentry. GDPR data minimization is achieved structurally. Your documents never leave your control.
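
The fingerprint behavior described above can be sketched in a few lines. This is an illustration only: it assumes a SHA-256 content hash, which the page does not confirm, and `fingerprint` is a hypothetical helper rather than a Sentry API.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a hex digest derived only from the document bytes.

    SHA-256 is an assumption for illustration; the algorithm Sentry
    actually uses is not stated in this document.
    """
    return hashlib.sha256(content).hexdigest()


original = b"Lease agreement, effective 2024-01-01"
copy = b"Lease agreement, effective 2024-01-01"
altered = b"Lease agreement, effective 2024-01-02"

# Identical content yields identical fingerprints.
same = fingerprint(original) == fingerprint(copy)
# A single changed character yields a different fingerprint.
changed = fingerprint(original) != fingerprint(altered)
print(same, changed)
```

In this sketch only the digest needs to be kept to verify a document later, which is consistent with the point that the original file does not have to be stored.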

The MCP server exposes six tools: search_documents, get_compliance_status, get_obligations, query_warehouse, get_hierarchy, and get_document_url. Plug AI.DI into Claude, Cursor, LangChain, AutoGen, or any MCP compatible environment and your agents instantly gain certified document search and structured extraction queries. Authentication runs through OAuth 2.0. Agents only access what the connecting user is authorized to see. Keys are revocable in one click.
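
For a sense of what an agent sends when it calls one of these tools, here is a rough sketch of the request body. The envelope follows the Model Context Protocol's JSON-RPC 2.0 `tools/call` convention. The `query` argument and the `build_tool_call` helper are hypothetical, since the input schemas for the AI.DI tools are not documented on this page.

```python
import json

# Tool names as listed in this document.
AI_DI_TOOLS = {
    "search_documents",
    "get_compliance_status",
    "get_obligations",
    "query_warehouse",
    "get_hierarchy",
    "get_document_url",
}


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Build an MCP `tools/call` JSON-RPC 2.0 request body (hypothetical helper)."""
    if tool not in AI_DI_TOOLS:
        raise ValueError(f"unknown tool: {tool}")
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(request)


# The "query" argument name is an assumption, not a documented parameter.
payload = build_tool_call(
    1, "search_documents", {"query": "expiring insurance certificates"}
)
print(payload)
```

In practice an MCP client library would build this envelope and handle transport and the OAuth token exchange. The sketch only shows the shape of the call.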

You choose where AI.DI runs. The full platform deploys through Docker containers. No Kubernetes required. Azure, AWS, fully on premise, and hybrid (metadata in cloud, documents on premise) are all supported. Air gapped environments with no internet connectivity are also supported. Contact the enterprise team for deployment architecture details.

Snowflake Data Share (zero copy, no ETL). Databricks connector (Delta Lake, streaming). Tableau and Power BI native connectors. dbt compatibility. BigQuery export. Direct JDBC and ODBC access. REST API with full OpenAPI 3.0 spec. Python SDK. Webhook event streaming to any HTTP endpoint. SSO through SAML 2.0 and OAuth 2.0.

Request Access
Your next deal closes faster when your documents are always ready.
Request a Simulator Key and explore a fully populated AI.DI demo environment with real document intelligence running on sample portfolio data — no implementation required.
Request Simulator Key  ·  Log In to Document Gateway
documentgateway.ai  ·  imkore.ai