The knowledge that matters most — incident patterns, tribal expertise, research hunches, the thing your ops team figured out last Tuesday — lives in conversations and documents that vanish. LLMs can finally read all of it. Substrate is where they put what they find.
Your best engineer knows the payment service fails at 180 connections, not 200 like the docs say. A researcher noticed a link between two pathways nobody's published yet. Your ops team figured out that Tuesday deploys cause more incidents — it's buried in a Slack thread from March.
This is the most valuable knowledge your organization has. None of it is in any database.
It can't be — databases need clean, single-version-of-truth data, while real organizational knowledge is contradictory, multi-source, and changes every week. It breaks INSERT statements and ON CONFLICT clauses.
LLMs change this. They can read Slack exports, scan incident reports, process research papers. They can finally access this knowledge. But they need somewhere to put it that handles contradictions natively and tracks who said what.
"The East Palestine water is contaminated." Here's how three different databases store that:
When an agent extracts 500 facts from your Slack channels overnight, you need every write to carry its source. Not as metadata you hope someone fills in — as a hard requirement the engine enforces. If two sources contradict each other, both claims coexist. When one is discredited, you retract it and the other survives.
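The write discipline above can be modeled in a few lines of plain Python. This is a toy sketch, not Substrate's actual API: `ClaimStore` and its method names are invented for illustration, but the behavior (source-required writes, coexisting contradictions, one claim surviving another's retraction) mirrors what the engine is described as enforcing.

```python
from dataclasses import dataclass, field
import time

@dataclass
class Claim:
    text: str
    source: str          # required: who/where this came from
    ts: float = field(default_factory=time.time)
    retracted: bool = False

class ClaimStore:
    """Toy model of provenance-required writes: contradictory claims
    coexist; retracting one leaves the others intact."""
    def __init__(self):
        self.claims = []

    def ingest(self, text, source):
        if not source:
            raise ValueError("every claim must carry a source")
        claim = Claim(text, source)
        self.claims.append(claim)
        return claim

    def retract(self, claim):
        claim.retracted = True

    def active(self):
        return [c for c in self.claims if not c.retracted]

store = ClaimStore()
a = store.ingest("payment service fails at 180 connections", source="slack:#ops")
b = store.ingest("payment service fails at 200 connections", source="docs/payments.md")
store.retract(b)   # the docs turn out to be stale; the Slack claim survives
print([c.text for c in store.active()])
```

The point of the sketch is the `ValueError`: provenance is not optional metadata but a precondition of the write.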
This is the part that matters most.
Run Substrate for a week and you have structured notes. Run it for six months and you have a reality model — an emergent map of everything your organization knows, where the knowledge is deep, where it's thin, and where different domains connect in ways no single person noticed.
Topics emerge automatically from the claim graph. Your agents extracted 300 claims about your auth system and 4 about data center power capacity. That's not just data — that's a map of where you're knowledgeable and where you're blind. The insight engine finds connections across domain boundaries: auth failures always follow database connection issues, but nobody documented the dependency.
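A coverage map like that reduces to counting claims per topic and flagging the thin spots. A minimal illustration — the `claim_topics` tags below are invented for the sketch; in Substrate the topics are described as emerging from the claim graph itself:

```python
from collections import Counter

# Invented topic tags standing in for what the topic layer would produce.
claim_topics = ["auth"] * 300 + ["datacenter-power"] * 4 + ["payments"] * 42

coverage = Counter(claim_topics)
# Topics with very few claims are blind spots worth flagging.
thin = sorted(t for t, n in coverage.items() if n < 10)

print(coverage.most_common())  # where the organization is knowledgeable
print(thin)                    # where it is blind
```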
Two years of accumulated evidence can't be speed-run by a competitor. The topology gets richer. Cross-domain connections surface. The organization that started earlier has an advantage that compounds daily and is nearly impossible to replicate.
pip install substrate-db — single-file database, no server, no infrastructure. Point it at a Slack export, a ChatGPT conversation, or a folder of documents, and the built-in extractor (heuristic or LLM-powered, 7 providers supported) pulls out claims with provenance tracing back to the exact message. Heuristic mode needs no API keys.
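To make "heuristic extraction with provenance" concrete, here is a deliberately crude sketch: a regex heuristic over fake Slack messages that ties each candidate claim to the exact message it came from. None of this is Substrate's real extractor; it only illustrates the shape of the output.

```python
import re

# Fabricated Slack messages; "id" plays the role of the exact-message pointer.
messages = [
    {"id": "C01/1712", "user": "dana", "text": "The payment service fails at 180 connections."},
    {"id": "C01/1713", "user": "li",   "text": "lol"},
    {"id": "C01/1714", "user": "dana", "text": "Tuesday deploys cause more incidents."},
]

def extract(messages):
    """Toy heuristic: a sentence with a number or a causal verb is a
    candidate claim; every claim carries its source message."""
    pattern = re.compile(r"\b(fails?|causes?|\d+)\b", re.I)
    for m in messages:
        for sentence in re.split(r"(?<=[.!?])\s+", m["text"]):
            if pattern.search(sentence):
                yield {"claim": sentence.strip(), "source": f'{m["user"]}@{m["id"]}'}

for c in extract(messages):
    print(c)
```

A real heuristic pass would be smarter than one regex, and the LLM-powered mode replaces the pattern entirely, but the contract is the same: no claim leaves the extractor without a pointer to its origin.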
Want a visual interface? pip install substrate-console gives you a browser dashboard that connects to live Slack, Gmail, and Google Docs via OAuth. Ingest your company's knowledge, explore an interactive graph, and ask natural-language questions — all for ~$0.06 on Groq's free tier.
Provenance is required on every write — the engine rejects claims without a source, whether the writer is a human or an agent. Batch mode handles millions of claims via the Rust backend. db.at(timestamp) gives you point-in-time queries: what the agents knew last Tuesday, before the new data came in.
For AI agents: the built-in MCP server lets Claude and other MCP-compatible agents read and write Substrate directly. A REST API at /api/v1/ serves any HTTP client. Event hooks (db.on("claim_ingested", callback)) let you build reactive pipelines that trigger when knowledge changes.
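Event hooks are a plain observer pattern. A toy sketch of the db.on("claim_ingested", callback) idea; the `Hooks` class is invented, and only the register/fire shape matches the description:

```python
class Hooks:
    """Toy event bus mirroring the db.on("claim_ingested", cb) shape."""
    def __init__(self):
        self.handlers = {}

    def on(self, event, cb):
        self.handlers.setdefault(event, []).append(cb)

    def emit(self, event, payload):
        for cb in self.handlers.get(event, []):
            cb(payload)

db = Hooks()
alerts = []

# Reactive pipeline: flag incident-related claims as they arrive.
def flag_incidents(claim):
    if "incident" in claim["text"]:
        alerts.append(claim)

db.on("claim_ingested", flag_incidents)
db.emit("claim_ingested", {"text": "Tuesday deploys cause more incidents",
                           "source": "slack:#ops"})
print(alerts)
```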
An agent reads 200 papers overnight and extracts findings. It notices your team has deep knowledge on kinase inhibition and almost nothing on the metabolic pathway that might connect to it — a gap no single researcher would see.
Agents ingest 18 months of Slack incident channels and postmortem docs. "What breaks if Redis goes down?" — answered from claims extracted across hundreds of incidents, each traceable to the person who figured it out.
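Answering "what breaks if Redis goes down?" from extracted claims is, underneath, a transitive-dependents query over a dependency graph. A self-contained sketch with a fabricated graph (the service names and edges are invented; Substrate's actual query path is not shown here):

```python
# Dependency edges as they might be assembled from extracted claims:
# each key depends on the services in its list.
depends_on = {
    "session-service": ["redis"],
    "auth": ["session-service", "postgres"],
    "checkout": ["auth", "payments"],
    "payments": ["postgres"],
}

def breaks_if(target):
    """Everything that transitively depends on `target`."""
    broken, frontier = set(), {target}
    while frontier:
        hit = {svc for svc, deps in depends_on.items()
               if set(deps) & frontier and svc not in broken}
        broken |= hit
        frontier = hit
    return broken

print(sorted(breaks_if("redis")))
```

The value claimed for Substrate is not the traversal — that part is trivial — but that each edge is a claim traceable to the person or postmortem that established it.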
If your agents produce knowledge — from customer calls, experiments, market research, code reviews — Substrate is the database layer that makes it compound instead of evaporate.
$ pip install substrate-db substrate-console
$ substrate-console my_company.db
Opens a dashboard at localhost:8877.
Click Connect Slack — authorize your workspace.
Click Connect Google — authorize Gmail, Drive, Docs.
Go to Ingest. Pick your channels. Hit go.
No API keys. No OAuth apps to create. No environment variables. Your data flows directly between your machine and Slack/Google. Full quick start guide →