Substrate is a claim-native database. It doesn't just store data — it tracks where every fact came from, who corroborates it, and what to do when a source is wrong.
A traditional database stores rows. A knowledge graph stores nodes and edges. Neither knows if the data is true, where it came from, or what happens when a source turns out to be wrong.
Substrate stores claims — assertions with a source attached. Not "BRCA1 is linked to breast cancer." Instead:
```python
db.ingest(
    subject=("BRCA1", "gene"),
    predicate=("associated_with", "associated_with"),
    object=("Breast Cancer", "disease"),
    provenance={
        "source_type": "literature_extraction",
        "source_id": "pubmed:28536890",
    },
    confidence=0.91,
)
```
"PubMed paper 28536890 says BRCA1 is associated with breast cancer, confidence 0.91." That's a fundamentally different primitive.
Two independent sources say the same thing? Tracked automatically. Multi-source facts are stronger than single-source facts.
A source turns out to be wrong? Retract it. Everything downstream degrades — but facts with independent confirmation survive.
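The corroboration-and-retraction mechanic can be illustrated with a toy claim store (plain Python, not the real engine):

```python
from collections import defaultdict

class ClaimStore:
    def __init__(self):
        # fact -> set of source_ids asserting it
        self.support = defaultdict(set)

    def ingest(self, fact, source_id):
        self.support[fact].add(source_id)

    def retract(self, source_id):
        # Remove one source everywhere; a fact survives
        # as long as any independent source remains.
        for sources in self.support.values():
            sources.discard(source_id)

    def is_known(self, fact):
        return len(self.support[fact]) > 0

store = ClaimStore()
fact = ("api-gateway", "depends_on", "redis")
store.ingest(fact, "k8s-manifest-v2.3")
store.ingest(fact, "incident-42")      # independent confirmation
store.retract("k8s-manifest-v2.3")     # manifest turns out stale
print(store.is_known(fact))            # → True: the incident chat still supports it
```

A single-source fact would have vanished with its source; the corroborated one degrades gracefully instead.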
Feed it a Slack export or ChatGPT conversation. It extracts the claims itself, with provenance tracing back to the exact message.
What did we know last Tuesday? Query the knowledge base as it existed at any point in the past. Every claim carries a timestamp.
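Because every claim is timestamped, an as-of query is just a filter over ingestion times. A minimal sketch of the idea, with toy records rather than Substrate's schema:

```python
from datetime import datetime

# Each claim carries the time it was ingested (illustrative dates).
claims = [
    {"fact": ("redis", "version", "6.2"), "at": datetime(2024, 3, 1)},
    {"fact": ("redis", "version", "7.0"), "at": datetime(2024, 3, 20)},
]

def known_as_of(claims, when):
    # "What did we know last Tuesday?" Keep only claims ingested by then.
    return [c["fact"] for c in claims if c["at"] <= when]

print(known_as_of(claims, datetime(2024, 3, 10)))
# → [('redis', 'version', '6.2')]
```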
```python
import substrate

db = substrate.quickstart("knowledge.db", vocabs=["devops"])

# A Kubernetes manifest says api-gateway depends on Redis
db.ingest(
    subject=("api-gateway", "service"),
    predicate=("depends_on", "depends_on"),
    object=("redis", "service"),
    provenance={"source_type": "config_management", "source_id": "k8s-manifest-v2.3"},
    confidence=1.0,
)

# An incident chat independently confirms it
db.ingest_chat([
    {"role": "user", "content": "What broke when Redis went down?"},
    {"role": "assistant", "content": "API Gateway depends on Redis for session caching. It failed over."},
], conversation_id="incident-42", extraction="heuristic")

# Two independent sources now corroborate the same fact
frame = db.query("redis", depth=2)
print(frame.narrative)

# The K8s manifest is outdated? Retract it.
# The fact survives — the incident chat still supports it.
db.retract("k8s-manifest-v2.3", reason="Outdated config")
```
Heuristic extraction works offline with no API keys. For more nuanced text, add an LLM provider; seven are supported, each requiring its own API key.
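To give a flavor of what offline heuristic extraction means, here is a crude single-rule extractor. Substrate's actual heuristics are richer; this sketch only shows the pattern-matching idea:

```python
import re

# One toy rule: "X depends on Y" becomes a depends_on claim.
PATTERN = re.compile(r"(\w[\w-]*) depends on (\w[\w-]*)", re.IGNORECASE)

def extract_claims(text, source_id):
    return [
        {
            "subject": m.group(1).lower(),
            "predicate": "depends_on",
            "object": m.group(2).lower(),
            "provenance": {"source_type": "chat", "source_id": source_id},
        }
        for m in PATTERN.finditer(text)
    ]

msg = "API-Gateway depends on Redis for session caching."
print(extract_claims(msg, "incident-42"))
```

No model call is involved; the trade-off is that only phrasings matching a known rule are captured.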
Most organizational knowledge isn't structured — it's in conversations, documents, incident reports, and people's heads. Substrate bridges that gap.
ChatGPT exports, Claude sessions, or any OpenAI-format messages. Claims are extracted per-turn with provenance tracing to the exact message.
Export your workspace ZIP. Substrate finds the knowledge in your channels and extracts it — every claim traces back to the channel and thread.
APIs, experiment logs, K8s manifests, monitoring configs, databases. Ingest one claim at a time or millions in batch.
You don't replace your existing databases with Substrate. You use Substrate for the knowledge that doesn't fit in a traditional database — the stuff that lives in conversations, documents, team discussions, and expert heads.
A knowledge graph is something Substrate produces. The graph is derived from claims. If you retract every claim, the graph is empty. The claims are the source of truth.
Not an optional metadata field. The engine rejects writes without a source. You always know where a fact came from.
pip install and go. Single-file database, no server, no config. Like SQLite, but for knowledge.
Heuristic extraction needs no API keys. Add LLM providers when you want deeper analysis, but the core engine works entirely locally.
Start with a few conversations. Scale to millions of claims with the Rust backend. Same API, same query interface.
Track findings across papers, discussions, and experiments. When a paper is retracted, Substrate knows what downstream conclusions are affected.
Map service dependencies from configs and incident chats. "What breaks if Redis goes down?" — answered with every source that says so.
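"What breaks if Redis goes down?" is a transitive reverse-dependency traversal over the graph that the claims produce. A self-contained sketch of that traversal, with hypothetical service names:

```python
from collections import defaultdict, deque

# depends_on edges derived from claims: service -> things it depends on
deps = defaultdict(set)
for svc, dep in [("api-gateway", "redis"),
                 ("checkout", "api-gateway"),
                 ("search", "postgres")]:
    deps[svc].add(dep)

def blast_radius(target):
    # BFS over reverse edges: everything that transitively
    # depends on `target` breaks with it.
    broken, queue = set(), deque([target])
    while queue:
        node = queue.popleft()
        for svc, ds in deps.items():
            if node in ds and svc not in broken:
                broken.add(svc)
                queue.append(svc)
    return broken

print(sorted(blast_radius("redis")))  # → ['api-gateway', 'checkout']
```

In Substrate each of those edges would additionally carry its sources, so the answer comes back with every config file and incident chat that asserts the dependency.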
Track what was tried, what worked, and why. Every experiment result carries its provenance — queryable, comparable, and retractable.
$ pip install substrate-db
```python
import substrate

db = substrate.quickstart("team.db", vocabs=["devops"])

# Ingest a Slack export — claims extracted automatically
db.ingest_slack("slack_export.zip", extraction="heuristic")

# What do we know about api-gateway?
frame = db.query("api-gateway", depth=2)
print(frame.narrative)

# Where did we learn this?
for rel in frame.direct_relationships:
    print(f"  {rel.target.name} ({rel.provenance_count} sources, conf={rel.confidence:.2f})")
```