A database where every fact has a source

Substrate is a claim-native database. It doesn't just store data — it tracks where every fact came from, who corroborates it, and what to do when a source is wrong.

$ pip install substrate-db

A Different Kind of Primitive

A traditional database stores rows. A knowledge graph stores nodes and edges. Neither knows if the data is true, where it came from, or what happens when a source turns out to be wrong.

Substrate stores claims — assertions with a source attached. Not "BRCA1 is linked to breast cancer." Instead:

db.ingest(
    subject=("BRCA1", "gene"),
    predicate=("associated_with", "associated_with"),
    object=("Breast Cancer", "disease"),
    provenance={
        "source_type": "literature_extraction",
        "source_id": "pubmed:28536890",
    },
    confidence=0.91,
)

"PubMed paper 28536890 says BRCA1 is associated with breast cancer, confidence 0.91." That's a fundamentally different primitive.

Because the Engine Knows Where Every Fact Came From

Corroboration

Two independent sources say the same thing? Tracked automatically. Multi-source facts are stronger than single-source facts.
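The mechanics can be sketched in a few lines of plain Python. This is a toy model for illustration, not Substrate's internals: each claim keys on its (subject, predicate, object) triple and accumulates the distinct sources that assert it (the `paper-A`/`paper-B` source IDs are made up for the example).

```python
from collections import defaultdict

# Toy model: a claim is a (subject, predicate, object) triple;
# each ingest adds the asserting source to that triple's source set.
claims = defaultdict(set)

def ingest(subject, predicate, obj, source_id):
    claims[(subject, predicate, obj)].add(source_id)

ingest("BRCA1", "associated_with", "Breast Cancer", "paper-A")
ingest("BRCA1", "associated_with", "Breast Cancer", "paper-B")

# Two distinct sources assert the same triple: the fact is corroborated.
sources = claims[("BRCA1", "associated_with", "Breast Cancer")]
print(len(sources) >= 2)  # → True
```

Because the triple is the key, corroboration falls out of ingestion for free; no separate deduplication pass is needed.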

Self-Correction

A source turns out to be wrong? Retract it. Everything downstream degrades — but facts with independent confirmation survive.
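A toy sketch of the idea (illustrative, not the engine's implementation): retracting a source removes its support everywhere, and a fact survives as long as at least one other source still backs it.

```python
# Toy model: each fact maps to the set of sources supporting it.
claims = {
    ("api-gateway", "depends_on", "redis"): {"k8s-manifest-v2.3", "incident-42"},
}

def retract(source_id):
    # Remove this source's support from every fact it touched.
    for sources in claims.values():
        sources.discard(source_id)

retract("k8s-manifest-v2.3")  # the manifest was outdated

remaining = claims[("api-gateway", "depends_on", "redis")]
print(bool(remaining))  # → True: the incident chat still supports it
```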

Extraction

Feed it a Slack export or ChatGPT conversation. It extracts the claims itself, with provenance tracing back to the exact message.

Time Travel

What did we know last Tuesday? Query the knowledge base as it existed at any point in the past. Every claim carries a timestamp.
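Timestamped claims make "as of" a simple filter. A minimal sketch, again a toy model rather than Substrate's query API (the dates are invented for the example):

```python
from datetime import date

# Toy model: every claim records when it was asserted.
claims = [
    {"fact": ("api-gateway", "depends_on", "redis"), "asserted": date(2024, 3, 1)},
    {"fact": ("api-gateway", "depends_on", "kafka"), "asserted": date(2024, 3, 20)},
]

def as_of(when):
    # The knowledge base "as of" a date is just the claims asserted by then.
    return [c["fact"] for c in claims if c["asserted"] <= when]

# On March 10th, only the redis dependency was known.
print(as_of(date(2024, 3, 10)))
```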

See It Work

import substrate

db = substrate.quickstart("knowledge.db", vocabs=["devops"])

# A Kubernetes manifest says api-gateway depends on Redis
db.ingest(
    subject=("api-gateway", "service"),
    predicate=("depends_on", "depends_on"),
    object=("redis", "service"),
    provenance={"source_type": "config_management", "source_id": "k8s-manifest-v2.3"},
    confidence=1.0,
)

# An incident chat independently confirms it
db.ingest_chat([
    {"role": "user", "content": "What broke when Redis went down?"},
    {"role": "assistant", "content": "API Gateway depends on Redis for session caching. It failed over."},
], conversation_id="incident-42", extraction="heuristic")

# Two independent sources now corroborate the same fact
frame = db.query("redis", depth=2)
print(frame.narrative)

# The K8s manifest is outdated? Retract it.
# The fact survives — the incident chat still supports it.
db.retract("k8s-manifest-v2.3", reason="Outdated config")

Heuristic extraction works offline with no API keys. For more nuanced text, plug in an LLM provider: seven are supported, each with its own API key.

Feed It Anything

Most organizational knowledge isn't structured — it's in conversations, documents, incident reports, and people's heads. Substrate bridges that gap.

Conversations

ChatGPT exports, Claude sessions, or any OpenAI-format messages. Claims are extracted per-turn with provenance tracing to the exact message.

Slack Workspaces

Export your workspace ZIP. Substrate finds the knowledge in your channels and extracts it — every claim traces back to the channel and thread.

Structured Sources

APIs, experiment logs, K8s manifests, monitoring configs, databases. Ingest one claim at a time or millions in batch.
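The batch path is mechanical: map each structured record to a claim with its provenance attached. A toy sketch of that mapping (the record shape and field names here are invented for illustration, not Substrate's ingest format):

```python
# Toy model: structured records from a config system, mapped to claims.
records = [
    {"service": "api-gateway", "depends_on": "redis", "manifest": "k8s-manifest-v2.3"},
    {"service": "billing", "depends_on": "postgres", "manifest": "k8s-manifest-v2.3"},
]

def to_claims(records):
    return [
        {
            "subject": (r["service"], "service"),
            "predicate": "depends_on",
            "object": (r["depends_on"], "service"),
            "provenance": {"source_type": "config_management",
                           "source_id": r["manifest"]},
        }
        for r in records
    ]

batch = to_claims(records)
print(len(batch))  # → 2
```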

Not a Replacement — a New Layer

You don't replace your existing databases with Substrate. You use Substrate for the knowledge that doesn't fit in a traditional database — the stuff that lives in conversations, documents, team discussions, and expert heads.

A knowledge graph is something Substrate produces. The graph is derived from claims. If you retract every claim, the graph is empty. The claims are the source of truth.

Provenance Is Structural

Not an optional metadata field. The engine rejects writes without a source. You always know where a fact came from.
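What "structural" means in practice, as a toy sketch (illustrative, not Substrate's actual validation code): a write with no source is not stored with a null, it is refused.

```python
# Toy model: provenance is a precondition of the write, not metadata.
def ingest(subject, predicate, obj, provenance=None):
    if not provenance or "source_id" not in provenance:
        raise ValueError("every claim must carry a source")
    return {"fact": (subject, predicate, obj), "provenance": provenance}

try:
    ingest("BRCA1", "associated_with", "Breast Cancer")
except ValueError as e:
    print(e)  # → every claim must carry a source
```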

Zero Infrastructure

pip install and go. Single-file database, no server, no config. Like SQLite, but for knowledge.

Works Offline

Heuristic extraction needs no API keys. Add LLM providers when you want deeper analysis, but the core engine works entirely locally.

Grows With You

Start with a few conversations. Scale to millions of claims with the Rust backend. Same API, same query interface.

Built For

Research

Scientific Teams

Track findings across papers, discussions, and experiments. When a paper is retracted, Substrate knows what downstream conclusions are affected.

Operations

Infrastructure Teams

Map service dependencies from configs and incident chats. "What breaks if Redis goes down?" — answered with every source that says so.
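That blast-radius question reduces to a reverse walk over depends_on claims. A toy sketch of the traversal (the service names are invented; this is not Substrate's query engine):

```python
# Toy model: depends_on claims as (subject, predicate, object) triples.
claims = [
    ("api-gateway", "depends_on", "redis"),
    ("checkout", "depends_on", "api-gateway"),
    ("billing", "depends_on", "postgres"),
]

def blast_radius(target):
    # Walk reverse dependencies transitively: who depends on the target,
    # who depends on them, and so on.
    affected, frontier = set(), {target}
    while frontier:
        frontier = {s for s, p, o in claims if o in frontier} - affected
        affected |= frontier
    return affected

print(sorted(blast_radius("redis")))  # → ['api-gateway', 'checkout']
```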

ML/AI

ML Teams

Track what was tried, what worked, and why. Every experiment result carries its provenance — queryable, comparable, and retractable.

Get Started

$ pip install substrate-db
import substrate

db = substrate.quickstart("team.db", vocabs=["devops"])

# Ingest a Slack export — claims extracted automatically
db.ingest_slack("slack_export.zip", extraction="heuristic")

# What do we know about api-gateway?
frame = db.query("api-gateway", depth=2)
print(frame.narrative)

# Where did we learn this?
for rel in frame.direct_relationships:
    print(f"  {rel.target.name} ({rel.provenance_count} sources, conf={rel.confidence:.2f})")

Full quick start guide →