## Setup

| Function | Description |
| --- | --- |
| `substrate.quickstart(path, vocabs=["bio"], curator="heuristic")` | Create a database with vocabularies and a curator in one call. |
| `substrate.open(path, backend="kuzu")` | Open or create a database. Backends: `"kuzu"`, `"rust"`, `"auto"`. |
| `db.configure_curator(model)` | Set the curator: `"heuristic"` (offline) or a provider name (see below). |
| `db.register_vocabulary(namespace, vocab)` | Register entity types, predicates, and constraints for a domain. |
| `db.close()` | Close the database. Also works as a context manager: `with substrate.open(...) as db:` |

## Adding Claims

| Method | Description |
| --- | --- |
| `db.ingest(subject, predicate, object, provenance, confidence=None, payload=None)` | Add a single claim. Returns the claim ID. |
| `db.ingest_batch(claims)` | Add many claims at once. Returns a `BatchResult` with `.ingested` and `.duplicates`. |
| `db.ingest_chat(messages, conversation_id="", extraction="heuristic")` | Extract claims from a conversation. Accepts the OpenAI/Anthropic message format. |
| `db.ingest_chat_file(path, platform="auto")` | Extract claims from a file: ChatGPT export ZIP, JSON conversation, or plain text. |
| `db.ingest_slack(path, channels=None, extraction="heuristic")` | Extract claims from a Slack workspace export ZIP. Optionally filter by channel name. |
| `db.ingest_text(text, source_id="")` | Extract claims from raw text. |
| `db.curate(claims)` | Run claims through the curator before ingesting. Returns stored/skipped/flagged. |
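As a minimal sketch, the `messages` argument to `ingest_chat()` can be a list of OpenAI-style `role`/`content` dicts; the conversation text below is invented, and the commented call assumes an open `db` handle:

```python
# OpenAI-style message dicts, the shape ingest_chat() accepts.
# The conversation content here is purely illustrative.
messages = [
    {"role": "user", "content": "Does api-gateway still depend on redis?"},
    {"role": "assistant", "content": "Yes, api-gateway depends on redis for caching."},
]

# With an open database, this would extract claims from the exchange:
# claim_ids = db.ingest_chat(messages, conversation_id="ops-checkin")
```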

## Extraction modes

| Mode | API key? | When to use |
| --- | --- | --- |
| `"heuristic"` | No | Explicit relational text ("X depends on Y"). Fast and free. |
| `"llm"` | Yes | Nuanced or implicit relationships. Deeper understanding. |
| `"smart"` | Yes | Large volumes. Heuristic first, LLM only for new content. Saves cost. |
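One way to route between these modes might look like the helper below; the decision rule (and the choice of `OPENAI_API_KEY` as the probe) is purely illustrative, not library behavior:

```python
import os

def pick_extraction_mode(large_volume: bool) -> str:
    """Illustrative mode selection: prefer LLM-backed extraction when an
    API key is available, use "smart" for large volumes to save cost,
    otherwise stay fully offline with "heuristic"."""
    has_key = bool(os.environ.get("OPENAI_API_KEY"))  # assumed probe, any provider key works
    if not has_key:
        return "heuristic"
    return "smart" if large_volume else "llm"
```

The string returned here would then be passed as the `extraction=` argument to `ingest_chat()`, `ingest_slack()`, and friends.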

## Querying

| Method | Description |
| --- | --- |
| `db.query(entity, depth=2)` | Get a full picture of an entity: relationships, narrative summary, confidence scores. |
| `db.explain(entity, depth=2)` | Same as `query()` but also returns timing and candidate counts. |
| `db.claims_for(entity, predicate_type=None)` | Get raw claims for an entity. Optionally filter by predicate. |
| `db.claims_by_content_id(content_id)` | Get all claims about the same fact (corroboration group). |
| `db.list_entities(entity_type=None)` | List all entities. Optionally filter by type. |
| `db.path_exists(entity_a, entity_b, max_depth=3)` | Check if two entities are connected. |
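To show the read pattern, here is a stand-in object shaped like the frame `db.query()` returns; the field names come from the "Query result" section of this document, but the values and the keys inside `direct_relationships` are invented for illustration:

```python
from types import SimpleNamespace

# Stand-in for a query frame. Field names match the documented frame;
# the relationship-entry keys ("entity", "predicate", "confidence")
# are an assumption, not a documented contract.
frame = SimpleNamespace(
    focal_entity="redis",
    claim_count=3,
    direct_relationships=[
        {"entity": "api-gateway", "predicate": "depends_on", "confidence": 0.9},
    ],
    narrative="redis is depended on by api-gateway.",
)

# Typical read pattern over a frame:
for rel in frame.direct_relationships:
    print(f"{rel['entity']} -{rel['predicate']}-> {frame.focal_entity}")
```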

## Understanding Your Knowledge Base

| Method | Description |
| --- | --- |
| `db.schema()` | What entity types, predicates, and relationship patterns exist, with counts. |
| `db.stats()` | Entity count, claim count, index size. |
| `db.quality_report()` | Health metrics: single-source entities, source distribution, knowledge density. |
| `db.find_gaps(expected_patterns)` | Find entities that are missing expected relationships. |
| `db.find_bridges(top_k=20)` | Predict potential connections between currently-unlinked entities. |
| `db.find_confidence_alerts()` | Find entities with reliability concerns (single-source, stale data). |

## Provenance and Trust

| Method | Description |
| --- | --- |
| `db.retract(source_id, reason)` | Mark all claims from a source as retracted. |
| `db.retract_cascade(source_id, reason)` | Retract a source and mark anything that depended on it as degraded. |
| `db.trace_downstream(claim_id)` | See what other claims depend on a specific claim. |
| `db.at(timestamp)` | Query the knowledge base as it was at a specific point in time. |
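A sketch of point-in-time querying; whether `db.at()` takes an ISO-8601 string or a `datetime` object, and whether it returns a queryable view, are assumptions to verify against your version:

```python
from datetime import datetime, timezone

# Build a point-in-time marker. An ISO-8601 UTC string is assumed here;
# the exact type db.at() expects is not specified in this reference.
as_of = datetime(2024, 6, 1, tzinfo=timezone.utc).isoformat()

# With an open database, this would answer queries against the
# knowledge base as it existed at that moment (assumed behavior):
# past = db.at(as_of)
# frame = past.query("redis")
```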

## Research Questions

| Method | Description |
| --- | --- |
| `db.ingest_inquiry(question, subject, object, predicate_hint="")` | Register a question you want answered. |
| `db.open_inquiries()` | List all unanswered questions. |
| `db.check_inquiry_matches(subject_id=None, object_id=None)` | Check if new claims match any open questions. |

## Backup and Restore

| Method | Description |
| --- | --- |
| `db.snapshot(dest_path)` | Copy the database to a backup directory. |
| `SubstrateDB.restore(src_path, dest_path)` | Restore a database from a snapshot. |

## LLM Providers

Set the environment variable for your provider, then configure:

| Provider | Environment variable | Configure |
| --- | --- | --- |
| Groq | `GROQ_API_KEY` | `db.configure_curator("groq")` |
| OpenAI | `OPENAI_API_KEY` | `db.configure_curator("openai")` |
| Anthropic | `ANTHROPIC_API_KEY` | `db.configure_curator("anthropic")` |
| DeepSeek | `DEEPSEEK_API_KEY` | `db.configure_curator("deepseek")` |
| Grok | `GROK_API_KEY` | `db.configure_curator("grok")` |
| OpenRouter | `OPENROUTER_API_KEY` | `db.configure_curator("openrouter")` |
| GLM | `GLM_API_KEY` | `db.configure_curator("glm")` |

No API key? Use "heuristic" mode — it works entirely offline.
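For example, wiring up Groq might look like this; setting the key in-process is for illustration only (you would normally export it in your shell), the key value is a placeholder, and the commented call assumes an open `db`:

```python
import os

# Illustration only: in practice, export the key in your shell
# rather than hard-coding it in source.
os.environ["GROQ_API_KEY"] = "your-api-key"

# With an open database:
# db.configure_curator("groq")
```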

## Data Types

### Adding a claim

```python
db.ingest(
    subject=("name", "type"),             # e.g. ("api-gateway", "service")
    predicate=("relationship", "class"),  # e.g. ("depends_on", "depends_on")
    object=("name", "type"),              # e.g. ("redis", "service")
    provenance={
        "source_type": "...",             # What kind of source
        "source_id": "...",               # Identifies the specific source
    },
    confidence=0.9,                       # 0.0 to 1.0 (optional)
    payload={...},                        # Any structured data (optional)
)
```

### Query result

```python
frame = db.query("redis")
frame.focal_entity           # The entity you queried
frame.claim_count            # Number of claims about it
frame.direct_relationships   # Connected entities with predicates and confidence
frame.narrative              # Human-readable summary
```

### Batch input

```python
from substrate import ClaimInput

claims = [
    ClaimInput(
        subject=("api-gateway", "service"),
        predicate=("depends_on", "depends_on"),
        object=("redis", "service"),
        provenance={"source_type": "config_management", "source_id": "k8s"},
    ),
    # ... more claims
]
result = db.ingest_batch(claims)
```