| Function | Description |
|---|---|
| `substrate.quickstart(path, vocabs=["bio"], curator="heuristic")` | Create a database with vocabularies and a curator in one call. |
| `substrate.open(path, backend="kuzu")` | Open or create a database. Backends: `"kuzu"`, `"rust"`, `"auto"`. |
| `db.configure_curator(model)` | Set the curator: `"heuristic"` (offline) or a provider name (see below). |
| `db.register_vocabulary(namespace, vocab)` | Register entity types, predicates, and constraints for a domain. |
| `db.close()` | Close the database. Also works as a context manager: `with substrate.open(...) as db:`. |
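Putting the setup calls together, a minimal session might look like this (a sketch — the `./kb` path and the `"bio"` vocabulary name are illustrative, not required values):

```python
import substrate

# One-call setup: database, vocabularies, and curator together.
db = substrate.quickstart("./kb", vocabs=["bio"], curator="heuristic")
db.close()

# Or open explicitly and let the context manager handle close().
with substrate.open("./kb", backend="auto") as db:
    db.configure_curator("heuristic")  # offline curator; no API key needed
```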
| Method | Description |
|---|---|
| `db.ingest(subject, predicate, object, provenance, confidence=None, payload=None)` | Add a single claim. Returns the claim ID. |
| `db.ingest_batch(claims)` | Add many claims at once. Returns a `BatchResult` with `.ingested` and `.duplicates`. |
| `db.ingest_chat(messages, conversation_id="", extraction="heuristic")` | Extract claims from a conversation. Accepts OpenAI/Anthropic message format. |
| `db.ingest_chat_file(path, platform="auto")` | Extract claims from a file: ChatGPT export ZIP, JSON conversation, or plain text. |
| `db.ingest_slack(path, channels=None, extraction="heuristic")` | Extract claims from a Slack workspace export ZIP. Optionally filter by channel name. |
| `db.ingest_text(text, source_id="")` | Extract claims from raw text. |
| `db.curate(claims)` | Run claims through the curator before ingesting. Returns stored/skipped/flagged. |
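As a sketch of the extraction and curation flow (the message text, source IDs, and entity names below are illustrative, and the exact shape of `curate()`'s stored/skipped/flagged result is not specified here):

```python
from substrate import ClaimInput

# Screen hand-built claims through the curator before storing them.
claims = [
    ClaimInput(
        subject=("api-gateway", "service"),
        predicate=("depends_on", "depends_on"),
        object=("redis", "service"),
        provenance={"source_type": "docs", "source_id": "runbook"},
    ),
]
report = db.curate(claims)

# Or extract claims straight from a conversation in OpenAI-style format.
db.ingest_chat(
    [{"role": "user", "content": "We moved the cache to redis last week."}],
    conversation_id="standup-42",
)
```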
| Mode | API Key? | When to use |
|---|---|---|
| `"heuristic"` | No | Explicit relational text ("X depends on Y"). Fast and free. |
| `"llm"` | Yes | Nuanced or implicit relationships. Deeper understanding. |
| `"smart"` | Yes | Large volumes. Heuristic first, LLM only for new content. Saves cost. |
| Method | Description |
|---|---|
| `db.query(entity, depth=2)` | Get a full picture of an entity: relationships, narrative summary, confidence scores. |
| `db.explain(entity, depth=2)` | Same as `query()` but also returns timing and candidate counts. |
| `db.claims_for(entity, predicate_type=None)` | Get raw claims for an entity. Optionally filter by predicate. |
| `db.claims_by_content_id(content_id)` | Get all claims about the same fact (a corroboration group). |
| `db.list_entities(entity_type=None)` | List all entities. Optionally filter by type. |
| `db.path_exists(entity_a, entity_b, max_depth=3)` | Check whether two entities are connected. |
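A few of these calls side by side, as a sketch (entity and predicate names are illustrative):

```python
# Raw claims about one entity, filtered to a single predicate.
deps = db.claims_for("api-gateway", predicate_type="depends_on")

# Is there any path between two entities within three hops?
connected = db.path_exists("api-gateway", "postgres", max_depth=3)

# Same data as query(), plus timing and candidate counts.
report = db.explain("redis", depth=2)
```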
| Method | Description |
|---|---|
| `db.schema()` | What entity types, predicates, and relationship patterns exist, with counts. |
| `db.stats()` | Entity count, claim count, index size. |
| `db.quality_report()` | Health metrics: single-source entities, source distribution, knowledge density. |
| `db.find_gaps(expected_patterns)` | Find entities that are missing expected relationships. |
| `db.find_bridges(top_k=20)` | Predict potential connections between currently unlinked entities. |
| `db.find_confidence_alerts()` | Find entities with reliability concerns (single-source, stale data). |
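A quick health check might chain these together; note that the shape of `expected_patterns` is an assumption here (a list of predicate names), not something this reference specifies:

```python
# What does the graph contain, and how healthy is it?
print(db.schema())
print(db.stats())
print(db.quality_report())

# Entities missing expected relationships (pattern shape assumed).
gaps = db.find_gaps(expected_patterns=["depends_on"])

# Likely-but-unrecorded links, and entities with reliability concerns.
bridges = db.find_bridges(top_k=10)
alerts = db.find_confidence_alerts()
```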
| Method | Description |
|---|---|
| `db.retract(source_id, reason)` | Mark all claims from a source as retracted. |
| `db.retract_cascade(source_id, reason)` | Retract a source and mark anything that depended on it as degraded. |
| `db.trace_downstream(claim_id)` | See what other claims depend on a specific claim. |
| `db.at(timestamp)` | Query the knowledge base as it was at a specific point in time. |
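A sketch of the retraction and time-travel flow (the source ID, reason text, and ISO timestamp format are illustrative assumptions):

```python
# A source turned out to be wrong: retract everything it produced
# and mark downstream claims as degraded.
db.retract_cascade("k8s", reason="cluster config export was stale")

# Inspect what depends on one claim before retracting its source.
claim_id = db.ingest(
    subject=("api-gateway", "service"),
    predicate=("depends_on", "depends_on"),
    object=("redis", "service"),
    provenance={"source_type": "docs", "source_id": "runbook"},
)
affected = db.trace_downstream(claim_id)

# Ask what the knowledge base believed at an earlier point in time.
past = db.at("2024-01-01T00:00:00Z")
```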
| Method | Description |
|---|---|
| `db.ingest_inquiry(question, subject, object, predicate_hint="")` | Register a question you want answered. |
| `db.open_inquiries()` | List all unanswered questions. |
| `db.check_inquiry_matches(subject_id=None, object_id=None)` | Check if new claims match any open questions. |
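The inquiry workflow might look like this sketch; the tuple form for `subject` and `object` is assumed to match `ingest()`'s, which this reference does not state explicitly:

```python
# Register a question the knowledge base cannot answer yet.
db.ingest_inquiry(
    "Does the api-gateway depend on redis?",
    subject=("api-gateway", "service"),
    object=("redis", "service"),
    predicate_hint="depends_on",
)

# Later: review what is still open, and whether new claims resolved it.
pending = db.open_inquiries()
matches = db.check_inquiry_matches(subject_id="api-gateway")
```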
| Method | Description |
|---|---|
| `db.snapshot(dest_path)` | Copy the database to a backup directory. |
| `SubstrateDB.restore(src_path, dest_path)` | Restore a database from a snapshot. |
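A backup-and-restore sketch (the paths are illustrative, and importing `SubstrateDB` from the top-level `substrate` package is an assumption):

```python
from substrate import SubstrateDB

# Copy the live database out to a backup directory, then close it.
db.snapshot("./backups/kb-2024-06-01")
db.close()

# Later, rebuild a fresh database directory from that snapshot.
SubstrateDB.restore("./backups/kb-2024-06-01", "./kb-restored")
```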
Set the environment variable for your provider, then configure:
| Provider | Environment Variable | Configure |
|---|---|---|
| Groq | `GROQ_API_KEY` | `db.configure_curator("groq")` |
| OpenAI | `OPENAI_API_KEY` | `db.configure_curator("openai")` |
| Anthropic | `ANTHROPIC_API_KEY` | `db.configure_curator("anthropic")` |
| DeepSeek | `DEEPSEEK_API_KEY` | `db.configure_curator("deepseek")` |
| Grok | `GROK_API_KEY` | `db.configure_curator("grok")` |
| OpenRouter | `OPENROUTER_API_KEY` | `db.configure_curator("openrouter")` |
| GLM | `GLM_API_KEY` | `db.configure_curator("glm")` |
No API key? Use "heuristic" mode — it works entirely offline.
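For example, with Groq (the key value is a placeholder — in practice, export the variable in your shell rather than setting it in code):

```python
import os

if os.environ.get("GROQ_API_KEY"):
    # The provider reads its key from the environment variable.
    db.configure_curator("groq")
else:
    # Fall back to the offline curator when no key is available.
    db.configure_curator("heuristic")
```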
```python
db.ingest(
    subject=("name", "type"),             # e.g. ("api-gateway", "service")
    predicate=("relationship", "class"),  # e.g. ("depends_on", "depends_on")
    object=("name", "type"),              # e.g. ("redis", "service")
    provenance={
        "source_type": "...",  # What kind of source
        "source_id": "...",    # Identifies the specific source
    },
    confidence=0.9,  # 0.0 to 1.0 (optional)
    payload={...},   # Any structured data (optional)
)
```
```python
frame = db.query("redis")
frame.focal_entity          # The entity you queried
frame.claim_count           # Number of claims about it
frame.direct_relationships  # Connected entities with predicates and confidence
frame.narrative             # Human-readable summary
```
```python
from substrate import ClaimInput

claims = [
    ClaimInput(
        subject=("api-gateway", "service"),
        predicate=("depends_on", "depends_on"),
        object=("redis", "service"),
        provenance={"source_type": "config_management", "source_id": "k8s"},
    ),
    # ... more claims
]

result = db.ingest_batch(claims)
```