## Setup

| Function | Description |
| --- | --- |
| `substrate.quickstart(path, vocabs=["bio"], curator="heuristic")` | Create a database with vocabularies and a curator in one call. |
| `substrate.open(path, backend="kuzu")` | Open or create a database. Backends: `"kuzu"`, `"rust"`, `"auto"`. |
| `db.configure_curator(model)` | Set the curator: `"heuristic"` (offline) or a provider name (see below). |
| `db.register_vocabulary(namespace, vocab)` | Register entity types, predicates, and constraints for a domain. |
| `db.close()` | Close the database. Also works as a context manager: `with substrate.open(...) as db:` |

## Adding Claims

| Method | Description |
| --- | --- |
| `db.ingest(subject, predicate, object, provenance, confidence=None, payload=None)` | Add a single claim. Returns the claim ID. |
| `db.ingest_batch(claims)` | Add many claims at once. Returns a `BatchResult` with `.ingested` and `.duplicates`. |
| `db.ingest_chat(messages, conversation_id="", extraction="heuristic")` | Extract claims from a conversation. Accepts the OpenAI/Anthropic message format. |
| `db.ingest_chat_file(path, platform="auto")` | Extract claims from a file: ChatGPT export ZIP, JSON conversation, or plain text. |
| `db.ingest_slack(path, channels=None, extraction="heuristic")` | Extract claims from a Slack workspace export ZIP. Optionally filter by channel name. |
| `db.ingest_text(text, source_id="")` | Extract claims from raw text. |
| `db.curate(claims)` | Run claims through the curator before ingesting. Returns stored/skipped/flagged. |
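As a minimal sketch, the `messages` argument to `ingest_chat()` can be a list of OpenAI-style `role`/`content` dicts; the conversation text below is invented, and the commented call assumes an open `db` handle:

```python
# OpenAI-style message dicts, the shape ingest_chat() accepts.
# The conversation content here is purely illustrative.
messages = [
    {"role": "user", "content": "Does api-gateway still depend on redis?"},
    {"role": "assistant", "content": "Yes, api-gateway depends on redis for caching."},
]

# With an open database, this would extract claims from the exchange:
# claim_ids = db.ingest_chat(messages, conversation_id="ops-checkin")
```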

## Extraction modes

| Mode | API key? | When to use |
| --- | --- | --- |
| `"heuristic"` | No | Explicit relational text ("X depends on Y"). Fast and free. |
| `"llm"` | Yes | Nuanced or implicit relationships. Deeper understanding. |
| `"smart"` | Yes | Large volumes. Heuristic first, LLM only for new content. Saves cost. |
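One way to route between these modes might look like the helper below; the decision rule (and the choice of `OPENAI_API_KEY` as the probe) is purely illustrative, not library behavior:

```python
import os

def pick_extraction_mode(large_volume: bool) -> str:
    """Illustrative mode selection: prefer LLM-backed extraction when an
    API key is available, use "smart" for large volumes to save cost,
    otherwise stay fully offline with "heuristic"."""
    has_key = bool(os.environ.get("OPENAI_API_KEY"))  # assumed probe, any provider key works
    if not has_key:
        return "heuristic"
    return "smart" if large_volume else "llm"
```

The string returned here would then be passed as the `extraction=` argument to `ingest_chat()`, `ingest_slack()`, and friends.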

## Querying

| Method | Description |
| --- | --- |
| `db.query(entity, depth=2)` | Get a full picture of an entity: relationships, narrative summary, confidence scores. |
| `db.explain(entity, depth=2)` | Same as `query()` but also returns timing and candidate counts. |
| `db.claims_for(entity, predicate_type=None)` | Get raw claims for an entity. Optionally filter by predicate. |
| `db.claims_by_content_id(content_id)` | Get all claims about the same fact (corroboration group). |
| `db.list_entities(entity_type=None)` | List all entities. Optionally filter by type. |
| `db.path_exists(entity_a, entity_b, max_depth=3)` | Check if two entities are connected. |
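To show the read pattern, here is a stand-in object shaped like the frame `db.query()` returns; the field names come from the "Query result" section of this document, but the values and the keys inside `direct_relationships` are invented for illustration:

```python
from types import SimpleNamespace

# Stand-in for a query frame. Field names match the documented frame;
# the relationship-entry keys ("entity", "predicate", "confidence")
# are an assumption, not a documented contract.
frame = SimpleNamespace(
    focal_entity="redis",
    claim_count=3,
    direct_relationships=[
        {"entity": "api-gateway", "predicate": "depends_on", "confidence": 0.9},
    ],
    narrative="redis is depended on by api-gateway.",
)

# Typical read pattern over a frame:
for rel in frame.direct_relationships:
    print(f"{rel['entity']} -{rel['predicate']}-> {frame.focal_entity}")
```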

## Understanding Your Knowledge Base

| Method | Description |
| --- | --- |
| `db.schema()` | What entity types, predicates, and relationship patterns exist, with counts. |
| `db.stats()` | Entity count, claim count, index size. |
| `db.quality_report()` | Health metrics: single-source entities, source distribution, knowledge density. |
| `db.find_gaps(expected_patterns)` | Find entities that are missing expected relationships. |
| `db.find_bridges(top_k=20)` | Predict potential connections between currently-unlinked entities. |
| `db.find_confidence_alerts()` | Find entities with reliability concerns (single-source, stale data). |

## Provenance and Trust

| Method | Description |
| --- | --- |
| `db.retract(source_id, reason)` | Mark all claims from a source as retracted. |
| `db.retract_cascade(source_id, reason)` | Retract a source and mark anything that depended on it as degraded. |
| `db.trace_downstream(claim_id)` | See what other claims depend on a specific claim. |
| `db.at(timestamp)` | Query the knowledge base as it was at a specific point in time. |
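A sketch of point-in-time querying; whether `db.at()` takes an ISO-8601 string or a `datetime` object, and whether it returns a queryable view, are assumptions to verify against your version:

```python
from datetime import datetime, timezone

# Build a point-in-time marker. An ISO-8601 UTC string is assumed here;
# the exact type db.at() expects is not specified in this reference.
as_of = datetime(2024, 6, 1, tzinfo=timezone.utc).isoformat()

# With an open database, this would answer queries against the
# knowledge base as it existed at that moment (assumed behavior):
# past = db.at(as_of)
# frame = past.query("redis")
```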

## Research Questions

| Method | Description |
| --- | --- |
| `db.ingest_inquiry(question, subject, object, predicate_hint="")` | Register a question you want answered. |
| `db.open_inquiries()` | List all unanswered questions. |
| `db.check_inquiry_matches(subject_id=None, object_id=None)` | Check if new claims match any open questions. |

## Backup and Restore

| Method | Description |
| --- | --- |
| `db.snapshot(dest_path)` | Copy the database to a backup directory. |
| `SubstrateDB.restore(src_path, dest_path)` | Restore a database from a snapshot. |

## LLM Providers

Set the environment variable for your provider, then configure:

| Provider | Environment variable | Configure |
| --- | --- | --- |
| Groq | `GROQ_API_KEY` | `db.configure_curator("groq")` |
| OpenAI | `OPENAI_API_KEY` | `db.configure_curator("openai")` |
| Anthropic | `ANTHROPIC_API_KEY` | `db.configure_curator("anthropic")` |
| DeepSeek | `DEEPSEEK_API_KEY` | `db.configure_curator("deepseek")` |
| Grok | `GROK_API_KEY` | `db.configure_curator("grok")` |
| OpenRouter | `OPENROUTER_API_KEY` | `db.configure_curator("openrouter")` |
| GLM | `GLM_API_KEY` | `db.configure_curator("glm")` |

No API key? Use "heuristic" mode — it works entirely offline.
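For example, wiring up Groq might look like this; setting the key in-process is for illustration only (you would normally export it in your shell), the key value is a placeholder, and the commented call assumes an open `db`:

```python
import os

# Illustration only: in practice, export the key in your shell
# rather than hard-coding it in source.
os.environ["GROQ_API_KEY"] = "your-api-key"

# With an open database:
# db.configure_curator("groq")
```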

## Data Types

### Adding a claim

```python
db.ingest(
    subject=("name", "type"),             # e.g. ("api-gateway", "service")
    predicate=("relationship", "class"),  # e.g. ("depends_on", "depends_on")
    object=("name", "type"),              # e.g. ("redis", "service")
    provenance={
        "source_type": "...",             # What kind of source
        "source_id": "...",               # Identifies the specific source
    },
    confidence=0.9,                       # 0.0 to 1.0 (optional)
    payload={...},                        # Any structured data (optional)
)
```

### Query result

```python
frame = db.query("redis")
frame.focal_entity           # The entity you queried
frame.claim_count            # Number of claims about it
frame.direct_relationships   # Connected entities with predicates and confidence
frame.narrative              # Human-readable summary
```

### Batch input

```python
from substrate import ClaimInput

claims = [
    ClaimInput(
        subject=("api-gateway", "service"),
        predicate=("depends_on", "depends_on"),
        object=("redis", "service"),
        provenance={"source_type": "config_management", "source_id": "k8s"},
    ),
    # ... more claims
]
result = db.ingest_batch(claims)
```