Chapter 12: Writing Your Own Backend
The eight-method contract is a complete specification. If you can answer each of the eight questions for a given graph store, you can write a BFS-QL backend for it, and everything above that layer -- traversal, filtering, caching, the six-tool MCP interface -- comes for free.
What "Correct" Means for Each Method
search_entities(query): Return a ranked list of EntityStub records
whose names or aliases match the query string. "Ranked" means most-likely
matches first. "Match" is implementation-defined: substring, vector
similarity, full-text score, or exact match are all valid. Return a short
list, 10-20 candidates at most. Do not filter by entity type here -- that is the caller's
job. Return an empty list, not an error, if nothing matches.
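As a rough illustration of "ranked" and "empty list, not an error", here is a minimal in-memory sketch. The `EntityStub` dataclass, the `_INDEX` dictionary, and the `rank_matches` helper are hypothetical stand-ins invented for this example, and the exact/prefix/substring ordering is only one of many valid ranking schemes. A real backend's search_entities could delegate to something like it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityStub:
    # Hypothetical stand-in for the BFS-QL EntityStub record.
    id: str
    entity_type: str


# Toy index (made-up data): entity ID -> (entity type, names and aliases).
_INDEX = {
    "drug:1": ("Drug", ["aspirin", "acetylsalicylic acid"]),
    "drug:2": ("Drug", ["aspirin-caffeine combination"]),
    "gene:7": ("Gene", ["PTGS2", "COX-2"]),
}


def rank_matches(query: str, limit: int = 10) -> list[EntityStub]:
    """Rank exact matches first, then prefix matches, then substring matches."""
    q = query.strip().lower()
    if not q:
        return []
    scored = []
    for entity_id, (entity_type, names) in _INDEX.items():
        best = None
        for name in names:
            n = name.lower()
            if n == q:
                rank = 0
            elif n.startswith(q):
                rank = 1
            elif q in n:
                rank = 2
            else:
                continue
            best = rank if best is None else min(best, rank)
        if best is not None:
            scored.append((best, entity_id, entity_type))
    # Lower rank first; ties are broken by ID so the order is deterministic.
    scored.sort()
    return [EntityStub(id=i, entity_type=t) for _, i, t in scored[:limit]]
```

Note that the helper never filters by entity type and returns `[]` for a query with no matches, in line with the rules above.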
edges_from(entity_id) / edges_to(entity_id): Return all outgoing
or incoming edges for the entity. "All" means all -- do not apply relevance
filters. BFS traversal needs complete topology; filtering happens at the
server layer. Return Edge records with canonical IDs for subject and
object; do not return metadata here. Raise KeyError if the entity does
not exist; return [] if it exists but has no edges.
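The distinction between a missing entity and an entity without edges can be sketched with a toy in-memory store. The `Edge` dataclass and the `InMemoryEdges` class below are hypothetical stand-ins written for this example; only the KeyError-versus-empty-list behavior is taken from the contract.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    # Hypothetical stand-in for the BFS-QL Edge record.
    subject: str
    predicate: str
    object: str


class InMemoryEdges:
    """Toy store that separates 'unknown entity' from 'entity with no edges'."""

    def __init__(self, nodes: set[str], edges: list[Edge]) -> None:
        self._nodes = set(nodes)
        self._edges = list(edges)

    def _require(self, entity_id: str) -> None:
        # Unknown entities are an error, not an empty result.
        if entity_id not in self._nodes:
            raise KeyError(entity_id)

    async def edges_from(self, entity_id: str) -> list[Edge]:
        self._require(entity_id)
        # No relevance filtering: every outgoing edge is returned.
        return [e for e in self._edges if e.subject == entity_id]

    async def edges_to(self, entity_id: str) -> list[Edge]:
        self._require(entity_id)
        return [e for e in self._edges if e.object == entity_id]


store = InMemoryEdges(
    nodes={"a", "b", "c"},
    edges=[Edge("a", "treats", "b")],
)
```

Here `store.edges_from("c")` yields `[]` because `c` exists but is isolated, while `store.edges_from("missing")` raises KeyError.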
get_node(entity_id): Return a Node record with the entity's ID
and type. This is the identity call, not the metadata call. It should be fast.
Raise KeyError if the entity does not exist or is inaccessible.
metadata_for_node(entity_id): Return a dict of all available
metadata for the entity. Keys and types are backend-defined. Include
everything: names, synonyms, descriptions, external links, confidence
scores. The server passes this dict to the LLM as-is; the LLM decides
what is relevant. Do not omit fields to save space -- the topology mode
handles that at the server layer.
metadata_for_edge(edge): Return a dict of all available edge metadata.
Include provenance: text spans, source documents, extraction confidence,
creation timestamps. The server strips verbose provenance fields from BFS
results (returning them only through describe_entity) but the backend
should return everything and let the server decide what to expose.
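The division of labor described above might look roughly like this on the server side. This is a sketch only: the `VERBOSE_KEYS` set and the `strip_verbose` helper are assumptions made for illustration, since the text does not say which keys the server treats as verbose or how it removes them.

```python
from typing import Any

# Assumed key names for "verbose" provenance; the real server's list is not
# given in the text.
VERBOSE_KEYS = frozenset({"text_spans", "source_documents"})


def strip_verbose(metadata: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the edge metadata without the verbose provenance keys."""
    return {k: v for k, v in metadata.items() if k not in VERBOSE_KEYS}


# What a backend might return from metadata_for_edge (toy values).
full_metadata = {
    "confidence": 0.92,
    "created_at": "2024-01-01T00:00:00Z",
    "text_spans": ["an example sentence supporting the edge"],
    "source_documents": ["doc-1"],
}

# What a BFS result might carry after the server trims it.
bfs_view = strip_verbose(full_metadata)
```

The backend's dict is left untouched, so a later describe_entity call can still expose every field.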
entity_types() / predicates(): Return complete, stable lists. The
server caches these indefinitely; they must not change during a session.
Return them in a consistent order (alphabetical is conventional). Return
an empty list if the graph has no schema (though this makes describe_schema
useless and should be avoided).
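One way to keep both lists stable for the life of a session is to load the vocabulary once and serve sorted copies from then on. The `SchemaSnapshot` class and the `fake_loader` coroutine below are hypothetical names for this sketch, which assumes the schema arrives as a dict with `entity_types` and `predicates` keys.

```python
import asyncio


class SchemaSnapshot:
    """Load the vocabulary once and serve the same sorted lists afterwards."""

    def __init__(self, loader) -> None:
        # loader: hypothetical async callable returning the raw schema dict.
        self._loader = loader
        self._schema = None

    async def _load(self) -> dict:
        if self._schema is None:
            raw = await self._loader()
            # Deduplicate and sort so the order never depends on the store.
            self._schema = {
                "entity_types": sorted(set(raw.get("entity_types", []))),
                "predicates": sorted(set(raw.get("predicates", []))),
            }
        return self._schema

    async def entity_types(self) -> list[str]:
        # Return a copy so callers cannot mutate the snapshot.
        return list((await self._load())["entity_types"])

    async def predicates(self) -> list[str]:
        return list((await self._load())["predicates"])


loader_calls = []


async def fake_loader() -> dict:
    # Stand-in for a real schema query; records how often it is invoked.
    loader_calls.append(1)
    return {"entity_types": ["Gene", "Drug", "Gene"], "predicates": ["treats"]}
```

Because the snapshot is taken on first use, later changes in the underlying store do not leak into a running session.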
A Worked Example: JSON-LD REST API Backend
To make the contract concrete, consider a JSON-LD REST API as a backend.
The API exposes entities at /entities/{id} and their relationships at
/entities/{id}/relations. A schema endpoint at /schema returns the
vocabulary.
```python
from typing import Any
from urllib.parse import urlencode

import aiohttp

# GraphDbInterface, Node, Edge, and EntityStub come from the BFS-QL package;
# their import line is omitted here.


class JsonLdBackend(GraphDbInterface):
    def __init__(self, base_url: str) -> None:
        self._base = base_url.rstrip("/")
        # Session management is out of scope: the caller must assign an
        # aiohttp.ClientSession before the first request.
        self._session: aiohttp.ClientSession | None = None

    async def _get(self, path: str) -> dict:
        url = f"{self._base}{path}"
        async with self._session.get(url) as resp:
            # JSON-LD responses may be served as application/ld+json, so
            # skip aiohttp's content-type check.
            return await resp.json(content_type=None)

    async def get_node(self, entity_id: str) -> Node:
        data = await self._get(f"/entities/{entity_id}")
        return Node(id=data["id"], entity_type=data["@type"])

    async def metadata_for_node(
        self, entity_id: str
    ) -> dict[str, Any]:
        data = await self._get(f"/entities/{entity_id}")
        # Everything except the identity and JSON-LD bookkeeping keys.
        return {
            k: v for k, v in data.items()
            if k not in ("id", "@type", "@context")
        }

    async def edges_from(self, entity_id: str) -> list[Edge]:
        path = f"/entities/{entity_id}/relations?direction=out"
        data = await self._get(path)
        return [
            Edge(
                subject=r["subject"],
                predicate=r["predicate"],
                object=r["object"],
            ) for r in data["relations"]
        ]

    async def edges_to(self, entity_id: str) -> list[Edge]:
        path = f"/entities/{entity_id}/relations?direction=in"
        data = await self._get(path)
        return [
            Edge(
                subject=r["subject"],
                predicate=r["predicate"],
                object=r["object"],
            ) for r in data["relations"]
        ]

    async def search_entities(
        self, query: str, node_types: list[str] | None = None
    ) -> list[EntityStub]:
        # urlencode escapes spaces and reserved characters in the query.
        params = {"q": query, "limit": 10}
        if node_types:
            params["types"] = ",".join(node_types)
        data = await self._get(f"/entities?{urlencode(params)}")
        return [EntityStub(id=e["id"], entity_type=e["@type"])
                for e in data["results"]]

    async def metadata_for_edge(
        self, edge: Edge
    ) -> dict[str, Any]:
        query = urlencode({
            "subject": edge.subject,
            "predicate": edge.predicate,
            "object": edge.object,
        })
        data = await self._get(f"/relations?{query}")
        return data.get("metadata", {})

    async def entity_types(self) -> list[str]:
        data = await self._get("/schema")
        return sorted(data["entity_types"])

    async def predicates(self) -> list[str]:
        data = await self._get("/schema")
        return sorted(data["predicates"])
```
This is approximately 60 lines. It is incomplete -- there is no session
management, no error handling for missing entities, no JSON-LD context
resolution. But it illustrates the contract. Once these eight methods work
correctly, the backend can be passed to create_server() and immediately
served through the full BFS-QL interface: six tools, stub/full filtering,
multi-seed BFS, topology mode, LRU caching. None of that is in the backend.
All of it comes for free.
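Of the gaps listed above, error handling for missing entities is the one the contract is most explicit about: a missing or inaccessible entity must surface as KeyError. A minimal sketch of that translation follows. The `check_entity_response` helper is hypothetical, and the status codes assume the example API signals these cases with 401, 403, and 404, which the text does not specify.

```python
def check_entity_response(status: int, entity_id: str) -> None:
    """Translate an HTTP status code into the exception the contract expects."""
    if status in (401, 403, 404):
        # Assumed mapping: missing and inaccessible entities both become
        # KeyError, as the get_node contract requires.
        raise KeyError(entity_id)
    if status >= 400:
        # Anything else is a genuine failure, not a missing entity.
        raise RuntimeError(f"unexpected HTTP {status} for {entity_id}")
```

A helper like `_get` could call this with the response's status code before decoding the body, so that every entity-scoped method inherits the same behavior.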
The Bar Is Low; the Payoff Is Immediate
The eight-method interface is deliberately small. Its purpose is not to constrain what backends can do -- they can expose arbitrary metadata, use any storage technology, call any external service. Its purpose is to define the minimum surface that the BFS-QL server needs to function.
A backend that correctly implements all eight methods gets, automatically:
- BFS traversal to any depth with concurrency across the frontier
- Stub/full node and edge filtering based on the caller's node_types and predicates parameters
- Topology mode: pure structural skeleton with no metadata
- Multi-seed union: BFS from multiple seeds simultaneously
- LRU caching at the primitive level: no repeated round-trips for the same entity or edge within a session
- The full six-tool MCP interface
- Schema injection: valid node_types and predicates injected into the bfs_query tool description when the schema is small enough
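To give a feel for the first and fourth items, here is a simplified sketch of a multi-seed, level-by-level BFS that expands each frontier concurrently. It is not the BFS-QL server's implementation: the `bfs` function, the `NeighborFn` alias, and the toy `GRAPH` are invented for this example, and a real server would build its neighbor lookup on top of edges_from and edges_to.

```python
import asyncio
from collections.abc import Awaitable, Callable

# Hypothetical shape: an async function mapping an entity ID to neighbor IDs.
NeighborFn = Callable[[str], Awaitable[list[str]]]


async def bfs(
    seeds: list[str], neighbors: NeighborFn, max_depth: int
) -> dict[str, int]:
    """Return the depth at which each reachable entity was first seen."""
    depth_of = {seed: 0 for seed in seeds}
    frontier = list(depth_of)
    for depth in range(1, max_depth + 1):
        if not frontier:
            break
        # Expand every node in the current frontier concurrently.
        results = await asyncio.gather(*(neighbors(n) for n in frontier))
        next_frontier = []
        for found in results:
            for node in found:
                if node not in depth_of:
                    depth_of[node] = depth
                    next_frontier.append(node)
        frontier = next_frontier
    return depth_of


# Toy adjacency list standing in for a backend's outgoing edges.
GRAPH = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "x": ["d"]}


async def toy_neighbors(entity_id: str) -> list[str]:
    return GRAPH.get(entity_id, [])
```

Starting from several seeds simply means the first frontier holds more than one node; the union of everything reached falls out of the shared `depth_of` map.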
The cost is eight method implementations. The payoff is a fully functional LLM graph interface against any data store you can navigate.