Chapter 12: Writing Your Own Backend

The eight-method contract is a complete specification. If you can answer each of the eight questions for a given graph store, you can write a BFS-QL backend for it, and everything above that layer -- traversal, filtering, caching, the six-tool MCP interface -- comes for free.

What "Correct" Means for Each Method

search_entities(query): Return a ranked list of EntityStub records whose names or aliases match the query string. "Ranked" means most-likely matches first. "Match" is implementation-defined: substring, vector similarity, full-text score, or exact match are all valid. Cap the list at roughly 10 to 20 candidates. Do not filter by entity type here -- that is the caller's job. Return an empty list, not an error, if nothing matches.
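One way to satisfy the "ranked" requirement is sketched below. It is an illustration under stated assumptions, not the BFS-QL implementation: the EntityStub dataclass is a stand-in for the real record, INDEX is a hypothetical in-memory alias table, and the three-level scoring scheme (exact, prefix, substring) is only one of the matching strategies the contract allows. The function is synchronous for brevity; a real backend method would be async.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityStub:
    """Stand-in for the BFS-QL EntityStub; the real record may differ."""
    id: str
    entity_type: str


# Hypothetical in-memory index: entity ID -> (entity type, names and aliases).
INDEX = {
    "drug:1": ("Drug", ["aspirin", "acetylsalicylic acid"]),
    "drug:2": ("Drug", ["aspirin-caffeine combination"]),
    "gene:7": ("Gene", ["ASPN", "asporin"]),
}


def _score(query: str, name: str) -> int:
    """Exact match beats prefix match beats substring match; 0 is no match."""
    q, n = query.lower(), name.lower()
    if n == q:
        return 3
    if n.startswith(q):
        return 2
    if q in n:
        return 1
    return 0


def search_entities(query: str, limit: int = 10) -> list[EntityStub]:
    scored = []
    for entity_id, (entity_type, names) in INDEX.items():
        best = max(_score(query, name) for name in names)
        if best > 0:
            scored.append((best, entity_id, entity_type))
    # Highest score first; ties broken by ID so the order is deterministic.
    scored.sort(key=lambda item: (-item[0], item[1]))
    # No entity-type filtering here, and an empty list when nothing matches.
    return [EntityStub(id=i, entity_type=t) for _, i, t in scored[:limit]]
```

Note that a query such as "asp" returns both Drug and Gene stubs: the result set is capped and ordered, but never narrowed by type.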

edges_from(entity_id) / edges_to(entity_id): Return all outgoing or incoming edges for the entity. "All" means all -- do not apply relevance filters. BFS traversal needs complete topology; filtering happens at the server layer. Return Edge records with canonical IDs for subject and object; do not return metadata here. Raise KeyError if the entity does not exist; return [] if it exists but has no edges.
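The distinction between "unknown entity" and "known entity with no edges" can be sketched with an in-memory store. The Edge dataclass here is a stand-in whose field names follow the chapter's example, and InMemoryEdges is a hypothetical class used only for illustration.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """Stand-in for the BFS-QL Edge record."""
    subject: str
    predicate: str
    object: str


class InMemoryEdges:
    """Hypothetical store: a set of known IDs plus a flat edge list."""

    def __init__(self, nodes: set[str], edges: list[Edge]) -> None:
        self._nodes = nodes
        self._edges = edges

    async def edges_from(self, entity_id: str) -> list[Edge]:
        if entity_id not in self._nodes:
            raise KeyError(entity_id)  # the entity does not exist
        # No relevance filtering: every outgoing edge is returned.
        return [e for e in self._edges if e.subject == entity_id]

    async def edges_to(self, entity_id: str) -> list[Edge]:
        if entity_id not in self._nodes:
            raise KeyError(entity_id)
        return [e for e in self._edges if e.object == entity_id]


store = InMemoryEdges(
    nodes={"a", "b", "c"},
    edges=[Edge("a", "treats", "b"), Edge("a", "interacts_with", "b")],
)
```

Here "c" exists but is isolated, so both calls return an empty list for it, while an ID that is not in the node set raises KeyError.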

get_node(entity_id): Return a Node record with the entity's ID and type. This is the identity call, not the metadata call. It should be fast: return identity only and leave everything else to metadata_for_node. Raise KeyError if the entity does not exist or is inaccessible.

metadata_for_node(entity_id): Return a dict of all available metadata for the entity. Keys and types are backend-defined. Include everything: names, synonyms, descriptions, external links, confidence scores. The server passes this dict to the LLM as-is; the LLM decides what is relevant. Do not omit fields to save space -- the topology mode handles that at the server layer.
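The split between the identity call and the metadata call might look like the following sketch. RECORDS is a hypothetical record table and the Node dataclass is a stand-in; the field names "type", "synonyms", and "confidence" are assumptions chosen for the example.

```python
import asyncio
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Node:
    """Stand-in for the BFS-QL Node record."""
    id: str
    entity_type: str


# Hypothetical stored records, keyed by canonical entity ID.
RECORDS: dict[str, dict[str, Any]] = {
    "drug:1": {
        "id": "drug:1",
        "type": "Drug",
        "name": "aspirin",
        "synonyms": ["acetylsalicylic acid"],
        "confidence": 0.98,
    },
}


async def get_node(entity_id: str) -> Node:
    record = RECORDS[entity_id]  # dict lookup raises KeyError for unknown IDs
    return Node(id=record["id"], entity_type=record["type"])


async def metadata_for_node(entity_id: str) -> dict[str, Any]:
    record = RECORDS[entity_id]
    # Everything except the identity fields, with nothing trimmed for size.
    return {k: v for k, v in record.items() if k not in ("id", "type")}
```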

metadata_for_edge(edge): Return a dict of all available edge metadata. Include provenance: text spans, source documents, extraction confidence, creation timestamps. The server strips verbose provenance fields from BFS results (returning them only through describe_entity) but the backend should return everything and let the server decide what to expose.

entity_types() / predicates(): Return complete, stable lists. The server caches these indefinitely; they must not change during a session. Return them in a consistent order (alphabetical is conventional). Return an empty list if the graph has no schema (though this makes describe_schema useless and should be avoided).
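A simple way to guarantee stable, consistently ordered lists is to snapshot them once at construction. The StaticSchema class below is a hypothetical sketch that derives the vocabulary from data passed in by the caller; a real backend might read it from a schema endpoint or table instead.

```python
import asyncio


class StaticSchema:
    """Hypothetical helper that freezes the schema when it is built."""

    def __init__(
        self,
        node_types: dict[str, str],
        triples: list[tuple[str, str, str]],
    ) -> None:
        # Sorted tuples: alphabetical order, immune to later changes
        # in the inputs or in lists handed back to callers.
        self._entity_types = tuple(sorted(set(node_types.values())))
        self._predicates = tuple(sorted({p for _, p, _ in triples}))

    async def entity_types(self) -> list[str]:
        return list(self._entity_types)

    async def predicates(self) -> list[str]:
        return list(self._predicates)


schema = StaticSchema(
    node_types={"a": "Drug", "b": "Disease"},
    triples=[("a", "treats", "b"), ("a", "interacts_with", "b")],
)
```

Because each call returns a fresh copy of the frozen tuple, a caller that mutates the returned list cannot change what later calls see.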

A Worked Example: JSON-LD REST API Backend

To make the contract concrete, consider a JSON-LD REST API as a backend. The API exposes entities at /entities/{id} and their relationships at /entities/{id}/relations. A schema endpoint at /schema returns the vocabulary.

from typing import Any
from urllib.parse import urlencode

import aiohttp

# GraphDbInterface, Node, Edge, and EntityStub are the BFS-QL types this
# chapter refers to; their import path depends on your installation.


class JsonLdBackend(GraphDbInterface):
    def __init__(self, base_url: str) -> None:
        self._base = base_url.rstrip("/")
        self._session: aiohttp.ClientSession | None = None

    async def _get(self, path: str) -> dict:
        # Create the session on first use; nothing here ever closes it.
        if self._session is None:
            self._session = aiohttp.ClientSession()
        url = f"{self._base}{path}"
        async with self._session.get(url) as resp:
            # content_type=None skips aiohttp's check for "application/json",
            # which a server replying with "application/ld+json" would fail.
            return await resp.json(content_type=None)

    async def get_node(self, entity_id: str) -> Node:
        # Identity only: ID and type.
        data = await self._get(f"/entities/{entity_id}")
        return Node(id=data["id"], entity_type=data["@type"])

    async def metadata_for_node(
        self, entity_id: str
    ) -> dict[str, Any]:
        # Everything the API returns except the identity and context keys.
        data = await self._get(f"/entities/{entity_id}")
        return {
            k: v for k, v in data.items()
            if k not in ("id", "@type", "@context")
        }

    async def edges_from(self, entity_id: str) -> list[Edge]:
        path = f"/entities/{entity_id}/relations?direction=out"
        data = await self._get(path)
        return [
            Edge(
                subject=r["subject"],
                predicate=r["predicate"],
                object=r["object"],
            ) for r in data["relations"]
        ]

    async def edges_to(self, entity_id: str) -> list[Edge]:
        path = f"/entities/{entity_id}/relations?direction=in"
        data = await self._get(path)
        return [
            Edge(
                subject=r["subject"],
                predicate=r["predicate"],
                object=r["object"],
            ) for r in data["relations"]
        ]

    async def search_entities(
        self, query: str, node_types: list[str] | None = None
    ) -> list[EntityStub]:
        # urlencode escapes spaces, "&", and other reserved characters
        # that would otherwise corrupt the query string.
        params = {"q": query, "limit": "10"}
        if node_types:
            # Optional: forwarded to the API as a type filter when given.
            params["types"] = ",".join(node_types)
        data = await self._get(f"/entities?{urlencode(params)}")
        return [EntityStub(id=e["id"], entity_type=e["@type"])
                for e in data["results"]]

    async def metadata_for_edge(
        self, edge: Edge
    ) -> dict[str, Any]:
        params = {
            "subject": edge.subject,
            "predicate": edge.predicate,
            "object": edge.object,
        }
        data = await self._get(f"/relations?{urlencode(params)}")
        return data.get("metadata", {})

    async def entity_types(self) -> list[str]:
        data = await self._get("/schema")
        return sorted(data["entity_types"])

    async def predicates(self) -> list[str]:
        data = await self._get("/schema")
        return sorted(data["predicates"])

The class is well under 100 lines. It is incomplete -- there is no session lifecycle management, no error handling for missing entities (the contract calls for KeyError), no JSON-LD context resolution. But it illustrates the contract. Once these eight methods work correctly, the backend can be passed to create_server() and immediately served through the full BFS-QL interface: six tools, stub/full filtering, multi-seed BFS, topology mode, LRU caching. None of that is in the backend. All of it comes for free.
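The missing-entity handling could be added with a small status check, sketched below. It assumes the REST API signals an unknown entity with HTTP 404, which the chapter does not state; raise_for_status_code is a hypothetical helper name, and the choice of RuntimeError for other failures is arbitrary.

```python
def raise_for_status_code(status: int, entity_id: str) -> None:
    """Translate an HTTP status into the exceptions the contract expects."""
    if status == 404:
        # Assumption: the API uses 404 for "no such entity".
        raise KeyError(entity_id)
    if status >= 400:
        raise RuntimeError(
            f"backend API returned HTTP {status} for {entity_id!r}"
        )


# Inside a method such as get_node, the check would sit between the
# request and the JSON decode, for example:
#
#     async with session.get(url) as resp:
#         raise_for_status_code(resp.status, entity_id)
#         data = await resp.json(content_type=None)
```

Keeping the translation in one function means all the entity-level methods report a missing entity the same way.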

The Bar Is Low; the Payoff Is Immediate

The eight-method interface is deliberately small. Its purpose is not to constrain what backends can do -- they can expose arbitrary metadata, use any storage technology, call any external service. Its purpose is to define the minimum surface that the BFS-QL server needs to function.

A backend that correctly implements all eight methods gets, automatically:

  • BFS traversal to any depth with concurrency across the frontier
  • Stub/full node and edge filtering based on the caller's node_types and predicates parameters
  • Topology mode: pure structural skeleton with no metadata
  • Multi-seed union: BFS from multiple seeds simultaneously
  • LRU caching at the primitive level: no repeated round-trips for the same entity or edge within a session
  • The full six-tool MCP interface
  • Schema injection: valid node_types and predicates injected into the bfs_query tool description when the schema is small enough
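Before handing a backend to the server, it can help to run a quick check of the contract rules described earlier. The sketch below is not part of BFS-QL: check_backend and TinyBackend are hypothetical names, the dataclasses are stand-ins, and the checker covers only five of the eight methods (identity, edges, and schema).

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    id: str
    entity_type: str


@dataclass(frozen=True)
class Edge:
    subject: str
    predicate: str
    object: str


class TinyBackend:
    """Hypothetical two-node backend used only to exercise the checker."""
    _types = {"a": "Drug", "b": "Disease"}
    _edges = [Edge("a", "treats", "b")]

    def _require(self, entity_id: str) -> None:
        if entity_id not in self._types:
            raise KeyError(entity_id)

    async def get_node(self, entity_id: str) -> Node:
        self._require(entity_id)
        return Node(entity_id, self._types[entity_id])

    async def edges_from(self, entity_id: str) -> list[Edge]:
        self._require(entity_id)
        return [e for e in self._edges if e.subject == entity_id]

    async def edges_to(self, entity_id: str) -> list[Edge]:
        self._require(entity_id)
        return [e for e in self._edges if e.object == entity_id]

    async def entity_types(self) -> list[str]:
        return sorted(set(self._types.values()))

    async def predicates(self) -> list[str]:
        return sorted({e.predicate for e in self._edges})


async def check_backend(backend, known_id: str, missing_id: str) -> list[str]:
    """Return a list of contract violations; empty means the checks passed."""
    problems = []
    node = await backend.get_node(known_id)
    if node.id != known_id:
        problems.append("get_node returned a different ID")
    for name in ("get_node", "edges_from", "edges_to"):
        try:
            await getattr(backend, name)(missing_id)
            problems.append(f"{name} did not raise KeyError when missing")
        except KeyError:
            pass
    for name in ("entity_types", "predicates"):
        first = await getattr(backend, name)()
        second = await getattr(backend, name)()
        if first != second:
            problems.append(f"{name} is not stable across calls")
        if first != sorted(first):
            problems.append(f"{name} is not in alphabetical order")
    return problems
```

Running asyncio.run(check_backend(TinyBackend(), "a", "zzz")) yields an empty list for this backend; a backend that returns [] for an unknown ID instead of raising would show up as a reported problem.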

The cost is eight method implementations. The payoff is a fully functional LLM graph interface against any data store you can navigate.