Chapter 11: The Neo4j Backend

Neo4j is a property graph database, not an RDF store. The distinction matters for the implementation, though not for the BFS-QL interface. Where RDF graphs represent everything as triples of URIs and literals, property graphs attach key-value pairs directly to nodes and relationships. A node in Neo4j has zero or more labels and a set of properties. A relationship has exactly one type and a set of properties. There are no blank nodes; everything is either a node or a typed relationship.

For BFS-QL, the mapping from the property graph model to the GraphDbInterface is direct:

  • Node labels → entity_type
  • Relationship types → predicates
  • Node identity (Neo4j's internal ID or a canonical ID property) → entity ID
  • Node properties → metadata_for_node
  • Relationship properties → metadata_for_edge

The implementation requires one configuration decision: which node property holds the canonical ID. In a kgraph-derived Neo4j graph, this would be entity_id. In a general Neo4j graph, it might be id, uri, name, or something domain-specific. The backend is initialized with the canonical ID property name.
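One way to picture that configuration step is the sketch below. It is illustrative only: Neo4jBackendConfig and edges_from_query are hypothetical names, not part of BFS-QL. The one point it tries to make is that a property key cannot be supplied as a query parameter inside a node pattern, so the configured name ends up in the Cypher text and should be validated first.

```python
import re
from dataclasses import dataclass

# The property name is interpolated into Cypher text, so restrict it to a
# plain identifier before it gets anywhere near a query string.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


@dataclass(frozen=True)
class Neo4jBackendConfig:
    """Hypothetical settings object; only id_property comes from the text."""

    id_property: str = "entity_id"

    def __post_init__(self):
        if not _IDENTIFIER.fullmatch(self.id_property):
            raise ValueError(f"unsafe ID property name: {self.id_property!r}")


def edges_from_query(config: Neo4jBackendConfig) -> str:
    """Build the edges_from Cypher text for the configured ID property."""
    p = config.id_property
    return (
        f"MATCH (n {{{p}: $id}})-[r]->(m) "
        f"RETURN n.{p} AS subject, type(r) AS predicate, m.{p} AS object"
    )


# A graph that keys its nodes on `uri` instead of `entity_id`:
print(edges_from_query(Neo4jBackendConfig(id_property="uri")))
```

The entity ID itself still travels as the $id parameter; only the property name is written into the query.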

Cypher Traversal

edges_from and edges_to are natural Cypher traversals:

// edges_from(entity_id)
MATCH (n {entity_id: $id})-[r]->(m)
RETURN n.entity_id AS subject,
       type(r) AS predicate,
       m.entity_id AS object

// edges_to(entity_id)
MATCH (n)-[r]->(m {entity_id: $id})
RETURN n.entity_id AS subject,
       type(r) AS predicate,
       m.entity_id AS object

type(r) returns the relationship type as a string, which becomes the predicate. Neo4j relationship types are uppercase by convention (TREATS, INHIBITS, ASSOCIATED_WITH); BFS-QL predicates are lowercase by convention. The backend normalizes to lowercase at query time.
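The normalization step might look like the following sketch. The Edge tuple and rows_to_edges are hypothetical names chosen for illustration, and the rows are assumed to arrive as dictionaries keyed by the column aliases used in the queries above (subject, predicate, object); the actual result type depends on the driver.

```python
from typing import Iterable, NamedTuple


class Edge(NamedTuple):
    """Hypothetical edge triple; the real interface may use another type."""

    subject: str
    predicate: str
    object: str


def rows_to_edges(rows: Iterable[dict]) -> list[Edge]:
    """Convert Cypher result rows into edges with lowercase predicates."""
    return [
        Edge(row["subject"], row["predicate"].lower(), row["object"])
        for row in rows
    ]


# Made-up rows in the shape the traversal queries return.
rows = [
    {"subject": "drug:1", "predicate": "TREATS", "object": "disease:7"},
    {"subject": "drug:1", "predicate": "ASSOCIATED_WITH", "object": "gene:3"},
]
edges = rows_to_edges(rows)
print(edges[0].predicate)  # treats
```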

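search_entities in the Neo4j backend requires a full-text index. Postgres can fall back to ILIKE, and a SPARQL endpoint can use CONTAINS on labels. Cypher also has a CONTAINS operator, but it is case-sensitive, matches raw substrings rather than tokenized and ranked terms, and without a supporting index scans every candidate node, so the backend does not rely on it for name search. A full-text index must be created at graph construction time: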

CREATE FULLTEXT INDEX entity_names
  FOR (n:Entity) ON EACH [n.name, n.synonyms]

With the index in place, search_entities becomes:

CALL db.index.fulltext.queryNodes("entity_names", $query)
YIELD node, score
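// labels() order is not guaranteed, and every indexed node carries :Entity,
// so drop the generic label before picking the type.
RETURN node.entity_id AS id,
       [l IN labels(node) WHERE l <> 'Entity'][0] AS entity_type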
ORDER BY score DESC
LIMIT 10
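One practical detail: the query string passed to db.index.fulltext.queryNodes is parsed as Lucene query syntax, so characters such as `:` or `(` in user input can change the query's meaning or make it fail to parse. A minimal sketch of escaping before the call follows; escape_lucene is a hypothetical helper, and whether the backend should escape or deliberately expose Lucene syntax is a design choice this sketch does not settle.

```python
import re

# Characters that Lucene's classic query parser treats as operators.
_LUCENE_SPECIAL = re.compile(r'([\\+\-!():^\[\]"{}~*?|&/])')


def escape_lucene(text: str) -> str:
    """Backslash-escape Lucene operator characters so they match literally."""
    return _LUCENE_SPECIAL.sub(r"\\\1", text)


print(escape_lucene("TNF-alpha (human)"))  # TNF\-alpha \(human\)
```

The escaped string would then be passed as the $query parameter. Whitespace and the uppercase keywords AND, OR, and NOT are left untouched here, so multi-word input is still interpreted as several terms.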

The index requirement is a constraint on graph construction, not on BFS-QL. A Neo4j graph served through BFS-QL must have the index; graphs without it cannot support search_entities. The backend checks for index existence at initialization and raises a clear error if it is missing, rather than failing silently at query time.
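The initialization check could be sketched as follows, assuming a Neo4j version that supports SHOW INDEXES with a YIELD clause. The names require_fulltext_index and MissingFullTextIndexError are hypothetical, and run_query stands in for whatever the driver layer provides: here it is assumed to take Cypher text and return a list of dictionaries.

```python
from typing import Callable


class MissingFullTextIndexError(RuntimeError):
    """Hypothetical error raised when the required index is unusable."""


def require_fulltext_index(
    run_query: Callable[[str], list[dict]],
    index_name: str = "entity_names",
) -> None:
    """Raise unless a full-text index named index_name exists and is online."""
    rows = run_query("SHOW INDEXES YIELD name, type, state")
    for row in rows:
        if row["name"] == index_name and row["type"] == "FULLTEXT":
            if row["state"] != "ONLINE":
                raise MissingFullTextIndexError(
                    f"full-text index {index_name!r} is {row['state']}, not ONLINE"
                )
            return  # index exists and is usable
    raise MissingFullTextIndexError(
        f"full-text index {index_name!r} not found; "
        "search_entities cannot be served from this graph"
    )


# Stand-in for a real session: returns canned SHOW INDEXES rows.
def fake_run_query(cypher: str) -> list[dict]:
    return [{"name": "entity_names", "type": "FULLTEXT", "state": "ONLINE"}]


require_fulltext_index(fake_run_query)  # passes silently
```

Checking the state as well as the name catches an index that exists but is still populating or has failed.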

entity_types and predicates

// entity_types()
CALL db.labels() YIELD label RETURN label ORDER BY label

// predicates()
CALL db.relationshipTypes() YIELD relationshipType
RETURN relationshipType ORDER BY relationshipType

Neo4j's db.labels() and db.relationshipTypes() procedures return the complete label and relationship type vocabularies without scanning the graph. They are fast, stable, and the natural implementation of entity_types and predicates. No SELECT DISTINCT required.
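A thin wrapper around the second procedure might look like this sketch, which also applies the lowercase predicate convention described earlier. As before, run_query is an assumed stand-in for the driver layer (Cypher text in, list of dictionaries out), not a real driver call.

```python
from typing import Callable


def predicates(run_query: Callable[[str], list[dict]]) -> list[str]:
    """Return the relationship-type vocabulary as lowercase predicates."""
    rows = run_query(
        "CALL db.relationshipTypes() YIELD relationshipType "
        "RETURN relationshipType ORDER BY relationshipType"
    )
    return [row["relationshipType"].lower() for row in rows]


# Stand-in for a real session, returning rows already sorted by the query.
def fake_run_query(cypher: str) -> list[dict]:
    return [
        {"relationshipType": "ASSOCIATED_WITH"},
        {"relationshipType": "INHIBITS"},
        {"relationshipType": "TREATS"},
    ]


print(predicates(fake_run_query))  # ['associated_with', 'inhibits', 'treats']
```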