GQL and Cypher

2024-06-04

GQL (ISO/IEC 39075:2024) is a new database query language for property graphs, recently published (April 2024) by ISO/IEC JTC 1/SC 32/WG 3, the same committee that publishes and maintains the SQL standards. The scope of GQL is to provide a language for storing and querying property-graph-structured data.

A property graph can be described as a directed multigraph where the nodes and edges may have:

labels - a set of “type names” associated with the node or edge
properties - a set of name/value pairs
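To make the definition concrete, here is a minimal Python sketch of a property graph. The class and method names (`Node`, `Edge`, `PropertyGraph`, `add_node`, `add_edge`) are hypothetical and not taken from the GQL standard or any library; the sketch only shows that nodes and edges both carry a label set and a property map, and that keeping edges in a list permits parallel edges between the same two nodes (the "multi" in multigraph).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node: an identity plus a set of labels and a map of properties."""
    id: int
    labels: set[str] = field(default_factory=set)
    properties: dict[str, object] = field(default_factory=dict)


@dataclass
class Edge:
    """A directed edge (source -> target) with its own labels and properties."""
    id: int
    source: int
    target: int
    labels: set[str] = field(default_factory=set)
    properties: dict[str, object] = field(default_factory=dict)


@dataclass
class PropertyGraph:
    nodes: dict[int, Node] = field(default_factory=dict)
    # A list rather than a set of (source, target) pairs, so two distinct
    # edges may connect the same pair of nodes: a multigraph.
    edges: list[Edge] = field(default_factory=list)

    def add_node(self, labels=(), **properties) -> Node:
        node = Node(len(self.nodes), set(labels), dict(properties))
        self.nodes[node.id] = node
        return node

    def add_edge(self, source: Node, target: Node, labels=(), **properties) -> Edge:
        edge = Edge(len(self.edges), source.id, target.id, set(labels), dict(properties))
        self.edges.append(edge)
        return edge


g = PropertyGraph()
alice = g.add_node({"Person"}, name="Alice")
bob = g.add_node({"Person"}, name="Bob")
# Two parallel edges between the same nodes, each with different labels.
g.add_edge(alice, bob, {"FRIEND"}, since=2020)
g.add_edge(alice, bob, {"COWORKER"})
```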

A simple way to conceptualize a property graph is as an entity-relationship (ER) model in which relationships can also have attributes. For example, a very simple graph of movies, actors, and genres might start with an ER diagram as follows:

erDiagram

   Movie ||--o{ Person : actor
   Movie ||--o{ Genre : genre
   Movie {
      string title
   }
   Person {
      string name
   }
   Genre {
      string name
   }

Note:

Relations may also have attributes. If we go back to the original paper, “The Entity-Relationship Model – Toward a Unified View of Data” (Chen, 1976), you will see: “The information about an entity or a relationship is obtained by observation or measurement, and is expressed by a set of attribute-value pairs.” As such, an ER diagram is a great way to describe a property graph, but diagramming tools (e.g., mermaid) don’t always let you show it that way. Also, while you can represent an n-ary relationship in an ER diagram, you can’t necessarily restrict the cardinality of a relationship in a property graph.
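The relationship attributes that the erDiagram cannot show are easy to express once edges are plain data. The following hypothetical Python sketch encodes a tiny instance of the Movie/Person/Genre model as dictionaries and tuples (the `role` property, the helper `roles_in`, and the sample values are illustrative assumptions) and reads an attribute stored on the `actor` relationship itself.

```python
# Hypothetical instance of the Movie/Person/Genre model.
# Nodes: id -> (labels, properties)
nodes = {
    "m1": ({"Movie"}, {"title": "The Matrix"}),
    "p1": ({"Person"}, {"name": "Keanu Reeves"}),
    "g1": ({"Genre"}, {"name": "Science Fiction"}),
}

# Edges: (source, target, labels, properties). The "actor" relationship
# carries its own attribute ("role"), which the ER diagram has no place for.
edges = [
    ("m1", "p1", {"actor"}, {"role": "Neo"}),
    ("m1", "g1", {"genre"}, {}),
]


def roles_in(movie_title):
    """Return (actor name, role) pairs for the movie with the given title."""
    result = []
    for source, target, labels, props in edges:
        if "actor" in labels and nodes[source][1].get("title") == movie_title:
            result.append((nodes[target][1]["name"], props.get("role")))
    return result


print(roles_in("The Matrix"))  # [('Keanu Reeves', 'Neo')]
```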

Querying property graphs

Property graphs are not new and there are many graph databases that provide query languages for property graphs (not exhaustive):

Many of these databases use a query language called Cypher, which has an open specification (openCypher).

Some of the companies behind these databases participated in the development of GQL. As such, it shouldn’t be a surprise that GQL resembles Cypher in many ways. In fact, some Cypher queries are also valid GQL queries.

So, what does a GQL query look like, beyond being “Cypher-like”?

Moving from Cypher to GQL

The openCypher language specification contains many examples of queries and expressions you can use in Cypher. At the time of writing, you would be hard-pressed to find a similar primer on GQL with examples. The standard itself is long, focuses on the query syntax and semantics, and does not contain examples.

I did not personally participate in the GQL standardization, but I have been tracking its progress, so after ISO published the standard, I immediately purchased a copy (yes, that is how ISO works and, yes, I’m that much of a nerd). It is a long, detailed specification and not a casual read. After working through many of the grammar rules, I set about developing a full parser in Python, which is in a very “alpha” state at the moment.

Note:

Stay tuned while I work on a primer for GQL and towards releasing that parser in the near future. Those two go together as I need a good conformance test suite. Unless one exists somewhere … ISO?

I have been using my parser to validate my forays into writing hypothetical GQL queries. As such, I went through the openCypher specification and tried to translate the examples. Let’s go through a few examples from the introduction.

Example 1

“Nodes with name ‘John’ who have more than three ‘FRIEND’ relations.”

MATCH (n {name: 'John'})-[:FRIEND]-(friend)
  WITH n, count(friend) AS friendsCount
  WHERE friendsCount > 3
  RETURN n, friendsCount

Cypher Example 1
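To see what each clause does to the intermediate rows, the Cypher query can be emulated step by step in plain Python. This is a sketch of Cypher's semantics only, over hypothetical sample data, and it makes no claim about how GQL's LET treats aggregation. The pattern `-[:FRIEND]-` has no arrow, so each edge is tried in both directions, and `WITH n, count(friend)` groups the rows by `n`.

```python
from collections import Counter

# Hypothetical sample data: node id -> properties, plus FRIEND edges.
people = {1: {"name": "John"}, 2: {"name": "Ann"}, 3: {"name": "Bo"},
          4: {"name": "Cy"}, 5: {"name": "Di"}, 6: {"name": "John"}}
friend_edges = [(1, 2), (3, 1), (1, 4), (5, 1), (6, 2)]  # (source, target)

# MATCH (n {name: 'John'})-[:FRIEND]-(friend)
# Undirected pattern: an edge can bind n to either endpoint.
rows = []
for src, dst in friend_edges:
    for n, friend in ((src, dst), (dst, src)):
        if people[n]["name"] == "John":
            rows.append((n, friend))

# WITH n, count(friend) AS friendsCount  -> group the rows by n and count
friends_count = Counter(n for n, _ in rows)

# WHERE friendsCount > 3, then RETURN n, friendsCount
result = [(n, c) for n, c in friends_count.items() if c > 3]
print(result)  # [(1, 4)]
```

Only the first John has more than three friends; the second John (node 6) is dropped by the filter step.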

GQL doesn’t have the WITH construct, but it does have LET and FILTER, so this becomes:

MATCH (n {name: 'John'})-[:FRIEND]-(friend)
  LET friendsCount = count(friend)
  FILTER friendsCount > 3
  RETURN n, friendsCount

GQL Example 1 (version 1)

GQL’s syntax also has some optional keywords, so this may also be written as:

MATCH (n {name: 'John'})-[:FRIEND]-(friend)
  LET friendsCount = count(friend)
  FILTER WHERE friendsCount > 3
  RETURN n, friendsCount

GQL Example 1 (version 2)

Note:

I’m still reading through the semantics, so it isn’t yet clear to me whether FILTER WHERE exp differs from FILTER exp.

Example 2

Another example, this time mutating the graph:

MATCH (n {name: 'John'})-[:FRIEND]-(friend)
  WITH n, count(friend) AS friendsCount
  SET n.friendsCount = friendsCount
  RETURN n.friendsCount

Cypher Example 2

MATCH (n {name: 'John'})-[:FRIEND]-(friend)
  SET n.friendsCount = count(friend)
  RETURN n.friendsCount

GQL Example 2

Note:

Here I aggregate the friend count directly in the SET statement. I wonder whether a LET should be there to perform the aggregation outside the context of the SET, even though I omitted it above. I will have to see as I dig deeper.
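One way to picture the question raised in this note is to keep the aggregation and the write as separate steps: count once per `n`, then store the result on the node. The Python below is a hypothetical sketch of that intended effect, with made-up sample data; it mirrors the Cypher version (where WITH does the grouping) and is not a statement of what GQL requires.

```python
from collections import Counter

# Hypothetical sample data: node id -> mutable property map, plus FRIEND edges.
people = {1: {"name": "John"}, 2: {"name": "Ann"}, 3: {"name": "Bo"}}
friend_edges = [(1, 2), (3, 1)]  # (source, target)

# MATCH (n {name: 'John'})-[:FRIEND]-(friend): undirected, so try both directions.
rows = [(n, friend)
        for src, dst in friend_edges
        for n, friend in ((src, dst), (dst, src))
        if people[n]["name"] == "John"]

# Aggregate first, once per n (the role WITH plays in the Cypher version) ...
counts = Counter(n for n, _ in rows)

# ... then SET n.friendsCount = <that count> on each matched node.
for n, count in counts.items():
    people[n]["friendsCount"] = count

# RETURN n.friendsCount
returned = [people[n]["friendsCount"] for n in counts]
print(returned)  # [2]
```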

Understanding linear statements

In the published BNF, you’ll see that the above GQL queries are broken down into separate statements that are chained together. Each query is ultimately parsed as an ambient linear query statement, which is processed as a sequence of simple query statement productions followed by a primitive result statement.

So, the prior GQL examples turn into:

match + let + filter + return statements
match + set + return statements

And all of these statements are chained together by an implementing system.
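The chaining idea can be sketched as a pipeline in which every statement is a function from a working table (a list of rows) to a new working table, with the result statement last. The helper names (`match`, `let`, `filter_`, `return_`, `linear_query`) and the choice to start from a table holding one empty row are assumptions of this sketch, not definitions from the standard; the `let` here only computes a per-row value and does not aggregate.

```python
from functools import reduce


# Hypothetical statement builders: each returns a function that maps a
# working table (a list of row dicts) to a new working table.
def match(nodes):
    # Bind every node to the variable n, for every incoming row.
    return lambda table: [dict(row, n=node) for row in table for node in nodes]


def let(name, expr):
    # Add a new column computed per row (no aggregation in this sketch).
    return lambda table: [dict(row, **{name: expr(row)}) for row in table]


def filter_(predicate):
    # Keep only the rows for which the predicate holds.
    return lambda table: [row for row in table if predicate(row)]


def return_(*exprs):
    # The result statement ends the chain and projects each row to a tuple.
    return lambda table: [tuple(e(row) for e in exprs) for row in table]


def linear_query(*statements):
    # Start from one empty row and feed each statement's output to the next.
    return reduce(lambda table, stmt: stmt(table), statements, [{}])


people = [{"name": "John", "age": 42}, {"name": "Ann", "age": 17}]
result = linear_query(
    match(people),
    let("adult", lambda row: row["n"]["age"] >= 18),
    filter_(lambda row: row["adult"]),
    return_(lambda row: row["n"]["name"]),
)
print(result)  # [('John',)]
```

Because `linear_query` is just a fold over the statement list, any number of simple query statements can precede the final result statement, which matches the shape of the grammar.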

---
title: ambient linear query statement
---
flowchart LR
  S[➡]
  E[➡]
  S --> simpleLinearQueryStatement
  S --> primitiveResultStatement
  simpleLinearQueryStatement --> primitiveResultStatement
  primitiveResultStatement --> E
  subgraph simpleLinearQueryStatement[simple linear query statement]
    simpleQueryStatement[simple query statement]
    simpleQueryStatement --> simpleQueryStatement
  end
  subgraph primitiveResultStatement[primitive result statement]
    returnStatement["return statement"]
  end

---
title: simple query statement
---
flowchart LR
  S[➡]
  E[➡]
  callQueryStatement[call query statement]
  S --> primitiveQueryStatement --> E
  S --> callQueryStatement  --> E
  subgraph primitiveQueryStatement[primitive query statement]
    direction LR
    S1[➡]
    E1[➡]
    matchStatement[match statement]
    letStatement[let statement]
    forStatement[for statement]
    filterStatement[filter statement]
    orderByPage[order by and page statement]
    S1 --> matchStatement --> E1
    S1 --> letStatement --> E1
    S1 --> forStatement --> E1
    S1 --> filterStatement --> E1
    S1 --> orderByPage --> E1
  end
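Together, the two diagrams describe a simple shape: zero or more simple query statements followed by exactly one primitive result statement. A hypothetical Python recognizer over statement keywords makes that shape checkable. It covers only what the diagrams show (RETURN is treated as the only result statement, and the order by and page statement is folded into one made-up token), so it is far smaller and looser than the real grammar. The match + set + return sequence listed earlier is rejected because SET does not appear in these query-statement diagrams.

```python
# Hypothetical statement tokens, named after the boxes in the diagrams above.
PRIMITIVE_QUERY = {"MATCH", "LET", "FOR", "FILTER", "ORDER_BY_AND_PAGE"}
SIMPLE_QUERY = PRIMITIVE_QUERY | {"CALL"}  # CALL = call query statement
RESULT = {"RETURN"}  # the only primitive result statement shown in the diagram


def is_ambient_linear_query(statements):
    """True if the sequence is zero or more simple query statements
    followed by exactly one primitive result statement."""
    if not statements or statements[-1] not in RESULT:
        return False
    return all(s in SIMPLE_QUERY for s in statements[:-1])


print(is_ambient_linear_query(["MATCH", "LET", "FILTER", "RETURN"]))  # True
print(is_ambient_linear_query(["RETURN"]))                            # True
print(is_ambient_linear_query(["MATCH", "SET", "RETURN"]))            # False
```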

What’s next?

With a working parser, I can validate my syntactic assumptions about what is and is not GQL. The next step is to map my expected semantics onto the actual semantics of the language. I understand how Cypher works, so how does that translate to GQL? Answering that requires a deeper understanding of how results are built from statements, and there is a specification for that!

Back to reading, … but I am going to be back with more after I peel back the next 🧅 layer.