1.011 Graph Database Clients#


Explainer

Graph Database Clients: For Technical Decision Makers#

Purpose: Help CTOs, architects, and product managers understand Python graph database client libraries without deep graph theory expertise.

Audience: Technical leaders evaluating graph database adoption, teams planning migrations, developers choosing between query languages.


What This Solves#

Graph database client libraries solve the connection and interaction problem between your Python application and graph database systems.

The Core Problem: You have relationship-heavy data (social networks, fraud rings, supply chains, knowledge graphs) that traditional SQL databases handle poorly. Graph databases excel at traversing multi-hop relationships, but you need a way for your Python code to:

  • Send queries to the graph database
  • Receive and process results efficiently
  • Manage connections, transactions, and errors
  • Abstract away database-specific protocols

Who Encounters This:

  • Startups building social features, recommendation engines, or knowledge bases
  • Enterprise teams implementing fraud detection, network analysis, or supply chain optimization
  • Data scientists constructing knowledge graphs for LLM-powered applications (GraphRAG)
  • SaaS developers adding relationship-driven features to existing products

Why It Matters: Choosing the wrong client library locks you into a specific database vendor, query language, and ecosystem. Migration costs can reach hundreds of thousands of dollars in engineering time. The choice made today determines your flexibility, operational costs, and feature velocity for years.


Accessible Analogies#

The Translator Analogy#

Think of your graph database as a foreign city, and the client library as your tour guide/translator:

  • Neo4j driver (Cypher): A local expert who speaks the native language fluently, knows all the shortcuts, and has deep cultural knowledge. Fastest and most natural, but only works in this one city.

  • gremlinpython (Gremlin): A professional translator who works across multiple cities in the same region (TinkerPop family). Slightly less fluent in each individual dialect, but you can move between cities (Neptune, Cosmos DB, JanusGraph) without finding a new guide.

  • TigerGraph (GSQL): A hyper-specialized guide for a unique city with its own invented language. Incredibly effective for that specific place, but you can’t take this guide anywhere else—total lock-in.

The Filing System Analogy#

Imagine organizing a library:

  • Relational databases (SQL): Books organized in strict categories with catalog cards. Finding related books means walking to different sections, checking multiple cards (JOINs). Slow for “show me all mystery novels written by authors who also wrote sci-fi and were influenced by Author X.”

  • Graph databases: Every book has strings connecting it to related books, authors, and genres. Following those strings (traversals) is instant. The client library is the tool that lets you pull strings and read the labels.
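The "pull the strings" idea maps directly onto how a graph traversal works. A toy sketch in plain Python (an in-memory adjacency dict standing in for the database; names are illustrative) shows why following edges is cheap compared to repeated JOINs:

```python
from collections import deque

# Toy in-memory graph: each node maps to the nodes its "strings" reach
graph = {
    "Alice": ["Bob", "Carol"],
    "Bob": ["Dave"],
    "Carol": ["Dave", "Eve"],
    "Dave": ["Frank"],
    "Eve": [],
    "Frank": [],
}

def within_hops(graph, start, max_hops):
    """Breadth-first traversal: everyone reachable within max_hops."""
    seen = {start}
    frontier = deque([(start, 0)])
    reachable = set()
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                reachable.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return reachable

print(within_hops(graph, "Alice", 2))  # friends and friends-of-friends
```

Each hop is a constant-time pointer follow; a graph database does the same thing at storage level, which is why traversal depth barely affects cost.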

The Protocol Translator#

Your Python code speaks Python. Your graph database speaks a specialized protocol (Bolt, WebSocket, gRPC, HTTP). The client library is the interpreter that:

  • Translates your session.run("MATCH (n:User) RETURN n") into Bolt binary protocol
  • Manages the TCP connection pool
  • Deserializes results back into Python objects

Without it, you’d be manually crafting binary packets—impractical and error-prone.


When You Need This#

✅ You Need Graph Database Clients If:#

Relationship depth matters (3+ hops):

  • “Find friends-of-friends-of-friends who like this product” (social networks)
  • “Trace transaction chains 5 levels deep” (fraud detection)
  • “Show supply chain impact 4 tiers upstream” (risk analysis)

Pattern matching drives value:

  • Detecting rings, cycles, or suspicious subgraph structures
  • Knowledge graph reasoning (entity → relationship → entity chains)
  • Network influence propagation

Data model is naturally a graph:

  • Org charts with complex reporting structures
  • Infrastructure dependency maps
  • Recommendation engines with collaborative filtering

❌ You DON’T Need This If:#

Simple CRUD operations:

  • User profiles with basic lookups (use PostgreSQL)
  • Document storage without complex relationships (use MongoDB)
  • Time-series data (use InfluxDB, TimescaleDB)

All relationships are 1-2 hops:

  • Basic “user has many orders” (SQL foreign keys suffice)
  • Simple hierarchies (category → subcategory → product)

You’re already solving it well:

  • PostgreSQL with recursive CTEs handling your traversals adequately
  • Current solution meets performance SLAs and you’re not hitting scaling issues
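For reference, the "already solving it well" case often looks like a recursive CTE. A self-contained sketch using stdlib sqlite3 (table and column names are illustrative) finds friends within two hops:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE friendships (person TEXT, friend TEXT);
    INSERT INTO friendships VALUES
        ('Alice', 'Bob'), ('Bob', 'Carol'), ('Carol', 'Dave');
""")

# Friends within 2 hops of Alice, via a recursive CTE
rows = conn.execute("""
    WITH RECURSIVE reachable(person, depth) AS (
        SELECT friend, 1 FROM friendships WHERE person = 'Alice'
        UNION
        SELECT f.friend, r.depth + 1
        FROM friendships f JOIN reachable r ON f.person = r.person
        WHERE r.depth < 2
    )
    SELECT DISTINCT person FROM reachable
""").fetchall()
print(sorted(row[0] for row in rows))  # ['Bob', 'Carol']
```

This scales fine for shallow, occasional traversals; it is at deeper hops and larger fan-outs that the self-join cost motivates a dedicated graph database.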

Decision Criteria: When to Upgrade to Graph#

| Your Situation | Recommendation |
| --- | --- |
| 10M+ nodes, 4+ hop traversals regularly | Dedicated graph DB (Neo4j, Neptune) |
| <1M nodes, occasional 3-hop queries | PostgreSQL with AGE extension (graph as feature) |
| Multi-model needs (graph + documents) | ArangoDB (BSL license limits apply) |
| AWS-committed, managed service preferred | Amazon Neptune (Gremlin/openCypher) |
| Need semantic reasoning (RDF triples) | rdflib (SPARQL ecosystem) |

Trade-offs#

Query Language Lock-in Spectrum#

Most Portable ←----------------------------------→ Most Locked-In
   Gremlin          Cypher/GQL          GSQL         Proprietary
   (multi-DB)       (converging)      (TigerGraph)    (single vendor)

Gremlin (gremlinpython) - Portability Choice:

  • Pros: Works across Neptune, Cosmos DB, JanusGraph, DataStax
  • Pros: Apache governance, multi-vendor PMC
  • Cons: Imperative style is harder to read than declarative Cypher
  • Cons: Doesn’t work with Neo4j (market leader)

Cypher (neo4j) - Developer Experience Choice:

  • Pros: Most readable query language, largest community (44% market share)
  • Pros: Converging to ISO GQL standard (future-proofing)
  • Pros: Best GraphRAG and LLM integration (2025+)
  • Cons: Neo4j dominance means de facto lock-in
  • Cons: GQL standardization timeline uncertain (2026-2028 expected)

GSQL (pyTigerGraph) - Performance Choice:

  • Pros: Best performance for deep analytics (10+ hop traversals)
  • Pros: Turing-complete query language (complex algorithms)
  • Cons: Complete vendor lock-in (portability score: 2/10)
  • Cons: Smaller ecosystem and hiring pool
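To make the declarative-vs-imperative contrast concrete, here is the same "friends of Alice" lookup as a Cypher string and as Gremlin-style traversal steps (shown as plain strings for comparison only; executing them requires the respective driver, and label/edge names are illustrative):

```python
# Declarative Cypher: describe the pattern, the engine plans the traversal
cypher = """
MATCH (a:Person {name: 'Alice'})-[:KNOWS]->(friend)
RETURN friend.name
"""

# Imperative Gremlin: spell out each traversal step yourself
gremlin = "g.V().has('Person', 'name', 'Alice').out('knows').values('name')"

print(cypher.strip())
print(gremlin)
```

The Cypher form reads like a pattern to match; the Gremlin form reads like a pipeline of steps. Teams tend to find the former easier to review, and the latter easier to port across TinkerPop databases.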

OGM (Object-Graph Mapper) vs Raw Driver#

OGM (neomodel) - Productivity:

  • ✅ Django-style models, faster CRUD development (30-50% time savings)
  • ❌ Hides query details, can obscure performance issues
  • ❌ Adds abstraction overhead

Raw Driver (neo4j, gremlinpython) - Control:

  • ✅ Full query optimization control
  • ✅ No abstraction overhead
  • ❌ More boilerplate code for simple CRUD

Recommendation: Start with raw driver, add OGM if CRUD patterns dominate.

Build vs Buy (Self-Hosted vs Managed)#

Self-Hosted (Community Edition):

  • ✅ Free for Neo4j Community, no usage limits
  • ✅ Full control over infrastructure
  • ❌ Operational burden (backups, scaling, monitoring)
  • ❌ No enterprise support or high availability (Community)

Managed Services (AuraDB, Neptune, Cosmos):

  • ✅ Zero operational overhead
  • ✅ Built-in backups, scaling, monitoring
  • ❌ $65-2,000+/month (can reach $10K+ at scale)
  • ❌ Cloud vendor lock-in (migration complexity)

Cost Considerations#

Direct Costs#

| Service | Entry Price | Scale Price | Notes |
| --- | --- | --- | --- |
| Neo4j AuraDB | $65/month | $2,000+/month | Managed, auto-scaling |
| Amazon Neptune | $0.10/hour + storage | Variable | Pay per instance + I/O |
| Self-hosted Neo4j Community | $0 | Infrastructure only | No clustering/HA |
| Self-hosted Neo4j Enterprise | $36K+/year | License + infra | HA, support, advanced features |

Hidden Costs#

Learning Curve:

  • Cypher/Gremlin training: 1-2 weeks per developer
  • Graph modeling: 2-4 weeks for team to shift thinking
  • Query optimization: Ongoing (graph queries need different tuning than SQL)

Migration Lock-in Cost:

  • Cypher to Gremlin: 3-6 months for medium codebase (all queries rewritten)
  • GSQL to anything: 6-12 months (proprietary language, no migration tools)
  • Data export/import: 1-4 weeks depending on volume and schema complexity

Opportunity Cost Examples:

  • Choosing TigerGraph for a small project: If you outgrow GSQL lock-in, migration cost = $200K-500K in engineering time
  • Skipping GraphRAG opportunity: Companies report 300-320% ROI on knowledge graph implementations (LinkedIn: 63% ticket resolution improvement)

ROI Break-Even Analysis#

When graph DB pays for itself:

  • Query performance gains: 10-100x faster for 3+ hop traversals
  • Developer productivity: 40% faster feature delivery for relationship-heavy features (anecdotal, industry reports)
  • Fraud detection improvements: 50% better detection rates (finance industry data)

Typical payback period: 6-18 months for teams with clear relationship-heavy use cases.


Implementation Reality#

First 90 Days: What to Expect#

Weeks 1-2: Learning & Proof of Concept

  • Install local Neo4j or use AuraDB free tier
  • Team learns Cypher/Gremlin basics (online tutorials: 5-10 hours per developer)
  • Model one core use case (e.g., user → friend → product recommendation)
  • Build simple prototype showing multi-hop traversal

Weeks 3-6: Production Architecture

  • Choose managed vs self-hosted (decision driven by team expertise)
  • Set up connection pooling, error handling, retry logic
  • Migrate subset of production data
  • Benchmark queries against existing SQL approach (expect 10-50x improvement for graph queries)
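The "retry logic" item above is usually a thin wrapper like the following stdlib-only sketch. The transient-error class and delays are placeholders; real drivers ship their own retryable error types and managed retries (Neo4j's `execute_read`/`execute_write`, for example), which you should prefer when available:

```python
import time
import random

class TransientError(Exception):
    """Placeholder for a driver's retryable error type."""

def run_with_retry(operation, retries=5, base_delay=0.05):
    """Retry a callable on transient failures with exponential backoff."""
    for attempt in range(retries):
        try:
            return operation()
        except TransientError:
            if attempt == retries - 1:
                raise
            # Back off exponentially, with jitter to avoid thundering herds
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))

# Demo: an operation that fails twice before succeeding
attempts = {"count": 0}
def flaky_query():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("connection reset")
    return ["Alice", "Bob"]

print(run_with_retry(flaky_query))
```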

Weeks 7-12: Integration & Optimization

  • Integrate with existing Python services
  • Tune indexes (graph indexes work differently than SQL—expect learning curve)
  • Handle schema evolution (graph schemas are more flexible, but migrations still needed)
  • Monitor performance under load

Team Skill Requirements#

Essential:

  • Comfortable with Python async/await patterns (if using async clients)
  • Willingness to learn declarative query language (Cypher is SQL-like, Gremlin is more programmatic)
  • Graph modeling mindset (thinking in nodes/edges vs tables/rows)

Nice to Have:

  • Prior experience with NoSQL databases (helps with schema flexibility concepts)
  • Understanding of graph algorithms (PageRank, community detection—libraries often provide these)

Hiring Impact:

  • Cypher developers: Moderate pool (Neo4j dominance means growing talent base)
  • Gremlin developers: Smaller pool (more specialized)
  • GSQL developers: Very small pool (lock-in includes talent availability)

Common Pitfalls#

1. “All graph databases are the same”

  • Reality: Neo4j (Cypher), Neptune (Gremlin), TigerGraph (GSQL) are fundamentally different query languages and data models.
  • Avoidance: Evaluate client library portability upfront.

2. “We can switch databases later”

  • Reality: Query language lock-in is real. Cypher → Gremlin migration = months of rewriting.
  • Avoidance: Choose based on 5-year horizon, not 6-month prototype needs.

3. “Graph databases are slow for everything”

  • Reality: 10-100x faster for multi-hop traversals, but slower for simple key-value lookups than Redis.
  • Avoidance: Use graph DBs for graph queries, not as a general-purpose database.

4. “OGMs always improve productivity”

  • Reality: OGMs excel for CRUD, but complex traversals often need raw Cypher/Gremlin for control.
  • Avoidance: Start with raw driver, add OGM selectively for CRUD-heavy patterns.

5. “Gremlin works everywhere”

  • Reality: Gremlin is TinkerPop-family only (Neptune, Cosmos, JanusGraph). Neo4j does not support Gremlin, and TigerGraph support is limited at best.
  • Avoidance: Verify database compatibility before committing to Gremlin.

Success Metrics (Realistic Expectations)#

Performance:

  • 3-hop queries: 50-100ms (vs 500-2000ms in SQL with JOINs)
  • 5-hop queries: 200-500ms (vs timeouts in SQL)

Development Velocity:

  • Feature delivery for relationship-heavy features: 30-50% faster (once team is trained)
  • Query writing: Initially slower (learning curve), then faster (more expressive language)

Operational:

  • Managed service uptime: 99.9%+ (AuraDB, Neptune SLAs)
  • Self-hosted: Depends on team expertise (expect 2-4 weeks to stabilize production setup)

Common Misconceptions#

“Graph databases replace all databases”#

Reality: Graph DBs are specialized tools. Use them for relationship-heavy queries, not for simple CRUD, time-series, or full-text search. Most production systems use graph DBs alongside PostgreSQL, Redis, and Elasticsearch.

“GQL will solve all portability issues”#

Reality: ISO GQL standard (2024) is a major step forward, but:

  • Full vendor adoption: 2026-2028 expected
  • Gremlin is a separate family (no GQL convergence planned)
  • Legacy codebases won’t auto-migrate

“Self-hosting is always cheaper”#

Reality: Managed services ($65-2K/month) often cheaper than:

  • DevOps salary ($120K+/year)
  • Downtime costs (hours of engineer time debugging crashes)
  • Backup/DR infrastructure

Calculate total cost of ownership, not just license fees.
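That TCO comparison reduces to back-of-the-envelope arithmetic. The sketch below uses the illustrative figures quoted in this section (not vendor pricing), with an assumed quarter-time DevOps allocation and infrastructure spend:

```python
# Annual cost sketch: managed service vs self-hosted (figures from the text above)
managed_monthly = 2_000          # upper-end managed tier
managed_annual = managed_monthly * 12

devops_fraction = 0.25           # quarter of a $120K/year DevOps engineer
infra_annual = 6_000             # assumed self-hosted infrastructure spend
self_hosted_annual = 120_000 * devops_fraction + infra_annual

print(f"Managed:     ${managed_annual:,}/year")
print(f"Self-hosted: ${self_hosted_annual:,.0f}/year")
# Under these assumptions, managed comes out cheaper despite the sticker price
```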


Decision Framework for Stakeholders#

Primary Recommendation: Neo4j + Cypher#

Choose neo4j (official driver) when:

  • Starting a new graph database project
  • Developer experience and time-to-market matter
  • GraphRAG or knowledge graph use case (best LLM integration)
  • Betting on GQL standardization (Cypher converging to ISO GQL)

Risk: Medium lock-in (acceptable for velocity gains, GQL migration path exists)

Alternative: TinkerPop + Gremlin#

Choose gremlinpython when:

  • Multi-cloud or multi-database strategy required (Neptune + Cosmos DB flexibility)
  • Vendor-neutral governance important (Apache Foundation)
  • Portability outweighs developer convenience

Risk: Low lock-in, higher learning curve

Niche: TigerGraph + GSQL#

Choose pyTigerGraph when:

  • Deep graph analytics (10+ hop traversals, complex algorithms) required
  • Existing TigerGraph investment
  • Performance justifies severe lock-in (portability: 2/10)

Risk: High lock-in, small hiring pool

Questions to Ask Before Committing#

  1. What’s our relationship depth? (1-2 hops → maybe Postgres, 3+ hops → graph DB)
  2. Do we need multi-database portability? (Yes → Gremlin, No → Cypher)
  3. What’s our scale projection? (<10M edges → any, >1B edges → TigerGraph/Neptune)
  4. Is semantic reasoning required? (Yes → RDF/rdflib, No → property graph)
  5. What’s our cloud commitment? (AWS → Neptune, Azure → Cosmos, Multi-cloud → self-hosted or Neo4j)
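The five questions above can be codified as a toy heuristic. This is a rough sketch of the decision logic, not a substitute for an actual evaluation:

```python
def recommend(hops, needs_portability, needs_rdf, cloud=None):
    """Toy codification of the decision questions above."""
    if needs_rdf:
        return "rdflib (RDF/SPARQL)"
    if hops <= 2:
        return "PostgreSQL (plain SQL or AGE extension)"
    if needs_portability:
        return "gremlinpython (TinkerPop/Gremlin)"
    if cloud == "aws":
        return "gremlinpython on Amazon Neptune"
    return "neo4j (official driver, Cypher)"

print(recommend(hops=4, needs_portability=False, needs_rdf=False))
```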

GQL Standardization (ISO/IEC 39075:2024)#

  • Published April 2024, Cypher converging to compliance
  • By 2028, expect broad vendor adoption (reduced lock-in)
  • Migration path: Cypher users have smooth transition, Gremlin separate family

GraphRAG (Graph + LLMs)#

  • Microsoft open-sourced GraphRAG (July 2024)
  • Knowledge graphs as LLM context (retrieval-augmented generation)
  • Neo4j leading integration (vector + graph hybrid)
  • Implication: Graph DB clients need vector embedding support (table stakes by 2026)

PostgreSQL AGE (Apache Graph Extension)#

  • Cypher queries in PostgreSQL (Apache incubator project)
  • “Good enough” graph for many use cases
  • Implication: Lowers barrier to entry, may slow pure graph DB adoption for simple use cases

Multi-Model Convergence#

  • ArangoDB (graph + document + key-value in one database)
  • DuckDB adding graph capabilities
  • Implication: Graph becomes a feature, not a separate database (reduces operational complexity)

Date compiled: February 5, 2026
Research ID: 1.011

S1: Rapid Discovery

S1 Rapid Discovery: Graph Database Python Clients#

Research Methodology#

Scope Definition#

This discovery focuses on client libraries for interacting with graph databases from Python, NOT the graph databases themselves. The goal is to evaluate developer experience, community health, and production readiness of each library.

Discovery Process#

  1. Initial Library Identification

    • Categorize by database: Neo4j, ArangoDB, TigerGraph, Amazon Neptune, Dgraph
    • Identify multi-database solutions: Apache TinkerPop (Gremlin), RDFLib
    • Note official vs community-maintained libraries
  2. Metrics Collection For each library, we gather:

    • GitHub metrics: stars, forks, contributors, open issues, last commit date
    • PyPI metrics: weekly downloads, latest version, Python version support
    • Maintenance signals: release frequency, issue response time
    • Documentation quality: quick scan of docs completeness
  3. First Impression Evaluation

    • Installation simplicity: pip install <package> should work cleanly
    • Quickstart availability: can a developer be productive in 10 minutes?
    • API design: does it feel Pythonic?

Libraries Evaluated#

| Database | Official Client | Alternative/OGM |
| --- | --- | --- |
| Neo4j | neo4j (driver) | neomodel (OGM), py2neo (EOL) |
| ArangoDB | python-arango | python-arango-async |
| TigerGraph | pyTigerGraph | - |
| Amazon Neptune | gremlinpython + neptune-python-utils | - |
| Dgraph | pydgraph | - |
| Multi-DB (Gremlin) | gremlinpython | - |
| RDF Graphs | rdflib | - |
| OrientDB | pyorient (stale) | - |

Evaluation Criteria#

Tier 1 - Production Ready

  • 500k+ weekly downloads
  • Active maintenance (commits within 30 days)
  • Official/vendor support
  • Comprehensive documentation

Tier 2 - Mature Community

  • 50k-500k weekly downloads
  • Regular releases (quarterly or better)
  • Good documentation
  • Active issue resolution

Tier 3 - Emerging/Niche

  • <50k weekly downloads
  • May have gaps in documentation
  • Smaller community
  • Specialized use cases

Tier 4 - Caution Advised

  • Stale/EOL projects
  • No recent releases
  • Deprecated in favor of alternatives
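The rubric above can be expressed as a small classifier. This simplification keys only on downloads and maintenance (the rubric's other signals — official support, documentation — are folded into the maintenance boolean for brevity):

```python
def classify_tier(weekly_downloads, actively_maintained, eol=False):
    """Map the tiering rubric onto a tier number."""
    if eol or not actively_maintained:
        return 4  # Caution advised
    if weekly_downloads >= 500_000:
        return 1  # Production ready
    if weekly_downloads >= 50_000:
        return 2  # Mature community
    return 3      # Emerging/niche

print(classify_tier(5_700_000, True))     # 1
print(classify_tier(100_000, True))       # 2
print(classify_tier(8_000, True))         # 3
print(classify_tier(0, False, eol=True))  # 4
```

Note that the tier assignments later in this report occasionally override the download thresholds when other signals (official backing, ecosystem role) justify it.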

Data Sources#

  • PyPI: https://pypi.org/ (version, dependencies, downloads)
  • PyPI Stats: https://pypistats.org/ (download statistics)
  • GitHub: Repository metrics and activity
  • Snyk Advisor: Package health analysis
  • Official documentation sites

Key Findings Summary#

See individual library files and recommendation.md for detailed analysis. The most widely adopted libraries are gremlinpython (5.7M weekly downloads), rdflib (1.4M), neo4j driver (520K), and python-arango (350K-1.2M).


gremlinpython - Apache TinkerPop Gremlin for Python#

Quick Facts#

| Metric | Value |
| --- | --- |
| Package Name | gremlinpython |
| Latest Version | 3.8.0 (Nov 17, 2025) |
| Python Support | 3.10+ |
| Weekly Downloads | ~5.7 million |
| GitHub Stars | 1,900+ (TinkerPop repo) |
| License | Apache-2.0 |
| Maintainer | Apache TinkerPop |

Installation#

pip install gremlinpython

First Impression#

Strengths:

  • Universal graph query language (Gremlin)
  • Works with multiple databases: Neptune, JanusGraph, CosmosDB, etc.
  • Most downloaded graph Python library
  • Apache Foundation backing
  • Stable, mature codebase

Considerations:

  • Gremlin syntax differs from Cypher
  • Generic API may lack database-specific optimizations
  • TinkerPop 4.0 brings breaking changes (HTTP replacing WebSockets)

Compatible Databases#

  • Amazon Neptune
  • JanusGraph
  • Azure Cosmos DB (Gremlin API)
  • DataStax Enterprise Graph
  • IBM Compose for JanusGraph
  • OrientDB
  • TigerGraph (limited)

Quick Example#

from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

# Connect to a Gremlin server
conn = DriverRemoteConnection('ws://localhost:8182/gremlin', 'g')
g = traversal().with_remote(conn)  # with_remote() replaces the deprecated withRemote()

# Traverse the graph
people = g.V().hasLabel('person').values('name').toList()
print(people)

# Find friends of Alice
friends = g.V().has('person', 'name', 'Alice').out('knows').values('name').toList()
print(friends)

conn.close()

Amazon Neptune Usage#

# With neptune-python-utils for IAM auth
from neptune_python_utils.gremlin_utils import GremlinUtils

gremlin_utils = GremlinUtils()
conn = gremlin_utils.remote_connection()
g = gremlin_utils.traversal_source(connection=conn)

Assessment#

Tier: 1 - Production Ready

gremlinpython is the standard for Gremlin-based graph traversal. Its massive adoption (5.7M downloads/week) reflects use in AWS Neptune and other cloud graph services. Essential for multi-database portability or when using Gremlin-compatible databases.


neo4j - Official Neo4j Python Driver#

Quick Facts#

| Metric | Value |
| --- | --- |
| Package Name | neo4j |
| Latest Version | 6.0.3 (Nov 6, 2025) |
| Python Support | 3.10, 3.11, 3.12, 3.13 |
| Weekly Downloads | ~520,000 |
| GitHub Stars | 1,000 |
| Contributors | 58 |
| License | Apache-2.0 |
| Maintainer | Neo4j, Inc. (official) |

Installation#

pip install neo4j

# With optional Rust extension for 10x performance
pip install neo4j-rust-ext

# With pandas/numpy integration
pip install neo4j[numpy,pandas,pyarrow]

First Impression#

Strengths:

  • Official vendor support with dedicated team
  • Production-stable with semantic versioning
  • Async support built-in
  • Rust extensions available for performance-critical workloads
  • Excellent documentation and examples
  • Type hints throughout

Considerations:

  • Python 3.10+ required (no legacy support)
  • Deprecated neo4j-driver package still in PyPI (causes confusion)

Quick Example#

from neo4j import GraphDatabase

# One long-lived driver per application; sessions are cheap and short-lived
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    result = session.run("MATCH (n:Person) RETURN n.name LIMIT 10")
    for record in result:
        print(record["n.name"])

driver.close()

Assessment#

Tier: 1 - Production Ready

The official Neo4j driver is the clear choice for Neo4j integration. It has strong community adoption, official support, excellent documentation, and modern Python features. The Rust extension option makes it suitable for high-throughput workloads.


neomodel - Python OGM for Neo4j#

Quick Facts#

| Metric | Value |
| --- | --- |
| Package Name | neomodel |
| Latest Version | 6.0.0 (Nov 26, 2025) |
| Python Support | 3.10+ |
| Weekly Downloads | ~25,000 |
| GitHub Stars | 1,100 |
| Contributors | 97 |
| Open Issues | 49 |
| License | MIT |
| Maintainer | Neo4j Labs (community) |

Installation#

pip install neomodel

# With Rust driver extension for performance
pip install neomodel[rust-driver-ext]

# With optional dependencies (Shapely, pandas, numpy)
pip install neomodel[extras,rust-driver-ext]

First Impression#

Strengths:

  • Django-style model definitions (familiar pattern)
  • Schema enforcement with cardinality restrictions
  • Full transaction and async support
  • Neo4j Labs project (good maintenance quality)
  • Django integration via django-neomodel plugin
  • Vector and full-text search support (v6.0+)

Considerations:

  • Abstracts away Cypher (less control for complex queries)
  • Learning curve for graph-specific concepts
  • Performance overhead vs raw driver

Quick Example#

from neomodel import StructuredNode, StringProperty, RelationshipTo, config

# Configure the connection before any queries
config.DATABASE_URL = 'bolt://neo4j:password@localhost:7687'

class Person(StructuredNode):
    name = StringProperty(required=True)
    friends = RelationshipTo('Person', 'FRIEND')

# Create and relate nodes
alice = Person(name="Alice").save()
bob = Person(name="Bob").save()
alice.friends.connect(bob)

# Query
for friend in alice.friends.all():
    print(friend.name)

Assessment#

Tier: 2 - Mature Community

neomodel is the recommended OGM for Neo4j, especially for developers coming from Django/SQLAlchemy backgrounds. It provides a Pythonic abstraction over the graph while still allowing raw Cypher when needed. Good choice for rapid development.


py2neo - End of Life (EOL)#

Quick Facts#

| Metric | Value |
| --- | --- |
| Package Name | py2neo |
| Latest Version | 2021.2.4 |
| Status | END OF LIFE |
| Last Meaningful Release | 2021 |
| GitHub Stars | ~1,200 (archived) |
| License | Apache-2.0 |

Status Warning#

py2neo has been officially declared End of Life as of April 2025.

The project is no longer maintained and will receive no further updates. The GitHub repository has moved to neo4j-contrib/py2neo but is effectively archived.

Migration Path#

Neo4j recommends migrating to:

  1. neo4j - Official Python driver for direct Cypher queries
  2. neomodel - For ORM-style object-graph mapping

Historically, py2neo offered:

  • Higher-level API than the raw driver
  • Built-in OGM (Object-Graph Mapper)
  • HTTP and Bolt protocol support
  • Cypher lexer for Pygments
  • Command-line tools

Migration Example#

Old (py2neo):

from py2neo import Graph
graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))
graph.run("MATCH (n) RETURN n LIMIT 10")

New (neo4j driver):

from neo4j import GraphDatabase
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    session.run("MATCH (n) RETURN n LIMIT 10")

Assessment#

Tier: 4 - Do Not Use

Do not start new projects with py2neo. For existing codebases, plan migration to the official neo4j driver or neomodel. Historical releases remain available on PyPI for legacy compatibility.


pydgraph - Official Dgraph Python Client#

Quick Facts#

MetricValue
Package Namepydgraph
Latest Version24.3.0 (Aug 5, 2025)
Python Support3.7 - 3.12
Weekly Downloads~8,300
GitHub Stars288
Forks90
Open Issues0
LicenseApache-2.0
MaintainerHypermode Inc. (steward of Dgraph)

Installation#

pip install pydgraph

First Impression#

Strengths:

  • Official client with gRPC protocol
  • Good Python version support (3.7+)
  • Clean, simple API
  • Connection string support for clusters
  • ACL (Access Control List) authentication

Considerations:

  • Smaller ecosystem compared to Neo4j
  • Dgraph uses GraphQL-like DQL, learning curve
  • gRPC dependency can cause build issues on older systems

Quick Example#

import pydgraph

# Create client
client_stub = pydgraph.DgraphClientStub('localhost:9080')
client = pydgraph.DgraphClient(client_stub)

# Set schema
schema = """
name: string @index(exact) .
friends: [uid] .
type Person {
    name
    friends
}
"""
op = pydgraph.Operation(schema=schema)
client.alter(op)

# Create data
txn = client.txn()
try:
    txn.mutate(set_nquads='_:alice <name> "Alice" .')
    txn.commit()
finally:
    txn.discard()

# Query
query = """
{
  people(func: has(name)) {
    name
    friends { name }
  }
}
"""
res = client.txn(read_only=True).query(query)
print(res.json)

client_stub.close()

Connection String Format#

# Standard connection
client_stub = pydgraph.DgraphClientStub("localhost:9080")

# With a dgraph:// connection string (supported via pydgraph.open in recent releases)
client = pydgraph.open("dgraph://username:password@host:9080")

Assessment#

Tier: 3 - Emerging/Niche

pydgraph is the correct choice for Dgraph integration. While the community is smaller than Neo4j, the library is well-maintained with zero open issues. Suitable for applications needing Dgraph’s distributed graph capabilities and native GraphQL-like query language.


python-arango - Official ArangoDB Python Driver#

Quick Facts#

| Metric | Value |
| --- | --- |
| Package Name | python-arango |
| Latest Version | 8.2.5 (Dec 22, 2025) |
| Python Support | 3.9, 3.10, 3.11, 3.12 |
| Weekly Downloads | ~350,000 - 1.2M |
| GitHub Stars | 466 |
| Contributors | 32+ |
| Open Issues | 0 |
| License | MIT |
| Maintainer | ArangoDB (official) |

Installation#

pip install python-arango

# For async support
pip install python-arango-async

First Impression#

Strengths:

  • Official vendor support
  • Excellent maintenance (zero open issues)
  • Clean, Pythonic API
  • Comprehensive AQL query support
  • Graph traversal, document, and key-value operations
  • Async alternative available

Considerations:

  • ArangoDB-specific (multi-model but single vendor)
  • Smaller community than Neo4j ecosystem

Quick Example#

from arango import ArangoClient

client = ArangoClient(hosts="http://localhost:8529")
db = client.db("mydb", username="root", password="password")

# Create a graph
graph = db.create_graph("social")
people = graph.create_vertex_collection("people")
friends = graph.create_edge_definition(
    edge_collection="friends",
    from_vertex_collections=["people"],
    to_vertex_collections=["people"]
)

# Insert vertices and edges
alice = people.insert({"_key": "alice", "name": "Alice"})
bob = people.insert({"_key": "bob", "name": "Bob"})
friends.insert({"_from": "people/alice", "_to": "people/bob"})

# AQL query
cursor = db.aql.execute("FOR p IN people RETURN p.name")
print([doc for doc in cursor])

Assessment#

Tier: 1 - Production Ready

python-arango is an excellent choice for ArangoDB integration. The library is well-maintained with official support, zero open issues, and comprehensive coverage of ArangoDB features. Particularly strong for applications needing multi-model (document + graph + key-value) capabilities.


pyTigerGraph - TigerGraph Python Client#

Quick Facts#

| Metric | Value |
| --- | --- |
| Package Name | pyTigerGraph |
| Latest Version | 1.9.1 (Nov 4, 2025) |
| Python Support | 3.8+ |
| Weekly Downloads | ~5,600 |
| GitHub Stars | 34 |
| Contributors | 22 |
| Open Issues | 7 |
| License | Apache-2.0 |
| Maintainer | TigerGraph (official) |

Installation#

# Core functionality
pip install pyTigerGraph

# With Graph Data Science / ML capabilities
pip install pyTigerGraph[gds]

First Impression#

Strengths:

  • Official vendor support
  • Graph machine learning integration
  • Async support (v1.8+)
  • DataFrame loading from Pandas
  • Good for analytics and ML workloads

Considerations:

  • Smaller community (niche database)
  • Less documentation compared to Neo4j/ArangoDB
  • TigerGraph-specific GSQL language

Quick Example#

import pyTigerGraph as tg

conn = tg.TigerGraphConnection(
    host="https://your-instance.i.tgcloud.io",
    graphname="MyGraph",
    username="tigergraph",
    password="password"
)

# Get auth token
conn.getToken(conn.createSecret())

# Run a query
results = conn.runInstalledQuery("find_friends", params={"person": "Alice"})

# Upsert vertices: each vertex is an (id, attributes) tuple
conn.upsertVertices("Person", [
    ("alice", {"name": "Alice"}),
    ("bob", {"name": "Bob"})
])

Graph Data Science Features#

# Requires: pip install pyTigerGraph[gds]
feat = conn.gds.featurizer()

# Built-in algorithms use the tg_ prefix; required params vary by algorithm
feat.installAlgorithm("tg_pagerank")
feat.runAlgorithm("tg_pagerank", params={"v_type": "Person", "e_type": "Knows"})

Assessment#

Tier: 3 - Emerging/Niche

pyTigerGraph is the right choice when using TigerGraph, especially for graph analytics and machine learning use cases. The library has official support but a smaller community. Best suited for enterprise analytics workloads where TigerGraph’s performance advantages justify the ecosystem trade-offs.


rdflib - Python Library for RDF#

Quick Facts#

| Metric | Value |
| --- | --- |
| Package Name | rdflib |
| Latest Version | 7.5.0 (Nov 28, 2025) |
| Python Support | 3.8.1+ |
| Weekly Downloads | ~1.45 million |
| GitHub Stars | 2,400 |
| Contributors | 189 |
| Open Issues | 291 |
| License | BSD-3-Clause |
| Maintainer | RDFLib community |

Installation#

pip install rdflib

First Impression#

Strengths:

  • Dominant library for RDF/semantic web in Python
  • Comprehensive format support (RDF/XML, Turtle, JSON-LD, N-Quads, etc.)
  • Full SPARQL 1.1 implementation
  • Mature, well-documented
  • Large ecosystem of extensions

Considerations:

  • Focus on RDF graphs (different paradigm from property graphs)
  • Not designed for high-performance graph traversal
  • Steeper learning curve for developers new to semantic web

Quick Example#

from rdflib import Graph, Literal, RDF, URIRef, Namespace

# Create a graph
g = Graph()

# Define namespace
FOAF = Namespace("http://xmlns.com/foaf/0.1/")
g.bind("foaf", FOAF)

# Add triples
alice = URIRef("http://example.org/alice")
g.add((alice, RDF.type, FOAF.Person))
g.add((alice, FOAF.name, Literal("Alice")))
g.add((alice, FOAF.knows, URIRef("http://example.org/bob")))

# SPARQL query
results = g.query("""
    SELECT ?name WHERE {
        ?person foaf:name ?name .
    }
""")
for row in results:
    print(row.name)

# Serialize
print(g.serialize(format="turtle"))

RDF vs Property Graphs#

| RDF (rdflib) | Property Graphs (Neo4j, etc.) |
| --- | --- |
| Triple-based (subject-predicate-object) | Nodes and relationships with properties |
| URIs for identifiers | Internal IDs |
| SPARQL query language | Cypher, Gremlin, etc. |
| Semantic web / linked data focus | Application data modeling |
| Standards-based (W3C) | Vendor-specific |

Assessment#

Tier: 1 - Production Ready

rdflib is the standard for RDF processing in Python. Essential for semantic web applications, knowledge graphs with linked data, and SPARQL-based querying. Different use case than property graph databases but equally mature.


Graph Database Python Clients: Recommendations#

Quick Assessment Summary#

Tier Rankings#

| Tier | Library | Downloads/Week | Use Case |
| --- | --- | --- | --- |
| 1 | gremlinpython | 5.7M | Multi-DB, Neptune, JanusGraph |
| 1 | rdflib | 1.45M | RDF/Semantic web |
| 1 | neo4j | 520K | Neo4j (official) |
| 1 | python-arango | 350K-1.2M | ArangoDB (official) |
| 2 | neomodel | 25K | Neo4j OGM |
| 3 | pydgraph | 8K | Dgraph |
| 3 | pyTigerGraph | 5.6K | TigerGraph |
| 4 | py2neo | EOL | Do not use |
| 4 | pyorient | Stale (2017) | Avoid |

Top Picks by Use Case#

For Neo4j Integration#

Primary: neo4j (official driver)

  • Best for: Direct Cypher queries, maximum control, performance
  • Install: pip install neo4j

Alternative: neomodel (OGM)

  • Best for: Django-style model definitions, rapid development
  • Install: pip install neomodel

For Multi-Database Portability#

Primary: gremlinpython

  • Best for: Neptune, JanusGraph, CosmosDB, any Gremlin-compatible DB
  • Install: pip install gremlinpython

For ArangoDB#

Primary: python-arango

  • Best for: Multi-model (document + graph + key-value) applications
  • Install: pip install python-arango

For Semantic Web / Knowledge Graphs#

Primary: rdflib

  • Best for: RDF processing, SPARQL queries, linked data
  • Install: pip install rdflib

For Graph Analytics / ML#

Primary: pyTigerGraph[gds]

  • Best for: Large-scale graph analytics, ML on graphs
  • Install: pip install pyTigerGraph[gds]

Decision Matrix#

| Requirement | Recommended Library |
| --- | --- |
| Need Cypher query language | neo4j, neomodel |
| Need Gremlin query language | gremlinpython |
| Need SPARQL / RDF | rdflib |
| Need vendor portability | gremlinpython |
| Need ORM-style abstraction | neomodel |
| AWS Neptune | gremlinpython + neptune-python-utils |
| Multi-model (doc + graph) | python-arango |
| Graph machine learning | pyTigerGraph[gds] |
| Distributed graph at scale | pydgraph, pyTigerGraph |

Libraries to Avoid#

  1. py2neo - Officially EOL (April 2025), migrate to neo4j/neomodel
  2. pyorient - Last release 2017, OrientDB has limited Python support
  3. neo4j-driver - Deprecated package name, use neo4j instead

Key Insights#

  1. gremlinpython dominates downloads due to AWS Neptune and cloud adoption
  2. Neo4j ecosystem is strongest with official driver + OGM options
  3. ArangoDB has excellent official support with zero open issues
  4. RDFLib serves a distinct use case (semantic web vs property graphs)
  5. TigerGraph and Dgraph are niche but officially supported

Next Steps for Deeper Evaluation#

  1. Test connection setup with actual database instances
  2. Benchmark query performance for representative workloads
  3. Evaluate async support for concurrent applications
  4. Review error handling and retry mechanisms
  5. Assess integration with web frameworks (FastAPI, Django)
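Step 1 above can begin as a library-agnostic harness before any benchmarking: run one cheap "liveness" probe per candidate client and collect pass/fail results. The probe callables below are placeholders — with real drivers, the neo4j probe would call `driver.verify_connectivity()` and the gremlinpython probe something like `g.V().limit(1).next()`.

```python
def smoke_test(probes):
    """probes: mapping of client name -> zero-argument callable that raises on failure."""
    results = {}
    for name, probe in probes.items():
        try:
            probe()
            results[name] = "ok"
        except Exception as exc:
            results[name] = f"failed: {exc}"
    return results
```

The structure is the point: each client's cheapest round trip is wrapped in a uniform interface, so the same harness runs against every library on the shortlist.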

Data Sources#

  • PyPI: Package metadata and downloads
  • PyPI Stats: https://pypistats.org/
  • GitHub: Repository metrics
  • Snyk Advisor: Package health analysis
  • Official documentation for each library

Research conducted: December 2025


S2 Comprehensive Discovery: Graph Database Python Client Libraries#

Overview#

This document outlines the methodology for evaluating Python client libraries for graph databases. The analysis covers official drivers, community libraries, and Object-Graph Mappers (OGMs) across multiple graph database platforms.

Scope#

Libraries Evaluated#

| Library | Database | Type | Maintenance |
| --- | --- | --- | --- |
| neo4j-driver | Neo4j | Official Driver | Active (Neo4j Inc.) |
| py2neo | Neo4j | Community Driver | EOL (Archived) |
| neomodel | Neo4j | OGM | Active (Neo4j Labs) |
| python-arango | ArangoDB | Official Driver | Active (ArangoDB) |
| pyTigerGraph | TigerGraph | Official Client | Active (TigerGraph) |
| gremlinpython | Multi-DB | Official (TinkerPop) | Active (Apache) |
| pydgraph | Dgraph | Official Driver | Active (Hypermode) |
| rdflib | RDF/SPARQL | Library | Active (Community) |

Evaluation Criteria#

1. API Design and Ergonomics#

  • Pythonic Design: Adherence to Python idioms (PEP 8, context managers, generators)
  • Type Hints: MyPy compatibility and IDE support
  • Documentation Quality: Official docs, examples, and community resources
  • Learning Curve: Time to productivity for developers

2. Performance Characteristics#

  • Connection Pooling: Configuration options and efficiency
  • Bulk Operations: Batch insert/update capabilities
  • Serialization: Data format handling (JSON, Binary, custom)
  • Rust Extensions: Native code acceleration options

3. Async Support#

  • Native asyncio: Built-in async/await support
  • Framework Integration: FastAPI, aiohttp, Starlette compatibility
  • Concurrent Transactions: Parallel query execution
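The concurrent-transactions criterion can be sketched in a library-agnostic way: fan out independent read queries with `asyncio.gather`. Here `run_query` is a placeholder for a driver coroutine such as an async session's read call.

```python
import asyncio

async def gather_queries(run_query, queries):
    """Run all queries concurrently; results come back in input order."""
    return await asyncio.gather(*(run_query(q) for q in queries))
```

Libraries with native asyncio (neo4j, neomodel v5+) slot directly into this pattern; synchronous-only clients need thread pools or the third-party async variants listed in the feature matrix.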

4. Transaction and Consistency#

  • ACID Support: Transaction isolation levels
  • Retry Logic: Automatic retry on transient failures
  • Causal Consistency: Bookmark/session management
  • Read/Write Splitting: Routing to appropriate cluster nodes
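The retry criterion above amounts to exponential backoff around an arbitrary unit of work. Drivers with managed transactions (neo4j's `execute_read`/`execute_write`) do this internally; for the others, a wrapper along these lines is needed. `TransientError` here is a stand-in for whichever exception class a given driver marks as retryable.

```python
import random
import time

class TransientError(Exception):
    """Stand-in for a driver's retryable error (cf. neo4j.exceptions.TransientError)."""

def with_retry(work, retries=3, base_delay=0.1):
    """Run work(), retrying on TransientError with exponential backoff and jitter."""
    for attempt in range(retries + 1):
        try:
            return work()
        except TransientError:
            if attempt == retries:
                raise
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
```

The jitter term spreads out retries from concurrent workers so they do not re-collide on the same deadlock.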

5. Query Language Support#

| Library | Primary | Secondary |
| --- | --- | --- |
| neo4j-driver | Cypher | - |
| neomodel | Python OGM | Cypher (raw) |
| python-arango | AQL | - |
| pyTigerGraph | GSQL | REST API |
| gremlinpython | Gremlin | - |
| pydgraph | GraphQL+/DQL | - |
| rdflib | SPARQL | RDF/Turtle |

6. Schema and Migration#

  • Schema Definition: Programmatic vs. declarative
  • Constraint Management: Unique, existence, type constraints
  • Index Management: Creation, deletion, optimization
  • Migration Tooling: Version control for schema changes

7. Testing and Development#

  • Mocking Support: Test doubles and fixtures
  • Embedded Mode: In-process database for testing
  • CI/CD Integration: Docker, testcontainers compatibility

Data Sources#

Primary Sources#

  1. Official Documentation: Driver manuals and API references
  2. GitHub Repositories: Source code, issues, release notes
  3. PyPI: Package metadata, version history, dependencies

Secondary Sources#

  1. Community Forums: Stack Overflow, database-specific communities
  2. Performance Benchmarks: Published comparisons and metrics
  3. Migration Guides: Version upgrade documentation

Analysis Deliverables#

  1. Per-Library Deep Dives: 100-200 lines covering features, patterns, and limitations
  2. Feature Matrix: Side-by-side comparison across all criteria
  3. Recommendations: Use-case based guidance with justifications

Versioning Context#

All analysis conducted against library versions current as of December 2024:

  • neo4j-driver: 6.0.x
  • neomodel: 6.0.x
  • python-arango: 8.2.x
  • pyTigerGraph: 1.6.x
  • gremlinpython: 3.7.x
  • pydgraph: 24.x / 25.x
  • rdflib: 7.2.x

Graph Database Python Client Libraries: Feature Matrix#

Overview#

This matrix compares Python client libraries for graph databases across key functional and technical criteria. Libraries are evaluated as of December 2024.

Quick Reference#

| Library | Database | Query Language | Status |
| --- | --- | --- | --- |
| neo4j-driver | Neo4j | Cypher | Active |
| py2neo | Neo4j | Cypher | EOL |
| neomodel | Neo4j | OGM/Cypher | Active |
| python-arango | ArangoDB | AQL | Active |
| pyTigerGraph | TigerGraph | GSQL | Active |
| gremlinpython | Multi-DB | Gremlin | Active |
| pydgraph | Dgraph | DQL | Active |
| rdflib | RDF stores | SPARQL | Active |

Async Support#

| Library | Native asyncio | Async Variant | Framework Compat |
| --- | --- | --- | --- |
| neo4j-driver | Yes | Built-in | FastAPI, aiohttp |
| py2neo | No | - | - |
| neomodel | Yes | Built-in (v5+) | Django, FastAPI |
| python-arango | No | python-arango-async | FastAPI (separate pkg) |
| pyTigerGraph | Partial | AsyncTigerGraphConnection | Limited |
| gremlinpython | No | aiogremlin, goblin | Via third-party |
| pydgraph | No | gRPC futures only | Limited |
| rdflib | No | Manual wrapping | Via thread pool |

Connection Management#

| Library | Connection Pooling | Pool Size Config | Liveness Check |
| --- | --- | --- | --- |
| neo4j-driver | Yes | max_connection_pool_size | liveness_check_timeout |
| py2neo | Basic | Limited | No |
| neomodel | Yes (via driver) | Via driver_options | Via driver |
| python-arango | No | - | No |
| pyTigerGraph | No | - | No |
| gremlinpython | Yes | pool_size parameter | Known issues |
| pydgraph | Manual | Multiple stubs | Manual |
| rdflib | N/A | N/A | N/A |

Transaction Support#

| Library | ACID | Managed Txn | Auto-retry | Causal Consistency |
| --- | --- | --- | --- | --- |
| neo4j-driver | Yes | execute_read/write | Yes | Bookmarks |
| py2neo | Yes | Context manager | No | No |
| neomodel | Yes | Context manager | No | Via driver |
| python-arango | Yes | Stream/JS txn | No | No |
| pyTigerGraph | Limited | Via REST | No | No |
| gremlinpython | Yes | tx.begin/commit | No | No |
| pydgraph | Yes | txn() context | Manual | No |
| rdflib | No | N/A | N/A | N/A |

Query Language Features#

| Library | Parameterized | Prepared/Cached | Bulk Operations |
| --- | --- | --- | --- |
| neo4j-driver | Yes ($params) | No | UNWIND pattern |
| py2neo | Yes | No | Batch methods |
| neomodel | Yes | No | save() loop |
| python-arango | Yes (@params) | No | insert_many() |
| pyTigerGraph | Yes | Installed queries | upsertVertices() |
| gremlinpython | Limited | No | Batch traversals |
| pydgraph | Yes ($params) | No | JSON arrays |
| rdflib | Yes (initBindings) | prepareQuery() | addN() |

OGM/ORM Capabilities#

| Library | OGM Layer | Schema Definition | Hooks | Validation |
| --- | --- | --- | --- | --- |
| neo4j-driver | No | Manual | No | No |
| py2neo | Built-in | GraphObject | Limited | No |
| neomodel | Built-in | StructuredNode | Yes | Property-level |
| python-arango | No | Manual | No | No |
| pyTigerGraph | Schema API | Object-oriented | No | GSQL |
| gremlinpython | Via Goblin | Vertex/Edge classes | Limited | Via Goblin |
| pydgraph | No | DQL schema | No | No |
| rdflib | No | RDF/OWL | No | SHACL (ext) |

Type System#

| Library | Type Hints | MyPy Support | Spatial Types | Temporal Types |
| --- | --- | --- | --- | --- |
| neo4j-driver | Yes | Good | Point | Date/DateTime/Duration |
| py2neo | Partial | Limited | Via Cypher | Via Cypher |
| neomodel | Yes | Good | PointProperty | DateTime/Date |
| python-arango | Partial | Limited | GeoJSON | ISO strings |
| pyTigerGraph | Limited | Limited | GSQL types | DATETIME |
| gremlinpython | Limited | Limited | Via properties | Via properties |
| pydgraph | Limited | Limited | Geo (geo:) | dateTime |
| rdflib | Partial | Limited | GeoSPARQL | xsd:dateTime |

Performance Features#

| Library | Native Extensions | Binary Protocol | Compression |
| --- | --- | --- | --- |
| neo4j-driver | Rust (optional) | Bolt | No |
| py2neo | No | Bolt/HTTP | No |
| neomodel | Via driver | Bolt | No |
| python-arango | No | HTTP/REST | Optional |
| pyTigerGraph | No | HTTP/REST | No |
| gremlinpython | No | WebSocket | GraphBinary |
| pydgraph | No | gRPC | Protocol Buffers |
| rdflib | No | N/A | N/A |

Error Handling#

| Library | Typed Exceptions | Retry Categories | Error Codes |
| --- | --- | --- | --- |
| neo4j-driver | Yes | Transient/Client/DB | Yes |
| py2neo | Partial | No | Limited |
| neomodel | Via driver | Via driver | Via driver |
| python-arango | Yes | No | ArangoDB codes |
| pyTigerGraph | Basic | No | REST status |
| gremlinpython | GremlinServerError | No | Server codes |
| pydgraph | gRPC errors | Manual | gRPC codes |
| rdflib | Standard Python | No | No |

Testing Support#

| Library | Mocking | Embedded Mode | Testcontainers |
| --- | --- | --- | --- |
| neo4j-driver | Manual | No | Yes |
| py2neo | No | No | Possible |
| neomodel | Manual | No | Yes |
| python-arango | No | No | Yes |
| pyTigerGraph | No | No | Limited |
| gremlinpython | No | JVM only | Yes (Gremlin Server) |
| pydgraph | No | No | Yes |
| rdflib | In-memory graph | Yes | N/A |

Documentation and Community#

| Library | Official Docs | API Reference | Examples | Community |
| --- | --- | --- | --- | --- |
| neo4j-driver | Excellent | Complete | Extensive | Large |
| py2neo | Archived | Archived | Limited | Inactive |
| neomodel | Good | Complete | Moderate | Active |
| python-arango | Good | Complete | Good | Moderate |
| pyTigerGraph | Good | Complete | Good | Moderate |
| gremlinpython | Good | Reference | Book available | Large |
| pydgraph | Moderate | README | Basic | Small |
| rdflib | Excellent | Complete | Extensive | Large |

Python Version Support#

| Library | Min Version | Max Version | Notes |
| --- | --- | --- | --- |
| neo4j-driver | 3.10 | 3.14 | Drops 3.9 in v6 |
| py2neo | 3.x | - | EOL |
| neomodel | 3.8 | 3.12+ | |
| python-arango | 3.9 | Latest | |
| pyTigerGraph | 3.7 | Latest | |
| gremlinpython | 3.10 | Latest | |
| pydgraph | 3.7 | Latest | |
| rdflib | 3.8 | Latest | |

Database Version Support#

| Library | Supported Versions | LTS Support |
| --- | --- | --- |
| neo4j-driver | 4.4+, 5.x | 4.4 LTS, 5.26 LTS |
| py2neo | 4.x (frozen) | - |
| neomodel | 4.4+, 5.x | Via driver |
| python-arango | 3.11+ | Via ArangoDB |
| pyTigerGraph | 3.x+ | Via TigerGraph |
| gremlinpython | TinkerPop 3.x | Via database |
| pydgraph | Version-matched | Via Dgraph |
| rdflib | N/A | N/A |

Installation Size#

| Library | Core Size | Dependencies | Optional Extras |
| --- | --- | --- | --- |
| neo4j-driver | ~500KB | pytz | rust-ext (~2MB) |
| py2neo | ~1MB | Several | pygments |
| neomodel | ~200KB | neo4j-driver | shapely, extras |
| python-arango | ~300KB | requests | - |
| pyTigerGraph | ~500KB | requests | torch (GDS) |
| gremlinpython | ~200KB | aiohttp, nest-asyncio | - |
| pydgraph | ~100KB | grpcio, protobuf | - |
| rdflib | ~2MB | pyparsing, isodate | lxml, html5lib |

Summary Scores (1-5)#

| Library | API Design | Performance | Async | Ecosystem | Overall |
| --- | --- | --- | --- | --- | --- |
| neo4j-driver | 5 | 5 | 5 | 5 | 5.0 |
| py2neo | 4 | 3 | 1 | 1 | 2.3 |
| neomodel | 5 | 4 | 4 | 4 | 4.3 |
| python-arango | 4 | 4 | 3 | 4 | 3.8 |
| pyTigerGraph | 3 | 3 | 2 | 3 | 2.8 |
| gremlinpython | 3 | 3 | 2 | 4 | 3.0 |
| pydgraph | 3 | 4 | 2 | 2 | 2.8 |
| rdflib | 4 | 3 | 1 | 4 | 3.0 |

Scores based on: API ergonomics, Python idiom adherence, documentation quality, maintenance activity, and production readiness.
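The Overall column above appears to be the mean of the four sub-scores rounded half-up to one decimal. A sketch of that derivation (note Python's built-in `round()` uses banker's rounding and would give 2.2 for py2neo's 2.25 instead of the table's 2.3):

```python
import math

def overall(api, perf, async_, eco):
    """Mean of the four sub-scores, rounded half-up to one decimal place."""
    return math.floor((api + perf + async_ + eco) / 4 * 10 + 0.5) / 10
```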


gremlinpython - Apache TinkerPop Gremlin Client#

Overview#

gremlinpython is the official Python language variant (GLV) for Apache TinkerPop’s Gremlin graph traversal language. It provides a consistent API for interacting with any TinkerPop-enabled graph database, offering database portability through a standardized query language.

Key Information#

| Attribute | Value |
| --- | --- |
| Package | gremlinpython |
| Version | 3.7.x |
| Python Support | 3.10+ |
| Protocol | WebSocket (GraphBinary/GraphSON) |
| License | Apache 2.0 |
| Repository | github.com/apache/tinkerpop |

Supported Databases#

gremlinpython works with any TinkerPop-compliant database:

  • Amazon Neptune
  • Azure Cosmos DB (Gremlin API)
  • JanusGraph
  • OrientDB
  • Neo4j (with TinkerPop plugin)
  • TigerGraph (with TinkerPop connector)
  • DataStax Graph

Installation#

pip install gremlinpython

Connection Management#

Basic Connection#

from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

# Create traversal source
g = traversal().with_remote(
    DriverRemoteConnection('ws://localhost:8182/gremlin', 'g')
)

# With authentication
g = traversal().with_remote(
    DriverRemoteConnection(
        'wss://your-endpoint:8182/gremlin',
        'g',
        username='user',
        password='password'
    )
)

Connection Options#

from gremlin_python.driver.client import Client

# Lower-level client access
client = Client(
    'ws://localhost:8182/gremlin',
    'g',
    pool_size=8,           # Connection pool size
    max_workers=4,         # Thread pool workers
    message_serializer=None # Custom serializer
)

# Submit raw Gremlin
result = client.submit("g.V().count()")
for r in result:
    print(r)

Traversal Basics#

Creating Vertices#

from gremlin_python.process.traversal import T

# Add vertex
g.addV('person').property('name', 'Alice').property('age', 30).next()

# Add with explicit ID (requires the T token enum imported above)
g.addV('person').property(T.id, 'alice').property('name', 'Alice').next()

Creating Edges#

# Add edge between vertices
g.V().has('person', 'name', 'Alice').as_('a') \
     .V().has('person', 'name', 'Bob').as_('b') \
     .addE('knows').from_('a').to('b').property('since', 2020).next()

Reading Data#

# Get all vertices of type
people = g.V().hasLabel('person').toList()

# Get specific vertex
alice = g.V().has('person', 'name', 'Alice').next()

# Get vertex properties
props = g.V().has('person', 'name', 'Alice').valueMap().next()

Updating Data#

# Update property
g.V().has('person', 'name', 'Alice') \
     .property('age', 31).next()

# Add multiple properties
g.V().has('person', 'name', 'Alice') \
     .property('email', '[email protected]') \
     .property('city', 'NYC').next()

Deleting Data#

# Delete vertex (and connected edges)
g.V().has('person', 'name', 'Alice').drop().iterate()

# Delete edge
g.E().hasLabel('knows').drop().iterate()

Traversal Patterns#

Filtering#

from gremlin_python.process.traversal import P
from gremlin_python.process.graph_traversal import __  # anonymous traversals

# Has filters
g.V().has('person', 'age', P.gt(25)).toList()
g.V().has('person', 'name', P.within('Alice', 'Bob')).toList()

# Multiple conditions
g.V().hasLabel('person') \
     .has('age', P.gte(18)) \
     .has('age', P.lt(65)).toList()

# Not filter
g.V().hasLabel('person').not_(__.has('retired', True)).toList()

Traversing Relationships#

# Outgoing edges
friends = g.V().has('person', 'name', 'Alice').out('knows').toList()

# Incoming edges
followers = g.V().has('person', 'name', 'Alice').in_('follows').toList()

# Both directions
connections = g.V().has('person', 'name', 'Alice').both('knows').toList()

# Multiple hops
friends_of_friends = g.V().has('person', 'name', 'Alice') \
                          .out('knows').out('knows') \
                          .dedup().toList()

Path Queries#

# Get paths
paths = g.V().has('person', 'name', 'Alice') \
             .repeat(__.out('knows')).times(2) \
             .path().by('name').toList()

# Shortest path
path = g.V().has('person', 'name', 'Alice') \
            .repeat(__.out().simplePath()) \
            .until(__.has('person', 'name', 'Charlie')) \
            .path().limit(1).next()

Aggregation#

# Count
count = g.V().hasLabel('person').count().next()

# Group by
by_age = g.V().hasLabel('person') \
              .group().by('age').by(__.count()).next()

# Statistics
stats = g.V().hasLabel('person') \
             .values('age').fold() \
             .project('min', 'max', 'avg', 'count') \
             .by(__.min()) \
             .by(__.max()) \
             .by(__.mean()) \
             .by(__.count()).next()

Serialization#

GraphBinary#

from gremlin_python.driver.serializer import GraphBinarySerializersV1

g = traversal().with_remote(
    DriverRemoteConnection(
        'ws://localhost:8182/gremlin',
        'g',
        message_serializer=GraphBinarySerializersV1()
    )
)

GraphSON#

from gremlin_python.driver.serializer import GraphSONSerializersV3d0

g = traversal().with_remote(
    DriverRemoteConnection(
        'ws://localhost:8182/gremlin',
        'g',
        message_serializer=GraphSONSerializersV3d0()
    )
)

Transaction Support#

# Begin transaction
tx = g.tx()

# Get transaction-bound traversal
gtx = tx.begin()

try:
    gtx.addV('person').property('name', 'Alice').next()
    gtx.addV('person').property('name', 'Bob').next()
    tx.commit()
except Exception:
    tx.rollback()
    raise

Async Alternatives#

gremlinpython itself is synchronous. For async support, consider:

aiogremlin#

from aiogremlin import Cluster, Graph

cluster = await Cluster.open(hosts=['localhost'])
client = await cluster.connect()
g = Graph().traversal().withRemote(client)

# Async operations
result = await g.V().toList()

Goblin OGM#

from goblin import Goblin, Vertex, String

class Person(Vertex):
    name = String()

app = await Goblin.open(hosts=['localhost'])
session = await app.session()

person = Person(name='Alice')
session.add(person)
await session.flush()

gremlinpy (FastAPI compatible)#

from gremlinpy import Graph

g = Graph().traversal()
# Compatible with existing event loops

Traversal Strategies#

from gremlin_python.process.strategies import *

# Read-only strategy
g = g.withStrategies(ReadOnlyStrategy())

# Subgraph strategy (filter)
g = g.withStrategies(SubgraphStrategy(
    vertices=__.hasLabel('person'),
    edges=__.hasLabel('knows')
))

# Partition strategy
g = g.withStrategies(PartitionStrategy(
    partitionKey='region',
    writePartition='us-west'
))

Error Handling#

from gremlin_python.driver.protocol import GremlinServerError

try:
    result = g.V().has('invalid').next()
except GremlinServerError as e:
    print(f"Server error: {e}")
except StopIteration:
    print("No results found")

Connection Pooling Issues#

Known limitation (TINKERPOP-3114):

“Once a connection error occurred, pooled connections are broken and will not be recovered.”

Workaround:

# Probe the pooled connection before reuse
def get_connection():
    try:
        g.V().limit(1).next()  # cheap liveness check
        return g
    except Exception:
        # Pool is broken: rebuild the remote connection.
        # create_new_connection() is an application-defined factory.
        return create_new_connection()

Limitations#

  • No native Python asyncio (use aiogremlin/goblin)
  • Connection pool recovery issues
  • WebSocket-only protocol
  • Remote execution only (no embedded mode)
  • Reference-only objects from server (no full properties)
  • Significant memory overhead for large result sets

When to Use#

Choose gremlinpython when:

  • Database portability is important
  • Working with TinkerPop-compatible databases
  • Standard graph query language preferred
  • Amazon Neptune or Azure Cosmos DB target

Consider alternatives when:

  • Native async required (use aiogremlin/goblin)
  • Database-specific features needed
  • Maximum performance critical
  • OGM patterns preferred (use goblin)

Resources#


Neo4j Python Driver (neo4j)#

Overview#

The official Neo4j Python driver provides low-level, high-performance access to Neo4j databases using the Bolt protocol. Maintained by Neo4j Inc., it serves as the foundation for higher-level libraries like neomodel.

Key Information#

| Attribute | Value |
| --- | --- |
| Package | neo4j (formerly neo4j-driver) |
| Version | 6.0.x |
| Python Support | 3.10, 3.11, 3.12, 3.13, 3.14 |
| Protocol | Bolt 4.4, 5.0-5.8, 6.0 |
| License | Apache 2.0 |
| Repository | github.com/neo4j/neo4j-python-driver |

Installation#

pip install neo4j

# With optional Rust extensions for performance
pip install neo4j-rust-ext

Core Features#

Connection Management#

from neo4j import GraphDatabase

# Basic connection
driver = GraphDatabase.driver(
    "neo4j://localhost:7687",
    auth=("neo4j", "password")
)

# With context manager (recommended)
with GraphDatabase.driver(uri, auth=auth) as driver:
    driver.verify_connectivity()
    # Use driver...

Query Execution#

# Simple query execution (driver v5.5+)
records, summary, keys = driver.execute_query(
    "MATCH (p:Person {name: $name}) RETURN p",
    name="Alice",
    database_="neo4j"
)

# With routing control
from neo4j import RoutingControl

records, summary, keys = driver.execute_query(
    "MATCH (p:Person) RETURN p",
    routing_=RoutingControl.READ
)

Session-Based Transactions#

with driver.session(database="neo4j") as session:
    # Managed transaction (recommended - auto-retry)
    result = session.execute_read(
        lambda tx: tx.run("MATCH (p:Person) RETURN p").data()
    )

    # Write transaction
    session.execute_write(
        lambda tx: tx.run(
            "CREATE (p:Person {name: $name})",
            name="Bob"
        )
    )

Async Support#

Full async/await support mirrors the synchronous API:

from neo4j import AsyncGraphDatabase

async def main():
    async with AsyncGraphDatabase.driver(uri, auth=auth) as driver:
        async with driver.session() as session:
            result = await session.execute_read(
                lambda tx: tx.run("MATCH (n) RETURN n").data()
            )

Async Features#

  • AsyncDriver, AsyncSession, AsyncTransaction
  • AsyncResult with async iteration
  • Compatible with asyncio, FastAPI, aiohttp
  • Shares non-I/O components with sync implementation

Connection Pooling#

Configuration Options#

driver = GraphDatabase.driver(
    uri, auth=auth,
    max_connection_pool_size=100,        # Max connections per host
    connection_acquisition_timeout=60,    # Seconds to wait for connection
    connection_timeout=30,                # TCP connection timeout
    max_connection_lifetime=3600,         # Max age of pooled connection
    liveness_check_timeout=60             # Idle check threshold
)

Best Practices#

  • Create one driver instance per application
  • Driver objects are expensive to create (connection pool setup)
  • Sessions are lightweight - create/close as needed
  • Use context managers for automatic resource cleanup

Transaction Support#

Transaction Types#

  1. Auto-commit: Single statement, no retry

    session.run("CREATE (n:Node)")
  2. Managed Transactions: Recommended - includes retry logic

    session.execute_read(work_function)
    session.execute_write(work_function)
  3. Explicit Transactions: Manual control

    tx = session.begin_transaction()
    try:
        tx.run(query)
        tx.commit()
    except Exception:
        tx.rollback()
        raise

Causal Consistency#

# Bookmark management for causal chains
with driver.session(bookmarks=[bookmark]) as session:
    session.execute_write(work)
    new_bookmark = session.last_bookmark()

Error Handling#

from neo4j.exceptions import (
    ServiceUnavailable,
    TransientError,
    ClientError,
    DatabaseError
)

try:
    driver.execute_query(query)
except ServiceUnavailable:
    ...  # Connection/cluster issues
except TransientError:
    ...  # Retryable errors (deadlock, etc.)
except ClientError:
    ...  # Query syntax, constraint violations

Type System#

Neo4j to Python Type Mapping#

| Neo4j Type | Python Type |
| --- | --- |
| Integer | int |
| Float | float |
| String | str |
| Boolean | bool |
| List | list |
| Map | dict |
| Node | neo4j.graph.Node |
| Relationship | neo4j.graph.Relationship |
| Path | neo4j.graph.Path |
| Date | datetime.date |
| DateTime | datetime.datetime |
| Duration | neo4j.time.Duration |
| Point | neo4j.spatial.Point |

Performance Optimization#

Rust Extensions#

pip install neo4j-rust-ext
  • Drop-in replacement for default transport
  • Significant performance improvement for I/O-heavy workloads
  • No code changes required

Bulk Operations#

# Batch create with UNWIND
session.execute_write(lambda tx: tx.run(
    "UNWIND $batch AS row CREATE (n:Node {id: row.id, name: row.name})",
    batch=[{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]
))

Testing#

# Use testcontainers for integration tests
from testcontainers.neo4j import Neo4jContainer

with Neo4jContainer() as neo4j:
    driver = GraphDatabase.driver(
        neo4j.get_connection_url(),
        auth=("neo4j", neo4j.NEO4J_ADMIN_PASSWORD)
    )

Limitations#

  • No built-in ORM/OGM (use neomodel for that)
  • Cypher-only (no Gremlin support)
  • Manual schema management required
  • No migration tooling included

When to Use#

Choose neo4j-driver when:

  • Direct control over queries and transactions is needed
  • Performance is critical (with Rust extensions)
  • Building custom abstractions on top
  • Async support is required

Consider alternatives when:

  • OGM patterns would simplify development (use neomodel)
  • Multi-database portability is needed (use gremlinpython)

Resources#


neomodel - Neo4j Object Graph Mapper#

Overview#

neomodel is a Python Object Graph Mapper (OGM) for Neo4j that provides Django-style model definitions for graph data. It allows developers to work with graph data using Pythonic patterns without writing raw Cypher queries.

Key Information#

| Attribute | Value |
| --- | --- |
| Package | neomodel |
| Version | 6.0.x |
| Python Support | 3.8+ |
| Protocol | Bolt (via neo4j-driver) |
| License | MIT |
| Repository | github.com/neo4j-contrib/neomodel |
| Status | Neo4j Labs (actively maintained) |

Installation#

pip install neomodel

# With extras (includes Shapely for spatial data)
pip install neomodel[extras]

# With Rust driver extensions for performance
pip install neomodel[rust-driver-ext]

Configuration#

from neomodel import config

# Connection string
config.DATABASE_URL = 'bolt://neo4j:password@localhost:7687'

# Or using dataclass configuration (v6.0+)
from neomodel import NeomodelConfig

config = NeomodelConfig(
    driver_options={"max_connection_pool_size": 50},
    database="neo4j",
    auto_install_labels=True
)

Environment Variables#

NEO4J_BOLT_URL=bolt://neo4j:password@localhost:7687
NEO4J_DATABASE=neo4j

Model Definition#

Basic Node Definition#

from neomodel import (
    StructuredNode, StringProperty, IntegerProperty,
    UniqueIdProperty, RelationshipTo, RelationshipFrom
)

class Person(StructuredNode):
    uid = UniqueIdProperty()
    name = StringProperty(required=True)
    age = IntegerProperty(index=True)

    # Relationships
    friends = RelationshipTo('Person', 'FRIENDS_WITH')
    employer = RelationshipTo('Company', 'WORKS_AT')

Property Types#

from neomodel import (
    StringProperty,      # String values
    IntegerProperty,     # Integer values
    FloatProperty,       # Floating point
    BooleanProperty,     # Boolean
    DateProperty,        # datetime.date
    DateTimeProperty,    # datetime.datetime
    UniqueIdProperty,    # Auto-generated UUID
    ArrayProperty,       # Lists
    JSONProperty,        # JSON-serializable dicts
    PointProperty,       # Spatial data
)

Relationship Properties#

from neomodel import StructuredRel, DateTimeProperty

class WorkedAt(StructuredRel):
    start_date = DateTimeProperty()
    end_date = DateTimeProperty()
    role = StringProperty()

class Person(StructuredNode):
    name = StringProperty()
    employers = RelationshipTo('Company', 'WORKED_AT', model=WorkedAt)

CRUD Operations#

Create#

# Create single node
person = Person(name="Alice", age=30).save()

# Create with relationships
company = Company(name="Acme").save()
person.employer.connect(company)

Read#

# Get by property
alice = Person.nodes.get(name="Alice")

# Filter nodes
adults = Person.nodes.filter(age__gte=18)

# All nodes
all_people = Person.nodes.all()

# First match
first_person = Person.nodes.first()

Update#

person = Person.nodes.get(name="Alice")
person.age = 31
person.save()

Delete#

person = Person.nodes.get(name="Alice")
person.delete()

Query API#

Filtering#

# Comparison operators
Person.nodes.filter(age__gt=25)      # Greater than
Person.nodes.filter(age__gte=25)     # Greater or equal
Person.nodes.filter(age__lt=25)      # Less than
Person.nodes.filter(age__lte=25)     # Less or equal
Person.nodes.filter(name__ne="Bob")  # Not equal

# String operators
Person.nodes.filter(name__contains="ali")
Person.nodes.filter(name__startswith="A")
Person.nodes.filter(name__endswith="ce")
Person.nodes.filter(name__icontains="ALI")  # Case insensitive

# List operations
Person.nodes.filter(name__in=["Alice", "Bob"])

Traversal (v6.0+)#

# Advanced traversal with filtering and ordering
results = Person.nodes.filter(name="Alice").traverse(
    relation_type="FRIENDS_WITH",
    filter_expr={"age__gte": 18},
    order_by="age"
)

Raw Cypher#

from neomodel import db

results, meta = db.cypher_query(
    "MATCH (p:Person) WHERE p.age > $age RETURN p",
    {"age": 25}
)

Async Support#

from neomodel import adb, AsyncStructuredNode

class Person(AsyncStructuredNode):
    name = StringProperty()

async def main():
    # Async operations
    person = await Person(name="Alice").save()
    alice = await Person.nodes.get(name="Alice")
    await person.delete()

    # Async traversal
    friends = await alice.friends.all()

Async Configuration#

from neomodel import adb

await adb.set_connection("bolt://localhost:7687")

Schema Management#

Constraints and Indexes#

from neomodel import install_all_labels, install_labels

# Install all constraints and indexes
install_all_labels()

# Install for specific models
install_labels(Person)

Schema Definition#

class Person(StructuredNode):
    # Unique constraint
    email = StringProperty(unique_index=True)

    # Index only
    name = StringProperty(index=True)

    # Required (not null)
    created = DateTimeProperty(required=True)

Hooks#

class Person(StructuredNode):
    name = StringProperty()

    def pre_save(self):
        # Called before saving
        self.name = self.name.strip()

    def post_save(self):
        # Called after saving
        logging.getLogger(__name__).info(f"Saved {self.name}")  # requires: import logging

    def pre_delete(self):
        # Called before deletion
        pass

    def post_delete(self):
        # Called after deletion
        pass

Transaction Support#

from neomodel import db

# Context manager
with db.transaction:
    person = Person(name="Alice").save()
    company = Company(name="Acme").save()
    person.employer.connect(company)

# Explicit control
db.begin()
try:
    person = Person(name="Alice").save()
    db.commit()
except Exception:
    db.rollback()
    raise

Django Integration#

# settings.py
NEOMODEL_NEO4J_BOLT_URL = 'bolt://neo4j:password@localhost:7687'

# models.py
from django_neomodel import DjangoNode
from neomodel import StringProperty

class Person(DjangoNode):
    name = StringProperty()

    class Meta:
        app_label = 'myapp'

Vector and Full-Text Search (v6.0+)#

from neomodel import VectorIndex, FullTextIndex

class Document(StructuredNode):
    content = StringProperty()
    embedding = ArrayProperty()

    # Vector index for semantic search
    __vector_index__ = VectorIndex(
        property_name='embedding',
        dimensions=384
    )

    # Full-text index
    __fulltext_index__ = FullTextIndex(
        property_names=['content']
    )

Performance Considerations#

Batch Operations#

# Wrap bulk inserts in a single transaction to avoid per-node commits
from neomodel import db

with db.transaction:
    for data in large_dataset:
        Person(name=data['name']).save()

Connection Pooling#

Inherited from neo4j-driver configuration - set via driver_options in config.

Limitations#

  • Neo4j-specific (no multi-database portability)
  • No automatic migration tooling (schema drift possible)
  • OGM overhead vs. raw Cypher
  • Complex traversals may require raw Cypher

When to Use#

Choose neomodel when:

  • Django-like model patterns preferred
  • Type safety and validation important
  • Schema enforcement needed
  • Working primarily with Neo4j

Consider alternatives when:

  • Maximum performance required (use neo4j-driver)
  • Multi-database support needed (use gremlinpython)
  • Complex graph algorithms (use raw Cypher)

Resources#


py2neo (End of Life)#

Status: Archived#

IMPORTANT: py2neo is End of Life (EOL) as of 2023. No further updates will be released. Users should migrate to the official Neo4j Python driver.

The project has been transferred to Neo4j Inc. for archival purposes at neo4j-contrib/py2neo.

Overview#

py2neo was a comprehensive Neo4j client library and toolkit providing a high-level API, OGM capabilities, admin tools, and a Cypher lexer for Pygments. It supported both Bolt and HTTP protocols.

Key Information#

Attribute       Value
Package         py2neo
Final Version   2021.2
Python Support  3.x
Protocols       Bolt, HTTP
License         Apache 2.0
Repository      github.com/neo4j-contrib/py2neo (archived)

Historical Features#

Graph Object API#

from py2neo import Graph, Node, Relationship

# Connect to database
graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))

# Create nodes and relationships
alice = Node("Person", name="Alice")
bob = Node("Person", name="Bob")
knows = Relationship(alice, "KNOWS", bob)

# Merge to database
graph.merge(alice, "Person", "name")

OGM Capabilities#

from py2neo.ogm import GraphObject, Property, RelatedTo

class Person(GraphObject):
    __primarykey__ = "name"

    name = Property()
    born = Property()
    friends = RelatedTo("Person", "KNOWS")

# Usage
person = Person()
person.name = "Alice"
graph.push(person)

Cypher Execution#

# Direct Cypher queries
results = graph.run(
    "MATCH (p:Person {name: $name}) RETURN p",
    name="Alice"
)

for record in results:
    print(record["p"])

Batch Operations#

from py2neo import Graph

tx = graph.begin()
for i in range(1000):
    tx.create(Node("Item", id=i))
    if (i + 1) % 100 == 0:
        tx.commit()
        tx = graph.begin()
tx.commit()

Migration Path#

For low-level access, migrate to the official Neo4j Python driver:

# py2neo (old)
from py2neo import Graph
graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))
result = graph.run("MATCH (n) RETURN n")

# neo4j-driver (new)
from neo4j import GraphDatabase
driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    result = session.run("MATCH (n) RETURN n")

For OGM functionality, migrate to neomodel:

# py2neo OGM (old)
from py2neo.ogm import GraphObject, Property

class Person(GraphObject):
    name = Property()

# neomodel (new)
from neomodel import StructuredNode, StringProperty

class Person(StructuredNode):
    name = StringProperty()

Why py2neo Was Deprecated#

  1. Maintenance Burden: Single maintainer model not sustainable
  2. Official Driver Improvements: Neo4j’s official driver matured significantly
  3. Community Fragmentation: Multiple overlapping libraries caused confusion
  4. Compatibility Challenges: Keeping up with Neo4j versions became difficult

Historical Strengths#

  • Clean, Pythonic API
  • Built-in OGM functionality
  • Cypher lexer for syntax highlighting
  • HTTP fallback when Bolt unavailable
  • Comprehensive documentation

Historical Limitations#

  • Single maintainer created bus factor risk
  • Calendar versioning led to breaking changes
  • No async support
  • Performance overhead vs. official driver
  • Infrequent updates in later years

Lessons for Library Selection#

The py2neo deprecation offers lessons for evaluating graph database libraries:

  1. Prefer Official Drivers: Better long-term support guarantees
  2. Check Maintainer Count: Multiple maintainers reduce abandonment risk
  3. Evaluate Release Frequency: Regular releases indicate active maintenance
  4. Consider Corporate Backing: Libraries backed by database vendors more stable

Current Alternatives#

Need               Recommended Library
Low-level access   neo4j-driver
OGM functionality  neomodel
Async support      neo4j-driver (async)
Bulk operations    neo4j-driver + UNWIND
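The UNWIND pattern referenced for bulk operations sends one parameterized query carrying a list of rows instead of one query per row. A sketch of the client-side batching half (the Cypher string is illustrative; driver and session setup are omitted):

```python
def chunked(rows, size):
    """Yield fixed-size slices of rows for batched UNWIND statements."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

# One round trip creates a whole chunk of nodes server-side
UNWIND_QUERY = "UNWIND $rows AS row CREATE (:Person {name: row.name})"

rows = [{"name": f"User{i}"} for i in range(2500)]
batches = list(chunked(rows, 1000))
# With the official driver, each batch would be sent as:
#     session.run(UNWIND_QUERY, rows=batch)
```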

Resources (Archival)#


pydgraph - Dgraph Python Client#

Overview#

pydgraph is the official Python client for Dgraph, a distributed, horizontally scalable graph database. It uses gRPC for high-performance communication and supports Dgraph’s GraphQL-like query language (DQL, formerly GraphQL+-).

Key Information#

Attribute       Value
Package         pydgraph
Version         24.x / 25.x
Python Support  3.7+
Protocol        gRPC
License         Apache 2.0
Repository      github.com/hypermodeinc/pydgraph

Version Compatibility#

Dgraph Version  pydgraph Version
21.03.x         21.03.x
23.0.x          23.0.x
24.0.x          24.0.x
25.0.x          25.0.x

Installation#

pip install pydgraph

Connection Management#

Basic Connection#

import pydgraph

# Create client stub
stub = pydgraph.DgraphClientStub('localhost:9080')

# Create client
client = pydgraph.DgraphClient(stub)

# Close when done
stub.close()

Multiple Stubs (Cluster)#

# Connect to multiple cluster nodes
stub1 = pydgraph.DgraphClientStub('node1:9080')
stub2 = pydgraph.DgraphClientStub('node2:9080')
stub3 = pydgraph.DgraphClientStub('node3:9080')

client = pydgraph.DgraphClient(stub1, stub2, stub3)

Connection Strings#

# Using a connection string (pydgraph v24.2+)
client = pydgraph.open("dgraph://user:pass@host:9080")

# Dgraph Cloud
stub = pydgraph.DgraphClientStub.from_cloud(
    "https://your-instance.cloud.dgraph.io/graphql",
    "your-api-key"
)
client = pydgraph.DgraphClient(stub)

TLS Configuration#

import grpc

# Load credentials
with open('ca.crt', 'rb') as f:
    ca_cert = f.read()

credentials = grpc.ssl_channel_credentials(ca_cert)

stub = pydgraph.DgraphClientStub(
    'localhost:9080',
    credentials=credentials
)

Schema Management#

Alter Schema#

schema = """
    name: string @index(exact) .
    age: int @index(int) .
    email: string @index(hash) @upsert .
    friends: [uid] .

    type Person {
        name
        age
        email
        friends
    }
"""

client.alter(pydgraph.Operation(schema=schema))

Drop Operations#

# Drop all data and schema
client.alter(pydgraph.Operation(drop_all=True))

# Drop specific predicate
client.alter(pydgraph.Operation(drop_attr='name'))

# Drop specific type
client.alter(pydgraph.Operation(drop_op=pydgraph.Operation.TYPE, drop_value='Person'))

Transaction Types#

Read-Write Transaction#

txn = client.txn()
try:
    # Mutations and queries
    response = txn.mutate(set_nquads='_:alice <name> "Alice" .')
    txn.commit()
finally:
    txn.discard()

Read-Only Transaction#

txn = client.txn(read_only=True)
try:
    response = txn.query(query_string)
finally:
    txn.discard()

Best-Effort Transaction#

# For stale reads (better performance)
txn = client.txn(read_only=True, best_effort=True)

Mutations#

JSON Mutations#

import json

txn = client.txn()
try:
    data = {
        'uid': '_:alice',
        'dgraph.type': 'Person',
        'name': 'Alice',
        'age': 30,
        'friends': [
            {'uid': '_:bob', 'dgraph.type': 'Person', 'name': 'Bob'}
        ]
    }

    response = txn.mutate(set_obj=data)

    # Get assigned UIDs
    alice_uid = response.uids['alice']
    bob_uid = response.uids['bob']

    txn.commit()
finally:
    txn.discard()

N-Quads Mutations#

txn = client.txn()
try:
    nquads = """
        _:alice <dgraph.type> "Person" .
        _:alice <name> "Alice" .
        _:alice <age> "30"^^<xs:int> .
    """
    response = txn.mutate(set_nquads=nquads)
    txn.commit()
finally:
    txn.discard()

Delete Mutations#

txn = client.txn()
try:
    # Delete specific predicate
    txn.mutate(del_nquads=f'<{uid}> <name> * .')

    # Delete node completely
    txn.mutate(del_obj={'uid': uid})

    txn.commit()
finally:
    txn.discard()

Queries (DQL)#

Basic Query#

query = """
    {
        people(func: type(Person)) {
            uid
            name
            age
            friends {
                name
            }
        }
    }
"""

response = client.txn(read_only=True).query(query)
result = json.loads(response.json)
people = result['people']

Parameterized Query#

query = """
    query findPerson($name: string) {
        person(func: eq(name, $name)) {
            uid
            name
            age
        }
    }
"""

variables = {'$name': 'Alice'}
response = client.txn(read_only=True).query(query, variables=variables)

Aggregation Queries#

query = """
    {
        stats(func: type(Person)) {
            count: count(uid)
            avgAge: avg(age)
            minAge: min(age)
            maxAge: max(age)
        }
    }
"""

Upsert Operations#

Basic Upsert#

query = """
    query {
        user as var(func: eq(email, "[email protected]"))
    }
"""

mutation = """
    uid(user) <name> "Alice" .
    uid(user) <email> "[email protected]" .
"""

txn = client.txn()
try:
    request = txn.create_request(
        query=query,
        mutations=[pydgraph.Mutation(set_nquads=mutation)]
    )
    response = txn.do_request(request)
    txn.commit()
finally:
    txn.discard()

Conditional Upsert#

query = """
    query {
        user as var(func: eq(email, "[email protected]"))
    }
"""

# Only mutate if user doesn't exist
mutation = pydgraph.Mutation(
    set_nquads='uid(user) <name> "Alice" .',
    cond='@if(eq(len(user), 0))'
)

Async Operations#

pydgraph provides async variants using gRPC futures:

async_alter#

future = client.async_alter(pydgraph.Operation(schema=schema))

# Handle result
try:
    result = pydgraph.DgraphClient.handle_alter_future(future)
except Exception as e:
    if pydgraph.util.is_jwt_expired(e):
        # Refresh token and retry
        pass

async_query and async_mutation#

txn = client.txn()

# Async query
query_future = txn.async_query(query_string)
result = pydgraph.Txn.handle_query_future(query_future)

# Async mutation
mutation_future = txn.async_mutation(set_obj=data)
result = pydgraph.Txn.handle_mutate_future(mutation_future)

Note: Async methods use gRPC futures, not native Python asyncio. They cannot retry on JWT expiration.

ACL and Authentication#

Login#

# Login with credentials
client.login("groot", "password")

# Login with namespace (multi-tenancy)
client.login_into_namespace("user", "password", namespace=1)

Refresh Token#

# Tokens expire - refresh periodically
client.retry_login()

Error Handling#

import grpc

try:
    response = txn.mutate(set_obj=data)
    txn.commit()
except grpc.RpcError as e:
    if e.code() == grpc.StatusCode.ABORTED:
        # Transaction conflict - retry
        pass
    elif e.code() == grpc.StatusCode.UNAUTHENTICATED:
        # JWT expired
        client.retry_login()
except pydgraph.AbortedError:
    # Transaction was aborted due to a conflict - safe to retry
    pass
finally:
    txn.discard()
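Since ABORTED errors are expected under write contention, callers usually wrap the mutate-and-commit logic in a bounded retry loop. A generic sketch (run_with_retries and its arguments are illustrative helpers, not pydgraph APIs):

```python
import random
import time

def run_with_retries(operation, is_retryable, max_attempts=5, base_delay=0.05):
    """Call operation(), retrying with exponential backoff on retryable errors."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as exc:
            if not is_retryable(exc) or attempt == max_attempts - 1:
                raise
            # Exponential backoff with jitter spreads out competing writers
            time.sleep(base_delay * (2 ** attempt) * (0.5 + random.random()))
```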

Performance Considerations#

Batch Mutations#

# Commit in batches to bound transaction size and memory
BATCH_SIZE = 1000
batch = []

def flush(batch):
    txn = client.txn()
    try:
        txn.mutate(set_obj=batch)
        txn.commit()
    finally:
        txn.discard()

for item in large_dataset:
    batch.append(item)
    if len(batch) >= BATCH_SIZE:
        flush(batch)
        batch = []

if batch:
    flush(batch)

Connection Reuse#

# Reuse client and stubs across requests
# Create once at application startup

Limitations#

  • gRPC futures not native asyncio
  • No connection pooling (manage stubs manually)
  • No OGM layer included
  • DQL learning curve (different from Cypher/Gremlin)
  • Limited IDE support for DQL
  • No built-in migration tooling

When to Use#

Choose pydgraph when:

  • Distributed, horizontally scalable graph needed
  • GraphQL-native development preferred
  • Multi-tenancy (namespaces) required
  • Integration with Dgraph Cloud
  • High-write throughput scenarios

Consider alternatives when:

  • Native asyncio critical
  • OGM patterns preferred
  • Cypher or Gremlin expertise exists
  • Smaller scale deployments

Resources#


python-arango - ArangoDB Python Driver#

Overview#

python-arango is the official Python driver for ArangoDB, providing comprehensive access to ArangoDB’s multi-model capabilities including document, graph, and key-value operations. It offers a Pythonic interface to ArangoDB’s REST API.

Key Information#

Attribute         Value
Package           python-arango
Version           8.2.x
Python Support    3.9+
ArangoDB Support  3.11+
Protocol          HTTP REST
License           MIT
Repository        github.com/arangodb/python-arango

Installation#

pip install python-arango

# For async support (separate package)
pip install python-arango-async

Connection Management#

Basic Connection#

from arango import ArangoClient

# Initialize client
client = ArangoClient(hosts="http://localhost:8529")

# Connect to database
db = client.db("mydb", username="root", password="password")

# System database for admin operations
sys_db = client.db("_system", username="root", password="password")

Connection Options#

client = ArangoClient(
    hosts="http://localhost:8529",
    http_client=None,          # Custom HTTP client
    serializer=None,           # Custom JSON serializer
    deserializer=None          # Custom JSON deserializer
)

Document Operations#

Basic CRUD#

# Get collection
collection = db.collection("users")

# Insert
metadata = collection.insert({"name": "Alice", "age": 30})
# Returns: {'_id': 'users/12345', '_key': '12345', '_rev': '_abc123'}

# Get by key
doc = collection.get("12345")

# Update
collection.update({"_key": "12345", "age": 31})

# Replace
collection.replace({"_key": "12345", "name": "Alice", "age": 31})

# Delete
collection.delete("12345")

Batch Operations#

# Batch insert
docs = [{"name": f"User{i}"} for i in range(1000)]
results = collection.insert_many(docs)

# Batch update
updates = [{"_key": key, "status": "active"} for key in keys]
collection.update_many(updates)

# Batch delete
collection.delete_many([{"_key": k} for k in keys])

AQL Queries#

Query Execution#

# Simple query
cursor = db.aql.execute(
    "FOR doc IN users FILTER doc.age > @min_age RETURN doc",
    bind_vars={"min_age": 25}
)

# Iterate results
for doc in cursor:
    print(doc)

# Get all results as a list (batch() returns only the current batch)
results = list(cursor)

Query Options#

cursor = db.aql.execute(
    query,
    bind_vars={"param": value},
    count=True,           # Include count
    batch_size=100,       # Results per batch
    ttl=3600,             # Cursor TTL in seconds
    max_runtime=30.0,     # Max execution time
    profile=True          # Include query profile
)

# Access statistics
print(cursor.statistics())
print(cursor.profile())

Graph Operations#

Graph Management#

# Create graph
graph = db.create_graph(
    "social",
    edge_definitions=[{
        "edge_collection": "knows",
        "from_vertex_collections": ["users"],
        "to_vertex_collections": ["users"]
    }]
)

# Get existing graph
graph = db.graph("social")

Vertex Operations#

# Get vertex collection
users = graph.vertex_collection("users")

# Insert vertex
users.insert({"_key": "alice", "name": "Alice"})

# Get vertex
alice = users.get("alice")

# Update vertex
users.update({"_key": "alice", "age": 30})

Edge Operations#

# Get edge collection
knows = graph.edge_collection("knows")

# Insert edge
knows.insert({
    "_from": "users/alice",
    "_to": "users/bob",
    "since": 2020
})

# Traverse with AQL (the dedicated traversal endpoint was removed in ArangoDB 3.12)
cursor = db.aql.execute(
    """
    FOR v IN 1..2 OUTBOUND 'users/alice' GRAPH 'social'
        RETURN v
    """
)

Async Support#

python-arango-async (Separate Package)#

from arangoasync import ArangoClient
from arangoasync.auth import Auth

async with ArangoClient(hosts="http://localhost:8529") as client:
    auth = Auth(username="root", password="password")
    db = await client.db("mydb", auth=auth)

    # Async operations
    collection = db.collection("users")
    await collection.insert({"name": "Alice"})

    cursor = await db.aql.execute("FOR doc IN users RETURN doc")
    async for doc in cursor:
        print(doc)

Fire-and-Forget Async (python-arango)#

# Create async execution context
async_db = db.begin_async_execution(return_result=True)

# Queue operations
job1 = async_db.collection("users").insert({"name": "Alice"})
job2 = async_db.collection("users").insert({"name": "Bob"})

# Check job status
print(job1.status())  # 'pending', 'done', or 'error'

# Get results when ready
result1 = job1.result()
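Because these jobs are polled rather than awaited, callers typically loop on status() with a deadline before fetching the result. A generic sketch (wait_for_job is not part of python-arango; it works with any object exposing status() and result()):

```python
import time

def wait_for_job(job, timeout=30.0, interval=0.5):
    """Poll an async job exposing status()/result() until it finishes."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if job.status() == "done":
            return job.result()
        time.sleep(interval)
    raise TimeoutError("async job did not finish in time")
```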

Transaction Support#

Stream Transactions#

# Stream transaction (multiple requests, one atomic commit)
txn_db = db.begin_transaction(
    read=["users"],
    write=["orders"]
)

try:
    txn_db.collection("users").insert({"name": "Alice"})
    txn_db.collection("orders").insert({"item": "Book"})
    txn_db.commit_transaction()
except Exception:
    txn_db.abort_transaction()
    raise

JavaScript Transactions#

# Server-side transaction with JavaScript
result = db.execute_transaction(
    read=["users"],
    write=["orders"],
    command="""
        function(params) {
            const db = require('@arangodb').db;
            const user = db.users.insert({name: params.name});
            return user;
        }
    """,
    params={"name": "Alice"}
)

Index Management#

# Create persistent index
collection.add_persistent_index(
    fields=["name", "email"],
    unique=True,
    sparse=False
)

# Create geo index
collection.add_geo_index(
    fields=["location"],
    geo_json=True
)

# Create fulltext index (deprecated, use ArangoSearch)
collection.add_fulltext_index(
    fields=["description"],
    min_length=3
)

# List indexes
for index in collection.indexes():
    print(index)

ArangoSearch Views#

# Create search view
db.create_arangosearch_view(
    name="users_view",
    properties={
        "links": {
            "users": {
                "analyzers": ["text_en"],
                "fields": {
                    "name": {},
                    "bio": {"analyzers": ["text_en"]}
                }
            }
        }
    }
)

# Search query
cursor = db.aql.execute("""
    FOR doc IN users_view
    SEARCH ANALYZER(doc.bio == "developer", "text_en")
    RETURN doc
""")

Error Handling#

from arango.exceptions import (
    ArangoError,
    DocumentInsertError,
    DocumentGetError,
    AQLQueryExecuteError,
    TransactionAbortError
)

try:
    collection.insert({"_key": "duplicate"})
except DocumentInsertError as e:
    print(f"Error code: {e.error_code}")
    print(f"HTTP status: {e.http_code}")
    print(f"Message: {e.error_message}")
except ArangoError as e:
    # Generic error handling
    pass

Foxx Microservices#

# Install Foxx service
db.foxx.install(
    mount="/myapp",
    source="https://github.com/user/foxx-service/archive/main.zip"
)

# Call Foxx endpoint
response = db.foxx.request(
    method="POST",
    mount="/myapp",
    path="/api/endpoint",
    data={"param": "value"}
)

Cluster Support#

# Cluster health
health = sys_db.cluster.server_health()

# Cluster statistics
stats = sys_db.cluster.statistics()

# Rebalance shards
sys_db.cluster.rebalance_shards()

Limitations#

  • No native Python asyncio in main package (use python-arango-async)
  • No OGM layer (document-centric design)
  • HTTP protocol only (no binary protocol)
  • Fire-and-forget async differs from true async

When to Use#

Choose python-arango when:

  • Multi-model database needed (document + graph + key-value)
  • AQL query language preferred
  • Microservice architecture (Foxx)
  • Horizontal scaling required

Consider alternatives when:

  • Pure graph database needed (use Neo4j)
  • Native asyncio critical (use python-arango-async)
  • Gremlin compatibility needed (use gremlinpython)

Resources#


pyTigerGraph - TigerGraph Python Client#

Overview#

pyTigerGraph is the official Python package for interacting with TigerGraph databases. It provides comprehensive access to TigerGraph’s graph analytics and machine learning capabilities, with special emphasis on Graph Data Science (GDS) workflows.

Key Information#

Attribute       Value
Package         pyTigerGraph
Version         1.6.x
Python Support  3.7+
Protocol        REST API
License         Apache 2.0
Repository      github.com/tigergraph/pyTigerGraph

Installation#

# Core package
pip install pyTigerGraph

# With Graph Data Science features
pip install 'pyTigerGraph[gds]'

Connection Management#

Basic Connection#

import pyTigerGraph as tg

conn = tg.TigerGraphConnection(
    host="https://your-instance.i.tgcloud.io",
    graphname="MyGraph",
    username="tigergraph",
    password="password"
)

# Generate API token
conn.getToken(conn.createSecret())

TigerGraph Cloud#

conn = tg.TigerGraphConnection(
    host="https://your-instance.i.tgcloud.io",
    graphname="MyGraph",
    apiToken="your-api-token"
)

Async Connection#

from pyTigerGraph import AsyncTigerGraphConnection

async_conn = AsyncTigerGraphConnection(
    host="https://your-instance.i.tgcloud.io",
    graphname="MyGraph",
    username="tigergraph",
    password="password"
)

Schema Management#

Object-Oriented Schema (v1.5+)#

# Define vertex types
person = conn.gds.vertexType("Person", [
    ("name", "STRING"),
    ("age", "INT"),
    ("email", "STRING")
])

# Define edge types
knows = conn.gds.edgeType("KNOWS",
    from_vertex="Person",
    to_vertex="Person",
    attributes=[
        ("since", "DATETIME"),
        ("strength", "FLOAT")
    ]
)

# Create graph from schema
conn.gds.createGraph("SocialNetwork", [person], [knows])

GSQL Schema Definition#

# Execute GSQL directly
conn.gsql("""
    CREATE VERTEX Person (
        PRIMARY_ID id STRING,
        name STRING,
        age INT
    )

    CREATE DIRECTED EDGE KNOWS (
        FROM Person,
        TO Person,
        since DATETIME
    )
""")

Data Operations#

Vertex Operations#

# Upsert vertex (insert or update)
conn.upsertVertex(
    vertexType="Person",
    vertexId="alice",
    attributes={"name": "Alice", "age": 30}
)

# Get vertex by ID
vertex = conn.getVerticesById(
    vertexType="Person",
    vertexIds="alice"
)

# Delete vertex by ID
conn.delVerticesById(
    vertexType="Person",
    vertexIds="alice"
)

Edge Operations#

# Upsert edge
conn.upsertEdge(
    sourceVertexType="Person",
    sourceVertexId="alice",
    edgeType="KNOWS",
    targetVertexType="Person",
    targetVertexId="bob",
    attributes={"since": "2020-01-01"}
)

# Get edges
edges = conn.getEdges(
    sourceVertexType="Person",
    sourceVertexId="alice",
    edgeType="KNOWS"
)

Bulk Operations#

# Bulk upsert vertices: (id, attributes) tuples
vertices = [
    ("alice", {"name": "Alice", "age": 30}),
    ("bob", {"name": "Bob", "age": 25}),
    ("charlie", {"name": "Charlie", "age": 35})
]
conn.upsertVertices("Person", vertices)

# Bulk upsert edges: (source_id, target_id, attributes) tuples
edges = [
    ("alice", "bob", {"since": "2020-01-01"})
]
conn.upsertEdges("Person", "KNOWS", "Person", edges)

GSQL Queries#

Installed Queries#

# Install query
conn.gsql("""
    CREATE QUERY findFriends(VERTEX<Person> p) FOR GRAPH MyGraph {
        Start = {p};
        Friends = SELECT t
                  FROM Start:s -(KNOWS)-> Person:t;
        PRINT Friends;
    }
    INSTALL QUERY findFriends
""")

# Run installed query
result = conn.runInstalledQuery("findFriends", {"p": "alice"})

# Async query execution
job_id = conn.runInstalledQuery("longQuery", params={}, runAsync=True)
status = conn.checkQueryStatus([job_id])

Interpreted Queries#

# Run query without installing
result = conn.gsql("""
    INTERPRET QUERY () FOR GRAPH MyGraph {
        Persons = SELECT p FROM Person:p;
        PRINT Persons;
    }
""")

Query Metadata#

# Get query information
metadata = conn.getQueryMetadata("findFriends")

# List running queries
running = conn.getRunningQueries()

# Abort query
conn.abortQuery(["query_id_1", "query_id_2"])

Graph Data Science#

Feature Engineering (with GDS package)#

# Create a featurizer and install GDS algorithms
f = conn.gds.featurizer()
f.installAlgorithm("tg_pagerank")

# Run PageRank
result = f.runAlgorithm(
    "tg_pagerank",
    params={"v_type": "Person", "e_type": "KNOWS"}
)

# Community detection
result = f.runAlgorithm(
    "tg_louvain",
    params={"v_type": "Person", "e_type": "KNOWS"}
)

Graph Neural Networks#

# PyTorch Geometric data loader
from torch_geometric.loader import NeighborLoader

# Create graph data
data = conn.gds.getVertexDataFrame("Person")

# Vertex feature extraction
features = conn.gds.featurizer.extractVertexFeatures(
    v_type="Person",
    attributes=["age", "degree"]
)

# Edge feature extraction
edge_features = conn.gds.featurizer.extractEdgeFeatures(
    e_type="KNOWS",
    attributes=["strength"]
)

Train/Test Split#

# Split vertices for ML
conn.gds.vertexSplitter(
    v_types=["Person"],
    train_fraction=0.8,
    validate_fraction=0.1,
    test_fraction=0.1
)

Error Handling#

from pyTigerGraph.pyTigerGraphException import TigerGraphException

try:
    result = conn.runInstalledQuery("nonexistent")
except TigerGraphException as e:
    print(f"Error: {e}")

Authentication#

Token Management#

# Create secret
secret = conn.createSecret()

# Get token with lifetime
token = conn.getToken(secret, lifetime=86400)  # 24 hours

# Refresh token
new_token = conn.refreshToken(secret)

Role-Based Access#

# With RBAC
conn = tg.TigerGraphConnection(
    host="https://your-instance.i.tgcloud.io",
    graphname="MyGraph",
    username="analyst",
    password="password"
)

Performance Considerations#

Caveats#

From official documentation:

“pyTigerGraph may perform slower than direct HTTP requests to the TigerGraph REST API due to its feature-rich abstraction layer adding URL setup, logging, authentication, and validation.”

Optimization Tips#

# Use bulk operations for large datasets
conn.upsertVertices("Person", large_list, atomic=False)

# Disable unnecessary logging
import logging
logging.getLogger("pyTigerGraph").setLevel(logging.WARNING)

# Use async for long-running queries
job_id = conn.runInstalledQuery("heavyQuery", runAsync=True)

Limitations#

  • REST-only protocol (higher latency than binary protocols)
  • Performance overhead from abstraction layer
  • GSQL learning curve for complex queries
  • Less mature ecosystem than Neo4j/ArangoDB
  • GDS features require additional package

When to Use#

Choose pyTigerGraph when:

  • Graph analytics and ML are primary use cases
  • Large-scale graph processing needed
  • GSQL expertise available
  • TigerGraph Cloud deployment
  • Integration with PyTorch Geometric or DGL needed

Consider alternatives when:

  • Simple CRUD operations primary use case
  • Low latency critical (consider direct REST)
  • Multi-database portability needed
  • Smaller graphs with simpler requirements

Resources#


rdflib - RDF Graph Library for Python#

Overview#

rdflib is a pure Python library for working with RDF (Resource Description Framework) data. It provides comprehensive support for parsing, serializing, and querying RDF graphs using SPARQL, making it the standard choice for semantic web and linked data applications in Python.

Key Information#

Attribute       Value
Package         rdflib
Version         7.2.x
Python Support  3.8+
Query Language  SPARQL 1.1
License         BSD-3-Clause
Repository      github.com/RDFLib/rdflib

Installation#

pip install rdflib

# With optional dependencies
pip install rdflib[html,lxml]

Core Concepts#

RDF Triples#

RDF data consists of triples: (subject, predicate, object)

from rdflib import Graph, Literal, URIRef, Namespace
from rdflib.namespace import RDF, FOAF, XSD

# Create graph
g = Graph()

# Define namespace
EX = Namespace("http://example.org/")

# Add triple
g.add((
    EX.alice,                           # Subject
    FOAF.name,                          # Predicate
    Literal("Alice", datatype=XSD.string)  # Object
))

Node Types#

from rdflib import URIRef, Literal, BNode

# URI Reference (resources)
person = URIRef("http://example.org/alice")

# Literal (values)
name = Literal("Alice")
age = Literal(30, datatype=XSD.integer)
name_en = Literal("Alice", lang="en")

# Blank Node (anonymous)
address = BNode()

Graph Operations#

Creating and Populating Graphs#

from rdflib import Graph, Literal, URIRef, Namespace
from rdflib.namespace import RDF, FOAF

EX = Namespace("http://example.org/")

g = Graph()

# Bind namespace prefixes
g.bind("foaf", FOAF)
g.bind("ex", EX)

# Add triples
alice = URIRef("http://example.org/alice")
bob = URIRef("http://example.org/bob")

g.add((alice, RDF.type, FOAF.Person))
g.add((alice, FOAF.name, Literal("Alice")))
g.add((alice, FOAF.knows, bob))

g.add((bob, RDF.type, FOAF.Person))
g.add((bob, FOAF.name, Literal("Bob")))

Querying Triples#

# All triples
for s, p, o in g:
    print(s, p, o)

# Specific patterns
for person in g.subjects(RDF.type, FOAF.Person):
    name = g.value(person, FOAF.name)
    print(f"{person}: {name}")

# Check existence
if (alice, FOAF.knows, bob) in g:
    print("Alice knows Bob")

Removing Triples#

# Remove specific triple
g.remove((alice, FOAF.knows, bob))

# Remove by pattern (None = wildcard)
g.remove((alice, None, None))  # Remove all triples about Alice

SPARQL Queries#

SELECT Queries#

query = """
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>

    SELECT ?name ?friend
    WHERE {
        ?person a foaf:Person ;
                foaf:name ?name ;
                foaf:knows ?friendUri .
        ?friendUri foaf:name ?friend .
    }
"""

for row in g.query(query):
    print(f"{row.name} knows {row.friend}")

ASK Queries#

query = """
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>

    ASK {
        ?person foaf:name "Alice" .
    }
"""

result = g.query(query)
print(bool(result))  # True or False

CONSTRUCT Queries#

query = """
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX ex: <http://example.org/>

    CONSTRUCT {
        ?person ex:displayName ?name .
    }
    WHERE {
        ?person foaf:name ?name .
    }
"""

result_graph = g.query(query).graph

Parameterized Queries#

from rdflib.plugins.sparql import prepareQuery

query = prepareQuery("""
    SELECT ?name
    WHERE {
        ?person foaf:name ?name .
    }
""", initNs={"foaf": FOAF})

# With initial bindings
results = g.query(
    query,
    initBindings={'person': alice}
)

SPARQL Update#

update = """
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>

    INSERT DATA {
        <http://example.org/charlie> a foaf:Person ;
            foaf:name "Charlie" .
    }
"""

g.update(update)

Serialization#

Parsing RDF#

# Parse from file
g.parse("data.ttl", format="turtle")

# Parse from URL
g.parse("http://example.org/data.rdf")

# Parse from string
g.parse(data=rdf_string, format="turtle")

# Supported formats
formats = ["xml", "turtle", "n3", "nt", "nquads", "trig", "json-ld"]

Serializing RDF#

# Serialize to string
turtle = g.serialize(format="turtle")
jsonld = g.serialize(format="json-ld")

# Serialize to file
g.serialize("output.ttl", format="turtle")

# Available formats
# RDF/XML, N3, NTriples, N-Quads, Turtle, TriG, TriX, JSON-LD, HexTuples

Persistence#

In-Memory (Default)#

g = Graph()  # Default in-memory store

Berkeley DB#

from rdflib import Graph
from rdflib.plugins.stores import BerkeleyDB

store = BerkeleyDB()
g = Graph(store, identifier="mygraph")
g.open("/path/to/store", create=True)

# Use graph...

g.close()

SQLite (via rdflib-sqlalchemy)#

# pip install rdflib-sqlalchemy
from rdflib import Graph
from rdflib_sqlalchemy import registerplugins

registerplugins()

g = Graph(store="SQLAlchemy", identifier="mygraph")
g.open("sqlite:///graph.db", create=True)

Remote SPARQL Endpoints#

SPARQLWrapper#

# pip install sparqlwrapper
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery("""
    SELECT ?label
    WHERE {
        <http://dbpedia.org/resource/Python_(programming_language)>
            rdfs:label ?label .
        FILTER (lang(?label) = 'en')
    }
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()

Federated Queries#

query = """
    SELECT ?name ?abstract
    WHERE {
        ?person foaf:name ?name .
        SERVICE <http://dbpedia.org/sparql> {
            ?dbperson rdfs:label ?name ;
                      dbo:abstract ?abstract .
            FILTER (lang(?abstract) = 'en')
        }
    }
"""

Named Graphs and Datasets#

Conjunctive Graph#

from rdflib import ConjunctiveGraph, URIRef

# Dataset with multiple named graphs
ds = ConjunctiveGraph()

# Add to specific graph
graph1 = URIRef("http://example.org/graph1")
ds.add((alice, FOAF.name, Literal("Alice"), graph1))

# Query across graphs
for ctx in ds.contexts():
    print(f"Graph: {ctx.identifier}")

Dataset#

from rdflib import Dataset

ds = Dataset()
g1 = ds.graph(URIRef("http://example.org/graph1"))
g1.add((alice, FOAF.name, Literal("Alice")))

Custom SPARQL Functions#

from rdflib.plugins.sparql.operators import register_custom_function
from rdflib import Literal, URIRef

def custom_uppercase(value):
    return Literal(str(value).upper())

# Register function
register_custom_function(
    URIRef("http://example.org/uppercase"),
    custom_uppercase
)

# Use in query (prefixes must be declared)
query = """
    PREFIX ex: <http://example.org/>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT (ex:uppercase(?name) AS ?upper)
    WHERE { ?person foaf:name ?name }
"""

Async Support#

rdflib is synchronous. For async operations:

import asyncio
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor()

async def async_query(graph, query):
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(
        executor,
        lambda: list(graph.query(query))
    )

# Usage: results = asyncio.run(async_query(g, "SELECT * WHERE { ?s ?p ?o }"))

Inference and Reasoning#

RDFS Inference#

from rdflib import Graph, RDF, RDFS

g = Graph()
g.parse("ontology.ttl")

# Manual RDFS subclass inference: if ?x a ?sub and ?sub subClassOf ?super,
# then ?x a ?super (repeat until no new triples for the full closure)
for sub, _, super_ in g.triples((None, RDFS.subClassOf, None)):
    for instance in g.subjects(RDF.type, sub):
        g.add((instance, RDF.type, super_))

OWL-RL (via owlrl)#

# pip install owlrl
import owlrl
from rdflib import Graph

g = Graph()
g.parse("data.ttl")

# Apply OWL-RL reasoning (materializes inferred triples into g)
owlrl.DeductiveClosure(owlrl.OWLRL_Semantics).expand(g)

Limitations#

  • No native async/await support
  • Memory-intensive for large graphs
  • SPARQL performance varies by store
  • Limited OWL reasoning (requires extensions)
  • Not a graph database (in-memory or file-based)

When to Use#

Choose rdflib when:

  • Working with RDF/semantic web data
  • SPARQL queries required
  • Linked data integration needed
  • Ontology processing
  • Standards compliance important (W3C RDF)

Consider alternatives when:

  • Property graph model preferred (use Neo4j)
  • High-performance database needed
  • Native async required
  • Large-scale graph analytics


Graph Database Python Client Recommendations#

Executive Summary#

This document provides optimized recommendations for selecting Python client libraries for graph databases based on common use cases and technical requirements.

Primary Recommendations by Use Case#

1. General-Purpose Graph Applications#

Recommended: neo4j-driver + neomodel

| Component | Library | Rationale |
| --- | --- | --- |
| Low-level access | neo4j-driver | Best-in-class async, connection pooling, Rust extensions |
| OGM layer | neomodel | Django-style models, validation, hooks |

Why this combination:

  • Neo4j has the most mature Python ecosystem
  • neo4j-driver provides native asyncio for modern applications
  • neomodel adds productivity without sacrificing performance
  • Excellent documentation and community support

2. Multi-Database Portability#

Recommended: gremlinpython (with aiogremlin for async)

Compatible databases: Amazon Neptune, Azure Cosmos DB, JanusGraph, and more

Why Gremlin:

  • Standardized query language across vendors
  • Reduces vendor lock-in risk
  • Single codebase for multiple deployment targets

Caveats:

  • Native gremlinpython is synchronous; use aiogremlin or goblin for async
  • Database-specific features may not be accessible
  • Connection pooling has known recovery issues

3. Multi-Model Requirements (Document + Graph + Key-Value)#

Recommended: python-arango

Why ArangoDB:

  • Single database for multiple data models
  • AQL is powerful and SQL-like
  • Good Python driver quality

Async strategy: Use python-arango-async for true asyncio support

4. Semantic Web / RDF / Linked Data#

Recommended: rdflib + SPARQLWrapper

Why rdflib:

  • De facto standard for Python RDF processing
  • Full SPARQL 1.1 support
  • Extensive serialization format support

Limitations:

  • No native async (wrap with thread pools)
  • Not suitable for large-scale production graphs (use graph databases with SPARQL endpoints)

5. High-Scale Graph Analytics and ML#

Recommended: pyTigerGraph[gds]

Why TigerGraph:

  • Built-in graph data science algorithms
  • Direct integration with PyTorch Geometric and DGL
  • Distributed processing for large graphs

Caveats:

  • GSQL learning curve
  • Performance overhead in Python client
  • Smaller community than Neo4j

6. Distributed/Horizontally Scalable Graphs#

Recommended: pydgraph

Why Dgraph:

  • Native horizontal scaling
  • GraphQL-native design
  • gRPC for efficient communication

Caveats:

  • Async uses gRPC futures, not native asyncio
  • Smaller ecosystem than alternatives
  • DQL query language unique to Dgraph

Decision Matrix#

| Requirement | Best Choice | Runner-up |
| --- | --- | --- |
| Best overall Python experience | neo4j-driver | python-arango |
| OGM/Django-style models | neomodel | Goblin (Gremlin) |
| Native async/FastAPI | neo4j-driver | python-arango-async |
| Database portability | gremlinpython | - |
| Multi-model (doc+graph) | python-arango | - |
| Graph ML/Analytics | pyTigerGraph[gds] | - |
| Semantic web/RDF | rdflib | - |
| Horizontal scaling | pydgraph | TigerGraph |
| Cloud-native (AWS) | gremlinpython (Neptune) | - |
| Cloud-native (Azure) | gremlinpython (Cosmos) | - |

Libraries to Avoid#

py2neo (Deprecated)#

  • End of Life - no further updates
  • Migrate to neo4j-driver + neomodel

Framework Integration Recommendations#

FastAPI Applications#

# Recommended stack
neo4j-driver (AsyncDriver)
# OR
python-arango-async

Django Applications#

# Recommended stack
neomodel with django_neomodel

Data Science / Jupyter#

# Recommended stack
pyTigerGraph[gds]  # For graph ML
# OR
rdflib  # For RDF/semantic data

Performance Optimization Tips#

Neo4j Stack#

  1. Install neo4j-rust-ext for 20-40% performance improvement
  2. Use execute_query() for simple operations (avoids session overhead)
  3. Configure connection pool based on concurrency needs
  4. Use UNWIND for bulk operations
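
The UNWIND tip works by sending one parameterized query per batch instead of one query per row. A minimal sketch of the client-side batching in pure Python (the Cypher text, the `driver` handle, and the 1,000-row batch size are illustrative):

```python
from itertools import islice

# Illustrative Cypher: creates one node per map in the $rows list parameter
BULK_INSERT = """
UNWIND $rows AS row
CREATE (p:Person {name: row.name, age: row.age})
"""

def chunked(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch

rows = [{"name": f"user{i}", "age": 20 + i % 50} for i in range(2500)]
batches = list(chunked(rows, 1000))
# Each batch would then go out in one round trip:
#   driver.execute_query(BULK_INSERT, rows=batch)
```

Batch size trades memory against round trips; around 1,000-10,000 rows per UNWIND is a common starting point.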

ArangoDB Stack#

  1. Use batch methods (insert_many, update_many) for bulk operations
  2. Consider async driver for I/O-bound workloads
  3. Use ArangoSearch for full-text queries instead of AQL filters

Gremlin Stack#

  1. Prefer GraphBinary serialization over GraphSON
  2. Use prepared traversals for repeated queries
  3. Consider Goblin OGM for complex object mapping

Migration Considerations#

From py2neo to neo4j-driver#

  • Replace Graph.run() with session.run() or driver.execute_query()
  • Update transaction patterns to managed transactions
  • Migrate OGM code to neomodel

From SQL to Graph#

  • Start with neomodel for familiar ORM patterns
  • Use Cypher for complex traversals
  • Consider ArangoDB if joining existing document data

Conclusion#

For most Python graph database applications, the Neo4j ecosystem (neo4j-driver + neomodel) offers the best balance of:

  • API quality and Pythonic design
  • Native async support
  • Documentation and community
  • Performance (with Rust extensions)
  • OGM productivity (neomodel)

For specialized requirements (multi-database portability, RDF/semantic web, graph ML, or horizontal scaling), select the specialized library that best matches the use case as outlined above.

S3: Need-Driven

S3 Need-Driven Discovery: Graph Database Client Libraries#

Methodology Overview#

This analysis evaluates Python graph database client libraries through a need-driven lens, matching library capabilities to real-world use case requirements rather than comparing features in isolation.

Analysis Framework#

1. Use Case Decomposition#

Each use case is analyzed across five dimensions:

| Dimension | Questions Addressed |
| --- | --- |
| Graph Model | Property graph vs RDF vs hypergraph? Schema flexibility needs? |
| Query Patterns | Traversal depth? Path finding? Aggregations? Pattern matching? |
| Scale Profile | Node/edge counts? Query concurrency? Growth trajectory? |
| Processing Mode | Real-time OLTP? Batch analytics? Hybrid? |
| Integration Context | REST APIs? Event streams? ETL pipelines? Existing stack? |

2. Library Capability Mapping#

For each use case, libraries are evaluated on:

  • Native support: Does the library directly support required patterns?
  • Performance characteristics: Latency, throughput, memory efficiency
  • Developer experience: API ergonomics, documentation, debugging
  • Operational maturity: Stability, community support, enterprise readiness

3. Gap Analysis#

Identifying where library capabilities fall short:

  • Missing features requiring workarounds
  • Performance limitations at scale
  • Integration friction points
  • Operational blind spots

Use Cases Analyzed#

| Use Case | Primary Pattern | Scale Profile | Processing Mode |
| --- | --- | --- | --- |
| Social Network | Traversal-heavy | High volume, real-time | OLTP |
| Knowledge Graph | Semantic queries | Medium volume, complex | Hybrid |
| Fraud Detection | Pattern matching | High throughput | Real-time + batch |
| Recommendation Engine | Collaborative filtering | Very high volume | Batch + real-time |
| Network Infrastructure | Topology analysis | Medium volume | OLTP + analytics |
| Supply Chain | Path optimization | Medium-high volume | Hybrid |

Evaluation Criteria#

Functional Fit (40%)#

  • Query language expressiveness for use case patterns
  • Data model alignment with domain requirements
  • Built-in algorithms vs custom implementation needs

Performance Fit (30%)#

  • Query latency for typical operations
  • Throughput under concurrent load
  • Memory efficiency for graph size

Operational Fit (20%)#

  • Connection pooling and failover
  • Monitoring and observability hooks
  • Transaction management capabilities

Integration Fit (10%)#

  • Async/await support
  • Framework compatibility (FastAPI, Django, etc.)
  • Data pipeline integration (Pandas, Apache Spark)
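
These weights can be folded into a single fit score per library. A minimal sketch using the percentages above (the per-criterion ratings in `example` are invented placeholders):

```python
# Weights from the evaluation rubric above
WEIGHTS = {"functional": 0.40, "performance": 0.30,
           "operational": 0.20, "integration": 0.10}

def fit_score(scores: dict) -> float:
    """Weighted sum of 0-10 per-criterion scores."""
    return round(sum(WEIGHTS[k] * scores[k] for k in WEIGHTS), 2)

# Hypothetical ratings for one library on one use case
example = {"functional": 9, "performance": 7, "operational": 8, "integration": 9}
print(fit_score(example))  # → 8.2
```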

Libraries Under Evaluation#

| Library | Database | Graph Model | Maturity |
| --- | --- | --- | --- |
| neo4j (official) | Neo4j | Property Graph | Production |
| py2neo | Neo4j | Property Graph | Deprecated (EOL) |
| python-arango | ArangoDB | Multi-model | Production |
| pyTigerGraph | TigerGraph | Property Graph | Production |
| gremlinpython | Various | Property Graph | Production |
| rdflib | Various | RDF/Triple Store | Production |
| NetworkX | In-memory | General | Production |

Deliverables#

  1. Per-use-case analysis: Detailed evaluation of library fit
  2. Recommendation matrix: Best-fit library by use case and constraint
  3. Gap documentation: Known limitations and workarounds

Recommendation Summary: Graph Database Client Libraries by Use Case#

Quick Reference Matrix#

| Use Case | Best Fit | Alternative | Scale Trigger |
| --- | --- | --- | --- |
| Social Network | neo4j | pyTigerGraph | > 100M users |
| Knowledge Graph | neo4j + neosemantics | rdflib (small) | > 1M triples |
| Fraud Detection | pyTigerGraph | neo4j | > 1B transactions |
| Recommendation Engine | neo4j | pyTigerGraph | > 100M users |
| Network Infrastructure | neo4j | python-arango | > 1M resources |
| Supply Chain | neo4j | pyTigerGraph | Global enterprise |

Library Recommendations by Priority#

1. neo4j (Official Driver) - Primary Choice#

Best for: Most graph use cases at moderate scale

Strengths across use cases:

  • Cypher query language is most expressive for graph patterns
  • GDS (Graph Data Science) library covers common algorithms
  • Mature Python driver with async support
  • Strong community and documentation
  • Visualization tools (Bloom) for non-technical users

When to choose neo4j:

  • Team has or can develop Cypher expertise
  • Use case fits property graph model
  • Scale < 1B edges
  • Need graph algorithms (centrality, community, paths)
  • Visualization is important

Installation:

uv pip install neo4j

2. pyTigerGraph - Scale-First Choice#

Best for: High-volume fraud detection, massive recommendation systems

Strengths across use cases:

  • Distributed architecture handles massive scale
  • GSQL optimized for deep traversals
  • Strong financial services and enterprise focus
  • ML workbench for graph embeddings

When to choose pyTigerGraph:

  • Scale exceeds 1B edges
  • Deep traversals (5+ hops) are common
  • Distributed processing required
  • Enterprise budget available
  • Financial/fraud detection primary use case

Installation:

uv pip install pyTigerGraph

3. python-arango - Multi-Model Choice#

Best for: Knowledge graphs with complex documents, cost-sensitive deployments

Strengths across use cases:

  • Combines document + graph in single database
  • Good horizontal scaling
  • Cost-effective (open source core)
  • Flexible schema for evolving models

When to choose python-arango:

  • Need document storage alongside graph
  • Budget constraints on database licensing
  • Schema flexibility is priority
  • Multi-model queries beneficial

Installation:

uv pip install python-arango

4. rdflib - Standards-First Choice#

Best for: Small-to-medium knowledge graphs requiring RDF/SPARQL compliance

Strengths:

  • Full RDF/SPARQL specification compliance
  • Inference engine support
  • Standards-based data exchange
  • Good for linked data applications

When to choose rdflib:

  • RDF/SPARQL compliance required
  • Ontology reasoning needed
  • Scale < 1M triples
  • Academic or research contexts

Installation:

uv pip install rdflib

5. gremlinpython - Portability Choice#

Best for: Multi-database environments, cloud-native deployments

Strengths:

  • Works with many backends (Neptune, JanusGraph, etc.)
  • Cloud-managed options available
  • Standard traversal language

When to choose gremlinpython:

  • Using AWS Neptune or similar managed service
  • Need database portability
  • Multi-cloud strategy

Installation:

uv pip install gremlinpython

6. NetworkX - Analysis Choice#

Best for: Prototyping, offline analysis, algorithm development

Strengths:

  • Rich algorithm library
  • Easy Python integration
  • Great for research and prototyping
  • Integrates with scientific Python stack

When to choose NetworkX:

  • Prototyping graph logic before production
  • Offline batch analysis
  • Algorithm research and development
  • In-memory data fits requirements

Installation:

uv pip install networkx
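
To illustrate the prototyping workflow, a minimal NetworkX session (assumes networkx is installed; the toy graph is invented):

```python
import networkx as nx

# Toy social graph
G = nx.Graph()
G.add_edges_from([
    ("alice", "bob"), ("bob", "carol"),
    ("carol", "dave"), ("alice", "carol"),
])

# Built-in algorithms: centrality and shortest paths
centrality = nx.degree_centrality(G)
path = nx.shortest_path(G, "alice", "dave")

print(path)  # → ['alice', 'carol', 'dave']
```

Once the logic is validated here, the same traversal would be re-expressed in the production database's query language.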

Decision Framework#

START
  |
  v
Is scale > 1B edges?
  |-- YES --> pyTigerGraph
  |-- NO --> Continue
  |
  v
Is RDF/SPARQL compliance required?
  |-- YES --> Scale < 1M? --> rdflib
  |           Scale > 1M? --> neo4j + neosemantics
  |-- NO --> Continue
  |
  v
Is document + graph multi-model needed?
  |-- YES --> python-arango
  |-- NO --> Continue
  |
  v
Is database portability required?
  |-- YES --> gremlinpython
  |-- NO --> Continue
  |
  v
Production use case?
  |-- YES --> neo4j (official driver)
  |-- NO --> NetworkX for prototyping
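
The flowchart can be encoded directly as a function (a sketch; the thresholds and branch order mirror the questions above):

```python
def choose_library(edges: int, needs_sparql: bool = False, triples: int = 0,
                   multi_model: bool = False, portable: bool = False,
                   production: bool = True) -> str:
    """Walk the decision flowchart above, top to bottom."""
    if edges > 1_000_000_000:
        return "pyTigerGraph"
    if needs_sparql:
        return "rdflib" if triples < 1_000_000 else "neo4j + neosemantics"
    if multi_model:
        return "python-arango"
    if portable:
        return "gremlinpython"
    return "neo4j" if production else "NetworkX"

print(choose_library(edges=10_000_000))     # → neo4j
print(choose_library(edges=5_000_000_000))  # → pyTigerGraph
```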

Common Hybrid Patterns#

Pattern 1: Neo4j + NetworkX#

  • Neo4j for production serving
  • NetworkX for algorithm prototyping
  • Export graph subset for analysis

Pattern 2: Graph DB + Vector DB#

  • Graph database for relationship queries
  • Vector database (Pinecone, Milvus) for embeddings
  • Combine for hybrid recommendations

Pattern 3: Graph DB + Optimization Solver#

  • Graph database for topology storage
  • OR-Tools/Gurobi for constrained optimization
  • Write optimal solutions back to graph

Gaps Across All Libraries#

| Gap | Workaround |
| --- | --- |
| Real-time graph algorithms | Pre-compute, cache results |
| Temporal queries | Temporal properties, time-bucketed subgraphs |
| Streaming ingestion | External stream processor (Kafka Connect) |
| Multi-tenant isolation | Database-per-tenant or property-based filtering |
| Schema migration | Version properties, migration scripts |
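
The time-bucketing workaround for temporal queries boils down to grouping edges under a coarse time key the database can index. A stdlib sketch with invented events:

```python
from collections import defaultdict
from datetime import datetime

def bucket_key(ts: datetime) -> str:
    """Day-level bucket; store this as an edge property or subgraph label."""
    return ts.strftime("%Y-%m-%d")

events = [
    ("a", "b", datetime(2024, 5, 1, 9, 30)),
    ("b", "c", datetime(2024, 5, 1, 17, 0)),
    ("c", "a", datetime(2024, 5, 2, 8, 15)),
]

# Group edges by bucket; time-scoped queries then touch only relevant buckets
buckets = defaultdict(list)
for src, dst, ts in events:
    buckets[bucket_key(ts)].append((src, dst))

print(sorted(buckets))  # → ['2024-05-01', '2024-05-02']
```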

Final Recommendation#

For teams starting with graph databases in Python:

  1. Start with neo4j official driver - best documentation, most examples
  2. Add NetworkX for prototyping and analysis workflows
  3. Evaluate scale after initial deployment
  4. Consider pyTigerGraph if scaling beyond 1B edges
  5. Consider python-arango if multi-model becomes valuable

Use Case: Fraud Detection#

Domain Description#

Fraud detection leverages graph analysis to identify suspicious patterns in transactions, accounts, and entity relationships. Graphs excel at revealing hidden connections between seemingly unrelated entities, detecting ring structures, and identifying anomalous behavior patterns that traditional tabular analysis misses.

Requirements Analysis#

Graph Model Requirements#

| Aspect | Requirement | Rationale |
| --- | --- | --- |
| Model Type | Property Graph | Need rich properties on both nodes and edges |
| Schema | Flexible | Fraud patterns evolve; schema must adapt quickly |
| Temporality | Time-aware | Transaction timestamps critical for pattern detection |

Key Entity Types:

  • Accounts (bank, merchant, user)
  • Transactions (payments, transfers, purchases)
  • Devices (phones, IP addresses, browsers)
  • Identities (SSN, email, phone numbers)
  • Locations (addresses, GPS coordinates)

Query Pattern Complexity#

Primary Patterns:

  • Ring detection: Circular money flows (A -> B -> C -> A)
  • Shared identity: Multiple accounts sharing device/IP/email
  • Velocity analysis: Transaction frequency and amount patterns
  • Network expansion: Exploring N-hop neighborhood of suspicious entity
  • Similarity matching: Finding accounts with similar behavior patterns

Query Characteristics:

  • Depth: 3-6 hops for pattern detection
  • Time windows: Queries scoped to time ranges (last 24h, 7d, 30d)
  • Aggregation: Sum, count, standard deviation of transactions
  • Pattern matching: Complex subgraph patterns with constraints
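
Ring detection reduces to finding directed cycles in the transfer graph. A pure-Python sketch for length-3 rings over an adjacency map (the accounts are invented; in production this pattern would be expressed as a Cypher or GSQL query):

```python
def find_3_rings(adj):
    """Return each directed 3-cycle once, listed from its smallest node."""
    rings = []
    for a, outs in adj.items():
        for b in outs:
            for c in adj.get(b, ()):
                # a < b and a < c reports each rotation of a cycle only once
                if a in adj.get(c, ()) and a < b and a < c:
                    rings.append((a, b, c))
    return rings

# acct1 -> acct2 -> acct3 -> acct1 is a circular money flow
transfers = {
    "acct1": {"acct2"},
    "acct2": {"acct3"},
    "acct3": {"acct1", "acct4"},
}
print(find_3_rings(transfers))  # → [('acct1', 'acct2', 'acct3')]
```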

Scale Requirements#

| Metric | Typical Range | High Scale |
| --- | --- | --- |
| Accounts (nodes) | 10M - 100M | 1B+ |
| Transactions (edges) | 100M - 10B | 100B+ |
| Real-time queries | 100 - 1K QPS | 10K+ QPS |
| Pattern scans | 1M - 100M/hour | 1B+/hour |

Processing Mode#

  • Real-time: Transaction scoring at payment time (< 100ms)
  • Near-real-time: Alert generation (< 5 min lag)
  • Batch: Pattern discovery, model training (hourly/daily)

Integration Requirements#

  • Transaction streaming (Kafka, Kinesis) for real-time ingestion
  • ML pipeline for fraud scoring models
  • Case management systems for investigation workflows
  • Regulatory reporting and audit trails
  • Alert delivery (email, SMS, dashboards)

Library Evaluation#

neo4j (Official Driver)#

Strengths:

  • Excellent pattern matching with Cypher
  • GDS library has community detection, PageRank for risk scoring
  • Good transaction support for consistent writes
  • Bloom visualization for investigators

Limitations:

  • Real-time scoring at 10K+ TPS challenging
  • Temporal queries require careful indexing
  • Graph algorithms not available in Community edition

Fit Score: 8/10

pyTigerGraph#

Strengths:

  • Built for high-throughput transaction processing
  • GSQL optimized for deep link analysis
  • Native support for temporal patterns
  • Designed for financial services scale

Limitations:

  • Enterprise licensing costs
  • Steeper learning curve for GSQL
  • Smaller Python community

Fit Score: 9/10 (high scale); 7/10 (smaller deployments)

python-arango#

Strengths:

  • Good throughput for transaction ingestion
  • Multi-model allows storing raw transaction documents
  • Flexible schema for evolving fraud patterns
  • Cost-effective scaling

Limitations:

  • Graph algorithms less mature than Neo4j GDS
  • Pattern matching syntax less expressive
  • Smaller fraud detection community

Fit Score: 7/10

gremlinpython (with Neptune)#

Strengths:

  • Managed service reduces operational burden
  • Good for AWS-native architectures
  • Scales horizontally

Limitations:

  • Query latency can be variable
  • Limited graph algorithm support
  • Gremlin verbose for complex patterns

Fit Score: 6/10

NetworkX (with external storage)#

Strengths:

  • Rich algorithm library for analysis
  • Good for offline pattern discovery
  • Easy prototyping of detection logic

Limitations:

  • In-memory only (not for production scale)
  • No persistence or transactions
  • Cannot handle real-time requirements

Fit Score: 4/10 (analysis only)

Gaps and Workarounds#

| Gap | Impact | Workaround |
| --- | --- | --- |
| Real-time graph algorithms | Cannot run PageRank per transaction | Pre-compute risk scores, incremental updates |
| Temporal pattern matching | Limited native time-series support | Time-bucketed subgraphs, temporal indices |
| Streaming ingestion | Not all drivers handle high-volume streams | Kafka Connect, custom streaming layer |
| Explainability | Graph patterns hard to explain to regulators | Path export, visualization, rule extraction |
| Model integration | Limited native ML support | Feature extraction to external ML pipeline |

Architecture Pattern#

[Transaction Stream]
        |
        v
[Stream Processor] -- real-time features --> [ML Scoring Service]
        |
        v
[Graph Database] <-- enrichment queries
        |
        v
[Batch Analytics] -- pattern discovery --> [Rule Engine Update]

Hybrid Approach:

  1. Real-time: Feature extraction + ML scoring (sub-100ms)
  2. Near-real-time: Graph enrichment queries (100ms-1s)
  3. Batch: Deep pattern analysis, model retraining
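
The real-time tier can be sketched as a scorer that reads graph features pre-computed by the batch tier (the feature names, weights, and threshold here are illustrative, not a real model):

```python
# Pre-computed nightly by batch graph analytics, keyed by account id
risk_features = {
    "acct42": {"ring_member": True, "shared_device_count": 3},
    "acct7": {"ring_member": False, "shared_device_count": 0},
}

def score_transaction(account_id: str, amount: float) -> float:
    """Real-time tier: combine cached graph features with transaction data."""
    f = risk_features.get(account_id, {})
    score = 0.0
    if f.get("ring_member"):
        score += 0.5
    score += min(f.get("shared_device_count", 0) * 0.1, 0.3)
    if amount > 10_000:
        score += 0.2
    return round(score, 2)

print(score_transaction("acct42", 15_000))  # → 1.0
print(score_transaction("acct7", 50))       # → 0.0
```

The sub-100ms budget is met because scoring is a dictionary lookup; the expensive graph queries ran offline.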

Recommendation#

Best Fit: pyTigerGraph for enterprise fraud detection

At the scale typical for financial fraud detection (billions of transactions), TigerGraph’s distributed architecture and GSQL’s pattern-matching capabilities make it the strongest choice. Its financial-services focus means the platform is battle-tested at exactly this scale.

Alternative: neo4j official driver for smaller deployments or teams with existing Cypher expertise. The GDS library provides excellent algorithm support for pattern discovery.

Hybrid pattern: Use Neo4j/TigerGraph for graph storage and queries, with NetworkX for offline algorithm development and prototyping.


Use Case: Knowledge Graph#

Domain Description#

Knowledge graphs represent entities and their semantic relationships, enabling structured knowledge representation, reasoning, and discovery. Common applications include enterprise knowledge management, semantic search, question answering systems, and data integration across disparate sources.

Requirements Analysis#

Graph Model Requirements#

| Aspect | Requirement | Rationale |
| --- | --- | --- |
| Model Type | RDF/Triple Store OR Property Graph | RDF for standards compliance; Property Graph for flexibility |
| Schema | Ontology-driven | Need formal type hierarchies and relationship constraints |
| Semantics | Rich typing | Entities have types, properties have ranges, relationships have semantics |
Model Choice Considerations:

  • RDF/SPARQL: Best for standards compliance, linked data, inference
  • Property Graph: Better performance, easier development, less formal semantics

Query Pattern Complexity#

Primary Patterns:

  • Semantic traversal: Following typed relationships with constraints
  • Inference queries: Deriving implicit relationships from explicit ones
  • Faceted search: Filtering entities by multiple attribute combinations
  • Path queries: Finding connection paths with semantic constraints

Query Characteristics:

  • Depth: Variable (1-10+ hops depending on question complexity)
  • Filtering: Heavy use of type and property constraints
  • Aggregation: Counting, grouping by entity types
  • Reasoning: RDFS/OWL inference for RDF; manual for property graphs

Scale Requirements#

| Metric | Typical Range | High Scale |
| --- | --- | --- |
| Entities (nodes) | 100K - 10M | 100M+ |
| Facts (edges) | 1M - 100M | 1B+ |
| Concurrent queries | 10 - 100 QPS | 1K+ QPS |
| Ontology complexity | 100 - 1K classes | 10K+ classes |

Processing Mode#

  • Primary: Interactive queries for search and exploration
  • Secondary: Batch ingestion from source systems
  • Latency target: < 500ms for exploratory queries; < 100ms for autocomplete

Integration Requirements#

  • NLP pipelines for entity extraction and linking
  • Data integration from multiple source systems (databases, APIs, documents)
  • Search engines (Elasticsearch) for full-text capabilities
  • Visualization tools for graph exploration
  • LLM integration for natural language querying

Library Evaluation#

rdflib#

Strengths:

  • Native RDF/SPARQL support
  • Standards compliant (W3C specifications)
  • Good for small-to-medium knowledge graphs
  • Inference engine support (OWL-RL)

Limitations:

  • In-memory by default (persistence requires plugins)
  • Performance degrades above 1M triples
  • Limited concurrent query support
  • No built-in clustering

Fit Score: 7/10 (small-medium); 4/10 (large scale)

neo4j (Official Driver)#

Strengths:

  • Excellent query performance at scale
  • Flexible property graph for evolving ontologies
  • Full-text search integration
  • Strong Python ecosystem

Limitations:

  • No native RDF/SPARQL (requires neosemantics plugin)
  • No built-in inference engine
  • Ontology constraints require manual enforcement

Fit Score: 8/10

python-arango#

Strengths:

  • Multi-model allows combining document + graph
  • Good for knowledge graphs with rich entity attributes
  • Full-text search built-in
  • Scales well horizontally

Limitations:

  • No RDF/SPARQL support
  • Limited semantic reasoning capabilities
  • Smaller knowledge graph community

Fit Score: 7/10

gremlinpython (with Neptune/JanusGraph)#

Strengths:

  • Cloud-native options (AWS Neptune)
  • Supports both property graph and RDF modes
  • Good for large-scale deployments

Limitations:

  • Verbose query syntax for complex patterns
  • Variable performance across backends
  • Less intuitive for semantic queries

Fit Score: 6/10

pyTigerGraph#

Strengths:

  • Excellent scale for massive knowledge graphs
  • GSQL supports complex pattern matching
  • Built-in ML workbench for embeddings

Limitations:

  • Enterprise-focused (cost considerations)
  • Steeper learning curve
  • Limited RDF ecosystem integration

Fit Score: 7/10 (large scale)

Gaps and Workarounds#

| Gap | Impact | Workaround |
| --- | --- | --- |
| Inference across libraries | Most lack native reasoning | External reasoner (HermiT, Pellet) or pre-materialization |
| Schema evolution | Ontology changes disruptive | Versioned schemas, migration scripts |
| Multilingual support | Limited language handling | External NLP, language-tagged properties |
| Provenance tracking | Need to track fact sources | Custom edge properties for provenance |
| Temporal knowledge | Facts change over time | Temporal properties, versioned subgraphs |

Hybrid Architecture Pattern#

For production knowledge graphs, consider a hybrid approach:

[RDFLib for ontology management]
        |
        v
[Neo4j/ArangoDB for query execution]
        |
        v
[Elasticsearch for full-text search]

This combines:

  • RDFLib’s semantic capabilities for schema management
  • Property graph’s query performance for runtime
  • Search engine’s text capabilities for discovery

Recommendation#

Best Fit: neo4j official driver with neosemantics plugin

For knowledge graph applications requiring both semantic expressiveness and query performance, Neo4j with the neosemantics (n10s) plugin provides the best balance. It supports RDF import/export while leveraging Cypher’s performance for queries.

Alternative: rdflib for smaller knowledge graphs (< 1M triples) where standards compliance and inference are primary requirements.

Alternative: python-arango when knowledge entities have complex nested attributes and document-style storage is beneficial.


Use Case: Network Infrastructure#

Domain Description#

Network infrastructure graphs model the topology and dependencies of IT systems, including physical networks, cloud resources, microservices, and their interconnections. Use cases include impact analysis, root cause detection, capacity planning, and configuration management.

Requirements Analysis#

Graph Model Requirements#

| Aspect | Requirement | Rationale |
| --- | --- | --- |
| Model Type | Property Graph | Rich metadata on nodes (config, status); typed edges |
| Schema | Semi-structured | Core types stable; vendor-specific attributes vary |
| Hierarchy | Multi-level | Physical -> logical -> application layers |

Key Entity Types:

  • Physical: Servers, switches, routers, data centers
  • Virtual: VMs, containers, Kubernetes pods
  • Application: Services, databases, APIs, queues
  • Configuration: Ports, IPs, certificates, credentials
  • Connections: Network links, API calls, data flows

Query Pattern Complexity#

Primary Patterns:

  • Impact analysis: “What is affected if server X fails?”
  • Root cause: “What upstream dependencies could cause service Y to fail?”
  • Path analysis: Network paths between two endpoints
  • Configuration drift: Finding misconfigurations across related resources
  • Dependency depth: How deep is the dependency tree for a service?

Query Characteristics:

  • Depth: Variable (1-hop for direct deps, 5+ for full impact)
  • Direction: Both upstream and downstream traversals
  • Filtering: By resource type, status, environment
  • Aggregation: Counting affected resources, grouping by type
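
Downstream impact analysis is a transitive traversal over dependency edges. A stdlib breadth-first sketch with an invented topology; in a graph database this becomes a variable-length path query:

```python
from collections import deque

def impacted(dependents: dict, failed: str) -> set:
    """All resources transitively depending on `failed` (upstream BFS)."""
    seen, queue = set(), deque([failed])
    while queue:
        node = queue.popleft()
        for dep in dependents.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen

# X -> the things that run on or depend on X
dependents = {
    "server1": {"vm1", "vm2"},
    "vm1": {"api-service"},
    "api-service": {"web-frontend"},
}
print(sorted(impacted(dependents, "server1")))
# → ['api-service', 'vm1', 'vm2', 'web-frontend']
```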

Scale Requirements#

| Metric | Typical Range | High Scale |
| --- | --- | --- |
| Resources (nodes) | 10K - 100K | 1M+ |
| Dependencies (edges) | 50K - 1M | 10M+ |
| Query frequency | 10 - 100 QPS | 1K+ QPS |
| Update frequency | 100 - 10K/min | 100K/min |

Processing Mode#

  • Real-time: Incident impact analysis (< 1s)
  • Near-real-time: Topology updates from discovery (< 1min lag)
  • Batch: Full topology reconciliation, analytics

Integration Requirements#

  • CMDB/asset management systems
  • Monitoring tools (Prometheus, Datadog, Nagios)
  • Cloud provider APIs (AWS, GCP, Azure)
  • Container orchestration (Kubernetes API)
  • Incident management (PagerDuty, ServiceNow)
  • IaC tools (Terraform, Ansible) for configuration

Library Evaluation#

neo4j (Official Driver)#

Strengths:

  • Excellent for dependency traversal queries
  • Cypher’s variable-length paths perfect for impact analysis
  • Good visualization integration for operations teams
  • APOC procedures for graph algorithms

Limitations:

  • Schema flexibility can lead to inconsistency
  • Need careful index strategy for large topologies
  • Single-node architecture limits write scale

Fit Score: 9/10

python-arango#

Strengths:

  • Multi-model stores rich configuration documents
  • Good for combining graph with document queries
  • Horizontal scaling for large infrastructures
  • Cost-effective for moderate scale

Limitations:

  • AQL traversal syntax less intuitive than Cypher
  • Fewer infrastructure-specific examples
  • Smaller operations/SRE community

Fit Score: 7/10

pyTigerGraph#

Strengths:

  • Scales well for very large infrastructures
  • Good for cross-region federated topologies
  • GSQL handles complex impact queries

Limitations:

  • Overkill for most infrastructure use cases
  • Enterprise licensing costs
  • Steeper learning curve

Fit Score: 6/10 (typical); 8/10 (very large scale)

gremlinpython (with Neptune/JanusGraph)#

Strengths:

  • Cloud-native options integrate with cloud monitoring
  • Standard traversal language
  • Good for multi-cloud environments

Limitations:

  • Verbose for operational queries
  • Debugging traversals challenging during incidents
  • Variable performance

Fit Score: 6/10

NetworkX#

Strengths:

  • Excellent for topology analysis algorithms
  • Easy integration with Python operations tools
  • Good for offline planning and analysis

Limitations:

  • In-memory only
  • Cannot handle real-time incident queries
  • No persistence

Fit Score: 5/10 (analysis only)

Gaps and Workarounds#

| Gap | Impact | Workaround |
| --- | --- | --- |
| Real-time topology updates | Discovery lag during changes | Event-driven updates, eventual consistency |
| Multi-layer correlation | Physical-logical-app mapping complex | Typed edges, layer property on nodes |
| Historical topology | Need point-in-time topology | Temporal properties, snapshot graphs |
| Dynamic environments | Kubernetes pods ephemeral | Aggregate by service, not pod |
| Cross-system correlation | Multiple source systems | Canonical ID mapping layer |

Architecture Pattern#

[Discovery Sources]
   |-- Cloud APIs
   |-- K8s API
   |-- CMDB
   |-- Network monitoring
        |
        v
[Topology Aggregator] -- canonical model --> [Graph Database]
        |                                           |
        v                                           v
[Change Event Stream]                        [Query API]
        |                                           |
        v                                           v
[Alert Enrichment]                           [Dashboard/CLI]

Operational Queries:

// Impact analysis: What services are affected if this server fails?
MATCH (server:Server {id: $serverId})<-[:RUNS_ON*1..3]-(service:Service)
RETURN service.name, service.criticality

// Root cause: What could cause this API to fail?
MATCH path = (api:API {name: $apiName})-[:DEPENDS_ON*1..5]->(dep)
WHERE dep.status = 'unhealthy'
RETURN path

// Dependency depth
MATCH path = (service:Service {name: $svc})-[:DEPENDS_ON*]->(dep)
RETURN max(length(path)) as maxDepth

Operational Considerations#

| Consideration | Approach |
| --- | --- |
| Incident response | Pre-computed impact sets for critical services |
| Discovery frequency | Balance freshness vs database load |
| Schema evolution | Version type hierarchies, migration scripts |
| Access control | Environment-based graph partitioning |
| Audit trail | Change log for topology modifications |

Recommendation#

Best Fit: neo4j official driver

For network infrastructure and dependency mapping, Neo4j’s Cypher language provides the most natural expression of dependency queries. The ability to write variable-length path queries (-[:DEPENDS_ON*1..5]->) makes impact analysis and root cause queries intuitive.

Key advantages for operations:

  • Fast time-to-value with intuitive query language
  • Bloom visualization for non-technical stakeholders
  • Active operations/SRE community with examples

Alternative: python-arango when infrastructure includes complex configuration documents that benefit from document storage alongside graph relationships.

Complement with NetworkX for offline topology analysis, capacity planning, and what-if simulations that don’t need real-time data.


Use Case: Recommendation Engine#

Domain Description#

Recommendation engines leverage graph structures to model user-item relationships, enabling collaborative filtering, content-based recommendations, and hybrid approaches. Graphs naturally represent the bipartite relationship between users and items, as well as item-item and user-user similarities.

Requirements Analysis#

Graph Model Requirements#

Aspect | Requirement | Rationale
Model Type | Property Graph | Edge weights for ratings/interactions; rich node properties
Structure | Bipartite core | User-Item relationships; User-User and Item-Item derived
Weights | Numeric edge properties | Ratings, interaction counts, recency scores

Key Entity Types:

  • Users (profiles, preferences, segments)
  • Items (products, content, services)
  • Interactions (views, purchases, ratings, saves)
  • Categories/Tags (content metadata)

Query Pattern Complexity#

Primary Patterns:

  • Collaborative filtering: “Users who liked X also liked Y”
  • User neighborhood: Similar users based on shared interactions
  • Item neighborhood: Similar items based on shared user base
  • Path-based recommendations: Multi-hop reasoning (A likes B, B similar to C)
  • Popularity queries: Top items by interaction count

Query Characteristics:

  • Depth: 2-3 hops typical (user -> item -> similar items)
  • Aggregation: Heavy (counting, averaging, ranking)
  • Filtering: By recency, category, availability
  • Personalization: User-specific traversal starting points
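The "users who liked X also liked Y" pattern reduces to co-occurrence counting over the bipartite graph. A minimal in-memory sketch (the data and the `also_liked` helper are hypothetical, for illustration only):

```python
from collections import Counter

def also_liked(interactions, item, top_n=3):
    """'Users who liked X also liked Y': rank other items by how
    many of X's users also interacted with them."""
    fans = {user for user, items in interactions.items() if item in items}
    counts = Counter()
    for user in fans:
        for other in interactions[user]:
            if other != item:
                counts[other] += 1
    return [i for i, _ in counts.most_common(top_n)]

# Hypothetical user -> item interaction sets
interactions = {
    "alice": {"book-a", "book-b", "book-c"},
    "bob":   {"book-a", "book-b"},
    "carol": {"book-a", "book-d"},
    "dave":  {"book-d"},
}
print(also_liked(interactions, "book-a"))
```

In a graph database this is a single 2-hop traversal with aggregation; the in-memory version is useful mainly for testing recommendation logic.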

Scale Requirements#

Metric | Typical Range | High Scale
Users (nodes) | 100K - 10M | 100M+
Items (nodes) | 10K - 1M | 10M+
Interactions (edges) | 10M - 1B | 100B+
Recommendation requests | 100 - 10K QPS | 100K+ QPS
Latency target | < 100ms | < 50ms

Processing Mode#

  • Real-time: Serving recommendations (< 100ms)
  • Batch: Computing similarity matrices, embeddings (hourly/daily)
  • Incremental: Updating recommendations as new interactions arrive

Integration Requirements#

  • API layer for client applications (REST, GraphQL)
  • Event streaming for real-time interaction capture
  • Feature store for ML model features
  • A/B testing infrastructure for recommendation experiments
  • Analytics for recommendation performance tracking

Library Evaluation#

neo4j (Official Driver)#

Strengths:

  • Cypher excellent for collaborative filtering queries
  • GDS library has similarity algorithms (cosine, Jaccard)
  • Node embedding algorithms for hybrid approaches
  • Good caching for repeated query patterns

Limitations:

  • Real-time computation at scale challenging
  • Need GDS for similarity algorithms (Enterprise)
  • No native matrix operations

Fit Score: 8/10

python-arango#

Strengths:

  • Good performance for bipartite graph queries
  • Multi-model allows storing item metadata as documents
  • Cost-effective scaling for high interaction volumes

Limitations:

  • Limited built-in similarity algorithms
  • Less mature recommendation-specific ecosystem
  • Need custom similarity implementations

Fit Score: 6/10

pyTigerGraph#

Strengths:

  • Excellent scale for high-volume interactions
  • GSQL supports complex aggregation patterns
  • Built-in ML workbench for embeddings
  • Graph feature extraction for ML models

Limitations:

  • Enterprise cost considerations
  • Overkill for smaller catalogs
  • Steeper learning curve

Fit Score: 8/10 (large scale); 6/10 (smaller deployments)

gremlinpython#

Strengths:

  • Works with multiple backends
  • Standard traversal patterns
  • Cloud options available

Limitations:

  • Verbose for aggregation-heavy queries
  • No built-in similarity algorithms
  • Performance varies by backend

Fit Score: 5/10

NetworkX#

Strengths:

  • Rich algorithm library (bipartite algorithms)
  • Easy prototyping of recommendation logic
  • Good for offline analysis and testing

Limitations:

  • In-memory only
  • Cannot serve real-time recommendations
  • No persistence

Fit Score: 3/10 (prototyping only)

Gaps and Workarounds#

Gap | Impact | Workaround
Real-time similarity | Cannot compute on-the-fly at scale | Pre-computed similarity cache
Cold start | New users/items have no connections | Content-based fallback, popularity-based
Implicit feedback | View != purchase signal strength | Weight tuning, decay functions
Diversity | Graph algorithms tend toward popular items | Re-ranking layer, exploration bonus
Explanation | Hard to explain graph-based recommendations | Path extraction, rule-based overlays
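The weight-tuning/decay workaround for implicit feedback can be as simple as an exponential half-life. A hedged sketch (the per-event base weights and the 30-day half-life are illustrative assumptions to be tuned per product):

```python
def interaction_weight(event_type, age_days, half_life_days=30.0):
    """Weight an implicit-feedback event: a base strength per event
    type, decayed exponentially by age with a configurable half-life."""
    base = {"view": 1.0, "save": 3.0, "purchase": 10.0}[event_type]
    return base * 0.5 ** (age_days / half_life_days)

# A month-old purchase still outweighs a fresh view
print(interaction_weight("purchase", age_days=30))  # 5.0
print(interaction_weight("view", age_days=0))       # 1.0
```

Stored as an edge property, this weight feeds directly into similarity and ranking queries.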

Architecture Pattern#

[Interaction Events]
        |
        v
[Stream Processor] --> [Real-time Features]
        |
        v
[Graph Database] <-- batch similarity updates
        |
        v
[Recommendation Service]
        |
        v
[Cache Layer] --> [API Response]

Hybrid Recommendation Pattern:

  1. Graph-based collaborative filtering for relationship signals
  2. Embedding-based similarity for scale and cold start
  3. Business rules layer for diversity, freshness, inventory
  4. Caching layer for latency requirements

Pre-computation Strategy#

For production recommendation systems, pre-compute:

Computation | Frequency | Storage
Item-item similarity top-K | Daily | Graph edges or Redis
User-item affinity scores | Hourly | Feature store
User segments | Daily | User properties
Popular items per category | Hourly | Cache layer
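The daily item-item top-K job is essentially a Jaccard computation over each item's user set. A small in-memory sketch of that batch step (hypothetical data and helper name; at production scale this would run via GDS or a stream processor rather than pairwise in Python):

```python
from itertools import combinations

def topk_item_similarity(interactions, k=2):
    """Batch job: for every item, pre-compute the top-k most similar
    items by Jaccard similarity of their user sets."""
    users_by_item = {}
    for user, items in interactions.items():
        for item in items:
            users_by_item.setdefault(item, set()).add(user)
    sims = {item: [] for item in users_by_item}
    for a, b in combinations(users_by_item, 2):
        ua, ub = users_by_item[a], users_by_item[b]
        jaccard = len(ua & ub) / len(ua | ub)
        if jaccard > 0:
            sims[a].append((b, jaccard))
            sims[b].append((a, jaccard))
    return {item: sorted(pairs, key=lambda p: -p[1])[:k]
            for item, pairs in sims.items()}

# Hypothetical user -> item interaction lists
interactions = {
    "u1": ["item-a", "item-b"],
    "u2": ["item-a", "item-b"],
    "u3": ["item-a", "item-c"],
}
print(topk_item_similarity(interactions)["item-a"])
```

The output pairs would then be written back as similarity edges or pushed to Redis, per the table above.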

Recommendation#

Best Fit: neo4j official driver for most recommendation use cases

Neo4j’s combination of expressive queries (Cypher) and graph algorithms (GDS) makes it well-suited for recommendation systems. The ability to compute Jaccard similarity, node embeddings, and community detection in the database enables sophisticated recommendations.

Alternative: pyTigerGraph for very high-scale systems (100M+ users, 1B+ interactions) where distributed processing is essential.

Hybrid pattern: Use the graph database for relationship storage and collaborative filtering queries, combined with vector similarity search (Pinecone, Milvus) for embedding-based recommendations and caching (Redis) for serving latency.


Use Case: Social Network Graph#

Domain Description#

Social networks model relationships between users including follows, friendships, group memberships, content sharing, and interactions. The graph structure captures the social fabric that enables features like friend suggestions, feed ranking, and influence analysis.

Requirements Analysis#

Graph Model Requirements#

Aspect | Requirement | Rationale
Model Type | Property Graph | Nodes need rich attributes (profiles); edges need properties (timestamp, strength)
Schema | Semi-flexible | Core user/relationship types stable; new interaction types added frequently
Directionality | Mixed | Follows are directional; friendships are bidirectional

Query Pattern Complexity#

Primary Patterns:

  • Friend-of-friend traversal: 2-3 hop neighborhood exploration
  • Mutual connections: Finding common neighbors between two users
  • Shortest path: Degrees of separation between users
  • Influence propagation: Multi-hop traversal with aggregation

Query Characteristics:

  • Depth: Typically 2-4 hops (beyond 4 hops performance degrades rapidly)
  • Breadth: Can explode (users with 10K+ connections)
  • Aggregation: Count, distinct, top-N patterns common
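Two of these patterns, mutual connections and friend-of-friend suggestions, fit in a few lines of set algebra. A minimal in-memory sketch (the data and function names are hypothetical, for illustration only):

```python
def mutual_connections(follows, a, b):
    """Common neighbours of two users."""
    return follows.get(a, set()) & follows.get(b, set())

def friend_of_friend(follows, user, limit=5):
    """2-hop suggestions: people my connections connect to, excluding
    existing connections and myself, ranked by mutual-connection count."""
    direct = follows.get(user, set())
    scores = {}
    for friend in direct:
        for candidate in follows.get(friend, set()):
            if candidate != user and candidate not in direct:
                scores[candidate] = scores.get(candidate, 0) + 1
    return sorted(scores, key=lambda c: -scores[c])[:limit]

# Hypothetical follow graph
follows = {
    "ana": {"ben", "cam"},
    "ben": {"ana", "dia"},
    "cam": {"ana", "dia", "eli"},
    "dia": {"ben"},
}
print(friend_of_friend(follows, "ana"))
```

The breadth-explosion risk noted above shows up here too: for a user with 10K+ connections, the inner loop is exactly what needs sampling or pre-computation.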

Scale Requirements#

Metric | Typical Range | High Scale
Users (nodes) | 100K - 10M | 100M+
Relationships (edges) | 1M - 500M | 10B+
Concurrent queries | 100 - 1K QPS | 10K+ QPS
Write throughput | 100 - 10K/sec | 100K+/sec

Processing Mode#

  • Primary: Real-time OLTP for user-facing features
  • Secondary: Batch analytics for recommendations and insights
  • Latency target: < 50ms for interactive queries

Integration Requirements#

  • REST/GraphQL API layer for mobile/web clients
  • Event streaming for activity feeds (Kafka, Redis Streams)
  • ML pipeline integration for recommendation models
  • Analytics warehouse sync for business intelligence

Library Evaluation#

neo4j (Official Driver)#

Strengths:

  • Excellent Cypher support for complex traversals
  • Native path-finding algorithms (shortest path, all paths)
  • Strong transaction support for consistent updates
  • Async driver available for high concurrency

Limitations:

  • Single-database focus limits multi-tenancy options
  • Graph algorithms require separate APOC/GDS plugins
  • Connection pooling configuration can be complex

Fit Score: 8/10

py2neo#

Strengths:

  • Pythonic OGM (Object-Graph Mapping) layer
  • Easier onboarding for developers new to graphs
  • Good integration with pandas for analytics

Limitations:

  • Performance overhead from OGM abstraction
  • Less control over query optimization
  • Maintenance concerns (community-driven)

Fit Score: 6/10

python-arango#

Strengths:

  • Multi-model allows document storage alongside graph
  • AQL provides flexible query patterns
  • Built-in support for graph traversal with configurable depth

Limitations:

  • Less mature graph algorithm ecosystem
  • Smaller community for social network patterns
  • Traversal syntax less intuitive than Cypher

Fit Score: 7/10

pyTigerGraph#

Strengths:

  • Designed for massive scale (10B+ edges)
  • GSQL optimized for deep traversals
  • Built-in distributed processing

Limitations:

  • Steeper learning curve
  • Enterprise licensing costs
  • Less flexible for rapid prototyping

Fit Score: 7/10 (9/10 at very high scale)

gremlinpython#

Strengths:

  • Database-agnostic (works with many backends)
  • Standard traversal language
  • Good for multi-database environments

Limitations:

  • Verbose syntax compared to Cypher
  • Performance varies by backend
  • Debugging traversals can be challenging

Fit Score: 6/10

Gaps and Workarounds#

Gap | Impact | Workaround
Real-time graph algorithms | Cannot compute PageRank on-the-fly | Pre-compute in batch, cache results
Supernodes (celebrities) | Traversal explosion | Bidirectional search, sampling strategies
Temporal queries | Limited time-series support | Add timestamp indices, partition by time
Multi-hop aggregations | Memory pressure | Streaming result processing, pagination
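The sampling workaround for supernodes caps traversal fan-out before it explodes. A minimal sketch (the `cap` default of 100 and the data are illustrative assumptions):

```python
import random

def sampled_neighbors(adjacency, node, cap=100, seed=None):
    """Supernode guard: traverse at most `cap` neighbours, sampled
    uniformly, instead of the full adjacency list."""
    neighbours = adjacency.get(node, [])
    if len(neighbours) <= cap:
        return list(neighbours)
    return random.Random(seed).sample(neighbours, cap)

# Hypothetical graph with one celebrity account
adjacency = {
    "celebrity": [f"fan-{i}" for i in range(10_000)],
    "regular": ["ana", "ben"],
}
print(len(sampled_neighbors(adjacency, "celebrity", cap=50, seed=1)))
```

Applied at each hop, this trades exactness for bounded latency, which is usually the right trade for feed and suggestion features.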

Recommendation#

Best Fit: neo4j official driver

For social network applications, the combination of expressive Cypher queries, mature ecosystem (GDS for algorithms), and strong async support makes the official Neo4j driver the best choice for most scale profiles.

Alternative: pyTigerGraph for platforms expecting 100M+ users where distributed processing becomes essential.


Use Case: Supply Chain#

Domain Description#

Supply chain graphs model the network of suppliers, manufacturers, distributors, and logistics providers that move products from raw materials to end customers. Graph analysis enables risk assessment, optimization of logistics, supplier diversification, and end-to-end traceability.

Requirements Analysis#

Graph Model Requirements#

Aspect | Requirement | Rationale
Model Type | Property Graph | Rich attributes on entities; weighted relationships
Multi-graph | Multiple edge types | Material flow, financial flow, information flow
Temporal | Time-aware edges | Lead times, seasonal variations, historical performance

Key Entity Types:

  • Organizations: Suppliers, manufacturers, distributors, retailers
  • Facilities: Factories, warehouses, ports, distribution centers
  • Products: SKUs, components, raw materials, finished goods
  • Logistics: Routes, carriers, shipments
  • Contracts: Agreements, terms, pricing

Query Pattern Complexity#

Primary Patterns:

  • Shortest path: Optimal route from supplier to customer
  • Risk propagation: “If supplier X fails, what products are affected?”
  • Alternative sourcing: Finding backup suppliers for a component
  • Bottleneck detection: Identifying single points of failure
  • Cost optimization: Weighted path finding for lowest total cost

Query Characteristics:

  • Depth: 3-10 hops (raw material to finished product)
  • Weighted: Edges have cost, time, capacity attributes
  • Aggregation: Sum costs, max lead times, min capacities
  • Constraints: Capacity limits, geographic restrictions
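The weighted path-finding described above corresponds to Dijkstra's algorithm over lead-time-weighted shipping edges. An in-memory sketch (supplier/port names and the `fastest_route` helper are hypothetical):

```python
import heapq

def fastest_route(edges, src, dst):
    """Dijkstra over SHIPS_TO-style edges weighted by lead time in
    days; returns (total_days, path), or (None, []) if unreachable."""
    graph = {}
    for a, b, days in edges:
        graph.setdefault(a, []).append((b, days))
    heap = [(0, src, [src])]
    settled = {}
    while heap:
        days, node, path = heapq.heappop(heap)
        if node == dst:
            return days, path
        if node in settled and settled[node] <= days:
            continue
        settled[node] = days
        for nxt, d in graph.get(node, ()):
            heapq.heappush(heap, (days + d, nxt, path + [nxt]))
    return None, []

# Hypothetical network: two routes from supplier to distribution center
edges = [
    ("supplier-a", "port-x", 7), ("port-x", "dc-1", 3),
    ("supplier-a", "port-y", 4), ("port-y", "dc-1", 8),
]
print(fastest_route(edges, "supplier-a", "dc-1"))
```

This is what a Cypher `shortestPath` with a `reduce` over `leadTimeDays` computes in-database; the offline version is handy for what-if simulations.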

Scale Requirements#

Metric | Typical Range | High Scale
Entities (nodes) | 10K - 100K | 1M+
Relationships (edges) | 100K - 1M | 10M+
Query frequency | 10 - 100 QPS | 1K QPS
Path computations | 100 - 10K/day | 100K/day

Processing Mode#

  • Real-time: Disruption impact assessment (< 5s)
  • Batch: Route optimization, network redesign (hourly/daily)
  • Simulation: What-if analysis for planning

Integration Requirements#

  • ERP systems (SAP, Oracle) for order and inventory data
  • TMS (Transportation Management Systems) for logistics
  • Supplier portals for performance data
  • IoT/tracking systems for shipment visibility
  • BI tools for reporting and visualization
  • Planning tools for demand forecasting

Library Evaluation#

neo4j (Official Driver)#

Strengths:

  • Excellent weighted shortest path algorithms
  • Cypher handles multi-hop supply chain queries well
  • GDS library for network analysis (centrality, community)
  • Good visualization for supply chain mapping

Limitations:

  • Complex optimization needs external solvers
  • Limited native geospatial support
  • Large-scale simulations may need export to specialized tools

Fit Score: 8/10

python-arango#

Strengths:

  • Multi-model stores complex product/contract documents
  • Good geospatial support for logistics
  • Scales well for medium-large supply chains
  • Cost-effective for exploration

Limitations:

  • Fewer built-in graph algorithms
  • Path optimization less mature than Neo4j
  • Smaller supply chain community

Fit Score: 7/10

pyTigerGraph#

Strengths:

  • Excellent for very large global supply chains
  • GSQL handles complex path computations
  • Built-in graph analytics
  • Enterprise supply chain focus

Limitations:

  • Enterprise licensing costs
  • Steeper learning curve
  • Overkill for regional supply chains

Fit Score: 8/10 (global enterprise); 6/10 (smaller chains)

gremlinpython#

Strengths:

  • Database-agnostic
  • Standard traversal patterns
  • Works with Neptune for AWS supply chains

Limitations:

  • Verbose for weighted path queries
  • Limited optimization algorithms
  • Less intuitive for supply chain queries

Fit Score: 5/10

NetworkX#

Strengths:

  • Rich library for network optimization
  • Excellent for simulations and what-if analysis
  • Easy integration with optimization libraries (PuLP, OR-Tools)
  • Good for research and prototyping

Limitations:

  • In-memory only
  • Cannot serve production queries
  • Export/import overhead for real data

Fit Score: 6/10 (analysis); 2/10 (production)

Gaps and Workarounds#

Gap | Impact | Workaround
Constrained optimization | Cannot express capacity constraints in query | Export to optimization solver
Multi-objective paths | Trade-off cost vs time vs risk complex | Pareto frontier computation offline
Temporal edges | Lead times vary by season/volume | Time-parameterized edge properties
Geospatial routing | Distance calculations limited | Integrate with mapping APIs
Simulation | What-if at scale challenging | Clone subgraphs, sandbox environments
Data freshness | Supply chain data from many sources | ETL pipeline, change data capture

Architecture Pattern#

[Source Systems]
   |-- ERP
   |-- TMS
   |-- Supplier portals
   |-- IoT/Tracking
        |
        v
[ETL Pipeline] -- transformation --> [Graph Database]
        |                                    |
        v                                    v
[Master Data Management]              [Query API]
                                            |
                                            v
                            [Planning/Visualization Tools]

Query Examples:

// Shortest path by lead time
MATCH path = shortestPath(
  (supplier:Supplier {id: $supplierId})-[:SHIPS_TO*]-(dc:DistributionCenter {id: $dcId})
)
RETURN path, reduce(time=0, r in relationships(path) | time + r.leadTimeDays) as totalLeadTime

// Risk propagation: What's affected if this supplier fails?
MATCH (supplier:Supplier {id: $supplierId})<-[:SOURCED_FROM*1..5]-(product:Product)
RETURN product.sku, product.name, product.criticality

// Alternative suppliers
MATCH (product:Product {sku: $sku})-[:SOURCED_FROM]->(current:Supplier)
MATCH (component)<-[:CONTAINS]-(product)
MATCH (alt:Supplier)-[:PROVIDES]->(component)
WHERE alt <> current
RETURN alt.name, count(component) as componentsAvailable

Optimization Patterns#

For complex supply chain optimization, combine graph database with optimization:

  1. Graph database: Topology storage, constraint queries
  2. Export to pandas/NumPy: Data preparation
  3. Optimization solver (OR-Tools, Gurobi): Route optimization
  4. Write back to graph: Optimal routes as relationships
# Example hybrid pattern (sketch: query text, index-manager arguments,
# and solver parameters are elided)
from neo4j import GraphDatabase
from ortools.constraint_solver import pywrapcp

# Assumes uri/user/password and search_params are configured elsewhere
driver = GraphDatabase.driver(uri, auth=(user, password))

# 1. Extract network from graph
with driver.session() as session:
    network = session.run("MATCH (a)-[r:ROUTE]->(b) RETURN ...").data()

# 2. Build optimization model
manager = pywrapcp.RoutingIndexManager(...)
routing = pywrapcp.RoutingModel(manager)

# 3. Solve
solution = routing.SolveWithParameters(search_params)

# 4. Write optimal routes back to graph (optimal_route: parameter dict)
with driver.session() as session:
    session.run("CREATE (r:OptimalRoute {...})", optimal_route)

Recommendation#

Best Fit: neo4j official driver

For supply chain applications, Neo4j provides the best balance of query expressiveness, graph algorithms, and ecosystem maturity. The combination of Cypher for querying and GDS for analytics covers most supply chain needs.

Key advantages for supply chain:

  • Weighted shortest path for logistics optimization
  • Centrality algorithms for identifying critical nodes
  • Community detection for supplier clustering
  • Good visualization for supply chain mapping

Alternative: pyTigerGraph for global enterprises with very large, distributed supply chains requiring massive scale.

Complement with NetworkX/OR-Tools for complex constrained optimization that goes beyond graph traversal (e.g., vehicle routing, facility location).

S4: Strategic

Strategic Analysis Methodology: Graph Database Client Libraries#

Analysis Framework#

This strategic assessment evaluates Python client libraries for graph databases through a 5-year viability lens, focusing on sustainability, portability, and ecosystem evolution.

Evaluation Dimensions#

1. Library Sustainability Assessment#

  • Maintenance Cadence: Release frequency, bug fix responsiveness, security patches
  • Corporate vs Community: Official vendor support vs community-driven development
  • Funding Model: Venture-backed, open-source foundation, or hybrid approaches
  • Bus Factor: Number of active maintainers, knowledge distribution
  • Breaking Change Philosophy: Semantic versioning adherence, deprecation cycles

2. Ecosystem Positioning#

  • Market Share Alignment: Does the library serve a growing or declining database?
  • Standards Compliance: GQL ISO standard readiness and migration path
  • AI/ML Integration: Support for GraphRAG, knowledge graphs, vector embeddings
  • Cloud Service Compatibility: Works with managed offerings (Neptune, Cosmos DB)

3. Portability Analysis#

  • Query Language Lock-in: Cypher vs Gremlin vs proprietary languages
  • Data Model Portability: Property graph standardization, export/import capabilities
  • Abstraction Layer Options: TinkerPop compatibility, ORM/OGM availability

4. Risk Assessment#

  • Vendor Viability: Financial health, acquisition risk, licensing changes
  • Technology Obsolescence: Language evolution (async support, typing)
  • Community Fragmentation: Fork risks, competing implementations

Data Sources#

  • PyPI release history and download statistics
  • GitHub commit activity and contributor metrics
  • Corporate financial disclosures and funding announcements
  • ISO GQL standardization progress (ISO/IEC 39075:2024)
  • Market research reports (Gartner, Forrester, independent analysts)
  • Vendor roadmaps and conference announcements

Scoring Methodology#

Each library receives ratings (1-5) across:

  • Maintenance Health
  • Corporate Backing Stability
  • Breaking Change Risk (inverted: lower = better)
  • Dependency Security
  • Long-term Viability Confidence

Selection Context#

The analysis considers different use case profiles:

  • Enterprise Production: Stability, support contracts, compliance
  • Startup/Growth: Flexibility, cost efficiency, rapid iteration
  • Research/Academic: Feature richness, community, documentation
  • Multi-Database: Portability, abstraction, standards compliance

Time Horizon Considerations#

  • Short-term (1-2 years): Current maintenance, Python version support
  • Medium-term (3-5 years): GQL adoption, cloud service evolution
  • Long-term (5+ years): Standards consolidation, market concentration

Graph Database Ecosystem Evolution (2025-2030)#

Market Growth Trajectory#

The graph database market is experiencing explosive growth, with projections ranging from $8.9B to $13.7B by 2030 (22-30% CAGR depending on source). Key growth drivers:

  • AI/ML Workloads: Knowledge graphs powering RAG and agentic systems
  • Cloud-Native Adoption: 72% of 2024 deployments are cloud-based
  • Fraud Detection: 28.4% of 2025 market revenue from fraud/risk analytics
  • SME Accessibility: Fastest-growing segment at 30%+ CAGR

GQL ISO Standard Impact (ISO/IEC 39075:2024)#

Timeline and Adoption#

  • April 2024: GQL standard officially published by ISO
  • 2024-2025: openCypher evolving toward GQL compliance
  • 2025: Neo4j Cypher 25 introduces GQL-conformant features
  • 2026-2028: Expected broad vendor adoption

What This Means for Developers#

  1. Cypher Users: Smooth transition path as Cypher converges to GQL
  2. Gremlin Users: No direct GQL migration; separate language families
  3. GSQL Users: Likely continued proprietary path; TigerGraph may add GQL layer
  4. New Projects: Consider GQL-ready implementations

Standard Features#

  • 600+ pages of formal definitions
  • Comparable in scope to SQL-92
  • Pattern matching, path finding, graph mutations
  • Expected to reduce vendor lock-in over time

Query Language Standardization Landscape#

Current State (2025)#

Language | Type | Vendors | GQL Path
Cypher | Declarative | Neo4j, Memgraph, AGE | Converging
Gremlin | Traversal | Neptune, Cosmos, JanusGraph | Separate family
GSQL | Proprietary | TigerGraph | Unknown
SPARQL | RDF | Various | Separate family
openCypher | Open Standard | Multiple | Evolving to GQL

Convergence Timeline#

  • Short-term (2025-2026): Cypher/openCypher implementations add GQL features
  • Medium-term (2027-2028): Majority of property graph databases GQL-compliant
  • Long-term (2029+): GQL becomes default query language for new databases

Multi-Model Database Convergence#

PostgreSQL Graph Capabilities#

  • Apache AGE: Graph extension bringing Cypher to PostgreSQL
  • Incubator Status: Apache Software Foundation project
  • Value Proposition: Add graph queries to existing PostgreSQL investments

MongoDB Evolution#

  • Current: Document-focused with limited graph features
  • Trend: Focus on Atlas Search, Vector Search, AI workloads
  • Graph Strategy: Not a primary focus

Market Implication#

Multi-model databases offer “good enough” graph capabilities for many use cases, potentially limiting growth of pure graph databases. However, deep graph analytics still favor specialized databases.

Cloud-Native Graph Services Growth#

Major Cloud Offerings#

Provider | Service | Query Languages | Status
AWS | Neptune | Gremlin, openCypher | Active
Azure | Cosmos DB (Gremlin) | Gremlin | Stable
Google | Spanner Graph | SQL + Graph | GA (2024)
Neo4j | AuraDB | Cypher | Growing

2024-2025 Developments#

  • Google Spanner Graph: Entered market with SQL-integrated graph
  • AWS Neptune + Bedrock: Graph RAG for knowledge bases
  • Neo4j Aura: New analytics and GenAI features

Trend: Managed Services Dominating#

Cloud-based deployments (72%+ share) reduce infrastructure concerns but increase vendor lock-in. Library selection should consider cloud provider compatibility.

AI/ML Integration with Graph Databases#

GraphRAG Revolution (2024+)#

Microsoft’s open-source GraphRAG (July 2024) established graph-augmented retrieval as a production pattern. Key developments:

  • Knowledge Graph Construction: LLMs extracting structured graphs from text
  • Graph + Vector Hybrid: Combining semantic search with relationship traversal
  • Agentic RAG: LLM agents using graph reasoning for multi-step workflows

Production Evidence#

  • 300-320% ROI reported for knowledge graph implementations
  • LinkedIn: 63% improvement in ticket resolution with graph-based systems
  • Finance: 50% improvement in fraud detection rates

Library Implications#

Graph database clients increasingly need:

  • Vector index support (embeddings)
  • Streaming/async for real-time processing
  • LLM framework integration (LangChain, LlamaIndex)
  • Batch import for knowledge graph construction

Predictions for 2030#

High Confidence#

  1. GQL becomes dominant property graph query language
  2. Cloud-managed graph services capture majority of new deployments
  3. GraphRAG/knowledge graph use cases drive enterprise adoption
  4. Vector + graph hybrid architectures become standard

Medium Confidence#

  1. Neo4j maintains market leadership but with reduced share
  2. TinkerPop/Gremlin remains relevant for multi-database scenarios
  3. PostgreSQL AGE captures significant “casual graph” use cases

Lower Confidence#

  1. Complete query language standardization across vendors
  2. Proprietary languages (GSQL) gaining significant share
  3. On-premise deployments returning to favor

Graph Database Python Client Library Viability Assessment#

Executive Summary#

This assessment evaluates the long-term viability of Python client libraries for major graph databases. Libraries are rated on maintenance health, corporate backing, and sustainability for production use over a 5-year horizon.


Neo4j Python Driver#

Package: neo4j (PyPI)
Type: Official vendor driver
Current Version: 6.0.3 (November 2025)

Maintenance Status: EXCELLENT#

  • Release Cadence: Monthly releases since 5.0
  • Recent Activity: 6.0.x series actively developed with breaking changes for modernization
  • Python Support: 3.10, 3.11, 3.12, 3.13 (dropped 3.7-3.9 in 6.0)
  • Migration Tools: Official migration assistant for codebase upgrades

Corporate Backing: STRONG#

  • Vendor: Neo4j Inc. (founded 2007)
  • Funding: $581M total raised, $2B valuation
  • Revenue: $200M+ ARR (2024), 44% market share in graph DBMS
  • Customers: 75%+ Fortune 100, including BMW, NASA, UBS
  • Business Model: Open-source core + AuraDB managed service

Breaking Change History: MODERATE RISK#

Recent 5.x to 6.x migration requires attention:

  • Error handling redesign (DriverError vs Neo4jError separation)
  • Resource management changes (explicit .close() required)
  • Package rename from neo4j-driver to neo4j
  • Element IDs changed from integers to strings (5.x)

Dependency Health: EXCELLENT#

  • Minimal dependencies, optional Rust extensions for performance
  • No security vulnerabilities detected
  • Clean dependency tree

Bus Factor Risk: LOW#

  • Large engineering team at Neo4j
  • Multiple maintainers across driver ecosystem
  • Comprehensive documentation and enterprise support

Viability Score: 9/10#

Recommendation: Primary choice for Neo4j deployments. Strong long-term investment.


Neomodel (Neo4j OGM)#

Package: neomodel (PyPI)
Type: Community OGM under Neo4j Labs
Current Version: 6.0.0 (2024)

Maintenance Status: GOOD (Improved)#

  • Release Cadence: Active development resumed 2023
  • 2024 Updates: Async support, mypy typing (95% coverage), vector index support
  • Python Support: 3.7+ with Neo4j 5.x and 4.4 LTS

Corporate Backing: COMMUNITY + LABS#

  • Moved to Neo4j Labs program (official recognition, community-driven)
  • Production use by Novo Nordisk (OpenStudyBuilder)
  • No dedicated corporate funding

Breaking Change History: MODERATE#

  • Major version bumps may require model adjustments
  • Configuration system overhaul in recent versions

Bus Factor Risk: MEDIUM#

  • Small maintainer team (Marius Conjeaud primary)
  • Active community but concentration of knowledge

Viability Score: 7/10#

Recommendation: Suitable for Neo4j projects needing OGM patterns. Monitor maintainer activity.


python-arango (ArangoDB)#

Package: python-arango (PyPI)
Type: Official vendor driver
Current Version: Latest 2024 release

Maintenance Status: GOOD#

  • Release Cadence: Healthy release activity
  • Weekly Downloads: 352,711 (popular package)
  • Python Support: 3.8+
  • Async Alternative: python-arango-async available

Corporate Backing: MODERATE (Changed)#

  • Vendor: ArangoDB GmbH (founded 2014)
  • Funding: $58.6M total raised
  • Licensing Change: Moved to BSL 1.1 for version 3.12+ (Q1 2024)
    • Still source-available for non-commercial use
    • Cannot be used for competing managed services
    • Community Edition Transition Fund available

Breaking Change History: LOW#

  • Stable API evolution
  • Good backward compatibility

Dependency Health: GOOD#

  • No security vulnerabilities detected
  • Reasonable dependency footprint

Bus Factor Risk: MEDIUM#

  • Smaller company than Neo4j
  • Dual headquarters (San Francisco/Cologne)

Viability Score: 7/10#

Recommendation: Viable for multi-model needs. Watch licensing implications for SaaS deployments.


pyTigerGraph#

Package: pyTigerGraph (PyPI)
Type: Official vendor SDK
Current Version: 1.8.1

Maintenance Status: ADEQUATE#

  • Release Cadence: Active but less frequent
  • Weekly Downloads: 5,614 (smaller user base)
  • Recent Features: Async support (1.8), REST endpoint refactoring (1.7)
  • Contributors: 30 open-source contributors

Corporate Backing: STRONG (Enterprise Focus)#

  • Vendor: TigerGraph (founded 2012)
  • Funding: $172-174M total raised
  • Investors: Tiger Global, AME Cloud Ventures, Baidu
  • Focus: Enterprise analytics, fraud detection, supply chain
  • Customers: Uber, VISA, Alipay, Zillow

Breaking Change History: LOW-MODERATE#

  • Version 1.7+ requires TigerGraph DB 4.1+ for new features
  • Generally stable API

Dependency Health: GOOD#

  • No security vulnerabilities detected

Bus Factor Risk: MEDIUM#

  • Enterprise focus may limit open-source investment
  • Proprietary GSQL creates ecosystem isolation

Viability Score: 6/10#

Recommendation: Best for enterprise-scale graph analytics with existing TigerGraph investment. Not recommended as a first graph database choice due to GSQL lock-in.


gremlinpython (Apache TinkerPop)#

Package: gremlinpython (PyPI)
Type: Apache Foundation project
Current Version: 3.8.0 (November 2025)

Maintenance Status: EXCELLENT#

  • Release Cadence: Regular releases, 4.0 beta in development
  • Governance: Apache Software Foundation PMC
  • Python Support: Modern Python versions

Corporate Backing: FOUNDATION + MULTI-VENDOR#

  • Apache Software Foundation governance since 2016
  • Supported by multiple vendors (AWS, Microsoft, DataStax)
  • PMC includes contributors from diverse organizations
  • Active community with Discord, Twitch, YouTube presence

Breaking Change History: MODERATE (4.0 Coming)#

TinkerPop 4.0 introduces significant changes:

  • Dropping WebSockets for HTTP 1.1
  • Removing Bytecode in favor of gremlin-lang scripts
  • Simplifying connection options

Dependency Health: GOOD#

  • Standard Apache project quality

Bus Factor Risk: LOW#

  • Multiple major vendors invested
  • PMC governance ensures continuity
  • Long-term Apache stewardship

Viability Score: 8/10#

Recommendation: Excellent choice for multi-database portability strategy. Works with JanusGraph, Neptune, Cosmos DB, DataStax. TinkerPop 4.0 migration planning needed.
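A minimal gremlinpython traversal, sketched against a hypothetical endpoint and vertex label. Note the WebSocket transport shown here is exactly what TinkerPop 4.0 plans to replace with HTTP, so this shape will change:

```python
def count_labeled_vertices(endpoint: str, label: str) -> int:
    # gremlinpython 3.x API (`pip install gremlinpython`); imports deferred so
    # the sketch stays importable without the package or a running server.
    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
    from gremlin_python.process.anonymous_traversal import traversal

    conn = DriverRemoteConnection(endpoint, "g")  # e.g. "ws://localhost:8182/gremlin"
    try:
        g = traversal().withRemote(conn)
        # Imperative traversal style: start from all vertices, filter, aggregate.
        return g.V().hasLabel(label).count().next()
    finally:
        conn.close()
```

The same traversal code runs unchanged against JanusGraph, Neptune, or Cosmos DB's Gremlin API, which is the portability argument in practice.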


Cloud Provider SDKs#

AWS Neptune (boto3 + gremlinpython)#

  • Gremlin and openCypher support
  • Strong backing from AWS
  • Lock-in to AWS ecosystem
  • Viability tied to AWS platform (effectively permanent)

Azure Cosmos DB (azure-cosmos + gremlinpython)#

  • Gremlin API among multiple options
  • Microsoft backing but graph capabilities seen as stagnant
  • Multi-model flexibility
  • Viability tied to Azure platform

Summary Viability Matrix#

| Library | Maintenance | Backing | Breaking Risk | Bus Factor | Overall |
|---|---|---|---|---|---|
| neo4j | 9 | 9 | 7 | 9 | 9/10 |
| neomodel | 7 | 6 | 7 | 5 | 7/10 |
| python-arango | 8 | 6 | 8 | 6 | 7/10 |
| pyTigerGraph | 6 | 7 | 7 | 6 | 6/10 |
| gremlinpython | 9 | 8 | 6 | 9 | 8/10 |

Key Findings#

Safest Long-term Bets#

  1. neo4j: Dominant market position, strong funding, active development
  2. gremlinpython: Apache governance, multi-vendor support, portability value

Watch List#

  • python-arango: BSL licensing change may affect SaaS use cases
  • pyTigerGraph: GSQL proprietary language creates lock-in risk

Emerging Considerations#

  • All libraries adding async support (critical for modern Python)
  • Vector/embedding support becoming table stakes
  • GQL standard will reshape query language landscape

Vendor Lock-in Analysis: Graph Database Clients#

Query Language Portability Assessment#

Portability Spectrum#

Most Portable                                    Least Portable
     |                                                  |
  Gremlin -----> Cypher/GQL -----> GSQL -----> Proprietary

Gremlin (Apache TinkerPop)#

Portability Score: 9/10

  • Supported Databases: JanusGraph, Neptune, Cosmos DB, DataStax, OrientDB
  • Strengths: True multi-database abstraction, Apache governance
  • Weaknesses: Imperative style less intuitive than Cypher
  • Best For: Projects requiring database portability guarantees

Cypher / openCypher / GQL#

Portability Score: 7/10 (improving)

  • Current Support: Neo4j, Memgraph, AGE (PostgreSQL), RedisGraph (EOL)
  • GQL Future: Expected broad adoption 2026-2028
  • Strengths: Declarative, readable, standardizing via ISO GQL
  • Weaknesses: Neo4j dominance means de facto lock-in
  • Best For: Projects betting on GQL standardization

GSQL (TigerGraph)#

Portability Score: 2/10

  • Single Vendor: Only TigerGraph
  • Strengths: Turing-complete, optimized for deep analytics
  • Weaknesses: Complete vendor lock-in, no migration path
  • Best For: Enterprise analytics with long-term TigerGraph commitment

Data Model Portability#

Property Graph Model#

All major graph databases (Neo4j, TigerGraph, Neptune, JanusGraph) use property graphs, providing basic model compatibility:

  • Nodes/Vertices: Labeled entities with properties
  • Edges/Relationships: Typed connections with properties
  • Export Formats: CSV, JSON, GraphML widely supported
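The shared property graph model is simple enough to sketch as plain data structures, which is also roughly what a JSON export of nodes and edges looks like (field names here are illustrative, not any vendor's export schema):

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Node:
    label: str                          # e.g. "Person"
    key: str                            # stable identifier referenced by edges
    props: dict = field(default_factory=dict)


@dataclass
class Edge:
    type: str                           # e.g. "KNOWS"
    src: str                            # key of the source node
    dst: str                            # key of the destination node
    props: dict = field(default_factory=dict)


graph = {
    "nodes": [
        asdict(Node("Person", "alice", {"age": 34})),
        asdict(Node("Person", "bob")),
    ],
    "edges": [asdict(Edge("KNOWS", "alice", "bob", {"since": 2019}))],
}
print(json.dumps(graph, indent=2))
```

Because every major vendor can ingest something close to this nodes-plus-edges shape, the data itself is rarely the hard part of a migration; the queries and application code are.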

Migration Complexity Matrix#

| From | To Neo4j | To Neptune | To TigerGraph | To JanusGraph |
|---|---|---|---|---|
| Neo4j | - | Medium | High | Medium |
| Neptune | Medium | - | High | Low |
| TigerGraph | High | High | - | High |
| JanusGraph | Medium | Low | High | - |

Key Factors:

  • Query translation (Cypher <-> Gremlin <-> GSQL)
  • Schema and constraint differences
  • Indexing strategy variations
  • Application code rewrite requirements

Export/Import Tooling#

Available Tools:

  • Neo4j: LOAD CSV, neo4j-admin export, APOC procedures
  • Memgraph: Neo4j migration module (direct connection)
  • General: GraphML format interchange
  • Microsoft: MigrateToGraph (relational to graph)

Limitations:

  • No universal graph-to-graph migration standard
  • Query translation typically manual
  • Application logic must be rewritten
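To see why translation stays manual, compare the same one-hop query in the two portable languages, held here as Python strings (the Account/TRANSFER schema is hypothetical). The differences are stylistic, not mechanical, so no tool reliably converts one to the other:

```python
# Declarative pattern matching (Cypher / openCypher / future GQL):
# describe the shape, let the engine plan the traversal.
CYPHER = """
MATCH (a:Account {id: $id})-[:TRANSFER]->(b:Account)
RETURN b.id
"""

# Imperative traversal (Gremlin), here as a script string:
# spell out the steps the engine should take.
GREMLIN = "g.V().has('Account', 'id', account_id).out('TRANSFER').values('id')"
```

Multiply this by every query in the codebase, plus parameter handling and result mapping, and the "High" cells in the matrix above follow directly.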

Abstraction Layer Options#

TinkerPop as Universal Layer#

What It Provides:

  • Common Gremlin query language
  • Vendor-agnostic driver interfaces
  • Standard property graph model

Databases Supported:

  • JanusGraph (native)
  • Amazon Neptune
  • Azure Cosmos DB
  • DataStax Enterprise
  • OrientDB

Databases NOT Supported:

  • Neo4j (native Cypher only, no TinkerPop)
  • TigerGraph (GSQL only)

When TinkerPop Makes Sense:

  1. Multi-cloud strategy requiring database flexibility
  2. Existing investment in Gremlin queries
  3. Need to switch between Neptune/Cosmos/JanusGraph
  4. Avoiding single-vendor dependency

When TinkerPop Doesn’t Make Sense:

  1. Neo4j-specific features required
  2. GSQL analytics capabilities needed
  3. Cypher/GQL standardization bet
  4. Simple use case not needing portability

ORM/OGM Abstraction#

Available OGMs:

  • Neomodel: Neo4j only
  • Object-Graph Mappers: Database-specific implementations

Limitation: No cross-database Python OGM exists. OGMs provide code abstraction but not database portability.

Lock-in Risk Mitigation Strategies#

Strategy 1: TinkerPop-First#

Choose Gremlin-compatible database; use gremlinpython exclusively.

  • Pros: Maximum portability, multi-vendor competition
  • Cons: Excludes Neo4j, forgoes Cypher benefits
  • Risk Level: Low lock-in, medium feature limitation

Strategy 2: GQL-Ready Cypher#

Choose Neo4j or openCypher database; prepare for GQL migration.

  • Pros: Best tooling (Neo4j), GQL future-proofing
  • Cons: Near-term Cypher lock-in, GQL timeline uncertainty
  • Risk Level: Medium lock-in, low feature limitation

Strategy 3: Abstraction Layer#

Build internal abstraction over database clients.

  • Pros: Control over interfaces, potential future migration
  • Cons: Development overhead, incomplete feature coverage
  • Risk Level: Low lock-in, high development cost
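A minimal sketch of what such an internal layer can look like, assuming a deliberately small interface (all names are hypothetical; real adapters would wrap the neo4j or gremlinpython clients):

```python
from typing import Any, Protocol


class GraphStore(Protocol):
    """Internal contract; application code depends only on this interface."""

    def upsert_node(self, label: str, key: str, props: dict[str, Any]) -> None: ...
    def add_edge(self, rel: str, src: str, dst: str) -> None: ...
    def neighbors(self, key: str) -> list[str]: ...


class InMemoryGraphStore:
    """Toy adapter for tests; a Neo4j- or Gremlin-backed adapter would
    implement the same three methods against a real database."""

    def __init__(self) -> None:
        self._nodes: dict[str, dict[str, Any]] = {}
        self._edges: dict[str, list[str]] = {}

    def upsert_node(self, label: str, key: str, props: dict[str, Any]) -> None:
        self._nodes[key] = {"label": label, **props}

    def add_edge(self, rel: str, src: str, dst: str) -> None:
        self._edges.setdefault(src, []).append(dst)

    def neighbors(self, key: str) -> list[str]:
        return self._edges.get(key, [])


store: GraphStore = InMemoryGraphStore()
store.upsert_node("Person", "alice", {"age": 34})
store.upsert_node("Person", "bob", {})
store.add_edge("KNOWS", "alice", "bob")
print(store.neighbors("alice"))  # -> ['bob']
```

The cost named above is visible even in this toy: every capability the application needs must be re-expressed in the interface, which is where the development overhead accumulates.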

Strategy 4: Cloud Provider Lock-in Accept#

Choose Neptune or Cosmos DB; accept cloud platform dependency.

  • Pros: Managed service benefits, cloud ecosystem integration
  • Cons: Full cloud vendor lock-in
  • Risk Level: High lock-in, low operational burden

Recommendations by Use Case#

Startup/MVP#

  • Choice: Neo4j + Cypher
  • Rationale: Best developer experience, largest community, GQL path
  • Lock-in Acceptance: Medium (acceptable for velocity)

Enterprise Multi-Database#

  • Choice: TinkerPop/Gremlin
  • Rationale: Proven portability, vendor-neutral governance
  • Lock-in Acceptance: Low (portability required)

Deep Analytics#

  • Choice: TigerGraph + GSQL
  • Rationale: Best performance for complex algorithms
  • Lock-in Acceptance: High (feature-driven decision)

Cloud-Native#

  • Choice: Neptune or Cosmos DB (matching cloud provider)
  • Rationale: Operational simplicity, ecosystem integration
  • Lock-in Acceptance: High (cloud strategy dependent)

Strategic Recommendations: Graph Database Client Libraries#

5-Year Horizon Summary#

For Python projects requiring graph database capabilities over the next 5 years, the strategic landscape centers on two viable paths: Neo4j/Cypher with GQL evolution, or TinkerPop/Gremlin for multi-database portability.

Primary Recommendation: Neo4j Python Driver#

  • Package: neo4j
  • When to Choose: Default choice for most new graph database projects

Rationale#

  1. Market Leadership: 44% market share, $200M+ ARR, Fortune 100 adoption
  2. Funding Stability: $581M raised, $2B valuation, path to IPO
  3. Active Development: Monthly releases, Python 3.13 support, Rust extensions
  4. GQL Alignment: Cypher converging to ISO GQL standard (smooth transition)
  5. AI/ML Integration: Best GraphRAG tooling, LangChain integration, vector support
  6. Community: Largest graph database community, extensive documentation
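A minimal usage sketch of the driver (hypothetical URI, credentials, and Person schema), using the `execute_query` convenience API from the 5.x line:

```python
def person_names(uri: str, user: str, password: str) -> list[str]:
    # Official driver (`pip install neo4j`); import deferred so the sketch
    # stays importable without the package or a running instance.
    from neo4j import GraphDatabase

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        # execute_query handles session, transaction, and retry management
        records, _summary, _keys = driver.execute_query(
            "MATCH (p:Person) RETURN p.name AS name",
            database_="neo4j",
        )
        return [record["name"] for record in records]
```

The context manager closes the driver on exit; in a long-lived service you would instead create one driver at startup and reuse it across requests.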

Risk Factors to Monitor#

  • Breaking changes in major versions (5.x to 6.x pattern)
  • Managed service (AuraDB) pricing evolution
  • GQL standardization timeline slippage

Secondary Recommendation: gremlinpython (TinkerPop)#

  • Package: gremlinpython
  • When to Choose: Multi-database portability required

Rationale#

  1. True Portability: Works across Neptune, Cosmos DB, JanusGraph, DataStax
  2. Apache Governance: Foundation backing, multi-vendor PMC
  3. Cloud Flexibility: Switch between AWS/Azure/on-premise
  4. Long-term Stability: Apache projects rarely abandoned

Risk Factors to Monitor#

  • TinkerPop 4.0 migration (significant API changes)
  • No GQL convergence (separate from Cypher/GQL ecosystem)
  • Learning curve for imperative traversal patterns

Conditional Recommendations#

For OGM Requirements: neomodel#

  • Package: neomodel
  • Condition: Need Python OGM patterns with Neo4j

  • Active maintenance under Neo4j Labs
  • Async support added 2024
  • Production use by major enterprises
  • Monitor maintainer activity (smaller team)
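A sketch of the OGM style neomodel provides (the model is hypothetical): class definitions map to labeled nodes, attributes to properties and typed relationships. The import is deferred so the sketch stays loadable without the package installed:

```python
def define_person_model():
    # neomodel OGM (`pip install neomodel`); defining a model does not
    # require a live database, only the package.
    from neomodel import RelationshipTo, StringProperty, StructuredNode

    class Person(StructuredNode):
        # A "Person" node label with an indexed property
        name = StringProperty(unique_index=True)
        # A typed, directed relationship to other Person nodes
        knows = RelationshipTo("Person", "KNOWS")

    return Person
```

Queries then read like ORM code (for example, filtering on `Person.nodes`), which is the pattern familiarity the OGM buys you over raw Cypher.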

For Multi-Model Needs: python-arango#

  • Package: python-arango
  • Condition: Document + graph + key-value in single database

  • BSL 1.1 licensing change (2024) limits SaaS use
  • Viable for internal applications
  • Async variant available
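A sketch of the multi-model surface (hypothetical database, collection, and graph names; import deferred so the sketch stays importable without the package or a server):

```python
def open_handles(db_name: str, user: str, password: str):
    # python-arango client (`pip install python-arango`)
    from arango import ArangoClient

    client = ArangoClient(hosts="http://localhost:8529")
    db = client.db(db_name, username=user, password=password)

    # The same database handle serves both models:
    accounts = db.collection("accounts")   # document / key-value access
    transfers = db.graph("transfers")      # graph traversal access
    return accounts, transfers
```

That single-handle design is the multi-model argument: one operational system instead of a document store plus a separate graph database.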

For Enterprise Analytics: pyTigerGraph#

  • Package: pyTigerGraph
  • Condition: Deep graph algorithms, existing TigerGraph investment

  • Strong enterprise backing
  • GSQL lock-in is significant risk
  • Not recommended for new projects without specific requirements

Library Avoidance List#

  1. Deprecated packages: neo4j-driver (use neo4j instead)
  2. Abandoned projects: py2neo (deleted), unmaintained forks
  3. Proprietary-only SDKs: Unless committed to that vendor long-term

Strategic Decision Framework#

Choose Neo4j (neo4j) When:#

  • Starting a new graph database project
  • Developer experience is a priority
  • GraphRAG or knowledge graph use case
  • Willing to bet on GQL standardization
  • Single-database architecture acceptable

Choose TinkerPop (gremlinpython) When:#

  • Multi-cloud or multi-database strategy required
  • Using Neptune, Cosmos DB, or JanusGraph
  • Vendor-neutral governance is important
  • Portability outweighs developer convenience

Choose Cloud-Specific When:#

  • Already committed to AWS (Neptune) or Azure (Cosmos)
  • Managed services preferred over self-hosted
  • Cloud ecosystem integration is primary concern

5-Year Outlook Summary#

| Library | 2025 Status | 2030 Projection |
|---|---|---|
| neo4j | Strong | Dominant (GQL leader) |
| gremlinpython | Strong | Stable (portability) |
| neomodel | Good | Dependent on community |
| python-arango | Good | Viable (watch license) |
| pyTigerGraph | Adequate | Niche (enterprise) |

  • Highest Confidence Bet: Neo4j Python driver with the Cypher/GQL path
  • Best Hedge Strategy: TinkerPop for projects needing future flexibility

Published: 2026-03-06 Updated: 2026-03-06