From Python Fallbacks to SQL-Native: A 12-Month Journey

When we started building NeoSQLite, we took a "get it working first" approach. Query operators like $in, $nin, and $elemMatch, along with the $project aggregation stage, were handled by Python fallbacks: we'd fetch ALL documents from SQLite, then filter them in Python. It worked, but it was slow.
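
To make the difference concrete, here is a minimal sketch using only Python's standard sqlite3 module. It is not NeoSQLite's actual code: the posts table, its data column, and both helper functions are hypothetical, and it assumes SQLite's built-in JSON functions are available.

```python
import json
import sqlite3

# Hypothetical schema: one JSON document per row, stored as text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, data TEXT)")
docs = [
    {"title": "Hello", "status": "draft"},
    {"title": "Launch", "status": "published"},
    {"title": "Recap", "status": "archived"},
]
conn.executemany(
    "INSERT INTO posts (data) VALUES (?)", [(json.dumps(d),) for d in docs]
)


def find_in_python(statuses):
    """Fallback style: load every row, decode it, then filter in Python."""
    rows = conn.execute("SELECT data FROM posts ORDER BY id").fetchall()
    decoded = (json.loads(row[0]) for row in rows)
    return [doc for doc in decoded if doc.get("status") in statuses]


def find_in_sql(statuses):
    """SQL-native style: SQLite evaluates the $in test, only matches are decoded."""
    marks = ",".join("?" for _ in statuses)
    sql = (
        "SELECT data FROM posts "
        f"WHERE json_extract(data, '$.status') IN ({marks}) ORDER BY id"
    )
    return [json.loads(row[0]) for row in conn.execute(sql, list(statuses))]


wanted = ["draft", "published"]
python_result = find_in_python(wanted)
sql_result = find_in_sql(wanted)
```

Both functions return the same documents. The difference is how many rows cross from SQLite into Python before any filtering happens.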

Then we started dogfooding with Neo-Bloggy (our blogging platform that runs entirely on NeoSQLite instead of MongoDB). Production usage revealed the pain points real users would face.

The SQL-Tier Revolution (v1.14.x series)

Over the last 6 releases, we systematically moved operations from Python into native SQL:

v1.14.0 — Moved $project stage to SQL-tier (no more loading full documents just to project 2 fields)
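
As a rough illustration of what a SQL-tier projection can look like, the sketch below builds the projected document inside SQLite with json_object() and json_extract(). The table layout and the project() helper are assumptions made for this example, not NeoSQLite's internals, and only top-level field names are handled.

```python
import json
import sqlite3

# Hypothetical schema: one JSON document per row, stored as text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, data TEXT)")
conn.execute(
    "INSERT INTO posts (data) VALUES (?)",
    (json.dumps({"title": "Launch", "author": "ada", "body": "x" * 10_000}),),
)


def project(fields):
    """Return only the requested top-level fields, assembled by SQLite."""
    # Each field adds a (key, json_extract(path)) pair; both are bound parameters.
    parts = ", ".join("?, json_extract(data, ?)" for _ in fields)
    params = [p for field in fields for p in (field, f"$.{field}")]
    sql = f"SELECT json_object({parts}) FROM posts ORDER BY id"
    return [json.loads(row[0]) for row in conn.execute(sql, params)]


projected = project(["title", "author"])
```

The large body field never leaves SQLite here; Python only decodes the two requested fields.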

v1.14.9-10 — Fixed $elemMatch and $in/$nin on array fields. Instead of returning 0 results or unfiltered documents, they now use proper SQL CTE patterns with json_each()
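
A minimal sketch of the general technique, assuming a hypothetical posts table with a tags array in each document: json_each() expands the array into rows, a CTE collects the ids of matching documents, and a NOT EXISTS subquery handles the $nin case. This shows the pattern only and is not the query NeoSQLite actually generates.

```python
import json
import sqlite3

# Hypothetical schema: one JSON document per row, stored as text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, data TEXT)")
docs = [
    {"title": "A", "tags": ["sqlite", "python"]},
    {"title": "B", "tags": ["mongodb"]},
    {"title": "C", "tags": []},
    {"title": "D"},  # no tags field at all
]
conn.executemany(
    "INSERT INTO posts (data) VALUES (?)", [(json.dumps(d),) for d in docs]
)


def find_tags_in(values):
    """$in on an array field: match if any array element is in `values`."""
    marks = ",".join("?" for _ in values)
    sql = f"""
        WITH matched AS (
            SELECT DISTINCT posts.id
            FROM posts, json_each(posts.data, '$.tags') AS tag
            WHERE tag.value IN ({marks})
        )
        SELECT json_extract(posts.data, '$.title')
        FROM posts JOIN matched ON matched.id = posts.id
        ORDER BY posts.id
    """
    return [row[0] for row in conn.execute(sql, list(values))]


def find_tags_nin(values):
    """$nin on an array field: match if no array element is in `values`."""
    marks = ",".join("?" for _ in values)
    sql = f"""
        SELECT json_extract(posts.data, '$.title')
        FROM posts
        WHERE NOT EXISTS (
            SELECT 1 FROM json_each(posts.data, '$.tags') AS tag
            WHERE tag.value IN ({marks})
        )
        ORDER BY posts.id
    """
    return [row[0] for row in conn.execute(sql, list(values))]


in_titles = find_tags_in(["sqlite", "mongodb"])
nin_titles = find_tags_nin(["sqlite", "mongodb"])
```

Documents with an empty or missing tags array produce no rows from json_each(), so they fall out of the $in result and into the $nin result.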

v1.14.11 — Added native regex operators ($regexMatch, $regexFind) directly in SQL tier using custom SQLite functions. Array operators got 10-100x speedup with CTE patterns
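
One way to run regex matching inside SQLite is to register a Python callable as a custom SQL function. The sketch below uses the standard sqlite3 create_function() call for that; the function name regex_match and the table layout are hypothetical and may differ from what NeoSQLite registers.

```python
import json
import re
import sqlite3


def regex_match(pattern, value):
    """Return 1 if `pattern` matches anywhere in `value`, else 0."""
    if value is None:
        return 0
    return 1 if re.search(pattern, str(value)) else 0


conn = sqlite3.connect(":memory:")
# Register the callable so SQL statements can call regex_match(pattern, value).
conn.create_function("regex_match", 2, regex_match)

conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, data TEXT)")
titles = ["SQLite tips", "Mongo notes", "sqlite internals"]
conn.executemany(
    "INSERT INTO posts (data) VALUES (?)",
    [(json.dumps({"title": t}),) for t in titles],
)

# The filter runs row by row inside the query, so only matches are returned.
rows = conn.execute(
    "SELECT json_extract(data, '$.title') FROM posts "
    "WHERE regex_match(?, json_extract(data, '$.title')) ORDER BY id",
    ("(?i)^sqlite",),
).fetchall()
matching_titles = [row[0] for row in rows]
```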

v1.14.12 — Fixed the "malformed JSON" edge case (because even SQLite has its quirks with json_each() syntax!)

The NX-27017 Milestone

In v1.13.0, we shipped something unexpected: a MongoDB Wire Protocol Server that lets PyMongo connect directly to SQLite, with no code changes needed. This isn't just an API clone; it's full wire protocol compatibility, with 100% test parity against real MongoDB.
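
At the byte level, speaking the wire protocol starts with the standard 16-byte header that every MongoDB message carries: four little-endian 32-bit integers for message length, request id, response-to id, and opcode. The sketch below only packs and parses that header with Python's struct module. It is not taken from the NeoSQLite server, and the function names are hypothetical.

```python
import struct

OP_MSG = 2013  # opcode of the OP_MSG message type in the MongoDB wire protocol

# messageLength, requestID, responseTo, opCode: four little-endian int32 values
HEADER = struct.Struct("<iiii")


def build_header(body_length, request_id, response_to=0, op_code=OP_MSG):
    """Pack a header; messageLength counts the header itself plus the body."""
    return HEADER.pack(HEADER.size + body_length, request_id, response_to, op_code)


def parse_header(data):
    """Unpack the first 16 bytes of a message into a dict."""
    length, request_id, response_to, op_code = HEADER.unpack_from(data)
    return {
        "message_length": length,
        "request_id": request_id,
        "response_to": response_to,
        "op_code": op_code,
    }


raw = build_header(body_length=5, request_id=7)
header = parse_header(raw)
```

A real server would go on to read the remaining message_length - 16 bytes and decode the BSON body, which this sketch leaves out.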

What This Means

  • 3x faster than MongoDB for typical operations
  • 30-300x faster for index operations (SQLite's B-trees are fast)
  • Zero network overhead — embedded database, embedded performance
  • Drop-in replacement — existing PyMongo code works unchanged

The Lesson

Building a database isn't about getting the API right. It's about getting the execution model right. Every time we pushed logic from Python down to SQL, we got closer to SQLite's raw performance while maintaining MongoDB's developer experience.

The 3x number isn't theoretical. It's measured against a real MongoDB instance in our CI pipeline, across 54 operation categories with 10 iterations each.
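
For readers who want to run a similar comparison, a timing harness can be as small as the sketch below. It is an assumption about the general shape of such a benchmark, not our actual CI code: the helper names are hypothetical, and taking the median of the samples is a choice made for this example.

```python
import statistics
import time


def time_operation(fn, iterations=10):
    """Run `fn` repeatedly and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


# Stand-in workload; a real run would time the same operation on each backend.
median_seconds = time_operation(lambda: sum(range(1000)))
```

With one median per backend and operation category, speedup(mongo_seconds, neosqlite_seconds) gives the ratio for that category.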

Want to try it?

pip install neosqlite

Or check out the benchmark.