ChronDB is a document-oriented database built with time-travel capabilities at its core. While many databases offer point-in-time recovery, ChronDB takes this a step further by making version history a first-class feature.
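The key features we implemented are document history retrieval, retrieval of a document as it existed at a specific commit, and chronological restoration of earlier versions.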
Let’s explore these features and the Git internals that make them possible.
When we designed ChronDB, we chose Git as our storage backend for several reasons:
We created a storage layer that saves documents as JSON files in a Git repository, with each change automatically committed.
The first feature we implemented was document history retrieval. This allows users to see all versions of a document, including metadata like who made each change and when.
Our implementation leverages several key Git concepts:
- We use JGit's LogCommand to retrieve all commits that modified a specific document path.
- RevWalk and TreeWalk allow us to navigate through the commit history and the file tree structure within each commit.
The result is a list of document versions, ordered from most recent to oldest, that provides a complete audit trail of all changes.
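
To make the traversal concrete, here is a minimal sketch in Python. It is a toy in-memory model, not ChronDB's JGit code: the Commit class, the document_history function, and the user/1.json style paths are all assumptions made for illustration. It mimics what a path-filtered log does: walk from the newest commit back through its parents and keep only the commits where the document's content differs from the parent's.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Commit:
    """Hypothetical stand-in for a Git commit; `tree` maps path -> content."""
    id: str
    author: str
    timestamp: int
    message: str
    tree: dict
    parent: Optional["Commit"] = None


def document_history(head, path):
    """Walk from `head` to the root and keep commits that changed `path`."""
    versions = []
    commit = head
    while commit is not None:
        before = commit.parent.tree.get(path) if commit.parent else None
        after = commit.tree.get(path)
        if after != before:  # the path was added, modified, or deleted here
            versions.append({
                "commit": commit.id,
                "author": commit.author,
                "timestamp": commit.timestamp,
                "message": commit.message,
                "content": after,
            })
        commit = commit.parent
    return versions  # newest first, because the walk starts at head


c1 = Commit("c1", "alice", 100, "Create user 1", {"user/1.json": '{"name": "Ann"}'})
c2 = Commit("c2", "bob", 200, "Create user 2",
            {**c1.tree, "user/2.json": '{"name": "Bob"}'}, c1)
c3 = Commit("c3", "alice", 300, "Rename user 1",
            {**c2.tree, "user/1.json": '{"name": "Anna"}'}, c2)

history = document_history(c3, "user/1.json")
```

Commit c2 only touches another document, so it does not appear in the history of user/1.json. That is the same filtering a path-restricted log performs.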
Next, we implemented the ability to retrieve a document as it existed at a specific commit. This lets users access historical versions without changing the current state.
To accomplish this, we leverage several Git internals:
- We resolve the commit hash to its ObjectId representation.
- Using RevWalk, we parse the specified commit to access its tree structure.
- Using TreeWalk, we navigate to the document’s path in that commit’s file tree.
This approach allows us to effectively time-travel to any point in a document’s history and retrieve its exact state at that moment, without modifying the current version.
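
The lookup chain can be sketched with a toy object database in Python. Everything here is an assumption for illustration (the objects and refs dictionaries, the document_at function, and the readable ids such as commit-1); real Git keys objects by hash, and ChronDB does this through JGit. The point is the order of the steps: commit, then tree, then path, then blob, with no reference ever being moved.

```python
import json

# Toy object database: id -> object. Real Git keys these by a content hash.
objects = {
    "blob-a": b'{"name": "Ann"}',
    "blob-b": b'{"name": "Anna"}',
    "tree-1": {"user/1.json": "blob-a"},
    "tree-2": {"user/1.json": "blob-b"},
    "commit-1": {"tree": "tree-1", "parent": None},
    "commit-2": {"tree": "tree-2", "parent": "commit-1"},
}
refs = {"HEAD": "commit-2"}


def document_at(commit_id, path):
    """Return the parsed document stored at `path` in the given commit."""
    commit = objects[commit_id]          # parse the commit
    tree = objects[commit["tree"]]       # access its tree
    blob_id = tree.get(path)             # navigate to the document path
    if blob_id is None:
        return None                      # the document did not exist then
    return json.loads(objects[blob_id])  # load and deserialize the blob


old = document_at("commit-1", "user/1.json")
current = document_at(refs["HEAD"], "user/1.json")
```

Reading the old version is a pure lookup, so HEAD still points at commit-2 afterwards.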
The most interesting feature is document restoration. Unlike traditional Git operations like git reset or git revert, which would modify or undo history, we wanted to maintain the full chronological history when restoring a document.
Our solution leverages Git’s commit model in a novel way:
- The restoring commit is created with the current HEAD as its parent, maintaining the chronological commit history.
This approach maintains a complete audit trail of changes, allowing users to see when a document was restored and from which version, without losing any historical information.
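
As a rough illustration of restoring forward instead of rewinding, the Python sketch below models a repository as an append-only list of commits. The repo dictionary, the commit and restore functions, and the commit message format are hypothetical and are not taken from ChronDB's source.

```python
import json

# Hypothetical in-memory repository: an append-only commit list plus a head pointer.
repo = {"commits": [], "head": None}


def commit(tree, message):
    """Append a commit whose parent is the current head, then advance head."""
    repo["commits"].append(
        {"parent": repo["head"], "tree": dict(tree), "message": message}
    )
    repo["head"] = len(repo["commits"]) - 1
    return repo["head"]


def restore(path, source):
    """Re-commit the content `path` had at commit `source` on top of head."""
    tree = dict(repo["commits"][repo["head"]]["tree"])  # start from the current state
    tree[path] = repo["commits"][source]["tree"][path]  # copy the historical version
    return commit(tree, f"Restore {path} from commit {source}")


v1 = commit({"user/1.json": json.dumps({"name": "Ann"})}, "Create user 1")
v2 = commit({"user/1.json": json.dumps({"name": "Anna"})}, "Rename user 1")
v3 = restore("user/1.json", v1)
```

After the restore there are three commits, not one. The rename in v2 is still in the log, and v3 records both the restored content and where it came from.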
Let’s dive deeper into the Git internals that make these features possible:
In ChronDB, documents are stored as JSON files in a Git repository. We use a structured path convention based on document ID and table name to organize files within the repository.
To ensure proper file organization:
When a document is saved or restored, we create a Git commit to record the change. Our implementation uses a “virtual” commit process that works directly with Git’s internal structures:
- We build the commit object with CommitBuilder, setting appropriate author and committer information, timestamp, message, and the new tree.
- We then point the branch reference at the new commit, which becomes the new HEAD.
This approach avoids filesystem I/O for staging and provides fine-grained control over the commit process.
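
To show what committing without a working tree or staging area can look like, here is a simplified Python sketch of a content-addressed object store. The write_object and save functions and the single-entry text tree are assumptions for illustration; real Git trees use a binary format and nest per directory, and ChronDB performs these steps through JGit. The blob hashing does follow Git's actual scheme: SHA-1 over a header of the form "blob <size>\0" followed by the content.

```python
import hashlib
import json

objects = {}           # object id -> raw object bytes (the object database)
refs = {"HEAD": None}  # reference name -> commit id


def write_object(kind, body):
    """Store an object under the SHA-1 of its Git-style header plus body."""
    data = f"{kind} {len(body)}\0".encode() + body
    oid = hashlib.sha1(data).hexdigest()
    objects[oid] = data
    return oid


def save(path, document, author, timestamp, message):
    """Write blob, tree, and commit objects, then move HEAD. No files are touched."""
    blob = write_object("blob", json.dumps(document, sort_keys=True).encode())
    # Simplified one-entry tree; not Git's real tree encoding.
    tree = write_object("tree", f"{blob} {path}\n".encode())
    lines = [f"tree {tree}"]
    if refs["HEAD"]:
        lines.append(f"parent {refs['HEAD']}")  # link to the previous commit
    lines += [
        f"author {author} {timestamp}",
        f"committer {author} {timestamp}",
        "",
        message,
    ]
    commit = write_object("commit", "\n".join(lines).encode())
    refs["HEAD"] = commit  # the reference update makes the commit current
    return commit


first = save("user/1.json", {"name": "Ann"}, "alice", 100, "Create user 1")
second = save("user/1.json", {"name": "Anna"}, "alice", 200, "Rename user 1")
```

Each save only adds objects and moves one reference, which is why nothing has to be written to a checkout or an index along the way.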
To retrieve a document’s history, we leverage Git’s powerful history traversal capabilities:
Log Command Initialization: We start with JGit’s LogCommand, configuring it to follow a specific file path.
Commit Iteration: The log command gives us an iterator over all commits that affected the specified document.
Commit Metadata Extraction: For each commit, we extract rich metadata including commit ID, author, timestamp, and message.
Content Extraction Process:
- We use RevWalk to parse the commit and access its tree.
- TreeWalk navigates to the specific document path within that tree.
- ObjectLoader reads the document content from the blob found at that path.
Result Assembly: All this information is combined into a comprehensive history record.
To access a document at a specific commit, we use a precise navigation process:
- We convert the commit hash to an ObjectId.
- We use RevWalk to locate and parse the specified commit.
- Using TreeWalk, we navigate to the document’s path within that commit’s tree.
- We load the blob content with ObjectLoader and deserialize it to obtain the document as it existed at that point in time.
The chronological restoration process combines several Git techniques:
- We update the branch reference with RefUpdate to point to the new restoration commit, making it the current version while preserving the complete history.
This approach is fundamentally different from Git’s built-in revert or reset operations, as it preserves the complete chronological history while still restoring the document to a previous state.
We wrote comprehensive tests to verify our implementation behaves correctly. Our test suite verifies several key aspects:
Our tests demonstrate that our approach successfully maintains a complete audit trail of all document changes, including restorations, making ChronDB suitable for applications with strict compliance and auditing requirements.
By implementing document version history with chronological restoration in ChronDB, we’ve created a powerful way to track changes and revert to previous states without losing any history. This approach provides a complete audit trail for compliance purposes and enables advanced time-travel capabilities.
The use of Git as our storage backend has proved to be an excellent choice, providing a solid foundation for versioning with minimal custom code. The ability to push/pull changes also enables simple replication across instances.
Future enhancements could include:
We hope this deep dive into ChronDB’s implementation has been insightful. The full source code is available at github.com/moclojer/chrondb.
Here you can see the initial implementation of this feature.