
Cluster & Replication

YantrikDB Server has production-style clustering (alpha — running live on homelab clusters, not yet battle-tested at scale) with:

  • CRDT-based replication — converges automatically, no merge conflicts
  • Raft-lite leader election — automatic failover in <10 seconds
  • Witness daemon — run safe HA with only 2 data nodes
  • Read-only enforcement — followers reject writes, point clients to the leader
  • Multi-database — each database replicates independently
  • Cluster master token — one token works on every node

Recommended setup: 2 voters + 1 witness.

┌──────────────────┐    heartbeats    ┌──────────────────┐
│   data node 1    │ ◄──────────────▶ │   data node 2    │
│     (voter)      │    oplog sync    │     (voter)      │
│   full storage   │                  │   full storage   │
└────────┬─────────┘                  └────────┬─────────┘
         │                                     │
         │        ┌──────────────────┐         │
         └───────▶│     witness      │◄────────┘
                  │   (vote-only)    │
                  │    ~10 MB RAM    │
                  └──────────────────┘

The witness is a tiny daemon (~3 MB binary, no disk storage) whose only job is to vote in elections. It breaks ties so 2 data nodes can run safe HA without needing a 3rd full node.

This is the same pattern as Azure SQL (witness instance), MongoDB (arbiter), Redis Sentinel, and MariaDB Galera (garbd).

1. On node1, generate config and the cluster secret

Terminal window
yantrikdb cluster init \
--node-id 1 \
--output /etc/yantrikdb.toml \
--data-dir /var/lib/yantrikdb \
--peers 192.168.1.2:7440 \
--witnesses 192.168.1.3:7440

Output:

config written to /etc/yantrikdb.toml
cluster_secret: ydb_cluster_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
(use this as the auth token from any client to access the default database)

Save the cluster_secret. You’ll need it on every other node and as the auth token from clients.

2. On node2, generate config with the same secret

Terminal window
yantrikdb cluster init \
--node-id 2 \
--output /etc/yantrikdb.toml \
--data-dir /var/lib/yantrikdb \
--peers 192.168.1.1:7440 \
--witnesses 192.168.1.3:7440 \
--secret <PASTE_SECRET_FROM_NODE1>
3. Create the default database

Terminal window
yantrikdb db --data-dir /var/lib/yantrikdb create default
4. On the witness host (192.168.1.3), start the witness daemon

Terminal window
yantrikdb-witness \
--node-id 99 \
--port 7440 \
--cluster-secret <PASTE_SECRET_FROM_NODE1> \
--state-file /var/lib/yantrikdb-witness/state.json

The witness needs no database, no config file, no embedding model — just the secret and a state file.

On node1 and node2:

Terminal window
yantrikdb serve --config /etc/yantrikdb.toml

After ~5 seconds, an election runs and one voter becomes leader.

Connect with yql and check cluster status:

Terminal window
yql --host 192.168.1.1 -t <cluster_secret>
yantrikdb> \cluster
node #1 — Leader
term: 1
leader: 1
healthy: yes | writable: yes
quorum: 2
+---------+-------------------+---------+-----------+------+-----------+
| node_id | addr              | role    | reachable | term | last_seen |
+---------+-------------------+---------+-----------+------+-----------+
| 2       | 192.168.1.2:7440  | voter   | ✓         | 1    | 0.5s ago  |
| 99      | 192.168.1.3:7440  | witness | ✓         | 1    | 0.5s ago  |
+---------+-------------------+---------+-----------+------+-----------+

Kill the leader (Ctrl+C or systemctl stop yantrikdb).

Within 5–10 seconds:

  1. The other voter detects missed heartbeats
  2. Runs an election
  3. The witness grants its vote
  4. The follower promotes itself to leader
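The four steps above can be sketched as a vote count (a hypothetical illustration, not YantrikDB's actual election code): a candidate in the 3-member cluster needs a majority of 2 votes, so its own vote plus the witness's tie-breaking vote is enough even with the old leader dead.

```python
# Hypothetical sketch of the failover sequence; function and field names are
# illustrative, not YantrikDB's real internals.

def run_election(candidate_id, current_term, reachable_peers, cluster_size=3):
    """Return (new_term, won) for a follower that stopped seeing heartbeats."""
    term = current_term + 1            # become candidate, bump the term
    votes = 1                          # the candidate votes for itself
    for peer in reachable_peers:       # request votes (witness included)
        if peer["term"] <= term and not peer.get("voted_this_term"):
            votes += 1
    won = votes >= cluster_size // 2 + 1   # majority of 3 is 2
    return term, won

# Leader (node 1) is dead; only the witness (node 99) answers.
term, won = run_election(2, current_term=1, reachable_peers=[{"term": 1}])
print(term, won)   # 2 True -> the follower promotes itself to leader
```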
Verify the new leader from the surviving node:

Terminal window
curl -s http://192.168.1.2:7438/v1/cluster | jq .role
# "Leader"

When the old leader rejoins, it sees the higher term and demotes itself to follower automatically.
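The demotion follows the standard Raft term rule, sketched here as an assumption about the general algorithm rather than YantrikDB's exact code: any node that sees a message carrying a higher term than its own adopts that term and becomes a follower.

```python
# Sketch of the higher-term step-down rule (illustrative, not the real code).

def on_message(node, msg_term):
    """Apply the term rule to a node dict like {"role": ..., "term": ...}."""
    if msg_term > node["term"]:
        node["term"] = msg_term      # adopt the newer term
        node["role"] = "follower"    # step down, even if it was leader
    return node

old_leader = {"role": "leader", "term": 1}
on_message(old_leader, msg_term=2)   # heartbeat from the new leader arrives
print(old_leader)                    # {'role': 'follower', 'term': 2}
```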

+------------------------------------+---------------------------------------------------------+
| Failure                            | Behavior                                                |
+------------------------------------+---------------------------------------------------------+
| Leader voter dies                  | Other voter + witness elect new leader in <10s          |
| Follower voter dies                | Leader keeps writing (still has quorum with witness)    |
| Witness dies                       | Both voters keep going, no new elections allowed        |
| Witness + follower die             | Leader becomes read-only (no quorum)                    |
| Network partition isolates a voter | Isolated voter loses quorum, becomes read-only          |
| All nodes die                      | Restart any node; it loads persistent state and rejoins |
+------------------------------------+---------------------------------------------------------+
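Most of the table above reduces to one rule, sketched here as a hypothetical helper: a voter may hold (or take) leadership only while it can reach a majority of the 3-member cluster, i.e. quorum = 2 counting itself.

```python
# Quorum check sketch for the 2-voters + 1-witness topology (illustrative).

def writable(reachable_members, cluster_size=3):
    """reachable_members: members the current leader can see, itself included."""
    return len(reachable_members) >= cluster_size // 2 + 1   # quorum of 3 is 2

print(writable({"leader", "witness"}))    # follower died -> True, keeps writing
print(writable({"leader"}))               # witness + follower died -> False
print(writable({"voter1", "voter2"}))     # witness died -> True
```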

To force a specific node to become leader (e.g. for maintenance):

Terminal window
yantrikdb cluster promote --url http://192.168.1.2:7438 -t <cluster_secret>

This triggers an election from that node.

When clustering is enabled, the cluster_secret doubles as a master Bearer token that works on any node in the cluster:

Terminal window
TOKEN=ydb_cluster_xxxxxxxx...
# This works whether node1 or node2 is leader
curl http://192.168.1.1:7438/v1/stats -H "Authorization: Bearer $TOKEN"
curl http://192.168.1.2:7438/v1/stats -H "Authorization: Bearer $TOKEN"
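The same two curl calls can be sketched with Python's standard library; the token below is the placeholder from above, and the actual `urlopen` call is left commented out so the snippet does not require a live cluster.

```python
import urllib.request

TOKEN = "ydb_cluster_xxxxxxxx..."  # paste the real cluster_secret here

for host in ("192.168.1.1", "192.168.1.2"):
    # The same Bearer token works regardless of which node is leader.
    req = urllib.request.Request(
        f"http://{host}:7438/v1/stats",
        headers={"Authorization": f"Bearer {TOKEN}"},
    )
    # stats = urllib.request.urlopen(req).read()  # works against either node
    print(req.get_header("Authorization"))
```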

Per-node tokens (created with yantrikdb token create) still work for fine-grained access.

Full [cluster] section:

[cluster]
node_id = 1 # unique integer per node
role = "voter" # voter | read_replica | witness | single
cluster_port = 7440 # peer-to-peer port
heartbeat_interval_ms = 1000 # leader → follower heartbeat rate
election_timeout_ms = 5000 # follower → candidate transition delay
cluster_secret = "ydb_cluster_..."
replication_mode = "async" # async (default) or sync
[[cluster.peers]]
addr = "192.168.1.2:7440"
role = "voter"
[[cluster.peers]]
addr = "192.168.1.3:7440"
role = "witness"

Under the hood, every write is recorded as an oplog entry with a hybrid logical clock (HLC) timestamp. Followers continuously pull new ops from the leader and apply them locally via the same CRDT semantics that the engine already uses.
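The HLC update rules can be sketched as follows; this is an assumption about the general hybrid-logical-clock algorithm, not YantrikDB's exact implementation. A timestamp is a (physical time, logical counter) pair that is totally ordered and never moves backwards, even across nodes with skewed wall clocks.

```python
# Minimal hybrid logical clock sketch (illustrative).

def hlc_tick(local, now):
    """Timestamp a local write: physical part is max(wall clock, last seen)."""
    pt, c = local
    if now > pt:
        return (now, 0)
    return (pt, c + 1)               # wall clock didn't advance; bump counter

def hlc_recv(local, remote, now):
    """Merge a remote op's timestamp when a follower applies it."""
    pt = max(local[0], remote[0], now)
    if pt == local[0] == remote[0]:
        c = max(local[1], remote[1]) + 1
    elif pt == local[0]:
        c = local[1] + 1
    elif pt == remote[0]:
        c = remote[1] + 1
    else:
        c = 0
    return (pt, c)

t = hlc_tick((100, 0), now=100)        # skewed clock: physical time is stuck
print(t)                               # (100, 1) -> still strictly increasing
print(hlc_recv(t, (105, 3), now=100))  # (105, 4) -> ordered after the remote op
```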

  • Add-wins set for memories (UUIDv7 keys, no collisions)
  • LWW for graph edges (HLC tiebreaker)
  • Set-union for consolidation
  • Forget always wins (tombstones are absolute)

This means the cluster converges naturally even after network partitions — there’s no manual conflict resolution needed.
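Two of the merge rules above can be sketched in a few lines (assumed semantics, not the engine's actual code): last-writer-wins resolved by HLC timestamp, except that a tombstone written by a forget beats any timestamp.

```python
# LWW merge with forget-wins tombstones (illustrative sketch).

def merge(a, b):
    """Each value is (hlc, payload); payload None marks a forget tombstone."""
    for v in (a, b):
        if v[1] is None:
            return v                  # forget always wins, tombstones absolute
    return a if a[0] >= b[0] else b   # LWW; HLC gives a total order

print(merge(((100, 0), "edge v1"), ((101, 0), "edge v2")))  # newer write wins
print(merge(((999, 0), "edge v3"), ((100, 0), None)))       # tombstone wins
```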

For a deeper dive, see the Raft-lite design.