Append-only cryptographic event logging with logarithmic compression. Fixed 16-byte records, Merkle-verified integrity, formally proven safety properties.
SF128-LOG is a two-part system for tamper-evident event logging:
Pure Rust, minimal dependencies, no runtime. Every critical property is formally verified in SMT2 (Z3) or TLA+.
Every event is a fixed 16-byte identifier. No variable-length fields, no allocation, no fragmentation.
Encoded form — 28-char Crockford Base32, grouped for readability:
009G-0G72-XJS5-R004-926E-NBQK-0GXG
Alphabet: 0-9 A-H J K M N P-T V-Z (no I/L/O/U). Case-insensitive. O→0, I/L→1 on decode.
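The decode rules above can be sketched in a few lines of Rust. This is a minimal illustration, not the project's decoder: the function names are hypothetical, and treating hyphens as pure visual separators is an assumption. It stops at 5-bit symbol values rather than packing them into bytes, because 28 symbols carry 140 bits and the text does not say how those map onto the 16-byte record.

```rust
/// Map one character to its 5-bit Crockford Base32 value.
/// Case-insensitive; O decodes as 0, I and L decode as 1; U is rejected.
fn decode_symbol(c: char) -> Option<u8> {
    const ALPHABET: &[u8; 32] = b"0123456789ABCDEFGHJKMNPQRSTVWXYZ";
    let up = match c.to_ascii_uppercase() {
        'O' => '0',
        'I' | 'L' => '1',
        other => other,
    };
    ALPHABET.iter().position(|&a| a as char == up).map(|i| i as u8)
}

/// Decode a grouped identifier into its 5-bit symbol values.
/// Assumption: hyphens are visual separators only and are skipped.
/// Returns None if any remaining character is outside the alphabet.
fn decode_symbols(text: &str) -> Option<Vec<u8>> {
    text.chars()
        .filter(|&c| c != '-')
        .map(decode_symbol)
        .collect()
}
```

Calling `decode_symbols` on the sample identifier above yields 28 symbol values, and the lowercase spelling of the same identifier decodes to the same values.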
0x4C: constant, identifies SF128-LOG format
SHA256(secret + ":" + hostname) → last 3 bytes
SF128-LOG does not compress data in the traditional sense. Merkle roots are one-way digests: you cannot reconstruct events from a root hash. Instead, the system offers three storage tiers with different trade-offs between space and recoverability.
Keep every event in its original segment file. This is the only tier where you can enumerate, search, and replay the complete event stream. Everything else is derived from this.
| Format | 1M events | 1 year (1M/day) | Recoverable? |
|---|---|---|---|
| Binary (16 B/event) | 16 MB | 5.8 GB | 100% |
| Text log (~200 B/line) | ~200 MB | ~73 GB | 100% |
This is your source of truth. The binary format is already 12.5× smaller than text logs — that alone is significant.
After sealing a segment, you can delete the raw segment and keep only the manifest: the Merkle root, the anchor record, and inclusion proofs for events you care about. You lose the ability to enumerate all events, but you retain:
| What you keep | Size per epoch | 1 year | What you can prove |
|---|---|---|---|
| Anchor record only | 69 B | ~25 KB | Chain integrity, no event detail |
| + 1,000 inclusion proofs | ~660 KB | ~240 MB | Prove specific events, not all |
| + all inclusion proofs | ~660 MB | ~240 GB | Prove any event, but can't list them |
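This is the "keep the receipts, shred the paperwork" tier. Useful for long-term audit trails where you only need to prove specific claims, not replay the full history. Note that proofs for every event (~660 MB per epoch) take roughly 40× the space of the 16 MB raw segment, so this tier only saves space when you keep proofs for a small subset of events.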
The most compact form: just the anchor chain. Each link is 69 bytes. You can verify that someone else's proof is valid against your chain, but you hold no event data yourself.
| Duration | Anchor chain size | What you can do |
|---|---|---|
| 1 day | 69 B | Verify proofs presented to you |
| 1 year | ~25 KB | Verify proofs, detect chain tampering |
| 10 years | ~250 KB | Long-term trust anchor |
This is what replication actually sends between nodes. Nodes don't trust each other's data — they verify proofs against their own chain copy.
The honest summary: the 16-byte binary record is the real space saving (12.5× over text logs). The Merkle layer doesn't compress data — it lets you choose what to forget while retaining cryptographic proof of what you kept.
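The figures in the three tier tables follow from a few constants. The sketch below reproduces that arithmetic, assuming one epoch per day of 1M events (as the tables do) and a proof length of ceil(log2(N)) steps; the function names are illustrative only.

```rust
const RECORD_BYTES: u64 = 16; // fixed event record
const ANCHOR_BYTES: u64 = 69; // one anchor record per epoch
const PROOF_STEP_BYTES: u64 = 33; // 1 direction byte + 32-byte sibling hash

/// Tier 1: raw segment size for a given number of events.
fn raw_bytes(events: u64) -> u64 {
    events * RECORD_BYTES
}

/// Tier 2: size of one inclusion proof in an epoch of `events_in_epoch` leaves.
/// Uses ceil(log2(n)) steps, computed via the next power of two.
fn proof_bytes(events_in_epoch: u64) -> u64 {
    let steps = events_in_epoch.next_power_of_two().trailing_zeros() as u64;
    steps * PROOF_STEP_BYTES
}

/// Tier 3: anchor chain size for a given number of epochs.
fn anchor_chain_bytes(epochs: u64) -> u64 {
    epochs * ANCHOR_BYTES
}
```

For example, `raw_bytes(1_000_000)` is 16,000,000 bytes, `proof_bytes(1_000_000)` is 660 bytes, and `anchor_chain_bytes(365)` is 25,185 bytes, matching the ~25 KB per year in the tables.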
The 128-bit format and Merkle infrastructure aren't limited to log storage. The fixed-size, self-describing record opens up several patterns:
Share proof that an event happened without revealing what else happened around it. An inclusion proof ties one leaf to a root — the verifier learns nothing about the other leaves. Useful for compliance ("we logged this action on this date") without exposing your full audit trail.
Two organizations can each maintain their own anchor chains. When they interact, they exchange inclusion proofs referencing their respective roots. Neither party needs access to the other's event stream. A shared event is anchored in both chains independently — if either party later tampers with their chain, the other's proof still holds.
Different data has different lifespans. With sealed manifests, you can implement retention tiers naturally:
The transition between tiers is a one-way operation: seal, extract proofs for important events, delete the segment. The anchor chain guarantees you can't silently remove an epoch later.
At 16 bytes per event, the format fits in constrained environments. A sensor can emit SF128-LOG records over a serial link or UDP. The receiving gateway batches them into segments and seals them. The sensor itself needs no storage beyond a single CRC8 lookup table and a clock.
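A table-driven CRC8 of the kind mentioned above fits in 256 bytes of read-only memory. The sketch below is an assumption-laden illustration: the text does not give SF128-LOG's CRC parameters, so the polynomial 0x07 with a zero initial value (the common CRC-8/SMBUS variant) is used as a stand-in, and where the checksum sits relative to the record is not shown.

```rust
/// Build a 256-entry CRC-8 lookup table for the given polynomial at compile time.
const fn crc8_table(poly: u8) -> [u8; 256] {
    let mut table = [0u8; 256];
    let mut i = 0;
    while i < 256 {
        let mut crc = i as u8;
        let mut bit = 0;
        while bit < 8 {
            // Shift left; XOR in the polynomial whenever the top bit falls out.
            crc = if crc & 0x80 != 0 { (crc << 1) ^ poly } else { crc << 1 };
            bit += 1;
        }
        table[i] = crc;
        i += 1;
    }
    table
}

/// ASSUMPTION: polynomial 0x07, init 0x00. Replace with the real parameters.
static CRC8_TABLE: [u8; 256] = crc8_table(0x07);

/// One table lookup per input byte, no allocation.
fn crc8(data: &[u8]) -> u8 {
    data.iter().fold(0u8, |crc, &b| CRC8_TABLE[(crc ^ b) as usize])
}
```

With these assumed parameters, `crc8(b"123456789")` returns the standard check value 0xF4.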
The domain, stream, and source fields encode enough routing metadata to partition events by geography, application, or organizational unit — all within the 16-byte envelope. The --georoute-prefix flag in sf128logseal tags sealed manifests for regional routing during replication.
The daemon (sf128logd) sits behind rsyslog as a transform layer. Every syslog line gets an SF128 event ID and a SHA-256 content hash. If someone later edits a log line, the hash won't match. If someone deletes a line, the Merkle tree verification fails. If someone deletes an entire segment, the anchor chain has a gap. Each layer catches a different class of tampering.
Events within an epoch (time window or segment) are collected as leaves, sorted, then hashed into a binary Merkle tree using SHA-256 with domain separation:
// Leaf hash
leaf_hash = SHA256(0x00 || leaf_bytes)
// Internal node
node_hash = SHA256(0x01 || left_child || right_child)
// Leaves are sorted before tree construction
// → deterministic root regardless of insertion order
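The construction above can be sketched in Rust as follows. The hash function is passed in as a parameter so the example stays dependency-free; with the `sha2` crate it could likely be `|d: &[u8]| -> [u8; 32] { Sha256::digest(d).into() }`. Two things are assumptions rather than facts from the text: leaves are taken to be the 16-byte records themselves, and an unpaired node at the end of a level is promoted unchanged (the odd-count rule is not specified).

```rust
type Hash = [u8; 32];

/// Leaf hash: H(0x00 || leaf_bytes).
fn leaf_hash<H: Fn(&[u8]) -> Hash>(h: &H, leaf: &[u8]) -> Hash {
    let mut buf = Vec::with_capacity(1 + leaf.len());
    buf.push(0x00);
    buf.extend_from_slice(leaf);
    h(&buf)
}

/// Internal node: H(0x01 || left_child || right_child).
fn node_hash<H: Fn(&[u8]) -> Hash>(h: &H, left: &Hash, right: &Hash) -> Hash {
    let mut buf = [0u8; 65];
    buf[0] = 0x01;
    buf[1..33].copy_from_slice(left);
    buf[33..].copy_from_slice(right);
    h(&buf)
}

/// Sort the leaves, hash them, then fold pairwise up to a single root.
/// Returns None for an empty epoch.
fn merkle_root<H: Fn(&[u8]) -> Hash>(h: &H, leaves: &[[u8; 16]]) -> Option<Hash> {
    if leaves.is_empty() {
        return None;
    }
    let mut sorted = leaves.to_vec();
    sorted.sort_unstable(); // same root regardless of insertion order
    let mut level: Vec<Hash> = sorted.iter().map(|l| leaf_hash(h, l)).collect();
    while level.len() > 1 {
        level = level
            .chunks(2)
            .map(|pair| match pair {
                [l, r] => node_hash(h, l, r),
                [single] => *single, // ASSUMPTION: odd node promoted unchanged
                _ => unreachable!(),
            })
            .collect();
    }
    Some(level[0])
}
```

Because the leaves are sorted first, `merkle_root` returns the same value for any permutation of the same events.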
Epoch roots are chained together, forming a hash-linked append-only log (similar to a blockchain but without consensus overhead):
// Anchor record: 69 bytes = version (1) + root_hash (32) + leaf_count (4) + prev_anchor (32).
// anchor_hash is derived from those four fields.
struct AnchorRecordV1 {
    version: u8,            // 1 byte
    root_hash: [u8; 32],    // Merkle root of this epoch
    leaf_count: u32,        // Number of events
    prev_anchor: [u8; 32],  // Hash of previous anchor (zeros if first)
    anchor_hash: [u8; 32],  // SHA256(version || root || count || prev)
}
// Verification
assert_eq!(
SHA256(version || root_hash || leaf_count || prev_anchor),
anchor_hash
);
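A hedged sketch of sealing and verifying such a chain follows. The hash function is a parameter (SHA-256 in the real system), `seal` and `verify_chain` are hypothetical names, and two details are assumptions: `leaf_count` is serialized big-endian, and `prev_anchor` holds the previous record's `anchor_hash`.

```rust
type Hash = [u8; 32];

struct AnchorRecordV1 {
    version: u8,
    root_hash: Hash,
    leaf_count: u32,
    prev_anchor: Hash,
    anchor_hash: Hash,
}

/// Serialize the hashed fields: 1 + 32 + 4 + 32 = 69 bytes.
/// ASSUMPTION: leaf_count is big-endian; the byte order is not documented.
fn preimage(version: u8, root: &Hash, leaf_count: u32, prev: &Hash) -> [u8; 69] {
    let mut buf = [0u8; 69];
    buf[0] = version;
    buf[1..33].copy_from_slice(root);
    buf[33..37].copy_from_slice(&leaf_count.to_be_bytes());
    buf[37..].copy_from_slice(prev);
    buf
}

/// Create a version-1 anchor linked to `prev` (all zeros for the first epoch).
fn seal<H: Fn(&[u8]) -> Hash>(h: &H, root: Hash, leaf_count: u32, prev: Hash) -> AnchorRecordV1 {
    let anchor_hash = h(&preimage(1, &root, leaf_count, &prev));
    AnchorRecordV1 { version: 1, root_hash: root, leaf_count, prev_anchor: prev, anchor_hash }
}

/// Check every record's own hash and its link to the record before it.
fn verify_chain<H: Fn(&[u8]) -> Hash>(h: &H, chain: &[AnchorRecordV1]) -> bool {
    let mut expected_prev = [0u8; 32]; // zeros for the first anchor
    for a in chain {
        let recomputed = h(&preimage(a.version, &a.root_hash, a.leaf_count, &a.prev_anchor));
        if a.prev_anchor != expected_prev || a.anchor_hash != recomputed {
            return false;
        }
        expected_prev = a.anchor_hash;
    }
    true
}
```

Changing any hashed field of a stored record, such as its `leaf_count`, makes the recomputed hash differ from `anchor_hash`, so `verify_chain` returns false.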
To prove an event exists without revealing the full dataset, provide a Merkle path — a sequence of (direction, sibling_hash) pairs from leaf to root:
// Proof: ceil(log2(N)) steps. For 1M events: 20 steps × 33 bytes = 660 bytes
struct ProofStep {
    direction: u8,           // 0 = sibling left, 1 = sibling right
    sibling_hash: [u8; 32],  // Hash of sibling node
}
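Verification walks the path from leaf to root, applying the same domain-separated hashes used to build the tree. The sketch below is illustrative rather than the project's verifier: the function name is hypothetical, the hash is passed in as a parameter (SHA-256 in the real system), and rejecting any direction byte other than 0 or 1 is an assumption.

```rust
type Hash = [u8; 32];

struct ProofStep {
    direction: u8,      // 0 = sibling on the left, 1 = sibling on the right
    sibling_hash: Hash, // hash of the sibling node at this level
}

/// Recompute the root from `leaf` and `proof`, then compare with `root`.
fn verify_inclusion<H: Fn(&[u8]) -> Hash>(
    h: &H,
    leaf: &[u8],
    proof: &[ProofStep],
    root: &Hash,
) -> bool {
    // Leaf hash: H(0x00 || leaf_bytes)
    let mut buf = Vec::with_capacity(65);
    buf.push(0x00);
    buf.extend_from_slice(leaf);
    let mut current = h(&buf);

    for step in proof {
        // Internal node: H(0x01 || left || right)
        buf.clear();
        buf.push(0x01);
        match step.direction {
            0 => {
                buf.extend_from_slice(&step.sibling_hash);
                buf.extend_from_slice(&current);
            }
            1 => {
                buf.extend_from_slice(&current);
                buf.extend_from_slice(&step.sibling_hash);
            }
            _ => return false, // ASSUMPTION: unknown direction values are invalid
        }
        current = h(&buf);
    }
    current == *root
}
```

The verifier needs only the leaf, the proof steps, and a root it already trusts from its own anchor chain; it never sees the other leaves.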
# Basic: TCP listener, file output
sf128logd --listen 127.0.0.1:55128 --output /var/log/sf128.log
# Production: Unix socket, batching, segment rotation
sf128logd \
--unix-socket /run/sf128logd.sock \
--segment-dir /var/log/sf128/segments \
--segment-max-lines 1000000 \
--segment-max-seconds 60 \
--batch-size 256 \
--batch-time-ms 10 \
--fsync-per-batch
# Deterministic node ID derived from shared secret + hostname
export SF128_NODE_SECRET="cluster-secret-here"
export SF128_NODE_TAG="$(hostname)"
# Derivation: SHA256(secret:tag) → last 3 bytes → 24-bit source ID
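The derivation step can be sketched in Rust as two small helpers. Both names are hypothetical, the SHA-256 call itself is left to the caller (for example via the `sha2` crate), and interpreting the last three digest bytes as a big-endian integer is an assumption; the text only says "last 3 bytes".

```rust
/// Build the string that gets hashed: "<secret>:<tag>".
fn node_id_preimage(secret: &str, tag: &str) -> String {
    format!("{}:{}", secret, tag)
}

/// Take the last 3 bytes of a SHA-256 digest as the 24-bit source ID.
/// ASSUMPTION: bytes are combined big-endian (digest[29] is the high byte).
fn source_id_from_digest(digest: &[u8; 32]) -> u32 {
    u32::from_be_bytes([0, digest[29], digest[30], digest[31]])
}
```

Because the ID depends only on the secret and the tag, a node that restarts with the same `SF128_NODE_SECRET` and `SF128_NODE_TAG` gets the same source ID.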
# /etc/rsyslog.d/sf128.conf
module(load="builtin:omfwd")
*.* action(
type="omfwd"
target="127.0.0.1"
port="55128"
protocol="tcp"
template="RSYSLOG_SyslogProtocol23Format"
)
# Build Merkle tree over a completed segment
sf128logseal seal /var/log/sf128/segments/sf128log-*.log \
--node-tag "$(hostname)" \
--type-descriptor 0x0001 \
--georoute-prefix 0x000001 \
--root-txt
# Output: .manifest.json with root hash + inclusion proofs
# Verify segment against its manifest
sf128logctl verify-segment \
/var/log/sf128/segments/sf128log-START-END-node.log \
/var/log/sf128/segments/sf128log-START-END-node.manifest.json
# Verify anchor chain continuity
sf128logctl verify-epoch /var/log/sf128/segments/epoch-*.json
# /etc/systemd/system/sf128logd.service
[Unit]
Description=SF128 Log Daemon
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/bin/sf128logd \
--unix-socket /run/sf128logd.sock \
--segment-dir /var/log/sf128/segments \
--segment-max-lines 1000000
Restart=always
RestartSec=3
[Install]
WantedBy=multi-user.target
Nodes exchange Merkle roots and anchor records via a gossip protocol. Full event data stays local; only the cryptographic summaries replicate.
Bandwidth per sync: ~69 bytes per epoch (one anchor record), compared with 16 MB to replicate a full 1M-event segment.
Policy evaluation controls what data observers can see, without revealing the full event stream:
enum PolicyDecision {
Allow, // Full access to record
Deny, // Record is invisible
AllowWithRedactions, // Partial: some fields masked
}
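One way to picture how a decision shapes an observer's view is sketched below. Everything beyond the enum itself is hypothetical: the `observer_view` function, the per-byte redaction mask, and zeroing as the masking method are illustrative choices, not the project's policy engine.

```rust
#[derive(Clone, Copy)]
enum PolicyDecision {
    Allow,               // Full access to record
    Deny,                // Record is invisible
    AllowWithRedactions, // Partial: some fields masked
}

/// Produce what an observer sees for one 16-byte record.
/// HYPOTHETICAL: `redact[i] == true` means byte i is masked (zeroed).
fn observer_view(
    decision: PolicyDecision,
    record: &[u8; 16],
    redact: &[bool; 16],
) -> Option<[u8; 16]> {
    match decision {
        PolicyDecision::Allow => Some(*record),
        PolicyDecision::Deny => None, // the observer gets nothing at all
        PolicyDecision::AllowWithRedactions => {
            let mut view = *record;
            for (byte, &hide) in view.iter_mut().zip(redact) {
                if hide {
                    *byte = 0;
                }
            }
            Some(view)
        }
    }
}
```

Returning `None` for `Deny` keeps a denied record indistinguishable from one that was never offered to the observer, at least at the level of this function.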
Formally verified properties (TLA+):
Proves that any single-bit flip in a leaf, sibling hash, or root hash causes proof verification to fail. Models an 8-bit toy hash and exhaustively checks all bit positions.
All committed records form a valid chain. Peers only commit records whose predecessors are already committed. Prevents Byzantine ordering attacks.
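The commit rule in this property can be illustrated with a small Rust sketch. The `CommitLog` type and `try_commit` method are hypothetical names, and using an all-zero `prev_anchor` as the marker for the first record follows the anchor record layout; this shows the rule only, not the TLA+ model or the gossip logic.

```rust
use std::collections::HashSet;

type Hash = [u8; 32];

/// prev_anchor value of the first record in a chain (all zeros).
const GENESIS_PREV: Hash = [0u8; 32];

#[derive(Default)]
struct CommitLog {
    committed: HashSet<Hash>,
}

impl CommitLog {
    /// Commit a record only if its predecessor is already committed,
    /// or if it is the first record of the chain. Returns whether it was accepted.
    fn try_commit(&mut self, anchor_hash: Hash, prev_anchor: Hash) -> bool {
        let predecessor_ok =
            prev_anchor == GENESIS_PREV || self.committed.contains(&prev_anchor);
        if predecessor_ok {
            self.committed.insert(anchor_hash);
        }
        predecessor_ok
    }
}
```

A record that arrives before its predecessor is rejected and can be retried later, so the committed set never contains a record with a missing ancestor.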
An observer cannot infer the underlying policy from the view they receive. The system is information-theoretically sound: redacted views leak nothing about the policy that produced them.
Deliberately minimal. No runtime, no allocator, no framework.
# sf128-logt (library)
sha2 = "0.10" # SHA-256
chrono = "0.4" # Timestamps
serde = "1.0" # Serialization
clap = "4.5" # CLI
anyhow = "1.0" # Errors
# sf128logd (daemon) — same plus:
rand = "0.8" # Node ID derivation
# Dev only:
proptest = "1.4" # Property-based testing
criterion = "0.5" # Benchmarks