SF128-LOG — Append-Only Cryptographic Event Logging

What It Is

SF128-LOG is a tamper-evident event-logging system made up of five components:

sf128-logt	Library	core types, Merkle trees, anchors, proofs, policies, replication
sf128logd	Daemon	syslog transformer, batching, segment rotation, Unix socket
sf128logseal	Sealer	builds Merkle tree over segment, writes manifest with inclusion proofs
sf128logctl	Verifier	validates segments, manifests, and anchor chains
sf128logstress	Harness	end-to-end integrity testing under load

Pure Rust, minimal dependencies, no runtime. Merkle soundness, replication safety, and policy non-interference are formally verified in SMT2 (Z3) or TLA+.

The 128-Bit Record

Every event is a fixed 16-byte identifier. No variable-length fields, no allocation, no fragmentation.

Field	Width
Magic	8 bits
Ver	8 bits
Domain	8 bits
Timestamp (MJD µs)	40 bits
Stream	24 bits
Source	24 bits
Type	8 bits
CRC8	8 bits

Encoded form — 28-char Crockford Base32, grouped for readability:

009G-0G72-XJS5-R004-926E-NBQK-0GXG

Alphabet: 0-9 A-H J K M N P-T V-Z (no I/L/O/U). Case-insensitive. O→0, I/L→1 on decode.
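The decode rules above can be sketched as a small Rust function. The symbol table and aliases follow Crockford Base32. How 128 bits sit inside 28 symbols is an assumption inferred from the sample ID: two leading zero symbols, then the record MSB-first, then two zero pad bits. The names `crockford_value` and `decode_id` are illustrative, not part of the SF128-LOG API.

```rust
/// Map one character to its 5-bit Crockford value, applying the decode
/// aliases from the text (case-insensitive, O -> 0, I/L -> 1).
fn crockford_value(c: char) -> Option<u8> {
    const ALPHABET: &[u8; 32] = b"0123456789ABCDEFGHJKMNPQRSTVWXYZ";
    let c = match c.to_ascii_uppercase() {
        'O' => '0',
        'I' | 'L' => '1',
        other => other,
    };
    ALPHABET.iter().position(|&a| a as char == c).map(|i| i as u8)
}

/// Decode a grouped 28-character ID into the 16 raw record bytes.
/// ASSUMPTION (inferred from the sample ID, not a documented rule):
/// 28 symbols = 140 bits = 10 zero bits + 128 record bits + 2 zero pad bits.
fn decode_id(id: &str) -> Option<[u8; 16]> {
    let mut symbols = Vec::with_capacity(28);
    for c in id.chars().filter(|&c| c != '-') {
        symbols.push(crockford_value(c)?);
    }
    if symbols.len() != 28 || symbols[0] != 0 || symbols[1] != 0 {
        return None;
    }
    let last = symbols[27];
    if last & 0b11 != 0 {
        return None; // trailing pad bits must be zero under this assumption
    }
    let mut acc: u128 = 0;
    for &s in &symbols[2..27] {
        acc = (acc << 5) | u128::from(s); // 25 symbols contribute 125 bits
    }
    acc = (acc << 3) | u128::from(last >> 2); // top 3 bits of the last symbol
    Some(acc.to_be_bytes())
}
```

Under that framing, the sample ID decodes to 16 bytes that begin with the magic byte 0x4C, which is the main reason the framing is assumed here.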

Field Details

Magic 0x4C — constant, identifies SF128-LOG format
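Timestamp — 40-bit count on the Modified Julian Day epoch. 2^40 ticks span ~35,000 years at 1-second resolution but only ~12.7 days at 1 µs resolution, so the µs precision and the 35,000-year range cannot both hold for a plain 40-bit counter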
Stream — 24-bit logical partition (16M streams per domain)
Source — 24-bit node ID, derived: SHA256(secret + ":" + hostname) → last 3 bytes
CRC8 — SAE J1850 over first 15 bytes. Detects single-byte corruption
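The field list above can be turned into a pack/parse sketch. The CRC parameters are the standard CRC-8/SAE-J1850 ones (polynomial 0x1D, init 0xFF, final XOR 0xFF). The byte layout is an assumption: fields packed big-endian in the order listed, which is consistent with the sample ID shown earlier. `Record`, `to_bytes`, and `from_bytes` are hypothetical names, not the library's types.

```rust
/// CRC-8/SAE-J1850: polynomial 0x1D, init 0xFF, final XOR 0xFF, MSB first.
fn crc8_j1850(data: &[u8]) -> u8 {
    let mut crc: u8 = 0xFF;
    for &byte in data {
        crc ^= byte;
        for _ in 0..8 {
            crc = if crc & 0x80 != 0 { (crc << 1) ^ 0x1D } else { crc << 1 };
        }
    }
    crc ^ 0xFF
}

const MAGIC: u8 = 0x4C;

#[derive(Debug, PartialEq, Clone, Copy)]
struct Record {
    version: u8,
    domain: u8,
    timestamp: u64, // only the low 40 bits are stored
    stream: u32,    // only the low 24 bits are stored
    source: u32,    // only the low 24 bits are stored
    event_type: u8,
}

impl Record {
    /// Pack into 16 bytes (ASSUMED layout: big-endian, fields in listed order).
    fn to_bytes(&self) -> [u8; 16] {
        let mut b = [0u8; 16];
        b[0] = MAGIC;
        b[1] = self.version;
        b[2] = self.domain;
        b[3..8].copy_from_slice(&self.timestamp.to_be_bytes()[3..8]);
        b[8..11].copy_from_slice(&self.stream.to_be_bytes()[1..4]);
        b[11..14].copy_from_slice(&self.source.to_be_bytes()[1..4]);
        b[14] = self.event_type;
        let crc = crc8_j1850(&b[..15]); // CRC covers the first 15 bytes
        b[15] = crc;
        b
    }

    /// Parse 16 bytes, rejecting a wrong magic byte or a CRC mismatch.
    fn from_bytes(b: &[u8; 16]) -> Result<Record, &'static str> {
        if b[0] != MAGIC {
            return Err("bad magic");
        }
        if crc8_j1850(&b[..15]) != b[15] {
            return Err("CRC8 mismatch");
        }
        let mut ts = [0u8; 8];
        ts[3..8].copy_from_slice(&b[3..8]);
        Ok(Record {
            version: b[1],
            domain: b[2],
            timestamp: u64::from_be_bytes(ts),
            stream: u32::from_be_bytes([0, b[8], b[9], b[10]]),
            source: u32::from_be_bytes([0, b[11], b[12], b[13]]),
            event_type: b[14],
        })
    }
}
```

Because the check byte is a CRC over the other 15 bytes, any corruption confined to a single byte changes the expected CRC, so `from_bytes` rejects the record.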

Storage: Three Tiers

SF128-LOG does not compress data in the traditional sense. Merkle roots are one-way digests — you cannot reconstruct events from a root hash. Instead, the system offers three storage tiers with different trade-offs between space and recoverability.

Tier 1: Full Segments (Lossless)

Keep every event in its original segment file. This is the only tier where you can enumerate, search, and replay the complete event stream. Everything else is derived from this.

Format	1M events	1 year (1M/day)	Recoverable?
Binary (16 B/event)	16 MB	5.8 GB	100%
Text log (~200 B/line)	~200 MB	~73 GB	100%

This is your source of truth. The binary format is already 12.5× smaller than text logs — that alone is significant.

Tier 2: Sealed Manifests (Lossy, Keeps the Important Stuff)

After sealing a segment, you can delete the raw segment and keep only the manifest: the Merkle root, the anchor record, and inclusion proofs for events you care about. You lose the ability to enumerate all events, but you retain:

Proof that specific events existed — verifiable against the root
Chain integrity — anchor links prove no epochs were removed or reordered
Event count per epoch — you know how many events occurred, just not what all of them were

What you keep	Size per epoch	1 year	What it gives you
Anchor record only	69 B	~25 KB	Chain integrity, no event detail
+ 1,000 inclusion proofs	~660 KB	~240 MB	Prove specific events, not all
+ all inclusion proofs	~660 MB	~240 GB	Prove any event, but can't list them (larger than the raw segments)

This is the "keep the receipts, shred the paperwork" tier. Useful for long-term audit trails where you only need to prove specific claims, not replay the full history.

Tier 3: Roots Only (Verification Metadata)

The most compact form: just the anchor chain. Each link is 69 bytes. You can verify that someone else's proof is valid against your chain, but you hold no event data yourself.

Duration	Anchor chain size	What you can do
1 day	69 B	Verify proofs presented to you
1 year	~25 KB	Verify proofs, detect chain tampering
10 years	~250 KB	Long-term trust anchor

This is what replication actually sends between nodes. Nodes don't trust each other's data — they verify proofs against their own chain copy.

Visual: What You Keep vs. What You Can Prove

Full segments	5.8 GB	everything
Manifests	240 MB	prove selected events
Anchors only	25 KB	verify others' proofs

The honest summary: the 16-byte binary record is the real space saving (12.5× over text logs). The Merkle layer doesn't compress data — it lets you choose what to forget while retaining cryptographic proof of what you kept.
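The tier sizes quoted in this section follow from three constants: 16 bytes per record, 69 bytes per anchor, and 33 bytes per proof step. A short sketch of that arithmetic, assuming one epoch per day of 1M events as in the tables (the function names are illustrative):

```rust
const RECORD_BYTES: u64 = 16;
const ANCHOR_BYTES: u64 = 69;
const PROOF_STEP_BYTES: u64 = 33; // 1 direction byte + 32-byte sibling hash

/// Bytes in one inclusion proof for a tree with `leaves` leaves,
/// using ceil(log2(leaves)) steps.
fn proof_bytes(leaves: u64) -> u64 {
    let depth = if leaves <= 1 {
        0
    } else {
        64 - (leaves - 1).leading_zeros() as u64
    };
    depth * PROOF_STEP_BYTES
}

/// Tier 1: every record kept.
fn full_segment_bytes(events: u64) -> u64 {
    events * RECORD_BYTES
}

/// Tier 2: one anchor plus `proofs_kept` inclusion proofs for the epoch.
fn manifest_bytes(events_in_epoch: u64, proofs_kept: u64) -> u64 {
    ANCHOR_BYTES + proofs_kept * proof_bytes(events_in_epoch)
}

/// Tier 3: anchors only.
fn anchor_chain_bytes(epochs: u64) -> u64 {
    epochs * ANCHOR_BYTES
}
```

For example, `365 * full_segment_bytes(1_000_000)` is 5.84 GB and `anchor_chain_bytes(365)` is 25,185 bytes, matching the 5.8 GB and ~25 KB figures above.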

What Else Can You Do With This

The 128-bit format and Merkle infrastructure aren't limited to log storage. The fixed-size, self-describing record opens up several patterns:

Selective Disclosure

Share proof that an event happened without revealing what else happened around it. An inclusion proof ties one leaf to a root — the verifier sees only sibling hashes along the path, not the contents of the other leaves. Useful for compliance ("we logged this action on this date") without exposing your full audit trail.

Cross-Organization Verification

Two organizations can each maintain their own anchor chains. When they interact, they exchange inclusion proofs referencing their respective roots. Neither party needs access to the other's event stream. A shared event is anchored in both chains independently — if either party later tampers with their chain, the other's proof still holds.

Tiered Retention Policies

Different data has different lifespans. With sealed manifests, you can implement retention tiers naturally:

Hot (0–30 days): full segments, searchable, replayable
Warm (30 days–1 year): sealed manifests with proofs for flagged events
Cold (1–10 years): anchor chain only — 25 KB/year

The transition between tiers is a one-way operation: seal, extract proofs for important events, delete the segment. The anchor chain guarantees you can't silently remove an epoch later.
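As a rough sketch, the tier boundaries listed above map onto a lookup by segment age. Treating the ranges as half-open (day 30 is already Warm) and returning nothing past ten years are assumptions, since the text does not pin down either; `Tier` and `tier_for_age` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum Tier {
    Hot,  // full segments, searchable, replayable
    Warm, // sealed manifests with proofs for flagged events
    Cold, // anchor chain only
}

/// Pick the retention tier for a segment that is `age_days` old.
/// ASSUMPTION: ranges are half-open and ages past 10 years are unspecified.
fn tier_for_age(age_days: u32) -> Option<Tier> {
    match age_days {
        0..=29 => Some(Tier::Hot),
        30..=364 => Some(Tier::Warm),
        365..=3649 => Some(Tier::Cold),
        _ => None,
    }
}
```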

Embedded & IoT Event Logging

At 16 bytes per event, the format fits in constrained environments. A sensor can emit SF128-LOG records over a serial link or UDP. The receiving gateway batches them into segments and seals them. The sensor itself needs no storage beyond a single CRC8 lookup table and a clock.

Georouted Event Streams

The domain, stream, and source fields encode enough routing metadata to partition events by geography, application, or organizational unit — all within the 16-byte envelope. The --georoute-prefix flag in sf128logseal tags sealed manifests for regional routing during replication.

Tamper-Evident Syslog

The daemon (sf128logd) sits behind rsyslog as a transform layer. Every syslog line gets an SF128 event ID and a SHA-256 content hash. If someone later edits a log line, the hash won't match. If someone deletes a line, the Merkle tree verification fails. If someone deletes an entire segment, the anchor chain has a gap. Each layer catches a different class of tampering.

Trust Architecture

Merkle Tree

Events within an epoch (time window or segment) are collected as leaves, sorted, then hashed into a binary Merkle tree using SHA-256 with domain separation:

// Leaf hash
leaf_hash = SHA256(0x00 || leaf_bytes)

// Internal node
node_hash = SHA256(0x01 || left_child || right_child)

// Leaves are sorted before tree construction
// → deterministic root regardless of insertion order
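A minimal sketch of the construction above, written against a caller-supplied hash function so that it stays free of external crates. In real use the function would wrap SHA-256 (for instance from the sha2 crate listed under Dependencies). How an unpaired node on an odd-sized level is handled is not stated in the text; promoting it unchanged is an assumption here.

```rust
type Hash = [u8; 32];

/// Leaf hash with the 0x00 domain-separation prefix.
fn leaf_hash(h: &dyn Fn(&[u8]) -> Hash, leaf: &[u8]) -> Hash {
    let mut buf = vec![0x00];
    buf.extend_from_slice(leaf);
    h(&buf)
}

/// Internal node hash with the 0x01 domain-separation prefix.
fn node_hash(h: &dyn Fn(&[u8]) -> Hash, left: &Hash, right: &Hash) -> Hash {
    let mut buf = vec![0x01];
    buf.extend_from_slice(left);
    buf.extend_from_slice(right);
    h(&buf)
}

/// Merkle root over 16-byte records; `None` for an empty epoch.
fn merkle_root(h: &dyn Fn(&[u8]) -> Hash, leaves: &[[u8; 16]]) -> Option<Hash> {
    if leaves.is_empty() {
        return None;
    }
    // Sort first so the root does not depend on insertion order.
    let mut sorted = leaves.to_vec();
    sorted.sort_unstable();
    let mut level: Vec<Hash> = sorted.iter().map(|l| leaf_hash(h, l)).collect();
    while level.len() > 1 {
        level = level
            .chunks(2)
            .map(|pair| match pair {
                [left, right] => node_hash(h, left, right),
                [single] => *single, // ASSUMPTION: odd node is promoted unchanged
                _ => unreachable!("chunks(2) yields one or two items"),
            })
            .collect();
    }
    Some(level[0])
}
```

Because of the sort, `merkle_root` returns the same root for any permutation of the same records.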

Anchor Chain

Epoch roots are chained together, forming a hash-linked append-only log (similar to a blockchain but without consensus overhead):

root₀ (1M leaves, prev: ∅) → root₁ (1M leaves, prev: H(anchor₀)) → root₂ (1M leaves, prev: H(anchor₁)) → … → root_n (…, prev: H(anchor_n-1))

// Anchor record: 69-byte hashed payload (1 + 32 + 4 + 32 bytes); the stored anchor_hash adds 32 more
struct AnchorRecordV1 {
    version:       u8,       // 1 byte
    root_hash:     [u8; 32], // Merkle root of this epoch
    leaf_count:    u32,      // Number of events
    prev_anchor:   [u8; 32], // Hash of previous anchor (zeros if first)
    anchor_hash:   [u8; 32], // SHA256(version || root || count || prev)
}

// Verification
assert_eq!(
    SHA256(version || root_hash || leaf_count || prev_anchor),
    anchor_hash
);
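The verification rule above extends naturally to a whole chain. The sketch below assumes `leaf_count` is serialized big-endian inside the 69-byte preimage and that `prev_anchor` equals the previous record's `anchor_hash`; neither detail is spelled out in the text. The hash is passed in as a function (SHA-256 in practice), and `preimage` and `verify_chain` are illustrative names.

```rust
type Hash = [u8; 32];

#[derive(Clone)]
struct AnchorRecordV1 {
    version: u8,
    root_hash: Hash,
    leaf_count: u32,
    prev_anchor: Hash,
    anchor_hash: Hash,
}

impl AnchorRecordV1 {
    /// version || root || count || prev, 69 bytes in total.
    /// ASSUMPTION: leaf_count is encoded big-endian.
    fn preimage(&self) -> [u8; 69] {
        let mut buf = [0u8; 69];
        buf[0] = self.version;
        buf[1..33].copy_from_slice(&self.root_hash);
        buf[33..37].copy_from_slice(&self.leaf_count.to_be_bytes());
        buf[37..69].copy_from_slice(&self.prev_anchor);
        buf
    }

    /// Build the next anchor; the first anchor links to all zeros.
    fn new(
        h: &dyn Fn(&[u8]) -> Hash,
        root_hash: Hash,
        leaf_count: u32,
        prev: Option<&AnchorRecordV1>,
    ) -> Self {
        let prev_anchor = prev.map_or([0u8; 32], |p| p.anchor_hash);
        let mut a = AnchorRecordV1 { version: 1, root_hash, leaf_count, prev_anchor, anchor_hash: [0; 32] };
        a.anchor_hash = h(&a.preimage());
        a
    }
}

/// Check every anchor hash and every back-link.
/// Returns Err(index) for the first record that fails.
fn verify_chain(h: &dyn Fn(&[u8]) -> Hash, chain: &[AnchorRecordV1]) -> Result<(), usize> {
    let mut expected_prev = [0u8; 32];
    for (i, a) in chain.iter().enumerate() {
        if a.prev_anchor != expected_prev || h(&a.preimage()) != a.anchor_hash {
            return Err(i);
        }
        expected_prev = a.anchor_hash;
    }
    Ok(())
}
```

Dropping or reordering an epoch breaks a back-link, and editing any field breaks that record's own hash, so either kind of tampering surfaces as an `Err` with the position of the first bad record.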

Inclusion Proof

To prove an event exists without revealing the full dataset, provide a Merkle path — a sequence of (direction, sibling_hash) pairs from leaf to root:

// Proof: ceil(log2(N)) steps. For 1M events: 20 steps × 33 bytes = 660 bytes
struct ProofStep {
    direction:    u8,       // 0 = sibling left, 1 = sibling right
    sibling_hash: [u8; 32], // Hash of sibling node
}
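Verifying such a path means re-hashing from the leaf upward and comparing the result with the root. The sketch below reuses the 0x00/0x01 domain-separation prefixes from the Merkle Tree section and takes the hash as a parameter (SHA-256 in practice). `verify_inclusion` is a hypothetical name, and rejecting any direction value other than 0 or 1 is an assumption.

```rust
type Hash = [u8; 32];

struct ProofStep {
    direction: u8,      // 0 = sibling left, 1 = sibling right
    sibling_hash: Hash, // hash of the sibling node
}

/// Recompute the root from `leaf` and `proof`, then compare with `root`.
fn verify_inclusion(
    h: &dyn Fn(&[u8]) -> Hash,
    leaf: &[u8],
    proof: &[ProofStep],
    root: &Hash,
) -> bool {
    let mut buf = vec![0x00];
    buf.extend_from_slice(leaf);
    let mut current = h(&buf);

    for step in proof {
        // Put the sibling on the side the proof step names.
        let (left, right) = match step.direction {
            0 => (&step.sibling_hash, &current),
            1 => (&current, &step.sibling_hash),
            _ => return false, // ASSUMPTION: unknown directions are invalid
        };
        let mut buf = Vec::with_capacity(65);
        buf.push(0x01);
        buf.extend_from_slice(left);
        buf.extend_from_slice(right);
        current = h(&buf);
    }
    current == *root
}
```

The verifier needs only the leaf, the path, and a trusted root; it never sees the other leaves.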

Implementation Guide

1. Run the Daemon

# Basic: TCP listener, file output
sf128logd --listen 127.0.0.1:55128 --output /var/log/sf128.log

# Production: Unix socket, batching, segment rotation
sf128logd \
  --unix-socket /run/sf128logd.sock \
  --segment-dir /var/log/sf128/segments \
  --segment-max-lines 1000000 \
  --segment-max-seconds 60 \
  --batch-size 256 \
  --batch-time-ms 10 \
  --fsync-per-batch

2. Configure Node Identity

# Deterministic node ID derived from shared secret + hostname
export SF128_NODE_SECRET="cluster-secret-here"
export SF128_NODE_TAG="$(hostname)"

# Derivation: SHA256(secret:tag) → last 3 bytes → 24-bit source ID
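The derivation in the comment above could look roughly like this in Rust. The SHA-256 function is passed in as a parameter to keep the sketch dependency-free. Reading the last three digest bytes as a big-endian integer is an assumption, and `source_id` is an illustrative name rather than the daemon's actual function.

```rust
/// Derive the 24-bit source ID from a shared secret and a node tag.
/// `sha256` is expected to be a real SHA-256 implementation.
fn source_id(sha256: &dyn Fn(&[u8]) -> [u8; 32], secret: &str, tag: &str) -> u32 {
    // Hash "secret:tag" as described in the comment above.
    let input = format!("{}:{}", secret, tag);
    let digest = sha256(input.as_bytes());
    // ASSUMPTION: the last 3 bytes are read big-endian.
    u32::from_be_bytes([0, digest[29], digest[30], digest[31]])
}
```

Two nodes that share the secret but have different hostnames hash different inputs, so they end up with different source IDs except in the case of a 24-bit collision.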

3. Feed Events via Rsyslog

# /etc/rsyslog.d/sf128.conf
module(load="builtin:omfwd")
*.* action(
  type="omfwd"
  target="127.0.0.1"
  port="55128"
  protocol="tcp"
  template="RSYSLOG_SyslogProtocol23Format"
)

4. Seal Segments

# Build Merkle tree over a completed segment
sf128logseal seal /var/log/sf128/segments/sf128log-*.log \
  --node-tag "$(hostname)" \
  --type-descriptor 0x0001 \
  --georoute-prefix 0x000001 \
  --root-txt

# Output: .manifest.json with root hash + inclusion proofs

5. Verify Integrity

# Verify segment against its manifest
sf128logctl verify-segment \
  /var/log/sf128/segments/sf128log-START-END-node.log \
  /var/log/sf128/segments/sf128log-START-END-node.manifest.json

# Verify anchor chain continuity
sf128logctl verify-epoch /var/log/sf128/segments/epoch-*.json

6. Systemd Service

# /etc/systemd/system/sf128logd.service
[Unit]
Description=SF128 Log Daemon
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/sf128logd \
  --unix-socket /run/sf128logd.sock \
  --segment-dir /var/log/sf128/segments \
  --segment-max-lines 1000000
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target

Replication

Nodes exchange Merkle roots and anchor records via a gossip protocol. Full event data stays local — only the cryptographic summaries replicate.

1. Each node seals its local segment → Merkle root + anchor
2. Gossip scheduler (LCG64 + Fisher-Yates) picks peers to contact
3. Nodes exchange anchor records (69 bytes each)
4. Recipient verifies chain continuity before committing
5. On-demand: request inclusion proofs for specific events

Bandwidth per sync: ~69 bytes per epoch (anchor record). Compare to replicating the full 16 MB segment.
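The peer-selection step (LCG64 + Fisher-Yates) can be sketched as follows. The LCG constants are Knuth's MMIX multiplier and increment, chosen here only for illustration since the text does not give the scheduler's actual parameters; `Lcg64`, `shuffle_peers`, and `pick_peers` are hypothetical names.

```rust
/// 64-bit linear congruential generator.
/// ASSUMPTION: MMIX constants; the real scheduler's constants are not documented here.
struct Lcg64 {
    state: u64,
}

impl Lcg64 {
    fn new(seed: u64) -> Self {
        Lcg64 { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self
            .state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.state
    }

    /// Value in 0..bound (bound must be > 0). Uses the high bits, which are
    /// better distributed in an LCG; the small modulo bias is ignored here.
    fn below(&mut self, bound: usize) -> usize {
        ((self.next_u64() >> 33) % bound as u64) as usize
    }
}

/// In-place Fisher-Yates shuffle driven by the LCG.
fn shuffle_peers<T>(peers: &mut [T], rng: &mut Lcg64) {
    for i in (1..peers.len()).rev() {
        let j = rng.below(i + 1);
        peers.swap(i, j);
    }
}

/// Pick up to `fanout` peers to contact this round.
fn pick_peers<T: Clone>(peers: &[T], fanout: usize, seed: u64) -> Vec<T> {
    let mut order = peers.to_vec();
    shuffle_peers(&mut order, &mut Lcg64::new(seed));
    order.truncate(fanout);
    order
}
```

A seeded LCG makes the schedule reproducible: the same seed and peer list always yield the same contact order, which helps when replaying a gossip round in tests.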

Access Policies

Policy evaluation controls what data observers can see, without revealing the full event stream:

enum PolicyDecision {
    Allow,                // Full access to record
    Deny,                 // Record is invisible
    AllowWithRedactions,  // Partial: some fields masked
}

Formally verified properties (TLA+):

Non-interference — the decision determines the view, not vice versa
Deterministic — same policy + subject + context always yields the same decision
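To make the three decisions concrete, here is a sketch of how a decision might be applied to a record to produce an observer's view. The `RecordView` type and the choice of masking the stream and source fields are purely hypothetical; the text does not say which fields a redaction hides.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum PolicyDecision {
    Allow,               // Full access to record
    Deny,                // Record is invisible
    AllowWithRedactions, // Partial: some fields masked
}

/// Hypothetical observer view; `None` marks a masked field.
#[derive(Debug, Clone, Copy, PartialEq)]
struct RecordView {
    domain: u8,
    timestamp: u64,
    stream: Option<u32>,
    source: Option<u32>,
    event_type: u8,
}

/// The decision alone determines the view: same decision and record in,
/// same view out.
fn apply_decision(decision: PolicyDecision, full: RecordView) -> Option<RecordView> {
    match decision {
        PolicyDecision::Allow => Some(full),
        PolicyDecision::Deny => None,
        // ASSUMPTION: redaction masks stream and source; the real field set is unspecified.
        PolicyDecision::AllowWithRedactions => Some(RecordView {
            stream: None,
            source: None,
            ..full
        }),
    }
}
```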

Formal Verification

SMT2 / Z3: Merkle Soundness

Checks that any single-bit flip in a leaf, sibling hash, or root hash causes proof verification to fail. The model uses an 8-bit toy hash and exhaustively covers all bit positions, so the result concerns the verification logic rather than SHA-256 itself.

TLA+: Replication Safety

All committed records form a valid chain. Peers only commit records whose predecessors are already committed. Prevents Byzantine ordering attacks.

TLA+: Policy Non-Interference

An observer cannot infer the underlying policy from the view they receive. The system is information-theoretically sound: redacted views leak nothing about the policy that produced them.

Dependencies

Deliberately minimal. No runtime, no allocator, no framework.

# sf128-logt (library)
sha2 = "0.10"     # SHA-256
chrono = "0.4"    # Timestamps
serde = "1.0"     # Serialization
clap = "4.5"      # CLI
anyhow = "1.0"    # Errors

# sf128logd (daemon) — same plus:
rand = "0.8"      # Node ID derivation

# Dev only:
proptest = "1.4"  # Property-based testing
criterion = "0.5" # Benchmarks