# Duniter Node Architecture & Substrate Storage Key Derivation

**Status:** Verified working — 2026-06-12

**Context:** cry01 Value Layer, Ğ1 balance lookup feature

This document records findings from implementing and debugging the Ğ1 balance lookup feature in `cry01`. These were established through direct experimentation against the live Ğ1 mainnet via the orchestrator's Duniter nodes, and are not assumptions — every claim below was verified against real RPC responses.

---

## 1. Substrate Storage Key Derivation

To read any value from a Substrate chain's state (e.g. an account's balance), you construct a storage key and call `state_getStorage`. For a `StorageMap` like `System.Account`, the key is:

```
storage_key = Twox128(PalletName) . Twox128(StorageItemName) . Hasher(map_key)
```

For `System.Account(account_id)`:

```
storage_key = Twox128("System") . Twox128("Account") . Blake2_128Concat(account_id)
            = Twox128("System") . Twox128("Account") . Blake2b_128(account_id) . account_id
```

### 1.1 Twox128 — THE CRITICAL GOTCHA

**Substrate's "Twox128" is NOT the same algorithm as the generic "xxHash128" (xxh128) that PHP's `hash()` function natively supports.** They produce different 16-byte outputs for the same input, despite the similar name and identical output size. This distinction cost most of a debugging session and must not be re-litigated.

**Correct Twox128 construction:**

```
Twox128(data) = reverse(xxh64(data, seed=0)) . reverse(xxh64(data, seed=1))
```

That is: two separate 64-bit xxHash digests (seeds 0 and 1), each **byte-reversed**, then concatenated to form 16 bytes. The reversal is needed because Substrate writes each 64-bit hash value in little-endian byte order, while PHP's `hash('xxh64', ..., true)` returns the digest in xxHash's canonical big-endian order.

**PHP implementation** (verified correct, PHP 8.1+):

```php
/**
 * Substrate Twox128: xxh64 with seed 0 and seed 1, each digest converted
 * from big-endian (PHP's output) to little-endian, concatenated to 16 raw bytes.
 */
function cry01_twox128($data) {
    // The 'seed' option for xxh* algorithms requires PHP 8.1+.
    $h0 = strrev(hash('xxh64', $data, true, ['seed' => 0]));
    $h1 = strrev(hash('xxh64', $data, true, ['seed' => 1]));
    return $h0 . $h1;
}
```

**Verification:** `Twox128("System") = 26aa394eea5630e07c48ae0c9558cef7` and `Twox128("Account") = b99d880ec681799c0cf30e8886371da9` — these match the canonical `System::Account` storage prefix published throughout Substrate/Polkadot documentation. This is strong independent confirmation: any Substrate-based chain explorer or tool will recognize this prefix.

**What does NOT work:** `hash('xxh128', $data, true)`. This is a different, single-pass 128-bit xxHash variant (the 128-bit form of XXH3). It passes the generic xxh128 test vectors (e.g. `hash('xxh128', 'php.watch')` = `16c27099bd855aff3b3efe27980515ad`, which IS correct for plain xxh128) — but plain xxh128 is simply the wrong algorithm for Substrate storage prefixes. A test vector passing for "xxh128 in general" tells you nothing about whether it's the right primitive for "Substrate's Twox128" — these are unrelated facts that happen to share a name fragment.

### 1.2 Blake2_128Concat — confirmed correct

`Blake2_128Concat(key) = Blake2b_128(key) . key` — i.e. the Blake2b-128 hash of the key, followed by the raw key bytes appended (not replaced).

Blake2b-128 is **RFC 7693 parameterized** output (the output length is part of the hash's parameter block, NOT a truncation of Blake2b-512). Taking the first 16 bytes of a Blake2b-512 digest therefore gives a different, wrong value.

PHP's `hash()` function does **not** support `blake2b` as an algorithm on this PHP 8.2.31 build at all (`hash_algos()` does not list `blake2b` or `blake2b512`). We vendor `deemru/Blake2b` (pure PHP, MIT license, `hubzilla/addon/cry01/vendor/Blake2b.php`) for this.
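To cross-check both hashers outside PHP (the role the `xxhash` Python package plays in §7), here is a minimal, standard-library-only Python sketch. It transcribes the reference XXH64 algorithm directly rather than importing `xxhash`, so it runs without extra packages. It uses `hashlib.blake2b(..., digest_size=16)` because that call sets the output length in the parameter block, which is the parameterized Blake2b-128 described above. All function names are illustrative and are not part of cry01.

```python
import hashlib

# XXH64 prime constants (from the reference xxHash specification).
P1 = 0x9E3779B185EBCA87
P2 = 0xC2B2AE3D27D4EB4F
P3 = 0x165667B19E3779F9
P4 = 0x85EBCA77C2B2AE63
P5 = 0x27D4EB2F165667C5
MASK = (1 << 64) - 1  # emulate unsigned 64-bit overflow


def _rotl(x: int, r: int) -> int:
    return ((x << r) | (x >> (64 - r))) & MASK


def _round(acc: int, lane: int) -> int:
    acc = (acc + lane * P2) & MASK
    return (_rotl(acc, 31) * P1) & MASK


def _read_le(data: bytes, i: int, width: int = 8) -> int:
    return int.from_bytes(data[i:i + width], "little")


def xxh64(data: bytes, seed: int = 0) -> int:
    """Plain XXH64 of `data`, returned as an unsigned 64-bit integer."""
    n, i = len(data), 0
    if n >= 32:
        v = [(seed + P1 + P2) & MASK, (seed + P2) & MASK, seed & MASK, (seed - P1) & MASK]
        while i + 32 <= n:  # consume full 32-byte stripes
            for j in range(4):
                v[j] = _round(v[j], _read_le(data, i + 8 * j))
            i += 32
        h = (_rotl(v[0], 1) + _rotl(v[1], 7) + _rotl(v[2], 12) + _rotl(v[3], 18)) & MASK
        for acc in v:  # merge the four accumulators
            h = ((h ^ _round(0, acc)) * P1 + P4) & MASK
    else:
        h = (seed + P5) & MASK
    h = (h + n) & MASK
    while i + 8 <= n:  # remaining 8-byte words
        h ^= _round(0, _read_le(data, i))
        h = (_rotl(h, 27) * P1 + P4) & MASK
        i += 8
    if i + 4 <= n:  # at most one remaining 4-byte word
        h ^= (_read_le(data, i, 4) * P1) & MASK
        h = (_rotl(h, 23) * P2 + P3) & MASK
        i += 4
    while i < n:  # trailing single bytes
        h ^= (data[i] * P5) & MASK
        h = (_rotl(h, 11) * P1) & MASK
        i += 1
    h ^= h >> 33  # final avalanche
    h = (h * P2) & MASK
    h ^= h >> 29
    h = (h * P3) & MASK
    return h ^ (h >> 32)


def twox128(data: bytes) -> bytes:
    """Substrate Twox128: xxh64 with seeds 0 and 1, each written little-endian."""
    return xxh64(data, 0).to_bytes(8, "little") + xxh64(data, 1).to_bytes(8, "little")


def blake2_128_concat(key: bytes) -> bytes:
    """Blake2b with a 16-byte digest (parameterized, not truncated), then the raw key."""
    return hashlib.blake2b(key, digest_size=16).digest() + key


def system_account_key(account_id: bytes) -> str:
    """Hex storage key for System.Account(account_id)."""
    if len(account_id) != 32:
        raise ValueError("expected a 32-byte AccountId")
    raw = twox128(b"System") + twox128(b"Account") + blake2_128_concat(account_id)
    return "0x" + raw.hex()
```

Writing `xxh64(...).to_bytes(8, "little")` is the Python equivalent of the PHP helper's `strrev()` on a big-endian digest. Passing the §1.3 account ID through `system_account_key` should reproduce the key listed there. That key value comes from the document itself and is not re-derived by the checks below.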
Verified test vectors (RFC 7693, cross-checked via Python `hashlib.blake2b`):

```
Blake2b-128("")    = cae66941d9efbd404e4d88758ea67670
Blake2b-128("abc") = cf4ab791c62b8d2b2109c90275287816
```

### 1.3 Full worked example

For account `g1LvTpYXkKEASMiBYLp8RQmSN5kZyXtoHX8XE2FqQ9hDjqp5B`:

```
account_id (32 bytes) = 55f2d285cf400d2da003d43fe0ccd5207b6f08780bfdd62999e00d14dd731938

storage_key = 0x26aa394eea5630e07c48ae0c9558cef7
                b99d880ec681799c0cf30e8886371da9
                b157780e8874e1d5aeee0f3620cf7f76
                55f2d285cf400d2da003d43fe0ccd5207b6f08780bfdd62999e00d14dd731938
```

The four lines of `storage_key` are shown separately for readability. In order, they are `Twox128("System")`, `Twox128("Account")`, `Blake2b_128(account_id)` and the raw `account_id`. The actual key is their concatenation without spaces.

`state_getStorage` on this key returns the SCALE-encoded `AccountInfo` struct (see §3).

---

## 2. SS58 Address Decoding (Ğ1 addresses)

Ğ1 addresses (e.g. `g1LvTpYXkKEASMiBYLp8RQmSN5kZyXtoHX8XE2FqQ9hDjqp5B`) are SS58-encoded. **The leading "g1" is NOT a literal prefix string.** It is simply the first two characters of the base58 encoding of the whole byte string, and they happen to read "g1" for Ğ1's network prefix. The actual network identifier is encoded in the decoded bytes.

**Confirmed format for Ğ1 (verified against a real address with valid checksum):**

- Base58-decode the full address string → **36 bytes total**
- Byte layout: `2-byte network prefix (0x5891) + 32-byte account ID + 2-byte checksum`
- Checksum = first 2 bytes of `Blake2b-512("SS58PRE" + prefix + account_id)`

This is the 14-bit extended SS58 prefix format: network IDs ≥ 64 use 2 prefix bytes. Under that bit layout, Ğ1's prefix bytes `0x5891` decode to network ID 4450. Only the raw 2-byte form is needed at runtime, and that form is what was verified.

**Implementation:** `cry01_ss58_decode()` in `cry01_chain.php`. Generic base58 decode is `cry01_base58_decode()` — pure PHP, byte-array accumulator, no bcmath/gmp dependency, handles arbitrary-length input.

**Caveat:** other Substrate chains and older Duniter v1 addresses may decode to a different total length. For example, an old Cesium v1-era address decoded to 32 bytes with no checksum at all during testing, and `cry01_ss58_decode()` correctly rejected it as "unexpected decoded length". The 36-byte, 2-prefix-byte format is confirmed only for Ğ1 v2 addresses as currently generated.

---

## 3. AccountInfo Decoding

`state_getStorage` on a `System.Account` key returns a SCALE-encoded `AccountInfo` struct:

```
nonce:          u32   (4 bytes)
consumers:      u32   (4 bytes)
providers:      u32   (4 bytes)
sufficients:    u32   (4 bytes)
data.free:      u128  (16 bytes)   <- the spendable balance
data.reserved:  u128  (16 bytes)
data.frozen:    u128  (16 bytes)
data.flags:     u128  (16 bytes)
```

All fields are little-endian and concatenated with no padding or separators. The total is 80 bytes when all fields are present, though trailing zero fields may be omitted or truncated in the raw response, so always check the actual length. `free` is at byte offset 16 (right after the four u32 fields), length 16 (u128, little-endian).

Duniter v2 uses **centimes** (1 Ğ1 = 100 units) as the smallest unit, same as Duniter v1.

**u128 arithmetic without bcmath/gmp:** `cry01_le_bytes_to_decimal_string()` implements little-endian byte → base-10 string conversion using only string-based big-integer add/multiply (`cry01_decimal_string_add()`, `cry01_decimal_string_multiply()`). No PHP extensions required. This is necessary because a u128 does not fit in PHP's 64-bit signed `int` (max 2^63 - 1), and floats lose integer precision above 2^53.

**Verified result:** account with 1 Ğ1 → `free` raw value `100` → formatted as `1.00 Ğ1`.

---

## 4. Node Architecture: Light vs. Full

### 4.1 Light mirror node (`duniter-mirror.service`, pre-existing)

- `--state-pruning 256` (default-ish), no explicit `--sync` flag
- Disk usage: ~2GB at block ~1.39M
- **Can serve `state_getStorage` for CURRENT state** (verified — this works fine for balance lookups)
- Cannot serve state for blocks older than the pruning window (256 blocks, roughly 25 minutes of history at 6s block time)
- RPC originally bound to `127.0.0.1:9944` and `[::1]:9944` only (loopback) — **not reachable from the Hubzilla node over WireGuard** until fixed (see §5)

### 4.2 Full-state node (`duniter-full.service`, new tonight)

- `--sync fast --state-pruning 256`
- "fast" sync: downloads blocks without executing them, then downloads the latest state with proofs. This is much faster than `full` sync, which executes every block from genesis.
- Disk usage: **under 5GB** after sync to chain head (~1.39M blocks) — significantly smaller than initially estimated; the 32GB volume resize done tonight was generously oversized
- Sync time from genesis to chain head: **roughly 10-15 minutes** at ~1500-2500 blocks/sec, ~600-900 KiB/s
- Same current-state query capability as the light node — **for the balance lookup use case, this node was not strictly necessary**. The Twox128 fix alone would have made the light node work too; this was confirmed by testing the corrected storage key against both nodes, which returned identical correct results.

### 4.3 What NEITHER node provides: full transaction history

Both nodes above use `--state-pruning 256`, so only recent state is retrievable. **Neither supports querying historical balances at arbitrary past blocks, nor provides transaction history.** The planned future feature (paste an address, see full transaction history) requires either:

- `--state-pruning archive` (keep state for every historical block — significantly larger disk footprint, not yet measured)
- A separate indexer (e.g. Subsquid/Squid, mentioned in Duniter's own docs for "public RPC" setups) that processes blocks and stores an indexed transaction database. This is likely the more practical path for a transaction-history UI, since raw archive-node state queries cannot answer "all transactions for address X" without scanning every block.

This is future work, scoped separately.

### 4.4 Smith / validator node — explicitly out of scope here

A Smith (validator) node requires session keys, `rotateKeys`, and on-chain Smith certification within the Ğ1 web of trust. This is a substantially larger, separate project (new Proxmox container, 786GB available on `/var/lib/vz` on `proxmox1`) and was **not** undertaken tonight. The `duniter-full` instance described in §4.2 is a plain full node, not a validator.

---

## 5. systemd Configuration Changes

### 5.1 `duniter-mirror.service` — RPC bind fix

**Problem:** the RPC server only listened on `127.0.0.1:9944` / `[::1]:9944`, so the Hubzilla node (on the WireGuard network, 10.0.0.x) could not reach it (`Connection refused`).
**Fix:** drop-in override at `/etc/systemd/system/duniter-mirror.service.d/override.conf`:

```ini
[Service]
# The empty ExecStart= clears the command inherited from the main unit;
# the following line defines the replacement.
ExecStart=
ExecStart=/usr/bin/duniter --chain ${DUNITER_CHAIN_NAME} --name ${DUNITER_NODE_NAME}_mirror --listen-addr ${DUNITER_LISTEN_ADDR} --state-pruning ${DUNITER_PRUNING_PROFILE} --base-path ${BASE_PATH} --experimental-rpc-endpoint "listen-addr=127.0.0.1:9944,methods=safe" --experimental-rpc-endpoint "listen-addr=10.0.0.105:9944,methods=safe"
```

**Important gotchas encountered:**

- `--experimental-rpc-endpoint` and the legacy `--rpc-cors` flag are **mutually exclusive** — using both is a hard error (`the argument '--rpc-cors ' cannot be used with '--experimental-rpc-endpoint ...'`)
- The `--experimental-rpc-endpoint` value is itself a comma-separated list of `key=value` pairs. Passing a comma-separated list of CORS origins as `cors=http://a,http://b,...` therefore breaks the parser, because each origin is misinterpreted as a separate `key=value` attempt. **We omitted `cors=` entirely**; it is not needed for server-to-server JSON-RPC (no browser involved).
- `--experimental-rpc-endpoint` **replaces** the legacy RPC config wholesale, including the default localhost binding. The Oracle (`ORACLE_RPC_URL=ws://127.0.0.1:9944`) depends on a localhost endpoint, so **two** `--experimental-rpc-endpoint` flags are needed: one for `127.0.0.1` (Oracle) and one for `10.0.0.105` (WireGuard/Hubzilla access).
- `methods=safe` restricts the endpoint to read-only RPC methods. This is appropriate for both endpoints here, since neither the Oracle nor cry01 needs to submit transactions through these nodes.

**Result confirmed:**

```
Running JSON-RPC server: addr=127.0.0.1:9944,10.0.0.105:9944
```

### 5.2 `duniter-full.service` — new unit

New standalone systemd unit at `/etc/systemd/system/duniter-full.service`:

```ini
[Unit]
Description=Duniter full-state node.
After=network.target

[Service]
Type=simple
User=duniter
Group=duniter
# Trailing backslashes continue ExecStart onto the next line.
ExecStart=/usr/bin/duniter \
    --chain g1 \
    --name CivicInfrastructure-G1-Full_full \
    --listen-addr /ip4/0.0.0.0/tcp/30334/ws \
    --sync fast \
    --state-pruning 256 \
    --base-path /home/duniter/.local/share/duniter-full \
    --experimental-rpc-endpoint "listen-addr=127.0.0.1:9945,methods=safe" \
    --experimental-rpc-endpoint "listen-addr=10.0.0.105:9945,methods=safe"
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```

**Gotcha:** an IPv6 `--listen-addr /ip6/[::]/tcp/30334/ws` was attempted first and failed with `multiaddr parsing error: invalid IPv6 address syntax` — the shell's bracket-glob handling of `[::]` mangled the argument before it reached the binary, even when quoted in the heredoc (the unit file itself stores it correctly, but constructing/testing such strings interactively via shell is error-prone). Note also that multiaddr notation writes IPv6 addresses without brackets (e.g. `/ip6/::/tcp/30334/ws`), so the brackets alone would produce this parse error. The bracket-free form was not tried. **Omitted IPv6 listen-addr entirely** — the existing `duniter-mirror` unit does the same (IPv4-only `/ip4/0.0.0.0/tcp/30333/ws`), so this is consistent with existing practice, not a regression.

**Data directory:** `/home/duniter/.local/share/duniter-full`, owned by `duniter:duniter`, created fresh (separate from the mirror's `/home/duniter/.local/share/duniter`).

**Disk:** the orchestrator's root filesystem (`/dev/loop4`) was resized from ~8GB to 32GB ahead of this to provide headroom. Actual usage after full sync: under 5GB. The resize was generous relative to actual need, but a 32GB volume with ~27GB free leaves comfortable room for future growth, since the state trie grows as the chain progresses and more accounts/identities are created.

---

## 6. cry01 Configuration

`hubzilla/addon/cry01/config.json` (host-only, not in repo; relevant key only):

```json
"g1_rpc_endpoint": "http://10.0.0.105:9945"
```

Currently points at the new full node (port 9945).
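The following is a minimal Python sketch of the `state_getStorage` call against this endpoint, for reference. It assumes the node accepts JSON-RPC 2.0 over a plain HTTP POST on the same port, which is what the `http://` URL above implies. The helper names are hypothetical and are not cry01's PHP functions. Omitting the optional block-hash parameter queries the current best block, which is why the pruning window from §4 does not matter for balance lookups.

```python
import json
import urllib.request


def build_get_storage_request(storage_key: str, request_id: int = 1) -> bytes:
    """JSON-RPC 2.0 body for state_getStorage at the current best block."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "state_getStorage",
        "params": [storage_key],  # no block hash -> latest state
    }
    return json.dumps(payload).encode("utf-8")


def get_storage(endpoint: str, storage_key: str, timeout: float = 10.0):
    """POST the request; returns the hex result string, or None if the key is unset."""
    req = urllib.request.Request(
        endpoint,
        data=build_get_storage_request(storage_key),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        reply = json.load(resp)
    if "error" in reply:
        raise RuntimeError(f"RPC error: {reply['error']}")
    return reply.get("result")
```

Usage would be `get_storage("http://10.0.0.105:9945", key)`, with `key` built as in §1. A `None` result means no value is stored under that key (for example, an account that does not exist on-chain).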
Per §4.2, the light node (port 9944) would also work for balance lookups now that the Twox128 fix is in place — both were verified to return identical correct results for the test account. The choice of which to point at is not load-bearing for correctness; it is an operational/redundancy decision left open for now.

---

## 7. Tools Used for Diagnosis

- **scalecodec** (Python, `pip install scalecodec`) — decodes `state_getMetadata` output to enumerate pallets/storage items and confirm hasher types. Installed in the orchestrator's existing venv at `/srv/civic-orchestrator/venv`.
- **xxhash** (Python, `pip install xxhash`) — used to independently compute and cross-check Twox128/xxh64 values against the PHP implementation.
- Both are isolated to the orchestrator's Python venv — not installed on the Hubzilla node.

---

## 8. Summary of Verified Facts (quick reference)

| Claim | Status |
|---|---|
| Twox128 ≠ xxh128; Twox128 = reverse(xxh64(d,0)) + reverse(xxh64(d,1)) | ✅ Verified against live chain |
| Blake2_128Concat = Blake2b-128(key) + key; Blake2b-128 is parameterized (not truncated) | ✅ Verified against RFC 7693 vectors |
| Ğ1 addresses: 36-byte SS58, 2-byte prefix (0x5891), Blake2b-512/SS58PRE checksum | ✅ Verified, checksum matched |
| AccountInfo.free at offset 16, 16 bytes LE, divide by 100 for Ğ1 | ✅ Verified: 1 Ğ1 account → correct result |
| Light node (header-sync) can serve current-state `state_getStorage` | ✅ Verified — works identically to full node for current balances |
| Light/full node disk usage at block ~1.39M | Light: ~2GB. Full (fast sync): <5GB |
| Full-state node sync time from genesis (`--sync fast`) | ~10-15 minutes |
| Neither node supports historical/archive queries or tx history | By design (`--state-pruning 256`); archive node or indexer needed for that |
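The SS58 and AccountInfo rows above can be tied together in a minimal Python sketch of the decode side. It assumes the byte layouts recorded in §2 and §3: a 2-byte prefix `0x5891`, a Blake2b-512 `SS58PRE` checksum, and `data.free` at offset 16. It is a standard-library transcription for cross-checking, not the PHP `cry01_ss58_decode()` itself. The explicit prefix check and all function names are illustrative additions. Python's arbitrary-precision `int` reads a u128 directly, which is exactly what the PHP side has to emulate with string arithmetic.

```python
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
G1_PREFIX = bytes([0x58, 0x91])  # Ğ1 network prefix bytes, as recorded in §2


def base58_decode(s: str) -> bytes:
    """Base58 (Bitcoin alphabet) to bytes; each leading '1' is a zero byte."""
    num = 0
    for ch in s:
        num = num * 58 + B58_ALPHABET.index(ch)  # ValueError on invalid characters
    body = num.to_bytes((num.bit_length() + 7) // 8, "big")
    leading_zeros = len(s) - len(s.lstrip("1"))
    return b"\x00" * leading_zeros + body


def ss58_decode_g1(address: str) -> bytes:
    """Return the 32-byte account ID of a Ğ1 v2 address, validating the checksum."""
    raw = base58_decode(address)
    if len(raw) != 36:
        raise ValueError(f"unexpected decoded length: {len(raw)}")
    prefix, account_id, checksum = raw[:2], raw[2:34], raw[34:]
    if prefix != G1_PREFIX:
        raise ValueError(f"unexpected network prefix: {prefix.hex()}")
    digest = hashlib.blake2b(b"SS58PRE" + prefix + account_id).digest()  # Blake2b-512
    if checksum != digest[:2]:
        raise ValueError("SS58 checksum mismatch")
    return account_id


def ss58_network_id(prefix: bytes) -> int:
    """Numeric network ID of a 2-byte SS58 prefix (bit layout 01aaaaaa bbcccccc)."""
    lower = ((prefix[0] << 2) | (prefix[1] >> 6)) & 0xFF
    upper = prefix[1] & 0x3F
    return lower | (upper << 8)


def free_balance_centimes(account_info: bytes) -> int:
    """data.free from a SCALE-encoded AccountInfo: u128 little-endian at offset 16."""
    if len(account_info) < 32:
        raise ValueError("AccountInfo too short to contain data.free")
    return int.from_bytes(account_info[16:32], "little")


def format_g1(centimes: int) -> str:
    """Format a centime amount (1 Ğ1 = 100 centimes)."""
    whole, cents = divmod(centimes, 100)
    return f"{whole}.{cents:02d} Ğ1"
```

Per §1.3, `ss58_decode_g1("g1LvTpYXkKEASMiBYLp8RQmSN5kZyXtoHX8XE2FqQ9hDjqp5B")` is expected to return the account ID listed there. A raw `free` value of `100` formats as `1.00 Ğ1`, matching §3. The real-chain values come from the document and are not re-derived by the sketch.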