Duniter Node Architecture & Substrate Storage Key Derivation
Status: Verified working (2026-06-12)
Context: cry01 Value Layer, Ğ1 balance lookup feature
This document records findings from implementing and debugging the Ğ1 balance
lookup feature in cry01. These were established through direct
experimentation against the live Ğ1 mainnet via the orchestrator's Duniter
nodes, and are not assumptions — every claim below was verified against
real RPC responses.
1. Substrate Storage Key Derivation
To read any value from a Substrate chain's state (e.g. an account's balance),
you construct a storage key and call state_getStorage. For a StorageMap
like System.Account, the key is:
storage_key = Twox128(PalletName) . Twox128(StorageItemName) . Hasher(map_key)
For System.Account(account_id):
storage_key = Twox128("System") . Twox128("Account") . Blake2_128Concat(account_id)
= Twox128("System") . Twox128("Account") . Blake2b_128(account_id) . account_id
1.1 Twox128 — THE CRITICAL GOTCHA
Substrate's "Twox128" is NOT the same algorithm as the generic "xxHash128"
(xxh128) that PHP's hash() function natively supports. They produce
different 16-byte outputs for the same input, despite the similar name and
identical output size. This distinction cost most of a debugging session and
must not be re-litigated.
Correct Twox128 construction:
Twox128(data) = reverse(xxh64(data, seed=0)) . reverse(xxh64(data, seed=1))
That is: two separate 64-bit xxHash digests (seeds 0 and 1), each byte-reversed, then concatenated to form 16 bytes.
PHP implementation (verified correct, PHP 8.1+):
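function cry01_twox128(string $data): string {
    // Substrate's Twox128: two xxh64 digests (seeds 0 and 1), each byte-reversed.
    // The 'seed' option of hash() requires PHP 8.1+.
    $h0 = strrev(hash('xxh64', $data, true, ['seed' => 0]));
    $h1 = strrev(hash('xxh64', $data, true, ['seed' => 1]));
    return $h0 . $h1; // 16 raw bytes
}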
Verification: Twox128("System") = 26aa394eea5630e07c48ae0c9558cef7 and
Twox128("Account") = b99d880ec681799c0cf30e8886371da9 — these match the
canonical System::Account storage prefix published throughout Substrate/
Polkadot documentation. This is strong independent confirmation: any
Substrate-based chain explorer or tool will recognize this prefix.
What does NOT work: hash('xxh128', $data, true). This is a different,
single-pass 128-bit xxHash variant. It passes the generic xxh128 test vectors
(e.g. hash('xxh128', 'php.watch') = 16c27099bd855aff3b3efe27980515ad,
which IS correct for plain xxh128) — but plain xxh128 is simply the wrong
algorithm for Substrate storage prefixes. A test vector passing for "xxh128
in general" tells you nothing about whether it's the right primitive for
"Substrate's Twox128" — these are unrelated facts that happen to share a
name fragment.
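For cross-checking outside PHP, the construction above can be sketched in Python. This is a minimal sketch, not the cry01 code: `xxh64` here is a from-scratch transcription of the public XXH64 algorithm (an assumption made to keep the example dependency-free; the `xxhash` package would serve equally well), and `twox128` is a hypothetical helper name. Emitting each 64-bit digest as little-endian bytes is equivalent to byte-reversing the big-endian digest that PHP's `hash('xxh64', ...)` returns.

```python
MASK64 = (1 << 64) - 1
P1 = 0x9E3779B185EBCA87
P2 = 0xC2B2AE3D27D4EB4F
P3 = 0x165667B19E3779F9
P4 = 0x85EBCA77C2B2AE63
P5 = 0x27D4EB2F165667C5


def _rotl(x: int, r: int) -> int:
    """Rotate a 64-bit value left by r bits."""
    return ((x << r) | (x >> (64 - r))) & MASK64


def _round(acc: int, lane: int) -> int:
    acc = (acc + lane * P2) & MASK64
    return (_rotl(acc, 31) * P1) & MASK64


def _merge(h: int, v: int) -> int:
    h ^= _round(0, v)
    return (h * P1 + P4) & MASK64


def xxh64(data: bytes, seed: int = 0) -> int:
    """XXH64 of data as an unsigned 64-bit integer."""
    n, i = len(data), 0
    if n >= 32:
        # Four accumulators consume the input in 32-byte stripes.
        v = [(seed + P1 + P2) & MASK64, (seed + P2) & MASK64,
             seed & MASK64, (seed - P1) & MASK64]
        while i + 32 <= n:
            for k in range(4):
                v[k] = _round(v[k], int.from_bytes(data[i:i + 8], "little"))
                i += 8
        h = (_rotl(v[0], 1) + _rotl(v[1], 7)
             + _rotl(v[2], 12) + _rotl(v[3], 18)) & MASK64
        for acc in v:
            h = _merge(h, acc)
    else:
        h = (seed + P5) & MASK64
    h = (h + n) & MASK64
    # Tail: remaining 8-byte words, then one 4-byte word, then single bytes.
    while i + 8 <= n:
        h ^= _round(0, int.from_bytes(data[i:i + 8], "little"))
        h = (_rotl(h, 27) * P1 + P4) & MASK64
        i += 8
    if i + 4 <= n:
        h ^= (int.from_bytes(data[i:i + 4], "little") * P1) & MASK64
        h = (_rotl(h, 23) * P2 + P3) & MASK64
        i += 4
    while i < n:
        h ^= (data[i] * P5) & MASK64
        h = (_rotl(h, 11) * P1) & MASK64
        i += 1
    # Final avalanche.
    h ^= h >> 33
    h = (h * P2) & MASK64
    h ^= h >> 29
    h = (h * P3) & MASK64
    h ^= h >> 32
    return h


def twox128(data: bytes) -> bytes:
    """Substrate-style Twox128: xxh64 with seeds 0 and 1, each little-endian."""
    return xxh64(data, 0).to_bytes(8, "little") + xxh64(data, 1).to_bytes(8, "little")


if __name__ == "__main__":
    print(twox128(b"System").hex(), twox128(b"Account").hex())
```

The two printed values can be compared against the System and Account prefix halves quoted in the verification paragraph above.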
1.2 Blake2_128Concat — confirmed correct
Blake2_128Concat(key) = Blake2b_128(key) . key — i.e. the Blake2b-128 hash
of the key, followed by the raw key bytes appended (not replaced).
Blake2b-128 is RFC 7693 parameterized output (the output length is part
of the hash's parameter block, NOT a truncation of Blake2b-512). PHP's
hash() function does not support blake2b as an algorithm on this
PHP 8.2.31 build at all (hash_algos() does not list blake2b or
blake2b512). We vendor deemru/Blake2b (pure PHP, MIT license,
hubzilla/addon/cry01/vendor/Blake2b.php) for this.
Verified test vectors (RFC 7693, cross-checked via Python hashlib.blake2b):
Blake2b-128("") = cae66941d9efbd404e4d88758ea67670
Blake2b-128("abc") = cf4ab791c62b8d2b2109c90275287816
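The "parameterized, not truncated" point can be reproduced with Python's standard `hashlib`, which is how the vectors above were cross-checked. The following is a small sketch; the helper names `blake2b_128` and `blake2_128_concat` are illustrative and do not exist in cry01.

```python
import hashlib


def blake2b_128(data: bytes) -> bytes:
    # digest_size=16 goes into BLAKE2b's parameter block, so the result
    # is a distinct hash, not a prefix of the 64-byte digest.
    return hashlib.blake2b(data, digest_size=16).digest()


def blake2_128_concat(key: bytes) -> bytes:
    # Hash first, then the raw key bytes appended unchanged.
    return blake2b_128(key) + key


if __name__ == "__main__":
    truncated = hashlib.blake2b(b"abc").digest()[:16]  # the WRONG construction
    print("parameterized:", blake2b_128(b"abc").hex())
    print("truncated:    ", truncated.hex())
    print("concat:       ", blake2_128_concat(b"abc").hex())
```

The first two printed lines differ, which is why truncating a Blake2b-512 digest cannot be used as a substitute.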
1.3 Full worked example
For account g1LvTpYXkKEASMiBYLp8RQmSN5kZyXtoHX8XE2FqQ9hDjqp5B:
account_id (32 bytes) = 55f2d285cf400d2da003d43fe0ccd5207b6f08780bfdd62999e00d14dd731938
storage_key = 0x26aa394eea5630e07c48ae0c9558cef7
b99d880ec681799c0cf30e8886371da9
b157780e8874e1d5aeee0f3620cf7f76
55f2d285cf400d2da003d43fe0ccd5207b6f08780bfdd62999e00d14dd731938
state_getStorage on this key returns the SCALE-encoded AccountInfo
struct (see §3).
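As a rough cross-check of the worked example, the key can be assembled in a few lines of Python. This sketch hardcodes the two Twox128 prefix halves quoted in §1.1 rather than recomputing them, and `system_account_key` is a hypothetical name, not a cry01 function.

```python
import hashlib

# Prefix halves copied from §1.1 (Twox128("System"), Twox128("Account")).
SYSTEM_PREFIX = bytes.fromhex("26aa394eea5630e07c48ae0c9558cef7")
ACCOUNT_PREFIX = bytes.fromhex("b99d880ec681799c0cf30e8886371da9")


def system_account_key(account_id: bytes) -> str:
    """Return the 0x-prefixed System.Account storage key for a 32-byte account ID."""
    if len(account_id) != 32:
        raise ValueError("account_id must be exactly 32 bytes")
    # Blake2_128Concat: 16-byte parameterized Blake2b digest, then the raw key.
    hashed = hashlib.blake2b(account_id, digest_size=16).digest()
    return "0x" + (SYSTEM_PREFIX + ACCOUNT_PREFIX + hashed + account_id).hex()


account_id = bytes.fromhex(
    "55f2d285cf400d2da003d43fe0ccd5207b6f08780bfdd62999e00d14dd731938"
)
key = system_account_key(account_id)

if __name__ == "__main__":
    print(key)
```

The printed key is 80 bytes (160 hex characters) long and can be compared segment by segment with the four lines of the worked example.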
2. SS58 Address Decoding (Ğ1 addresses)
Ğ1 addresses (e.g. g1LvTpYXkKEASMiBYLp8RQmSN5kZyXtoHX8XE2FqQ9hDjqp5B) are
SS58-encoded. The leading "g1" is NOT a literal prefix string: it is
simply the first two characters of the base58 encoding, which come out as
"g1" because of the network prefix bytes at the start of the payload. The
actual network identifier is encoded in those bytes.
Confirmed format for Ğ1 (verified against a real address with valid checksum):
- Base58-decode the full address string → 36 bytes total
- Byte layout: 2-byte network prefix (0x5891) + 32-byte account ID + 2-byte checksum
- Checksum = first 2 bytes of Blake2b-512("SS58PRE" + prefix + account_id)
This is the 14-bit extended SS58 prefix format (prefixes ≥ 64 use 2 bytes).
Under the standard two-byte rule, Ğ1's prefix bytes 0x58 0x91 decode to network
ID 4450 (0x1162); only the raw 2-byte form was needed and verified here.
Implementation: cry01_ss58_decode() in cry01_chain.php. Generic
base58 decode is cry01_base58_decode() — pure PHP, byte-array accumulator,
no bcmath/gmp dependency, handles arbitrary-length input.
Caveat: other Substrate chains/older Duniter v1 addresses may decode to
a different total length (e.g. 32 bytes with no checksum at all — this was
observed for an old Cesium v1-era address during testing, and correctly
rejected by cry01_ss58_decode() as "unexpected decoded length"). The 36-byte
/ 2-prefix-byte format is specific to (at least) Ğ1 v2 addresses as currently
generated.
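The decoding steps in this section can be sketched in Python as follows. This is an illustration under stated assumptions, not a port of cry01_ss58_decode(): all function names are hypothetical, Python's built-in big integers replace the byte-array accumulator, and the example address is synthetic (account bytes 0..31) rather than a real Ğ1 account. `ss58_network_id` applies the standard two-byte SS58 rule, under which 0x58 0x91 yields 4450.

```python
import hashlib

B58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58decode(s: str) -> bytes:
    n = 0
    for ch in s:
        n = n * 58 + B58.index(ch)  # ValueError for characters outside the alphabet
    pad = len(s) - len(s.lstrip("1"))  # each leading '1' encodes one zero byte
    return b"\x00" * pad + n.to_bytes((n.bit_length() + 7) // 8, "big")


def b58encode(raw: bytes) -> str:
    n = int.from_bytes(raw, "big")
    out = ""
    while n:
        n, r = divmod(n, 58)
        out = B58[r] + out
    pad = len(raw) - len(raw.lstrip(b"\x00"))
    return "1" * pad + out


def ss58_checksum(payload: bytes) -> bytes:
    # First 2 bytes of Blake2b-512("SS58PRE" + prefix + account_id).
    return hashlib.blake2b(b"SS58PRE" + payload, digest_size=64).digest()[:2]


def ss58_network_id(b0: int, b1: int) -> int:
    # Two-byte (14-bit) SS58 prefix: undo the bit shuffling of the encoder.
    lower = ((b0 << 2) & 0xFF) | (b1 >> 6)
    upper = b1 & 0x3F
    return lower | (upper << 8)


def ss58_decode_g1(address: str) -> tuple[int, bytes]:
    """Return (network_id, 32-byte account_id) for a 36-byte SS58 address."""
    raw = b58decode(address)
    if len(raw) != 36:
        raise ValueError(f"unexpected decoded length: {len(raw)}")
    prefix, account_id, checksum = raw[:2], raw[2:34], raw[34:]
    if ss58_checksum(prefix + account_id) != checksum:
        raise ValueError("checksum mismatch")
    return ss58_network_id(prefix[0], prefix[1]), account_id


# Synthetic round trip: build an address with the Ğ1 prefix, then decode it.
account = bytes(range(32))
payload = bytes.fromhex("5891") + account
address = b58encode(payload + ss58_checksum(payload))

if __name__ == "__main__":
    print(address, ss58_decode_g1(address)[0])
```

A 32-byte input with no prefix or checksum, like the old Cesium v1-era address mentioned above, is rejected by the length check before any checksum work is done.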
3. AccountInfo Decoding
state_getStorage on a System.Account key returns a SCALE-encoded
AccountInfo struct:
nonce: u32 (4 bytes)
consumers: u32 (4 bytes)
providers: u32 (4 bytes)
sufficients: u32 (4 bytes)
data.free: u128 (16 bytes) <- the spendable balance
data.reserved: u128 (16 bytes)
data.frozen: u128 (16 bytes)
data.flags: u128 (16 bytes)
All fields are little-endian, concatenated with no padding/separators (total 80 bytes when all fields present, though trailing zero fields may be omitted/truncated in the raw response — always check actual length).
free is at byte offset 16, length 16 (u128, little-endian). Duniter v2 uses
centimes (1 Ğ1 = 100 units) as the smallest unit, same as Duniter v1.
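The layout above maps directly onto fixed offsets. A minimal Python sketch follows; the names `AccountInfo`, `decode_account_info` and `format_g1` are illustrative, and zero-padding a short response to 80 bytes is an assumption based on the truncation remark above, not confirmed node behaviour.

```python
import struct
from typing import NamedTuple


class AccountInfo(NamedTuple):
    nonce: int
    consumers: int
    providers: int
    sufficients: int
    free: int
    reserved: int
    frozen: int
    flags: int


def decode_account_info(raw: bytes) -> AccountInfo:
    """Decode a SCALE-encoded AccountInfo blob (all fields little-endian)."""
    if len(raw) < 32:
        raise ValueError(f"too short to contain data.free: {len(raw)} bytes")
    raw = raw.ljust(80, b"\x00")  # assumption: missing trailing fields are zero
    counters = struct.unpack_from("<4I", raw, 0)  # four u32 at offsets 0..15
    balances = [int.from_bytes(raw[off:off + 16], "little")
                for off in range(16, 80, 16)]     # four u128 at offsets 16..79
    return AccountInfo(*counters, *balances)


def format_g1(centimes: int) -> str:
    whole, cents = divmod(centimes, 100)  # 1 Ğ1 = 100 units
    return f"{whole}.{cents:02d} Ğ1"


if __name__ == "__main__":
    # Hand-built sample blob: nonce=3, providers=1, free=100 centimes.
    sample = struct.pack("<4I", 3, 0, 1, 0) + (100).to_bytes(16, "little") + bytes(48)
    info = decode_account_info(sample)
    print(info.free, format_g1(info.free))
```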
u128 arithmetic without bcmath/gmp: cry01_le_bytes_to_decimal_string()
implements little-endian byte → base-10 string conversion using only
string-based big-integer add/multiply (cry01_decimal_string_add(),
cry01_decimal_string_multiply()). No PHP extensions required.
Verified result: account with 1 Ğ1 → free raw value 100 → formatted
as 1.00 Ğ1.
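The string-based conversion idea can be illustrated in Python, even though Python has native big integers and would not need it. This sketch is not a transcription of the PHP helpers: it folds the add and multiply steps into one hypothetical function, `mul_add`, and processes the bytes from most significant to least significant.

```python
def mul_add(digits: str, factor: int, addend: int) -> str:
    """Return digits * factor + addend, with digits as a base-10 string."""
    carry = addend  # seeding the carry adds `addend` at the units position
    out = []
    for ch in reversed(digits):
        carry, d = divmod(int(ch) * factor + carry, 10)
        out.append(str(d))
    while carry:
        carry, d = divmod(carry, 10)
        out.append(str(d))
    return "".join(reversed(out)).lstrip("0") or "0"


def le_bytes_to_decimal_string(raw: bytes) -> str:
    """Convert little-endian bytes to a decimal string, one byte at a time."""
    acc = "0"
    for byte in reversed(raw):  # most significant byte first
        acc = mul_add(acc, 256, byte)
    return acc


if __name__ == "__main__":
    free = (100).to_bytes(16, "little")
    print(le_bytes_to_decimal_string(free))
```

Every intermediate value stays below 256 × 9 plus a small carry, which is why the same approach works in a language limited to machine-sized integers.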
4. Node Architecture: Light vs. Full
4.1 Light mirror node (duniter-mirror.service, pre-existing)
- --state-pruning 256 (default-ish), no explicit --sync flag
- Disk usage: ~2GB at block ~1.39M
- Can serve state_getStorage for CURRENT state (verified; this works fine for balance lookups)
- Cannot serve state for blocks older than the pruning window (~256 blocks, roughly 25 minutes of history at 6s block time)
- RPC originally bound to 127.0.0.1:9944 and [::1]:9944 only (loopback), so not reachable from the Hubzilla node over Wireguard until fixed (see §5)
4.2 Full-state node (duniter-full.service, new tonight)
- --sync fast --state-pruning 256
- "fast" sync: downloads blocks without executing them, downloads latest state with proofs; much faster than full sync (full block execution from genesis)
- Disk usage: under 5GB after sync to chain head (~1.39M blocks), significantly smaller than initially estimated; the 32GB volume resize done tonight was generously oversized
- Sync time from genesis to chain head: roughly 10-15 minutes at ~1500-2500 blocks/sec, ~600-900 KiB/s
- Same current-state query capability as the light node. For the balance lookup use case, this node was not strictly necessary; the Twox128 fix alone would have made the light node work too (confirmed by testing the corrected storage key against both nodes: identical correct result)
4.3 What NEITHER node provides: full transaction history
Both nodes above use --state-pruning 256 — only recent state is retrievable.
Neither supports querying historical balances at arbitrary past blocks, nor
provides transaction history. For the planned future feature (paste an
address, see full transaction history), this requires either:
- --state-pruning archive (keep state for every historical block; significantly larger disk footprint, not yet measured)
- A separate indexer (e.g. Subsquid/Squid, mentioned in Duniter's own docs for "public RPC" setups) that processes blocks and stores an indexed transaction database. This is likely the more practical path for a transaction-history UI, since raw archive-node state queries don't give you "all transactions for address X" without scanning every block
This is future work, scoped separately.
4.4 Smith / validator node — explicitly out of scope here
A Smith (validator) node requires session keys, rotateKeys, and on-chain
Smith certification within the Ğ1 web of trust. This is a substantially
larger, separate project (new Proxmox container, 786GB available on
/var/lib/vz on proxmox1) and was not undertaken tonight. The
duniter-full instance described in §4.2 is a plain full node, not a
validator.
5. systemd Configuration Changes
5.1 duniter-mirror.service — RPC bind fix
Problem: RPC server only listened on 127.0.0.1:9944 / [::1]:9944 —
the Hubzilla node (on the Wireguard network, 10.0.0.x) could not reach it
(Connection refused).
Fix: drop-in override at
/etc/systemd/system/duniter-mirror.service.d/override.conf:
[Service]
ExecStart=
ExecStart=/usr/bin/duniter --chain ${DUNITER_CHAIN_NAME} --name ${DUNITER_NODE_NAME}_mirror --listen-addr ${DUNITER_LISTEN_ADDR} --state-pruning ${DUNITER_PRUNING_PROFILE} --base-path ${BASE_PATH} --experimental-rpc-endpoint "listen-addr=127.0.0.1:9944,methods=safe" --experimental-rpc-endpoint "listen-addr=10.0.0.105:9944,methods=safe"
Important gotchas encountered:
- --experimental-rpc-endpoint and the legacy --rpc-cors flag are mutually exclusive; using both is a hard error (the argument '--rpc-cors <ORIGINS>' cannot be used with '--experimental-rpc-endpoint <EXPERIMENTAL_RPC_ENDPOINT>...')
- The value of --experimental-rpc-endpoint is itself a comma-separated list of key=value pairs, so passing a comma-separated list of CORS origins as cors=http://a,http://b,... breaks the parser (each origin gets misinterpreted as a separate key=value attempt). We omitted cors= entirely; it is not needed for server-to-server JSON-RPC (no browser involved).
- --experimental-rpc-endpoint replaces the legacy RPC config wholesale, including the default localhost binding. The Oracle (ORACLE_RPC_URL=ws://127.0.0.1:9944) depends on a localhost endpoint, so two --experimental-rpc-endpoint flags are needed: one for 127.0.0.1 (Oracle) and one for 10.0.0.105 (Wireguard/Hubzilla access).
- methods=safe restricts to read-only RPC methods, which is appropriate for both endpoints here, since neither the Oracle nor cry01 needs to submit transactions through these nodes.
Result confirmed:
Running JSON-RPC server: addr=127.0.0.1:9944,10.0.0.105:9944
5.2 duniter-full.service — new unit
New standalone systemd unit at /etc/systemd/system/duniter-full.service:
[Unit]
Description=Duniter full-state node.
After=network.target
[Service]
Type=simple
User=duniter
Group=duniter
ExecStart=/usr/bin/duniter --chain g1 --name CivicInfrastructure-G1-Full_full --listen-addr /ip4/0.0.0.0/tcp/30334/ws --sync fast --state-pruning 256 --base-path /home/duniter/.local/share/duniter-full --experimental-rpc-endpoint "listen-addr=127.0.0.1:9945,methods=safe" --experimental-rpc-endpoint "listen-addr=10.0.0.105:9945,methods=safe"
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
Gotcha: an IPv6 --listen-addr /ip6/[::]/tcp/30334/ws was attempted
first and failed with multiaddr parsing error: invalid IPv6 address syntax
— the shell's bracket-glob handling of [::] mangled the argument before it
reached the binary, even when quoted in the heredoc (the unit file itself
stores it correctly, but constructing/testing such strings interactively via
shell is error-prone). Omitted IPv6 listen-addr entirely — the existing
duniter-mirror unit does the same (IPv4-only /ip4/0.0.0.0/tcp/30333/ws),
so this is consistent with existing practice, not a regression.
Data directory: /home/duniter/.local/share/duniter-full, owned by
duniter:duniter, created fresh (separate from the mirror's
/home/duniter/.local/share/duniter).
Disk: orchestrator's root filesystem (/dev/loop4) was resized from
~8GB to 32GB ahead of this to provide headroom. Actual usage after full sync:
under 5GB — the resize was generous relative to actual need, but a 32GB
volume with ~27GB free leaves comfortable room for future growth (state trie
grows over time as the chain progresses and more accounts/identities are
created).
6. cry01 Configuration
hubzilla/addon/cry01/config.json (host-only, not in repo):
"g1_rpc_endpoint": "http://10.0.0.105:9945"
Currently points at the new full node (port 9945). Per §4.2, the light node (port 9944) would also work for balance lookups now that the Twox128 fix is in place — both were verified to return identical correct results for the test account. The choice of which to point at is not load-bearing for correctness; it is an operational/redundancy decision left open for now.
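For a quick manual check of either endpoint, a balance query is a single JSON-RPC POST. The sketch below uses only the Python standard library; `build_get_storage_request` and `get_storage` are hypothetical helper names, and the endpoint URL is passed in by the caller rather than assumed.

```python
import json
import urllib.request


def build_get_storage_request(storage_key_hex: str, request_id: int = 1) -> dict:
    """JSON-RPC 2.0 payload for state_getStorage at the current best block."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "state_getStorage",
        "params": [storage_key_hex],
    }


def get_storage(endpoint: str, storage_key_hex: str, timeout: float = 10.0):
    """POST the request; return the hex-encoded value, or None if the key is unset."""
    body = json.dumps(build_get_storage_request(storage_key_hex)).encode("utf-8")
    req = urllib.request.Request(
        endpoint, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        reply = json.load(resp)
    if "error" in reply:
        raise RuntimeError(f"RPC error: {reply['error']}")
    return reply.get("result")


if __name__ == "__main__":
    print(json.dumps(build_get_storage_request("0x26aa394eea5630e07c48ae0c9558cef7"), indent=2))
```

Calling `get_storage` with the value of g1_rpc_endpoint and a full System.Account key should return the SCALE-encoded AccountInfo described in §3, assuming the node is reachable from where the script runs.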
7. Tools Used for Diagnosis
- scalecodec (Python, pip install scalecodec): decodes state_getMetadata output to enumerate pallets/storage items and confirm hasher types. Installed in the orchestrator's existing venv at /srv/civic-orchestrator/venv.
- xxhash (Python, pip install xxhash): used to independently compute and cross-check Twox128/xxh64 values against the PHP implementation.
- Both are isolated to the orchestrator's Python venv and are not installed on the Hubzilla node.
8. Summary of Verified Facts (quick reference)
| Claim | Status |
|---|---|
| Twox128 ≠ xxh128; Twox128 = reverse(xxh64(d,0)) + reverse(xxh64(d,1)) | ✅ Verified against live chain |
| Blake2_128Concat = Blake2b-128(key) + key, Blake2b-128 is parameterized (not truncated) | ✅ Verified against RFC 7693 vectors |
| Ğ1 addresses: 36-byte SS58, 2-byte prefix (0x5891), Blake2b-512/SS58PRE checksum | ✅ Verified, checksum matched |
| AccountInfo.free at offset 16, 16 bytes LE, divide by 100 for Ğ1 | ✅ Verified: 1 Ğ1 account → correct result |
| Light node (header-sync) can serve current-state state_getStorage | ✅ Verified: works identically to full node for current balances |
| Light/full node disk usage at block ~1.39M | Light: ~2GB. Full (fast sync): <5GB |
| Full sync (fast mode) time from genesis | ~10-15 minutes |
| Neither node supports historical/archive queries or tx history | By design (--state-pruning 256); archive node or indexer needed for that |