From 5b03f95e9e166e6a47d236c3a58b09055178903f Mon Sep 17 00:00:00 2001 From: TheRON Date: Fri, 12 Jun 2026 14:46:49 -0400 Subject: [PATCH] Initial push --- docs/DUNITER-RPC-FINDINGS.md | 357 +++++++++++++++++++++++++++++++++++ 1 file changed, 357 insertions(+) create mode 100644 docs/DUNITER-RPC-FINDINGS.md diff --git a/docs/DUNITER-RPC-FINDINGS.md b/docs/DUNITER-RPC-FINDINGS.md new file mode 100644 index 0000000..7b46ef5 --- /dev/null +++ b/docs/DUNITER-RPC-FINDINGS.md @@ -0,0 +1,357 @@ +# Duniter Node Architecture & Substrate Storage Key Derivation + +**Status:** Verified working — 2026-06-12 +**Context:** cry01 Value Layer, Ğ1 balance lookup feature + +This document records findings from implementing and debugging the Ğ1 balance +lookup feature in `cry01`. These were established through direct +experimentation against the live Ğ1 mainnet via the orchestrator's Duniter +nodes, and are not assumptions — every claim below was verified against +real RPC responses. + +--- + +## 1. Substrate Storage Key Derivation + +To read any value from a Substrate chain's state (e.g. an account's balance), +you construct a storage key and call `state_getStorage`. For a `StorageMap` +like `System.Account`, the key is: + +``` +storage_key = Twox128(PalletName) . Twox128(StorageItemName) . Hasher(map_key) +``` + +For `System.Account(account_id)`: + +``` +storage_key = Twox128("System") . Twox128("Account") . Blake2_128Concat(account_id) + = Twox128("System") . Twox128("Account") . Blake2b_128(account_id) . account_id +``` + +### 1.1 Twox128 — THE CRITICAL GOTCHA + +**Substrate's "Twox128" is NOT the same algorithm as the generic "xxHash128" +(xxh128) that PHP's `hash()` function natively supports.** They produce +different 16-byte outputs for the same input, despite the similar name and +identical output size. This distinction cost most of a debugging session and +must not be re-litigated. + +**Correct Twox128 construction:** + +``` +Twox128(data) = reverse(xxh64(data, seed=0)) . 
reverse(xxh64(data, seed=1)) +``` + +That is: two separate 64-bit xxHash digests (seeds 0 and 1), each +**byte-reversed**, then concatenated to form 16 bytes. + +**PHP implementation** (verified correct, PHP 8.1+): + +```php +function cry01_twox128($data) { + $h0 = strrev(hash('xxh64', $data, true, ['seed' => 0])); + $h1 = strrev(hash('xxh64', $data, true, ['seed' => 1])); + return $h0 . $h1; +} +``` + +**Verification:** `Twox128("System") = 26aa394eea5630e07c48ae0c9558cef7` and +`Twox128("Account") = b99d880ec681799c0cf30e8886371da9` — these match the +canonical `System::Account` storage prefix published throughout Substrate/ +Polkadot documentation. This is strong independent confirmation: any +Substrate-based chain explorer or tool will recognize this prefix. + +**What does NOT work:** `hash('xxh128', $data, true)`. This is a different, +single-pass 128-bit xxHash variant. It passes the generic xxh128 test vectors +(e.g. `hash('xxh128', 'php.watch')` = `16c27099bd855aff3b3efe27980515ad`, +which IS correct for plain xxh128) — but plain xxh128 is simply the wrong +algorithm for Substrate storage prefixes. A test vector passing for "xxh128 +in general" tells you nothing about whether it's the right primitive for +"Substrate's Twox128" — these are unrelated facts that happen to share a +name fragment. + +### 1.2 Blake2_128Concat — confirmed correct + +`Blake2_128Concat(key) = Blake2b_128(key) . key` — i.e. the Blake2b-128 hash +of the key, followed by the raw key bytes appended (not replaced). + +Blake2b-128 is **RFC 7693 parameterized** output (the output length is part +of the hash's parameter block, NOT a truncation of Blake2b-512). PHP's +`hash()` function does **not** support `blake2b` as an algorithm on this +PHP 8.2.31 build at all (`hash_algos()` does not list `blake2b` or +`blake2b512`). We vendor `deemru/Blake2b` (pure PHP, MIT license, +`hubzilla/addon/cry01/vendor/Blake2b.php`) for this. 
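The parameterized-versus-truncated distinction can be checked independently of the vendored PHP library. The following is a minimal Python sketch, assuming only the standard library's `hashlib.blake2b` (the same tool used for the cross-check). The helper names `blake2b_128` and `blake2_128_concat` are illustrative and do not exist in `cry01`.

```python
import hashlib


def blake2b_128(data: bytes) -> bytes:
    """Blake2b with a 16-byte digest length set in the parameter block."""
    return hashlib.blake2b(data, digest_size=16).digest()


def blake2_128_concat(key: bytes) -> bytes:
    """Substrate's Blake2_128Concat hasher: 16-byte hash, then the raw key."""
    return blake2b_128(key) + key


if __name__ == "__main__":
    for sample in (b"", b"abc"):
        parameterized = blake2b_128(sample)
        # Blake2b-512 cut down to 16 bytes: a different value, not Blake2b-128.
        truncated = hashlib.blake2b(sample).digest()[:16]
        print(sample, parameterized.hex(), "differs from truncation:",
              parameterized != truncated)
```

For a 32-byte account ID, `blake2_128_concat` returns 48 bytes: the 16-byte hash followed by the unmodified key, which is the tail of the `System.Account` storage key.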
+ +Verified test vectors (RFC 7693, cross-checked via Python `hashlib.blake2b`): +``` +Blake2b-128("") = cae66941d9efbd404e4d88758ea67670 +Blake2b-128("abc") = cf4ab791c62b8d2b2109c90275287816 +``` + +### 1.3 Full worked example + +For account `g1LvTpYXkKEASMiBYLp8RQmSN5kZyXtoHX8XE2FqQ9hDjqp5B`: + +``` +account_id (32 bytes) = 55f2d285cf400d2da003d43fe0ccd5207b6f08780bfdd62999e00d14dd731938 +storage_key = 0x26aa394eea5630e07c48ae0c9558cef7 + b99d880ec681799c0cf30e8886371da9 + b157780e8874e1d5aeee0f3620cf7f76 + 55f2d285cf400d2da003d43fe0ccd5207b6f08780bfdd62999e00d14dd731938 +``` + +`state_getStorage` on this key returns the SCALE-encoded `AccountInfo` +struct (see §3). + +--- + +## 2. SS58 Address Decoding (Ğ1 addresses) + +Ğ1 addresses (e.g. `g1LvTpYXkKEASMiBYLp8RQmSN5kZyXtoHX8XE2FqQ9hDjqp5B`) are +SS58-encoded. **The leading "g1" is NOT a literal prefix string** — it is +simply the first two characters of the base58 encoding, which happen to spell +"g1" coincidentally. The actual network identifier is encoded in bytes. + +**Confirmed format for Ğ1 (verified against a real address with valid +checksum):** + +- Base58-decode the full address string → **36 bytes total** +- Byte layout: `2-byte network prefix (0x5891) + 32-byte account ID + 2-byte checksum` +- Checksum = first 2 bytes of `Blake2b-512("SS58PRE" + prefix + account_id)` + +This is the 14-bit extended SS58 prefix format (prefixes ≥ 64 use 2 bytes). +Under the standard two-byte rule, Ğ1's raw prefix bytes `0x58 0x91` decode to +network ID 4450 (`0x1162`): low byte `(0x58 << 2 | 0x91 >> 6) & 0xff = 0x62`, +high byte `0x91 & 0x3f = 0x11`. Only the raw 2-byte form was needed for the +lookup, and only that form was verified against the live address. + +**Implementation:** `cry01_ss58_decode()` in `cry01_chain.php`. Generic +base58 decode is `cry01_base58_decode()` — pure PHP, byte-array accumulator, +no bcmath/gmp dependency, handles arbitrary-length input. + +**Caveat:** other Substrate chains/older Duniter v1 addresses may decode to +a different total length (e.g. 
32 bytes with no checksum at all — this was +observed for an old Cesium v1-era address during testing, and correctly +rejected by `cry01_ss58_decode()` as "unexpected decoded length"). The 36-byte +/ 2-prefix-byte format is specific to (at least) Ğ1 v2 addresses as currently +generated. + +--- + +## 3. AccountInfo Decoding + +`state_getStorage` on a `System.Account` key returns a SCALE-encoded +`AccountInfo` struct: + +``` +nonce: u32 (4 bytes) +consumers: u32 (4 bytes) +providers: u32 (4 bytes) +sufficients: u32 (4 bytes) +data.free: u128 (16 bytes) <- the spendable balance +data.reserved: u128 (16 bytes) +data.frozen: u128 (16 bytes) +data.flags: u128 (16 bytes) +``` + +All fields are little-endian, concatenated with no padding/separators +(total 80 bytes when all fields present, though trailing zero fields may be +omitted/truncated in the raw response — always check actual length). + +`free` is at byte offset 16, length 16 (u128, little-endian). Duniter v2 uses +**centimes** (1 Ğ1 = 100 units) as the smallest unit, same as Duniter v1. + +**u128 arithmetic without bcmath/gmp:** `cry01_le_bytes_to_decimal_string()` +implements little-endian byte → base-10 string conversion using only +string-based big-integer add/multiply (`cry01_decimal_string_add()`, +`cry01_decimal_string_multiply()`). No PHP extensions required. + +**Verified result:** account with 1 Ğ1 → `free` raw value `100` → formatted +as `1.00 Ğ1`. + +--- + +## 4. Node Architecture: Light vs. 
Full + +### 4.1 Light mirror node (`duniter-mirror.service`, pre-existing) + +- `--state-pruning 256` (default-ish), no explicit `--sync` flag +- Disk usage: ~2GB at block ~1.39M +- **Can serve `state_getStorage` for CURRENT state** (verified — this works + fine for balance lookups) +- Cannot serve state for blocks older than the pruning window (~256 blocks, + roughly 25 minutes of history at 6s block time) +- RPC originally bound to `127.0.0.1:9944` and `[::1]:9944` only (loopback) — + **not reachable from the Hubzilla node over Wireguard** until fixed (see §5) + +### 4.2 Full-state node (`duniter-full.service`, new tonight) + +- `--sync fast --state-pruning 256` +- "fast" sync: downloads blocks without executing them, downloads latest + state with proofs — much faster than `full` sync (full block execution + from genesis) +- Disk usage: **under 5GB** after sync to chain head (~1.39M blocks) — + significantly smaller than initially estimated; the 32GB volume resize + done tonight was generously oversized +- Sync time from genesis to chain head: **roughly 10-15 minutes** at + ~1500-2500 blocks/sec, ~600-900 KiB/s +- Same current-state query capability as the light node — **for the balance + lookup use case, this node was not strictly necessary**; the Twox128 fix + alone would have made the light node work too (confirmed by testing the + corrected storage key against both nodes — identical correct result) + +### 4.3 What NEITHER node provides: full transaction history + +Both nodes above use `--state-pruning 256` — only recent state is retrievable. +**Neither supports querying historical balances at arbitrary past blocks, nor +provides transaction history.** For the planned future feature (paste an +address, see full transaction history), this requires either: + +- `--state-pruning archive` (keep state for every historical block — + significantly larger disk footprint, not yet measured) +- A separate indexer (e.g. 
Subsquid/Squid, mentioned in Duniter's own docs + for "public RPC" setups) that processes blocks and stores an indexed + transaction database — likely the more practical path for a + transaction-history UI, since raw archive-node state queries don't give + you "all transactions for address X" without scanning every block + +This is future work, scoped separately. + +### 4.4 Smith / validator node — explicitly out of scope here + +A Smith (validator) node requires session keys, `rotateKeys`, and on-chain +Smith certification within the Ğ1 web of trust. This is a substantially +larger, separate project (new Proxmox container, 786GB available on +`/var/lib/vz` on `proxmox1`) and was **not** undertaken tonight. The +`duniter-full` instance described in §4.2 is a plain full node, not a +validator. + +--- + +## 5. systemd Configuration Changes + +### 5.1 `duniter-mirror.service` — RPC bind fix + +**Problem:** RPC server only listened on `127.0.0.1:9944` / `[::1]:9944` — +the Hubzilla node (on the Wireguard network, 10.0.0.x) could not reach it +(`Connection refused`). 
+ +**Fix:** drop-in override at +`/etc/systemd/system/duniter-mirror.service.d/override.conf`: + +```ini +[Service] +ExecStart= +ExecStart=/usr/bin/duniter --chain ${DUNITER_CHAIN_NAME} --name ${DUNITER_NODE_NAME}_mirror --listen-addr ${DUNITER_LISTEN_ADDR} --state-pruning ${DUNITER_PRUNING_PROFILE} --base-path ${BASE_PATH} --experimental-rpc-endpoint "listen-addr=127.0.0.1:9944,methods=safe" --experimental-rpc-endpoint "listen-addr=10.0.0.105:9944,methods=safe" +``` + +**Important gotchas encountered:** + +- `--experimental-rpc-endpoint` and the legacy `--rpc-cors` flag are + **mutually exclusive** — using both is a hard error + (`the argument '--rpc-cors ' cannot be used with + '--experimental-rpc-endpoint ...'`) +- The `cors=` sub-option of `--experimental-rpc-endpoint` expects + `key=value` pairs separated by commas — passing a comma-separated list of + CORS origins as `cors=http://a,http://b,...` breaks the parser (each + origin gets misinterpreted as a separate `key=value` attempt). **We + omitted `cors=` entirely** — not needed for server-to-server JSON-RPC + (no browser involved). +- `--experimental-rpc-endpoint` **replaces** the legacy RPC config wholesale + — including the default localhost binding. The Oracle + (`ORACLE_RPC_URL=ws://127.0.0.1:9944`) depends on a localhost endpoint, so + **two** `--experimental-rpc-endpoint` flags are needed: one for + `127.0.0.1` (Oracle) and one for `10.0.0.105` (Wireguard/Hubzilla access). +- `methods=safe` restricts to read-only RPC methods — appropriate for both + endpoints here, since neither the Oracle nor cry01 need to submit + transactions through these nodes. + +**Result confirmed:** +``` +Running JSON-RPC server: addr=127.0.0.1:9944,10.0.0.105:9944 +``` + +### 5.2 `duniter-full.service` — new unit + +New standalone systemd unit at `/etc/systemd/system/duniter-full.service`: + +```ini +[Unit] +Description=Duniter full-state node. 
+After=network.target + +[Service] +Type=simple +User=duniter +Group=duniter +ExecStart=/usr/bin/duniter --chain g1 --name CivicInfrastructure-G1-Full_full --listen-addr /ip4/0.0.0.0/tcp/30334/ws --sync fast --state-pruning 256 --base-path /home/duniter/.local/share/duniter-full --experimental-rpc-endpoint "listen-addr=127.0.0.1:9945,methods=safe" --experimental-rpc-endpoint "listen-addr=10.0.0.105:9945,methods=safe" +Restart=always +RestartSec=10 + +[Install] +WantedBy=multi-user.target +``` + +**Gotcha:** an IPv6 `--listen-addr /ip6/[::]/tcp/30334/ws` was attempted +first and failed with `multiaddr parsing error: invalid IPv6 address syntax` +— the shell's bracket-glob handling of `[::]` mangled the argument before it +reached the binary, even when quoted in the heredoc (the unit file itself +stores it correctly, but constructing/testing such strings interactively via +shell is error-prone). A likelier cause, not retested: multiaddr syntax writes +IPv6 addresses without brackets (`/ip6/::/tcp/30334/ws`), so the bracketed +form would be rejected by the parser regardless of shell quoting. **Omitted +IPv6 listen-addr entirely** — the existing +`duniter-mirror` unit does the same (IPv4-only `/ip4/0.0.0.0/tcp/30333/ws`), +so this is consistent with existing practice, not a regression. + +**Data directory:** `/home/duniter/.local/share/duniter-full`, owned by +`duniter:duniter`, created fresh (separate from the mirror's +`/home/duniter/.local/share/duniter`). + +**Disk:** orchestrator's root filesystem (`/dev/loop4`) was resized from +~8GB to 32GB ahead of this to provide headroom. Actual usage after full sync: +under 5GB — the resize was generous relative to actual need, but a 32GB +volume with ~27GB free leaves comfortable room for future growth (state trie +grows over time as the chain progresses and more accounts/identities are +created). + +--- + +## 6. cry01 Configuration + +`hubzilla/addon/cry01/config.json` (host-only, not in repo): + +```json +"g1_rpc_endpoint": "http://10.0.0.105:9945" +``` + +Currently points at the new full node (port 9945). 
Per §4.2, the light +node (port 9944) would also work for balance lookups now that the Twox128 +fix is in place — both were verified to return identical correct results +for the test account. The choice of which to point at is not +load-bearing for correctness; it is an operational/redundancy decision left +open for now. + +--- + +## 7. Tools Used for Diagnosis + +- **scalecodec** (Python, `pip install scalecodec`) — decodes + `state_getMetadata` output to enumerate pallets/storage items and confirm + hasher types. Installed in the orchestrator's existing venv at + `/srv/civic-orchestrator/venv`. +- **xxhash** (Python, `pip install xxhash`) — used to independently compute + and cross-check Twox128/xxh64 values against the PHP implementation. +- Both are isolated to the orchestrator's Python venv — not installed on the + Hubzilla node. + +--- + +## 8. Summary of Verified Facts (quick reference) + +| Claim | Status | +|---|---| +| Twox128 ≠ xxh128; Twox128 = reverse(xxh64(d,0)) + reverse(xxh64(d,1)) | ✅ Verified against live chain | +| Blake2_128Concat = Blake2b-128(key) + key, Blake2b-128 is parameterized (not truncated) | ✅ Verified against RFC 7693 vectors | +| Ğ1 addresses: 36-byte SS58, 2-byte prefix (0x5891), Blake2b-512/SS58PRE checksum | ✅ Verified, checksum matched | +| AccountInfo.free at offset 16, 16 bytes LE, divide by 100 for Ğ1 | ✅ Verified: 1 Ğ1 account → correct result | +| Light mirror node (`--state-pruning 256`, no explicit `--sync`) can serve current-state `state_getStorage` | ✅ Verified — works identically to full node for current balances | +| Light/full node disk usage at block ~1.39M | Light: ~2GB. Full (fast sync): <5GB | +| Full node sync time from genesis (`--sync fast`) | ~10-15 minutes | +| Neither node supports historical/archive queries or tx history | By design (`--state-pruning 256`); archive node or indexer needed for that |
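
The `AccountInfo.free` row above (layout in §3) can be sketched in a few lines of Python, where native big integers replace the string-based u128 arithmetic the PHP code needs. This is a sketch under stated assumptions, not the `cry01` implementation: the function names are hypothetical, and the sample value is built synthetically rather than captured from a real RPC response.

```python
FREE_OFFSET = 16  # after nonce, consumers, providers, sufficients (4 x u32)
FREE_LENGTH = 16  # data.free is a u128


def decode_free_balance(storage_hex: str) -> int:
    """Return AccountInfo.data.free (in centimes) from a state_getStorage hex result."""
    raw = bytes.fromhex(storage_hex.removeprefix("0x"))
    # The response length can vary, so check it before slicing.
    if len(raw) < FREE_OFFSET + FREE_LENGTH:
        raise ValueError(f"AccountInfo too short: {len(raw)} bytes")
    return int.from_bytes(raw[FREE_OFFSET:FREE_OFFSET + FREE_LENGTH], "little")


def format_g1(centimes: int) -> str:
    """Format a centime amount as Ğ1 with two decimal places."""
    return f"{centimes // 100}.{centimes % 100:02d} Ğ1"


if __name__ == "__main__":
    # Synthetic 80-byte AccountInfo: zeroed counters, free = 100, the rest zero.
    sample = "0x" + (bytes(16) + (100).to_bytes(16, "little") + bytes(48)).hex()
    print(format_g1(decode_free_balance(sample)))
```

Integer division and modulo keep the formatting exact; converting centimes to a float before dividing by 100 would risk rounding artifacts on large balances.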