Initial push

2026-06-12 14:46:49 -04:00
parent 0355ae14de
commit 5b03f95e9e

@@ -0,0 +1,357 @@
# Duniter Node Architecture & Substrate Storage Key Derivation
**Status:** Verified working — 2026-06-12
**Context:** cry01 Value Layer, Ğ1 balance lookup feature
This document records findings from implementing and debugging the Ğ1 balance
lookup feature in `cry01`. These were established through direct
experimentation against the live Ğ1 mainnet via the orchestrator's Duniter
nodes, and are not assumptions — every claim below was verified against
real RPC responses.
---
## 1. Substrate Storage Key Derivation
To read any value from a Substrate chain's state (e.g. an account's balance),
you construct a storage key and call `state_getStorage`. For a `StorageMap`
like `System.Account`, the key is:
```
storage_key = Twox128(PalletName) . Twox128(StorageItemName) . Hasher(map_key)
```
For `System.Account(account_id)`:
```
storage_key = Twox128("System") . Twox128("Account") . Blake2_128Concat(account_id)
= Twox128("System") . Twox128("Account") . Blake2b_128(account_id) . account_id
```
### 1.1 Twox128 — THE CRITICAL GOTCHA
**Substrate's "Twox128" is NOT the same algorithm as the generic "xxHash128"
(xxh128) that PHP's `hash()` function natively supports.** They produce
different 16-byte outputs for the same input, despite the similar name and
identical output size. This distinction cost most of a debugging session and
must not be re-litigated.
**Correct Twox128 construction:**
```
Twox128(data) = reverse(xxh64(data, seed=0)) . reverse(xxh64(data, seed=1))
```
That is: two separate 64-bit xxHash digests (seeds 0 and 1), each
**byte-reversed**, then concatenated to form 16 bytes.
**PHP implementation** (verified correct, PHP 8.1+):
```php
function cry01_twox128($data) {
$h0 = strrev(hash('xxh64', $data, true, ['seed' => 0]));
$h1 = strrev(hash('xxh64', $data, true, ['seed' => 1]));
return $h0 . $h1;
}
```
**Verification:** `Twox128("System") = 26aa394eea5630e07c48ae0c9558cef7` and
`Twox128("Account") = b99d880ec681799c0cf30e8886371da9` — these match the
canonical `System::Account` storage prefix published throughout Substrate/
Polkadot documentation. This is strong independent confirmation: any
Substrate-based chain explorer or tool will recognize this prefix.
**What does NOT work:** `hash('xxh128', $data, true)`. This is a different,
single-pass 128-bit xxHash variant. It passes the generic xxh128 test vectors
(e.g. `hash('xxh128', 'php.watch')` = `16c27099bd855aff3b3efe27980515ad`,
which IS correct for plain xxh128) — but plain xxh128 is simply the wrong
algorithm for Substrate storage prefixes. A test vector passing for "xxh128
in general" tells you nothing about whether it's the right primitive for
"Substrate's Twox128" — these are unrelated facts that happen to share a
name fragment.
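The construction above can be cross-checked outside PHP. The following is a minimal standard-library Python sketch, not the code used during the original debugging: §7 mentions the third-party `xxhash` package for that purpose, whereas this sketch re-implements XXH64 from its public specification so it runs without extra installs. The function names `xxh64` and `twox128` are illustrative.

```python
# Pure-Python XXH64 plus the Twox128 construction described above.
MASK = (1 << 64) - 1
P1, P2, P3, P4, P5 = (
    0x9E3779B185EBCA87, 0xC2B2AE3D27D4EB4F, 0x165667B19E3779F9,
    0x85EBCA77C2B2AE63, 0x27D4EB2F165667C5,
)

def _rotl(x: int, r: int) -> int:
    return ((x << r) | (x >> (64 - r))) & MASK

def _round(acc: int, lane: int) -> int:
    acc = (acc + lane * P2) & MASK
    return (_rotl(acc, 31) * P1) & MASK

def xxh64(data: bytes, seed: int = 0) -> int:
    n, i = len(data), 0
    if n >= 32:
        # Four accumulators consume the input in 32-byte stripes.
        v = [(seed + P1 + P2) & MASK, (seed + P2) & MASK,
             seed & MASK, (seed - P1) & MASK]
        while i + 32 <= n:
            for j in range(4):
                lane = int.from_bytes(data[i + 8 * j:i + 8 * j + 8], "little")
                v[j] = _round(v[j], lane)
            i += 32
        h = (_rotl(v[0], 1) + _rotl(v[1], 7)
             + _rotl(v[2], 12) + _rotl(v[3], 18)) & MASK
        for acc in v:
            h = ((h ^ _round(0, acc)) * P1 + P4) & MASK
    else:
        h = (seed + P5) & MASK
    h = (h + n) & MASK
    while i + 8 <= n:                      # remaining 8-byte words
        h ^= _round(0, int.from_bytes(data[i:i + 8], "little"))
        h = (_rotl(h, 27) * P1 + P4) & MASK
        i += 8
    if i + 4 <= n:                         # one remaining 4-byte word
        h ^= (int.from_bytes(data[i:i + 4], "little") * P1) & MASK
        h = (_rotl(h, 23) * P2 + P3) & MASK
        i += 4
    while i < n:                           # trailing bytes
        h ^= (data[i] * P5) & MASK
        h = (_rotl(h, 11) * P1) & MASK
        i += 1
    h ^= h >> 33                           # final avalanche
    h = (h * P2) & MASK
    h ^= h >> 29
    h = (h * P3) & MASK
    h ^= h >> 32
    return h

def twox128(data: bytes) -> bytes:
    # Little-endian bytes of each 64-bit digest; this is the same as
    # byte-reversing the big-endian digest that PHP's hash() returns.
    return xxh64(data, 0).to_bytes(8, "little") + xxh64(data, 1).to_bytes(8, "little")

print(twox128(b"System").hex())
print(twox128(b"Account").hex())
```

If the sketch is faithful to XXH64, the two printed values should equal the `System` and `Account` prefixes quoted in the verification paragraph above.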
### 1.2 Blake2_128Concat — confirmed correct
`Blake2_128Concat(key) = Blake2b_128(key) . key` — i.e. the Blake2b-128 hash
of the key, followed by the raw key bytes appended (not replaced).
Blake2b-128 is **RFC 7693 parameterized** output (the output length is part
of the hash's parameter block, NOT a truncation of Blake2b-512). PHP's
`hash()` function does **not** support `blake2b` as an algorithm on this
PHP 8.2.31 build at all (`hash_algos()` does not list `blake2b` or
`blake2b512`). We vendor `deemru/Blake2b` (pure PHP, MIT license,
`hubzilla/addon/cry01/vendor/Blake2b.php`) for this.
Verified test vectors (Blake2b-128 as specified in RFC 7693, cross-checked via Python `hashlib.blake2b`):
```
Blake2b-128("") = cae66941d9efbd404e4d88758ea67670
Blake2b-128("abc") = cf4ab791c62b8d2b2109c90275287816
```
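The Python cross-check mentioned above can be sketched as follows. `hashlib.blake2b` takes the output length as `digest_size`, which is the parameterized form described in this section; the helper names `blake2b_128` and `blake2_128_concat` are illustrative and are not the vendored PHP API.

```python
import hashlib

def blake2b_128(data: bytes) -> bytes:
    # digest_size=16 goes into the BLAKE2b parameter block, so this is a
    # genuine 128-bit digest rather than a truncated 512-bit one.
    return hashlib.blake2b(data, digest_size=16).digest()

def blake2_128_concat(key: bytes) -> bytes:
    # Substrate's Blake2_128Concat hasher: hash first, raw key appended.
    return blake2b_128(key) + key

# Truncating the default 64-byte digest gives a different value.
truncated = hashlib.blake2b(b"abc").digest()[:16]

print(blake2b_128(b"").hex())
print(blake2b_128(b"abc").hex())
print(blake2b_128(b"abc") != truncated)
```

Because the raw key is appended, the original `account_id` can always be read back from the tail of a storage key. That is the practical difference between `Blake2_128Concat` and a plain hash.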
### 1.3 Full worked example
For account `g1LvTpYXkKEASMiBYLp8RQmSN5kZyXtoHX8XE2FqQ9hDjqp5B`:
```
account_id (32 bytes) = 55f2d285cf400d2da003d43fe0ccd5207b6f08780bfdd62999e00d14dd731938
storage_key = 0x26aa394eea5630e07c48ae0c9558cef7
b99d880ec681799c0cf30e8886371da9
b157780e8874e1d5aeee0f3620cf7f76
55f2d285cf400d2da003d43fe0ccd5207b6f08780bfdd62999e00d14dd731938
```
`state_getStorage` on this key returns the SCALE-encoded `AccountInfo`
struct (see §3).
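As a rough illustration of how the pieces combine, the sketch below assembles a `System.Account` storage key and the matching JSON-RPC request body in Python. The two Twox128 prefixes are copied from §1.1 rather than recomputed, and the function names `system_account_key` and `get_storage_request` are hypothetical; the real lookup lives in the PHP addon.

```python
import hashlib
import json

# Twox128("System") and Twox128("Account"), quoted from section 1.1.
SYSTEM_PREFIX = bytes.fromhex("26aa394eea5630e07c48ae0c9558cef7")
ACCOUNT_PREFIX = bytes.fromhex("b99d880ec681799c0cf30e8886371da9")

def system_account_key(account_id: bytes) -> str:
    """Build the 0x-prefixed storage key for System.Account(account_id)."""
    if len(account_id) != 32:
        raise ValueError("account_id must be exactly 32 bytes")
    hashed = hashlib.blake2b(account_id, digest_size=16).digest()
    return "0x" + (SYSTEM_PREFIX + ACCOUNT_PREFIX + hashed + account_id).hex()

def get_storage_request(storage_key: str, request_id: int = 1) -> str:
    """Serialize a state_getStorage JSON-RPC 2.0 call for the given key."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "state_getStorage",
        "params": [storage_key],
    })

account_id = bytes.fromhex(
    "55f2d285cf400d2da003d43fe0ccd5207b6f08780bfdd62999e00d14dd731938"
)
key = system_account_key(account_id)
print(key)
print(get_storage_request(key))
```

The key is always 80 bytes (160 hex characters plus `0x`): 16 + 16 bytes of pallet and item prefix, 16 bytes of Blake2b-128, and the 32-byte account ID.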
---
## 2. SS58 Address Decoding (Ğ1 addresses)
Ğ1 addresses (e.g. `g1LvTpYXkKEASMiBYLp8RQmSN5kZyXtoHX8XE2FqQ9hDjqp5B`) are
SS58-encoded. **The leading "g1" is NOT a literal prefix string**: it is
simply the first two characters of the base58 encoding, which come out as
"g1" because of the leading network-prefix bytes. The actual network
identifier is encoded in those bytes.
**Confirmed format for Ğ1 (verified against a real address with valid
checksum):**
- Base58-decode the full address string → **36 bytes total**
- Byte layout: `2-byte network prefix (0x5891) + 32-byte account ID + 2-byte checksum`
- Checksum = first 2 bytes of `Blake2b-512("SS58PRE" + prefix + account_id)`
This is the 14-bit extended SS58 prefix format (prefixes ≥ 64 use 2 bytes).
Under the standard SS58 two-byte rule, Ğ1's raw prefix `0x5891` corresponds
to network ID 4450 (`0x1162`); only the raw 2-byte form was needed and
verified here.
**Implementation:** `cry01_ss58_decode()` in `cry01_chain.php`. Generic
base58 decode is `cry01_base58_decode()` — pure PHP, byte-array accumulator,
no bcmath/gmp dependency, handles arbitrary-length input.
**Caveat:** other Substrate chains/older Duniter v1 addresses may decode to
a different total length (e.g. 32 bytes with no checksum at all — this was
observed for an old Cesium v1-era address during testing, and correctly
rejected by `cry01_ss58_decode()` as "unexpected decoded length"). The 36-byte
/ 2-prefix-byte format is specific to (at least) Ğ1 v2 addresses as currently
generated.
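The layout above can be sketched in Python as follows. This is not `cry01_ss58_decode()`: it leans on Python's built-in big integers instead of a byte-array accumulator, and every function name here is illustrative. `ss58_encode` is included only so the decoder can be exercised with a round trip.

```python
import hashlib

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def base58_decode(text: str) -> bytes:
    num = 0
    for ch in text:
        num = num * 58 + ALPHABET.index(ch)  # ValueError on a bad character
    body = num.to_bytes((num.bit_length() + 7) // 8, "big")
    pad = len(text) - len(text.lstrip("1"))  # each leading '1' is a zero byte
    return b"\x00" * pad + body

def base58_encode(data: bytes) -> str:
    num, out = int.from_bytes(data, "big"), ""
    while num:
        num, rem = divmod(num, 58)
        out = ALPHABET[rem] + out
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out

def ss58_checksum(payload: bytes) -> bytes:
    # First 2 bytes of Blake2b-512("SS58PRE" + prefix + account_id).
    return hashlib.blake2b(b"SS58PRE" + payload).digest()[:2]

def prefix_to_network_id(b0: int, b1: int) -> int:
    # Standard SS58 two-byte prefix rule (14-bit identifier).
    lower = ((b0 << 2) | (b1 >> 6)) & 0xFF
    upper = b1 & 0x3F
    return lower | (upper << 8)

def ss58_decode(address: str):
    """Return (network_id, account_id) for a 36-byte, 2-byte-prefix address."""
    raw = base58_decode(address)
    if len(raw) != 36:
        raise ValueError("unexpected decoded length")
    if not 64 <= raw[0] <= 127:
        raise ValueError("not a two-byte SS58 prefix")
    if ss58_checksum(raw[:34]) != raw[34:]:
        raise ValueError("checksum mismatch")
    return prefix_to_network_id(raw[0], raw[1]), raw[2:34]

def ss58_encode(prefix: bytes, account_id: bytes) -> str:
    payload = prefix + account_id
    return base58_encode(payload + ss58_checksum(payload))

# Round trip with a synthetic account ID and the Ğ1 prefix bytes.
address = ss58_encode(bytes.fromhex("5891"), bytes(range(32)))
print(ss58_decode(address))
```

Checking the length before the checksum matches the behaviour described in the caveat: an address that decodes to 32 bytes is rejected as an unexpected length, not reported as a checksum failure.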
---
## 3. AccountInfo Decoding
`state_getStorage` on a `System.Account` key returns a SCALE-encoded
`AccountInfo` struct:
```
nonce: u32 (4 bytes)
consumers: u32 (4 bytes)
providers: u32 (4 bytes)
sufficients: u32 (4 bytes)
data.free: u128 (16 bytes) <- the spendable balance
data.reserved: u128 (16 bytes)
data.frozen: u128 (16 bytes)
data.flags: u128 (16 bytes)
```
All fields are little-endian, concatenated with no padding/separators
(total 80 bytes when all fields present, though trailing zero fields may be
omitted/truncated in the raw response — always check actual length).
`free` is at byte offset 16, length 16 (u128, little-endian). Duniter v2 uses
**centimes** (1 Ğ1 = 100 units) as the smallest unit, same as Duniter v1.
**u128 arithmetic without bcmath/gmp:** `cry01_le_bytes_to_decimal_string()`
implements little-endian byte → base-10 string conversion using only
string-based big-integer add/multiply (`cry01_decimal_string_add()`,
`cry01_decimal_string_multiply()`). No PHP extensions required.
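To illustrate the idea (not the actual PHP helpers, whose internals are not reproduced here), the following Python sketch converts little-endian bytes to a decimal string using only small-integer digit arithmetic. It folds the multiply-by-256 and add steps into one pass, which is an assumption about one reasonable approach and not a description of `cry01_le_bytes_to_decimal_string()` itself.

```python
def le_bytes_to_decimal_string(raw: bytes) -> str:
    """Convert little-endian bytes to base 10 without big-integer support."""
    digits = [0]  # decimal digits, least significant first
    for byte in reversed(raw):  # walk from the most significant byte down
        carry = byte
        # digits = digits * 256 + byte, one decimal digit at a time
        for idx in range(len(digits)):
            total = digits[idx] * 256 + carry
            digits[idx] = total % 10
            carry = total // 10
        while carry:
            digits.append(carry % 10)
            carry //= 10
    return "".join(str(d) for d in reversed(digits))

# A u128 holding 100, i.e. 1 Ğ1 expressed in centimes.
print(le_bytes_to_decimal_string((100).to_bytes(16, "little")))
```

Python has arbitrary-precision integers, so `int.from_bytes(raw, "little")` would normally be enough; the digit-by-digit version is shown only because it mirrors what a PHP build without bcmath or gmp has to do.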
**Verified result:** account with 1 Ğ1 → `free` raw value `100` → formatted
as `1.00 Ğ1`.
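A minimal Python sketch of the decoding described in this section follows. Zero-padding a short response to 80 bytes is an assumption based on the note above about omitted trailing fields; the function names `decode_account_info` and `format_g1` are illustrative.

```python
def decode_account_info(hex_value: str) -> dict:
    """Decode the hex string returned by state_getStorage for System.Account."""
    if hex_value.startswith("0x"):
        hex_value = hex_value[2:]
    raw = bytes.fromhex(hex_value)
    if len(raw) < 16:
        raise ValueError("response too short to hold the four u32 counters")
    # Assumption: missing trailing bytes are treated as zero-valued fields.
    raw = raw.ljust(80, b"\x00")

    def u32(offset: int) -> int:
        return int.from_bytes(raw[offset:offset + 4], "little")

    def u128(offset: int) -> int:
        return int.from_bytes(raw[offset:offset + 16], "little")

    return {
        "nonce": u32(0),
        "consumers": u32(4),
        "providers": u32(8),
        "sufficients": u32(12),
        "free": u128(16),       # spendable balance, in centimes
        "reserved": u128(32),
        "frozen": u128(48),
        "flags": u128(64),
    }

def format_g1(centimes: int) -> str:
    # 1 Ğ1 = 100 centimes; integer math avoids float rounding.
    return f"{centimes // 100}.{centimes % 100:02d} Ğ1"

# Synthetic response: nonce=3, providers=1, free=100 centimes.
sample = "0x" + (
    (3).to_bytes(4, "little") + (0).to_bytes(4, "little")
    + (1).to_bytes(4, "little") + (0).to_bytes(4, "little")
    + (100).to_bytes(16, "little") + bytes(48)
).hex()
info = decode_account_info(sample)
print(info["free"], format_g1(info["free"]))
```

A `null` result from `state_getStorage` (no entry under the key) has to be handled by the caller before this function is reached.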
---
## 4. Node Architecture: Light vs. Full
### 4.1 Light mirror node (`duniter-mirror.service`, pre-existing)
- `--state-pruning 256` (default-ish), no explicit `--sync` flag
- Disk usage: ~2GB at block ~1.39M
- **Can serve `state_getStorage` for CURRENT state** (verified — this works
fine for balance lookups)
- Cannot serve state for blocks older than the pruning window (~256 blocks,
roughly 25 minutes of history at 6s block time)
- RPC originally bound to `127.0.0.1:9944` and `[::1]:9944` only (loopback) —
**not reachable from the Hubzilla node over Wireguard** until fixed (see §5)
### 4.2 Full-state node (`duniter-full.service`, new tonight)
- `--sync fast --state-pruning 256`
- "fast" sync: downloads blocks without executing them, downloads latest
state with proofs — much faster than `full` sync (full block execution
from genesis)
- Disk usage: **under 5GB** after sync to chain head (~1.39M blocks) —
significantly smaller than initially estimated; the 32GB volume resize
done tonight was generously oversized
- Sync time from genesis to chain head: **roughly 10-15 minutes** at
~1500-2500 blocks/sec, ~600-900 KiB/s
- Same current-state query capability as the light node — **for the balance
lookup use case, this node was not strictly necessary**; the Twox128 fix
alone would have made the light node work too (confirmed by testing the
corrected storage key against both nodes — identical correct result)
### 4.3 What NEITHER node provides: full transaction history
Both nodes above use `--state-pruning 256` — only recent state is retrievable.
**Neither supports querying historical balances at arbitrary past blocks, nor
provides transaction history.** For the planned future feature (paste an
address, see full transaction history), this requires either:
- `--state-pruning archive` (keep state for every historical block —
significantly larger disk footprint, not yet measured)
- A separate indexer (e.g. Subsquid/Squid, mentioned in Duniter's own docs
for "public RPC" setups) that processes blocks and stores an indexed
transaction database — likely the more practical path for a
transaction-history UI, since raw archive-node state queries don't give
you "all transactions for address X" without scanning every block
This is future work, scoped separately.
### 4.4 Smith / validator node — explicitly out of scope here
A Smith (validator) node requires session keys, `rotateKeys`, and on-chain
Smith certification within the Ğ1 web of trust. This is a substantially
larger, separate project (new Proxmox container, 786GB available on
`/var/lib/vz` on `proxmox1`) and was **not** undertaken tonight. The
`duniter-full` instance described in §4.2 is a plain full node, not a
validator.
---
## 5. systemd Configuration Changes
### 5.1 `duniter-mirror.service` — RPC bind fix
**Problem:** the RPC server only listened on `127.0.0.1:9944` / `[::1]:9944`,
so the Hubzilla node (on the Wireguard network, 10.0.0.x) could not reach it
(`Connection refused`).
**Fix:** drop-in override at
`/etc/systemd/system/duniter-mirror.service.d/override.conf`:
```ini
[Service]
ExecStart=
ExecStart=/usr/bin/duniter --chain ${DUNITER_CHAIN_NAME} --name ${DUNITER_NODE_NAME}_mirror --listen-addr ${DUNITER_LISTEN_ADDR} --state-pruning ${DUNITER_PRUNING_PROFILE} --base-path ${BASE_PATH} --experimental-rpc-endpoint "listen-addr=127.0.0.1:9944,methods=safe" --experimental-rpc-endpoint "listen-addr=10.0.0.105:9944,methods=safe"
```
**Important gotchas encountered:**
- `--experimental-rpc-endpoint` and the legacy `--rpc-cors` flag are
**mutually exclusive** — using both is a hard error
(`the argument '--rpc-cors <ORIGINS>' cannot be used with
'--experimental-rpc-endpoint <EXPERIMENTAL_RPC_ENDPOINT>...'`)
- The `cors=` sub-option of `--experimental-rpc-endpoint` expects
`key=value` pairs separated by commas — passing a comma-separated list of
CORS origins as `cors=http://a,http://b,...` breaks the parser (each
origin gets misinterpreted as a separate `key=value` attempt). **We
omitted `cors=` entirely** — not needed for server-to-server JSON-RPC
(no browser involved).
- `--experimental-rpc-endpoint` **replaces** the legacy RPC config wholesale
— including the default localhost binding. The Oracle
(`ORACLE_RPC_URL=ws://127.0.0.1:9944`) depends on a localhost endpoint, so
**two** `--experimental-rpc-endpoint` flags are needed: one for
`127.0.0.1` (Oracle) and one for `10.0.0.105` (Wireguard/Hubzilla access).
- `methods=safe` restricts to read-only RPC methods — appropriate for both
endpoints here, since neither the Oracle nor cry01 need to submit
transactions through these nodes.
**Result confirmed:**
```
Running JSON-RPC server: addr=127.0.0.1:9944,10.0.0.105:9944
```
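One way to confirm the bind from the Hubzilla side is a plain TCP probe. The sketch below is a hypothetical diagnostic, not something used in the original session. It only checks that the port accepts connections, which is enough to tell the earlier `Connection refused` state apart from a working bind.

```python
import socket

def rpc_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False

# Example (addresses taken from the override above):
#   rpc_port_open("10.0.0.105", 9944)  # run from the Hubzilla node
#   rpc_port_open("127.0.0.1", 9944)   # run on the orchestrator itself
```

A successful probe says nothing about `methods=safe` or the RPC itself; a real `state_getStorage` call is still the end-to-end check.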
### 5.2 `duniter-full.service` — new unit
New standalone systemd unit at `/etc/systemd/system/duniter-full.service`:
```ini
[Unit]
Description=Duniter full-state node.
After=network.target
[Service]
Type=simple
User=duniter
Group=duniter
ExecStart=/usr/bin/duniter --chain g1 --name CivicInfrastructure-G1-Full_full --listen-addr /ip4/0.0.0.0/tcp/30334/ws --sync fast --state-pruning 256 --base-path /home/duniter/.local/share/duniter-full --experimental-rpc-endpoint "listen-addr=127.0.0.1:9945,methods=safe" --experimental-rpc-endpoint "listen-addr=10.0.0.105:9945,methods=safe"
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
```
**Gotcha:** an IPv6 `--listen-addr /ip6/[::]/tcp/30334/ws` was attempted
first and failed with `multiaddr parsing error: invalid IPv6 address syntax`.
At the time this was attributed to the shell's bracket-glob handling of `[::]`
mangling the argument before it reached the binary, even when quoted in the
heredoc (the unit file itself stores it correctly, but constructing/testing
such strings interactively via shell is error-prone). Note, however, that
multiaddr syntax writes IPv6 addresses without brackets (e.g.
`/ip6/::/tcp/30334/ws`), so the brackets themselves are a likely cause; the
bracket-free form is not verified in this document. **Omitted IPv6
listen-addr entirely**; the existing `duniter-mirror` unit does the same
(IPv4-only `/ip4/0.0.0.0/tcp/30333/ws`), so this is consistent with existing
practice, not a regression.
**Data directory:** `/home/duniter/.local/share/duniter-full`, owned by
`duniter:duniter`, created fresh (separate from the mirror's
`/home/duniter/.local/share/duniter`).
**Disk:** orchestrator's root filesystem (`/dev/loop4`) was resized from
~8GB to 32GB ahead of this to provide headroom. Actual usage after full sync:
under 5GB — the resize was generous relative to actual need, but a 32GB
volume with ~27GB free leaves comfortable room for future growth (state trie
grows over time as the chain progresses and more accounts/identities are
created).
---
## 6. cry01 Configuration
`hubzilla/addon/cry01/config.json` (host-only, not in repo):
```json
"g1_rpc_endpoint": "http://10.0.0.105:9945"
```
Currently points at the new full node (port 9945). Per §4.2, the light
node (port 9944) would also work for balance lookups now that the Twox128
fix is in place — both were verified to return identical correct results
for the test account. The choice of which to point at is not
load-bearing for correctness; it is an operational/redundancy decision left
open for now.
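Since either node can answer current-state queries, a client could treat the two ports as interchangeable. The sketch below shows one hedged way to do that in Python, trying the full node first and falling back to the mirror. The fallback policy, the `ENDPOINTS` list and the function names are assumptions for illustration; cry01 itself reads a single `g1_rpc_endpoint` value.

```python
import json
import urllib.request

# Assumed order: full node (9945) first, light mirror (9944) as fallback.
ENDPOINTS = ["http://10.0.0.105:9945", "http://10.0.0.105:9944"]

def http_post(url: str, body: bytes, timeout: float = 5.0) -> bytes:
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()

def get_storage(storage_key: str, endpoints=ENDPOINTS, post=http_post):
    """Return the hex result of state_getStorage, or None if the key is empty."""
    body = json.dumps({
        "jsonrpc": "2.0", "id": 1,
        "method": "state_getStorage", "params": [storage_key],
    }).encode()
    last_error = None
    for url in endpoints:
        try:
            reply = json.loads(post(url, body))
        except (OSError, ValueError) as exc:  # network failure or bad JSON
            last_error = exc
            continue
        if "error" in reply:
            last_error = RuntimeError(str(reply["error"]))
            continue
        return reply.get("result")
    raise RuntimeError(f"all endpoints failed: {last_error}")
```

Passing the transport in as `post` keeps the function testable without a live node, and leaves room to swap in a WebSocket client if the HTTP endpoint is ever disabled.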
---
## 7. Tools Used for Diagnosis
- **scalecodec** (Python, `pip install scalecodec`) — decodes
`state_getMetadata` output to enumerate pallets/storage items and confirm
hasher types. Installed in the orchestrator's existing venv at
`/srv/civic-orchestrator/venv`.
- **xxhash** (Python, `pip install xxhash`) — used to independently compute
and cross-check Twox128/xxh64 values against the PHP implementation.
- Both are isolated to the orchestrator's Python venv — not installed on the
Hubzilla node.
---
## 8. Summary of Verified Facts (quick reference)
| Claim | Status |
|---|---|
| Twox128 ≠ xxh128; Twox128 = reverse(xxh64(d,0)) + reverse(xxh64(d,1)) | ✅ Verified against live chain |
| Blake2_128Concat = Blake2b-128(key) + key, Blake2b-128 is parameterized (not truncated) | ✅ Verified against RFC 7693 vectors |
| Ğ1 addresses: 36-byte SS58, 2-byte prefix (0x5891), Blake2b-512/SS58PRE checksum | ✅ Verified, checksum matched |
| AccountInfo.free at offset 16, 16 bytes LE, divide by 100 for Ğ1 | ✅ Verified: 1 Ğ1 account → correct result |
| Light mirror node (§4.1) can serve current-state `state_getStorage` | ✅ Verified; works identically to full node for current balances |
| Light/full node disk usage at block ~1.39M | Light: ~2GB. Full (fast sync): <5GB |
| Full sync (fast mode) time from genesis | ~10-15 minutes |
| Neither node supports historical/archive queries or tx history | By design (`--state-pruning 256`); archive node or indexer needed for that |