Manual Node Recovery Guide
The procedure node providers run, under direction from DFINITY, to bring a node back online during an announced subnet recovery.
Manual node recovery is the in-the-field part of a coordinated subnet recovery. A recovery coordinator at DFINITY hands you a small set of parameters, and you connect a console to each affected node and type the parameters in. The node downloads and validates the recovery artifacts, you confirm that the two hashes it calculates match the values the coordinator gave you, and the node rejoins the subnet.
This runbook only applies during an announced recovery. Read the security warning carefully before doing anything else.
Security warning
[!WARNING] Don't get tricked into compromising your nodes. Only complete a manual node recovery if all of the following are true:
- A subnet recovery has been announced on the Internet Computer Status Page.
- The DFINITY team has contacted you on the dedicated Matrix channel #ic-node-providers-incident-response:matrix.org.
- The instructions came from that channel. Under normal conditions, only DFINITY can post there.
If any of those conditions is not met, stop and ask in the Node Provider Matrix channel before proceeding.
Prerequisites
The recovery coordinator will provide four values. Two are short inputs you type at the prompt; two are full hashes you use to verify what the node downloaded.
Recovery input parameters
- VERSION: the 40-character commit ID of the recovery GuestOS update image.
- RECOVERY-HASH-PREFIX: a 6-character prefix of the recovery artifacts' hash.
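Because a single wrong character makes the recovery fail, it can help to sanity-check the two input values on your own workstation before typing them at the node. The following is a minimal sketch, not part of the node tooling: `is_hex_of_length` and `check_inputs` are hypothetical names, and the sketch assumes both values are lowercase hexadecimal (as Git commit IDs and hash prefixes usually are). If the coordinator's values use a different format, trust the coordinator, not this check.

```shell
# POSIX sh. Hypothetical workstation-side helpers; nothing here runs on the node.
# ASSUMPTION: VERSION and RECOVERY-HASH-PREFIX are lowercase hexadecimal.

# is_hex_of_length VALUE LENGTH
# Succeeds only if VALUE is exactly LENGTH characters, all in 0-9 or a-f.
is_hex_of_length() {
    value=$1
    want=$2
    # Reject anything that is not exactly the expected length.
    [ "${#value}" -eq "$want" ] || return 1
    # Reject any character outside 0-9 and a-f (listed explicitly so the
    # check does not depend on locale-specific character ranges).
    case $value in
        *[!0123456789abcdef]*) return 1 ;;
    esac
    return 0
}

# check_inputs VERSION RECOVERY_HASH_PREFIX
# Prints a confirmation, or names the first value that is malformed.
check_inputs() {
    if ! is_hex_of_length "$1" 40; then
        echo "VERSION must be 40 lowercase hex characters" >&2
        return 1
    fi
    if ! is_hex_of_length "$2" 6; then
        echo "RECOVERY-HASH-PREFIX must be 6 lowercase hex characters" >&2
        return 1
    fi
    echo "inputs look well-formed"
}
```

For example, `check_inputs "$VERSION" "$RECOVERY_HASH_PREFIX"` either prints `inputs look well-formed` or reports which value failed. It only checks the shape of each value; it cannot tell you whether a well-formed value is the right one.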
Recovery full-hashes (for verification only)
- VERSION-HASH: the 64-character hash of the recovery GuestOS update image.
- RECOVERY-HASH: the 64-character hash of the recovery artifacts.
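Since RECOVERY-HASH-PREFIX is a prefix of the recovery artifacts' hash, the two related values the coordinator sends should agree with each other. A minimal sketch of that cross-check, run on your own workstation; `prefix_matches` is a hypothetical name, and the check assumes the prefix is the leading 6 characters of RECOVERY-HASH. This guide does not describe a similar relationship between VERSION and VERSION-HASH, so the sketch leaves those two alone.

```shell
# POSIX sh. Hypothetical workstation-side check; nothing here runs on the node.
# ASSUMPTION: RECOVERY-HASH-PREFIX is the first 6 characters of RECOVERY-HASH.

# prefix_matches RECOVERY_HASH RECOVERY_HASH_PREFIX
prefix_matches() {
    full=$1
    prefix=$2
    if [ "${#full}" -ne 64 ]; then
        echo "RECOVERY-HASH must be 64 characters, got ${#full}" >&2
        return 1
    fi
    if [ "${#prefix}" -ne 6 ]; then
        echo "RECOVERY-HASH-PREFIX must be 6 characters, got ${#prefix}" >&2
        return 1
    fi
    # Quoting "$prefix" makes the case pattern match it literally.
    case $full in
        "$prefix"*)
            echo "prefix is consistent"
            return 0
            ;;
    esac
    echo "RECOVERY-HASH-PREFIX is not a prefix of RECOVERY-HASH" >&2
    return 1
}
```

A failure here most likely means a copy error in one of the two values; ask the coordinator to resend them rather than guessing which one is correct.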
The coordinator will also tell you which of your nodes are in the target subnet and therefore need the procedure run on them.
[!NOTE] Recovery can be completed from a physical console or from the node's remote BMC virtual console view. Use whichever is faster for you to reach.
Recovery steps
1. Obtain console access
Connect to the node — either a physical crash cart or the BMC virtual console. You should see the prompt:
limited-console>
Type help to list the available commands.
2. Initiate the manual recovery TUI
At the limited-console> prompt, run:
manual-recovery
The text user interface (TUI) starts. If it fails to render, jump to the Manual recovery fallback section below.
3. Enter the recovery parameters
When prompted, enter the VERSION and RECOVERY-HASH-PREFIX values the coordinator provided.
[!WARNING] Type the characters precisely: if a single character is wrong, the recovery will not succeed. If your BMC offers a Virtual Clipboard feature, paste the values instead of retyping them.
4. Confirm the calculated full-hashes
The TUI downloads the recovery artifacts and prints the calculated VERSION-HASH and RECOVERY-HASH.
[!WARNING] Verify that the calculated full-hashes exactly match the values the recovery coordinator provided. Do not approve the recovery unless both match. A mismatch means the node downloaded something other than what the coordinator intended — abort and report it on the Matrix channel.
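Comparing two 64-character strings by eye is error-prone. If you transcribe the calculated values (or can copy them out of your console session), a small comparison on your workstation can point at the exact position of any difference. This is a minimal sketch with a hypothetical `compare_hashes` function; it compares exactly, with no case folding and no whitespace trimming, to match the "exactly match" requirement. It is an aid for reading the TUI, not a replacement for it.

```shell
# POSIX sh. Hypothetical workstation-side helper; nothing here runs on the node.

# compare_hashes EXPECTED CALCULATED
# Prints MATCH, or the 1-based position and characters of the first difference.
compare_hashes() {
    expected=$1
    calculated=$2
    if [ "$expected" = "$calculated" ]; then
        echo "MATCH"
        return 0
    fi
    i=1
    # Walk up to the longer of the two lengths, so a truncated value is
    # reported at the first position where one string has run out.
    while [ "$i" -le "${#expected}" ] || [ "$i" -le "${#calculated}" ]; do
        a=$(printf '%s' "$expected" | cut -c "$i")
        b=$(printf '%s' "$calculated" | cut -c "$i")
        if [ "$a" != "$b" ]; then
            echo "MISMATCH at position $i: expected '$a', got '$b'"
            return 1
        fi
        i=$((i + 1))
    done
    return 1
}
```

For example, `compare_hashes "$RECOVERY_HASH" "$CALCULATED_RECOVERY_HASH"` prints `MATCH` only when the strings are identical. Any mismatch, whether from a transcription slip or a genuinely different download, is a reason to stop and report on the Matrix channel rather than approve.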
5. Monitor the recovery
Watch the recovery logs. After roughly thirty seconds you should see:
========================================================================
SUCCESS: Recovery completed successfully!
========================================================================
Standard boot logs follow. If a recovery error page appears instead, the most likely cause is a mistyped parameter: press Enter, return to step 2, and try again.
6. Notify the channel
Post on the recovery Matrix channel confirming that the node has completed recovery. Include the node ID.
Wait for recovery confirmation
Stay on the channel until the coordinator confirms the subnet itself is back online. You may be asked to re-run the procedure, restart a node, or provide additional diagnostics.
Manual recovery fallback
If the manual-recovery TUI fails to render, fall back to running the recovery launcher directly from the restricted bash console.
1. Enter the rbash console
At the limited-console> prompt, run:
rbash-console
2. Run the recovery launcher
Run:
sudo /opt/ic/bin/guestos-recovery-launcher.sh mode=run version=<VERSION> recovery-hash-prefix=<RECOVERY-HASH-PREFIX>
Replace <VERSION> and <RECOVERY-HASH-PREFIX> with the values from the coordinator, then resume the main procedure at step 4 (Confirm the calculated full-hashes).
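The fallback command is long, and a typo in it fails the same way as a typo in the TUI. One option is to assemble the line on your workstation and paste it through the BMC Virtual Clipboard, if your BMC has one. A minimal sketch; `build_launcher_command` is a hypothetical helper that only prints text and never touches the node, and the command template is copied verbatim from the launcher command in step 2.

```shell
# POSIX sh. Hypothetical workstation-side helper; it only prints a line of text.

# build_launcher_command VERSION RECOVERY_HASH_PREFIX
build_launcher_command() {
    # Refuse to print a command with a missing value.
    if [ -z "$1" ] || [ -z "$2" ]; then
        echo "usage: build_launcher_command VERSION RECOVERY_HASH_PREFIX" >&2
        return 1
    fi
    printf 'sudo /opt/ic/bin/guestos-recovery-launcher.sh mode=run version=%s recovery-hash-prefix=%s\n' "$1" "$2"
}
```

Printing the finished line lets you read it over once, in full, before pasting it at the rbash prompt.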
Related
- Node Provider Documentation — the role overview.
- Node Provider Maintenance Guide — context for when recoveries happen.
- Node Provider Troubleshooting — the troubleshooting index that escalates into a recovery.