
Manual Node Recovery Guide

The procedure node providers run, under direction from DFINITY, to bring a node back online during an announced subnet recovery.

Manual node recovery is the in-the-field part of a coordinated subnet recovery. A recovery coordinator at DFINITY hands you a small set of parameters; you connect a console to each affected node, type the parameters in, and verify a matching pair of hashes. The node downloads the recovery artifacts, validates them, and rejoins the subnet.

This runbook applies only during an announced recovery. Read the security warning carefully before doing anything else.

Security warning

[!WARNING] Don't get tricked into compromising your nodes. Only complete a manual node recovery if all of the following are true:

  • A subnet recovery has been announced on the Internet Computer Status Page.
  • The DFINITY team has contacted you on the dedicated Matrix channel #ic-node-providers-incident-response:matrix.org.
  • The instructions came from that channel, where only DFINITY can post under normal conditions.

If any of those conditions is not met, stop and ask in the Node Provider Matrix channel before proceeding.

Prerequisites

The recovery coordinator will provide four values. Two are short inputs you type at the prompt; two are full hashes you use to verify what the node downloaded.

Recovery input parameters

  • VERSION — the 40-character commit ID of the recovery GuestOS update image.
  • RECOVERY-HASH-PREFIX — a 6-character prefix of the recovery artifacts' hash.

Recovery full-hashes (for verification only)

  • VERSION-HASH — the 64-character hash of the recovery GuestOS update image.
  • RECOVERY-HASH — the 64-character hash of the recovery artifacts.

The coordinator will also tell you which of your nodes are in the target subnet and therefore need the procedure run on them.
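Before going to the console, it can help to sanity-check the four values on your own workstation. The following is a minimal sketch, not part of the node tooling: the function names `is_hex_of_len` and `check_recovery_params` are hypothetical, and it assumes all four values are hexadecimal strings. It checks the lengths listed above and that RECOVERY-HASH begins with RECOVERY-HASH-PREFIX.

```shell
#!/bin/sh
# Hypothetical workstation-side helper; not part of the node's tooling.
# Assumption: all four values are hexadecimal strings (0-9, a-f).

# is_hex_of_len VALUE LENGTH
# Succeeds if VALUE is exactly LENGTH hexadecimal characters.
is_hex_of_len() {
    [ "${#1}" -eq "$2" ] || return 1
    case "$1" in
        *[!0-9a-fA-F]*) return 1 ;;
    esac
    return 0
}

# check_recovery_params VERSION RECOVERY_HASH_PREFIX VERSION_HASH RECOVERY_HASH
check_recovery_params() {
    is_hex_of_len "$1" 40 || { echo "VERSION must be 40 hex characters"; return 1; }
    is_hex_of_len "$2" 6  || { echo "RECOVERY-HASH-PREFIX must be 6 hex characters"; return 1; }
    is_hex_of_len "$3" 64 || { echo "VERSION-HASH must be 64 hex characters"; return 1; }
    is_hex_of_len "$4" 64 || { echo "RECOVERY-HASH must be 64 hex characters"; return 1; }
    # The prefix should be the leading characters of the full recovery hash.
    case "$4" in
        "$2"*) ;;
        *) echo "RECOVERY-HASH does not start with RECOVERY-HASH-PREFIX"; return 1 ;;
    esac
    echo "parameters look well-formed"
}
```

Call it as `check_recovery_params "$VERSION" "$PREFIX" "$VERSION_HASH" "$RECOVERY_HASH"`. A failure here only means a value was probably copied incorrectly from the channel; it says nothing about whether the values are the right ones for this recovery.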

[!NOTE] Recovery can be completed from a physical console or from the node's remote BMC virtual console view. Use whichever is faster for you to reach.

Recovery steps

1. Obtain console access

Connect to the node using either a physical crash cart or the BMC virtual console. You should see the prompt:

limited-console>

Type help to list the available commands.

2. Initiate the manual recovery TUI

At the limited-console> prompt, run:

manual-recovery

The text user interface (TUI) starts. If it fails to render, jump to the Manual recovery fallback section below.

3. Enter the recovery parameters

When prompted, enter the VERSION and RECOVERY-HASH-PREFIX values the coordinator provided.

[!WARNING] Type the characters precisely. If a single character is wrong, the recovery will not succeed. If your BMC offers a Virtual Clipboard feature, paste rather than retyping.

4. Confirm the calculated full-hashes

The TUI downloads the recovery artifacts and prints the calculated VERSION-HASH and RECOVERY-HASH.

[!WARNING] Verify that the calculated full-hashes exactly match the values the recovery coordinator provided. Do not approve the recovery unless both match. A mismatch means the node downloaded something other than what the coordinator intended — abort and report it on the Matrix channel.
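Comparing two 64-character strings by eye is error-prone. As a rough aid, the sketch below runs on your own workstation, not on the node. `chunk_hash` prints the coordinator's hash in groups of eight characters so you can check it against the console one group at a time. `compare_hash` does an exact string comparison for the case where your BMC lets you copy the calculated hash out of the virtual console. Both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical workstation-side helpers; not part of the node's tooling.

# chunk_hash HASH
# Prints HASH in space-separated groups of 8 characters for visual comparison.
chunk_hash() {
    printf '%s\n' "$1" | fold -w 8 | paste -s -d ' ' -
}

# compare_hash LABEL EXPECTED CALCULATED
# Exact comparison; on mismatch, reports the first position that differs.
compare_hash() {
    label=$1
    expected=$2
    calculated=$3
    if [ "$expected" = "$calculated" ]; then
        echo "$label: MATCH"
        return 0
    fi
    i=1
    while [ "$i" -le "${#expected}" ] && [ "$i" -le "${#calculated}" ]; do
        a=$(printf '%s' "$expected" | cut -c "$i")
        b=$(printf '%s' "$calculated" | cut -c "$i")
        [ "$a" = "$b" ] || break
        i=$((i + 1))
    done
    echo "$label: MISMATCH at character $i"
    return 1
}
```

For example, `chunk_hash "$RECOVERY_HASH"` gives eight groups to tick off against the TUI output. Treat any mismatch exactly as the warning above describes: do not approve, and report it.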

5. Monitor the recovery

Watch the recovery logs. After roughly thirty seconds you should see:

========================================================================
SUCCESS: Recovery completed successfully!
========================================================================

Standard boot logs follow. If you see a recovery error page instead, the most likely cause is a mistyped parameter: press Enter, return to step 2, and try again.
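If you happen to capture the console output to a local file (for example through a serial-over-LAN log, where your BMC supports one), you can check for the success banner afterwards rather than relying on having seen it scroll past. This is a sketch under that assumption; the function name `recovery_succeeded` and the log file path are hypothetical, and the search string is the banner text shown above.

```shell
#!/bin/sh
# Hypothetical workstation-side helper.
# Assumption: console output was captured to a plain-text file.

# recovery_succeeded LOGFILE
# Succeeds if the success banner text appears anywhere in LOGFILE.
recovery_succeeded() {
    grep -qF 'SUCCESS: Recovery completed successfully!' "$1"
}
```

Usage: `recovery_succeeded ./node-console.log && echo "banner found"`. Finding the banner only confirms the node-side step; the subnet is not recovered until the coordinator says so.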

6. Notify the channel

Post on the recovery Matrix channel confirming that the node has completed recovery. Include the node ID.

Wait for recovery confirmation

Stay on the channel until the coordinator confirms the subnet itself is back online. You may be asked to re-run the procedure, restart a node, or provide additional diagnostics.

Manual recovery fallback

If the manual-recovery TUI fails to render, fall back to running the recovery launcher directly from the restricted bash console.

1. Enter the rbash console

At the limited-console> prompt, run:

rbash-console

2. Run the recovery launcher

Run:

sudo /opt/ic/bin/guestos-recovery-launcher.sh mode=run version=<VERSION> recovery-hash-prefix=<RECOVERY-HASH-PREFIX>

Replace <VERSION> and <RECOVERY-HASH-PREFIX> with the values from the coordinator. Then resume from step 4 (Confirm the calculated full-hashes) of the main procedure.
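Because a single wrong character makes the recovery fail, you may prefer to assemble the launcher command on your workstation and paste it through the BMC's virtual clipboard, if it has one. The sketch below only builds and prints the command line shown above; it does not run anything on the node. The function name `build_launcher_command` is hypothetical, and the length checks follow the parameter descriptions in the Prerequisites section.

```shell
#!/bin/sh
# Hypothetical workstation-side helper; it prints the command, it does not run it.

# build_launcher_command VERSION RECOVERY_HASH_PREFIX
build_launcher_command() {
    [ "${#1}" -eq 40 ] || { echo "VERSION must be 40 characters" >&2; return 1; }
    [ "${#2}" -eq 6 ]  || { echo "RECOVERY-HASH-PREFIX must be 6 characters" >&2; return 1; }
    printf 'sudo /opt/ic/bin/guestos-recovery-launcher.sh mode=run version=%s recovery-hash-prefix=%s\n' "$1" "$2"
}
```

Run `build_launcher_command "$VERSION" "$PREFIX"`, then check the printed line against the coordinator's values before pasting it at the rbash prompt.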