Recover a Non-Operational Controller Cluster

When two of the three Avi Controller nodes in a cluster are permanently down and cannot be recovered, the remaining Controller node is marked operationally down because the cluster no longer has a quorum. All Service Engines continue to operate in so-called “headless” fashion. This article describes the steps to return to a highly available three-node cluster.

Notes

  1. To recover the cluster, the remaining healthy Controller node needs to first be converted to a single-node cluster configuration. Thereafter, two new nodes can be added to the cluster.
  2. There are two ways of recovering a Controller: with configuration and without configuration. It is important to recover one node with configuration so that it becomes the Controller leader, while the other nodes are added to the cluster as followers:
    • To recover a Controller with configuration, use the /opt/avi/scripts/recover_cluster.py script.
    • To recover a Controller without configuration (essentially a factory reset; rarely necessary), use the /opt/avi/scripts/clean_cluster.py script instead. This is not reversible, and the Controller will take more time to recreate the database. Note the following about the /opt/avi/scripts/clean_cluster.py script:
      • By default, the script reboots the connected SEs, unless it is run with the --skip-se-reboot switch: /opt/avi/scripts/clean_cluster.py --skip-se-reboot.
      • The only way to log in to the Controller node after running the script is to reset the admin password through the UI.
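The two ways of invoking clean_cluster.py described above can be sketched as a small helper that prints the command line for review rather than running it, since a factory reset is not reversible. The script path and the --skip-se-reboot switch are taken from this article; the function name build_clean_cmd and its "yes"/"no" argument are illustrative assumptions, not part of the product.

```shell
#!/usr/bin/env bash
# Sketch only: assemble the clean_cluster.py command line for review.
# CLEAN_SCRIPT and --skip-se-reboot come from the article text;
# build_clean_cmd and its argument are hypothetical helper names.

CLEAN_SCRIPT="/opt/avi/scripts/clean_cluster.py"

build_clean_cmd() {
    # $1 = "yes" to leave the connected Service Engines running (no reboot).
    local skip_se_reboot="${1:-no}"
    if [ "$skip_se_reboot" = "yes" ]; then
        printf '%s --skip-se-reboot\n' "$CLEAN_SCRIPT"
    else
        printf '%s\n' "$CLEAN_SCRIPT"
    fi
}

# Example: print (do not execute) the variant that keeps the SEs up.
#   build_clean_cmd yes
```

Printing the command first gives the operator a chance to confirm which variant is about to be run as root.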

Typical Recovery

To convert the remaining Controller node to a single-node cluster while preserving the Avi Vantage configuration, run the following script as root. If you run it from a non-root account, the script fails with a Permission denied message. Use sudo and enter the admin password to become root before running the script.

root@controller1:/home/admin# /opt/avi/scripts/recover_cluster.py

The script will ask for confirmation as a precaution and remind the user to run the script as root.
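The root requirement can also be checked before the script is started, so that the failure is reported up front rather than as a Permission denied message. The following is a minimal sketch under that assumption; the script path is from this article, while is_root and run_recovery are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch only: refuse to start recover_cluster.py unless running as root.
# RECOVER_SCRIPT is the path given in the article; is_root and
# run_recovery are illustrative helper names, not product commands.

RECOVER_SCRIPT="/opt/avi/scripts/recover_cluster.py"

# Succeeds when the given numeric user ID (default: the current user) is 0.
is_root() {
    [ "${1:-$(id -u)}" -eq 0 ]
}

# Runs the recovery script only when the caller is root.
run_recovery() {
    if ! is_root; then
        echo "Must be run as root; use sudo to become root first." >&2
        return 1
    fi
    # The script itself still prompts for confirmation before proceeding.
    "$RECOVER_SCRIPT"
}
```

Calling run_recovery from a root shell behaves the same as invoking the script directly; from any other account it stops with an explicit message.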

It is highly recommended to power off the other Controllers that were part of the cluster when running the recover_cluster.py script. Failure to do so can put the current and other nodes in an inoperable state.
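One way to sanity-check this before running recover_cluster.py is to confirm that the former cluster members no longer answer on the network. The sketch below assumes a Linux ping that accepts -c (count) and -W (timeout in seconds); the helper names and the example addresses are hypothetical. A host that does not answer ping is not guaranteed to be powered off, so treat this as an additional check, not a substitute for verifying the power state.

```shell
#!/usr/bin/env bash
# Sketch only: report former cluster peers that still respond to ping.
# probe_host and check_peers_down are hypothetical helper names; the
# peer addresses passed in are supplied by the operator.

# Succeeds when the host answers a single ping within 2 seconds.
probe_host() {
    ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

# Prints each peer that still responds.
# Returns 0 when no peer responds, 1 when at least one does.
check_peers_down() {
    local rc=0 peer
    for peer in "$@"; do
        if probe_host "$peer"; then
            echo "$peer"
            rc=1
        fi
    done
    return "$rc"
}

# Example (addresses are placeholders):
#   check_peers_down 10.0.0.2 10.0.0.3 || echo "Power off the hosts listed above first." >&2
```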

The script stops all services on the Controller and restarts them. The Controller will be down and inaccessible for a few minutes.

Once the script finishes, you will be able to log in to the Controller node, which now runs as a single-node cluster. To make this a highly available three-node cluster, add two new, unconfigured Controller nodes to the cluster.

Note: Ensure the Controllers are on the same BASE and PATCH version.

Related article: Backup and Restore of Avi Vantage Configuration