
Azure Linux – troubleshooting the “no boot/no ssh” scenario


Did you ever need to troubleshoot an Azure Linux “no boot/no ssh” scenario?

Azure started making efforts in this direction about 2 years ago, with the public release of the “Serial Console”. As a result, customers were able to reach their VMs from the Azure Portal without needing an SSH connection.

This was followed by a nice blog post explaining how the new feature can be used and how to prepare your VMs for it.

The Azure Portal also has some new options: “Swap OS disk” and “Disk Encryption”.

Therefore, what used to be possible only through scripting is now just a click of a button away.

So, what is all this good for?

The “Swap OS disk” option is one of the useful tools for troubleshooting the “no boot/no ssh” scenario.

How does it work?

With “Swap OS disk”, if you can’t access your VM or it no longer boots properly, you need to follow a few simple steps:

  1. Create a rescue VM, as similar as possible to the affected VM.
  2. Create a snapshot of the affected VM's OS disk.
  3. From the snapshot, deploy a new disk (make sure it has the same characteristics as the original OS disk on the affected VM).
  4. Attach the new disk to the rescue VM.
  5. Investigate and solve the issue on the attached disk.
  6. Once the issue is found and fixed, detach the disk from the rescue VM.
  7. Use the “Swap OS disk” option to replace the original OS disk on the affected VM with the one you just fixed on the rescue VM.
  8. Reboot the affected VM.
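For readers who prefer the command line to the Portal, the steps above can be roughly sketched with the Azure CLI. This is only a sketch under stated assumptions: the function names, the "-os-snap" and "-os-fixed" suffixes, and every resource name are hypothetical placeholders, the rescue VM is assumed to already exist in the same resource group (step 1), and the affected VM is deallocated before the swap. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch of the manual "Swap OS disk" flow with the Azure CLI.
# All resource names are placeholders (assumptions), not taken from the article.
# Set DRY_RUN=1 to print the commands instead of executing them.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Steps 2-4: snapshot the affected OS disk, create a new disk from the
# snapshot, and attach that disk to the rescue VM as a data disk.
prepare_rescue_disk() {
  rg="$1"; vm="$2"; rescue_vm="$3"; os_disk="$4"
  run az snapshot create -g "$rg" -n "${vm}-os-snap" --source "$os_disk"
  run az disk create -g "$rg" -n "${vm}-os-fixed" --source "${vm}-os-snap"
  run az vm disk attach -g "$rg" --vm-name "$rescue_vm" --name "${vm}-os-fixed"
}

# Steps 6-8: detach the repaired disk from the rescue VM, swap it in as the
# OS disk of the (deallocated) affected VM, then start the affected VM.
swap_fixed_disk() {
  rg="$1"; vm="$2"; rescue_vm="$3"
  run az vm disk detach -g "$rg" --vm-name "$rescue_vm" --name "${vm}-os-fixed"
  run az vm deallocate -g "$rg" -n "$vm"
  run az vm update -g "$rg" -n "$vm" --os-disk "${vm}-os-fixed"
  run az vm start -g "$rg" -n "$vm"
}
```

A possible usage would be to call prepare_rescue_disk with the resource group, the affected VM, the rescue VM and the OS disk name, fix the disk on the rescue VM (step 5), and then call swap_fixed_disk with the same resource group and VM names.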

Sounds easy, no? All of the steps above can be performed from the Azure Portal with a few mouse clicks, except step 5, which of course requires working on the copy of the affected VM's OS disk from the rescue VM.
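The hands-on part on the rescue VM usually starts with mounting the attached disk copy. A minimal sketch follows, assuming the copy shows up as /dev/sdc1 and is mounted under /rescue; both are hypothetical, so check the real device name with lsblk first. The helper name mount_rescue_disk is also made up for illustration. If the rescue VM was built from the same image and the filesystem is XFS, the copy may carry the same filesystem UUID as the rescue VM's own disk, in which case the nouuid mount option is typically needed.

```shell
#!/usr/bin/env bash
# Sketch: mount the attached copy of the affected OS disk on the rescue VM.
# Device name and mount point are assumptions; verify them with `lsblk`.
# Set DRY_RUN=1 to print the commands instead of executing them.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: mount_rescue_disk <partition> [mount_point] [mount_options]
mount_rescue_disk() {
  part="$1"; mnt="${2:-/rescue}"; opts="${3:-}"
  run sudo mkdir -p "$mnt"
  if [ -n "$opts" ]; then
    # e.g. "nouuid" for an XFS copy whose UUID clashes with the rescue VM's disk
    run sudo mount -o "$opts" "$part" "$mnt"
  else
    run sudo mount "$part" "$mnt"
  fi
}
```

For example, mount_rescue_disk /dev/sdc1 /rescue nouuid would mount an XFS copy, after which files such as /rescue/etc/fstab can be inspected and fixed. Unmount the disk with sudo umount /rescue before detaching it.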

But to make it even faster, there is the “az vm repair” command.

The process is:

1. Open Cloud Shell from your Azure Portal page (enable it if needed) and switch to “Bash”

2. Install the “vm-repair” extension

az extension add -n vm-repair

3. Create a rescue VM and attach a copy of the affected VM OS disk to the rescue VM:

az vm repair create -g "affected_vm_RG_name" -n "affected_VM_name"

The command above copies the OS disk of the problematic VM, creates a new rescue VM with the same characteristics as the original VM, and finally attaches the disk copy to the rescue VM. During the process it asks you to set a username and a password for the repair VM.

4. Now you can connect to the rescue VM and fix the issue. Once the problem is solved, you can use “az vm repair restore” instead of the “Swap OS disk” option to swap the OS disk back:

az vm repair restore -g "affected_vm_RG_name" -n "affected_VM_name"

The above will:

  • automatically identify the repair VM;
  • detach the disk copy (attached as a data disk) from the repair VM;
  • replace the OS disk on the affected VM, using the “swap os disk” feature;
  • prompt you to delete the repair VM and related resources.

Therefore, instead of searching for buttons to click in the Azure Portal, you can run 3 simple commands with Azure CLI 2.0 and be done in a few minutes.
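Put together, the three commands could be wrapped roughly as follows. This is a sketch, not an official workflow: the function names repair_begin and repair_finish are made up for illustration, the resource group and VM names are placeholders, and --verbose is only the generic Azure CLI flag for more detailed output. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: the three "az vm repair" commands wrapped in two helper functions.
# Function names and resource names are hypothetical placeholders.
# Set DRY_RUN=1 to print the commands instead of executing them.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Install the extension, then create the repair VM with a copy of the
# affected OS disk attached (prompts for repair VM credentials).
repair_begin() {
  rg="$1"; vm="$2"
  run az extension add -n vm-repair
  run az vm repair create -g "$rg" -n "$vm" --verbose
}

# After the disk copy is fixed: swap it back into the affected VM and
# let the command offer to clean up the repair resources.
repair_finish() {
  rg="$1"; vm="$2"
  run az vm repair restore -g "$rg" -n "$vm" --verbose
}
```

Typical use would be repair_begin followed by the resource group and VM name, then fixing the disk on the repair VM, then repair_finish with the same two arguments.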

No headaches, no complicated Bash or PowerShell commands, no scripts.

You now have new tools available for your troubleshooting, so enjoy using them if you ever need them!


Later edit:

I worked together with a friend on an Azure CLI 2.0 script that makes this even easier for you: https://github.com/marinnedea/Repair-and-Restore-VM

The script simply prompts you for the subscription ID, the Resource Group name and the VM name of the affected VM, then creates the rescue environment for you and tells you the IP of the rescue VM, so you can easily ssh into it and troubleshoot the “no boot/no ssh” scenario.

Once done, you just need to call the same script and tell it you wish to restore the VM. It will use the data already provided during the rescue VM creation phase, so you don’t need to provide that information again.

Enjoy!

Marin Nedea
I'm passionate about open source software and technologies. In my spare time I build simple and functional websites from scratch, using PHP+HTML5+CSS3+MySQL and when I'm bored, I write simple PHP_CLI or bash scripts to play around on my Linux machine.
