September 25, 2010

Don’t reboot your t1.micro [EC2 epic fail]

If you have a t1.micro running an image of Ubuntu 10.04 LTS (Lucid Lynx), don’t reboot it. When I first wrote about t1.micros a few days ago, I forgot to mention that the first instance I brought up failed, quite catastrophically, upon reboot. I didn’t think much of it at the time because I wasn’t far into configuring the machine. But yesterday, Alestic released a note, referencing a bug report, which says that t1.micro instances running Lucid won’t come back up after a restart and that the bug has been fixed. It’s short, so I’ll let you read it, but basically the cloud-init package was broken and didn’t properly expose the ephemeral0 device, causing reboots to fail. Alestic says that all you need to do is run apt-get update && apt-get upgrade and you’re golden.

Let me tell you firsthand: that doesn’t work. This morning, feeling brave, I decided to test the theory. I was running a t1.micro instance from the old Canonical Ubuntu AMI ami-1634de7f, on which I ran apt-get update and apt-get upgrade. I saw that the cloud-init package was upgraded, as expected. I initiated a restart and my machine never came back. I then sent a reboot request with ec2-reboot-instances: no dice. Finally, I stopped the instance with ec2-stop-instances and started it again with ec2-start-instances, and I still didn’t have any luck. If I were smart, I would have tried this on a test instance first, but I wanted to test my configuration documentation anyhow. Mostly, I wanted to make sure that, if my instance was going to fail on reboot, it did so when I had the time and ambition to fix it rather than at some inopportune moment.

Because everything is EBS-backed, the instance uses an elastic IP, and my documentation is decent, I was able to detach the volumes from the old instance, attach them to a new instance, and get everything running in less than 30 minutes. At some point when I’m feeling very ambitious, I intend to put all the configuration in Puppet to mostly automate the process of migrating to a new instance type, but I’m not quite there yet.
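For anyone who wants to see the shape of that move, here is a rough sketch of the commands involved, using the same EC2 API tools family as ec2-reboot-instances above. It only prints the commands rather than running them, and every ID, the device name, and the IP address are hypothetical placeholders, not the values from my setup. The plan_migration function name is made up for illustration too.

```shell
#!/bin/sh
# Dry-run sketch: prints the EC2 API tools commands for moving one EBS
# volume and an elastic IP to a replacement instance. Nothing is executed.
# Every ID, the device name, and the IP address are hypothetical placeholders.

plan_migration() {
    volume=$1        # EBS volume to move, e.g. vol-11111111
    new_instance=$2  # replacement instance, e.g. i-22222222
    device=$3        # device name to attach as, e.g. /dev/sdf
    elastic_ip=$4    # elastic IP to re-point, e.g. 203.0.113.10

    # Release the volume from the dead instance.
    echo "ec2-detach-volume $volume"
    # Attach it to the replacement instance.
    echo "ec2-attach-volume $volume -i $new_instance -d $device"
    # Point the elastic IP at the replacement instance.
    echo "ec2-associate-address -i $new_instance $elastic_ip"
}

plan_migration vol-11111111 i-22222222 /dev/sdf 203.0.113.10
```

Repeat the detach and attach pair for each volume, and check the printed commands against your own IDs before running any of them by hand. Mounting the volumes and restoring services on the new instance is the part your own documentation has to cover.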

If you have a t1.micro instance running Lucid, my recommendation is to spin up a new instance from the most recent AMI (the current AMI ID is available at Alestic) and move everything over rather than bother with the apt-get upgrade, which clearly did not work in my case.