Bug in Xenial stemcell 621.151 and 621.154 causes "device change" storm and high CPU IO wait

Article ID: 293889

Updated On:

Products

Operations Manager

Issue/Introduction

After upgrading the Xenial stemcell to 621.151 or 621.154, customers observe high CPU I/O wait on the VMs.

# Example output from `top -c` (note the 42.4 wa, i.e. I/O wait, value)

# top -c
top - 16:08:56 up 1:14, 1 user, load average: 1.53, 1.64, 1.59
Tasks: 150 total, 1 running, 103 sleeping, 0 stopped, 0 zombie
%Cpu(s): 3.8 us, 8.1 sy, 0.0 ni, 44.4 id, 42.4 wa, 0.0 hi, 1.3 si, 0.0 st
...
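To check I/O wait without an interactive session (for example, on many VMs at once), you can run `top` in batch mode and parse the `wa` field. The sketch below is minimal and assumes procps `top` prints a summary line in the format shown above. The function names and the 20% threshold are hypothetical. Pass `--run` to execute it.

```shell
#!/usr/bin/env bash
# Sketch: read the "wa" (I/O wait) percentage from top's %Cpu summary line.
# The 20% default threshold is a hypothetical value, not a documented limit.

# Print the number that precedes "wa" on the first %Cpu line read from stdin.
iowait_from_top() {
  awk '/^%Cpu/ {
    for (i = 2; i <= NF; i++) {
      if ($i ~ /^wa,?$/) {
        v = $(i - 1)
        sub(/.*,/, "", v)   # handle glued fields such as "id,100.0"
        print v
        exit
      }
    }
  }'
}

# Exit 0 when the iowait value $1 is above the threshold $2 (default 20).
iowait_is_high() {
  awk -v v="$1" -v t="${2:-20}" 'BEGIN { exit !(v + 0 > t + 0) }'
}

if [ "${1:-}" = "--run" ]; then
  wa="$(top -bn1 | iowait_from_top)"
  if iowait_is_high "$wa"; then
    echo "High I/O wait: ${wa}%"
  else
    echo "I/O wait looks normal: ${wa}%"
  fi
fi
```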

You may also observe the following symptoms:
# A process is trying to eject the CD-ROM (/dev/sr0)

$ ps -ef | grep eject
root   24756 24754 0 17:56 ?    00:00:00 /lib/udev/cdrom_id --eject-media /dev/sr0
 
# `udevadm monitor -u -k` shows an endless stream of "device change" events
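You can check both symptoms above with a short script on the affected VM, run as root. This is a sketch that assumes the usual `udevadm monitor` line format, `KERNEL[timestamp] change   /devices/... (block)`. The 10-second `WINDOW` and the function names are hypothetical. Pass `--run` to execute it.

```shell
#!/usr/bin/env bash
# Sketch: check both storm symptoms on one VM (run as root).
# WINDOW is a hypothetical sampling period in seconds.
WINDOW="${WINDOW:-10}"

# Symptom 1: is the udev cdrom_id eject helper from the ps output running?
# The [c] bracket stops pgrep -f from matching a shell whose own
# command line happens to contain the pattern text.
eject_helper_running() {
  pgrep -f '[c]drom_id --eject-media' >/dev/null
}

# Symptom 2: count "change" events in udevadm monitor output read from stdin.
count_change_events() {
  awk 'index($0, "] change ") > 0 { n++ } END { print n + 0 }'
}

if [ "${1:-}" = "--run" ]; then
  if eject_helper_running; then
    echo "eject helper running:"
    pgrep -af '[c]drom_id --eject-media'
  else
    echo "no cdrom_id eject helper found"
  fi
  # stdbuf requests line-buffered output in case udevadm buffers stdout when
  # piped; timeout stops the monitor after WINDOW seconds.
  events="$(timeout "$WINDOW" stdbuf -oL udevadm monitor -u -k | count_change_events)"
  echo "change events in ${WINDOW}s: $events"
fi
```

A handful of events during the window is normal. Events that keep arriving window after window match the storm described above.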

The problem is caused by a Linux kernel bug. You can work around it with the following steps:
  1. sudo -i mv /lib/udev/cdrom_id /tmp
  2. Wait for the "device change" storm to stop (for example, until `udevadm monitor -u -k` no longer prints new "change" events).
  3. sudo -i mv /tmp/cdrom_id /lib/udev/cdrom_id
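You can script the three steps above for a single VM. This is a minimal sketch, not an official tool. It treats "the storm has stopped" as one sampling window with no `change` events. The window length, round limit, and function names are hypothetical. Run it as root with `--run`. A trap restores `cdrom_id` even if the wait loop gives up.

```shell
#!/usr/bin/env bash
# Sketch of the three workaround steps on one VM (run as root).
set -e

CDROM_ID=/lib/udev/cdrom_id
BACKUP=/tmp/cdrom_id
QUIET_WINDOW="${QUIET_WINDOW:-5}"   # hypothetical seconds per sample
MAX_ROUNDS="${MAX_ROUNDS:-60}"      # hypothetical cap on samples

# Number of udev "change" events seen during one QUIET_WINDOW.
sample_change_events() {
  { timeout "$QUIET_WINDOW" stdbuf -oL udevadm monitor -u -k || true; } \
    | awk 'index($0, "] change ") > 0 { n++ } END { print n + 0 }'
}

# Step 2: return 0 once a whole window passes with no change events.
wait_for_quiet() {
  local round events
  for ((round = 1; round <= MAX_ROUNDS; round++)); do
    events="$(sample_change_events)"
    echo "window $round: $events change events"
    if [ "$events" -eq 0 ]; then
      return 0
    fi
  done
  return 1
}

main() {
  [ "$(id -u)" -eq 0 ] || { echo "run as root" >&2; exit 1; }
  mv "$CDROM_ID" "$BACKUP"                 # step 1
  trap 'mv "$BACKUP" "$CDROM_ID"' EXIT     # step 3, runs on any exit
  wait_for_quiet || echo "events did not settle; restoring cdrom_id anyway" >&2
}

if [ "${1:-}" = "--run" ]; then
  main
fi
```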
To apply the workaround to all VMs in a BOSH deployment, run the following, replacing <dep> with the deployment name (repeat for each deployment):
bosh -d <dep> ssh -c 'sudo -i mv /lib/udev/cdrom_id /tmp;sleep 3;sudo -i mv /tmp/cdrom_id /lib/udev/cdrom_id'
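Because `bosh -d <dep> ssh` targets one deployment at a time, covering every VM means repeating it for each deployment. The sketch below assumes a logged-in bosh CLI v2 and `jq`. The `.Tables[0].Rows[].name` path reflects the usual `bosh deployments --json` layout, so check it against your CLI version. `WAIT_SECONDS` is a hypothetical setting that keeps the original 3-second pause by default. Pass `--run` to execute it.

```shell
#!/usr/bin/env bash
# Sketch: run the one-line workaround against every deployment on the director.
# Assumes bosh CLI v2 (logged in) and jq; the JSON path is an assumption.
set -eo pipefail

# Seconds to keep cdrom_id moved aside; 3 matches the command above.
WAIT_SECONDS="${WAIT_SECONDS:-3}"

# Build the remote command for a given wait time.
workaround_cmd() {
  printf 'sudo -i mv /lib/udev/cdrom_id /tmp; sleep %s; sudo -i mv /tmp/cdrom_id /lib/udev/cdrom_id' "$1"
}

main() {
  local -a deps
  local dep
  mapfile -t deps < <(bosh deployments --json | jq -r '.Tables[0].Rows[].name')
  for dep in "${deps[@]}"; do
    echo "== $dep"
    # Without an instance argument, bosh ssh runs the command on all instances.
    bosh -d "$dep" ssh -c "$(workaround_cmd "$WAIT_SECONDS")"
  done
}

if [ "${1:-}" = "--run" ]; then
  main
fi
```

If the storm has not stopped after the pause, raise `WAIT_SECONDS` and run the script again.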

NOTE: This issue has been observed on the Azure platform.

Environment

Product Version: 2.10
OS: Ubuntu

Resolution

Xenial stemcell 621.151 has been removed from Tanzu Network because of a separate issue with persistent disk attachment (KB article). If you upgrade the stemcell to 621.154 on the Azure platform, you may still hit the "device change" storm. Either revert to an earlier stemcell version or apply the workaround described above.

The Linux kernel bug has been fixed, and an upcoming Xenial stemcell will include the fix. Watch the stemcell release notes for updates.