Spurred by Matty’s blog entry on Monitoring system utilization on Linux hosts, I decided to take a fresh look at SystemTap. SystemTap is a way for users to tap into the Linux kernel to see what’s going on at any given point, and resembles DTrace. One way to think about SystemTap is as a global strace with a built-in scripting language.
Installing SystemTap
Thankfully, SystemTap is now easy to load into most Linux distributions. For Ubuntu, run:
sudo apt-get install systemtap sudo apt-get install linux-image-debug-generic sudo ln -s /boot/vmlinux-debug-$(uname -r) /lib/modules/$(uname -r)/vmlinux
If you’re a Fedora user, try:
yum install systemtap kernel-devel yum --enablerepo=fedora-debuginfo --enablerepo=updates-debuginfo install kernel-debuginfo
Who’s using my disk I/O?
One of my favorite (now retired) interview questions for UNIX administrators is “iostat shows a lot of write activity to a device. How do you determine what program is responsible for writing the most data to this device?”. Before SystemTap, the only way to answer this under Linux was to install kernel counter patches, or try to parse block_dump. Using the disktop script from the SystemTap Scripts & Tools page, it’s trivial to find this kind of information out:
% stap disktop.stp
[sudo] password for jars:
Wed Nov 28 03:08:51 2007 , Average: 204Kb/sec, Read: 919Kb, Write: 100Kb
UID PID PPID CMD DEVICE T BYTES
1000 18668 18664 firefox-bin sda1 R 633140
1000 6900 6849 metacity sda1 R 229200
1000 18668 18664 firefox-bin sda1 W 103405
1000 18664 18652 run-mozilla.sh sda1 R 11196
1000 18659 18658 ls sda1 R 10272
1000 6903 6849 gnome-panel sda1 R 9482
1000 18652 1 firefox sda1 R 5960
1000 18653 18652 dirname sda1 R 3290
1000 18654 18652 basename sda1 R 3290
1000 18655 18652 pwd sda1 R 3290
The disktop.stp script is about 60 lines long. This is a bit hefty to start with as your first script, but imagine how many lines of C it would take to get the same data!
gettimeofday() tracking
If you look at the most common system calls executed on your system (trivial to do in SystemTap, btw), you’ll note that gettimeofday() is called an awful lot. Someone a few weeks ago was complaining to me that a recent Linux kernel upgrade made a database drastically slower due to a change in how gettimeofday() was handled. Evidently this application called gettimeofday() hundreds of times a second. So, I decided to checkout what applications were calling gettimeofday() on my Ubuntu instance. Using the SystemTap Documentation, I was able to whip something up in a few minutes:
% stap -e 'probe kernel.function("sys_gettimeofday").return { \
printf("%d\t%s\n",pid(),execname()); }'
Running this shows the following output:
... 6959 gnome-screensav 6959 gnome-screensav 6959 gnome-screensav 6959 gnome-screensav 6609 Xorg 6609 Xorg ...
As you can see, I set a trap for when any gettimeofday() system call returns, and output the name and pid of the program. What I really wanted was a count of how many times an application called gettimeofday() over a short period of time. I only care about programs that call it more than once a second. So, what I did was turn it into a real script:
global progs
probe kernel.function("sys_gettimeofday").return {
progs[execname()]++
}
probe timer.ms(5000) {
exit()
}
probe end {
log("PROGRAM\t\tCOUNT\tCOUNT/SEC")
foreach (prog+ in progs) {
rate = progs[prog] / 5
if (rate > 1) {
printf("%s\t%d\t%d\n", prog, progs[prog], rate)
}
}
}
This time things are a little more complex. We define a global associative array named “progs”, which is keyed off of the name of the program. Each time gettimeofday() returns, we increment progs[execname]. After 5 seconds, we exit, and display the list of programs and their associated counts that exceeded 1 call per second. This is what it looks like when executed:
% stap timeofday.stp PROGRAM COUNT COUNT/SEC Xorg 17 3 firefox-bin 21 4 gnome-panel 15 3 gnome-screensav 38 7 gnome-terminal 27 5 vmware-user 428 85
While vmware-user might call it quite a bit, it doesn’t look like anything on this system is abusing it. Lets move on.
Tracking bad stat() calls
If you’ve ever strace’d a python program, you’ll often notice dozen’s of stat() calls for library files which don’t exist. While stat() is cheap on local disks, it can be a real drag when NFS mounted resources are in use. For kicks, I decided to code something up that reported on stat() calls which returned an error. For whatever reason, getting return code’s for system calls is not very well documented under SystemTap. Here goes my bad-stats.stp script:
global last_stat
probe syscall.*stat {
last_stat[pid()]=argstr
}
probe syscall.*stat.return {
if (returnstr(1) != "0") {
printf ("%s(%d) %s(%s) -> %s\n", execname(), pid(),
probefunc(), last_stat[pid()], returnstr(1))
}
}
probe timer.ms(10000) {
exit()
}
If the code smells bad, it’s because it is. The reason I use *stat is because stat did not work (lstat? stat64?). The only time I can get the arguments to a system call appears to be at it’s inception. The only time I can get the error code to a system call appears to be at it’s return. So, I have an associative array that stores the last arguments passed to *stat() by the pid. Though unlikely, this assumption of the return corresponding to the last stat opened could possibly entertain a race condition.
I also could not figure out how to get an integer for the return code, which is why I compare returnstr(1) to “0″ instead of an integer comparison. In any case, it seems to work:
% stap bad-stats.stp
lsb_release(19298) sys_stat64("/usr/lib/python2.5/lib-tk/_struct", 0xbf9d84c4)
-> -2 (ENOENT)
lsb_release(19298) sys_stat64("/usr/lib/python2.5/lib-dynload/_struct", 0xbf9d84c4)
-> -2 (ENOENT)
lsb_release(19298) sys_stat64("/usr/bin/commands", 0xbf9e2154) -> -2 (ENOENT)
lsb_release(19298) sys_stat64("/usr/lib/python2.5/commands", 0xbf9e2154)
-> -2 (ENOENT)
sh(19299) sys_stat64("/sbin/apt-cache", 0xbfc2a78c) -> -2 (ENOENT)
sh(19299) sys_stat64("/bin/apt-cache", 0xbfc2a78c) -> -2 (ENOENT)
sh(19299) sys_stat64("/usr/sbin/apt-cache", 0xbfc2a78c) -> -2 (ENOENT)
I’ve only scratched the surface with SystemTap. For some real serious fun, checkout the histogram generation functions (@hist_linear, @hist_log) in the language reference.
Add New Comment
Thanks. Your comment is awaiting approval by a moderator.
Do you already have an account? Log in and claim this comment.
Add New Comment
Trackbacks
(Trackback URL)