KVM guest reports errors – How to troubleshoot or track it ? systemtap ?

I have seen reports saying that – under some scenarios, the KVM guest system reports I/O errors
in its ‘dmesg’ against virtio block devices and also, some times it cause the filesystems inside the guest to go ‘READONLY’ mode.

Tracking such issues are bit difficult and its a pain. I believe we should make use of different
profiling tools to find out whats happening behind the scenes.

I like systemtap for such issues and I will describe a particular scenario in this article.

Below is one of the excerpt [where we are almost sure about the return code] which you may see in such situations.

While going through the ‘qemu-kvm’ code path, we can see 2 tail functions which can reveal or give some more
data on this error message.

One being ‘virtio_blk_handle_write()’ (caller) and other being ‘virtio_blk_rw_complete()’ (callee)

The virtio block request will be processed by backend functions and these functions are called at almost end of it.

Mostly when guest report i/o errors, the virtio_blk_rw_complete() would have received ‘EIO’ ( Error code : -5)
or an Error from virtio_blk_handle_write() function.

As you can see in the below code snippet 1, there are 2 places where the caller pass ‘EIO’ to the callee.

[snip 1]

So above will only pass “EIO” to the callee, its triggered as soon as the ‘request’ is invalid.

Lets see second part of the same func()

[snip 2]

I am more interested in blkreq[*num_writes].cb = virtio_blk_rw_complete;

This is the callback from the block layer, so I am interested to see the ‘error’ returned from it.

I have written a system tap script to track more information about this scenario when this issue happens.

For general availability, I have pushed this to my github account :

github.com/humblec/qemu-kvm-stap

The output would look similar to below:

As you can see above, the ‘Return Code Received” would be something other than ‘0’ when issue happens.

Rest of the fields are self explanatory I feel.

Please shoot your questions, if you have any..

Make sure you try this sytemtap, before applying in any critical servers/situations..
Also, play it with your own risk.. ūüôā

How-ever let me know if I can help you with..

ps# Thanks to ‘fche’ ( web.elastic.org/~fche/ ) for some systemtap foo.

linux system call error codes ( errnos) to human readable error strings..

 

 

Q) ¬† “I got an ‘errno’ returned from the system call and want to map to the exact error message or to the error string which is human readable..

How can I do that ??

This was answered quite few times, how-ever I forgot to share it here, but not this time.

Here is the complete listing of linux error code numbers with its explanation:

 

Why I pasted all these and From where I got it ?

1) ( Why )=> Ans: As a quick reference

2) ( From where) => ‘Errnos’ upto “35” is available in /usr/include/asm-generic/errno-base.h and >35 is available in /usr/include/asm-generic/errno.h

 

A quick start (beginners guide) on system tap and its usage!!

Yes, instead of pointing to urls which slowly conclude the things, I will give a quick start here:

Why a quick start/read:? 

At times it is not a good choice for some people¬† to go and read all the document and get it done, rather they would¬† like to know the basic syntax/usage and a quick overview, so the post ūüôā

The system tap scripts mainly composed with below notation:

probe <event> <handler>

Probes may be broadly classified into “synchronous” and “asynchronous”.¬† A “synchronous” event is deemed to occur when any processor exe‚Äź
cutes an instruction matched by the specification.¬† This gives these probes a reference point (instruction address) from which more¬† con‚Äź
textual data may be available.¬† Other families of probe points refer to “asynchronous” events such as timers/counters rolling over, where
there is no fixed reference point that is related.¬† Each probe point specification may match multiple locations (for example, using wild‚Äź
cards  or  aliases),  and  all them are then probed.  A probe declaration may also contain several comma-separated specifications, all of
which are probed.

EVENT can be any of the following type..
—————————————————————
kernel.function or kernel.statement
process.statement
timer.ms
begin
end
(tapset) aliases

HANDLER
—————————————————————–

filtering/conditionals ( if.. next)
control structures (foreach, while)
Helper functions: pid,execname, log

Below are examples of ‘valid’ probe points..

¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† kernel.function(“foo”).return
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† process(“/bin/vi”).statement(0x2222)
              end
              syscall.*
              syscall.*.return.maxactive(10)
              sys**open
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† kernel.function(“no_such_function”) ?
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† module(“awol”).function(“no_such_function”) !
              signal.*? if (switch)
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† kprobe.function(“foo”)

For ex:

     To trace entry and exit from a function, use a pair of probes:
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† probe kernel.function(“sys_mkdir”) { println (“enter”) }
¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬†¬† probe kernel.function(“sys_mkdir”).return { println (“exit”) }

Note, expressions like ‘*’, ‘?’ ..etc , those are totally valid and much helpful in certain scenarios.

HOW TO RUN:
———————————————
SystemTap scripts are run through the command ‘stap’. stap can run SystemTap scripts from standard input or from file. System tap have 5 level of execution.

Below is a list of commonly used stap options:

-v
Makes the output of the SystemTap session more verbose. You can repeat this option (for example, stap -vvv script.stp) to provide more details on the script’s execution. This option is particularly useful if you encounter any errors in running the script.

-o filename
Sends the standard output to file (filename).
-S size,count
Limit files to size megabytes and limit the the number of files kept around to count. The file names will have a sequence number suffix. This option implements logrotate operations for SystemTap.

-x process ID
Sets the SystemTap handler function target() to the specified process ID. For more information about target(), refer to SystemTap Functions.

-c ‘command’
Sets the SystemTap handler function target() to the specified command and runs the SystemTap instrumentation for the duration of the specified command. For more information about target(), refer to SystemTap Functions.

-e ‘script’
Use script string rather than a file as input for systemtap translator.

-F
Use SystemTap’s Flight recorder mode and make the script a background process. For more information about flight recorder mode, refer to Section 2.3.1, ‚ÄúSystemTap Flight Recorder Mode‚ÄĚ.

stap -e “script” -c “target program”
stap script.stp -c “target program”

Using “-p4” you can create a system tap module as shown below;

¬†¬†¬†¬†¬†¬†¬† $ stap -p4 -e ‘probe begin { printf(“Hello World!\n”); exit() }’
/home/user/.systemtap/cache/85/stap_8553d83f78c_265.ko

Run staprun with the pathname to the module as an argument.

        $ staprun /home/user/.systemtap/cache/stap_8553d83f78c_265.ko
Hello World!

Now if you want to know more , please refer #

sourceware.org/systemtap/SystemTap_Beginners_Guide/

How to create a system tap module from a system tap script ?

Yes, simple , Isn’t it ? but it may not be simple for someone who just started with it, so the post

You could try something like below to generate a module

# stap -k -p4 script.stp

  -k     Keep  the  temporary directory after all processing.  This may be useful in order to examine the generated C code, or to reuse the
compiled kernel object.

The temp directory will be called some thing like below in “tmp” fileststem

“/tmp/stapKaKM2c”

-p4 , does “4th level” of pasrsing which end up in module creation..

Also check “-m” option of stap.

Dropwatch to see where the packets are dropped in kernel stack ..

Ah…… Why I forgot to write about dropwatch ?

“Packets are dropped” , these strings are quite common in IT world, but how to prove it and where it is dropped ?

Till now, people were behind ‘netstat’ ( for ex:netstat -s ) utiilty to see it , but retrieving more information on where it is dropped was still a mystery for most of the people . Dropwatch helps here.
Dropwatch is a project drived by Neil Horman and in his own words, “to improve the visibility developers and sysadmins have into the Linux networking stack. Specifically I am aiming to improve our ability to detect and understand packets that get dropped within the stack. I’ve spent some time talking with many people about what they see as shorcommings in this area, and have come away with 4 points:

Consolidation: Finding dropped packets in the network stack is currently very fragmented. There are numerous statistics proc files and other utilities that need to be consulted in order to have a full view of what packets are getting dropped within the stack. Consolidating all these utilities into one place is very helpful

Clarity: Understanding which statistics and utility outputs correlate to actual dropped packets requres a good deal of knoweldge. Being able to simplify the ability to recognize a dropped packet is helpful

Disambiguation: There is a gap between the recognition of a dropped packet and its root cause. Several statistics can be incremented at multiple points in the kernel, and sometimes for multiple reasons. Being able to point out, with specificity where and why a packet was dropped decreases the time it takes for a admin or developer to correct the problem.

Performance: Checking the current user space utilities and stats for dropped packets is currently an exercise in polling. Its performance is sub-optimal and makes sysadmins hesitant to implement investigations on production systems due to potential performance impact. Improving performance would make admins more likely to use the tools to diagnose the problems.

Normally, monitoring for dropped packets requires the creation of a script that periodically polls all the aformentioned interfaces, checking for a change in various counter values. Dropwatch instead listens on a netlink socket for the kernel to inform userspace (apps like dropwatch and any others), that a packet has been dropped. This of course implies that the kernel has some sort of functionality to this end.”

The dropwatch behaviour can be mimic’d by placing probes on ‘kfree_skb’ and printing ‘symnames’ and ‘count’ as seen here www.sourceware.org/systemtap/examples/network/dropwatch.stp

Dropwatch is available in fedora and can be used to find out dropped packets :

More details can be found in below Urls:

fedorahosted.org/dropwatch/

How to use it:

I had installed dropwatch package in my Fedora-16 system :

#yum install dropwatch.

Once it is installed, you can use dropwatch with any of the swtiches mentioned here:

So, here onwards use it and have a good visibility at packet drop..

References:

fedorahosted.org/dropwatch/

www.sourceware.org/systemtap/examples/network/dropwatch.stp