Monday, January 21, 2013

Linux High CPU Waiting

The content below is quoted from http://www.chileoffshore.com/en/interesting-articles/126-linux-wait-io-problem


To determine which processes are responsible for high CPU I/O wait:

# vmstat 2
# ps auxf
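
The "wa" column that vmstat prints comes from the iowait counter in /proc/stat. As a rough sketch of how that percentage can be derived by hand, the helper below (iowait_pct is a name made up for this example) takes two samples of the aggregate "cpu" line and prints the share of time spent in iowait between them. It assumes the standard field order user, nice, system, idle, iowait, irq, softirq, steal.

```shell
# Print the iowait percentage between two samples of the aggregate "cpu"
# line from /proc/stat. Expects two such lines on stdin, oldest first.
iowait_pct() {
    awk '$1 == "cpu" {
        # Sum user..steal (fields 2-9); guest time is already part of user.
        total = 0
        for (i = 2; i <= 9 && i <= NF; i++) total += $i
        # From the second sample on, print the iowait share of the interval.
        if (n++ && total > prev_total)
            printf "%.1f\n", 100 * ($6 - prev_wait) / (total - prev_total)
        prev_wait = $6
        prev_total = total
    }'
}
```

On a live system it could be fed like this: { grep '^cpu ' /proc/stat; sleep 2; grep '^cpu ' /proc/stat; } | iowait_pct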

The most important column here is STAT, whose values mean the following:
       D    Uninterruptible sleep (usually IO)
       R    Running or runnable (on run queue)
       S    Interruptible sleep (waiting for an event to complete)
       T    Stopped, either by a job control signal or because it is being traced.
       W    paging (not valid since the 2.6.xx kernel)
       X    dead (should never be seen)
       Z    Defunct ("zombie") process, terminated but not reaped by its parent.
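So, as listed above, a process whose STAT is "D" is blocked in an uninterruptible sleep, usually waiting for disk I/O. It is not burning CPU cycles itself, but it cannot be interrupted or killed until the I/O completes, and the time the CPU sits idle waiting on it is reported as I/O wait (the "wa" column in vmstat). If such processes are always present, your Linux box spends its time waiting on I/O and becomes slow to respond to other commands.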
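
For a quick overview of how many processes sit in each state, the STAT values can be tallied by their first letter. This is only a sketch: count_states is a hypothetical helper name, and it assumes procps ps, where "-o stat=" prints the STAT column without a header.

```shell
# Count processes by the first letter of their STAT value.
# Reads one STAT value per line on stdin (e.g. "D", "Ss", "R+")
# and prints "<state> <count>" pairs, sorted by state letter.
count_states() {
    cut -c1 | sort | uniq -c | awk '{ print $2, $1 }'
}
```

Usage on a live system: ps -eo stat= | count_states
A count next to "D" that stays above zero is a hint that something is stuck on I/O.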
To nail down which processes are stuck waiting on I/O, you can use this command (the pattern also matches STAT values such as "D+" or "Ds"):
# while true; do date; ps auxf | awk '$8 ~ /^D/'; sleep 1; done
Tue Aug 23 20:03:42 CLT 2011
Tue Aug 23 20:03:43 CLT 2011
root       321  0.0  0.0      0     0 ?        D    May22   4:11  \_ [jbd2/dm-0-8]
Tue Aug 23 20:03:44 CLT 2011
Tue Aug 23 20:03:45 CLT 2011
Tue Aug 23 20:03:46 CLT 2011
...
Tue Aug 23 20:03:47 CLT 2011
Tue Aug 23 20:03:53 CLT 2011
Tue Aug 23 20:03:54 CLT 2011
root       302  0.0  0.0      0     0 ?        D    May22   2:58  \_ [kdmflush]
root       321  0.0  0.0      0     0 ?        D    May22   4:11  \_ [jbd2/dm-0-8]
Tue Aug 23 20:03:55 CLT 2011
Tue Aug 23 20:03:56 CLT 2011
Tue Aug 23 20:03:57 CLT 2011
Tue Aug 23 20:03:58 CLT 2011
Tue Aug 23 20:03:59 CLT 2011
root       302  0.0  0.0      0     0 ?        D    May22   2:58  \_ [kdmflush]
root       321  0.0  0.0      0     0 ?        D    May22   4:11  \_ [jbd2/dm-0-8]
Tue Aug 23 20:04:00 CLT 2011
Tue Aug 23 20:04:01 CLT 2011
Tue Aug 23 20:04:02 CLT 2011
From the result above, you can see two processes in state D that are driving the I/O wait: kdmflush and jbd2/dm-0-8.
You can also use the following command to monitor these two processes:
# while true; do ps auxf | awk '$8 ~ /^D/' | grep -E "(jbd2/dm|kdmflush)"; sleep 1; done
root       302  0.0  0.0      0     0 ?        D    May22   2:58  \_ [kdmflush]
root       321  0.0  0.0      0     0 ?        D    May22   4:11  \_ [jbd2/dm-0-8]
root       321  0.0  0.0      0     0 ?        D    May22   4:11  \_ [jbd2/dm-0-8]
root       321  0.0  0.0      0     0 ?        D    May22   4:11  \_ [jbd2/dm-0-8]
root       321  0.0  0.0      0     0 ?        D    May22   4:11  \_ [jbd2/dm-0-8]
root       302  0.0  0.0      0     0 ?        D    May22   2:58  \_ [kdmflush]
root       321  0.0  0.0      0     0 ?        D    May22   4:11  \_ [jbd2/dm-0-8]
root       302  0.0  0.0      0     0 ?        D    May22   2:58  \_ [kdmflush]
root       321  0.0  0.0      0     0 ?        D    May22   4:11  \_ [jbd2/dm-0-8]
As you can see, these two processes are responsible for the I/O wait on your Linux server.
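
To see what a D-state process is actually blocked on, it can help to print the kernel wait channel next to it. A minimal sketch, assuming procps ps (which supports the pid, stat, wchan and args output fields); dstate_rows is a hypothetical helper name:

```shell
# Keep the header line plus every row whose STAT field (column 2)
# starts with "D". Expects `ps -eo pid,stat,wchan:32,args` output on stdin.
dstate_rows() {
    awk 'NR == 1 || $2 ~ /^D/'
}
```

Usage on a live system: ps -eo pid,stat,wchan:32,args | dstate_rows
The WCHAN column then shows the kernel function each process is sleeping in.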



Solution


First of all, the cause of high WA is not always the same, but the fix always starts with the processes whose STAT is D. In this case both of them point at the disk journal: jbd2 is the kernel's journaling thread and kdmflush flushes device-mapper I/O. So the configuration of the "Journal Disk" should be reconsidered. If the server is a development machine, using a journal to protect the hard disk is not recommended. If the server is a production server, some kind of RAID should be used to protect against disk failure.
So my recommendation is to turn the journal off either way.
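
Before changing anything, it is worth confirming that the filesystem really has a journal. A small sketch, assuming an ext3/ext4 filesystem and the tune2fs tool from e2fsprogs; has_journal_feature is a made-up helper name and /dev/mapper/VG-LV is a placeholder for your own device:

```shell
# Succeeds (exit 0) if the "Filesystem features:" line of `tune2fs -l`
# output, read from stdin, lists has_journal; fails (exit 1) otherwise.
has_journal_feature() {
    awk -F: '/^Filesystem features:/ {
                 found = ($2 ~ /(^|[ \t])has_journal([ \t]|$)/)
             }
             END { exit !found }'
}
```

Usage: tune2fs -l /dev/mapper/VG-LV | has_journal_feature && echo "journal enabled"
Note that actually removing the journal (tune2fs -O ^has_journal) requires the filesystem to be unmounted or mounted read-only, so plan for downtime and a backup first.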


Other References:

http://serverfault.com/questions/236836/kjournald-reasons-for-high-usage
 


