To determine which processes are responsible for high CPU I/O wait (the wa column in vmstat), run:
# vmstat 2
# ps auxf
The most important column for us here is STAT, whose codes mean the following:
D  Uninterruptible sleep (usually IO)
R  Running or runnable (on run queue)
S  Interruptible sleep (waiting for an event to complete)
T  Stopped, either by a job control signal or because it is being traced.
W  paging (not valid since the 2.6.xx kernel)
X  dead (should never be seen)
Z  Defunct ("zombie") process, terminated but not reaped by its parent.
So, as the list above shows, a process whose STAT is "D" is in
uninterruptible sleep: it is blocked waiting for I/O and cannot be
interrupted, not even by a signal. It uses almost no CPU itself, but the
time the CPU sits idle waiting on it is counted as I/O wait. If such
processes are always there, your Linux box spends its time waiting on I/O
and becomes slow to respond to any other command.
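To put a number on the I/O wait, the counters in /proc/stat can be sampled twice and compared. The following is only a minimal sketch: the function names iowait_pct and iowait_live are made up for this example, and it assumes the usual field order of the aggregate "cpu" line (user, nice, system, idle, iowait, irq, softirq, steal).

```shell
#!/bin/sh
# iowait_pct BEFORE AFTER   (hypothetical helper for this sketch)
# Each argument is one aggregate "cpu ..." line taken from /proc/stat.
# Prints the percentage of CPU time spent in iowait between the two samples.
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            # Sum user, nice, system, idle, iowait, irq, softirq, steal.
            total = 0
            for (i = 2; i <= 9 && i <= NF; i++) total += $i
            t[NR] = total
            w[NR] = $6          # field 6 is the iowait counter
        }
        END {
            dt = t[2] - t[1]
            if (dt <= 0) { print "0.0"; exit }
            printf "%.1f\n", 100 * (w[2] - w[1]) / dt
        }'
}

# iowait_live: take two samples two seconds apart on a live Linux system,
# the same window as "vmstat 2".
iowait_live() {
    before=$(grep '^cpu ' /proc/stat)
    sleep 2
    after=$(grep '^cpu ' /proc/stat)
    iowait_pct "$before" "$after"
}
```

Calling iowait_live prints a single percentage, which should roughly track the wa column that vmstat shows for the same interval.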
To pin down which processes are causing the I/O wait, you can use this command:
# while true; do date; ps auxf | awk '{if($8=="D") print $0;}'; sleep 1; done
Tue Aug 23 20:03:42 CLT 2011
Tue Aug 23 20:03:43 CLT 2011
root 321 0.0 0.0 0 0 ? D May22 4:11 \_ [jbd2/dm-0-8]
Tue Aug 23 20:03:44 CLT 2011
Tue Aug 23 20:03:45 CLT 2011
Tue Aug 23 20:03:46 CLT 2011
...
Tue Aug 23 20:03:47 CLT 2011
Tue Aug 23 20:03:53 CLT 2011
Tue Aug 23 20:03:54 CLT 2011
root 302 0.0 0.0 0 0 ? D May22 2:58 \_ [kdmflush]
root 321 0.0 0.0 0 0 ? D May22 4:11 \_ [jbd2/dm-0-8]
Tue Aug 23 20:03:55 CLT 2011
Tue Aug 23 20:03:56 CLT 2011
Tue Aug 23 20:03:57 CLT 2011
Tue Aug 23 20:03:58 CLT 2011
Tue Aug 23 20:03:59 CLT 2011
root 302 0.0 0.0 0 0 ? D May22 2:58 \_ [kdmflush]
root 321 0.0 0.0 0 0 ? D May22 4:11 \_ [jbd2/dm-0-8]
Tue Aug 23 20:04:00 CLT 2011
Tue Aug 23 20:04:01 CLT 2011
Tue Aug 23 20:04:02 CLT 2011
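Reading such a scrolling log by eye gets tedious, so the samples can also be tallied per process. This is a rough sketch rather than a finished tool: the helper names dstate_lines, tally and sample_dstate are invented for illustration, and it matches any STAT that starts with D (so "D<" or "D+" count as well), which is slightly broader than the exact match used above.

```shell
#!/bin/sh
# dstate_lines: read "STAT PID COMMAND" lines on stdin and print
# "PID COMMAND" for every process whose state starts with D.
dstate_lines() {
    awk '$1 ~ /^D/ { $1 = ""; sub(/^ +/, ""); print }'
}

# tally: count identical lines, most frequent first.
tally() {
    sort | uniq -c | sort -rn
}

# sample_dstate N: take N one-second samples of the process table
# (default 10) and print how often each process was seen in state D.
sample_dstate() {
    n=${1:-10}
    i=0
    while [ "$i" -lt "$n" ]; do
        ps -e -o stat= -o pid= -o comm= | dstate_lines
        i=$((i + 1))
        sleep 1
    done | tally
}
```

Running sample_dstate 20, for example, samples for about twenty seconds; the processes that are blocked on I/O most often end up at the top of the list.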
From the result above, you can see that two processes keep showing up in state D and are driving the I/O wait: kdmflush and jbd2/dm-0-8.
You can also use the following command to monitor just these two processes:
# while true; do ps auxf | awk '$8 ~ /^D/' | grep -E "(jbd2/dm|kdmflush)"; sleep 1; done
root 302 0.0 0.0 0 0 ? D May22 2:58 \_ [kdmflush]
root 321 0.0 0.0 0 0 ? D May22 4:11 \_ [jbd2/dm-0-8]
root 321 0.0 0.0 0 0 ? D May22 4:11 \_ [jbd2/dm-0-8]
root 321 0.0 0.0 0 0 ? D May22 4:11 \_ [jbd2/dm-0-8]
root 321 0.0 0.0 0 0 ? D May22 4:11 \_ [jbd2/dm-0-8]
root 302 0.0 0.0 0 0 ? D May22 2:58 \_ [kdmflush]
root 321 0.0 0.0 0 0 ? D May22 4:11 \_ [jbd2/dm-0-8]
root 302 0.0 0.0 0 0 ? D May22 2:58 \_ [kdmflush]
root 321 0.0 0.0 0 0 ? D May22 4:11 \_ [jbd2/dm-0-8]
As you can see, these two processes are responsible for the I/O wait on your Linux server.
Solution
First of all, the cause of high WA is not always the same, but the place to look is always the processes whose STAT is D. In this case jbd2 is the filesystem journaling thread, so the configuration of the journal should be reconsidered. If the server is a development machine, it is not recommended to use a journal. If the server is a production server, some kind of RAID should be used to protect against disk failure.
So my recommendation is to turn the journal off in either case.
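Before changing anything, it is worth checking whether the filesystem really has a journal. The sketch below assumes an ext3/ext4 filesystem and the e2fsprogs tools; has_journal is a helper name made up here, and /dev/mapper/EXAMPLE is a placeholder, not a real device. Keep in mind that without a journal the filesystem needs a full fsck after an unclean shutdown.

```shell
#!/bin/sh
# has_journal: read the output of "tune2fs -l" on stdin and succeed
# if the has_journal feature is listed (hypothetical helper).
has_journal() {
    grep '^Filesystem features:' | grep -qw 'has_journal'
}

# Usage, with a placeholder device name:
#   tune2fs -l /dev/mapper/EXAMPLE | has_journal && echo "journal is enabled"
#
# Removing the journal, with the filesystem unmounted, then checking it:
#   tune2fs -O ^has_journal /dev/mapper/EXAMPLE
#   e2fsck -f /dev/mapper/EXAMPLE
```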
Other References:
http://serverfault.com/questions/236836/kjournald-reasons-for-high-usage