Kiến thức về CPU trong linux

Các thông số của CPU khi sử dụng lệnh top
Cpu(s): 0.1%us, 0.2%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
0.1%us CPU đang sử dụng 0.1% năng lực xử lý cho running user space processes, tức các tiến trình không thuộc về hoạt động của kernel như shell, các trình biên dịch, database, webserver, và các chương trình khác. Nếu CPU không ở trạng thái nghỉ, idle thì thường phần lớn năng lực của nó đang dùng cho các tiến trình user như thế này.

99.7%id Con số này nói cho chúng ta biết, CPU dành 99.7% thời gian của nó để nghỉ ngơi trong lần lấy mẫu gần nhất, không có gì để làm. Tổng của các con số us, id, ni càng gần 100% càng tốt, nếu không, có lẽ có gì đó sai sai đang xảy ra.
0.2%sy Con số này là lượng thời gian mà CPU dành cho xử lý của kernel, tất cả các process và tài nguyên hệ thống được phụ trách bởi linux kernel. Khi các tiến trình của người dùng (user space processes) cần cái gì đó từ hệ thống như xin cấp phát bộ nhớ, đọc ghi, tạo thêm tiến trình con, thì kernel sẽ làm những việc này. Trong thực tế, kernel sẽ tự đặt lịch, quyết định tiến trình nào sẽ chạy trước, cái nào chạy sau. Con số CPU dành cho phần này càng ít càng tốt, nếu nhiều hơn, có lẽ có vấn đề gì đó như I/O đang tăng cao quá mức.
0.0%ni Con số này phần trăm CPU sử dụng để thay đổi độ ưu tiên các process.
0.0%wa Con số thể hiện %CPU đang dành cho việc ngồi đợi I/O
0.0%hi Con số thể hiện %CPU dùng cho việc xử lý phần cứng bị gián đoạn. Phần cứng gián đoạn từ nhiều phạm vi khác nhau như ổ đĩa lởm, đường truyền mạng tậm tịt, etc, gây ra gián đoạn những gì CPU đang xử lý.
0.0%si Con số thể hiện %CPU dùng cho việc xử lý khi có phần mềm bị gián đoạn. Không thường xảy ra ở mức CPU, thường xảy ra từ lớp kernel trở lên.
0.0%st COn số này dành cho các máy chủ ảo, khi linux chạy trong máy ảo, trên lớp hypervisor, số st ( short for stolen) thể hiện bao nhiêu CPU đã sử dụng để đợi thằng hypervisor, khi nó phục vụ tài nguyên cho các con CPU khác. Con số này bắt nguồn từ thực tế rằng, có nhiều processor ảo dùng chung processor vật lý với nhau.

Một số vấn đề với các con số trên: (để nguyên tiếng anh)

High user mode – If a system suddenly jumps from having spare CPU cycles to running flat out, then the first thing to check is the amount of time the CPU spends running user space processes. If this is high then it probably means that a process has gone crazy and is eating up all the CPU time. Using the top command you will be able to see which process is to blame and restart the service or kill the process.

High kernel usage – Sometimes this is acceptable. For example a program that does lots of console I/O can cause the kernel usage to spike. However if it remains higher for long periods of time then it could be an indication that something isn’t right. A possible cause of such spikes could be a problem with a driver/kernel module.

High niced value – If the amount of time the CPU is spending running processes with a niced priority value jumps then it means that someone has started some intensive CPU jobs on the system, but they have niced the task.
If the niceness level is greater than zero then the user has been courteous enough lower to the priority of the process and therefore avoid a CPU overload. There is probably little that needs to be done in this case, other than maybe find out who has started the process and talk about how you can help out!
But if the niceness level is less than 0, then you will need to investigate what is happening and who is responsible, as such a task could easily cripple the responsiveness of the system.

High waiting on I/O – This means that there are some intensive I/O tasks running on the system that don’t use up much CPU time. If this number is high for anything other than short bursts then it means that either the I/O performed by the task is very inefficient, or the data is being transferred to a very slow device, or there is a potential problem with a hard disk that is taking a long time to process reads & writes.

High interrupt processing – This could be an indication of a broken peripheral that is causing lots of hardware interrupts or of a process that is issuing lots of software interrupts.

Large stolen time – Basically this means that the host system running the hypervisor is too busy. If possible, check the other virtual machines running on the hypervisor, and/or migrate to your virtual machine to another host.

Thẻ: Kiến thức về CPU trong linux

Các thông số của CPU trên linux