Speeding up ps and top
This talk presents task_diag, a new interface that we developed for getting information about processes.
Currently, the /proc file system is used to get information about the processes running on a system. All information is presented as text files, which is convenient for humans but not for programs such as ps and top. The text format incurs significant delays, especially on systems with lots of containers running, which is frequently the case nowadays.
Ideally, tools such as top and ps would get information in a binary format and have a flexible way to specify which kinds of information are required and for which tasks. We present task_diag, a new interface with all of these features.
task_diag is based on netlink sockets and resembles socket-diag, which is used to get information about sockets. It uses the request-response model. A request specifies a set of processes and the properties required for them. A response contains the requested information and can be split across several netlink packets if it is too long.
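The request-response framing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 16-byte header layout, NLM_F_REQUEST, NLM_F_MULTI and NLMSG_DONE are the standard netlink ones, but the TASK_DIAG_MSG message type and the (pid, ppid) payload record are hypothetical and are not the actual task_diag wire format. No socket is opened; the reply is built in memory to show how a response split across several packets is reassembled.

```python
import struct

# struct nlmsghdr: length, type, flags, sequence number, port id
NLMSG_HDR = struct.Struct("=IHHII")
NLM_F_REQUEST = 0x1
NLM_F_MULTI = 0x2      # message is part of a multi-part reply
NLMSG_DONE = 0x3       # terminates a multi-part reply
TASK_DIAG_MSG = 0x20   # hypothetical message type, for illustration only


def align4(n):
    """Netlink messages are padded to 4-byte boundaries."""
    return (n + 3) & ~3


def pack_msg(msg_type, flags, seq, payload=b""):
    """Build one netlink message: header, payload, alignment padding."""
    length = NLMSG_HDR.size + len(payload)
    msg = NLMSG_HDR.pack(length, msg_type, flags, seq, 0) + payload
    return msg + b"\0" * (align4(length) - length)


def iter_msgs(buf):
    """Yield (type, flags, seq, payload) for each message in one packet."""
    offset = 0
    while offset + NLMSG_HDR.size <= len(buf):
        length, msg_type, flags, seq, _pid = NLMSG_HDR.unpack_from(buf, offset)
        if length < NLMSG_HDR.size or offset + length > len(buf):
            break  # truncated or malformed message
        yield msg_type, flags, seq, buf[offset + NLMSG_HDR.size:offset + length]
        offset += align4(length)


def collect_response(packets):
    """Gather payloads from a reply until NLMSG_DONE or a single-part message."""
    payloads = []
    for packet in packets:
        for msg_type, flags, _seq, payload in iter_msgs(packet):
            if msg_type == NLMSG_DONE:
                return payloads
            payloads.append(payload)
            if not flags & NLM_F_MULTI:
                return payloads
    return payloads


# Hypothetical binary payload: pid and parent pid as two 32-bit integers.
RECORD = struct.Struct("=II")

request = pack_msg(TASK_DIAG_MSG, NLM_F_REQUEST, seq=1)
reply = [
    pack_msg(TASK_DIAG_MSG, NLM_F_MULTI, 1, RECORD.pack(1, 0))
    + pack_msg(TASK_DIAG_MSG, NLM_F_MULTI, 1, RECORD.pack(2, 1)),
    pack_msg(NLMSG_DONE, NLM_F_MULTI, 1),
]
tasks = [RECORD.unpack(p) for p in collect_response(reply)]
```

The point of the sketch is that the consumer decodes fixed-size binary records with a single unpack call per task and needs no text parsing, whatever the real attribute layout turns out to be.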
task_diag is much faster than the /proc file system. For example, when reading from /proc, ps opens, reads, and closes many files, and repeats this for every single process. With task_diag, it only sends a request and receives a response.
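To make the per-process cost concrete, here is a minimal sketch of the procfs access pattern. It is not the actual ps implementation: the function names scan_proc and parse_status are hypothetical, and the default choice of the stat and status files is an assumption made for illustration. The proc_root parameter exists only so that the sketch can be pointed at a test directory.

```python
import os


def scan_proc(proc_root="/proc", files=("stat", "status")):
    """Read the given per-process files for every PID directory.

    Returns (tasks, opens): tasks maps pid -> {filename: raw bytes},
    opens counts how many files were opened and read successfully.
    """
    tasks, opens = {}, 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories describe processes
        info = {}
        for name in files:
            try:
                # One open/read/close sequence per file, per process.
                with open(os.path.join(proc_root, entry, name), "rb") as f:
                    info[name] = f.read()
                opens += 1
            except OSError:
                continue  # the process may have exited during the scan
        tasks[int(entry)] = info
    return tasks, opens


def parse_status(raw):
    """Turn the text of a status file into a dict of string fields."""
    fields = {}
    for line in raw.decode(errors="replace").splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value.strip()
    return fields
```

With N processes and k files per process, the scan performs roughly N times k open/read/close sequences, and every field still has to be parsed out of text afterwards.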
Besides ps and top, the proposed interface is to be used by CRIU, a container checkpoint/restore and live migration mechanism. In addition, developers of the perf tool found that it could be useful to them and implemented a prototype, which shows a big performance improvement when task_diag is used instead of procfs.
Our performance measurements show that the ps tool runs at least four times faster when task_diag is used instead of procfs.