Android Performance Monitoring - CPU

Android performance monitoring needs to read basic CPU and memory data. This article records an implementation of collecting basic CPU information. After researching the options and testing a demo, reading the system-wide and per-process CPU data from the /proc pseudo-files at the C++ layer proved to be a workable approach.

The usual Java-side route, executing shell commands through Runtime.getRuntime().exec(), runs into permission restrictions. The Android platform is badly fragmented, and a feature whose per-version adaptation cost outweighs its benefit is not worth building. To stay compatible across Android versions while also keeping the monitor's own performance overhead low, dropping straight down to C++ is the simplest and most direct approach.

I. Parsing the system CPU data in detail

adb shell cat /proc/stat

cpu  1144049 459555 789659 4887508 6606 491 47150 0 0 0
cpu0 227914 104493 193950 4833754 5650 356 36381 0 0 0
cpu1 210497 114938 157488 7360 181 39 4317 0 0 0
cpu2 226087 118127 158548 5856 212 64 1290 0 0 0
cpu3 235205 116848 156808 5840 240 30 507 0 0 0
cpu4 62327 1232 31285 8640 72 0 1209 0 0 0
cpu5 63849 1155 31776 8699 117 1 1235 0 0 0
cpu6 59805 1389 29626 8675 82 1 1143 0 0 0
cpu7 58365 1373 30178 8684 52 0 1068 0 0 0
intr 109000163 0 0 0 6302256 0 3492058 0 0 0 1464 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 50732 402867 0 102 0 11347700 0 0 0 0 0 0 111902 817986 142140 10 115125 55002 1 13 0 0 0 0 135683 3185266 305460 929762 43320 352542 0 4910 0 17157074 0 0 0 30 0 0 0 99 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9068 44175 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 283 0 0 0 0 0 239712 0 0 0 0 23 8148 0 0 0 0 0 80 0 0 0 0 0 0 0 10 0 0 2 390205 0 1399 0 0 0 1345 0 0 1 2 14 0 0 0 5 4 0 0 0 7 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 13024096 0 1093
ctxt 170696008
btime 1582799381
processes 216427
procs_running 1
procs_blocked 0
softirq 47365292 12136 6295363 9691 235323 724848 9681 25322977 5878597 13480 8863196

cpu  1144049 459555 789659 4887508 6606 491 47150 0 0 0

(1) user（1144049）
Time spent in user mode.
(2) nice（459555）
Time spent in user mode with low priority (nice).
(3) system（789659）
Time spent in system mode.
(4) idle（4887508）
Time spent in the idle task. This value should be USER_HZ times the second entry in the
/proc/uptime pseudo-file.
(5) iowait（6606）(since Linux 2.5.41)
Time waiting for I/O to complete. This value is not reliable, for the following reasons:
1. The CPU will not wait for I/O to complete; iowait is the time that a task is waiting for
I/O to complete. When a CPU goes into idle state for outstanding task I/O, another task
will be scheduled on this CPU.
2. On a multi-core CPU, the task waiting for I/O to complete is not running on any CPU, so the
iowait of each CPU is difficult to calculate.
3. The value in this field may decrease in certain conditions.
(6) irq（491）(since Linux 2.6.0)
Time servicing interrupts.
(7) softirq（47150） (since Linux 2.6.0)
Time servicing softirqs.
(8) steal（0） (since Linux 2.6.11)
Stolen time, which is the time spent in other operating systems when running in a virtualized environment.
(9) guest（0） (since Linux 2.6.24)
Time spent running a virtual CPU for guest operating systems under the control of the Linux kernel.
(10) guest_nice（0） (since Linux 2.6.33)
Time spent running a niced guest (virtual CPU for guest operating systems under the control of the Linux kernel).

totalCPUTime = user + nice + system + idle + iowait + irq + softirq + steal + guest + guest_nice
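As an illustration of how these ten fields combine, the aggregate "cpu" line can be parsed with sscanf. This is a sketch only; total_cpu_time is a hypothetical helper name, not part of the implementation in section VI:

```c
#include <stdio.h>

// Sum the ten counters on the aggregate "cpu" line of /proc/stat.
// Returns -1 if the line does not match the expected format.
// (total_cpu_time is an illustrative name, not from the original code.)
long long total_cpu_time(const char *stat_line) {
    long long user, nice, system, idle, iowait, irq, softirq,
              steal, guest, guest_nice;
    if (sscanf(stat_line,
               "cpu %lld %lld %lld %lld %lld %lld %lld %lld %lld %lld",
               &user, &nice, &system, &idle, &iowait, &irq, &softirq,
               &steal, &guest, &guest_nice) != 10)
        return -1;
    return user + nice + system + idle + iowait + irq + softirq +
           steal + guest + guest_nice;
}
```

For the sample line above this returns 7335018 clock ticks.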

II. Computing the total system CPU usage

1) Compute CPU usage by sampling at two points in time: for example, read /proc/stat twice, one second apart.

• CPUT1 (user1, nice1, system1, idle1, iowait1, irq1, softirq1, steal1, guest1, guest_nice1);
• CPUT2 (user2, nice2, system2, idle2, iowait2, irq2, softirq2, steal2, guest2, guest_nice2);

2) Compute the total CPU time:

• CPUTime1 = user1 + nice1 + system1 + idle1 + iowait1 + irq1 + softirq1 + steal1 + guest1 + guest_nice1;
• CPUTime2 = user2 + nice2 + system2 + idle2 + iowait2 + irq2 + softirq2 + steal2 + guest2 + guest_nice2;
totalCPUTime = CPUTime2 - CPUTime1;

3) Compute the idle CPU time:

idleCPUTime = idle2 - idle1;

4) Compute the total CPU usage:

totalCPURate = (totalCPUTime - idleCPUTime) / totalCPUTime;
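Steps 1)-4) can be sketched as one small function. total_cpu_rate is a hypothetical name; the totals and idle values are assumed to have been parsed from the two /proc/stat samples already:

```c
// CPU usage between two /proc/stat samples.
// total1/total2: sum of all ten fields at T1 and T2; idle1/idle2: field (4).
// Normally returns a value in [0, 1]; returns -1.0 when no time elapsed
// between the samples, in which case the caller should resample.
// (total_cpu_rate is an illustrative name, not from the original code.)
double total_cpu_rate(long long total1, long long idle1,
                      long long total2, long long idle2) {
    long long totalCPUTime = total2 - total1;
    long long idleCPUTime  = idle2 - idle1;
    if (totalCPUTime <= 0)
        return -1.0;
    return (double)(totalCPUTime - idleCPUTime) / (double)totalCPUTime;
}
```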

TIPS

• On a multi-core CPU there is no need to multiply totalCPURate by the core count: the aggregate "cpu" line already includes the data of every core.
• The result occasionally comes out negative; in that case, keep sampling until the values are non-negative.
• Testing also produced readings such as 225% CPU usage; at that point the CPU is overloaded, and readings above 100% should be treated as overload outliers.

III. Parsing per-process CPU data

adb shell cat /proc/[pid]/stat

944 (navi.test) S 611 611 0 0 -1 1077952832 23348 0 102 0 54 14 0 0 10 -10 27 0 79497755 2505949184 24555 18446744073709551615 1 1 0 0 0 0 4612 0 34040 0 0 0 17 4 0 0 12 0 0 0 0 0 0 0 0 0 0

(1) pid %d （944）
The process ID.
(2) comm %s（navi.test）
The filename of the executable, in parentheses.
This is visible whether or not the executable is swapped out.
(3) state %c（S）
One of the following characters, indicating process
state:
R Running
S Sleeping in an interruptible wait
D Waiting in uninterruptible disk sleep
Z Zombie
T Stopped (on a signal) or (before Linux 2.6.33)
trace stopped
t Tracing stop (Linux 2.6.33 onward)
W Paging (only before Linux 2.6.0)
X Dead (from Linux 2.6.0 onward)
x Dead (Linux 2.6.33 to 3.13 only)
K Wakekill (Linux 2.6.33 to 3.13 only)
W Waking (Linux 2.6.33 to 3.13 only)
P Parked (Linux 3.9 to 3.13 only)
(4) ppid %d（611）
The PID of the parent of this process.
(5) pgrp %d（611）
The process group ID of the process.
(6) session %d（0）
The session ID of the process.
(7) tty_nr %d（0）
The controlling terminal of the process. (The minor device number is contained in the combination of
bits 31 to 20 and 7 to 0; the major device number is in bits 15 to 8.)
(8) tpgid %d（-1）
The ID of the foreground process group of the controlling terminal of the process.
(9) flags %u（1077952832）
The kernel flags word of the process. For bit meanings, see the PF_* defines in the Linux kernel
source file include/linux/sched.h. Details depend on the kernel version.
The format for this field was %lu before Linux 2.6.
(10) minflt %lu（23348）
The number of minor faults the process has made which have not required loading a memory page from disk.
(11) cminflt %lu（0）
The number of minor faults that the process's waited-for children have made.
(12) majflt %lu（102）
The number of major faults the process has made which have required loading a memory page from disk.
(13) cmajflt %lu（0）
The number of major faults that the process's waited-for children have made.
(14) utime %lu（54）
Amount of time that this process has been scheduled in user mode, measured in clock ticks (divide by
sysconf(_SC_CLK_TCK)). This includes guest time, guest_time (time spent running a virtual CPU, see
below), so that applications that are not aware of the guest time field do not lose that time from
their calculations.
(15) stime %lu（14）
Amount of time that this process has been scheduled in kernel mode, measured in clock ticks (divide by
sysconf(_SC_CLK_TCK)).
(16) cutime %ld（0）
Amount of time that this process's waited-for children have been scheduled in user mode, measured in
clock ticks (divide by sysconf(_SC_CLK_TCK)). (See also times(2).) This includes guest time,
cguest_time (time spent running a virtual CPU, see below).
(17) cstime %ld（0）
Amount of time that this process's waited-for children have been scheduled in kernel mode, measured in
clock ticks (divide by sysconf(_SC_CLK_TCK)).
(18) priority %ld（10）
(Explanation for Linux 2.6) For processes running a real-time scheduling policy (policy below; see sched_setscheduler(2)),
this is the negated scheduling priority, minus one; that is, a number in the range -2 to -100,
corresponding to real-time priorities 1 to 99. For processes running under a nonreal-time scheduling policy,
this is the raw nice value (setpriority(2)) as represented in the kernel. The kernel stores nice values as numbers in the
range 0 (high) to 39 (low), corresponding to the user-visible nice range of -20 to 19.
Before Linux 2.6, this was a scaled value based on the scheduler weighting given to this process.
(19) nice %ld（-10）
The nice value (see setpriority(2)), a value in the range 19 (low priority) to -20 (high priority).
(20) num_threads %ld（27）
Number of threads in this process (since Linux 2.6). Before kernel 2.6, this field was hard coded to 0 as
a placeholder for an earlier removed field.
(21) itrealvalue %ld（0）
The time in jiffies before the next SIGALRM is sent to the process due to an interval timer.
Since kernel 2.6.17, this field is no longer maintained, and is hard coded as 0.
(22) starttime %llu（79497755）
The time the process started after system boot. In kernels before Linux 2.6, this value was expressed
in jiffies. Since Linux 2.6, the value is expressed in clock ticks (divide by sysconf(_SC_CLK_TCK)).
The format for this field was %lu before Linux 2.6.
(23) vsize %lu（2505949184）
Virtual memory size in bytes.
(24) rss %ld（24555）
Resident Set Size: number of pages the process has in real memory. This is just the pages which count
toward text, data, or stack space. This does not include pages which have not been demand-loaded in,
or which are swapped out.
(25) rsslim %lu（18446744073709551615）
Current soft limit in bytes on the rss of the process; see the description of RLIMIT_RSS in getrlimit(2).
(26) startcode %lu [PT]（1）
The address above which program text can run.
(27) endcode %lu [PT]（1）
The address below which program text can run.
(28) startstack %lu [PT]（0）
The address of the start (i.e., bottom) of the stack.
(29) kstkesp %lu [PT]（0）
The current value of ESP (stack pointer), as found in the kernel stack page for the process.
(30) kstkeip %lu [PT]（0）
The current EIP (instruction pointer).
(31) signal %lu（0）
The bitmap of pending signals, displayed as a decimal number. Obsolete, because it does not provide
information on real-time signals; use /proc/[pid]/status instead.
(32) blocked %lu（4612）
The bitmap of blocked signals, displayed as a decimal number. Obsolete, because it does not provide
information on real-time signals; use /proc/[pid]/status instead.
(33) sigignore %lu（0）
The bitmap of ignored signals, displayed as a decimal number. Obsolete, because it does not provide
information on real-time signals; use /proc/[pid]/status instead.
(34) sigcatch %lu（34040）
The bitmap of caught signals, displayed as a decimal number. Obsolete, because it does not provide
information on real-time signals; use /proc/[pid]/status instead.
(35) wchan %lu [PT]（0）
This is the "channel" in which the process is waiting. It is the address of a location in the kernel
where the process is sleeping. The corresponding symbolic name can be found in /proc/[pid]/wchan.
(36) nswap %lu（0）
Number of pages swapped (not maintained).
(37) cnswap %lu（0）
Cumulative nswap for child processes (not maintained).
(38) exit_signal %d (since Linux 2.1.22)（17）
Signal to be sent to parent when we die.
(39) processor %d (since Linux 2.2.8)（4）
CPU number last executed on.
(40) rt_priority %u (since Linux 2.5.19)（0）
Real-time scheduling priority, a number in the range 1 to 99 for processes scheduled under a real-time
policy, or 0, for non-real-time processes (see sched_setscheduler(2)).
(41) policy %u (since Linux 2.5.19)（0）
Scheduling policy (see sched_setscheduler(2)).
Decode using the SCHED_* constants in linux/sched.h. The format for this field was %lu before Linux 2.6.22.
(42) delayacct_blkio_ticks %llu (since Linux 2.6.18)（12）
Aggregated block I/O delays, measured in clock ticks (centiseconds).
(43) guest_time %lu (since Linux 2.6.24)（0）
Guest time of the process (time spent running a virtual CPU for a guest operating system),
measured in clock ticks (divide by sysconf(_SC_CLK_TCK)).
(44) cguest_time %ld (since Linux 2.6.24)（0）
Guest time of the process's children, measured in clock ticks (divide by sysconf(_SC_CLK_TCK)).
(45) start_data %lu (since Linux 3.3) [PT]（0）
Address above which program initialized and uninitialized (BSS) data are placed.
(46) end_data %lu (since Linux 3.3) [PT]（0）
Address below which program initialized and uninitialized (BSS) data are placed.
(47) start_brk %lu (since Linux 3.3) [PT]（0）
Address above which program heap can be expanded with brk(2).
(48) arg_start %lu (since Linux 3.5) [PT]（0）
Address above which program command-line arguments (argv) are placed.
(49) arg_end %lu (since Linux 3.5) [PT]（0）
Address below which program command-line arguments (argv) are placed.
(50) env_start %lu (since Linux 3.5) [PT]（0）
Address above which program environment is placed.
(51) env_end %lu (since Linux 3.5) [PT]（0）
Address below which program environment is placed.
(52) exit_code %d (since Linux 3.5) [PT]（0）
The thread's exit status in the form reported by waitpid(2).

processCPUTime = utime + stime + cutime + cstime

IV. Computing per-process CPU usage

1) Over a sufficiently short interval, take two samples of the system CPU data and the corresponding per-process CPU data:

• CPUT1 (user1, nice1, system1, idle1, iowait1, irq1, softirq1, steal1, guest1, guest_nice1);
• CPUT2 (user2, nice2, system2, idle2, iowait2, irq2, softirq2, steal2, guest2, guest_nice2);

• ProcessT1 (utime1, stime1, cutime1, cstime1);
• ProcessT2 (utime2, stime2, cutime2, cstime2);

2) Compute the total CPU time totalCPUTime and the process time processTime:

• CPUTime1 = user1 + nice1 + system1 + idle1 + iowait1 + irq1 + softirq1 + steal1 + guest1 + guest_nice1;
• CPUTime2 = user2 + nice2 + system2 + idle2 + iowait2 + irq2 + softirq2 + steal2 + guest2 + guest_nice2;
totalCPUTime = CPUTime2 - CPUTime1;
• processTime1 = utime1 + stime1 + cutime1 + cstime1;
• processTime2 = utime2 + stime2 + cutime2 + cstime2;
processTime = processTime2 - processTime1;

3) Compute the CPU usage of the process, processCPURate:

processCPURate = processTime / totalCPUTime;
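A sketch of pulling utime, stime, cutime and cstime (fields 14-17) out of a /proc/[pid]/stat line; process_cpu_time is a hypothetical helper name. Scanning starts after the last ')' because comm (field 2) may itself contain spaces:

```c
#include <stdio.h>
#include <string.h>

// Sum fields (14)-(17) of a /proc/[pid]/stat line, in clock ticks.
// Returns -1 on malformed input. Illustrative sketch only.
long long process_cpu_time(const char *stat_line) {
    const char *p = strrchr(stat_line, ')');  // end of the comm field
    if (!p)
        return -1;
    long long utime, stime, cutime, cstime;
    // Skip state (3) and the ten numeric fields (4)-(13), then read (14)-(17).
    if (sscanf(p + 1,
               " %*c %*d %*d %*d %*d %*d %*u %*u %*u %*u %*u %lld %lld %lld %lld",
               &utime, &stime, &cutime, &cstime) != 4)
        return -1;
    return utime + stime + cutime + cstime;
}
```

For the sample line above this yields 54 + 14 + 0 + 0 = 68 clock ticks.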

V. Computing per-thread CPU usage

adb shell ls /proc/[pid]/task/

11297 11356 11444 11481 11499 11507 11529 11546 11562 11573 11633 11650 11729
11331 11405 11446 11483 11500 11508 11530 11548 11563 11574 11637 11656 11923
11333 11411 11449 11485 11501 11509 11531 11553 11565 11591 11638 11657 11935
......


adb shell cat /proc/[pid]/task/[tid]/stat

• threadT1 (utime1, stime1);
• threadT2 (utime2, stime2);

• threadTime1 = utime1 + stime1;
• threadTime2 = utime2 + stime2;
threadTime = threadTime2 - threadTime1;

threadCPURate = threadTime / totalCPUTime;
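The per-thread walk can be sketched by listing the numeric entries under the task directory; list_threads is a hypothetical helper name, and each tid's stat file would then be parsed the same way as the process stat file above:

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

// Collect up to 'max' thread IDs of process 'pid' from /proc/[pid]/task.
// Returns the number of tids found, or -1 if the directory cannot be opened.
// (list_threads is an illustrative name, not from the original code.)
int list_threads(int pid, int *tids, int max) {
    char path[64];
    snprintf(path, sizeof(path), "/proc/%d/task", pid);
    DIR *dir = opendir(path);
    if (!dir)
        return -1;
    int n = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL && n < max) {
        // Thread directories have purely numeric names; skip "." and "..".
        if (isdigit((unsigned char)entry->d_name[0]))
            tids[n++] = atoi(entry->d_name);
    }
    closedir(dir);
    return n;
}
```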

VI. Implementation: reading system and process CPU information

1. C++ implementation for reading system and per-process CPU information

#define MAX_LINE 1024

char processCpu[128];

// Read /proc/stat for the system-wide CPU counters.

// Read /proc/[pid]/stat for the monitored process; monitorProcessPid is
// passed in when the program is started.
snprintf(processCpu, sizeof(processCpu), "/proc/%d/stat", monitorProcessPid);

static int read_procs(char* fileName, FILE* output) {
    FILE *statFile;
    char buf[MAX_LINE];
    memset(buf, 0, MAX_LINE);
    statFile = fopen(fileName, "r");
    if (!statFile) {
        printf("Could not open %s.\n", fileName);
        return 1;
    }
    // Only the first line is needed: the aggregate "cpu" line of /proc/stat,
    // or the single line of a process stat file.
    fgets(buf, MAX_LINE, statFile);
    fclose(statFile);
    fprintf(output, "%s", buf);
    return 0;
}

2. Socket server

C++

#define DEFAULT_SOCKET_NAME "performance_monitor"

static int start_server(char* sockname)
{
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
    {
        perror("creating socket");
        return fd;
    }
    // Bind to an abstract-namespace address: sun_path starts with a
    // leading '\0', followed by the socket name.
    struct sockaddr_un addr;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(&addr.sun_path[1], sockname, sizeof(addr.sun_path) - 2);
    if (bind(fd, (struct sockaddr*) &addr,
             sizeof(sa_family_t) + strlen(sockname) + 1) < 0)
    {
        perror("binding socket");
        close(fd);
        return -1;
    }
    listen(fd, 1);
    return fd;
}

// Socket initialization logic inside main()

FILE* output;
int server_fd = start_server(DEFAULT_SOCKET_NAME);
if (server_fd < 0)
{
    printf("Unable to start server on %s\n", DEFAULT_SOCKET_NAME);
    return -4;
}
// Block until the monitoring client connects.
int client_fd = accept(server_fd, NULL, NULL);
if (client_fd < 0)
{
    printf("client_fd < 0 \n");
}
// Open the /proc directory first; reads under /proc depend on this.
DIR * proc_dir = opendir("/proc");
if (!proc_dir){
    printf("Could not open /proc.\n");
}
while (client_fd) {
    output = fdopen(dup(client_fd), "w");
    if (output == NULL)
    {
        printf("monitor socket output is null \n");
    }
    // Emit the CPU readings here, e.g. read_procs on /proc/stat and
    // on the monitored process's stat file.
    fclose(output);
    sleep(second);  // 'second' is the sampling interval
}
closedir(proc_dir);
close(client_fd);
close(server_fd);