Wednesday, February 10, 2016

Why Puppet was using 96% CPU while ran as a daemon or through command prompt?

Problem :  Puppet was consuming 96% CPU while it ran as a daemon or ran as "puppet agent -t".
Environment: RHEL 6
Solution : Make sure application puppet module is tested with out any error and unwanted issue.
Root Cause Analysis: 

top has shown following:


PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
11403 root 20 0 259m 140m 4516 R 96.7 1.9 9:08.77 puppet
1. Traced puppet agent which was run at command prompt while I kept daemon off.
$strace -Tvv  puppet agent -t  &> ./strace_log

2. read() system calls are spawned a lot and continuously printing following messages:
[...]
22:36:37 read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 <0 .000031="">
[...]
3. While I was tracking filename of corresponding file descriptor, I found /home/qpadm/temp was opened that had size 101GB.
4. read() call was continuously trying to read data from this file and read size was 4096 Bytes in every call. Trace looked below:
22:36:37 open("/home/qpadm/temp", O_RDONLY) = 4 <0 .000032="">
22:36:37 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0 <0 .000026="">
22:36:37 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0 <0 .000026="">
22:36:37 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0 <0 .000027="">
22:36:37 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0 <0 .000026="">
22:36:37 fstat(4, {st_dev=makedev(202, 1), st_ino=790228, st_mode=S_IFREG|0600, st_nlink=1, st_uid=1000, st_gid=1000, st_blksize=4096, st_blocks=209715224, st_size=107374182400, st_atime=2016/02/03-07:37:15, st_mtime=2016/02/03-05:11:57, st_ctime=2016/02/03-07:36:48}) = 0 <0 .000027="">
22:36:37 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6d03f3f000 <0 .000030="">
22:36:37 read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 <0 .000035="">
22:36:37 read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 <0 .000031="">
In above trace, you see file /home/qpadm/temp is opened in read mode as FD(file descriptor) is 4 and same FD is passed to read() call that returns 4096 Bytes incrementally. It supposed to take long time to read 101GB file[:) would never be finished using huge read calls those would have neen performed by kernel]. That's why CPU spiked to 60-80% consumed by puppet agent.
We need to check why above file is there and what is the usage of it and why it has large size.
This one is I think main culprit.
Another thing I noticed is either ovo was reinstalled or being migrated to newer version. In relation to this, I found following calls :
22:36:00 open("/root/ovoinstall/HPOvPacc.xml", O_RDONLY) = 4 <0 .000022="">

22:36:07 open("/root/ovoinstall/OVO-Agent-Migrate.xml", O_RDONLY) = 4 <0 .000032="">

22:36:07 fstat(4, {st_dev=makedev(202, 1), st_ino=786799, st_mode=S_IFREG|0744, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=4240, st_size=2168841, st_atime=2016/02/03-17:04:19, st_mtime=2016/01/03-22:12:54, st_ctime=2016/01/03-22:15:32}) = 0 <0 .000027="">
22:36:07 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6d03f3f000 <0 .000029="">
22:36:07 read(4, "\355\253\356\333\3\0\0\0\0\1HPOvConf-11.11.025-1\0\0"..., 4096) = 4096 <0 .000032="">
22:36:07 read(4, "\"Unable to copy file\"\nOVCSL_ERRO"..., 4096) = 4096 <0 .000029="">

Application team found that temp was not needed but was called in application puppet module. They disabled it and removed that file. It fixed the issue.


No comments:

Post a Comment