torqueusers@supercluster.org
[torqueusers] pbs_mom logging loads of Success(0) get_proc_stat Michael Meier Wed Jan 23 00:00:25 2008
There was already a discussion about this subject in July, but judging from the mailing list archive, it seems to have died off without a solution. Since we have the exact same problem, I tried to track it down a little.
The mom logs messages like these:
12/07/2007 00:04:10;0001; pbs_mom;Svr;pbs_mom;Success (0) in cput_sum, 7058: get_proc_stat
12/07/2007 00:04:10;0001; pbs_mom;Svr;pbs_mom;Inappropriate ioctl for device (25) in mem_sum, 7058: get_proc_stat
12/07/2007 00:04:10;0001; pbs_mom;Svr;pbs_mom;Inappropriate ioctl for device (25) in resi_sum, 7058: get_proc_stat
12/07/2007 00:04:18;0001; pbs_mom;Svr;pbs_mom;Success (0) in sessions, 7058: get_proc_stat
12/07/2007 00:04:18;0001; pbs_mom;Svr;pbs_mom;Success (0) in sessions, 7058: get_proc_stat
12/07/2007 00:04:18;0001; pbs_mom;Svr;pbs_mom;Success (0) in nusers, 7058: get_proc_stat
PID 7058 is the following process from the IB stack:
# cat /proc/7058/stat
7058 (ib_fmr(mthca0)) S 11 0 0 0 -1 32832 0 0 0 0 0 0 0 0 7 -10 1 0 5112 0 0 18446744073709551615 0 0 0 0 0 0 2147483647 65536 0 18446744071563457653 0 0 17 1 0 0
The mom tries to parse that line in the following way (from torque-2.3.0-snap.200712061242/src/resmom/linux/mom_mach.c):
fscanf(fd,"%d (%[^)]) %c %d %d %d
That will probably break when parsing '(ib_fmr(mthca0))', because it assumes the first ')' is the closing bracket, which is just not true: the name gets cut off at 'ib_fmr(mthca0' and the fields after it are misread. 'man 5 proc' suggests using '%s', but that would be even worse than the current '%[^)]', breaking on every executable name that contains a space. And what if someone wants to run a monster like the following:
6849 (te (s)( ))t)) S 25614 6849 25614 34838 6849 4194304 161 0 0 0 0 0 0 0 20 0 1 0 36168980 2564096 77 18446744073709551615 4194304 4195956 140736421683184 18446744073709551615 47252866936498 0 0 0 0 0 0 0 17 0 0 0 0
The only proper fix would probably be to look for the last ')' in the whole string.
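A minimal sketch of what "look for the last ')'" could look like, assuming the whole stat line has already been read into a buffer. The struct stat_head and the function parse_stat_line are hypothetical names made up for this illustration, not anything from mom_mach.c, and only the first few fields are handled. It relies on the fields after the name being a state letter followed by numbers, so no ')' can appear after the one that closes the name.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the leading fields of /proc/PID/stat;
 * not a TORQUE structure. */
struct stat_head {
    int  pid;
    char comm[64];   /* executable name without the outer brackets */
    char state;
    int  ppid;
    int  pgrp;
    int  session;
};

/* Parse the start of a stat line. Returns 0 on success, -1 if the
 * line does not have the expected shape. */
int parse_stat_line(const char *line, struct stat_head *out)
{
    const char *open  = strchr(line, '(');   /* first '(' opens the name */
    const char *close = strrchr(line, ')');  /* LAST ')' closes the name */
    size_t len;

    if (open == NULL || close == NULL || close < open)
        return -1;

    /* The pid is the only field in front of the name. */
    if (sscanf(line, "%d", &out->pid) != 1)
        return -1;

    /* Copy everything between the outer brackets, truncating if the
     * name does not fit into the buffer. */
    len = (size_t)(close - open - 1);
    if (len >= sizeof(out->comm))
        len = sizeof(out->comm) - 1;
    memcpy(out->comm, open + 1, len);
    out->comm[len] = '\0';

    /* Everything after the last ')' is plain whitespace-separated data. */
    if (sscanf(close + 1, " %c %d %d %d",
               &out->state, &out->ppid, &out->pgrp, &out->session) != 4)
        return -1;

    return 0;
}
```

With the two stat lines quoted above, this should yield the names 'ib_fmr(mthca0)' and 'te (s)( ))t)' and leave the following fields aligned. The remaining fields could be read with further conversions in the second sscanf call.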
--
Michael Meier, HPC Services
Friedrich-Alexander-Universitaet Erlangen-Nuernberg
Regionales Rechenzentrum Erlangen
Martensstrasse 1, 91058 Erlangen, Germany
Tel.: +49 9131 85-28973, Fax: +49 9131 302941
[EMAIL PROTECTED]
www.rrze.uni-erlangen.de/hpc/