moabusers@supercluster.org
Re: [Moabusers] job disappeared from qstat and showq, still running on node Pim Schravendijk Mon Mar 21 20:00:31 2011
I had set NODEAVAILABILITYPOLICY COMBINED:PROCS. It is a bit worrying that this still allows the load to go above the number of cores!

I'm not sure whether this is a Torque issue or a Moab issue. Clearly, Torque knows which jobs are running, but either it doesn't pass this on to Moab, or Moab doesn't read it from Torque. The fact that Moab still doesn't sync with Torque even after a restart makes me think it's a Moab issue. As I wrote earlier, we're running Torque 2.5.2.

I was planning to run the check from a crontab, but that health check seems to be nicely integrated with Moab, so I'll try that instead.

On 21.03.2011 23:58, <[EMAIL PROTECTED]> wrote:

Hi Pim,

At least that can be done with NODEAVAILABILITYPOLICY, but I see Chris got in first! I'd prefer DEDICATED, though, and fixing the problem of orphaned jobs. Perhaps the node health check could be used to detect spurious load and avoid allocating extra jobs to badly behaving nodes, or even to identify and kill orphan processes:
http://www.adaptivecomputing.com/resources/docs/torque/11.2healthcheck.php

Others have mentioned the epilogue in this thread, and there are sporadic discussions on torqueusers about killing off processes that should not be running. That is relatively easy if your nodes are not shared but more difficult in the general case, and clearly you have to be very careful not to kill the wrong processes!

- Gareth

------------------------------
*From:* Pim Schravendijk [mailto:[EMAIL PROTECTED]
*Sent:* Tuesday, 22 March 2011 9:45 AM
*To:* Williams, Gareth (CSIRO IM&T, Docklands)
*Cc:* [EMAIL PROTECTED]; [EMAIL PROTECTED]
*Subject:* Re: RE: [Moabusers] job disappeared from qstat and showq, still running on node

Hi Gareth,

Thank you for your thoughts on the matter! I checked top before and after purging ...
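Gareth's idea of using the node health check to catch spurious load could start out roughly as follows. This is a minimal Python sketch under stated assumptions: that pbs_mom runs a script configured via `$node_check_script` in `mom_priv/config` and treats a stdout line beginning with `ERROR` as a node failure (check the Torque health-check documentation for your version), and that a 1-minute load above the core count plus a hypothetical `slack` tolerance counts as "spurious". The names `load_problem` and `slack` are illustrative only.

```python
import os
from typing import Optional


def load_problem(load1: float, ncores: int, slack: float = 0.5) -> Optional[str]:
    """Return a message if the 1-minute load exceeds ncores + slack, else None.

    `slack` is a hypothetical tolerance so brief spikes do not mark the node down.
    """
    if ncores <= 0:
        return "cannot determine core count"
    if load1 > ncores + slack:
        return f"load {load1:.2f} exceeds {ncores} cores (+{slack} slack)"
    return None


def main() -> None:
    load1, _, _ = os.getloadavg()  # Unix-only: 1-, 5- and 15-minute averages
    problem = load_problem(load1, os.cpu_count() or 0)
    if problem:
        # Assumed Torque convention: a line starting with ERROR marks the node down.
        print(f"ERROR {problem}")


if __name__ == "__main__":
    main()
```

For example, `load_problem(9.0, 8)` returns `"load 9.00 exceeds 8 cores (+0.5 slack)"`, while `load_problem(8.2, 8)` returns `None`. On Linux the load average also counts processes blocked on I/O, so a threshold based only on load may raise false alarms on I/O-heavy nodes.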
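For the orphan-process idea, a cautious starting point on dedicated (non-shared) nodes is to report, not kill, processes owned by ordinary users who have no job on the node. This hypothetical sketch assumes Linux `/proc`, treats UIDs below 1000 as system accounts and ignores UID 65534 (`nobody`); both cut-offs are site-specific assumptions. Where the set of job owners comes from (for example the job list Torque keeps for the node) is left to the caller, and `find_orphans` and `list_procs` are illustrative names.

```python
import os
from typing import FrozenSet, Iterable, List, Set, Tuple

# Assumptions (site-specific): UIDs below 1000 are system accounts, and
# 65534 is the 'nobody' account on many Linux distributions.
MIN_USER_UID = 1000
IGNORED_UIDS: FrozenSet[int] = frozenset({65534})


def find_orphans(procs: Iterable[Tuple[int, int]], job_uids: Set[int],
                 min_uid: int = MIN_USER_UID,
                 ignored: FrozenSet[int] = IGNORED_UIDS) -> List[int]:
    """Return sorted PIDs owned by ordinary users with no job on this node.

    Only reports candidates; deciding whether to kill them is left to a human
    or to further checks.
    """
    return sorted(
        pid for pid, uid in procs
        if uid >= min_uid and uid not in ignored and uid not in job_uids
    )


def list_procs() -> List[Tuple[int, int]]:
    """Collect (pid, real uid) pairs from Linux /proc."""
    procs: List[Tuple[int, int]] = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/status") as fh:
                for line in fh:
                    if line.startswith("Uid:"):
                        # Format: "Uid: real effective saved filesystem"
                        procs.append((int(entry), int(line.split()[1])))
                        break
        except OSError:
            # The process exited (or is unreadable) while we were scanning.
            continue
    return procs
```

For example, `find_orphans([(1, 0), (200, 1001), (300, 1002)], {1001})` returns `[300]`: UID 0 is a system account and UID 1001 has a job, so only PID 300 is reported. Keeping detection separate from any kill step makes it easier to review the candidates first, which matters given how easy it is to kill the wrong processes.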
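The crontab check Pim mentions could begin as a simple set comparison between the jobs Torque knows about and the jobs Moab knows about. This is a hypothetical sketch: it assumes the caller has already extracted job IDs (for instance from `qstat` or the `jobs` attribute in `pbsnodes` output on the Torque side, and from `showq` on the Moab side), and that Torque IDs carry a `.server` suffix and, in `pbsnodes`, a `slot/` prefix that Moab's listing omits. Verify these formats on your own installation before relying on it.

```python
from typing import Iterable, Set, Tuple


def normalize(job_id: str) -> str:
    """Reduce a Torque-style ID to its numeric part.

    Assumed formats: '1234.server' (qstat) and '0/1234.server' (pbsnodes
    'jobs' attribute, where '0/' is the slot index) both become '1234'.
    """
    job_id = job_id.strip()
    if "/" in job_id:
        job_id = job_id.split("/", 1)[1]
    return job_id.split(".", 1)[0]


def compare_jobs(torque_ids: Iterable[str],
                 moab_ids: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (IDs known only to Torque, IDs known only to Moab)."""
    torque = {normalize(j) for j in torque_ids if j.strip()}
    moab = {normalize(j) for j in moab_ids if j.strip()}
    return torque - moab, moab - torque
```

For example, `compare_jobs(["0/1234.head", "1235.head"], ["1235"])` returns `({"1234"}, set())`, flagging job 1234 as running under Torque but unknown to Moab.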
_______________________________________________ moabusers mailing list [EMAIL PROTECTED] http://www.supercluster.org/mailman/listinfo/moabusers
- Re: [Moabusers] job disappeared from qstat and showq, still running on node, (continued)
- Re: [Moabusers] job disappeared from qstat and showq, still running on node Pim Schravendijk 2011/03/21
- Re: [Moabusers] job disappeared from qstat and showq, still running on node Christopher Samuel 2011/03/21
- Re: [Moabusers] job disappeared from qstat and showq, still running on node Pim Schravendijk 2011/03/21
- Re: [Moabusers] job disappeared from qstat and showq, still running on node Pim Schravendijk 2011/03/22
- Re: [Moabusers] job disappeared from qstat and showq, still running on node Pim Schravendijk 2011/03/22
- Re: [Moabusers] job disappeared from qstat and showq, still running on node Gareth.Williams 2011/03/21
- Re: [Moabusers] job disappeared from qstat and showq, still running on node Pim Schravendijk 2011/03/21
- Re: [Moabusers] job disappeared from qstat and showq, still running on node Christopher Samuel 2011/03/21
- Re: [Moabusers] job disappeared from qstat and showq, still running on node Lloyd Brown 2011/03/21
- Re: [Moabusers] job disappeared from qstat and showq, still running on node Gareth.Williams 2011/03/21
- Re: [Moabusers] job disappeared from qstat and showq, still running on node Pim Schravendijk 2011/03/21 <=
- Re: [Moabusers] job disappeared from qstat and showq, still running on node Christopher Samuel 2011/03/21
- Re: [Moabusers] job disappeared from qstat and showq, still running on node Pim Schravendijk 2011/03/21
- Re: [Moabusers] job disappeared from qstat and showq, still running on node Christopher Samuel 2011/03/21