torqueusers@supercluster.org
Re: [torqueusers] sporadic scp failures Joshua Bernstein Wed Feb 17 11:00:51 2010
Jeff,
Have you looked through the pbs_mom log files, or even /var/log/messages on
the headnode? You might be running into a situation where either pbs_mom or
sshd (on the headnode) is running out of open file descriptors. If you're
using the bash shell, you can check the maximum number of open files per
process with:
$ ulimit -n
1024
Generally this number is set to 1024 by default, but if you have a large
cluster and the headnode is rather busy, sshd may not be able to fork() in
order to receive the incoming scp connection.
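
Note that ulimit -n in an interactive shell reports that shell's limit, which
is not necessarily the limit the running daemons inherited. As a rough sketch,
assuming a Linux headnode with /proc mounted and pgrep available, something
like the following compares the open descriptor count of each sshd and pbs_mom
process against its own soft limit. The function name fd_usage is made up for
this example, and you will most likely need to run it as root to see inside
processes you do not own:

```shell
#!/bin/sh
# Print "<pid> <open_fds> <soft_limit>" for one process (Linux /proc only).
# fd_usage is a hypothetical helper name, not part of TORQUE or OpenSSH.
fd_usage() {
    pid=$1
    # Count the entries in /proc/<pid>/fd; unreadable without root for
    # processes owned by another user, in which case this reports 0.
    open=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)
    # In the "Max open files" row of /proc/<pid>/limits, field 4 is the
    # soft limit and field 5 is the hard limit.
    soft=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits" 2>/dev/null)
    echo "$pid $open ${soft:-unknown}"
}

# Check every sshd and pbs_mom process currently running on this host.
for pid in $(pgrep -x sshd) $(pgrep -x pbs_mom); do
    fd_usage "$pid"
done
```

If the second column sits close to the third for any of those processes while
jobs are finishing, that would point toward descriptor exhaustion; if it stays
well below, the cause is probably elsewhere.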
-Joshua Bernstein
Senior Software Engineer
Penguin Computing
Jeff Anderson-Lee wrote:
> I'm getting sporadic failures when it tries to copy the results .ER and
> .OU files back. It is not 100% of the time, nor is it 100% consistent
> on which hosts have problems. Sometimes the same host will succeed for
> one or both files and sometimes it will fail for both.
>
> I'm wondering if this might have something to do with too many scp
> requests showing up simultaneously and some sort of rate-limiting
> happening. Any suggestions on where I might look? What I might tweak?
> Is there some way to increase the default socket backlog, or that used
> by inetd/sshd?
>
> Thanks.
>
> Jeff Anderson-Lee
>
>> PBS Job Id: 958.XXX.berkeley.edu
>> Job Name: STDIN
>> Exec host: s103/11
>> An error has occurred processing your job, see below.
>> Post job file processing error; job 958.XXX.berkeley.edu on host s103/11
>>
>> Unable to copy file /var/spool/torque/spool/958.XXX.berkeley.edu.OU to
>> [EMAIL PROTECTED]:/home/cs/jonah/STDIN.o958
>> *** error from copy
>> ssh_exchange_identification: Connection closed by remote host
>> lost connection
>> *** end error output
>> Output retained on that host in:
>> /var/spool/torque/undelivered/958.XXX.berkeley.edu.OU
>
> _______________________________________________
> torqueusers mailing list
> [EMAIL PROTECTED]
> http://www.supercluster.org/mailman/listinfo/torqueusers