torqueusers@supercluster.org

Subject: Re: [torqueusers] Performance of non-GPU codes on GPU nodes reduced by nvidia-smi overhead
From: Doug Johnson
Date: Wed Feb 15 16:01:17 2012

Hi David,

I was going to send a separate email about '--with-nvml-include' once
I had more time to look at the problem.  It seems that nvml.h no
longer exists in the newer versions of the CUDA SDK.  We have version
4.1.28 of both the gpucomputingsdk and cudatoolkit; there is no nvml.h,
and enabling this option in torque results in a failure to build.  I
haven't had a chance to look at older versions or the release notes
to find out when this changed.

Is it safe to assume that if we were able to use this code, a context
to the cards would be kept open by the mom?

Doug

At Wed, 15 Feb 2012 16:22:09 -0700,
David Beer wrote:
> 
> Doug,
> 
> Have you tried using the --with-nvml-include=<path> option in configure?
> This has pbs_mom use the nvidia API for these calls, and should speed
> things up a bit. The path should be the path to the nvml.h file and is
> usually:
> /usr/local/cuda/CUDAToolsSDK/NVML/
> 
> David
> 
> On Wed, Feb 15, 2012 at 4:15 PM, Doug Johnson <[EMAIL PROTECTED]> wrote:
> 
>     Hi,
>    
>     Has anyone noticed the overhead when enabling GPU support in torque?
>     The nvidia-smi process requires about 4 cpu seconds for each
>     invocation.  When executing a non-GPU code that uses all the cores
>     this results in a bit of oversubscription of the cores.  Since
>     nvidia-smi is executed every 30 seconds to collect card state this
>     results in a measurable decrease in performance.
>    
>     As a workaround I've enabled 'persistence mode' for the card.  When
>     not in use, the card is apparently not initialized.  With persistence
>     mode enabled the cpu time to execute the command is reduced to
>     ~0.02 seconds.  This will also help with the execution time of
>     short kernels, as the card will be ready to go.
>    
>     Do other people run with persistence mode enabled?  Are there any
>     downsides?
>    
>     Doug
>    
>     PS. I think if X were running this would not be an issue.
>     _______________________________________________
>     torqueusers mailing list
>     [EMAIL PROTECTED]
>     http://www.supercluster.org/mailman/listinfo/torqueusers
> 
> --
> David Beer | Software Engineer
> Adaptive Computing
> 
> 
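To put the numbers quoted in this thread in perspective, here is a minimal sketch of the arithmetic. It uses only the figures from the original message (about 4 cpu seconds per nvidia-smi invocation, one invocation every 30 seconds, and about 0.02 cpu seconds with persistence mode enabled). The helper name core_fraction is hypothetical and exists only for illustration; it is not part of torque or of any NVIDIA tool.

```python
def core_fraction(cpu_seconds_per_poll, poll_interval_seconds):
    """Return the average fraction of one core consumed by a periodic poll."""
    if poll_interval_seconds <= 0:
        raise ValueError("poll interval must be positive")
    return cpu_seconds_per_poll / poll_interval_seconds


# Figures reported in the thread; they are measurements from one site,
# not general constants.
default_cost = core_fraction(4.0, 30.0)      # persistence mode off
persistent_cost = core_fraction(0.02, 30.0)  # persistence mode on

print(f"without persistence mode: {default_cost:.1%} of one core")
print(f"with persistence mode:    {persistent_cost:.2%} of one core")
```

On a node where a job already uses every core, the first figure is roughly an eighth of a core taken away from the job, which is consistent with the measurable slowdown described. For reference, persistence mode is normally switched on as root with `nvidia-smi -pm 1`; whether that suits a given driver version and site policy is something to check locally.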