linux-cluster@redhat.com

Re: [Linux-cluster] Clearing a glock Jones, Dave Tue Jul 27 09:00:42 2010

Understood
 

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Steven Whitehouse
Sent: Tuesday, July 27, 2010 9:25 AM
To: linux clustering
Subject: Re: [Linux-cluster] Clearing a glock

Hi,

On Tue, 2010-07-27 at 08:58 -0500, Jones, Dave wrote:
> 
> Maybe a bit off topic, but IMO Red Hat should really consider changing
> the name of this process.
> Thought I somehow crossed mailing lists and was about to read a post
> about a jammed polymer-framed 9mm pistol. 
> 
> D
> 
:-)

The name is perhaps not ideal, and its origins are somewhat lost in the
mists of time. The main issue is that if we did change it, it would make
things even more confusing, at least while the transition was in effect.

Steve.

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Scooter Morris
> Sent: Monday, July 26, 2010 6:55 PM
> To: linux clustering
> Subject: [Linux-cluster] Clearing a glock
> 
> 
>   We've got two nodes of a three-node gfs2 cluster that seem to be in 
> some sort of deadlock.  We're seeing a number of gfs2-related stack 
> traces in dmesg:
> 
> INFO: task igtcpython.sh:22945 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
> message.
> igtcpython.sh D ffff81011cabd218     0 22945  22874 (NOTLB)
>   ffff8101f9fedcc8 0000000000000086 ffff8101f9fede38 ffffffff8000a604
>   ffff8102b257f778 0000000000000006 ffff8105819d37a0 ffff8107a7f93820
>   0004084c69951298 00000000000365c6 ffff8105819d3988 000000042d07a810
> Call Trace:
>   [<ffffffff8000a604>] __link_path_walk+0xdf8/0xf42
>   [<ffffffff8002c9e4>] mntput_no_expire+0x19/0x89
>   [<ffffffff8000ea46>] link_path_walk+0xa6/0xb2
>   [<ffffffff887d6ee7>] :gfs2:just_schedule+0x0/0xe
>   [<ffffffff887d6ef0>] :gfs2:just_schedule+0x9/0xe
>   [<ffffffff80063a16>] __wait_on_bit+0x40/0x6e
>   [<ffffffff887d6ee7>] :gfs2:just_schedule+0x0/0xe
>   [<ffffffff80063ab0>] out_of_line_wait_on_bit+0x6c/0x78
>   [<ffffffff800a0aec>] wake_bit_function+0x0/0x23
>   [<ffffffff887d6ee2>] :gfs2:gfs2_glock_wait+0x2b/0x30
>   [<ffffffff887e679e>] :gfs2:gfs2_permission+0x83/0xd5
>   [<ffffffff887e6796>] :gfs2:gfs2_permission+0x7b/0xd5
>   [<ffffffff8000d918>] permission+0x81/0xc8
>   [<ffffffff8003c0c0>] open_exec+0x60/0xc0
>   [<ffffffff8005d28d>] tracesys+0xd5/0xe0
>   [<ffffffff8003ed4d>] do_execve+0x46/0x1f7
>   [<ffffffff8005516d>] sys_execve+0x36/0x4c
>   [<ffffffff8005d4d3>] stub_execve+0x67/0xb0
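The hung-task report above shows `igtcpython.sh` in state `D` (uninterruptible sleep), with a call trace ending in `gfs2_glock_wait`. A quick way to see how many tasks on a node are stuck like this is to scan `/proc/<pid>/stat` for that state. This is a minimal Python sketch, assuming a standard Linux `/proc` layout; `parse_stat` and `d_state_tasks` are hypothetical helpers, not part of any cluster tool.

```python
import os

def parse_stat(line):
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so we split on the LAST ')' rather than on whitespace.
    """
    lpar = line.index("(")
    rpar = line.rindex(")")
    pid = int(line[:lpar])
    comm = line[lpar + 1:rpar]
    state = line[rpar + 2]  # single-character state follows ") "
    return pid, comm, state

def d_state_tasks(proc_root="/proc"):
    """Yield (pid, comm) for every task in uninterruptible sleep ("D")."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip /proc/self, /proc/sys, etc.
        try:
            with open(os.path.join(proc_root, entry, "stat"), errors="replace") as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state == "D":
            yield pid, comm
```

Usage: `for pid, comm in d_state_tasks(): print(pid, comm)`. A `D` state alone does not prove a glock problem, since tasks also sleep in `D` on ordinary disk I/O; it is the call trace that ties these tasks to `gfs2_glock_wait`.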
> 
> and gfs2_hangalyzer is saying:
> 
> ./gfs2_hangalyzer -n wilkins-pi -a
> wilkins-pi: UsrLocal: G:  s:UN n:2/a5b67f f:l t:SH d:EX/0 l:0 a:0 r:58
> wilkins-pi: UsrLocal:  H: s:SH f:W e:0 p:21810 [python] 
> gfs2_readpage+0x61/0x199 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:W e:0 p:21809 [python] 
> gfs2_readpage+0x61/0x199 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:W e:0 p:1897 [python] 
> gfs2_readpage+0x61/0x199 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:6307 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:10436 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:11000 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:11003 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:12140 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:21499 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:32601 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:653 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:10078 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:14436 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:7500 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:22815 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:26056 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:26062 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:26122 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:26124 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:26128 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:31825 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:32125 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:2441 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:2444 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:21792 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:21793 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:21794 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:31941 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:31970 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:8584 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:8590 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:8821 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:8822 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:9487 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:9488 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:9489 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:9490 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:18878 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:325 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:3139 [ls] 
> gfs2_getattr+0x7d/0xc4 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:3144 [ls] 
> gfs2_getattr+0x7d/0xc4 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:7450 [ls] 
> gfs2_getattr+0x7d/0xc4 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:31741 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:4982 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:18258 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:18262 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:18263 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:18265 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:18269 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:20039 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:20042 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:20043 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:20044 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:20046 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
> wilkins-pi: UsrLocal:  H: s:SH f:aW e:0 p:20763 [igtcpython.sh] 
> gfs2_permission+0x7b/0xd5 [gfs2]
>                          lkb_id N RemoteID  pid exflg lkbflgs stat gr rq    waiting n ln             resource name
> wilkins-pi: UsrLocal:  26513a1 3  1385218 21810     0       0 wait -1 3          0 3 24 "       2          a5b67f"
> 
> 
> 
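The `resource name` column in the DLM table is the glock name as GFS2 hands it to the DLM. As far as we know, GFS2's lock_dlm code formats the glock type and number as hex in right-aligned 8- and 16-character fields (`"%8x%16llx"`), which is why the name is 24 characters long (the `ln 24` column). The `gr` and `rq` columns are DLM lock modes. The sketch below assumes the mode numbering from the kernel's `dlmconstants.h` and GFS2's mapping of glock states to DLM modes; the helper names are hypothetical. It shows how to match the glock `n:2/a5b67f` to this DLM row and decode the row as "nothing granted (IV), PR requested", which agrees with the glock line's `s:UN ... t:SH`.

```python
# DLM lock modes, numbered as in the kernel's dlmconstants.h
DLM_MODES = {-1: "IV", 0: "NL", 1: "CR", 2: "CW", 3: "PR", 4: "PW", 5: "EX"}

# GFS2 glock state -> DLM mode used for it
GLOCK_TO_DLM = {"UN": "NL", "SH": "PR", "DF": "CW", "EX": "EX"}

def parse_glock_name(name):
    """Split a glock dump name such as "2/a5b67f" into (type, number).

    The type is printed in decimal and the number in hex.
    """
    gl_type, gl_number = name.split("/")
    return int(gl_type), int(gl_number, 16)

def dlm_resource_name(gl_type, gl_number):
    """Encode a glock as its 24-character DLM resource name ("%8x%16llx")."""
    return f"{gl_type:8x}{gl_number:16x}"

def describe_modes(gr, rq):
    """Render the gr/rq columns of the DLM lock table as mode names."""
    return f"granted {DLM_MODES[gr]}, requested {DLM_MODES[rq]}"
```

For the row above, `describe_modes(-1, 3)` returns `"granted IV, requested PR"`, and `dlm_resource_name(*parse_glock_name("2/a5b67f"))` reproduces the quoted resource name.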
> There is 1 glock with waiters.
> wilkins-pi.compbio.ucsf.edu, pid 21810 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 21809 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 1897 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 6307 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 10436 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 11000 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 11003 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 12140 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 21499 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 32601 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 653 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 10078 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 14436 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 7500 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 22815 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 26056 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 26062 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 26122 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 26124 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 26128 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 31825 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 32125 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 2441 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 2444 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 21792 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 21793 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 21794 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 31941 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 31970 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 8584 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 8590 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 8821 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 8822 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 9487 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 9488 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 9489 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 9490 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 18878 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 325 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 3139 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 3144 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 7450 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 31741 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 4982 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 18258 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 18262 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 18263 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 18265 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 18269 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 20039 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 20042 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 20043 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 20044 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 20046 is waiting for glock 2/a5b67f, 
> but no holder was found.
> wilkins-pi.compbio.ucsf.edu, pid 20763 is waiting for glock 2/a5b67f, 
> but no holder was found.
>           The dlm has granted lkb "       2          a5b67f" to pid 
> 391344724
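The hangalyzer summary boils down to one check: a glock has holder entries flagged `W` (waiting) but none flagged `H` (granted holder). Below is a minimal Python sketch of that check, assuming input in the GFS2 glock-dump format quoted above (a `G:` line followed by its `H:` holder lines, optionally prefixed the way the hangalyzer output prefixes them). It is not the gfs2_hangalyzer implementation, and `orphaned_waiters` is a hypothetical name.

```python
import re

# "G:  s:UN n:2/a5b67f ..." -> glock state and name
GLOCK_RE = re.compile(r"\bG:\s+s:(\S+)\s+n:(\S+)")
# "H: s:SH f:aW e:0 p:6307 [igtcpython.sh] ..." -> state, flags, pid, comm
HOLDER_RE = re.compile(r"\bH:\s+s:(\S+)\s+f:(\S*)\s+e:\S+\s+p:(\d+)\s+\[([^\]]*)\]")

def orphaned_waiters(lines):
    """Return {glock name: [(pid, comm), ...]} for every glock that has
    waiters (flag W) but no granted holder (flag H) in this dump."""
    glocks = {}
    current = None
    for line in lines:
        g = GLOCK_RE.search(line)
        if g:
            current = g.group(2)
            glocks[current] = {"holders": [], "waiters": []}
            continue
        h = HOLDER_RE.search(line)
        if h is None or current is None:
            continue  # wrapped continuation lines, other record types, etc.
        _state, flags, pid, comm = h.groups()
        entry = (int(pid), comm)
        if "H" in flags:
            glocks[current]["holders"].append(entry)
        elif "W" in flags:
            glocks[current]["waiters"].append(entry)
    return {name: info["waiters"] for name, info in glocks.items()
            if info["waiters"] and not info["holders"]}
```

This only looks at one node's dump. A glock with no local holder may still be held elsewhere in the cluster, which is information that only the DLM side (like the lock table and "dlm has granted" line above) can show.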
> 
> Clearly, I've got a hung lock of some sort.  Is there any way to clear
> the glock to free up all of these processes?  I really hate to reboot
> the cluster to clear this up since it's only affecting one pipeline...
> 
> Thanks in advance!
> 
> -- scooter
> 
> --
> Linux-cluster mailing list
> [EMAIL PROTECTED]
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 
> Confidentiality Warning:  This e-mail contains information intended
> only for the use of the individual or entity named above.  If the reader
> of this e-mail is not the intended recipient or the employee or agent
> responsible for delivering it to the intended recipient, any
> dissemination, publication or copying of this e-mail is strictly
> prohibited.  The sender does not accept any responsibility for any loss,
> disruption or damage to your data or computer system that may occur
> while using data contained in, or transmitted with, this e-mail.
> If you have received this e-mail in error, please immediately notify
> us by return e-mail.  Thank you.