freebsd-current
[Prev] Thread [Next] | [Prev] Date [Next]
Re: "New" ZFS crash on FS (pool?) unmount/export Thomas Backman Wed Jun 24 13:00:36 2009
On Sat, Jun 20, 2009 at 12:11 AM, Thomas Backman<[EMAIL PROTECTED]> wrote:I just ran into this tonight. Not sure exactly what triggered it - the box stopped responding to pings at 02:07AM and it has a cron backup job using zfs send/recv at 02:00, so I'm guessing it's related, even though the backupprobably should have finished before then... Hmm. Anyway. r194478. kernel trap 12 with interrupts disabled Fatal trap 12: page fault while in kernel mode cpuid = 0; apic id = 00 fault virtual address = 0x288 fault code = supervisor read data, page not present instruction pointer = 0x20:0xffffffff805a4989 stack pointer = 0x28:0xffffff803e8b57e0 frame pointer = 0x28:0xffffff803e8b5840 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = resume, IOPL = 0 current process = 57514 (zpool) panic: from debugger cpuid = 0 Uptime: 10h22m13s Physical memory: 2027 MB (kgdb) bt #0 doadump () at pcpu.h:223 #1 0xffffffff8059c409 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:419 #2 0xffffffff8059c85c in panic (fmt=Variable "fmt" is not available. ) at /usr/src/sys/kern/kern_shutdown.c:575#3 0xffffffff801f1377 in db_panic (addr=Variable "addr" is not available.) at /usr/src/sys/ddb/db_command.c:478 #4 0xffffffff801f1781 in db_command (last_cmdp=0xffffffff80c38620, cmd_table=Variable "cmd_table" is not available. ) at /usr/src/sys/ddb/db_command.c:445 #5 0xffffffff801f19d0 in db_command_loop () at /usr/src/sys/ddb/db_command.c:498#6 0xffffffff801f3969 in db_trap (type=Variable "type" is not available.) at /usr/src/sys/ddb/db_main.c:229#7 0xffffffff805ce465 in kdb_trap (type=12, code=0, tf=0xffffff803e8b5730)at /usr/src/sys/kern/subr_kdb.c:534#8 0xffffffff8088715d in trap_fatal (frame=0xffffff803e8b5730, eva=Variable"eva" is not available. ) at /usr/src/sys/amd64/amd64/trap.c:847 #9 0xffffffff80887fb2 in trap (frame=0xffffff803e8b5730) at /usr/src/sys/amd64/amd64/trap.c:345 #10 0xffffffff8086e007 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:223 #11 0xffffffff805a4989 in _sx_xlock_hard (sx=0xffffff0043557d50, tid=18446742975830720512, opts=Variable "opts" is not available. ) at /usr/src/sys/kern/kern_sx.c:575#12 0xffffffff805a52fe in _sx_xlock (sx=Variable "sx" is not available.) at sx.h:155#13 0xffffffff80fe2995 in zfs_freebsd_reclaim () from /boot/kernel/ zfs.ko#14 0xffffffff808cefca in VOP_RECLAIM_APV (vop=0xffffff0043557d38, a=0xffffff0043557d50) at vnode_if.c:1926#15 0xffffffff80626f6e in vgonel (vp=0xffffff00437a7938) at vnode_if.h:830#16 0xffffffff8062b528 in vflush (mp=0xffffff0060f2a000, rootrefs=0, flags=0, td=0xffffff0061528000) at /usr/src/sys/kern/vfs_subr.c:2450 #17 0xffffffff80fdd3a8 in zfs_umount () from /boot/kernel/zfs.ko #18 0xffffffff8062420a in dounmount (mp=0xffffff0060f2a000, flags=1626513408, td=Variable "td" is not available. ) at /usr/src/sys/kern/vfs_mount.c:1287 #19 0xffffffff80624975 in unmount (td=0xffffff0061528000, uap=0xffffff803e8b5c00) at /usr/src/sys/kern/vfs_mount.c:1172 #20 0xffffffff8088783f in syscall (frame=0xffffff803e8b5c90) at /usr/src/sys/amd64/amd64/trap.c:984 #21 0xffffffff8086e290 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:364 #22 0x000000080104e49c in ?? () Previous frame inner to this frame (corrupt stack?) BTW, I got a (one) "force unmount is experimental" on the console. Onregular shutdown I usually get one per filesystem, it seems (at least 10) and this pool should contain exactly as many filesystems as the root pool since it's a copy of it. On running the backup script manually post- crash,though, I didn't get any. Also worth noting is that I was running DTrace all night to test its stability. I'm pretty sure the script was dtrace -n 'syscall::open:entry { @a[copyinstr(arg0)] = count(); }'0 swap was used and 277700 pages (~1084 MB or 50%) RAM was free, accordingto the core.txt. Regards, Thomas
On Jun 20, 2009, at 10:23 AM, Kip Macy wrote:
A force unmount causes all vnodes dirty data to be flushed and then the vnodes to be released. A regular unmount will not complete if there are open files referencing it (hence the use of force unmount on shutdown). Pawel had previously disabled forced unmount because of known "issues". I chose to enable it because it guarantees that dirty data is flushed to disk before shutdown. It looks like we may be reclaiming a vnode that has already been freed. I haven't seen this issue myself, so if you can identify more specifics on how to reproduce it would greatly increase the likelihood of my being able to fix it in the near future. Thanks, Kip
(First off, sorry if the quoting is screwed up.)I got the same panic again, same backtrace, and the exact same point in the backup script (and with a force unmount message as well, which as I said before I've never gotten during backups before, only during shutdown).
[...]receiving incremental stream of tank/test/[EMAIL PROTECTED] into slave/test/[EMAIL PROTECTED]
received 312B stream in 1 seconds (312B/sec) found clone origin slave/usr/[EMAIL PROTECTED]receiving incremental stream of tank/usr/[EMAIL PROTECTED] into slave/usr/[EMAIL PROTECTED]
received 162MB stream in 68 seconds (2.38MB/sec)receiving incremental stream of tank/usr/[EMAIL PROTECTED] into slave/usr/[EMAIL PROTECTED]
received 108KB stream in 1 seconds (108KB/sec)local fs slave/usr/src does not have fromsnap (backup-20090624-0200 in stream); must have been deleted locally; ignoring
Exporting pool Read from remote host 192.168.1.10: Operation timed out Connection to 192.168.1.10 closed.I think it has something to do with the clones. Between this backup one the previous, I did
zfs rename tank/usr/src tank/usr/src_old zfs clone tank/usr/[EMAIL PROTECTED] tank/usr/srcThen an svn update and build{world,kernel} and install, and post- install, run a backup and get the panic.
Unread portion of the kernel message buffer:
kernel trap 12 with interrupts disabled
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x288
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff805ad7d9
stack pointer = 0x28:0xffffff803e9557e0
frame pointer = 0x28:0xffffff803e955840
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = resume, IOPL = 0
current process = 1278 (zpool)
Locked vnodes (neat, I took a picture with a camera, didn't realize
they'd end up here! Also, no clue if it helps at all):
0xffffff000280f3b0: tag zfs, type VDIR
usecount 1, writecount 0, refcount 2 mountedhere 0
flags ()
v_object 0xffffff0012a68510 ref 0 pages 0
lock type zfs: SHARED (count 1)
0xffffff0002ded000: tag zfs, type VDIR
usecount 1, writecount 0, refcount 2 mountedhere 0
flags ()
lock type zfs: SHARED (count 1)
0xffffff002ec5c000: tag zfs, type VDIR
usecount 1, writecount 0, refcount 1 mountedhere 0xffffff002ec3b5e0
flags ()
lock type zfs: EXCL by thread 0xffffff002ec14ab0 (pid 1278)
0xffffff002ecae588: tag zfs, type VDIR
usecount 0, writecount 0, refcount 1 mountedhere 0
flags (VI_DOOMED)
VI_LOCKed lock type zfs: EXCL by thread 0xffffff002ec14ab0 (pid
1278)
0xffffff000280f3b0: tag zfs, type VDIR
usecount 1, writecount 0, refcount 2 mountedhere 0
flags ()
v_object 0xffffff0012a68510 ref 0 pages 0
lock type zfs: SHARED (count 1)
0xffffff0002ded000: tag zfs, type VDIR
usecount 1, writecount 0, refcount 2 mountedhere 0
flags ()
lock type zfs: SHARED (count 1)
0xffffff002ec5c000: tag zfs, type VDIR usecount 1, writecount 0,
refcount 1 mountedhere 0xffffff002ec3b5e0
flags ()
lock type zfs: EXCL by thread 0xffffff002ec14ab0 (pid 1278)
0xffffff002ecae588: tag zfs, type VDIR
usecount 0, writecount 0, refcount 1 mountedhere 0
flags (VI_DOOMED)
VI_LOCKed lock type zfs: EXCL by thread 0xffffff002ec14ab0 (pid
1278)
panic: from debugger cpuid = 0 Uptime: 8m39s Physical memory: 2027 MB ...#10 0xffffffff80879647 in calltrap () at /usr/src/sys/amd64/amd64/ exception.S:223 #11 0xffffffff805ad7d9 in _sx_xlock_hard (sx=0xffffff002ea63480, tid=18446742974982343344, opts=Variable "opts" is not available.
)
at /usr/src/sys/kern/kern_sx.c:575
#12 0xffffffff805ae14e in _sx_xlock (sx=Variable "sx" is not available.
) at sx.h:155
#13 0xffffffff80fe19e5 in zfs_freebsd_reclaim () from /boot/kernel/
zfs.ko
#14 0xffffffff808da87a in VOP_RECLAIM_APV (vop=0xffffff002ea63468,
a=0xffffff002ea63480) at vnode_if.c:1926
#15 0xffffffff80630e7e in vgonel (vp=0xffffff002ecae588) at vnode_if.h:
830
#16 0xffffffff80635438 in vflush (mp=0xffffff002ec3b5e0, rootrefs=0,
flags=0, td=0xffffff002ec14ab0)
at /usr/src/sys/kern/vfs_subr.c:2450
#17 0xffffffff80fdc3f8 in zfs_umount () from /boot/kernel/zfs.ko
#18 0xffffffff8062e11a in dounmount (mp=0xffffff002ec3b5e0,
flags=784578016, td=Variable "td" is not available.
)
at /usr/src/sys/kern/vfs_mount.c:1287
#19 0xffffffff8062e885 in unmount (td=0xffffff002ec14ab0,
uap=0xffffff803e955c00)
at /usr/src/sys/kern/vfs_mount.c:1172
#20 0xffffffff80892edf in syscall (frame=0xffffff803e955c90) at /usr/
src/sys/amd64/amd64/trap.c:984
#21 0xffffffff808798d0 in Xfast_syscall () at /usr/src/sys/amd64/amd64/
exception.S:364
#22 0x000000080104e06c in ?? () Previous frame inner to this frame (corrupt stack?) Regards, Thomas _______________________________________________ [EMAIL PROTECTED] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "[EMAIL PROTECTED]"
- "New" ZFS crash on FS (pool?) unmount/export Thomas Backman
- Re: "New" ZFS crash on FS (pool?) unmount/export Kip Macy
- Re: "New" ZFS crash on FS (pool?) unmount/export Thomas Backman <=