Re: System deadlock when using mksnap_ffs



On Wed, Nov 12, 2008 at 08:42:00PM -0800, Jeremy Chadwick wrote:
On Thu, Nov 13, 2008 at 12:41:02AM +0000, Tim Bishop wrote:
On Wed, Nov 12, 2008 at 09:47:35PM +0200, Kostik Belousov wrote:
On Wed, Nov 12, 2008 at 05:58:26PM +0000, Tim Bishop wrote:
I've been playing around with snapshots lately but I've got a problem on
one of my servers running 7-STABLE amd64:

FreeBSD paladin 7.1-PRERELEASE FreeBSD 7.1-PRERELEASE #8: Mon Nov 10 20:49:51 GMT 2008 tdb@paladin:/usr/obj/usr/src/sys/PALADIN amd64

I run the mksnap_ffs command to take the snapshot and some time later
the system completely freezes up:

paladin# cd /u2/.snap/
paladin# mksnap_ffs /u2 test.1

It only happens on this one filesystem, though, which might have to do
with its size. It's not over the 2TB mark, but it's pretty close. It's
also backed by a hardware RAID system, although a smaller filesystem on
the same RAID has no issues.

Filesystem   1K-blocks       Used      Avail Capacity  Mounted on
/dev/da0s1a 2078881084  921821396  990749202    48%    /u2
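
As a quick check of the "close to 2TB" remark, the df figures above can be converted by hand. This is only a sketch: it assumes the 1K-blocks column counts 1024-byte blocks and that "2TB" here means 2 TiB (2 * 2^40 bytes); the variable names are illustrative.

```python
# Values copied from the 1K-blocks column of the df output above.
BLOCK_SIZE = 1024              # assumption: df reports 1024-byte blocks
total_blocks = 2_078_881_084   # size of /u2 in 1K-blocks

TIB = 2 ** 40                  # assumption: "2TB" is read as 2 TiB
total_bytes = total_blocks * BLOCK_SIZE
fraction_of_2tib = total_bytes / (2 * TIB)

print(f"size of /u2: {total_bytes / TIB:.3f} TiB")
print(f"fraction of 2 TiB: {fraction_of_2tib:.1%}")
```

Under those assumptions the filesystem comes out just below the 2 TiB boundary, which matches the description above.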

To clarify "completely freezes up": unresponsive to all services over
the network, except ping. On the console I can switch between the ttys,
but none of them respond. The only way out is to hit the reset button.

You need to provide information described in the
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html
and especially
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html

Ok, I've done that, and removed the patch that seemed to fix things.

The first thing I notice after doing this on the console is that I can
still ctrl+t the process:

load: 0.14 cmd: mksnap_ffs 2603 [newbuf] 0.00u 10.75s 0% 1160k

But the top and ps I left running on other ttys have all stopped
responding.
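
For readers unfamiliar with the ctrl+t status line quoted above, here is a rough sketch of splitting it into fields. The field names (load average, command, pid, wait channel in brackets, user and system CPU time, CPU percentage, resident size) are my reading of the format rather than anything stated in this thread, and the regular expression is only fitted to the one line shown.

```python
import re

# Pattern fitted to the single status line quoted in this thread;
# the group names are an assumed interpretation of each field.
STATUS_RE = re.compile(
    r"load:\s+(?P<load>[\d.]+)\s+cmd:\s+(?P<cmd>\S+)\s+(?P<pid>\d+)\s+"
    r"\[(?P<wchan>[^\]]+)\]\s+(?P<user>[\d.]+)u\s+(?P<sys>[\d.]+)s\s+"
    r"(?P<cpu>\d+)%\s+(?P<rss>\d+)k"
)


def parse_status(line):
    """Return the status-line fields as a dict, or raise ValueError."""
    match = STATUS_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognised status line: {line!r}")
    fields = match.groupdict()
    # Convert the numeric fields; cmd and wchan stay as strings.
    for key in ("load", "user", "sys"):
        fields[key] = float(fields[key])
    for key in ("pid", "cpu", "rss"):
        fields[key] = int(fields[key])
    return fields


fields = parse_status(
    "load: 0.14 cmd: mksnap_ffs 2603 [newbuf] 0.00u 10.75s 0% 1160k"
)
print(fields["cmd"], fields["pid"], fields["wchan"], fields["sys"])
```

Read this way, the line shows mksnap_ffs with almost all of its CPU time spent in the kernel and the process waiting on "newbuf" when the line was printed.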

Then in my book, the patch didn't fix anything. :-) The system is
still "deadlocking"; snapshot generation **should not** wedge the system
hard like this.
You systematically mix two completely different issues:
- first one is the _deadlock_ experienced by Tim;
- second one is the slowdown during snapshot creation.
In fact, I could count a third, where dump itself hangs as a usermode
process but the kernel still operates normally.

The posted patch should fix or paper over the first issue for practical purposes.
The third issue is most likely fixed by the subr_sleepqueue race fix.
