Re: Hangs with UFS2 snapshots

From: Kris Kennaway (kkenn_at_xor.obsecurity.org)
Date: 06/18/05


Date: Fri, 17 Jun 2005 18:29:43 -0400

On 2005-06-17, Skylar Thompson <skylar@os2.dhs.org> wrote:
> On Mon, 6 Jun 2005 09:56:53 +0200, Sime Mikecin <simemaknime@logos.hr> wrote:
>>
>> "Skylar Thompson" <skylar@os2.dhs.org> wrote in message
>> news:slrnda725g.j8i.skylar@amayatra.os2.dhs.org...
>>> I'm running FreeBSD 5.4-RELEASE on a dual processor P-III, with 512MB RAM
>>> and a Mylex AcceleRAID controller. I'm trying to do live filesystem
>>> backups to a hot-spare system with UFS2 snapshots. I create the snapshots
>>> with mksnap_ffs, mount them, and then rsync the data over to the hot spare
>>> over NFS. I can very reliably cause the system to hang on disk requests to
>>> certain filesystems, requiring a reboot. I can also get this to happen with
>>> dump's "-L" option, but have yet to experience it with background fscks.
>>> Has anyone experienced this, or know of a fix?
>>
>> Haven't experienced that. Could you give us details on how to reproduce
>> the hang?
>
> I had the system wired up to take a snapshot once an hour of our
> filesystems (/, /usr, /var, and /clients). The system would fairly
> consistently (every day or every other day) lock up. According to our
> Nagios server, this happened either at 3AM when we do a full dump of our
> Postgres database (at several gigabytes, it's fairly I/O intensive), or at
> 4AM when we do our daily dump of the filesystem (again, fairly I/O
> intensive and generates another snapshot). This system happens to be an old
> beige box with a Mylex AcceleRAID controller, but I can reproduce the same
> crash (albeit less consistently) on a new Dell PowerEdge 2650 with a
> PERC3 RAID controller. The symptoms would be that the system locked up
> hard, and would frequently lock up again during the background fsck unless
> I booted into single-user mode and removed the snapshots manually.

To proceed with this you'd need to set up DDB and see if you can break
into it when the deadlock occurs (see the Developers' Handbook), then
identify the process(es) that have deadlocked and obtain tracebacks for
them, along with the output of 'show lockedvnods'. Post this
information to the fs@freebsd.org mailing list and hope that a
developer is available to help you.
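
As a rough sketch of the steps above (not taken from the original report,
and assuming a 5.x kernel built from a custom config file), the debugger
options and the DDB commands involved might look like the following. The
PID 1234 is a placeholder for whichever process turns out to be stuck, and
the handbook remains the authority on the exact options for a given release.

```
# Kernel config additions (assumed: FreeBSD 5.3 or later, custom kernel)
options KDB                 # kernel debugger framework
options DDB                 # interactive in-kernel debugger
options BREAK_TO_DEBUGGER   # allow a serial console break to enter DDB

# When the hang occurs, enter DDB from the console (Ctrl+Alt+Esc on
# syscons, or a break on a serial console), then at the db> prompt:
db> ps                      # list processes and their wait channels
db> trace 1234              # stack trace for a suspect PID (placeholder)
db> show lockedvnods        # list vnodes that are currently locked
```

If the machine still answers on some other login, 'sysctl debug.kdb.enter=1'
is another way to drop into the debugger. A serial console makes it much
easier to capture the output for posting.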

Kris


