Filesystem snapshots dog slow
- From: Jeremy Chadwick <koitsu@xxxxxxxxxxx>
- Date: Tue, 16 Oct 2007 04:30:46 -0700
Since the snapshot code (e.g. mksnap_ffs(8) and friends) was introduced,
dump(8) was modified to nag you if you didn't use the -L argument. "Um,
okay, I'd better use -L" is what came out of my mouth, and I'm sure out of
a lot of other administrators' mouths too when they saw this message.
But it seems that making a snapshot is an incredibly slow, I/O-intensive
task. The documentation I've read says that making a snapshot "is
incredibly fast"; based on my experience, it isn't. At the very least it's
nowhere near as fast as on, say, a NetApp filer.
I've found 3 threads (from 2003, 2005, and 2007) about this problem:
http://lists.freebsd.org/pipermail/freebsd-current/2003-August/009135.html
http://lists.freebsd.org/pipermail/freebsd-fs/2005-July/001216.html
http://lists.freebsd.org/pipermail/freebsd-stable/2007-January/031882.html
This issue is still present on RELENG_7, and I can confirm it on
multiple machines (some running *completely* different hardware than
others).
osiris# df -ki /disk2
Filesystem  1024-blocks Used     Avail Capacity iused    ifree %iused  Mounted on
/dev/ad6s1d   236511738    4 217590796       0%     2 30570492     0%  /disk2
osiris# time mksnap_ffs /disk2 /disk2/mysnapshot
0.000u 1.012s 5:12.23 0.3% 5+1149k 7803+18819io 0pf+0w
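The `time` lines in this message use the csh/tcsh default format, `%Uu %Ss %E %P %X+%Dk %I+%Oio %Fpf+%Ww`: user and system CPU seconds, elapsed wall-clock time, CPU percentage, average shared+unshared memory (KB), block input+output operations, and page faults+swaps. A minimal Python sketch, assuming exactly that format (`parse_csh_time` and `CshTime` are hypothetical names), extracts the figures that matter here.

```python
import re
from dataclasses import dataclass

# ASSUMPTION: the line uses the csh/tcsh default `time` format
#   %Uu %Ss %E %P %X+%Dk %I+%Oio %Fpf+%Ww
TIME_RE = re.compile(
    r"(?P<user>[\d.]+)u\s+(?P<sys>[\d.]+)s\s+(?P<elapsed>[\d:.]+)\s+"
    r"(?P<cpu>[\d.]+)%\s+(?P<shared>\d+)\+(?P<unshared>\d+)k\s+"
    r"(?P<inputs>\d+)\+(?P<outputs>\d+)io\s+(?P<faults>\d+)pf\+(?P<swaps>\d+)w"
)


@dataclass
class CshTime:
    user_s: float     # %U: user CPU seconds
    sys_s: float      # %S: system CPU seconds
    elapsed_s: float  # %E: wall-clock time, converted to seconds
    cpu_pct: float    # %P: CPU share of the elapsed time
    inputs: int       # %I: block input operations
    outputs: int      # %O: block output operations


def elapsed_to_seconds(text: str) -> float:
    """Convert %E output such as '5:12.23' or '1:02:03.5' to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_csh_time(line: str) -> CshTime:
    m = TIME_RE.fullmatch(line.strip())
    if m is None:
        raise ValueError(f"not in csh default time format: {line!r}")
    return CshTime(
        user_s=float(m["user"]),
        sys_s=float(m["sys"]),
        elapsed_s=elapsed_to_seconds(m["elapsed"]),
        cpu_pct=float(m["cpu"]),
        inputs=int(m["inputs"]),
        outputs=int(m["outputs"]),
    )


if __name__ == "__main__":
    t = parse_csh_time("0.000u 1.012s 5:12.23 0.3% 5+1149k 7803+18819io 0pf+0w")
    cpu = t.user_s + t.sys_s
    print(f"elapsed {t.elapsed_s:.2f}s, CPU {cpu:.3f}s, "
          f"{t.inputs + t.outputs} block I/Os")
```

Run as a script, it prints `elapsed 312.23s, CPU 1.012s, 26622 block I/Os`. About one second of CPU in a run of more than five minutes means mksnap_ffs spends nearly all of its time waiting on the disk, which fits the wdrain state described next.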
While mksnap_ffs runs, the process remains in wdrain state. gstat(8)
shows immense disk I/O. ms/r occasionally jumps up to 1100 or higher,
but usually hovers around 40-60.
osiris# gstat -I500ms -f'ad6'
dT: 0.501s  w: 0.500s  filter: ad6
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    2     80     52    830   38.6     28    447   22.4  100.2| ad6
    2     80     52    830   38.6     28    447   22.4  100.2| ad6s1
    0      0      0      0    0.0      0      0    0.0    0.0| ad6s1c
    2     80     52    830   38.6     28    447   22.4  100.2| ad6s1d
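Dividing throughput (kBps) by operation rate (r/s, w/s) in the gstat sample gives the average transfer size. A short Python sketch, assuming gstat's column order as printed above (`avg_transfer_kb` is a hypothetical helper), shows that both reads and writes average about 16 KB. At roughly 1.3 MB/s in total with the disk 100% busy, that looks seek-bound rather than bandwidth-bound. 16 KB is also likely the default UFS2 block size newfs(8) used at the time, but that is an inference from a single sample, not something these numbers prove.

```python
# ASSUMED gstat column order, as in the sample above:
#   L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name
ROW = "2 80 52 830 38.6 28 447 22.4 100.2| ad6s1d"


def avg_transfer_kb(row: str) -> tuple[float, float]:
    """Return (KB per read, KB per write) for one gstat data row."""
    fields = row.replace("|", " ").split()
    _qlen, _ops, r_s, r_kbps, _ms_r, w_s, w_kbps, _ms_w, _busy, _name = fields
    reads, writes = float(r_s), float(w_s)
    # Guard against idle providers (e.g. ad6s1c) that report zero ops.
    per_read = float(r_kbps) / reads if reads else 0.0
    per_write = float(w_kbps) / writes if writes else 0.0
    return per_read, per_write


if __name__ == "__main__":
    r, w = avg_transfer_kb(ROW)
    print(f"~{r:.1f} KB per read, ~{w:.1f} KB per write")
```

For the ad6s1d row it prints `~16.0 KB per read, ~16.0 KB per write`.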
Now for snapshot removal:
osiris# time rm /disk2/mysnapshot
override r--r----- root/operator snapshot for /disk2/mysnapshot? y
0.000u 0.285s 1:58.03 0.2% 16+1161k 7456+7456io 0pf+0w
While rm runs, the process remains in biord state.
During either of these operations, the system can occasionally go into a
"stalled" state, where all disk operations block until mksnap_ffs or rm
finishes.
I ran a second mksnap_ffs "just to see" what happened. Between the
first time and this time, *nothing* happened on the filesystem (no disk
reads or writes AFAIK):
osiris# time mksnap_ffs /disk2 /disk2/mysnapshot
0.016u 1.352s 10:13.73 0.2% 5+1164k 14501+27931io 0pf+0w
The time doubled. This isn't good.
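A quick check on the two runs, using the elapsed and I/O figures that csh's `time` printed above (a minimal Python sketch; `elapsed_to_seconds` is a hypothetical helper). Wall-clock time grew by a factor of about 1.97, while the block I/O count reported by `time` grew only about 1.59x, so, at least by these counters, the second run was not simply doing twice as much I/O.

```python
def elapsed_to_seconds(text: str) -> float:
    """Convert csh %E output ([h:]m:ss.ss) to seconds."""
    total = 0.0
    for part in text.split(":"):
        total = total * 60 + float(part)
    return total


# Figures copied from the two mksnap_ffs runs above (input + output ops).
first = {"elapsed": "5:12.23", "io": 7803 + 18819}
second = {"elapsed": "10:13.73", "io": 14501 + 27931}

t1 = elapsed_to_seconds(first["elapsed"])
t2 = elapsed_to_seconds(second["elapsed"])
print(f"time: {t1:.2f}s -> {t2:.2f}s ({t2 / t1:.2f}x)")
print(f"block I/O: {first['io']} -> {second['io']} "
      f"({second['io'] / first['io']:.2f}x)")
```

It prints `time: 312.23s -> 613.73s (1.97x)` and `block I/O: 26622 -> 42432 (1.59x)`.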
Disks are getting larger, filesystems growing, people storing more data.
Hitachi, for example, has guaranteed 4TB disks by the end of 2011. If
this problem has sat idle for at least 4 years already, we'll be in a
lot of trouble come 2011. And let's not forget that every piece of
FreeBSD documentation tells admins to "use dump, it's the best!". This
issue is a good reason to consider using tools like rsync or tar
instead. :-(
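To put rough numbers on that worry: the filesystem above is essentially empty (4 KB used, 2 inodes in use) yet took over five minutes, so the cost appears to be tied to filesystem size rather than to the data stored. A back-of-the-envelope Python sketch, assuming purely for illustration that snapshot time scales linearly with filesystem size on comparable hardware (`projected_minutes` is a hypothetical helper), projects the first run's 5:12 onto a 4 TB filesystem.

```python
# Measured above: an (almost empty) filesystem of 236,511,738 1 KB blocks
# took 5:12.23 to snapshot on the first run.
MEASURED_KB = 236_511_738
MEASURED_SECONDS = 5 * 60 + 12.23


def projected_minutes(size_bytes: float) -> float:
    # ASSUMPTION: time grows linearly with filesystem size and the disk is
    # no faster. Neither is established by the measurements above.
    measured_bytes = MEASURED_KB * 1024
    return MEASURED_SECONDS * (size_bytes / measured_bytes) / 60


if __name__ == "__main__":
    four_tb = 4 * 10**12  # decimal terabytes, as disk vendors count them
    print(f"~{projected_minutes(four_tb):.0f} minutes for a 4 TB filesystem")
```

Under that assumption it prints `~86 minutes for a 4 TB filesystem`, and roughly twice that if the slower second run is the more representative one.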
I will gladly work with anyone who wishes to tackle this, either by
providing hardware (motherboards, disks, etc.) for free, or by giving the
individual access to a box with a serial console and a serial debugger
available.
--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP: 4BD6C0CB |
_______________________________________________
freebsd-hackers@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers