Re: Logical volume management

From: Eric Anderson (anderson_at_centtech.com)
Date: 11/18/05

    Date: Fri, 18 Nov 2005 06:39:09 -0600
    To: Brian Candler <B.Candler@pobox.com>
    
    

    Brian Candler wrote:
    > Vinum's manpage makes my head spin. I was wondering if anyone had considered
    > implementing something a bit more straightforward and also more dynamic.
    >
    > Suppose you:
    >
    > (1) Divide all your disks up-front into equal sized chunks, say 4MB.
    >
    > (2) Use an indirection table to map each volume into an arbitrary set of
    > these chunks across all available disks.
    >
    > (3) Store the indirection table at the end of a partition, as other GEOM
    > modules do for their metadata, and cache it in RAM.
    >
    > (e.g. a 160GB drive, divided into 4MB blocks, each of which has a 32-bit
    > indirection table entry, would require only 160KB of indirection table)
    >
    > Why do this?
    >
    > - You can install a system with minimal /, /usr, /var and /home, and then
    > grow each one in small increments as needed just by adding spare chunks.
    > With vinum you would end up with an increasingly complex configuration with
    > more and more subdisks, since each subdisk must be a contiguous range of a
    > physical drive. If you decide to get rid of a volume, then you need to keep
    > track of those subdisk fragments. I'm not sure if it's possible to take an
    > unused subdisk and split it so you can assign part of the free space to
    > another volume. Even if you can, this still means more subdisk
    > fragmentation.
    >
    > With the above scheme an unused volume just returns its chunks into the pool
    > for reallocation.
    >
    > - You can identify 'hot' chunks and move them between disks. This is a lot
    > more flexible than fixed striping. Unlike striping, it could distribute load
    > between unevenly matched devices (e.g. 10GB on one disk and 20GB on
    > another). It could also migrate 'hot' data to faster devices, such as a
    > battery-backed RAM disk[*]. With the right tools, this could all happen
    > automatically.
    >
    > - Mapping volumes in fixed chunks in this way lends itself well to
    > visualisation, e.g. all chunks belonging to the same volume can be shown as
    > blocks in the same colour.
    >
    > - What I'm suggesting may or may not look like Linux's LVM; I've never used
    > that. If its data structure is suitable, we can just use that and gain some
    > compatibility for multi-boot systems.
    >
    > I guess you could work this way in vinum, dividing all your storage up front
    > into 4MB subdisks, but it doesn't sound like fun to me.
    >
    > I also guess there's a lot of devil-in-the-details to do with marking a
    > volume as 'up' or 'down'; but hopefully mirroring and RAID could be
    > delegated to other GEOM modules, leaving us just with logical
    > {volume,extent} to physical {drive,extent} mapping to do.
    >
    > Has something like this been proposed, discussed and/or discarded before?
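
As a rough illustration of steps (1) and (2) and of the 160KB estimate, here is a minimal Python sketch of a chunk pool and a per-volume indirection table. It is not GEOM code, and it does not model step (3), storing the table on disk. The names `ChunkPool`, `Volume`, `table_bytes`, the disk names `ad0`/`ad1`, and the `(disk, chunk index)` entry format are all hypothetical, chosen only to show the logical {volume,extent} to physical {drive,extent} mapping.

```python
from collections import deque

CHUNK_SIZE = 4 * 1024 * 1024  # 4MB chunks, read here as 4 * 2**20 bytes (assumption)
ENTRY_SIZE = 4                # one 32-bit indirection table entry per chunk


def table_bytes(disk_bytes: int) -> int:
    """Bytes of indirection table needed to describe one disk."""
    return (disk_bytes // CHUNK_SIZE) * ENTRY_SIZE


class ChunkPool:
    """Hypothetical pool of free physical chunks across all disks."""

    def __init__(self) -> None:
        self.free = deque()  # entries are (disk name, physical chunk index)

    def add_disk(self, disk: str, disk_bytes: int) -> None:
        # Step (1): divide the whole disk into equal-sized chunks up front.
        for i in range(disk_bytes // CHUNK_SIZE):
            self.free.append((disk, i))

    def take(self) -> tuple:
        if not self.free:
            raise RuntimeError("chunk pool exhausted")
        return self.free.popleft()

    def give_back(self, chunk: tuple) -> None:
        self.free.append(chunk)


class Volume:
    """Hypothetical volume: table[n] says where logical chunk n lives (step 2)."""

    def __init__(self, pool: ChunkPool) -> None:
        self.pool = pool
        self.table = []

    def grow(self, nchunks: int) -> None:
        # Growing just appends spare chunks; they may come from any disk.
        if nchunks > len(self.pool.free):
            raise RuntimeError("not enough free chunks")
        for _ in range(nchunks):
            self.table.append(self.pool.take())

    def translate(self, offset: int) -> tuple:
        """Map a logical byte offset to (disk, byte offset within the disk's chunk area)."""
        disk, chunk = self.table[offset // CHUNK_SIZE]
        return disk, chunk * CHUNK_SIZE + offset % CHUNK_SIZE

    def destroy(self) -> None:
        # An unused volume simply returns its chunks to the pool.
        while self.table:
            self.pool.give_back(self.table.pop())


print(table_bytes(160 * 1024**3))  # 163840 bytes, i.e. 160KB
```

Because the table is a flat array, translating an offset costs one division and one lookup, and growing or destroying a volume never creates subdisk fragments to track; freed chunks just go back into the pool.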

    I've been sketching out nearly the exact same thing over the past few
    weeks! My goal was to come up with a way to use block devices in
    a very pliable way that allows growing volumes, adding more block
    storage to a pool, etc., as you've mentioned above.

    One of the issues I was hoping to solve is the "can't grow a stripe
    onto more disks" problem. I started coming up with a feature set
    for a new volume manager, using GEOM as the base. Some of them were:

    - ability to grow volumes online
    - volume migration (online)
    - volume snapshots (online)
    - block pooling (to allow adding more blocks from a disk to the pool)
    - auto block allocation (assigning blocks from the pool to a volume as
    needed)
    - auto block promotion (moving most frequently used blocks to faster
    block storage devices, and/or auto mirroring those blocks on many
    devices for increased speed)
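
The "auto block promotion" item (and the 'hot' chunk migration in the quoted proposal) could be approached by counting accesses per logical chunk and periodically repointing the busiest table entries at a faster device. The sketch below is a hypothetical illustration in Python, not a GEOM design: `PromotingVolume`, `promote`, and the device names are made up, and the data copy that must precede the table switch is left out.

```python
from collections import Counter


class PromotingVolume:
    """Hypothetical volume whose hottest chunks can be remapped to a fast device."""

    def __init__(self, table, fast_free) -> None:
        self.table = list(table)          # logical chunk -> (disk, physical chunk)
        self.fast_free = list(fast_free)  # free (disk, chunk) slots on the fast device
        self.hits = Counter()             # access count per logical chunk

    def record_access(self, logical_chunk: int) -> None:
        self.hits[logical_chunk] += 1

    def promote(self, fast_disk: str, max_moves: int) -> list:
        """Remap up to max_moves of the most frequently used chunks to fast_disk.

        Returns (logical chunk, old location) pairs. A real implementation
        would copy the data before switching the table entry and return the
        old chunk to the slow pool; both steps are omitted here.
        """
        moved = []
        for logical, _count in self.hits.most_common():
            if len(moved) == max_moves or not self.fast_free:
                break
            old = self.table[logical]
            if old[0] == fast_disk:
                continue  # already promoted
            self.table[logical] = self.fast_free.pop()
            moved.append((logical, old))
        return moved


# "md0" stands in for a hypothetical battery-backed RAM disk.
vol = PromotingVolume([("ad0", 0), ("ad0", 1), ("ad1", 0)], [("md0", 0)])
for _ in range(5):
    vol.record_access(2)
vol.record_access(0)
print(vol.promote("md0", max_moves=1))  # [(2, ('ad1', 0))]
```

One design choice left open here is aging: without periodically decaying the counters, chunks that were hot long ago would keep their place on the fast device indefinitely.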

    It would be nice to be able to create an arbitrarily large volume that
    only consumes these volume blocks (you call them chunks) as the volume
    gets filled. This way, you could create a 2TB volume with only a
    single 200GB drive; then, as you neared the 200GB-used mark, you could
    add another disk and grow onto it, or even add 5 disks and have it
    stripe the data across them, mirror it, etc. You could also migrate
    volume blocks from one device to others, or have the volume manager
    automatically move the MFU (most frequently used) blocks to multiple
    volume block providers for striping+mirroring to gain extra performance.
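
The 2TB-volume-on-a-200GB-drive idea amounts to allocating backing chunks on first write. Below is a hypothetical Python sketch of that, assuming the same 4MB chunks as the quoted proposal and binary TB/GB. `ThinVolume` and its methods are invented for illustration; round-robin allocation over disks with free space is just one simple way to spread new chunks across disks added later, and only loosely approximates striping.

```python
from collections import deque

CHUNK_SIZE = 4 * 1024 * 1024  # assumed 4MB chunk size, as in the quoted proposal


class ThinVolume:
    """Hypothetical thin-provisioned volume: backing is allocated on first write."""

    def __init__(self, size_bytes: int) -> None:
        self.nchunks = size_bytes // CHUNK_SIZE  # logical size in chunks
        self.table = {}       # sparse: logical chunk -> (disk, physical chunk)
        self.free = {}        # disk name -> deque of free physical chunk indexes
        self.next_disk = 0    # round-robin cursor for spreading new chunks

    def add_disk(self, disk: str, disk_bytes: int) -> None:
        # Adding a disk only enlarges the pool; no existing data moves.
        self.free[disk] = deque(range(disk_bytes // CHUNK_SIZE))

    def _allocate(self) -> tuple:
        # Rotate over every disk that still has free chunks, so new writes
        # are spread across disks added later.
        disks = [d for d, chunks in self.free.items() if chunks]
        if not disks:
            raise RuntimeError("pool exhausted: add another disk")
        disk = disks[self.next_disk % len(disks)]
        self.next_disk += 1
        return disk, self.free[disk].popleft()

    def write(self, offset: int) -> tuple:
        """Return where a write at offset lands, allocating a chunk on first touch."""
        logical = offset // CHUNK_SIZE
        if not 0 <= logical < self.nchunks:
            raise ValueError("offset beyond end of volume")
        if logical not in self.table:
            self.table[logical] = self._allocate()
        disk, chunk = self.table[logical]
        return disk, chunk * CHUNK_SIZE + offset % CHUNK_SIZE

    def used_bytes(self) -> int:
        return len(self.table) * CHUNK_SIZE


TB, GB = 1024**4, 1024**3
vol = ThinVolume(2 * TB)        # 2TB logical size...
vol.add_disk("ad0", 200 * GB)   # ...backed by a single 200GB drive
vol.write(0)
vol.write(TB)                   # far-apart writes still cost one chunk each
print(vol.used_bytes())         # 8388608 (two 4MB chunks)
```

When "ad0" nears full, calling `add_disk` with a second drive is all it takes for later writes to start landing on it as well.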

    Maybe we should take this to freebsd-geom@?

    Eric

    -- 
    ------------------------------------------------------------------------
    Eric Anderson        Sr. Systems Administrator        Centaur Technology
    Anything that works is better than anything that doesn't.
    ------------------------------------------------------------------------
    _______________________________________________
    freebsd-current@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-current
    
