flushing disk buffer cache

From: Siddharth Aggarwal (saggarwa_at_cs.utah.edu)
Date: 10/29/04

    Date: Fri, 29 Oct 2004 10:49:24 -0600 (MDT)
    To: freebsd-hackers@freebsd.org
    
    

    Hi,

    I am writing a pseudo disk driver for disk checkpointing. It intercepts
    write requests to the disk (ad0s1) and performs a copy-on-write of the
    old contents to another partition (ad0s4) before writing out the new
    contents. The driver (called shd) is mounted as

    /dev/shd0a on /
    /dev/shd0f on /usr

    Each time the user creates a new checkpoint (which basically initializes
    new in-memory data structures for that checkpoint), the driver first
    explicitly calls sync() to flush the disk buffer cache, so that the
    on-disk state is consistent at the moment the checkpoint is taken.

    I have also hacked the reboot system call to revert to a previous
    checkpoint after all the filesystems are unmounted but before the system
    halts. The revert basically involves copying some blocks from ad0s4 back
    to ad0s1.
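
The copy-on-write and revert steps described above can be sketched in user space as follows. This is only an illustration under stated assumptions, not the shd driver: the in-memory arrays stand in for ad0s1 and ad0s4, and the names cow_disk, cow_checkpoint, cow_write, and cow_revert, as well as the tiny block count and block size, are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NBLOCKS 8   /* hypothetical: a tiny "disk" for illustration */
#define BLKSIZE 16  /* hypothetical block size in bytes */

struct cow_disk {
    unsigned char primary[NBLOCKS][BLKSIZE]; /* stands in for ad0s1 */
    unsigned char shadow[NBLOCKS][BLKSIZE];  /* stands in for ad0s4 */
    bool saved[NBLOCKS]; /* old contents already preserved since checkpoint? */
};

/* Start a new checkpoint: no block has been preserved yet. */
static void cow_checkpoint(struct cow_disk *d)
{
    assert(d != NULL);
    memset(d->saved, 0, sizeof(d->saved));
}

/* Intercepted write: preserve the old contents once, then overwrite. */
static int cow_write(struct cow_disk *d, size_t blk, const void *data)
{
    assert(d != NULL && data != NULL);
    if (blk >= NBLOCKS)
        return -1;
    if (!d->saved[blk]) {
        memcpy(d->shadow[blk], d->primary[blk], BLKSIZE);
        d->saved[blk] = true;
    }
    memcpy(d->primary[blk], data, BLKSIZE);
    return 0;
}

/* Revert: copy every preserved block back. Returns the number restored. */
static size_t cow_revert(struct cow_disk *d)
{
    size_t n = 0;

    assert(d != NULL);
    for (size_t i = 0; i < NBLOCKS; i++) {
        if (d->saved[i]) {
            memcpy(d->primary[i], d->shadow[i], BLKSIZE);
            d->saved[i] = false;
            n++;
        }
    }
    return n;
}
```

The sketch only preserves a block the first time it is written after a checkpoint, so later writes to the same block do not clobber the saved copy. It also shows why the revert is only safe when nothing else is writing to the primary: any write that was still sitting in a cache above this layer at checkpoint time is invisible to it.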

    However, when the system comes back up, fsck reports inconsistencies in
    the filesystem and has to be run manually.

    I suspect the reason is that when a checkpoint is taken, the filesystem
    on ad0s1 is active and more write operations are coming in, i.e. the
    filesystem on ad0s1 is still dirty. That is why I explicitly call sync()
    before returning from the checkpoint command, but I think sync() doesn't
    guarantee that everything has actually been flushed. So I implemented a
    more forceful way of syncing by borrowing part of the code from boot(),
    the kernel shutdown routine. The function, sync_before_checkpoint(), is
    at the end of this message and is called whenever a checkpoint command
    is issued.
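
The concern that sync() may return before the data is on disk has a well-known userland analogue: sync(2) applies to the whole system and is documented as possibly returning before the buffers are completely flushed, whereas fsync(2) blocks until one descriptor's data has been written. The sketch below illustrates that per-file pattern only. It is not kernel code and does not solve the driver problem, and the helper name write_durably is hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write len bytes to path and block until the kernel reports that they
 * have been flushed for this descriptor.
 * Returns 0 on success, -1 on any error.
 */
static int write_durably(const char *path, const void *buf, size_t len)
{
    const char *p = buf;
    int fd;

    assert(path != NULL && buf != NULL);
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* write() may be partial, so loop until everything is accepted. */
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Unlike sync(), fsync() does not return until the flush completes. */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```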

    Does anyone know whether this is the right way to flush the cache? Is
    there anything else I can do to ensure the filesystem is consistent
    across the reboot? I don't think the problem is in the driver code: when
    I created a new filesystem on ad0s3 and shadowed it with the driver,
    everything ran fine. The difference was that I could unmount that
    filesystem before restoring the checkpoint, so the restore did not have
    to happen at reboot time.

    void sync_before_checkpoint (void)
    {
        register struct buf *bp;
        int iter, nbusy, pbusy;

        waittime = 0;
        sync(&proc0, NULL);

        /*
         * With soft updates, some buffers that are
         * written will be remarked as dirty until other
         * buffers are written.
         */
        for (iter = pbusy = 0; iter < 20; iter++) {
            nbusy = 0;
            for (bp = &buf[nbuf]; --bp >= buf; ) {
                if ((bp->b_flags & B_INVAL) == 0 &&
                    BUF_REFCNT(bp) > 0) {
                    nbusy++;
                } else if ((bp->b_flags & (B_DELWRI | B_INVAL))
                    == B_DELWRI) {
                    /* bawrite(bp);*/
                    nbusy++;
                }
            }
            if (nbusy == 0)
                break;
            printf("%d ", nbusy);
            if (nbusy < pbusy)
                iter = 0;
            pbusy = nbusy;
            if (iter > 5 && bioops.io_sync)
                (*bioops.io_sync)(NULL);
            sync(&proc0, NULL);
            DELAY(50000 * iter);
        }

        /*
         * Count only busy local buffers to prevent forcing
         * a fsck if we're just a client of a wedged NFS server
         */
        nbusy = 0;
        for (bp = &buf[nbuf]; --bp >= buf; ) {
            if (((bp->b_flags & B_INVAL) == 0 && BUF_REFCNT(bp)) ||
                ((bp->b_flags & (B_DELWRI | B_INVAL)) == B_DELWRI)) {
                if (bp->b_dev == NODEV) {
                    TAILQ_REMOVE(&mountlist,
                        bp->b_vp->v_mount, mnt_list);
                    continue;
                }
                nbusy++;
            }
        }
        if (nbusy) {
            /*
             * Failed to sync all blocks. Indicate this and don't
             * unmount filesystems (thus forcing an fsck on reboot).
             */
            printf("giving up on %d buffers\n", nbusy);
            DELAY(5000000); /* 5 seconds */
        }
    }
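
The control flow of the first loop in the function above (retry up to a fixed number of passes, but reset the pass counter whenever the busy count drops) can be isolated in a small user-space sketch. Everything here is hypothetical scaffolding for illustration: the callbacks stand in for the buffer scan and for sync(), fake_cache simulates a cache that drains a fixed number of buffers per flush, and the backoff delay is left as a comment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callbacks standing in for the buffer scan and sync(). */
typedef int  (*count_busy_fn)(void *ctx);
typedef void (*flush_fn)(void *ctx);

/*
 * Flush repeatedly until nothing is busy, giving up after max_iter
 * consecutive passes without progress. Returns the number of buffers
 * still busy when the loop ends.
 */
static int drain_buffers(count_busy_fn count, flush_fn flush, void *ctx,
    int max_iter)
{
    int iter, nbusy, pbusy = 0;

    assert(count != NULL && flush != NULL);
    for (iter = 0; iter < max_iter; iter++) {
        nbusy = count(ctx);
        if (nbusy == 0)
            break;
        if (nbusy < pbusy)
            iter = 0;       /* progress was made: restart the patience counter */
        pbusy = nbusy;
        flush(ctx);
        /* kernel code would back off here, e.g. DELAY(50000 * iter) */
    }
    return count(ctx);
}

/* Simulated cache: each flush retires per_flush dirty buffers. */
struct fake_cache {
    int dirty;
    int per_flush;
};

static int fake_count(void *ctx)
{
    return ((struct fake_cache *)ctx)->dirty;
}

static void fake_flush(void *ctx)
{
    struct fake_cache *c = ctx;

    c->dirty -= c->per_flush;
    if (c->dirty < 0)
        c->dirty = 0;
}
```

With a cache that makes progress on every flush, drain_buffers returns 0. With one that never makes progress (per_flush of 0), it stops after max_iter passes and returns the stuck count. Note that a loop of this shape only waits for buffers that are already dirty to drain; it does not stop new writes from arriving while it runs.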

    _______________________________________________
    freebsd-hackers@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
    To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"

