Re: potential re change for 5.3?

From: Scott Long (scottl_at_freebsd.org)
Date: 08/25/04

  • Next message: Matthew Jacob: "Re: potential re change for 5.3?"
    Date: Wed, 25 Aug 2004 15:45:12 -0600
    To: Poul-Henning Kamp <phk@phk.freebsd.dk>
    
    

    [moving this to a more appropriate list]

    Poul-Henning Kamp wrote:

    > In message <412D00BE.5030406@freebsd.org>, Scott Long writes:
    >
    >
    >>>I'm not sure I understand the question in the first place, sorry...
    >>>
    >>
    >>Let me reiterate. You should not sleep in the bio path.
    >>tsleep/msleep/malloc(WAITOK)/etc should not happen because they
    >>put the entire g_down thread to sleep and block further I/O.
    >>Sleep mutexes are kind of a stretch but we seem to be lucky so far.
    >>Contrast that to pre-GEOM days where I/O was dispatched directly
    >>from the process that initiated it, so sleeping wasn't a horrible
    >>thing if done with care.
    >
    >
    > There are many things in this casserole, and the one thing I didn't
    > see in any driver when I started was "sleeping ... done with care".
    >
    > If for instance the process was the kernel trying to free memory
    > resources and you slept, the system wedged solid; nobody checked
    > for that.
    >

    Well, there are certainly many marginal I/O drivers hanging around,
    but the ones that are written with care make sure that they allocate
    their resources up front and don't allow sleeping in either direction.

    >
    >>If there was one thing I could change
    >>about GEOM, it would be to allow direct dispatch up and down.
    >>Don't get me wrong, I understand the usefulness of decoupling
    >>the I/O path, especially when it comes to locking, but it does
    >>have some down-sides.
    >
    >
    >>Incidentally, the 'no blocking in the i/o path' thing is why busdma
    >>is the way it is with deferred callbacks. If we didn't have it,
    >>PHK's mutex on GEOM would be triggering all the time under heavy
    >>load with bounce buffers.
    >
    >
    > And presumably busdma would want to sleep in memory(-space) in some
    > shape or form ? Memory which would only become available as other
    > I/O requests were completed freeing up the resources in question ?
    >

    No, the case I'm talking about is the NetBSD behaviour of sleeping in
    bus_dmamap_load() when the bounce pool is empty.
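
    To make the contrast concrete, here is a minimal user-space sketch of
    the deferred-callback idea: when the bounce pool is empty the load is
    queued and EINPROGRESS is returned instead of sleeping, and the
    callback runs later when pages are released. The names bounce_pool,
    pool_load, pool_release and mark_done are hypothetical and are not
    the busdma API; this only models the control flow.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are FreeBSD APIs. */
#define MAX_WAITERS 8

typedef void (*load_cb)(void *arg, int error);

struct waiter {
    int      pages;   /* bounce pages this request needs */
    load_cb  cb;      /* invoked once the pages are reserved */
    void    *arg;
};

struct bounce_pool {
    int           free_pages;
    struct waiter waitq[MAX_WAITERS];  /* FIFO ring of deferred loads */
    size_t        head, count;
};

/*
 * Try to map a request. Never sleeps: if the pool cannot satisfy the
 * request now, the callback is queued and EINPROGRESS is returned.
 */
int pool_load(struct bounce_pool *p, int pages, load_cb cb, void *arg)
{
    if (p->count == 0 && p->free_pages >= pages) {
        p->free_pages -= pages;
        cb(arg, 0);              /* mapped immediately */
        return 0;
    }
    if (p->count == MAX_WAITERS)
        return ENOMEM;           /* wait queue full; caller retries later */
    size_t tail = (p->head + p->count) % MAX_WAITERS;
    p->waitq[tail] = (struct waiter){ pages, cb, arg };
    p->count++;
    return EINPROGRESS;          /* callback will run from pool_release() */
}

/* Return pages to the pool and run any deferred loads that now fit. */
void pool_release(struct bounce_pool *p, int pages)
{
    p->free_pages += pages;
    while (p->count > 0 && p->waitq[p->head].pages <= p->free_pages) {
        struct waiter w = p->waitq[p->head];
        p->head = (p->head + 1) % MAX_WAITERS;
        p->count--;
        p->free_pages -= w.pages;
        w.cb(w.arg, 0);
    }
}

/* Example callback: records that the mapping completed. */
void mark_done(void *arg, int error)
{
    assert(error == 0);
    *(int *)arg = 1;
}
```

    The caller's thread returns right away in both cases, so in this
    model nothing equivalent to g_down is ever parked waiting for bounce
    pages.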

    > The issue isn't solved by allowing sleeping. Allowing sleeping would
    > open us to the case where the driver goes to sleep on a large read
    > which needs to allocate some memory resource which is not available,
    > thereby stalling the subsequent write requests in the queue which
    > upon completion would make that memory resource available.
    >
    > The current situation is not ideal: the request fails with ENOMEM
    > in the driver and is retried in GEOM above the geom_disk class.
    > Provided the driver uses disksort(), other requests on the queue
    > get a chance first. (There should actually be a flag which kicks
    > the queue into FIFO mode on ENOMEM errors, reset on empty queue.)
    >
    > So while it is not ideal and not optimal, it actually works in the
    > common and the extreme case (physmem=48m, make -j 12 buildworld
    > for instance).
    >
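
    The FIFO-on-ENOMEM flag suggested in the quoted text could look
    roughly like the sketch below. It is a user-space model only: ioq,
    ioq_insert, ioq_pop and ioq_retry_enomem are hypothetical names, and
    the plain sort by offset stands in for disksort(), whose real
    ordering is more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a request queue; not the GEOM or bioq API. */
#define QMAX 16

struct ioq {
    long   off[QMAX];   /* pending request offsets, head at index 0 */
    size_t n;
    bool   fifo;        /* set after an ENOMEM, cleared when queue empties */
};

/* Insert sorted by offset normally, or at the tail in FIFO mode. */
bool ioq_insert(struct ioq *q, long offset)
{
    if (q->n == QMAX)
        return false;
    size_t i = q->n;
    if (!q->fifo) {
        /* Shift larger offsets up to keep the queue sorted. */
        while (i > 0 && q->off[i - 1] > offset) {
            q->off[i] = q->off[i - 1];
            i--;
        }
    }
    q->off[i] = offset;
    q->n++;
    return true;
}

/* Remove the head; leaving the queue empty resets FIFO mode. */
bool ioq_pop(struct ioq *q, long *offset)
{
    if (q->n == 0)
        return false;
    *offset = q->off[0];
    for (size_t i = 1; i < q->n; i++)
        q->off[i - 1] = q->off[i];
    if (--q->n == 0)
        q->fifo = false;
    return true;
}

/* A request failed with ENOMEM: switch to FIFO and requeue it at the tail. */
bool ioq_retry_enomem(struct ioq *q, long offset)
{
    q->fifo = true;
    return ioq_insert(q, offset);
}
```

    In FIFO mode the retried request goes behind everything already
    queued, so the requests whose completion frees memory are issued
    first, and new arrivals cannot be sorted in ahead of the retry.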

    We are still at risk of drivers sleeping on mutexes and blocking g_down
    from processing further I/O. This isn't very apparent on a single
    controller/single disk configuration, but toss in multiple controllers
    and you have the real potential for priority inversions in the I/O path.

    > There are many unknowns still, the entire "make bio scatter/gather
    > and map/unmapped" thing for instance throws much of this up in the
    > air again, so I'm not very keen to apply workarounds at this time.
    >
    > The current rules are very restrictive, but they work. Once we get
    > further down the road, we may find ways to relax them that do not
    > compromise the desirable features of our I/O path. In the meantime
    > I'd prefer to keep the handcuffs on, until we know for sure what
    > is a better way.
    >
    >

    I'm not suggesting anything different, just making a note of something
    that might be desirable in the future. In a way, I see GEOM as having
    the potential to be like Netgraph, where it intercepts the operations
    that it wants to process through its framework and lets the ones that
    it doesn't want pass directly through, without decoupling them through
    extra kernel threads. But that's only one possible strategy.
    Introducing the concept of an I/O scheduler that spawns KSEs to handle
    individual I/O requests is another possibility.

    Scott
    _______________________________________________
    freebsd-arch@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-arch
    To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"

