Re: 80386 support in -current

From: Robert Watson (rwatson_at_freebsd.org)
Date: 01/26/04

    Date: Sun, 25 Jan 2004 21:34:13 -0500 (EST)
    To: Peter Jeremy <peterjeremy@optushome.com.au>
    
    

    On Mon, 26 Jan 2004, Peter Jeremy wrote:

    > >This last point is the clincher. The chip does NOT have enough "umphf". I
    > >actually managed to boot a -current (from back then) on a 80386SX and it
    > >was torturously slow. An ls(1) on my empty home directory took 15 seconds.
    > >My VAX is faster.
    >
    > This is a bug in FreeBSD 5.x - the performance in general has degraded
    > since 4.x. Performance degradation is often more obvious in lower end
    > machines.

    There are some areas where performance is improved, and several important
    areas where it's worse. I'd encourage all FreeBSD developers to look at
    areas where it's worse and fix things :-). That said, I know there's a
    fair amount of work going on relating to performance optimization, and
    hopefully we'll start to see some of those results in the near future.
    FWIW, I actually measure a pretty dramatic improvement in network
    benchmarks on 5.x relative to 4.x in the SMP case through increased
    parallelism and asynchrony. The areas I'm aware of that require
    particular attention at this point include:

    - Improving interrupt latency. We've moved to ithreads, but haven't spent
      enough time optimizing the performance of our ithread implementation.
      Bosko did a sample i386 implementation of lightweight context switches
      last year, but at that time we didn't have enough device driver locking
      to take advantage of it. We're now in much better shape locking-wise,
      with a lot more just around the corner, so we need to focus on interrupt
      latency. We held a conference call a few days ago to get some of the
      interested parties together (Bosko, Jeff, et al), and it looks like
      Peter Wemm has foolishly signed up to update/re-implement it on a recent
      5.x. Use of the IO APIC is necessary for SMP systems, but it also adds
      a fair amount of overhead. In some recent uniprocessor
      benchmarking, I saw an observable overhead for using 'device apic' -- it
      could be we want to back off the use of device apic on these systems.

    - General optimization of locking. We've put in a fair number of locks,
      and pushed Giant off some of the interesting paths (e.g., pipe
      locking). We now need to look at lock granularity. I recently
      committed some changes to our mutex profiling code to measure lock
      contention. I suspect we're not seeing a lot of contention, with the
      exception of Giant, and so we might actually want to look at reducing
      the number of locks using mutex pools (where possible) to lower memory
      overhead. We have a number of tools here that can help us, and now that
      things are maturing locking-wise, we should use them. We are also
      likely pretty close to pushing Giant further off a number of pieces of
      process-related code, which should help quite a bit with things like
      large builds.

    - Get the socket locking into the tree. Large parts of the network stack
      can now run Giant-free, and there are substantial outstanding patches
      for a lot more. Cleanup is required, but hopefully we'll see some
      patches posted for testing soon. There are some areas of the network
      stack that require substantial further attention -- for example, the
      KAME code requires additional locking work to run Giant-free.

    - Reduce the overhead of in-kernel thread context switching. We do more
      context switching than we used to, not just because of ithreads, but
      also because we have used threads to increase asynchrony and serialize
      work queues.

    - Reduce the cost of lock operations. There have been some suggestions
      that our current mutexes consume more memory than necessary in
      non-debugging cases, and also are more expensive than necessary in some
      cases.

    - Explore additional use of the UMA slab allocator. In particular, see
      whether using it can help improve performance with System V IPC, where
      currently the implementation does its own memory caching and handling.
      There have also been some proposals to increase use of UMA in the
      network stack, use it further for sockets, etc. I know there has also
      been some experimentation with using UMA to replace the current mbuf
      allocator.

    - Trim unneeded fields from a number of kernel structures. As KSE went
      in, struct proc was broken out into a number of pieces. In some cases,
      variables lived on in multiple structures, and can now be cleaned out.
      Likewise in other kernel data structures.

    - Take better advantage of CPU class optimizations. There has been some
      discussion of providing HAL modules for the kernel, and libraries for
      userspace, based on the CPU type to improve performance. E.g.,
      optimized mutexes, memory zeroing, context switching, et al. Right now we
      do a fairly poor job at picking up these optimizations, and carry around
      a lot of memory overhead to support a large set. We need to do a better
      job where possible -- we should really see the results if we're able to
      optimize code such as the crypto code for specific CPUs.
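
    To make the mutex pool idea from the locking item above a bit more
    concrete, here is a minimal userspace sketch using pthreads. It is not
    the kernel's own mutex pool interface: every name in it (POOL_SIZE,
    pool_find, struct counter, and so on) is made up for illustration, and
    the address hash is just one plausible choice. The point is only that
    objects borrow a lock from a small shared array, keyed by their address,
    instead of each embedding a mutex of its own.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is an illustration only. */
#define POOL_SIZE 16                    /* must be a power of two */

static pthread_mutex_t pool[POOL_SIZE];
static pthread_once_t pool_once = PTHREAD_ONCE_INIT;

/* Initialize every mutex in the shared pool; run once via pthread_once. */
static void pool_init(void)
{
    for (size_t i = 0; i < POOL_SIZE; i++)
        pthread_mutex_init(&pool[i], NULL);
}

/* Map an object's address to a slot in the pool. */
static size_t pool_index(const void *obj)
{
    uintptr_t p = (uintptr_t)obj;

    /* Skip low bits that alignment tends to leave at zero, then mix. */
    p = (p >> 4) ^ (p >> 12);
    return (size_t)(p & (POOL_SIZE - 1));
}

/* Return the pool mutex that protects the given object. */
static pthread_mutex_t *pool_find(const void *obj)
{
    pthread_once(&pool_once, pool_init);
    return &pool[pool_index(obj)];
}

/* The protected object carries no mutex of its own. */
struct counter {
    long value;
};

/* Update the counter while holding the pool mutex chosen by its address. */
static void counter_add(struct counter *c, long n)
{
    pthread_mutex_t *m = pool_find(c);

    pthread_mutex_lock(m);
    c->value += n;
    pthread_mutex_unlock(m);
}
```

    The trade-off is the one described above: memory per object goes down,
    but unrelated objects can hash to the same slot and contend with each
    other, so this only makes sense where contention is measured to be low.
    Holding two pool locks at once also needs care, since two objects may
    share a mutex.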

    Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
    robert@fledge.watson.org      Senior Research Scientist, McAfee Research

    _______________________________________________
    freebsd-current@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-current
    To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"

