Re: 5.3-RELEASE TODO

From: Kris Kennaway (kris_at_FreeBSD.org)
Date: 09/15/04

  • Next message: TooManySecrets: "Supported Realtek? / pccard support"
    Date: Wed, 15 Sep 2004 06:39:52 +0000
    To: Ken Smith <kensmith@cse.Buffalo.EDU>
    
    

    On Thu, Sep 02, 2004 at 11:59:47AM -0400, Ken Smith wrote:
    > On Thu, Jul 15, 2004 at 03:04:47PM -0700, Kris Kennaway wrote:
    >
    > > These are the bugs I'm currently tracking (those I can remember right
    > > now, at least)

    All of these issues except for the last one seem to be resolved for me
    now. I haven't tested the last one (memory tuning on 4GB machines)
    because I have tuned my kernel configs to avoid the problem, but I can
    remove those changes and see if the problems persist.

    I am now seeing a couple of other problems:

    * softupdates stack overflow (previously reported; I've now hit this
    on two machines). I might be able to hack around it by increasing
    KSTACK_PAGES, but that doesn't help others. phk could not think of
    any way to fix the unboundedness of the dependency chains, and kirk
    replied saying he's on vacation.

    * I had an apparent scheduler hang tonight (4BSD): the only process
    that is running has a trace including sched_switch, and nothing else
    apart from the idle tasks is running or runnable. I'll try to post
    more details tomorrow.

    * There may be a problem with swapping: I had an extremely weird
    sequence of errors (binaries aborting, spurious "missing
    /libexec/ld-elf.so.1") on pointyhat at around the time it started
    swapping. I don't know if swapping was the cause or another symptom
    of some other problem. I'll try to reproduce on another machine.

    * I was able to break to KDB a few times on pointyhat to try and
    diagnose this problem, but eventually it hung trying to enter KDB.
    This happens with fairly high frequency (on SMP machines?)
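
A rough illustration of the softupdates item above, showing why raising KSTACK_PAGES moves the limit without removing it. This is only a sketch: it assumes 4096-byte pages (i386), and the per-level frame size and both function names are hypothetical, not taken from the kernel source.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_BYTES 4096u	/* i386 page size */

/*
 * Deepest recursion that fits in a kernel stack of kstack_pages pages,
 * if every level of the dependency chain consumes frame_bytes of stack.
 * frame_bytes is a hypothetical figure used for illustration only.
 */
static size_t
max_chain_depth(unsigned kstack_pages, size_t frame_bytes)
{
	assert(frame_bytes > 0);
	return ((size_t)kstack_pages * PAGE_SIZE_BYTES) / frame_bytes;
}

/* Nonzero if a chain of chain_len dependencies would overflow the stack. */
static int
chain_overflows(unsigned kstack_pages, size_t frame_bytes, size_t chain_len)
{
	return chain_len > max_chain_depth(kstack_pages, frame_bytes);
}
```

Under these assumptions, doubling the page count doubles the depth that fits, but an unbounded chain can still exceed any fixed value.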

    I think there are some other bugs I'm forgetting right now.

    Kris

    > > * SMP is unusable for me because of the following frequent panic
    > > (actually a panic and another kernel printf interleaved). Here is the
    > > untangled version:
    > >
    > > panic: APIC: Previous IPI is stuck
    > > cpuid = 0;
    > > Debugger("panic")
    > > pmap_lazyfix: spun for 50000000
    > >
    > > jhb says:
    > >
    > > > Seems the two CPUs are deadlocked waiting on each other. The first sent a
    > > > pmap_lazyfixup IPI to the second but the second has interrupts disabled as it
    > > > is trying to send an IPI as well.
    > >
    > > He suggested a patch, but it did not fix the problem.
    >
    > Was this fixed with the IPI patches done before BETA2?
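
The deadlock jhb describes can be pictured with a toy model: each CPU has sent an IPI, is spinning for it to be handled, and has interrupts disabled, so neither can service the other's request. This is a simplified sketch, not the kernel's IPI code; the struct, its fields, and both functions are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one CPU; field names are illustrative only. */
struct cpu_model {
	bool intr_enabled;	/* can take an incoming IPI */
	bool ipi_pending;	/* an IPI from the peer awaits handling */
	bool waiting;		/* spinning until the peer handles our IPI */
};

/* Let 'self' handle a pending IPI if it can; returns true on progress. */
static bool
cpu_step(struct cpu_model *self, struct cpu_model *peer)
{
	if (!self->ipi_pending || !self->intr_enabled)
		return false;
	self->ipi_pending = false;
	peer->waiting = false;		/* the peer's IPI has been handled */
	peer->intr_enabled = true;	/* the peer leaves its spin loop */
	return true;
}

/* Returns true if both CPUs stop waiting, false if neither can progress. */
static bool
ipi_exchange_completes(struct cpu_model *a, struct cpu_model *b)
{
	assert(a != NULL && b != NULL);
	for (;;) {
		bool pa = cpu_step(a, b);
		bool pb = cpu_step(b, a);

		if (!a->waiting && !b->waiting)
			return true;
		if (!pa && !pb)
			return false;	/* each CPU waits on the other */
	}
}
```

In this model the exchange completes as soon as either CPU can take interrupts while it waits; with both disabled, it never does.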
    >
    > > * linprocfs
    > >
    > > Fatal trap 12: page fault while in kernel mode
    > > cpuid = 0; apic id = 00
    > > fault virtual address = 0x8
    > > fault code = supervisor read, page not present
    > > instruction pointer = 0x8:0xc04e1870
    > > stack pointer = 0x10:0xf11e6b50
    > > frame pointer = 0x10:0xf11e6b6c
    > > code segment = base 0x0, limit 0xfffff, type 0x1b
    > > = DPL 0, pres 1, def32 1, gran 1
    > > processor eflags = interrupt enabled, resume, IOPL = 0
    > > current process = 23938 (mtree)
    > > kernel: type 12 trap, code=0
    > > Stopped at pfs_getattr+0x130: movl 0x8(%eax),%eax
    > > db> trace
    > > pfs_getattr(f11e6b78,c06fda00,cf397b2c,f11e6b98,d23e8a80) at pfs_getattr+0x130
    > > vn_stat(cf397b2c,f11e6c80,d23e8a80,0,c5eb0c60) at vn_stat+0x4f
    > > lstat(c5eb0c60,f11e6d14,2,2,297) at lstat+0x6a
    > > syscall(2f,2f,2f,805a200,805a248) at syscall+0x217
    > > Xint0x80_syscall() at Xint0x80_syscall+0x1f
    > > --- syscall (190, FreeBSD ELF32, lstat), eip = 0x280ac664, esp = 0xbfbf7594, ebp = 0xbfbf7620 ---
    > >
    > > dosirak# addr2line -e kernel.debug 0xc04e1870
    > > /usr/src/sys/i386/compile/DOSIRAK/../../../fs/pseudofs/pseudofs_vnops.c:200
    > >
    > > [...]
    > >     if (pvd->pvd_pid != NO_PID) {
    > >         if ((proc = pfind(pvd->pvd_pid)) == NULL)
    > >             PFS_RETURN (ENOENT);
    > > -->     vap->va_uid = proc->p_ucred->cr_ruid;
    > >
    > > rwatson has a patch that works around this particular null pointer
    > > deref, but the underlying cause is not addressed.
    >
    > A patch to pseudofs_vnops.c was made that checks to make sure what pfind()
    > returned was "usable". Did that solve this problem? Looks like that
    > patch went in after you reported this because it's immediately above
    > line 200 you show above.
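
For readers following the linprocfs trap above: the faulting instruction is `movl 0x8(%eax),%eax` and the fault address is 0x8, which is what a load through a null base pointer at field offset 8 would produce. The sketch below models that arithmetic and one possible guard. The struct layouts are simplified stand-ins (assumptions, not the real kernel definitions), and `lookup_ruid` is a hypothetical helper, not the patch discussed in the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel structures; layouts are assumptions. */
struct ucred_model {
	uint32_t cr_ref;
	uint32_t cr_uid;
	uint32_t cr_ruid;
};

struct proc_model {
	struct ucred_model *p_ucred;
};

/* Address touched by a load of field_offset(base), e.g. 0x8(%eax). */
static uintptr_t
fault_address(uintptr_t base, size_t field_offset)
{
	return base + field_offset;
}

/*
 * Guarded lookup: treat a missing process or a missing credential as
 * "no such entry" instead of dereferencing a null pointer.
 */
static int
lookup_ruid(const struct proc_model *proc, uint32_t *uid_out)
{
	assert(uid_out != NULL);
	if (proc == NULL || proc->p_ucred == NULL)
		return ENOENT;
	*uid_out = proc->p_ucred->cr_ruid;
	return 0;
}
```

With this model layout, `cr_ruid` sits at offset 8, so a null `p_ucred` would fault at address 0x8, matching the trap. As noted in the thread, a guard of this kind only works around the dereference and does not explain why the pointer was unusable.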
    >
    > > * ULE has lots of problems (poor performance on HTT, unable to disable
    > > HTT, incorrect load average reporting on SMP machines, ...). Should
    > > be turned off until an active maintainer is found.
    >
    > re@ is discussing this now, it looks likely we will shift to 4BSD soon.
    >
    > > * ---
    > > Fatal trap 12: page fault while in kernel mode
    > > fault virtual address = 0x104
    > > fault code = supervisor read, page not present
    > > instruction pointer = 0x8:0xc058a8cf
    > > stack pointer = 0x10:0xdcb34cc4
    > > frame pointer = 0x10:0xdcb34cec
    > > code segment = base 0x0, limit 0xfffff, type 0x1b
    > > = DPL 0, pres 1, def32 1, gran 1
    > > processor eflags = resume, IOPL = 0
    > > current process = 50 (schedcpu)
    > > trap number = 12
    > > panic: page fault
    > >
    > > syncing disks, buffers remaining... panic: mi_switch: switch in a critical section
    > >
    > > addr2line says the panic was in kern/sched_4bsd.c:327
    > >
    > >       /*
    > >        * The kse slptimes are not touched in wakeup
    > >        * because the thread may not HAVE a KSE.
    > >        */
    > >       if (ke->ke_state == KES_ONRUNQ) {
    > >           awake = 1;
    > >           ke->ke_flags &= ~KEF_DIDRUN;
    > > --->  } else if ((ke->ke_state == KES_THREAD) &&
    > >           (TD_IS_RUNNING(ke->ke_thread))) {
    > >           awake = 1;
    > >
    > > gdb -k got confused and couldn't make anything out of the backtrace.
    >
    > The code you quote above hasn't changed recently but a few kse related
    > fixes have gone in recently if I recall correctly. Is this one still
    > biting you?
    >
    > > * Machines with 4GB RAM do not auto-tune kernel memory parameters
    > > optimally and easily panic under load with a panic message that does
    > > not at least give instructions on what may be wrong and how to fix it.
    >
    > Work was done on that recently-ish, do you know off hand if that fixed
    > what you were seeing?
    >
    > Thanks...
    >
    > --
    > Ken Smith
    > - From there to here, from here to | kensmith@cse.buffalo.edu
    > there, funny things are everywhere. |
    > - Theodore Geisel |

    -- 
    In God we Trust -- all others must submit an X.509 certificate.
        -- Charles Forsythe <forsythe@alum.mit.edu>
    _______________________________________________
    freebsd-current@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-current
    To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
    
