Re: 5.3-RELEASE TODO

From: Kris Kennaway (kris_at_obsecurity.org)
Date: 07/16/04

  • Next message: Norikatsu Shigemura: "panic at sched_add_internal"
    Date: Thu, 15 Jul 2004 15:04:47 -0700
    To: re@FreeBSD.org
    
    
    

    On Thu, Jul 15, 2004 at 10:24:39AM -0400, Robert Watson wrote:

    > This is a list of open issues that need to be resolved for FreeBSD 5.3. If
    > you have any updates for this list, please e-mail re@FreeBSD.org.
    >
    > Show stopper defects for 5.3-RELEASE

    These are the bugs I'm currently tracking (those I can remember right
    now, at least):

    * SMP is unusable for me because of the following frequent panic
    (actually a panic and another kernel printf interleaved). Here is the
    untangled version:

    panic: APIC: Previous IPI is stuck
    cpuid = 0;
    Debugger("pani

    pmap_lazyfix: spun for 50000000

    jhb says:

    > Seems the two CPUs are deadlocked waiting on each other. The first sent a
    > pmap_lazyfixup IPI to the second but the second has interrupts disabled as it
    > is trying to send an IPI as well.

    He suggested a patch, but it did not fix the problem.

    * linprocfs

    Fatal trap 12: page fault while in kernel mode
    cpuid = 0; apic id = 00
    fault virtual address = 0x8
    fault code = supervisor read, page not present
    instruction pointer = 0x8:0xc04e1870
    stack pointer = 0x10:0xf11e6b50
    frame pointer = 0x10:0xf11e6b6c
    code segment = base 0x0, limit 0xfffff, type 0x1b
                            = DPL 0, pres 1, def32 1, gran 1
    processor eflags = interrupt enabled, resume, IOPL = 0
    current process = 23938 (mtree)
    kernel: type 12 trap, code=0
    Stopped at pfs_getattr+0x130: movl 0x8(%eax),%eax
    db> trace
    pfs_getattr(f11e6b78,c06fda00,cf397b2c,f11e6b98,d23e8a80) at pfs_getattr+0x130
    vn_stat(cf397b2c,f11e6c80,d23e8a80,0,c5eb0c60) at vn_stat+0x4f
    lstat(c5eb0c60,f11e6d14,2,2,297) at lstat+0x6a
    syscall(2f,2f,2f,805a200,805a248) at syscall+0x217
    Xint0x80_syscall() at Xint0x80_syscall+0x1f
    --- syscall (190, FreeBSD ELF32, lstat), eip = 0x280ac664, esp = 0xbfbf7594, ebp = 0xbfbf7620 ---

    dosirak# addr2line -e kernel.debug 0xc04e1870
    /usr/src/sys/i386/compile/DOSIRAK/../../../fs/pseudofs/pseudofs_vnops.c:200

    [...]
            if (pvd->pvd_pid != NO_PID) {
                    if ((proc = pfind(pvd->pvd_pid)) == NULL)
                            PFS_RETURN (ENOENT);
    -->             vap->va_uid = proc->p_ucred->cr_ruid;

    rwatson has a patch that works around this particular null pointer
    deref, but the underlying cause is not addressed.
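
    As a rough illustration only (this is not rwatson's patch): the fault
    address 0x8 is what one would expect when a field at offset 8 is read
    through a NULL pointer, for example if proc->p_ucred were NULL. The
    user-space sketch below uses mock structures (mock_ucred, mock_proc and
    mock_getattr_uid are hypothetical names, and the field layout is an
    assumption, not copied from the kernel headers) to show a guard that
    returns ENOENT instead of dereferencing the pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Mock stand-ins for the kernel types; the layouts are assumptions. */
struct mock_ucred {
    unsigned int cr_ref;    /* reference count */
    unsigned int cr_uid;    /* effective user id */
    unsigned int cr_ruid;   /* real user id, at offset 8 in this mock */
};

struct mock_proc {
    struct mock_ucred *p_ucred; /* NULL models the state seen in the trap */
};

/*
 * Guarded form of the failing statement: return ENOENT rather than
 * reading cr_ruid through a NULL credential pointer.  On success the
 * real uid is stored in *uid and 0 is returned.
 */
int
mock_getattr_uid(const struct mock_proc *proc, unsigned int *uid)
{
    if (proc == NULL || proc->p_ucred == NULL)
        return (ENOENT);
    *uid = proc->p_ucred->cr_ruid;
    return (0);
}
```

    Like the workaround mentioned above, a guard of this kind only avoids
    the trap; it does not explain why the credential pointer was missing.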

    * ULE has lots of problems (poor performance on HTT, unable to disable
    HTT, incorrect load average reporting on SMP machines, ...). Should
    be turned off until an active maintainer is found.

    * Frequent panic at boot time (when starting syslogd?)

    panic: mutex Giant not owned at ../../../kern/vfs_subr.c:1365
    at line 729 in file ../../../kern/kern_mutex.c
    Debugger("panic")
    Stopped at Debugger+0x54: xchgl %ebx,in_Debugger.0
    db> trace
    Debugger(c0766179,c07d1e80,2d9,c0765560,100) at Debugger+0x54
    __panic(c0765560,2d9,c07656c8,c0765803,c076e07d) at __panic+0xf5
    _mtx_assert(c07d09e0,1,c076e07d,555,c68544ec) at _mtx_assert+0x11c
    gbincore(c6889514,0,0,985,c07d5980) at gbincore+0x36
    getblk(c6889514,0,0,800,0) at getblk+0xf8
    breadn(c6889514,0,0,800,0) at breadn+0x52
    bread(c6889514,0,0,800,0) at bread+0x4c
    ffs_blkatoff(c6889514,0,0,0,e0f87998) at ffs_blkatoff+0x105
    ufs_lookup(e0f87a50,e0f87a8c,c05c77e1,e0f87a50,e0f87bc0) at ufs_lookup+0x270
    ufs_vnoperate(e0f87a50,e0f87bc0,e0f87bd4,c076e07d,c61d62a0) at ufs_vnoperate+0x18
    vfs_cache_lookup(e0f87ad0,e0f87aec,c05cca32,e0f87ad0,c61d62a0) at vfs_cache_lookup+0x301
    ufs_vnoperate(e0f87ad0,c61d62a0,0,c61d62a0,c61d62a0) at ufs_vnoperate+0x18
    lookup(e0f87bac,0,c076dac5,a2,c61d62a0) at lookup+0x312
    namei(e0f87bac,c62088b2,d,c62088c0,0) at namei+0x27e
    unp_bind(c6a09000,c62088b0,c61d62a0,e0f87ca0,c05b5e23) at unp_bind+0xb1
    uipc_bind(c6427a50,c62088b0,c61d62a0,e0f87cc8,c05ba0e7) at uipc_bind+0x2b
    sobind(c6427a50,c62088b0,c61d62a0,0,c6427a50) at sobind+0x23
    kern_bind(c61d62a0,3,c62088b0,c62088b0,0) at kern_bind+0x87
    bind(c61d62a0,e0f87d14,c,434,3) at bind+0x43
    syscall(2f,2f,2f,bfbfee10,0) at syscall+0x2a0

    I added a GIANT_REQUIRED to namei() and confirmed that Giant is being
    held there, so it is being dropped somewhere between namei() and
    gbincore(), further up in the stack trace.

    * ---
    Fatal trap 12: page fault while in kernel mode
    fault virtual address = 0x104
    fault code = supervisor read, page not present
    instruction pointer = 0x8:0xc058a8cf
    stack pointer = 0x10:0xdcb34cc4
    frame pointer = 0x10:0xdcb34cec
    code segment = base 0x0, limit 0xfffff, type 0x1b
                            = DPL 0, pres 1, def32 1, gran 1
    processor eflags = resume, IOPL = 0
    current process = 50 (schedcpu)
    trap number = 12
    panic: page fault

    syncing disks, buffers remaining... panic: mi_switch: switch in a critical section

    addr2line says the panic was in kern/sched_4bsd.c:327

                                    /*
                                     * The kse slptimes are not touched in wakeup
                                     * because the thread may not HAVE a KSE.
                                     */
                                    if (ke->ke_state == KES_ONRUNQ) {
                                            awake = 1;
                                            ke->ke_flags &= ~KEF_DIDRUN;
    --->                            } else if ((ke->ke_state == KES_THREAD) &&
                                        (TD_IS_RUNNING(ke->ke_thread))) {
                                            awake = 1;

    gdb -k got confused and couldn't make anything out of the backtrace.

    * Machines with 4GB RAM do not auto-tune kernel memory parameters
    optimally and easily panic under load, with a panic message that gives
    no indication of what may be wrong or how to fix it.

    * 8 Feb 2004
    [bug report to -current, confirmed locally]

    After typing "truss -f fsck -p /", I see nothing. I press ^Z
    and type "kill -9 %" (killing truss).

    I am now left with these processes stuck in memory; they are immune
    to kill -9 and do not respond to kill -CONT either. ps axl shows:

      UID PID PPID CPU PRI NI VSZ RSS MWCHAN STAT TT TIME COMMAND
        0 56974 1 0 8 0 1256 744 ppwait D p1 0:00.00 fsck -p /
        0 56975 56974 0 8 0 1256 744 stopev DV p1 0:00.00 fsck -p /

    * ATA tends to panic the system when error conditions occur, e.g.

    ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=45113664
    ad0: DMA limited to UDMA33, non-ATA66 cable or device
    ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=45113664
    ad0: WARNING - removed from configuration
    ata0-master: FAILURE - WRITE_DMA timed out

    Fatal trap 12: page fault while in kernel mode
    [...]

    OTOH ATA on 4.x also gives panics with INVARIANTS when this kind of
    thing happens.

    [sparc64]

    * Likes to panic with "panic: ipi_send: couldn't send ipi"

    tmm suggested bumping the value of

    ./include/smp.h:#define IPI_RETRIES 100

    This may have "fixed" the problem, or at least reduced the frequency.
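
    To illustrate why raising the limit could reduce the frequency without
    really fixing anything, here is a user-space sketch of a bounded retry
    loop. It assumes the sender simply retries up to IPI_RETRIES times and
    then gives up; mock_ipi_hw, mock_try_dispatch and mock_ipi_send are
    hypothetical names, not the sparc64 code.

```c
#include <assert.h>
#include <stdbool.h>

#define MOCK_IPI_RETRIES 100    /* mirrors the IPI_RETRIES value quoted above */

/* Hypothetical hardware model: dispatch fails while busy_polls is above 0. */
struct mock_ipi_hw {
    int busy_polls;
};

static bool
mock_try_dispatch(struct mock_ipi_hw *hw)
{
    if (hw->busy_polls > 0) {
        hw->busy_polls--;
        return (false);
    }
    return (true);
}

/*
 * Try to send up to `retries` times.  Returns the 1-based attempt that
 * succeeded, or -1 when the budget is exhausted (the point at which the
 * real code would presumably panic).
 */
int
mock_ipi_send(struct mock_ipi_hw *hw, int retries)
{
    for (int attempt = 1; attempt <= retries; attempt++) {
        if (mock_try_dispatch(hw))
            return (attempt);
    }
    return (-1);
}
```

    In this model a target that stays busy for 150 polls fails with a
    budget of 100 but succeeds with 200, so a larger limit hides slow
    responses without helping a target that never responds.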

    * syscons does not work on ultra30 any more; looks like it might be
    related to differences in the keyboard controller on the u30. marius
    and kensmith are knowledgeable about this.

    Kris

    
    


