freebsd-5.4-stable panics

From: Antoine Pelisse (apelisse_at_gmail.com)
Date: 09/30/05

    Date: Fri, 30 Sep 2005 16:25:33 +0100
    To: freebsd-hackers@freebsd.org, Robert Watson <rwatson@freebsd.org>
    
    

    On 9/30/05, John Baldwin <jhb@freebsd.org> wrote:

    > On Friday 30 September 2005 05:24 am, Antoine Pelisse wrote:
    > > Hi Robert,
    > > I don't think your patch is correct: the whole linked list can be broken
    > > while the lock is released, so just passing over the link may not be
    > > enough. I submitted a PR[1] for this a month ago, but nobody has taken
    > > care of it yet.
    > > Regards,
    > > Antoine Pelisse
    > >
    > > [1] http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/84684
    >
    > I think this patch looks ok. Robert, can you get the original panic on
    > this thread tested against this patch?

    I had a small program which could reproduce this panic in 10 seconds; it
    was basically creating empty threads and calling kvm_getprocs() at the same
    time. Anyway, with the patch applied the program no longer triggered the
    panic. The panic is also reproducible in RELENG_6 and HEAD, IIRC.
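
The objection quoted above (the list can change while the lock is released) can be illustrated with a small user-space model. This is only a sketch under stated assumptions: it is not the patch from kern/84684, whose contents are not shown in this thread, and model_proc, model_thread, snapshot_tids() and demo_snapshot() are hypothetical names. A pthread mutex stands in for the process lock. The idea is to copy what is needed while the lock is held and do the slow work afterwards, so the walk never resumes from a list link that may have gone stale.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical user-space stand-ins; these are not the kernel structures. */
struct model_thread {
	int td_tid;
	TAILQ_ENTRY(model_thread) td_plist;	/* linkage on the per-process list */
};

TAILQ_HEAD(model_threadq, model_thread);

struct model_proc {
	pthread_mutex_t p_mtx;			/* stands in for the process lock */
	struct model_threadq p_threads;
};

/*
 * Copy up to max thread ids into tids[] while holding the lock.
 * The caller can then do slow work (the copy-out, in the kernel case)
 * on the private copy without touching the shared list again.
 */
static size_t
snapshot_tids(struct model_proc *p, int *tids, size_t max)
{
	struct model_thread *td;
	size_t n = 0;

	assert(p != NULL && tids != NULL);
	pthread_mutex_lock(&p->p_mtx);
	TAILQ_FOREACH(td, &p->p_threads, td_plist) {
		if (n == max)
			break;
		tids[n++] = td->td_tid;
	}
	pthread_mutex_unlock(&p->p_mtx);
	return (n);
}

/* Take a snapshot, then unlink a thread as if it exited afterwards. */
static size_t
demo_snapshot(int *tids, size_t max)
{
	struct model_proc p;
	struct model_thread a = { .td_tid = 1 };
	struct model_thread b = { .td_tid = 2 };
	size_t n;

	pthread_mutex_init(&p.p_mtx, NULL);
	TAILQ_INIT(&p.p_threads);
	TAILQ_INSERT_TAIL(&p.p_threads, &a, td_plist);
	TAILQ_INSERT_TAIL(&p.p_threads, &b, td_plist);

	n = snapshot_tids(&p, tids, max);

	/* The list changes after the lock is dropped; the copy is unaffected. */
	pthread_mutex_lock(&p.p_mtx);
	TAILQ_REMOVE(&p.p_threads, &b, td_plist);
	pthread_mutex_unlock(&p.p_mtx);

	pthread_mutex_destroy(&p.p_mtx);
	return (n);
}
```

Calling demo_snapshot() with a four-element array fills in both thread ids even though one thread is unlinked right after the snapshot. The trade-off is that the snapshot can be slightly out of date by the time it is used.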

    > On 9/29/05, Robert Watson <rwatson@freebsd.org> wrote:
    > > > On Thu, 29 Sep 2005, Rob Watt wrote:
    > > > > On Thu, 29 Sep 2005, Robert Watson wrote:
    > > > >> Could you dump the contents of *td and *td->td_proc for me? I'm
    > > > >> quite interested to know what the value in td->td_proc->p_state is,
    > > > >> among other things. Could you also generate a dump of the KSE group
    > > > >> structures in td->td_proc->p_ksegrps and the threads in
    > > > >> td->td_proc->p_threads?
    > > > >
    > > > > I've attached a file with many of the values you have asked for. We
    > > > > looked at some of the threads referenced by td->td_proc->p_threads,
    > > > > but we weren't sure we were walking the list correctly. Do you have
    > > > > any tips for walking those thread lists?
    > > > >
    > > > >> Could you tell me if the program named by p->p_comm is linked
    > > > >> against a threading library? If it's a custom app, you may already
    > > > >> know, and if not, you can run ldd on the application to see what it
    > > > >> is linked against.
    > > > >
    > > > > The program named by p->p_comm is linked against the pthreads
    > > > > library.
    > > >
    > > > This seems to be enough information to at least track this down a bit:
    > > > td_ksegrp is NULL, rather than a corrupt value, which suggests that the
    > > > thread is incompletely initialized. Other hints that this is the case
    > > > are that td_critnest is 1 (as is set when it is allocated), and the
    > > > state is TDS_INACTIVE. Some other fields are set though, such as
    > > > td_oncpu, which is normally initialized to NOCPU.
    > > >
    > > > > (kgdb) p *td
    > > > > $1 = {td_proc = 0xffffff004aa9f000, td_ksegrp = 0x0,
    > > > >   td_plist = {tqe_next = 0xffffff00b4798000, tqe_prev = 0xffffff00a97ae010},
    > > > >   td_kglist = {tqe_next = 0xffffff00b4798000, tqe_prev = 0xffffff00a97ae020},
    > > > >   td_slpq = {tqe_next = 0x0, tqe_prev = 0xffffff001fac7c10},
    > > > >   td_lockq = {tqe_next = 0xffffff00a97ae000, tqe_prev = 0xffffffffb6797a70},
    > > > >   td_runq = {tqe_next = 0x0, tqe_prev = 0xffffffff80608180},
    > > > >   td_selq = {tqh_first = 0x0, tqh_last = 0xffffff00633112c0},
    > > > >   td_sleepqueue = 0xffffff00382b0400, td_turnstile = 0xffffff00c1712900,
    > > > >   td_umtxq = 0xffffff00d1207080,
    > > > >   td_tid = 100253, td_flags = 16777216, td_inhibitors = 0, td_pflags = 128,
    > > > >   td_dupfd = 0, td_wchan = 0x0, td_wmesg = 0x0,
    > > > >   td_lastcpu = 2 '\002', td_oncpu = 2 '\002', td_owepreempt = 0 '\0',
    > > > >   td_locks = 0, td_blocked = 0x0, td_ithd = 0x0, td_lockname = 0x0,
    > > > >   td_contested = {lh_first = 0x0}, td_sleeplocks = 0x0,
    > > > >   td_intr_nesting_level = 0, td_pinned = 0, td_mailbox = 0x0,
    > > > >   td_ucred = 0xffffff00ad18f200,
    > > > >   td_standin = 0x0, td_upcall = 0x0, td_sticks = 0, td_uuticks = 0,
    > > > >   td_usticks = 0, td_intrval = 0,
    > > > >   td_oldsigmask = {__bits = {0, 0, 0, 0}},
    > > > >   td_sigmask = {__bits = {4294967295, 4294967295, 4294967295, 4294967295}},
    > > > >   td_siglist = {__bits = {0, 0, 0, 0}}, td_generation = 14,
    > > > >   td_sigstk = {ss_sp = 0x0, ss_size = 0, ss_flags = 0},
    > > > >   td_kflags = 0, td_xsig = 0, td_profil_addr = 0, td_profil_ticks = 0,
    > > > >   td_base_pri = 182 '\uffff', td_priority = 182 '\uffff',
    > > > >   td_pcb = 0xffffffffb68dcd10, td_state = TDS_INACTIVE,
    > > > >   td_retval = {1, 29309280},
    > > > >   td_slpcallout = {c_links = {sle = {sle_next = 0x0},
    > > > >       tqe = {tqe_next = 0x0, tqe_prev = 0xffffff001fac7d80}},
    > > > >     c_time = 55907602, c_arg = 0xffffff0063311260,
    > > > >     c_func = 0xffffffff802e32a0 <sleepq_timeout>, c_mtx = 0x0, c_flags = 16},
    > > > >   td_frame = 0xffffffffb68dcc40,
    > > > >   td_kstack_obj = 0xffffff0087f93d20, td_kstack = 18446744072477315072,
    > > > >   td_kstack_pages = 4,
    > > > >   td_altkstack_obj = 0x0, td_altkstack = 0, td_altkstack_pages = 0,
    > > > >   td_critnest = 1, td_md = {md_spinlock_count = 1, md_saved_flags = 582},
    > > > >   td_sched = 0xffffff0063311488}
    > > >
    > > > I'm not familiar with the internals of the thread and KSE life cycle
    > > > here, so I think we'll need to look to those more familiar with this to
    > > > understand which of two things may be going on:
    > > >
    > > > (1) Is the fact that td_ksegrp != NULL an invariant for a connected
    > > > thread, and that kern_proc is relying on that but the thread code is
    > > > failing to implement it safely?
    > > >
    > > > (2) Is td_ksegrp sometimes left legitimately as NULL as part of the
    > > > thread life cycle, with kern_proc incorrectly assuming that it is
    > > > never NULL when hooked up to a thread?
    > > >
    > > > This suggests a possible work-around of simply testing td_ksegrp for
    > > > NULL in kern_proc in order to avoid this, while attempting to resolve
    > > > whether an invariant is violated (or incorrectly assumed), which might
    > > > require some serious thinking and a solution that is non-trivial.
    > > > Something like the following might work in the mean time:
    > > >
    > > > Index: kern_proc.c
    > > > ===================================================================
    > > > RCS file: /home/ncvs/src/sys/kern/kern_proc.c,v
    > > > retrieving revision 1.231
    > > > diff -u -r1.231 kern_proc.c
    > > > --- kern_proc.c 27 Sep 2005 18:03:15 -0000 1.231
    > > > +++ kern_proc.c 29 Sep 2005 20:50:33 -0000
    > > > @@ -882,6 +882,8 @@
    > > >  	} else {
    > > >  		_PHOLD(p);
    > > >  		FOREACH_THREAD_IN_PROC(p, td) {
    > > > +			if (td->td_ksegrp == NULL)
    > > > +				continue;
    > > >  			fill_kinfo_thread(td, &kinfo_proc);
    > > >  			PROC_UNLOCK(p);
    > > >  			error = SYSCTL_OUT(req, (caddr_t)&kinfo_proc,
    > > >
    > > > I'm going to forward off your e-mail to the threads@ list and see if
    > > > anyone there wants to talk some more about this. If you don't mind
    > > > testing the above patch to see if this is a workable work-around, we
    > > > may want to think about getting it committed in the mean time.
    > > >
    > > > Thanks,
    > > >
    > > > Robert N M Watson
    > > > _______________________________________________
    > > > freebsd-hackers@freebsd.org mailing list
    > > > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
    > > > To unsubscribe, send any mail to
    > > > "freebsd-hackers-unsubscribe@freebsd.org"
    >
    > --
    > John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
    > "Power Users Use the Power to Serve"  =  http://www.FreeBSD.org
    _______________________________________________
    freebsd-hackers@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
    To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"
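
On the question quoted earlier about walking the thread lists: the td_plist field in the dump has tqe_next and tqe_prev members, which is the layout of a TAILQ entry from <sys/queue.h>, so a walk starts at the list head's tqh_first and follows td_plist.tqe_next until it reaches NULL. The following is a minimal user-space sketch of such a walk combined with the NULL test from the proposed work-around. The model_* types and both functions are hypothetical stand-ins, not the kernel's struct thread or struct proc, and the tids other than 100253 are made up for the demonstration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical user-space stand-ins; these are not the kernel structures. */
struct model_ksegrp {
	int kg_unused;
};

struct model_thread {
	int td_tid;
	struct model_ksegrp *td_ksegrp;		/* NULL while only partly set up */
	TAILQ_ENTRY(model_thread) td_plist;	/* linkage on the per-process list */
};

TAILQ_HEAD(model_threadq, model_thread);

struct model_proc {
	struct model_threadq p_threads;
};

/*
 * Walk every thread of p, skip those without a KSE group, and store
 * the tids of the rest in out[]. Returns the number of tids stored.
 */
static size_t
collect_ready_tids(struct model_proc *p, int *out, size_t max)
{
	struct model_thread *td;
	size_t n = 0;

	assert(p != NULL && out != NULL);
	/* TAILQ_FOREACH starts at tqh_first and follows td_plist.tqe_next. */
	TAILQ_FOREACH(td, &p->p_threads, td_plist) {
		if (td->td_ksegrp == NULL)
			continue;	/* same test as the proposed work-around */
		if (n == max)
			break;
		out[n++] = td->td_tid;
	}
	return (n);
}

/* Build a three-thread process whose middle thread has no KSE group. */
static size_t
demo_walk(int *out, size_t max)
{
	struct model_ksegrp kg = { 0 };
	struct model_proc p;
	struct model_thread t[3] = {
		{ .td_tid = 100251, .td_ksegrp = &kg },
		{ .td_tid = 100253, .td_ksegrp = NULL },
		{ .td_tid = 100254, .td_ksegrp = &kg },
	};
	size_t i;

	TAILQ_INIT(&p.p_threads);
	for (i = 0; i < 3; i++)
		TAILQ_INSERT_TAIL(&p.p_threads, &t[i], td_plist);
	return (collect_ready_tids(&p, out, max));
}
```

Calling demo_walk() with a three-element array stores the two tids whose td_ksegrp is set and skips the half-initialized one. The sketch takes no lock at all, so it shows only the skip itself and does not address the concern about the list changing while the lock is released.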

