freebsd-5.4-stable panics

From: Antoine Pelisse (apelisse_at_gmail.com)
Date: 09/30/05

    Date: Fri, 30 Sep 2005 10:24:43 +0100
    To: freebsd-hackers@freebsd.org
    
    

     Hi Robert,
    I don't think your patch is correct. The whole linked list can be broken
    while the lock is released, so just passing over the bad link may not be
    enough. I submitted a PR [1] for this a month ago, but nobody has taken
    care of it yet.
      Regards,
    Antoine Pelisse

    [1] http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/84684
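
    The concern above (a list changing while the lock is dropped mid-walk)
    can be illustrated in user space. The following is only a minimal
    sketch of one common defence, a generation counter that forces the
    walk to restart; it is not the fix proposed in the PR, and every name
    in it (struct item, struct table, table_walk, demo_run) is hypothetical.
    It assumes a sys/queue.h that provides the TAILQ macros.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical user-space stand-ins; these are not kernel structures. */
struct item {
	int id;
	int seen;                 /* set once this walk has reported the item */
	TAILQ_ENTRY(item) link;
};
TAILQ_HEAD(item_list, item);

struct table {
	struct item_list items;
	unsigned long gen;        /* bumped on every insert or remove */
};

typedef void (*report_fn)(struct table *, int id, void *arg);

static void
table_init(struct table *t)
{
	TAILQ_INIT(&t->items);
	t->gen = 0;
}

static void
table_insert(struct table *t, struct item *it)
{
	it->seen = 0;
	TAILQ_INSERT_TAIL(&t->items, it, link);
	t->gen++;
}

static void
table_remove(struct table *t, struct item *it)
{
	TAILQ_REMOVE(&t->items, it, link);
	t->gen++;
}

/* Report every item once, restarting whenever the list changed under us. */
static int
table_walk(struct table *t, report_fn report, void *arg)
{
	struct item *it;
	unsigned long gen;
	int id, reported = 0;

	TAILQ_FOREACH(it, &t->items, link)
		it->seen = 0;
restart:
	TAILQ_FOREACH(it, &t->items, link) {
		if (it->seen)
			continue;
		it->seen = 1;
		id = it->id;          /* copy out while still "locked" */
		gen = t->gen;
		report(t, id, arg);   /* the "lock" is dropped around this call */
		reported++;
		if (gen != t->gen)    /* list changed: 'it' may be stale */
			goto restart;
	}
	return (reported);
}

/* Demo: reporting item 1 removes items 1 and 2 behind the walker's back. */
struct demo { struct item a, b, c; };

static void
demo_report(struct table *t, int id, void *arg)
{
	struct demo *d = arg;

	if (id == 1) {
		table_remove(t, &d->a);
		table_remove(t, &d->b);
	}
}

static int
demo_run(void)
{
	struct table t;
	struct demo d = { .a = { .id = 1 }, .b = { .id = 2 }, .c = { .id = 3 } };

	table_init(&t);
	table_insert(&t, &d.a);
	table_insert(&t, &d.b);
	table_insert(&t, &d.c);
	return (table_walk(&t, demo_report, &d));   /* reports items 1 and 3 */
}
```

    The point of the sketch is that the walker never follows a next
    pointer it read before the "unlock": once the counter has moved it
    starts again from the head. Skipping a single suspicious entry, by
    contrast, still relies on that entry's linkage being intact.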
      On 9/29/05, Robert Watson <rwatson@freebsd.org> wrote:
    >
    > On Thu, 29 Sep 2005, Rob Watt wrote:
    >
    > > On Thu, 29 Sep 2005, Robert Watson wrote:
    > >
    > >> Could you dump the contents of *td and *td->td_proc for me? I'm quite
    > >> interested to know what the value in td->td_proc->p_state is, among
    > >> other things. Could you also generate a dump of the KSE group
    > >> structures in td->td_proc->p_ksegrps and the threads in
    > >> td->td_proc->p_threads?
    > >
    > > I've attached a file with many of the values you have asked for. We
    > > looked at some of the threads referenced by td->td_proc->p_threads, but
    > > we weren't sure we were walking the list correctly. Do you have any tips
    > > for walking those thread lists?
    > >
    > >> Could you tell me if the program named by p->p_comm is linked against a
    > >> threading library? If it's a custom app, you may already know, and if
    > >> not, you can run ldd on the application to see what it is linked
    > >> against.
    > >
    > > The program named by p->p_comm is linked against the pthreads library.
    >
    > This seems to be enough information to at least track this down a bit:
    > td_ksegrp is NULL, rather than a corrupt value, which suggests that the
    > thread is incompletely initialized. Other hints that this is the case
    > are that td_critnest is 1 (as is set when it is allocated), and the state
    > is TDS_INACTIVE. Some other fields are set though, such as td_oncpu,
    > which is normally initialized to NOCPU.
    >
    > > (kgdb) p *td
    > > $1 = {td_proc = 0xffffff004aa9f000, td_ksegrp = 0x0,
    > > td_plist = {tqe_next = 0xffffff00b4798000,
    > > tqe_prev = 0xffffff00a97ae010},
    > > td_kglist = {tqe_next = 0xffffff00b4798000,
    > > tqe_prev = 0xffffff00a97ae020},
    > > td_slpq = {tqe_next = 0x0, tqe_prev = 0xffffff001fac7c10},
    > > td_lockq = {tqe_next = 0xffffff00a97ae000,
    > > tqe_prev = 0xffffffffb6797a70},
    > > td_runq = {tqe_next = 0x0, tqe_prev = 0xffffffff80608180},
    > > td_selq = {tqh_first = 0x0, tqh_last = 0xffffff00633112c0},
    > > td_sleepqueue = 0xffffff00382b0400, td_turnstile = 0xffffff00c1712900,
    > > td_umtxq = 0xffffff00d1207080,
    > > td_tid = 100253, td_flags = 16777216, td_inhibitors = 0,
    > > td_pflags = 128, td_dupfd = 0, td_wchan = 0x0,
    > > td_wmesg = 0x0, td_lastcpu = 2 '\002', td_oncpu = 2 '\002',
    > > td_owepreempt = 0 '\0', td_locks = 0,
    > > td_blocked = 0x0, td_ithd = 0x0, td_lockname = 0x0,
    > > td_contested = {lh_first = 0x0}, td_sleeplocks = 0x0,
    > > td_intr_nesting_level = 0, td_pinned = 0, td_mailbox = 0x0,
    > > td_ucred = 0xffffff00ad18f200,
    > > td_standin = 0x0, td_upcall = 0x0, td_sticks = 0, td_uuticks = 0,
    > > td_usticks = 0, td_intrval = 0,
    > > td_oldsigmask = {__bits = {0, 0, 0, 0}},
    > > td_sigmask = {__bits = {4294967295, 4294967295, 4294967295,
    > > 4294967295}}, td_siglist = {__bits = {0, 0, 0, 0}},
    > > td_generation = 14,
    > > td_sigstk = {ss_sp = 0x0, ss_size = 0, ss_flags = 0},
    > > td_kflags = 0, td_xsig = 0,
    > > td_profil_addr = 0, td_profil_ticks = 0,
    > > td_base_pri = 182 '\266', td_priority = 182 '\266',
    > > td_pcb = 0xffffffffb68dcd10, td_state = TDS_INACTIVE,
    > > td_retval = {1, 29309280},
    > > td_slpcallout = {c_links = {sle = {sle_next = 0x0},
    > > tqe = {tqe_next = 0x0, tqe_prev = 0xffffff001fac7d80}},
    > > c_time = 55907602, c_arg = 0xffffff0063311260,
    > > c_func = 0xffffffff802e32a0 <sleepq_timeout>, c_mtx = 0x0,
    > > c_flags = 16}, td_frame = 0xffffffffb68dcc40,
    > > td_kstack_obj = 0xffffff0087f93d20, td_kstack = 18446744072477315072,
    > > td_kstack_pages = 4,
    > > td_altkstack_obj = 0x0, td_altkstack = 0, td_altkstack_pages = 0,
    > > td_critnest = 1,
    > > td_md = {md_spinlock_count = 1, md_saved_flags = 582},
    > > td_sched = 0xffffff0063311488}
    >
    > I'm not familiar with the internals of the thread and KSE life cycle here,
    > so I think we'll need to look to those more familiar with this to
    > understand which of two things may be going on:
    >
    > (1) Is td_ksegrp != NULL an invariant for a connected thread, one that
    > kern_proc relies on but that the thread code fails to maintain
    > safely?
    >
    > (2) Is td_ksegrp sometimes legitimately left NULL as part of the thread
    > life cycle, so that kern_proc incorrectly assumes it is never NULL
    > for a connected thread?
    >
    > This suggests a possible work-around: simply test td_ksegrp for NULL in
    > kern_proc to avoid the panic while we work out whether an invariant is
    > being violated (or incorrectly assumed), which might require some
    > serious thinking and a non-trivial solution. Something like the
    > following might work in the meantime:
    >
    > Index: kern_proc.c
    > ===================================================================
    > RCS file: /home/ncvs/src/sys/kern/kern_proc.c,v
    > retrieving revision 1.231
    > diff -u -r1.231 kern_proc.c
    > --- kern_proc.c 27 Sep 2005 18:03:15 -0000 1.231
    > +++ kern_proc.c 29 Sep 2005 20:50:33 -0000
    > @@ -882,6 +882,8 @@
    >          } else {
    >                  _PHOLD(p);
    >                  FOREACH_THREAD_IN_PROC(p, td) {
    > +                        if (td->td_ksegrp == NULL)
    > +                                continue;
    >                          fill_kinfo_thread(td, &kinfo_proc);
    >                          PROC_UNLOCK(p);
    >                          error = SYSCTL_OUT(req, (caddr_t)&kinfo_proc,
    >
    > I'm going to forward off your e-mail to the threads@ list and see if
    > anyone there wants to talk some more about this. If you don't mind
    > testing the above patch to see if this is a workable work-around, we may
    > want to think about getting it committed in the meantime.
    >
    > Thanks,
    >
    > Robert N M Watson
    > _______________________________________________
    > freebsd-hackers@freebsd.org mailing list
    > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
    > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"
    >
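
    On the question of walking the thread lists, and the NULL test in the
    quoted patch: the td_plist field in the dump has the tqe_next/tqe_prev
    shape of a sys/queue.h TAILQ entry, so the sketch below assumes the
    per-process list is a TAILQ headed at p_threads and linked through
    td_plist. It is a user-space illustration only. The structures are
    cut-down, hypothetical versions holding just the fields used here, the
    thread IDs are made up, and count_reportable and demo_count are not
    kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical miniature structures; only the fields this sketch needs. */
struct ksegrp {
	int kg_id;
};

struct thread {
	int td_tid;
	struct ksegrp *td_ksegrp;        /* NULL until the thread is set up */
	TAILQ_ENTRY(thread) td_plist;    /* linkage on the per-process list */
};

struct proc {
	TAILQ_HEAD(, thread) p_threads;  /* head of the thread list */
};

/* Count threads that are safe to report, skipping any with no KSE group. */
static int
count_reportable(struct proc *p)
{
	struct thread *td;
	int n = 0;

	TAILQ_FOREACH(td, &p->p_threads, td_plist) {
		if (td->td_ksegrp == NULL)
			continue;            /* same test as the patch */
		n++;
	}
	return (n);
}

/* Demo with three made-up threads, the middle one half-initialized. */
static int
demo_count(void)
{
	struct ksegrp kg = { 1 };
	struct thread t1 = { .td_tid = 1, .td_ksegrp = &kg };
	struct thread t2 = { .td_tid = 2, .td_ksegrp = NULL };
	struct thread t3 = { .td_tid = 3, .td_ksegrp = &kg };
	struct proc p;

	TAILQ_INIT(&p.p_threads);
	TAILQ_INSERT_TAIL(&p.p_threads, &t1, td_plist);
	TAILQ_INSERT_TAIL(&p.p_threads, &t2, td_plist);
	TAILQ_INSERT_TAIL(&p.p_threads, &t3, td_plist);
	return (count_reportable(&p));
}
```

    Under the same assumption, walking the list by hand in the debugger
    amounts to starting at p_threads.tqh_first and following
    td_plist.tqe_next until it is 0x0. The sketch says nothing about
    whether the skip is safe once the lock has been dropped inside the
    loop, which is the objection raised at the top of this message.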

