Re: panic in rt_check_fib()
- From: Giorgos Keramidas <keramida@xxxxxxxxxxx>
- Date: Sun, 14 Sep 2008 15:56:12 +0300
On Sat, 13 Sep 2008 23:28:51 -0700, Julian Elischer <julian@xxxxxxxxxxxx> wrote:
To recap on this, I rewrote this function a couple of weeks ago because I
couldn't keep track of what was going on, and I thought it might
have some bad edge cases. A couple of days later Giorgos contacted me
saying that he had a fairly reproducible situation
where this was triggered, and it appeared to be an edge case in
this function that allowed it to try to lock the same lock twice.
I immediately thought "ah-hah! I may have a solution to this",
and gave him a copy of my new function, and indeed it DOES fix that
panic. However, after deleting and recreating interfaces a few hundred
times without crashing in rt_check_fib(), it then fails somewhere else
(actually it leaks some resources and eventually networking stops).
I'm not convinced that is a problem with the new or old rt_check(), but
it did stop me from just committing the new code.
In rereading the way the function (did and still does) work, it
occurred to me that there was a large flaw in the way it worked.
It dropped the lock on one route while it went off and did something
else that might block. On returning, it blindly re-grabbed that lock,
completely ignoring the fact that the route might not even be valid any
more (or any of several other things that may have changed while
it was away, maybe sleeping).
The code Giorgos is referring to is a patch I suggested to him to
fix this oversight, and not the one that I originally tested and
had suggested to fix the edge case.
I do, however, ask that some other people look at this patch!
Exactly. Thanks for summarizing this so well :)
I have started a kernel with your latest patch (from the quoted message
above), and I can't panic my kernel with the script that did it in a
semi-reliable manner before:
% root@kobe:/root# while true ; do \
% sh home.sh > /dev/null 2>&1 ; \
% vmstat -z | sed -n -e 1p -e /rt/p ; \
% sleep 1 ; \
% done
% ITEM SIZE LIMIT USED FREE REQUESTS FAILURES
% rtentry: 120, 0, 19, 77, 43, 0
% ITEM SIZE LIMIT USED FREE REQUESTS FAILURES
% rtentry: 120, 0, 20, 76, 47, 0
% ITEM SIZE LIMIT USED FREE REQUESTS FAILURES
% rtentry: 120, 0, 21, 75, 51, 0
% ITEM SIZE LIMIT USED FREE REQUESTS FAILURES
% rtentry: 120, 0, 23, 73, 55, 0
% ITEM SIZE LIMIT USED FREE REQUESTS FAILURES
% rtentry: 120, 0, 24, 72, 59, 0
% ITEM SIZE LIMIT USED FREE REQUESTS FAILURES
% rtentry: 120, 0, 25, 71, 62, 0
% ITEM SIZE LIMIT USED FREE REQUESTS FAILURES
% rtentry: 120, 0, 26, 70, 65, 0
% ITEM SIZE LIMIT USED FREE REQUESTS FAILURES
% rtentry: 120, 0, 27, 69, 69, 0
% ITEM SIZE LIMIT USED FREE REQUESTS FAILURES
% rtentry: 120, 0, 29, 67, 73, 0
% ITEM SIZE LIMIT USED FREE REQUESTS FAILURES
% rtentry: 120, 0, 30, 66, 76, 0
% ^C
% root@kobe:/root# sh home.sh
rtentries seem to be going up every time I cycle through the script,
which essentially brings down both wireless and wired interfaces and
then brings up the wired interface of my laptop. The core of the script
is currently:
# network interface options
export ifconfig_re0="inet 192.168.1.10/24"
export defaultrouter='192.168.1.1'
echo '## Stopping network interfaces.'
/etc/rc.d/netif stop re0 && ifconfig re0 delete
/etc/rc.d/netif stop iwn0 && ifconfig iwn0 delete
echo '## Bringing up network interface.'
/etc/rc.d/netif start re0
echo "## Reloading firewall rules."
/etc/rc.d/pf reload
# The default route may be pointing to another interface. Find out
# the IP address of the default gateway, delete it and point to the
# default gateway configured as ${defaultrouter}.
if [ -n "${defaultrouter}" ]; then
    echo '## Setting default router.'
    _oldrouter=`netstat -rn | grep default | awk '{print $2}'`
    if [ -n "${_oldrouter}" ]; then
        route delete default "${_oldrouter}"
        unset _oldrouter
    fi
    route add default "$defaultrouter"
fi
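One caveat about the gateway lookup in the script above: `grep default` can match more than one line of `netstat -rn` output (for example when an IPv6 default route is also present), and the quoted `${_oldrouter}` would then hold several addresses at once. The following is only a sketch of a stricter lookup, not part of the original script; the function name `first_default_gw` is hypothetical, and it assumes the destination is in the first column and the gateway in the second, as in the usual routing table listing.

```sh
#!/bin/sh
# Hypothetical helper (not in the original home.sh): read routing table
# text on stdin and print the gateway of the first line whose destination
# column is exactly "default". Only the first match is printed.
first_default_gw() {
    awk '$1 == "default" { print $2; exit }'
}

# Possible use inside the script, restricted to the IPv4 table:
#   _oldrouter=$(netstat -rn -f inet | first_default_gw)
```

Matching on the whole first field also avoids picking up unrelated lines that merely contain the word "default".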
With your version of rt_check_fib() I have no panics so far. This
doesn't mean we don't have a bug elsewhere, or that it will not panic
tomorrow, but it's nice that things seem a bit more stable now. The old
version of rt_check_fib() used to panic about one third of the time I
ran my 'home.sh' script...
Now an interesting question is: Is it `normal' that the USED rtentry
objects keep going up at every interface restart and are (at least at
first glance) not reclaimed as fast as they are acquired?
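To make the growth easier to follow across many cycles, the USED column can be pulled out of the `vmstat -z` output and compared between runs. This is a rough sketch rather than anything from the thread: the helper names `rtentry_used` and `used_growth` are made up for illustration, and the column order (ITEM, SIZE, LIMIT, USED, ...) is assumed to match the transcript above.

```sh
#!/bin/sh
# Hypothetical helper: read `vmstat -z` output on stdin and print the USED
# value of the rtentry zone. Assumes the layout shown in the transcript:
#   ITEM: SIZE, LIMIT, USED, FREE, REQUESTS, FAILURES
rtentry_used() {
    awk -F'[:,]' '
        { name = $1; gsub(/[ \t]/, "", name) }
        name == "rtentry" {
            used = $4
            gsub(/[ \t]/, "", used)
            print used
            exit
        }
    '
}

# Hypothetical helper: print how much USED grew between two samples.
used_growth() {
    echo $(( $2 - $1 ))
}

# Possible use around one cycle of the script:
#   before=$(vmstat -z | rtentry_used)
#   sh home.sh > /dev/null 2>&1
#   after=$(vmstat -z | rtentry_used)
#   echo "rtentry USED grew by $(used_growth "$before" "$after")"
```

A steady positive growth per cycle that never drops back would be consistent with a leak, although entries that are freed lazily could look the same over a short run.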
_______________________________________________
freebsd-current@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe@xxxxxxxxxxx"