Re: ssh & select() problem on 5.3
From: Claudiu Dragalia-Paraipan (dr.clau_at_gmail.com)
Date: 11/28/04
- Previous message: Robert Watson: "Re: ssh & select() problem on 5.3"
- In reply to: Robert Watson: "Re: ssh & select() problem on 5.3"
- Next in thread: Barney Wolff: "Re: ssh & select() problem on 5.3"
- Reply: Barney Wolff: "Re: ssh & select() problem on 5.3"
- Reply: Peter Jeremy: "Re: ssh & select() problem on 5.3"
Date: Sun, 28 Nov 2004 18:43:47 +0200
To: Robert Watson <rwatson@freebsd.org>
Hi,
Robert Watson wrote:
> Sounds like a bug, but the interesting question is really whether it's a
> kernel bug or an SSH bug. I'm not up on SSH internals, but there are a
> few other knobs you might try and things to look at that might help
> address whether it's a kernel bug or not:
>
> (1) Try debug.mpsafenet=0 in loader.conf on the 5.3 box -- if we're
> looking at a kernel race condition due to a locking bug, that might
> close the race. However, it might also just change the timing...
> That this happens on SMP and UP suggests that it's not so much a
> timing issue.
I tried debug.mpsafenet=0. No change.
>
> (2) select() is almost always used to wait for space in a buffer to write,
> or wait for data in a buffer to read. Using a combination of
> netstat(1) and sockstat(1), it would be useful to know whether there
> is in fact data in either the send or receive buffer. Combined with
> inspecting the state of the select arguments and socket buffers in
> kernel, this might reveal whether perhaps there was a missed wakeup.
> It's worth noting that we believe we corrected a bug with exactly these
> symptoms shortly before 5.3 release.
>
> Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
> robert@fledge.watson.org Principal Research Scientist, McAfee Research
I knew about the poll()/select() issue, which is why I thought this was
the same case.
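As a rough illustration of the check suggested in (2), the sketch below
filters netstat-style output for TCP sockets that still have bytes queued.
The function name nonempty_queues and the sample text are made up for this
example; the column layout assumed is "Proto Recv-Q Send-Q Local Foreign
(state)", so adjust it if your netstat prints something different.

```python
def nonempty_queues(netstat_text):
    """Return (local, foreign, recv_q, send_q) for TCP sockets with queued bytes.

    Assumes each data line looks like:
        Proto Recv-Q Send-Q Local-Address Foreign-Address [state]
    """
    results = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip the header, blank lines, and non-TCP sockets.
        if len(fields) < 5 or not fields[0].startswith("tcp"):
            continue
        try:
            recv_q, send_q = int(fields[1]), int(fields[2])
        except ValueError:
            continue
        if recv_q or send_q:
            results.append((fields[3], fields[4], recv_q, send_q))
    return results


# Made-up sample output (documentation addresses), not taken from the
# machines discussed in this thread.
SAMPLE = """\
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
tcp4       0      0  192.0.2.10.22          198.51.100.7.51234     ESTABLISHED
tcp4       0   4344  192.0.2.10.22          198.51.100.7.51301     ESTABLISHED
"""

stuck = nonempty_queues(SAMPLE)
```

A Send-Q on the server side that stays non-zero would suggest data that was
written but never acknowledged by the peer, while a non-zero Recv-Q on the
client would suggest data that arrived but was not read by the application.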
I tried the same connection from Windows with the PuTTY client: on one
machine everything is OK, but on another, running dmesg triggers the lock again.
A friend tried both FreeBSD 5.3 and Windows, and it seems that it locks
more often on 5.3, but not only on 5.3.
Furthermore, I connected to another machine with ssh, and from there I ssh'ed
to the server that seems to trigger the lock. It still locks.
What is more, a tcpdump on the other end (I have access to the
router/firewall, which is right before the machine I am testing with),
after the lock-up, still shows packets being sent from the server to me,
but a tcpdump at my end shows nothing: the packets never get here.
In light of these findings, I guess I can say that FreeBSD 5.3 behaves
exactly as it should: select() waits for packets that never get
here. Unless packets get here but are never processed by the kernel (?).
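To make the expected behaviour concrete, here is a minimal sketch using a
local socket pair instead of a real ssh connection. The helper name
wait_readable and the timeout values are chosen only for this example. With
no data in the receive buffer, select() reports nothing and times out; once
the peer's data actually arrives, it reports the socket as readable.

```python
import select
import socket


def wait_readable(sock, timeout):
    """Return True if sock becomes readable within timeout seconds."""
    readable, _, _ = select.select([sock], [], [], timeout)
    return bool(readable)


# A connected local pair stands in for the two ends of a TCP session.
a, b = socket.socketpair()
try:
    # Nothing has been sent to `a`, so select() has nothing to report
    # and returns only when the timeout expires.
    idle = wait_readable(a, 0.2)

    # Once data reaches the receive buffer of `a`, select() returns at once.
    b.sendall(b"x")
    ready = wait_readable(a, 1.0)
finally:
    a.close()
    b.close()
```

A client such as ssh typically calls select() without a timeout, so if the
packets are lost on the way, it simply blocks and the session appears hung.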
Since the problem occurs only when I connect to the firewall or to a
server behind it, I have started to suspect a hardware failure. Could a
network card cause such problems?
The firewall is running FreeBSD 5.2.1 with PF+ALTQ, and I can observe
the same behaviour: running dmesg locks the ssh connection. I have tested this
with PF disabled, and the problem still occurs, so I can eliminate PF as the cause.
I've cross-posted to the hackers list as well, since this may be of interest
there too.
If anyone has any idea of what might be going on, it would be helpful.
With respect,
-- Claudiu Dragalina-Paraipan dr.clau@gmail.com