Re: major DNS hiccup



In article <8WJug.7561$i32.3378@xxxxxxxxxxxxxxxxxxxx> Mike Scott
<usenet.10@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> writes:

> OK, I've come back to this at last. Hoping it might have gone away as
> suddenly as it arrived; but no such luck :-(
>
> I ran 'dig' to look up the address of one of the always-failing names.
> 'dig' output plus ethereal diagnosis follow. named running as caching
> nameserver on localhost.

"Interesting" stuff - IMHO it's quite clear that your ISP is messing
things up with some sort of "transparent" filter/proxy/cache/firewall -
it might be specifically b0rken where EDNS0 is concerned, while their
"normal" name servers can deal with that one way or another (as they
should) - but it's not really conclusive that this is the trigger.
Unfortunately your ethereal output was missing some parts, notably
details of the "additional records", but they can be guessed...

> No. Time Source Destination Protocol Info
> 3 11:32:48.292602 86.22.67.158 194.74.151.194 DNS Standard query A www.yell.co.uk
>
> Frame 3 (85 bytes on wire, 85 bytes captured)

This is named's "normal" first query, including the EDNS0 OPT RR (which
isn't shown) - it gets a FORMERR reply:

> No. Time Source Destination Protocol Info
> 4 11:32:48.300886 194.74.151.194 86.22.67.158 DNS Standard query response, Format error
>
> Flags: 0x8081 (Standard query response, Format error)

As previously mentioned, this is OK per se (but see below), and named
retries without the EDNS0 stuff (note shorter packet and no additional
records):

> No. Time Source Destination Protocol Info
> 5 11:32:48.347740 86.22.67.158 194.74.151.194 DNS Standard query A www.yell.co.uk
>
> Frame 5 (74 bytes on wire, 74 bytes captured)
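
The 11-byte difference between the two queries (85 vs. 74 bytes on wire)
is exactly the size of an empty OPT pseudo-RR, which fits reading the
second query as the same question minus EDNS0. A minimal sketch of that
arithmetic follows. build_query() and frame_size() are hypothetical
helpers written for this illustration (they are not BIND code), the OPT
values (4096-byte payload, DO bit set) are taken from the full capture at
the end of this post, and the frame overhead assumes Ethernet II plus an
IPv4 header without options.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append a 16-bit value in network byte order; returns the new offset. */
static size_t put16(uint8_t *buf, size_t off, uint16_t v)
{
    buf[off] = (uint8_t)(v >> 8);
    buf[off + 1] = (uint8_t)(v & 0xff);
    return off + 2;
}

/* Encode a dotted name as length-prefixed labels plus the root label.
 * Assumes a well-formed name with no empty labels. */
static size_t put_name(uint8_t *buf, size_t off, const char *name)
{
    while (*name != '\0') {
        const char *dot = strchr(name, '.');
        size_t len = dot ? (size_t)(dot - name) : strlen(name);
        buf[off++] = (uint8_t)len;
        memcpy(buf + off, name, len);
        off += len;
        name += len;
        if (*name == '.')
            name++;
    }
    buf[off++] = 0;                       /* root label */
    return off;
}

/* Build a non-recursive A/IN query into buf (caller supplies enough room).
 * With use_edns0 set, append an OPT pseudo-RR. Returns the message length. */
size_t build_query(uint8_t *buf, uint16_t id, const char *name, int use_edns0)
{
    size_t off = 0;
    off = put16(buf, off, id);
    off = put16(buf, off, 0x0000);        /* flags: standard query, RD clear */
    off = put16(buf, off, 1);             /* QDCOUNT */
    off = put16(buf, off, 0);             /* ANCOUNT */
    off = put16(buf, off, 0);             /* NSCOUNT */
    off = put16(buf, off, use_edns0 ? 1 : 0); /* ARCOUNT */
    off = put_name(buf, off, name);
    off = put16(buf, off, 1);             /* QTYPE A */
    off = put16(buf, off, 1);             /* QCLASS IN */
    if (use_edns0) {
        buf[off++] = 0;                   /* owner name: root */
        off = put16(buf, off, 41);        /* TYPE OPT */
        off = put16(buf, off, 4096);      /* CLASS field = UDP payload size */
        off = put16(buf, off, 0x0000);    /* extended RCODE 0, version 0 */
        off = put16(buf, off, 0x8000);    /* Z field with the DO bit set */
        off = put16(buf, off, 0);         /* RDLENGTH: no options */
    }
    return off;
}

/* Ethernet II (14) + IPv4 without options (20) + UDP (8) + DNS message. */
size_t frame_size(size_t dns_len)
{
    return 14 + 20 + 8 + dns_len;
}
```

For "www.yell.co.uk" this gives a 43-byte DNS message (85-byte frame)
with the OPT RR and a 32-byte message (74-byte frame) without it.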

But here comes the real brokenness - the reply to this query is
non-authoritative, has no error but no answers either, but has authority
records:

> No. Time Source Destination Protocol Info
> 6 11:32:48.355006 194.74.151.194 86.22.67.158 DNS Standard query response
>
> Flags: 0x8080 (Standard query response, No error)
>
> Questions: 1
> Answer RRs: 0
> Authority RRs: 2

The authority records weren't shown, but a fair guess is that they were
the two "proper" NS records, one of which is for the very server
purportedly sending this response. This is the signature of a "lame
delegation", and named would normally log that (but it can be turned off
IIRC).
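
For anyone who wants to read the flag words by hand, here is a small
sketch that splits the 16-bit DNS flags field into its standard parts and
tests for the "no error, not authoritative, no answers, NS records in
authority" shape described above. decode_flags() and
looks_like_referral() are hypothetical names for this illustration, and
the check is only a rough approximation of what named actually does
before it calls a server lame.

```c
#include <stdint.h>

/* The fields of the DNS header flags word that matter here. */
struct dns_flags {
    unsigned qr;      /* 1 = response */
    unsigned opcode;  /* 0 = standard query */
    unsigned aa;      /* authoritative answer */
    unsigned tc;      /* truncated */
    unsigned rd;      /* recursion desired */
    unsigned ra;      /* recursion available */
    unsigned rcode;   /* 0 = no error, 1 = format error, 2 = server failure */
};

/* Split a flags word (host byte order) into its fields. */
struct dns_flags decode_flags(uint16_t f)
{
    struct dns_flags d;
    d.qr = (f >> 15) & 0x1;
    d.opcode = (f >> 11) & 0xf;
    d.aa = (f >> 10) & 0x1;
    d.tc = (f >> 9) & 0x1;
    d.rd = (f >> 8) & 0x1;
    d.ra = (f >> 7) & 0x1;
    d.rcode = f & 0xf;
    return d;
}

/* A response with no error, no AA bit, no answers, and at least one
 * authority record. Coming from a server that is itself listed as
 * authoritative for the zone, this is the lame-delegation pattern. */
int looks_like_referral(uint16_t flags, unsigned ancount, unsigned nscount)
{
    struct dns_flags d = decode_flags(flags);
    return d.qr && !d.aa && d.rcode == 0 && ancount == 0 && nscount > 0;
}
```

Decoded this way, 0x8081 is a response with rcode 1 (format error), and
0x8080 with 0 answers and 2 authority records matches the referral shape.
Both have RA set and AA clear, whereas the 0x8400 reply in the capture at
the end of this post has AA set and RA clear.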

The remaining packets are just an exact repeat with the other server
(.200), after which named is out of options and has to deliver the
SERVFAIL back to 'dig'.
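
The sequence seen in the trace can be modelled roughly as below. This is
a deliberately simplified sketch and not named's real resolver logic: the
types and the resolve() function are hypothetical, each server is reduced
to "what it replies with and without EDNS0", and timeouts, caching and
server selection are ignored.

```c
#include <stddef.h>

enum reply { REPLY_ANSWER, REPLY_FORMERR, REPLY_LAME };
enum outcome { OUTCOME_ANSWER, OUTCOME_SERVFAIL };

/* Hypothetical model of one server as seen from the client side. */
struct server_model {
    enum reply with_edns0;
    enum reply without_edns0;
};

/* Try each server in turn: query with EDNS0 first, retry once without it
 * on FORMERR, and move on to the next server if no usable answer comes
 * back. Counts the queries that would be sent. */
enum outcome resolve(const struct server_model *servers, size_t n,
                     unsigned *queries_sent)
{
    *queries_sent = 0;
    for (size_t i = 0; i < n; i++) {
        enum reply r = servers[i].with_edns0;
        (*queries_sent)++;
        if (r == REPLY_FORMERR) {
            /* Fall back to a plain query against the same server. */
            r = servers[i].without_edns0;
            (*queries_sent)++;
        }
        if (r == REPLY_ANSWER)
            return OUTCOME_ANSWER;
        /* Lame or still failing: nothing usable, try the next server. */
    }
    return OUTCOME_SERVFAIL;            /* out of servers */
}
```

With two servers that both answer FORMERR and then a lame-style reply,
this model sends four queries and ends in SERVFAIL, matching the trace.
A single server that answers the first EDNS0 query needs only one.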

> If this really is due to ntl (my ISP) messing up, I'd appreciate some
> advice on how to prove that really is the case.

Well, the above doesn't prove anything like that per se - it could just
be two badly broken and/or misconfigured name servers (but of course
lots of people would have problems with that domain if so, and you said
it happened to other domains too).

However when I try the exact same thing myself, there is no problem (see
below). The query from named happens to go to the .200 server, and only
to that one, since it succeeds. And that server is perfectly capable of
handling the EDNS0 stuff - no FORMERR, but instead the correct reply at
first try, and it even sends an EDNS0 OPT RR itself. I bet I would get
the same result from .194.

So it would seem that your ISP's supposedly-transparent stuff either
generates the FORMERR itself, or mangles the request such that it really
has a format error when it arrives at the server. And then it enters
some strange "state" that causes it to generate the lame-delegation-
style response (I can't really think of a way for it to mangle the
request such that the actual server sends that response). Hm, actually,
you *would* get such a response if your query was forwarded to a
(different) server that didn't do recursion, which might be a clue (or
not).

Why it works for some domains remains unknown, but it may not be
possible to figure out from your end - e.g. their "transparent" stuff
could be load-balancing across some set of local caching servers, only
some of which are b0rken.

Anyway, for me this clearly proves that your ISP is at fault - whether
it is proof enough for them I have no idea. They could e.g. claim that
your named sends some broken stuff while mine doesn't - but then there
was the other poster here that had success with the exact same setup
that failed with your ISP when he switched to another ISP.

If all else fails, you could perhaps hack named to not use EDNS0 at all
- I didn't find any such setting, but at least in the code I have here,
in /usr/src/contrib/bind9/lib/dns/resolver.c the function
resquery_send() checks the per-server setting that I mentioned:

    if ((query->addrinfo->flags & DNS_FETCHOPT_NOEDNS0) == 0 &&
        peer != NULL &&
        dns_peer_getsupportedns(peer, &useedns) == ISC_R_SUCCESS &&
        !useedns)
    {
        query->options |= DNS_FETCHOPT_NOEDNS0;
        dns_adb_changeflags(fctx->adb,
                            query->addrinfo,
                            DNS_FETCHOPT_NOEDNS0,
                            DNS_FETCHOPT_NOEDNS0);
    }

Changing that to

    if (1)
    {
        query->options |= DNS_FETCHOPT_NOEDNS0;
        ...

would probably do the trick... - again assuming that EDNS0 really *is*
the trigger.

--Per Hedeland
per@xxxxxxxxxxxx


Ethereal output:


No. Time Source Destination Protocol Info
13 38.820669 10.1.1.1 194.74.151.200 DNS Standard query A www.yell.co.uk

Frame 13 (85 bytes on wire, 85 bytes captured)
Ethernet II, Src: Intel_83:4c:d3 (00:d0:b7:83:4c:d3), Dst: Cisco_28:34:00 (00:02:17:28:34:00)
Internet Protocol, Src: 10.1.1.1 (10.1.1.1), Dst: 194.74.151.200 (194.74.151.200)
User Datagram Protocol, Src Port: 61490 (61490), Dst Port: domain (53)
    Source port: 61490 (61490)
    Destination port: domain (53)
    Length: 51
    Checksum: 0x1ac1 [correct]
Domain Name System (query)
    Transaction ID: 0xa82b
    Flags: 0x0000 (Standard query)
        0... .... .... .... = Response: Message is a query
        .000 0... .... .... = Opcode: Standard query (0)
        .... ..0. .... .... = Truncated: Message is not truncated
        .... ...0 .... .... = Recursion desired: Don't do query recursively
        .... .... .0.. .... = Z: reserved (0)
        .... .... ...0 .... = Non-authenticated data OK: Non-authenticated data is unacceptable
    Questions: 1
    Answer RRs: 0
    Authority RRs: 0
    Additional RRs: 1
    Queries
        www.yell.co.uk: type A, class IN
            Name: www.yell.co.uk
            Type: A (Host address)
            Class: IN (0x0001)
    Additional records
        <Root>: type OPT
            Name: <Root>
            Type: OPT (EDNS0 option)
            UDP payload size: 4096
            Higher bits in extended RCODE: 0x0
            EDNS0 version: 0
            Z: 0x8000
                Bit 0 (DO bit): 1 (Accepts DNSSEC security RRs)
                Bits 1-15: 0x0 (reserved)
            Data length: 0

No. Time Source Destination Protocol Info
14 38.884347 194.74.151.200 10.1.1.1 DNS Standard query response A 194.72.108.2

Frame 14 (190 bytes on wire, 190 bytes captured)
Ethernet II, Src: Cisco_28:34:00 (00:02:17:28:34:00), Dst: Intel_83:4c:d3 (00:d0:b7:83:4c:d3)
Internet Protocol, Src: 194.74.151.200 (194.74.151.200), Dst: 10.1.1.1 (10.1.1.1)
User Datagram Protocol, Src Port: domain (53), Dst Port: 61490 (61490)
    Source port: domain (53)
    Destination port: 61490 (61490)
    Length: 156
    Checksum: 0x5e59 [correct]
Domain Name System (response)
    Transaction ID: 0xa82b
    Flags: 0x8400 (Standard query response, No error)
        1... .... .... .... = Response: Message is a response
        .000 0... .... .... = Opcode: Standard query (0)
        .... .1.. .... .... = Authoritative: Server is an authority for domain
        .... ..0. .... .... = Truncated: Message is not truncated
        .... ...0 .... .... = Recursion desired: Don't do query recursively
        .... .... 0... .... = Recursion available: Server can't do recursive queries
        .... .... .0.. .... = Z: reserved (0)
        .... .... ..0. .... = Answer authenticated: Answer/authority portion was not authenticated by the server
        .... .... .... 0000 = Reply code: No error (0)
    Questions: 1
    Answer RRs: 1
    Authority RRs: 2
    Additional RRs: 3
    Queries
        www.yell.co.uk: type A, class IN
            Name: www.yell.co.uk
            Type: A (Host address)
            Class: IN (0x0001)
    Answers
        www.yell.co.uk: type A, class IN, addr 194.72.108.2
            Name: www.yell.co.uk
            Type: A (Host address)
            Class: IN (0x0001)
            Time to live: 1 day
            Data length: 4
            Addr: 194.72.108.2
    Authoritative nameservers
        yell.co.uk: type NS, class IN, ns redgate2.yellowpages.co.uk
            Name: yell.co.uk
            Type: NS (Authoritative name server)
            Class: IN (0x0001)
            Time to live: 1 day
            Data length: 23
            Name server: redgate2.yellowpages.co.uk
        yell.co.uk: type NS, class IN, ns redgate.yellowpages.co.uk
            Name: yell.co.uk
            Type: NS (Authoritative name server)
            Class: IN (0x0001)
            Time to live: 1 day
            Data length: 10
            Name server: redgate.yellowpages.co.uk
    Additional records
        redgate.yellowpages.co.uk: type A, class IN, addr 194.74.151.200
            Name: redgate.yellowpages.co.uk
            Type: A (Host address)
            Class: IN (0x0001)
            Time to live: 1 day
            Data length: 4
            Addr: 194.74.151.200
        redgate2.yellowpages.co.uk: type A, class IN, addr 194.74.151.194
            Name: redgate2.yellowpages.co.uk
            Type: A (Host address)
            Class: IN (0x0001)
            Time to live: 1 day
            Data length: 4
            Addr: 194.74.151.194
        <Root>: type OPT
            Name: <Root>
            Type: OPT (EDNS0 option)
            UDP payload size: 4096
            Higher bits in extended RCODE: 0x0
            EDNS0 version: 0
            Z: 0x8000
                Bit 0 (DO bit): 1 (Accepts DNSSEC security RRs)
                Bits 1-15: 0x0 (reserved)
            Data length: 0