Re: better MTU support...

From: Andre Oppermann (andre_at_freebsd.org)
Date: 09/21/04

    Date: Tue, 21 Sep 2004 01:57:54 +0200
    To: John-Mark Gurney <gurney_j@resnet.uoregon.edu>
    
    

    John-Mark Gurney wrote:
    >
    > Andre Oppermann wrote this message on Thu, Sep 09, 2004 at 19:05 +0200:
    >
    > Ok, finally got a switch (and gige cards, if_re needs work) capable of
    > jumbo frames..
    >
    > > John-Mark Gurney wrote:
    > > > In a recent experiment w/ Jumbo frames, I found out that sending ip
    > > > frames completely ignores the MTU set on host routes. This makes it
    > > > difficult (or next to impossible) to support a network that has both
    > > > regular and jumbo frames on it as you can't restrict some hosts to the
    > > > smaller frames.
    > >
    > > What you should do instead is to set the MTU on the interface to 9018
    > > or so and then have a default route with MTU 1500 for everything else.
    > > Now you can specify larger MTUs for hosts that support it.
    > >
    > > Otherwise you are opening a can of worms...
    >
    > This doesn't fix it, since the output still doesn't honor the mtu on
    > the route.. Note, I'm not testing tcp, only udp and icmp since I've
    > seen that TCP already works fine...
    > # netstat -rnWfinet
    > Routing tables
    >
    > Internet:
    > Destination        Gateway            Flags    Refs      Use    Mtu  Netif Expire
    > default            192.168.0.14       UGS         0       11   1500    em0
    > 127.0.0.1          127.0.0.1          UH          0       40  16384    lo0
    > 192.168.0          link#5             UC          0        0   9000    em0
    > 192.168.0.1        00:a0:c9:59:8b:6c  UHLW        0       33   1500    em0    175
    > 192.168.0.3        00:0a:95:9e:8b:88  UHLW        0     1988   9000    em0    374
    > 192.168.0.14       00:a0:c9:31:30:5e  UHLW        1        8   1500    em0    955
    > 192.168.0.20       00:07:e9:0d:aa:ca  UHLW        0       18   9000    em0    187
    > 192.168.0.21       00:07:e9:0d:ad:06  UHLW        0        2   9000    lo0
    >
    > tcpdump output:
    > 16:02:14.311079 IP 192.168.0.21 > 192.168.0.1: icmp 5008: echo request seq 14
    > 16:02:15.320981 IP 192.168.0.21 > 192.168.0.1: icmp 5008: echo request seq 15
    > 16:04:54.720890 IP 192.168.0.21 > 128.223.122.47: icmp 5008: echo request seq 0
    > 16:04:55.727148 IP 192.168.0.21 > 128.223.122.47: icmp 5008: echo request seq 1
    > 16:05:02.288989 IP 192.168.0.21 > 192.168.0.20: icmp 5008: echo request seq 0
    > 16:05:02.289856 IP 192.168.0.20 > 192.168.0.21: icmp 5008: echo reply seq 0
    > 16:05:03.296481 IP 192.168.0.21 > 192.168.0.20: icmp 5008: echo request seq 1
    > 16:05:03.297282 IP 192.168.0.20 > 192.168.0.21: icmp 5008: echo reply seq 1
    >
    > So, as you can see, it's broken...
    >
    > with my patch, ip properly fragments the packets to machines with
    > smaller mtu...
    >
    > > > I now have a patch to ip_output that makes it obey the MTU set on the
    > > > route instead of that of the interface.
    > >
    > > Your patch corrects a problem in ip_output where a smaller MTU on an
    > > rtentry was ignored but that is only for the non-TCP cases. When you
    > > open a TCP session the MTU will be honored (see tcp_subr.c:tcp_maxmtu).
    > > If not it would be a bug.
    > >
    > > Could you try your large MTU setup again using the procedure I described
    > > above?
    > >
    > > That should solve your immediate problem.
    >
    > Nope, it doesn't...
    >
    > > For the general 'bug' in ip_output that it doesn't honour a smaller MTU
    > > on a route I'd like to do a more thorough fix. Routes should be
    > > created with MTU 0 if the MTU is not different from the if_mtu. Only
    > > in those cases where you want to have a lower MTU you set it. For cloned
    > > routes the MTU would be cloned from the parent. This range of changes is
    > > more intrusive. On top of that comes the new ARP code which will have a
    > > MTU field as well. This one is supposed to store different MTUs for mixed
    > > MTU L2 networks. How to transport the MTU information is a separate
    > > discussion.
    > >
    > > If the fix above works for you I'd like to do the real fix later (< end
    > > of year) and not change the current behaviour in ip_output at the moment.
    >
    > It wouldn't be hard to add to my patch the check to see if the route's
    > mtu is 0 and just use the if mtu... which then solves the ip part of
    > your more complete fix... Then when you finally fix the route/arp stuff
    > nothing else should be necessary...
    >
    > Sound good?

    Moving the check upwards as you have done in ip_output() works in your
    case but is not a real, clean fix. Ideally routes should never have an
    MTU assigned to them unless someone explicitly sets it, so the MTU on a
    route should normally be zero. A zero MTU means "not set": it is ignored
    and only the link MTU is used. If there is an MTU on a route it should
    be observed not only for host routes (as you do in your patch) but also
    for network routes. Getting this right requires disabling the copying
    of the MTU when a route is cloned or created. We also have to check that
    all consumers of the MTU field in the kernel and userland can cope with
    a zero MTU and these semantics (ignoring it).
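
    To make the intended semantics concrete, here is a minimal sketch in
    plain C. It is not the ip_output() code and not the patch under
    discussion: effective_mtu() and fragment_count() are hypothetical helper
    names, and clamping an oversized route MTU to the link MTU is an
    assumption on my part, as is the fixed 20-byte IP header (no options).

```c
/*
 * Hypothetical helper, not kernel code: choose the MTU used for output.
 * A route MTU of zero means "not set", so the link (interface) MTU applies.
 */
static unsigned long
effective_mtu(unsigned long route_mtu, unsigned long if_mtu)
{
	if (route_mtu == 0)
		return (if_mtu);
	/* Assumption: a route MTU can only lower the limit, never raise it. */
	return (route_mtu < if_mtu ? route_mtu : if_mtu);
}

/*
 * Hypothetical helper: number of IPv4 fragments needed to carry
 * ip_payload bytes (everything after the IP header) at the given MTU.
 * Assumes a 20-byte IP header and mtu > 20.
 */
static unsigned long
fragment_count(unsigned long ip_payload, unsigned long mtu)
{
	/* Every fragment but the last carries a multiple of 8 bytes. */
	unsigned long per_frag = (mtu - 20) & ~7UL;

	if (ip_payload + 20 <= mtu)
		return (1);	/* fits in one packet, no fragmentation */
	return ((ip_payload + per_frag - 1) / per_frag);
}
```

    With the routing table quoted above, effective_mtu(1500, 9000) yields
    1500 for 192.168.0.1, and effective_mtu(0, 9000) yields 9000 for a route
    with no MTU set. The 5008-byte ICMP messages from the tcpdump output
    would then go out as fragment_count(5008, 1500) = 4 fragments to
    192.168.0.1 and as a single packet to 192.168.0.20.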

    I'll get to that by the end of the week. If you get to some of it earlier,
    please send me the patches so we don't duplicate work. Then we should have
    something ready to commit to 6-current next week.

    -- 
    Andre
    _______________________________________________
    freebsd-net@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-net
    To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
    
