Re: Handling 100.000 packets/sec or more

From: Tom Pavel (pavel_at_NetworkPhysics.COM)
Date: 01/14/04

  • Next message: ISAAC GELADO FERNANDEZ: "Re: Routing Networks"
    To: richard@wendland.org.uk
    Date: Wed, 14 Jan 2004 14:04:19 -0800
    
    

    >>>>> On Wed, 14 Jan 2004, Richard Wendland <richard@starburst.demon.co.uk> writes:

    > > device polling(8) really does help _a lot_ for packet floods/storms.
    > > for device polling to work properly (imho) you would need to set HZ
    > > to 1000.
    > > I don't recommend any higher HZ on a PIII.
    >
    > Incidentally, setting HZ > 1000 would cause FreeBSD TCP to not comply
    > with RFC1323, as it would make the TCP timestamp option clock tick faster
    > than 1ms. RFC1323 4.2.2 specifies the clock rate to be in the range
    > 1 ms to 1 sec per tick.
    >
    > Really the TCP timestamp option clock should be divorced from HZ before
    > too long, as a time will come when people will want HZ > 1000.
    >
    > Actually a bit faster tick-rate is unlikely to run into much trouble in
    > practice, but it will cause the PAWS algorithm to stop a long running
    > TCP connection, see 4.2.3 of RFC1323.
    >
    > Richard

    The PAWS thing is real. Idle SSH or telnet connections can easily get
    hosed by wraparound if you crank up HZ too much. We encountered this
    at Network Physics.

    I had been meaning to submit a PR about this (and probably several
    others as well) for quite a while now, but I always got distracted by
    some other urgent matter... However, given the prod, I was able to
    dig up the fix we used for this particular problem. Pretty sure these
    diffs will not apply cleanly, even to -stable, but no doubt the gist
    of the idea should be clear enough. Hopefully, this can save someone
    some work on getting a fix into the tree.

    Tom Pavel

    Network Physics
    pavel@networkphysics.com / pavel@alum.mit.edu

    Index: tcp_input.c
    ===================================================================
    RCS file: /u1/Repo/FreeBSD/sys/netinet/tcp_input.c,v
    retrieving revision 1.41
    retrieving revision 1.42
    diff -u -r1.41 -r1.42
    --- tcp_input.c 2 Apr 2002 23:27:33 -0000 1.41
    +++ tcp_input.c 3 Apr 2002 22:24:24 -0000 1.42
    @@ -1185,7 +1185,7 @@
                      */
                     if ((to.to_flag & TOF_TS) != 0 &&
                        SEQ_LEQ(th->th_seq, tp->last_ack_sent)) {
    -                        tp->ts_recent_age = ticks;
    +                        GETCURTS(tp->ts_recent_age);
                             tp->ts_recent = to.to_tsval;
                     }
     
    @@ -1228,9 +1228,12 @@
                              && ((!(sack_check(tp))) ||
                                  to.to_tsecr)
     #endif
    -                        )
    -                            tcp_xmit_timer(tp, ticks - to.to_tsecr + 1);
    -                        else {
    +                        ) {
    +                            u_long cur_ts, rtt_ticks;
    +                            GETCURTS(cur_ts);
    +                            rtt_ticks = TSTMPTOTICK (cur_ts - to.to_tsecr);
    +                            tcp_xmit_timer(tp, rtt_ticks + 1);
    +                        } else {
     #ifdef LTSTMP
                                 tcp_xmit_timer(tp, tp->t_rtttime);
     #else
    @@ -1941,9 +1944,11 @@
              */
             if ((to.to_flag & TOF_TS) != 0 && tp->ts_recent &&
                 TSTMP_LT(to.to_tsval, tp->ts_recent)) {
    +                u_long cur_ts;
     
                     /* Check to see if ts_recent is over 24 days old. */
    -                if ((int)(ticks - tp->ts_recent_age) > TCP_PAWS_IDLE) {
    +                GETCURTS(cur_ts);
    +                if ((int)(cur_ts - tp->ts_recent_age) > TCP_PAWS_IDLE) {
                             /*
                              * Invalidate ts_recent. If this segment updates
                              * ts_recent, the age will be reset later and ts_recent
    @@ -2120,7 +2125,7 @@
              */
             if ((to.to_flag & TOF_TS) != 0 &&
                 SEQ_LEQ(th->th_seq, tp->last_ack_sent)) {
    -                tp->ts_recent_age = ticks;
    +                GETCURTS(tp->ts_recent_age);
                     tp->ts_recent = to.to_tsval;
             }
     
    @@ -2754,9 +2759,12 @@
                   /* bug fix from Mark Allman */
                     && ((!sack_check(tp)) || to.to_tsecr)
     #endif
    -        )
    -                tcp_xmit_timer(tp, ticks - to.to_tsecr + 1);
    -        else {
    +        ) {
    +                u_long cur_ts, rtt_ticks;
    +                GETCURTS(cur_ts);
    +                rtt_ticks = TSTMPTOTICK (cur_ts - to.to_tsecr);
    +                tcp_xmit_timer(tp, rtt_ticks + 1);
    +        } else {
     
     #ifdef LTSTMP /* use local timestamp */
                     tcp_xmit_timer(tp, tp->t_rtttime);
    @@ -3293,7 +3301,7 @@
                             if (th->th_flags & TH_SYN) {
                                     tp->t_flags |= TF_RCVD_TSTMP;
                                     tp->ts_recent = to->to_tsval;
    -                                tp->ts_recent_age = ticks;
    +                                GETCURTS(tp->ts_recent_age);
                             }
                             break;
     
    Index: tcp_output.c
    ===================================================================
    RCS file: /u1/Repo/FreeBSD/sys/netinet/tcp_output.c,v
    retrieving revision 1.32
    retrieving revision 1.33
    diff -u -r1.32 -r1.33
    --- tcp_output.c 3 Apr 2002 01:55:20 -0000 1.32
    +++ tcp_output.c 3 Apr 2002 22:24:24 -0000 1.33
    @@ -616,7 +616,9 @@
     
                      /* Form timestamp option as shown in appendix A of RFC 1323. */
                      *lp++ = htonl(TCPOPT_TSTAMP_HDR);
    -                 *lp++ = htonl(ticks);
    +                 GETCURTS(*lp);
    +                 *lp = htonl(*lp);
    +                 lp++;
                      *lp = htonl(tp->ts_recent);
                      optlen += TCPOLEN_TSTAMP_APPA;
              }
    Index: tcp_seq.h
    ===================================================================
    RCS file: /u1/Repo/FreeBSD/sys/netinet/tcp_seq.h,v
    retrieving revision 1.2
    retrieving revision 1.3
    diff -u -r1.2 -r1.3
    --- tcp_seq.h 16 Jul 2001 18:18:44 -0000 1.2
    +++ tcp_seq.h 3 Apr 2002 22:24:24 -0000 1.3
    @@ -88,8 +88,19 @@
                 (tp)->iss
     #endif
     
    -#define TCP_PAWS_IDLE (24 * 24 * 60 * 60 * hz)
    - /* timestamp wrap-around time */
    +/* clock macros for RFC1323 timestamps */
    +#define TSTMP_UNITS (10) /* in ms (RFC1323 says 1-1000 ms) */
    +#define GETCURTS(ts) \
    +        do { \
    +                struct timeval tv; \
    +                getmicrouptime(&tv); \
    +                (ts) = (u_long)tv.tv_sec * 1000 + tv.tv_usec / 1000; \
    +                (ts) /= TSTMP_UNITS; \
    +        } while (0)
    +#define TSTMPTOTICK(ts) (((int64_t)(ts))*hz*TSTMP_UNITS/1000)
    +
    +#define TCP_PAWS_IDLE (24 * 24 * 60 * 60 * 1000/TSTMP_UNITS)
    + /* timestamp wrap-around time (24 days in 10ms units) */
     
     #ifdef _KERNEL
     extern tcp_cc tcp_ccgen; /* global connection count */
    _______________________________________________
    freebsd-isp@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-isp
    To unsubscribe, send any mail to "freebsd-isp-unsubscribe@freebsd.org"

