Re: packet drop with intel gigabit / marwell gigabit



[No subject in first one, sorry for repost]

Jin Guojun [VFFS] wrote:

> You are far away from the real world. This has been explained a million
> times, just like I teach intern students every summer :-)

> First of all, DDR400 and a 200 MHz bus mean nothing -- a DDR266 + 500 MHz
> CPU system can outperform a DDR400 + 1.7 GHz CPU system.

Given the same chipset+motherboard, no. DDR400 has more bandwidth and a smaller latency. Given different chipsets/motherboards, this may be true. However, one could also say with accuracy that a 500 MHz processor can outperform the same family running at 1.7 GHz under some conditions, but few people will run to buy 500 MHz over 1.7 GHz for performance alone.

> Another example:
> The Ixxxx 2 CPU was designed with 3 levels of cache. Supposedly
>   Level 1 to level 2 takes 5 cycles
>   Level 2 to level 3 takes 11 cycles
> What would you expect the CPU-to-memory time (in cycles) to be, if CPU to
> level 1 is one cycle? You would expect 17 to 20 cycles in total. But it
> actually takes 210 cycles due to some design issues.
> Now your 1.6 GB/s is reduced to 16 MB/s or even worse just based on this
> factor.

1.6 GB/s = system bus bandwidth. Cache won't affect this bandwidth. DDR400 has 400 MB/s: only attainable for long sequential accesses of either read or write but not a mix of both. DMA should be able to get near this limit (long and sequential, read or write only per transfer). A NIC with bus mastering DMA should be able to effectively use the memory bandwidth.
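
As a rough illustration of the arithmetic behind these figures, here is a minimal sketch. The 64-bit bus width and one transfer per clock are assumptions (the thread gives only the 200 MHz clock and the 1.6 GB/s total), and the function name is made up for this example.

```python
def peak_bandwidth_mb_s(clock_mhz, bus_width_bits, transfers_per_clock=1):
    """Theoretical peak in MB/s, with 1 MB = 10**6 bytes."""
    return clock_mhz * transfers_per_clock * bus_width_bits / 8

# Assumed: 200 MHz bus, 64 bits wide, one transfer per clock.
bus_mb_s = peak_bandwidth_mb_s(200, 64)

# Gigabit Ethernet line rate: 10**9 bits/s, i.e. 125 MB/s.
nic_mb_s = 1000 / 8

print(f"bus peak:              {bus_mb_s:.0f} MB/s")
print(f"NIC line rate:         {nic_mb_s:.0f} MB/s")
print(f"NIC share of bus peak: {nic_mb_s / bus_mb_s:.1%}")
```

Under those assumptions a gigabit NIC running at line rate needs well under a tenth of the bus peak, which is the sense in which the theoretical bandwidth is "much more than required".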

> A number of other factors affect memory bandwidth, such as bus arbitration.
> Have you done any memory benchmark on a system before doing such a simple
> calculation?

No, they are just theoretical values telling you the limits of performance. I assume that a decent implementation can get 75% of the theoretical limit at least some of the time under good conditions (like DMA).
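
For anyone who wants a measured number to set against the theoretical one, a crude copy benchmark could look like the sketch below. It is not a substitute for a proper memory benchmark: it times a large buffer copy from Python, so interpreter and allocator overhead are included. The 1600 MB/s peak is only the bus figure from this thread and should be replaced with the value for the machine under test.

```python
import time

def copy_rate_mb_s(size_bytes=32 * 1024 * 1024, repeats=5):
    """Best observed rate, in MB/s, for copying one large buffer."""
    src = bytearray(size_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst = bytes(src)  # sequential read of src, sequential write of dst
        best = min(best, time.perf_counter() - start)
    return size_bytes / best / 1e6

ASSUMED_PEAK_MB_S = 1600.0  # assumption: the 1.6 GB/s figure from this thread

measured = copy_rate_mb_s()
# Every copied byte is read once and written once, so the memory
# traffic is roughly twice the copy rate.
traffic = 2 * measured
print(f"copy rate:      {measured:.0f} MB/s")
print(f"memory traffic: {traffic:.0f} MB/s")
print(f"fraction of assumed peak: {traffic / ASSUMED_PEAK_MB_S:.0%}")
```

The buffer is chosen to be much larger than typical CPU caches so that the copy mostly exercises main memory.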


> Secondly, DMA moves data from the NIC to an mbuf; then who moves the data
> from the mbuf to the user buffer? Not a human. It is the CPU. While DMA is
> moving data, can the CPU move data simultaneously?
> DMA takes both I/O bandwidth and memory bandwidth. If your system has only
> 16 MB/s of memory bandwidth, your network throughput is less than 8 MB/s,
> typically below 6.4 MB/s.
> If you cannot move data away from the NIC fast enough, what happens?
> Packet loss!

True, but would this type of packet loss even be measured by the OS? Packet loss to the OS means some packets were dropped from the software portion of the network stack, right? This means that the NIC has no problems delivering them to the OS and the OS has problems delivering them to the user process.
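
A small loopback experiment shows this kind of software-side drop. The sketch below sends UDP datagrams to a local socket that is not being read, then counts how many were actually queued. The port choice, buffer size, and datagram counts are arbitrary values picked for the example, and the exact number delivered depends on the OS. On FreeBSD, drops of this kind are counted in the UDP statistics shown by `netstat -s -p udp`.

```python
import socket

def count_delivered(datagrams=2000, payload_size=1024, rcvbuf=8192):
    """Send datagrams to an unread local socket; return (sent, received)."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Ask for a small receive buffer; the OS may round this value.
    rx.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    rx.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    payload = b"x" * payload_size
    sent = 0
    for _ in range(datagrams):
        try:
            tx.sendto(payload, rx.getsockname())
            sent += 1
        except OSError:
            pass  # some systems report a full local buffer to the sender

    # Only now start reading: whatever did not fit in the buffer is gone.
    rx.setblocking(False)
    received = 0
    while True:
        try:
            rx.recv(payload_size)
            received += 1
        except BlockingIOError:
            break

    tx.close()
    rx.close()
    return sent, received

sent, received = count_delivered()
print(f"sent {sent}, received {received}, lost in the stack {sent - received}")
```

Nothing here involves the NIC or DMA, so any loss it shows comes from the socket layer, not from the hardware path.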

You are arguing that the bandwidth is not sufficient for the processor to do this copy out (or page loan out = zero copy, only memory management tricks) and the software has to drop packets from mbufs when more packets arrive for UDP. Enough bandwidth is theoretically available for this (much more than required); it may or may not be true that the actual sustained bandwidth is insufficient. I don't think that 1/4 of the bandwidth is actually available for any reasonable (i.e. not junk) system.
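
The budget being argued over can be written down as a simple division. This is a sketch under stated assumptions: the number of times each received byte crosses the memory bus (one DMA write, plus a read and a write for the CPU copy) is my reading of the argument, not something spelled out in the thread, and the function name is hypothetical.

```python
def max_receive_rate_mb_s(memory_bw_mb_s, crossings_per_byte):
    """Upper bound on receive throughput if memory bandwidth is the only limit."""
    return memory_bw_mb_s / crossings_per_byte

cases = [
    (1, "DMA write only (zero copy)"),
    (2, "DMA write + one more pass"),
    (3, "DMA write + CPU read + CPU write"),
]

# 16 MB/s is the figure from the quoted message; 1200 MB/s is 75% of the
# 1.6 GB/s bus figure discussed earlier in the thread.
for memory_bw in (16.0, 1200.0):
    print(f"memory bandwidth {memory_bw:.0f} MB/s:")
    for crossings, label in cases:
        rate = max_receive_rate_mb_s(memory_bw, crossings)
        print(f"  {label}: at most {rate:.1f} MB/s")
```

With 16 MB/s and two crossings the bound is the 8 MB/s mentioned in the quoted message; with 1200 MB/s even three crossings leave far more than the 125 MB/s a gigabit link can deliver.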


> That is why his CPU utilization was low: there was not much data crossing
> the CPU.
> So, that is why I asked him first what the CPU utilization is, then the
> chipset. These are the basic steps to diagnose network performance.
> If you know the CPU and chipset of a system, you will know the network
> performance ceiling for that system, guaranteed. But that does not
> guarantee you can reach that ceiling, especially over OC-12 (622 Mb/s)
> high-speed networks. That requires intensive tuning knowledge of the
> current TCP stack, which is well explained on the Internet; search for
> "TCP tuning".

In this case, bandwidth should not factor in (16 MB/s is low; disks can regularly double this easily). The 1 Gb/s NIC is not being fully used in this case (< 40 MB/s) and the processor is mostly idle.


_______________________________________________
freebsd-performance@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-performance
To unsubscribe, send any mail to "freebsd-performance-unsubscribe@xxxxxxxxxxx"


