Intel EtherExpress 10/100 behavior at 10 Mbps
There's a fix. (Nov 23 '99)
When the Intel 10/100 cards (the fxp device under FreeBSD) are used at 10 Mbps,
the driver transfers packets to the card's buffer space in a very bursty manner.
Instead of constantly transferring packets from the OS queue to the card
as buffer space becomes available, it appears to wait until there is 128K
(or 128 packets' worth, not sure which) of free space on the card, then transfer
packets as fast as possible until the card is full (or above some high-water
mark), and then ignore the queue until those 128 packets are gone.
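As a rough sanity check on that number (assuming roughly 1 KB packets, which is
an assumption rather than a measurement): 128 KB of card buffer is about
1.05 Mbit, and draining that at 10 Mbps takes roughly 105 ms, which is
consistent with the roughly 100 ms (80 ms + 20 ms) cycle described below.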
In many cases you probably don't notice or care that this is how the card
works. However, there are a few situations where it does matter.
Monitoring the queue size:
When you are monitoring the queue size (e.g. in an active queue management
scheme) the queue length information loses its value. That is, the
queue behavior is no longer a good indicator of network conditions.
If you monitor the instantaneous queue length during a period of overload
you will see the software queue cycle between 80 ms periods during which
no packets are dequeued and 20 ms periods during which the queue appears to
be empty. The 20 ms periods are when the device driver
is transferring all arriving packets to the card. The 80 ms periods
are times when the card has something to send and is above its low-water
mark. Obviously these wild fluctuations in queue size do not reflect
the relationship between offered load and capacity at the outbound link.
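To make the inferred refill policy concrete, here is a toy simulation of it.
This is only a sketch: the arrival pattern, the packet counts, and the
"refill only when the card is empty" rule are assumptions inferred from the
measurements, not the driver's actual logic.

#include <stdio.h>

#define CARD_SLOTS 128   /* packets the card can buffer (FXP_NTXCB) */
#define SW_QLIMIT  200   /* software queue limit used in the experiment below */
#define STEPS      2000  /* simulated packet transmission times */

int main(void)
{
    int card = 0;    /* packets currently buffered on the card */
    int swq = 0;     /* packets in the software (OS) queue */
    int drops = 0;   /* packets dropped at the software queue */

    for (int t = 0; t < STEPS; t++) {
        /* Overload: on average 1.2 packets arrive per transmission time. */
        int arrivals = 1 + (t % 5 == 0);
        for (int i = 0; i < arrivals; i++) {
            if (swq < SW_QLIMIT)
                swq++;
            else
                drops++;
        }

        /* The card sends one packet per transmission time if it has any. */
        if (card > 0)
            card--;

        /* Inferred policy: refill only once the card has drained completely. */
        if (card == 0) {
            while (card < CARD_SLOTS && swq > 0) {
                card++;
                swq--;
            }
        }

        if (t % 64 == 0)
            printf("t=%4d  swq=%3d  card=%3d  drops=%d\n",
                   t, swq, card, drops);
    }
    return 0;
}

With these assumptions the software queue settles into exactly the pattern
measured: long stretches pinned at its limit (producing runs of consecutive
drops) punctuated by a sudden fall of 128 packets, from 200 down to 72,
each time the card is refilled.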
Concerns about drop patterns:
The other result of this behavior, and one that is obvious at the end station,
is long chains of consecutive drops. This could result in drops
of consecutive data packets or acks (triggering a time-out), as well as drops of
entire frames. Note that this is, of course, only an issue during overload.
I have replaced daffy's connection to the 139 network with a 3COM 3C905B
Fast Etherlink XL 10/100 card, which does NOT exhibit this behavior.
(There is some concern that these cards aren't quite as fast as the Intel
cards for 100 Mbps connections.)
I have attached mail I sent Kevin and Don as I was figuring this out:
Don and Kevin,
Do you think it's very likely these 10/100 cards have 128K of memory that
sits above the "low-water mark" of the card? I think I can explain the behavior
we are seeing. I think that the card sends an interrupt requesting
more data when it is below the low-water mark; the kernel then flushes the
queue and keeps transferring data to the card for the next 20 ms until
it reaches the high-water mark, at which point the card blocks and the queue
starts filling up again until the card drains its buffer down to the low-water
mark again.
Turning on the signals for enqueue and dequeue actions shows that packets
are ALWAYS being enqueued, but dequeues only happen during the brief period
when we see the queue drain and while the queue depth is at zero (my program
wasn't showing enqueues when the queue was empty, or dequeues that emptied
the queue).
I want to think about whether this means we need to increase the queue
size, given the buffer management policy on the card.
I'm running an experiment with queue length = 200. If I'm right, the
behavior will be bursty but the queue will never fully drain.
Actually, it might even quit being bursty; perhaps it only uses the
low-water mark if there's ever a time when there isn't anything to transfer
to the card. Ah-hah! (I ran the experiment.) My first theory was
right. With a queue size of 200, the queue fills to 200 and then oscillates
between 200 and 72 (128 packets = 128K of buffer space, reloading only when
empty!). Seems like a poor device driver design, or am I missing something?
And then later:
The 3COM 3C905B Fast Etherlink XL 10/100BaseTX is a much better card (or
its device driver is better). (Note: this is a 10/100 card, so using
it doesn't create a need for more routers, etc.) I do still see periods
of about 20 ms where no packets are being dequeued, but nowhere
near as frequently.
Using that card with UDP traffic I maintain a full queue, no spikes.
With TCP traffic I see much more reasonable performance.
So it sounds like we should consider buying 3COM cards instead of Intel
EtherExpress in the future, and consider the impact the Intel cards are having
on our current setup. They are in use in almost every machine now.
However, I think that if you aren't interested in actually monitoring
the queue size it shouldn't be that much of an issue. They still
send as fast as the link will allow, and if you're on an end-station running
TCP your buffering and blocking operations shouldn't be adversely affected,
right?
Fix
This problem is believed to be caused by too much buffer space on the
card. The default is 128 transmit control blocks. Change it by editing the
following value in [your favorite source directory]/pci/if_fxp.c:
/*
 * Number of transmit control blocks. This determines the number
 * of transmit buffers that can be chained in the CB list.
 * This must be a power of two.
 */
#define FXP_NTXCB 128
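For example, a smaller power of two keeps most of the buffering in the software
queue, where it can be monitored and managed. The value below is only an
illustration (32 is an arbitrary smaller power of two, not a tested
recommendation); after editing the file you will need to rebuild and install
a new kernel for the change to take effect.

/*
 * Example change: reduce the card-side transmit buffering. 32 is an
 * arbitrary smaller power of two chosen for illustration, not a tested value.
 */
#define FXP_NTXCB 32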
Thanks to Jan Justensen (justesen@cs.auc.dk) and Jesper Krogh
(jkrogh@cs.auc.dk) for finding this.
Author: Mark Parris