Intel EtherExpress 10/100 behavior at 10 Mbps
There's a fix. (Nov 23 '99)
When the Intel 10/100 cards (the fxp device under FreeBSD) are used at 10 Mbps,
the driver transfers packets to the card's buffer space in a very bursty manner.
Instead of constantly transferring packets from the OS queue to the card
as buffer space becomes available, it appears to wait until there is 128K
(or 128 packets' worth, not sure which) of free space on the card, then transfer
packets as fast as possible until the card is full (or above some high-water
mark), and then ignore the queue until those 128 packets are gone.
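As a rough sanity check on that number (assuming roughly 1 KB packets, which is
an assumption rather than a measurement): 128 KB of card buffer is about
1.05 Mbit, and draining that at 10 Mbps takes roughly 105 ms, which is
consistent with the roughly 100 ms (80 ms + 20 ms) cycle described below.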
In many cases you probably don't notice or care that this is how the card
works. However, there are a few situations where it does matter.
Monitoring the queue size:
When you are monitoring the queue size (e.g. in an active queue management
scheme) the queue length information loses its value. That is, the
queue behavior is no longer a good indicator of network conditions.
If you monitor the instantaneous queue length during a period of overload
you will see the software queue cycle between 80 ms periods during which
no packets are dequeued and 20 ms periods during which the queue appears to
be empty. The 20 ms periods are when the device driver
is transferring all arriving packets to the card. The 80 ms periods
are times when the card has something to send and is above its low-water
mark. Obviously these wild fluctuations in queue size do not reflect
the relationship between offered load and capacity at the outbound link.
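To make the inferred refill policy concrete, here is a toy simulation of it.
This is only a sketch: the arrival pattern, the packet counts, and the
"refill only when the card is empty" rule are assumptions inferred from the
measurements, not the driver's actual logic.

#include <stdio.h>

#define CARD_SLOTS 128   /* packets the card can buffer (FXP_NTXCB) */
#define SW_QLIMIT  200   /* software queue limit used in the experiment below */
#define STEPS      2000  /* simulated packet transmission times */

int main(void)
{
    int card = 0;    /* packets currently buffered on the card */
    int swq = 0;     /* packets in the software (OS) queue */
    int drops = 0;   /* packets dropped at the software queue */

    for (int t = 0; t < STEPS; t++) {
        /* Overload: on average 1.2 packets arrive per transmission time. */
        int arrivals = 1 + (t % 5 == 0);
        for (int i = 0; i < arrivals; i++) {
            if (swq < SW_QLIMIT)
                swq++;
            else
                drops++;
        }

        /* The card sends one packet per transmission time if it has any. */
        if (card > 0)
            card--;

        /* Inferred policy: refill only once the card has drained completely. */
        if (card == 0) {
            while (card < CARD_SLOTS && swq > 0) {
                card++;
                swq--;
            }
        }

        if (t % 64 == 0)
            printf("t=%4d  swq=%3d  card=%3d  drops=%d\n",
                   t, swq, card, drops);
    }
    return 0;
}

With these assumptions the software queue settles into exactly the pattern
measured: long stretches pinned at its limit (producing runs of consecutive
drops) punctuated by a sudden fall of 128 packets, from 200 down to 72,
each time the card is refilled.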
Concerns about drop patterns:
The other result of this behavior, and one that is obvious at the end station,
is long chains of consecutive drops. This could result in drops
of consecutive data packets or acks (triggering a time-out), as well as drops of
entire frames. Note that this is, of course, only an issue during overload.
I have replaced daffy's connection to the 139 network with a 3COM 3C905B
Fast Etherlink XL 10/100 card, which does NOT exhibit this behavior.
(There is some concern that these cards aren't quite as fast as the Intel
cards for 100 Mbps connections.)
I have attached mail I sent Kevin and Don as I was figuring this out:
Don and Kevin,
Do you think it's very likely these 10/100 cards have 128K of memory that
sits above the "low-water mark" of the card? I think I can explain the behavior
we are seeing. I think that the card sends an interrupt requesting
more data when it is below the low-water mark; the kernel then flushes the
queue and keeps transferring data to the card for the next 20 ms until
it reaches the high-water mark, at which point the card blocks and the queue
starts filling up again until the card drains its buffer down to the low-water
mark again.
Turning on the signals for enqueue and dequeue actions shows that packets
are ALWAYS being enqueued, but dequeues only happen during the brief period
when we see the queue drain and while the queue depth is at zero (my program
wasn't showing enqueues when the queue was empty, or dequeues that emptied
the queue).
I want to think about whether this means we need to increase the queue
size, given the buffer management policy on the card.
I'm running an experiment with queue length = 200. If I'm right, the
behavior will be bursty but the queue will never fully drain.
Actually, it might even quit being bursty; perhaps it only uses the
low-water mark if there's ever a time when there isn't anything to transfer
to the card. Ah-hah! (I ran the experiment.) My first theory was
right. With a queue size of 200, the queue fills to 200 and then oscillates
between 200 and 72 (128 packets = 128K of buffer space, reloading only when
empty!). Seems like a poor device driver design, or am I missing something?
And then later:
The 3COM 3C905B Fast Etherlink XL 10/100BaseTX is a much better card (or
its device driver is better). (Note: this is a 10/100 card, so using
it doesn't create a need for more routers, etc.) I do still see periods
of about 20 ms where no packets are being dequeued, but nowhere
near as frequently.
Using that card with UDP traffic I maintain a full queue, no spikes.
With TCP traffic I see much more reasonable performance.
So it sounds like we should consider buying 3COM cards instead of Intel
EtherExpress in the future, and consider the impact the Intel cards are having
on our current setup. They are in use in almost every machine now.
However, I think that if you aren't interested in actually monitoring
the queue size it shouldn't be that much of an issue. They still
send as fast as the link will allow, and if you're on an end-station running
TCP your buffering and blocking operations shouldn't be adversely affected,
right?
Fix
This problem is believed to be caused by too much buffer space on the
card. The default is 128 transmit control blocks. Change it by editing the
following value in [your favorite source directory]/pci/if_fxp.c:
/*
 * Number of transmit control blocks. This determines the number
 * of transmit buffers that can be chained in the CB list.
 * This must be a power of two.
 */
#define FXP_NTXCB 128
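For example, a smaller power of two keeps most of the buffering in the software
queue, where it can be monitored and managed. The value below is only an
illustration (32 is an arbitrary smaller power of two, not a tested
recommendation); after editing the file you will need to rebuild and install
a new kernel for the change to take effect.

/*
 * Example change: reduce the card-side transmit buffering. 32 is an
 * arbitrary smaller power of two chosen for illustration, not a tested value.
 */
#define FXP_NTXCB 32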
Thanks to Jan Justensen (justesen@cs.auc.dk) and Jesper Krogh
(jkrogh@cs.auc.dk) for finding this.
Author: Mark Parris