1.0 Network Traffic Control
Computer network traffic is the data moving across the network at a given time. Network bandwidth is the maximum amount of data that can be transmitted over the network per unit time. Computer networks face the possibility of network congestion, where more data is trying to pass through the network than it can carry. The result can be delays in packet transmission, packet loss, blocking of new connections and, in general, a degraded quality of service. Network traffic control is the process of regulating network traffic so as to reduce network congestion, latency and packet loss.
The term Quality of Service (QoS) generally means overall performance of a service as perceived by the users of the service. Certain predefined parameters are measured in real time to arrive at a quantitative measure of the quality of service. In computer networking, quality of service refers to traffic prioritization and resource reservation rather than the overall performance quality achieved for the service. Quality of service in packet switched networks involves providing different priorities to users or data flows and ensuring a certain level of performance for a data flow. QoS can be implemented using the tc command in Linux.
2.0 Queues
The incoming and outgoing packets are queued before they are received or transmitted, respectively. The queue for incoming packets is known as the ingress queue; the queue for outgoing packets is called the egress queue. We have more control over the egress queue, as it holds packets that our host is about to send. We can reorder these packets in the queue, effectively favoring some packets over the rest. The ip -s link command gives the queue capacity (qlen) in number of packets. If the queue is full and more packets arrive, they are discarded and not transmitted. The ingress queue holds packets that have been sent to us by other hosts. We cannot reorder them; the only thing we can do is drop some packets. For TCP, a dropped packet is never acknowledged, which the sending host interprets as a sign of congestion, so it slows down its transmission to us. This does not work for UDP: a dropped UDP packet is simply lost, as UDP itself has no ACK or retransmission.
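As a small illustration of reading the queue capacity, the sketch below pulls the qlen value out of a line in the format printed by ip link. The sample line and the helper name qlen_of are assumptions made for this example; they are not taken from a real system.

```shell
# Sample line in the format printed by `ip link show` (values are illustrative).
sample='2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000'

# qlen_of (hypothetical helper): print the value that follows the "qlen" keyword.
qlen_of() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "qlen") print $(i + 1) }'
}

printf '%s\n' "$sample" | qlen_of
```

On a live system, the same helper could be fed from ip link show dev eth0 instead of the sample line.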
3.0 Traffic Control Elements
3.1 Shaping
Shaping involves delaying the transmission of packets to meet a certain data rate. This is the way we ensure that the output data rate does not exceed the desired value. Shapers can also smooth out the bursts in traffic. Shaping is done at egress.
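A minimal sketch of shaping with the Token Bucket Filter (tbf) qdisc follows. The interface name eth0 and the rate, burst and latency values are assumptions chosen for illustration, and the commands need root privileges.

```shell
# Assumed values: interface eth0, 1 Mbit/s rate, 32 kbit burst, 400 ms maximum wait in the queue.
sudo tc qdisc add dev eth0 root tbf rate 1mbit burst 32kbit latency 400ms

# Inspect the shaper and its counters.
tc -s qdisc show dev eth0

# Remove the shaper again.
sudo tc qdisc del dev eth0 root
```

Packets that would exceed the configured rate wait in the tbf queue, which is what smooths out bursts; packets that would have to wait longer than the latency value are dropped.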
3.2 Scheduling
Scheduling is deciding which packet would be transmitted next. This is done by rearranging the packets in the queue. The objectives are to provide a quick response for interactive applications and also to provide adequate bandwidth for bulk transfers like downloads initiated by remote hosts. Scheduling is done at egress.
3.3 Policing
Policing is measuring the rate of packets received on an interface and limiting it to a particular value. Packets that exceed the limit might be reclassified or dropped. Policing is done at ingress.
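Policing at ingress might look like the following sketch, which attaches the ingress qdisc and a policing filter. The interface name eth0 and the 1 Mbit/s limit are assumptions for illustration, and the commands need root privileges.

```shell
# Attach the special ingress qdisc (handle ffff:) to eth0.
sudo tc qdisc add dev eth0 handle ffff: ingress

# Police all IPv4 traffic arriving on eth0 to 1 Mbit/s; packets over the limit are dropped.
sudo tc filter add dev eth0 parent ffff: protocol ip prio 50 u32 \
    match ip src 0.0.0.0/0 police rate 1mbit burst 10k drop flowid :1

# Remove the ingress qdisc, and its filters with it.
sudo tc qdisc del dev eth0 ingress
```

Unlike a shaper, the policer does not queue or delay anything; it only decides whether each arriving packet is within the limit.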
3.4 Dropping
When the traffic exceeds a predefined value, the excess packets are simply dropped. This can be done at both ingress and egress.
4.0 Traffic Control Components
4.1 qdiscs
qdisc is short for queueing discipline. A qdisc is the packet scheduling code that is attached to a network interface. qdiscs are implemented as modules, which are inserted in the kernel at run time. A qdisc can drop, forward, queue, delay or reorder packets at a network interface. tc is a user-space program for managing qdiscs for network interfaces. Other terms used for a qdisc are packet scheduler, queuing algorithm and packet scheduler algorithm.
The kernel sends (enqueues) packets bound for a network interface to that interface's qdisc. Similarly, the kernel takes (dequeues) packets from the qdisc for transmission on the network interface.
qdiscs are of two types: classful qdiscs, which contain classes, and classless qdiscs, which don't.
4.2 Classes
A Class is a sub-qdisc. A class may contain another class. Using classes, we can configure the QoS in more detail. When packets are received by a qdisc, these may be queued in inner qdiscs in classes. When the kernel wants the packets for transmission, the packets of certain classes might be given ahead of others, thereby prioritizing certain types of traffic.
4.3 Filters
When a qdisc with classes receives a packet, it needs to decide in which class the packet has to be enqueued. In other words, the packet needs to be classified. Filters are used for classification of packets. A filter must contain a classifier phrase. The most common classifier is the u32 classifier, which filters use for selecting packets based on packet attributes.
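To make the interplay of classful qdisc, classes and filter concrete, here is a small sketch using the prio qdisc, which creates the classes 1:1, 1:2 and 1:3 on its own. The interface name eth0 and the choice of port 22 are assumptions for illustration, and the commands need root privileges.

```shell
# Classful prio qdisc as root; it automatically provides classes 1:1, 1:2 and 1:3.
sudo tc qdisc add dev eth0 root handle 1: prio

# u32 filter: send IPv4 packets with destination port 22 to the highest-priority class 1:1.
sudo tc filter add dev eth0 protocol ip parent 1: prio 1 u32 \
    match ip dport 22 0xffff flowid 1:1

# List the filters attached to the root qdisc.
tc filter show dev eth0
```

Packets that no filter matches are still placed in a class by the prio qdisc, which falls back to its priomap.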
5.0 Using the tc command
The tc command has sub-commands to add, change, replace and delete qdiscs, classes and filters. There is also the show sub-command, which gives details of the currently configured objects. For example, on a desktop running Linux, the tc -s qdisc show command gives:
$ tc -s qdisc show dev eth0
qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 8728071 bytes 59911 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
pfifo_fast is the traditional default qdisc for interfaces in Linux. It is a FIFO qdisc with prioritization. It has three bands: FIFO 0, FIFO 1 and FIFO 2. Band 0 is for traffic from interactive applications, where we wish to minimize the delay. Band 1 is for best-effort traffic, which is the normal service. Band 2 is for bulk data transfers, where the goal is to maximize the throughput and minimize the monetary cost. Packets are put in one of the three bands based on the value of the ToS field in the packet. First, all the packets in FIFO 0 are transmitted. When there are no packets left in FIFO 0, packets in FIFO 1 are transmitted. Only when both are empty are packets in FIFO 2 transmitted.
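The priomap in the tc output above can be read as a lookup table: the position is the packet priority (0 to 15), which the kernel derives from the ToS field, and the value is the band. The sketch below only illustrates that lookup; the helper name band_for_priority is made up for this example, and the priority numbers in the comments follow the usual Linux convention (0 best effort, 2 bulk, 6 interactive).

```shell
# priomap copied from the tc output above: position = priority (0-15), value = band.
priomap="1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1"

# band_for_priority (hypothetical helper): print the band for a priority between 0 and 15.
band_for_priority() {
    _prio=$1
    # Split priomap into the positional parameters $1..$16.
    set -- $priomap
    # Skip the entries that come before the requested priority.
    shift "$_prio"
    echo "$1"
}

band_for_priority 0   # best effort
band_for_priority 6   # interactive
band_for_priority 2   # bulk
```

With this priomap, best-effort traffic lands in band 1, interactive traffic in band 0 and bulk traffic in band 2, matching the description of the three bands.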
6.0 Bufferbloat
Network latency is the time taken by a packet to travel from one end of the connection to the other. A typical TCP connection between a sender and a receiver passes through many devices and has many links of varying bandwidth. There are buffers at each processing unit in the communication path so that arriving packets can be stored while the communication link is being used for transmission. Buffers are a necessary part of the communication pipe and help in making effective use of the communication link. But as network devices have got more and more RAM, and because of the misguided objective of preventing packet loss to the maximum extent possible, buffer sizes have increased to high values. The result is that the communication pipe has unreasonably big buffers at intermediate devices. These buffers fill up and obstruct the normal functioning of TCP. TCP relies on end-to-end signalling of packet loss, but because of the bloated buffers, this information gets delayed. Also, since the buffers along the communication path are full, packets for high-priority as well as normal traffic can't get through. This results in very high network latency. As it is caused by big buffers filled with in-flight network packets, it is called bufferbloat.
The solution is Active Queue Management (AQM) at hosts and routers. This involves managing the queue in buffers so that the packet queue is kept within reasonable limits, and signalling the sending TCP to slow down, by dropping or marking packets, in case the queue grows fast. The tc -s qdisc show command on a router running OpenWRT gives the following output:
# tc -s qdisc show dev eth0
qdisc fq_codel 0: root refcnt 2 limit 1024p flows 1024 quantum 300 target 5.0ms interval 100.0ms ecn
 Sent 395154953 bytes 464130 pkt (dropped 0, overlimits 0 requeues 4)
 backlog 0b 0p requeues 4
  maxpacket 1414 drop_overlimit 0 new_flow_count 2 ecn_mark 0
  new_flows_len 0 old_flows_len 0
Here the FQ_CoDel qdisc is being used. FQ_CoDel stands for Fair Queuing (FQ) with Controlled Delay (CoDel), an Active Queue Management scheme. FQ_CoDel maintains fairness by having a number of FIFO queues and using a hash function to dispatch incoming packets to one of the queues, so that each flow tends to get its own queue. Each of these queues is managed by the CoDel algorithm. CoDel tries to keep the queueing delay of packets near a target value, 5 ms by default. It checks how long the packet at the head of each queue has been waiting and, if the delay has stayed above the target for longer than the interval (100 ms by default), drops or ECN-marks packets. With the delay kept under control, the TCP congestion control mechanisms are able to do their work. We can set the FQ_CoDel qdisc for the eth0 interface on the earlier desktop system:
$ sudo tc qdisc add dev eth0 root fq_codel
$ tc -s qdisc show dev eth0
qdisc fq_codel 8001: root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
 Sent 1806 bytes 20 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 256 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
The tc qdisc del command deletes the current qdisc and restores the default pfifo_fast qdisc.
$ sudo tc qdisc del dev eth0 root
$ tc -s qdisc show dev eth0
qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 528 bytes 8 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
7.0 Traffic Control with tc command
7.1 Handle
All qdiscs and classes have an individual id of the format m:n, where m is the major number and n is the minor number. Both m and n are written in hexadecimal and are limited to 16 bits. The id is used as the handle in the tc command. For a qdisc, the minor number is 0. For a class, the major number is that of the qdisc that the class belongs to. So if a handle's minor number is 0, it is the id of a qdisc. Otherwise, it is the id of a class whose qdisc is identified by its major number. By convention, the root qdisc is given the handle 1:0, which can also be written as 1:. The handle ffff:0 is reserved for the ingress qdisc.
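The rule for telling a qdisc handle from a class id can be sketched as a small shell function. The name handle_kind is invented for this example, and the function is only an illustration of the m:n convention, not part of tc.

```shell
# handle_kind (hypothetical helper): say whether a handle m:n names a qdisc or a class.
# Both parts are hexadecimal; an omitted minor number, as in "1:", counts as 0.
handle_kind() {
    major=${1%%:*}
    minor=${1#*:}
    [ -n "$minor" ] || minor=0
    if [ "$((0x$minor))" -eq 0 ]; then
        echo "qdisc $major:"
    else
        echo "class $major:$minor of qdisc $major:"
    fi
}

handle_kind 1:0
handle_kind 1:2
handle_kind ffff:
```

The three calls cover a qdisc written with an explicit minor number, a class, and the ingress qdisc written in the short form.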
7.2 Root qdisc
Each network interface has an egress root qdisc, conventionally given the handle 1:0 when we configure it ourselves. As the name suggests, it is the root of the tree of qdiscs. Sub-qdiscs are known as classes. So, at the top of the tree, we have the root qdisc. The other nodes are classes. The kernel interacts only with the root: it enqueues packets to the root and dequeues packets from the root. The packets may get classified to one of the classes further down. The classification is done by filters attached to a classful qdisc. Traffic control on the egress of an interface boils down, in effect, to building up (or down, as trees grow down here) this tree.
8.0 Example
Suppose we wish to reduce the bandwidth for wireless users in general and reduce it further for a particular user, identified by the IP address. We can run the following commands on the router.
# tc qdisc add dev wlan0 root handle 1:0 hfsc default 1
# tc class add dev wlan0 parent 1:0 classid 1:1 hfsc sc rate 1mbit ul rate 1mbit
# tc class add dev wlan0 parent 1:0 classid 1:2 hfsc sc rate 400kbit ul rate 400kbit
# tc filter add dev wlan0 protocol all parent 1: u32 match ip dst 192.168.2.157 flowid 1:2
We added the HFSC qdisc as the root qdisc of the wireless interface and set its default class to 1:1. Any packet that is not classified is sent to class 1:1. By default, HFSC drops all packets that are not classified, and that is the reason for the default class. We set a bandwidth limit of 1 Mbit/s for the default class, which, in effect, becomes the default bandwidth for wireless; sc rate sets the service curve of the class and ul rate its upper limit. Then we make one more class, 1:2, and set its bandwidth to 400 kbit/s. Finally, we attach a filter to the root qdisc that matches the destination IP address for which we want to reduce bandwidth and sends those packets to flow id 1:2, which is the id of the relevant class.
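To check that the setup behaves as intended, the per-class counters and the filter can be inspected with the show sub-commands. The sketch below assumes the wlan0 configuration from the example is in place and that the commands are run as root on the router.

```shell
# Per-class statistics: bytes and packets sent through classes 1:1 and 1:2.
tc -s class show dev wlan0

# The u32 filter attached to the root qdisc.
tc filter show dev wlan0

# Undo the whole setup; deleting the root qdisc also removes its classes and filters.
tc qdisc del dev wlan0 root
```

If the counters of class 1:2 grow while the selected user is downloading, the filter is matching that user's traffic.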