
ECN explained: Navigate congestion for faster, smoother data delivery


Published August 13, 2024


Akshita Agarwal


Fact: No one likes traffic congestion.

That’s why no one pines for the days before Google Maps.

Thanks to navigation apps in our phones and cars, we can see traffic updates that help us avoid busy roads during rush hour and reach our destinations faster.

The same logic applies to content delivered over the Internet.

Congestion on the web happens when data packets flood the network, causing delays and packet loss. Imagine if there were a way to learn which paths are congested, or even to reduce the amount of traffic along a route altogether. By doing so, we could significantly enhance the end-user experience, ensuring faster, smoother, and more reliable data delivery.

Before diving into the solution, let’s first look at the traditional method of managing congestion.

How traditional TCP manages congestion

TCP (Transmission Control Protocol) is like a smart postal service that ensures all your messages get to the right place, in the right order, without any missing or messed up parts. This makes communication over the Internet reliable and accurate.

However, classic TCP has its problems. Ironically, it relies on congestion to detect and control the flow of data.

Here’s a breakdown:

Congestion window size: Classic TCP keeps increasing its congestion window (the amount of unacknowledged data it allows in flight) until it detects congestion.
Detection through packet loss: It infers congestion from packet loss, which means that by the time it reacts, the network is already congested.
Recovery mechanisms: When it detects congestion, TCP cuts its sending rate. It halves the congestion window during fast recovery or, after a retransmission timeout, falls back to slow start.

The issue here is that TCP has to cause congestion before it can detect it. It fills router queues until packets are dropped, then uses that loss to adjust its transmission rate. This works, but a network with low latency, low loss, and scalable throughput (L4S) needs to signal congestion before queues overflow, avoiding the loss altogether.
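To make the problem concrete, here is a toy Python model of loss-based congestion avoidance (additive increase, multiplicative decrease). It assumes a bottleneck that holds `capacity` segments in flight plus a router queue of `buffer` segments. It also assumes that loss happens only when that queue overflows. Both parameters are hypothetical, and slow start is ignored. This is a sketch of the idea, not any real TCP stack.

```python
def aimd_trace(rtts: int, capacity: int, buffer: int, start_cwnd: int = 1) -> list[tuple[int, int, bool]]:
    """Toy loss-based congestion avoidance.

    Returns one (cwnd, queued_segments, loss) tuple per round trip.
    Hypothetical path: `capacity` segments fit in flight, and the bottleneck
    router can queue `buffer` more before it starts dropping (drop-tail).
    """
    cwnd = start_cwnd
    trace = []
    for _ in range(rtts):
        # Anything beyond the path capacity sits in the router queue.
        queued = min(max(0, cwnd - capacity), buffer)
        # Loss only occurs once the queue overflows.
        loss = cwnd > capacity + buffer
        trace.append((cwnd, queued, loss))
        if loss:
            cwnd = max(1, cwnd // 2)  # multiplicative decrease on loss
        else:
            cwnd += 1                 # additive increase, one segment per RTT
    return trace
```

With `capacity=10` and `buffer=5`, the first loss happens only when cwnd reaches 16. By then the router's queue has already been full for a round trip. That standing queue is the extra latency ECN is meant to avoid.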

Enter Explicit Congestion Notification (ECN)

ECN introduces a more sophisticated approach to managing network congestion.

It’s an extension to the Internet Protocol (IP) and to the Transmission Control Protocol (TCP), defined in RFC 3168. ECN lets network devices signal congestion to endpoints without dropping packets. This requires both endpoints and the devices along the path to support it.

Before walking through ECN step by step, let’s look at how it is encoded in the IP and TCP headers.

How ECN works with TCP and IP

ECN with IP

ECN uses the two least-significant bits of the DS field (formerly the Type of Service byte) in the IPv4 header, or of the Traffic Class field in the IPv6 header. These bits can be set in different combinations to represent four distinct code points:

00 – Not ECN-Capable Transport, Not-ECT
01 – ECN-Capable Transport (1), ECT(1)
10 – ECN-Capable Transport (0), ECT(0)
11 – Congestion Experienced, CE

These combinations indicate whether the data is ECN-capable and if congestion is experienced.
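As a small illustration of the code points above, here is a Python sketch that reads and rewrites the ECN bits of an IPv4 TOS/DS byte or an IPv6 Traffic Class byte. The upper six bits are the DSCP and are left untouched. The helper names are our own, chosen for illustration.

```python
from enum import IntEnum


class ECN(IntEnum):
    """The four ECN code points (two low-order bits of the TOS/Traffic Class byte)."""
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


def ecn_of(tos: int) -> ECN:
    """Extract the ECN field from a TOS/DS or Traffic Class byte."""
    return ECN(tos & 0b11)


def dscp_of(tos: int) -> int:
    """The upper six bits are the DSCP, which ECN does not touch."""
    return tos >> 2


def with_ecn(tos: int, ecn: ECN) -> int:
    """Return the byte with its ECN bits replaced and its DSCP preserved."""
    return (tos & 0xFC) | int(ecn)
```

For example, `0xBA` is DSCP 46 (Expedited Forwarding) with ECT(0). A congested router would turn it into `0xBB` (CE) without changing the DSCP.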

ECN with TCP

TCP uses two flags in its header to implement ECN.

The first flag, ECN-Echo (ECE), is set by the receiver to echo a congestion indication back to the sender, telling it to slow down.

The second flag, Congestion Window Reduced (CWR), is set by the sender to confirm that it received the ECE and has reduced its congestion window.
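Both flags live in the same byte as the familiar SYN and ACK flags: byte 13 of the TCP header, with CWR as its most significant bit and ECE next to it. The Python sketch below decodes that byte. The function name is our own, for illustration only.

```python
from enum import IntFlag


class TCPFlags(IntFlag):
    """The eight one-bit flags in byte 13 of the TCP header."""
    FIN = 0x01
    SYN = 0x02
    RST = 0x04
    PSH = 0x08
    ACK = 0x10
    URG = 0x20
    ECE = 0x40  # ECN-Echo: set by the receiver
    CWR = 0x80  # Congestion Window Reduced: set by the sender


def flags_from_header(tcp_header: bytes) -> TCPFlags:
    """Decode the flags from a raw TCP header (at least 20 bytes)."""
    if len(tcp_header) < 20:
        raise ValueError("a TCP header is at least 20 bytes long")
    return TCPFlags(tcp_header[13])
```

An ACK that echoes congestion has flags byte `0x50`, which decodes to `ECE | ACK`.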

ECN step-by-step

Step 1: Negotiation.

The sender and receiver agree to use ECN during the initial TCP handshake. The SYN carries both the ECE and CWR flags, and the SYN-ACK answers with ECE set and CWR clear.

[Figure: Step 1, ECN negotiation during the TCP handshake]

When the sender transmits a data packet, it sets an ECT (ECN-Capable Transport) code point in the IP header. This tells routers on the path that both endpoints support ECN and have agreed to use it. The value can be either ECT(0) or ECT(1), and both indicate that a congested router should mark the packet instead of dropping it. If ECN negotiation fails, the sender uses the Not-ECT code point instead.
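Here is a Python sketch of the negotiation check, assuming the RFC 3168 rules. An ECN-setup SYN has ECE and CWR set, and an ECN-setup SYN-ACK has ECE set and CWR clear. An endpoint that negotiated ECN marks its data packets ECT(0), the code point RFC 3168 recommends when only one is needed. The function names are hypothetical, and real TCP stacks perform this check inside the kernel.

```python
# TCP flag bits (byte 13 of the TCP header) and IP ECN code points.
SYN, ACK, ECE, CWR = 0x02, 0x10, 0x40, 0x80
NOT_ECT, ECT_0 = 0b00, 0b10

_RELEVANT = SYN | ACK | ECE | CWR


def is_ecn_setup_syn(flags: int) -> bool:
    """ECN-setup SYN: SYN, ECE and CWR set; ACK clear."""
    return (flags & _RELEVANT) == (SYN | ECE | CWR)


def is_ecn_setup_syn_ack(flags: int) -> bool:
    """ECN-setup SYN-ACK: SYN, ACK and ECE set; CWR clear."""
    return (flags & _RELEVANT) == (SYN | ACK | ECE)


def data_packet_ecn(syn_flags: int, syn_ack_flags: int) -> int:
    """ECN code point to put on data packets once the handshake completes."""
    negotiated = is_ecn_setup_syn(syn_flags) and is_ecn_setup_syn_ack(syn_ack_flags)
    return ECT_0 if negotiated else NOT_ECT
```

Consider a SYN-ACK that echoes both ECE and CWR. It is not a valid ECN-setup SYN-ACK, so the connection falls back to Not-ECT.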

Step 2: Detecting congestion.

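An Internet router generally manages a set of queues, often one or more per interface, which store packets awaiting transmission on that interface. Traditionally, these queues follow a drop-tail discipline: a packet is added to the queue if the queue has not reached its maximum length (measured in packets or bytes); otherwise, the packet is dropped. An ECN-capable router instead runs active queue management (AQM), which detects congestion as the queue starts to build. At that point the router would normally drop a packet early. If that packet carries an ECT code point, the router marks it instead. Packets are still dropped if the queue actually overflows.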

[Figure: Step 2, a congested router marks the packet instead of dropping it]

To mark the packet, the router changes the ECN field from ECT(0) [10] or ECT(1) [01] to CE [11] and forwards the modified packet toward the receiver. Setting CE is the job of the congested router; the endpoints never set it themselves. The packet therefore arrives at the router with ECT and leaves with CE.
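The Python sketch below models a router queue that marks instead of drops. It uses a hypothetical fixed `mark_threshold` in place of a real AQM's congestion signal. Real algorithms such as RED mark probabilistically, based on an averaged queue length. The class and parameter names are illustrative only.

```python
from collections import deque
from dataclasses import dataclass

NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


@dataclass
class Packet:
    seq: int
    ecn: int  # two-bit ECN field from the IP header


class MarkingQueue:
    """Drop-tail queue plus a simplified AQM threshold (hypothetical parameters).

    Once the queue holds `mark_threshold` packets, ECN-capable packets are
    marked CE and enqueued, while Not-ECT packets are dropped early.
    At `capacity`, every packet is dropped.
    """

    def __init__(self, mark_threshold: int, capacity: int):
        self.mark_threshold = mark_threshold
        self.capacity = capacity
        self.q = deque()

    def enqueue(self, pkt: Packet) -> str:
        if len(self.q) >= self.capacity:
            return "dropped"      # queue overflow: drop even ECN-capable packets
        if len(self.q) >= self.mark_threshold:
            if pkt.ecn == NOT_ECT:
                return "dropped"  # early drop, as without ECN
            pkt.ecn = CE          # ECT(0)/ECT(1) become CE; CE stays CE
            self.q.append(pkt)
            return "marked"
        self.q.append(pkt)
        return "queued"
```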

Step 3: Notifying the sender.

The receiver detects the CE mark in the IP header, which indicates congestion, and must notify the sender. To do this, it sets the ECE (ECN-Echo) flag in the TCP header of its acknowledgements. The echo goes in the TCP header rather than the IP ECN field, because routers on the return path are allowed to rewrite that field.

[Figure: Step 3, the receiver echoes the congestion signal to the sender with ECE]

The ECE flag has two purposes. During the initial handshake (on the SYN and SYN-ACK segments), it signals ECN negotiation. When set on an ordinary acknowledgement (ACK), it tells the sender that a router on the path experienced congestion.

Step 4: Reducing the congestion window.

When the sender receives an ACK with ECE set, it halves its congestion window (cwnd), the same response classic TCP has to a lost packet. The ACK carrying ECE could itself be dropped due to congestion. To guard against this, the receiver keeps setting ECE on subsequent ACKs until the sender confirms that it has reacted.

[Figure: Step 4, the sender reduces its congestion window]

Repeating ECE on every ACK makes it very likely that at least one of them reaches the sender, even if the return path is congested too.
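The receiver side of this behaviour can be sketched as a one-bit latch. The Python sketch below is a simplification: it assumes one ACK per data segment and ignores delayed ACKs. The class name is hypothetical. A CE mark turns ECE on, and a segment carrying CWR turns it off. If the CWR segment itself arrives marked CE, ECE is set again.

```python
from dataclasses import dataclass

ECT_0, CE = 0b10, 0b11  # IP ECN code points
ECE, CWR = 0x40, 0x80   # TCP flag bits


@dataclass
class EcnReceiver:
    """Simplified RFC 3168-style receiver: one ACK per data segment."""
    ece_pending: bool = False

    def on_segment(self, ip_ecn: int, tcp_flags: int) -> int:
        """Process an arriving data segment; return the ECN flag bits for its ACK."""
        if tcp_flags & CWR:
            self.ece_pending = False  # sender has reduced cwnd: stop echoing
        if ip_ecn == CE:
            self.ece_pending = True   # congestion seen: start (or keep) echoing
        return ECE if self.ece_pending else 0
```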

Step 5: Confirming the reduction.

The sender then tells the receiver that it has reduced its congestion window by setting the CWR (Congestion Window Reduced) flag in the TCP header of its next data packet. That packet still carries ECT in its IP header, so if it encounters congestion again, a router can change ECT to CE. Upon receiving the packet with the CWR flag, the receiver stops setting ECE on subsequent ACK packets.

[Figure: Step 5, the sender confirms the reduction with CWR]

The sender halves its congestion window at most once per window of data (roughly once per round trip), not once for every ACK that carries ECE. It checks whether an ECE refers to data sent before its last reduction, and only an ECE for newer data triggers another cut. This mirrors NewReno’s cwnd reduction during the fast recovery phase.
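A minimal Python sketch of the sender's side follows, under simplifying assumptions. Sequence numbers count whole segments rather than bytes, only cwnd is tracked (no ssthresh or pacing), and the class name is hypothetical. The `recover` field records the highest sequence number sent at the moment of the last cut. An ECE only triggers a new reduction when its ACK covers data beyond that point, which gives the "at most once per window" rule.

```python
from dataclasses import dataclass

ECE, CWR = 0x40, 0x80  # TCP flag bits


@dataclass
class EcnSender:
    cwnd: int                  # congestion window, in segments
    snd_nxt: int               # next segment number to send
    recover: int = 0           # snd_nxt at the time of the last reduction
    cwr_pending: bool = False  # CWR still needs to go out on new data

    def on_ack(self, ack: int, flags: int) -> None:
        # React only if this ECE acknowledges data sent after the last cut,
        # i.e. at most once per window of data (roughly once per RTT).
        if flags & ECE and ack > self.recover:
            self.cwnd = max(1, self.cwnd // 2)
            self.recover = self.snd_nxt
            self.cwr_pending = True

    def next_data_flags(self) -> int:
        """Flags for the next new data segment; CWR is sent once per reduction."""
        self.snd_nxt += 1
        if self.cwr_pending:
            self.cwr_pending = False
            return CWR
        return 0
```

Suppose the sender has sent up to segment 100 and receives ECE on ACK 90. It halves cwnd and sets CWR on its next segment. Further ECEs for data up to segment 100 are ignored, while an ECE on ACK 101 triggers a second cut.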

Why you should be implementing ECN

Using ECN offers several key advantages for improving network performance and user experience:

#1 - Improved throughput

ECN enhances application throughput by preventing the inefficiency of discarding data that has already made it across part of the network path.

#2 - Less loss for applications that don't retransmit

Some latency-sensitive applications, like UDP-based VoIP, interactive video, or real-time data, do not retransmit lost packets but can adjust their sending rate in response to congestion. These applications degrade significantly with packet loss and use methods like error correction, data duplication, or codec error concealment to counteract congestion effects. However, these methods add complexity, consume extra network capacity, and increase path latency during congestion.

ECN helps by decoupling congestion control from packet loss, allowing transports to reduce their rate before the application encounters loss. This minimizes the negative impact of loss-mitigation methods, improving user experience.
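For UDP-based applications, opting in is an application decision. The Python sketch below assumes Linux, where an application can ask the kernel to mark its outgoing IPv4 UDP packets ECT(0) through the standard `IP_TOS` socket option. For TCP sockets, Linux manages the ECN bits itself. The function name is hypothetical. Marking packets ECT(0) only makes sense if the application also collects CE feedback and reacts to it. For RTP, RFC 6679 defines such a feedback mechanism.

```python
import socket

ECT_0 = 0b10  # ECN-Capable Transport (0)


def make_ect0_udp_socket(dscp: int = 0) -> socket.socket:
    """Create an IPv4 UDP socket whose outgoing packets carry ECT(0).

    Assumes Linux semantics: for UDP, the kernel copies the whole IP_TOS
    value, including its two ECN bits, into outgoing packets.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Upper six bits: DSCP; lower two bits: ECN field.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, (dscp << 2) | ECT_0)
    return sock
```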

#3 - Decreased likelihood of RTO expiration

Reducing the likelihood of packet loss is crucial for reliable transport in applications that send bursts of segments and then go idle, either due to no more data or network constraints like flow or congestion control.

Standard transport recovery methods, such as Fast Recovery, often fail to handle the loss of the last segment(s) of a burst, known as "tail loss." The receiver, unaware of the missing segments, provides no feedback, so retransmission depends on a transport retransmission timer. This timer expiry results in the loss of network path state, resetting path estimates like RTT and reinitializing the congestion window, which degrades transport performance until it readapts to the path.

When congestion occurs at the end of a burst, an ECN-capable network device can mark the packet(s) with CE instead of dropping them. This prevents a retransmission timeout, reducing application latency and improving throughput for applications that send intermittent bursts and rely on timer-based recovery.
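To see how long timer-based recovery can take relative to the path's round-trip time, here is a Python sketch of the standard retransmission-timeout computation from RFC 6298. It uses alpha = 1/8, beta = 1/4, K = 4, and the recommended 1-second minimum. Some stacks use a lower floor (Linux uses 200 ms), so `min_rto` is a parameter. The function name is our own.

```python
def rto_after_samples(samples: list[float], granularity: float = 0.001,
                      min_rto: float = 1.0) -> float:
    """Retransmission timeout (seconds) per RFC 6298 after the given RTT samples."""
    alpha, beta, k = 1 / 8, 1 / 4, 4
    srtt = rttvar = None
    for r in samples:
        if srtt is None:
            srtt, rttvar = r, r / 2  # first measurement
        else:
            # RTTVAR is updated with the old SRTT, then SRTT is updated.
            rttvar = (1 - beta) * rttvar + beta * abs(srtt - r)
            srtt = (1 - alpha) * srtt + alpha * r
    if srtt is None:
        return 1.0  # initial RTO before any RTT measurement
    return max(min_rto, srtt + max(granularity, k * rttvar))
```

On a steady 50 ms path, the computed RTO is 1 second with the RFC floor, or 200 ms with a Linux-style floor. Either way, a tail loss costs several round trips of idle waiting that a CE mark avoids.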

#4 - Minimized head-of-line blocking

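In a reliable, in-order transport, a lost packet holds back delivery of all the data behind it until the retransmission arrives. This is known as head-of-line blocking. With ECN, packets are marked instead of dropped during incipient congestion, so the application keeps receiving data without that extra reordering delay. When the transport receives a CE-marked packet, it prompts the sender to reduce its transmission rate for future traffic.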
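The effect is easy to simulate. The toy Python model below assumes an in-order receive buffer with segments numbered from 0, and the function name is hypothetical. For each arriving segment, it reports what the receiver can hand to the application. If segment 1 is lost and retransmitted last, segments 2 to 4 sit in the buffer until it arrives. If segment 1 is CE-marked instead, everything is delivered as it arrives.

```python
def in_order_delivery(arrivals: list[int]) -> list[list[int]]:
    """For each arriving segment number, list the segments released to the app.

    Out-of-order segments wait in a buffer until the gap before them is filled.
    """
    expected = 0
    buffered: set[int] = set()
    deliveries = []
    for seg in arrivals:
        if seg >= expected:  # ignore duplicates of already-delivered data
            buffered.add(seg)
        released = []
        while expected in buffered:
            buffered.remove(expected)
            released.append(expected)
            expected += 1
        deliveries.append(released)
    return deliveries
```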

Advancing network diagnostics with Catchpoint’s enhanced traceroute capabilities

As lag-sensitive applications like AR/VR, streaming, gaming, the Metaverse, and autonomous vehicles grow in popularity, the demand for seamless performance increases. To help you deliver a superior network experience to your end users, Catchpoint’s Network Experience solution includes Traceroute with the ability to track ECN settings. A router on the path may bleach ECN by resetting the ECN field to Not-ECT. Routers further along can then no longer mark those packets during congestion and must drop them instead, which can increase latency. As an Internet service provider or a partner to one, you can monitor your infrastructure and track down the routers that bleach ECN. This also lets you test your L4S readiness with ECN traceroute. Additionally, to support the broader community, we’ve made this capability open source and available on GitHub.

What makes Catchpoint’s Traceroute ECN different?

While many network monitoring solutions cover similar ground, they often have limitations, especially on the wireless last mile. Many lack comprehensive ECN observability and can miss critical ECN-related insights. Without complete ECN data, troubleshooting is like searching for a needle in a haystack. Network engineers are left blindfolded when congestion occurs, without the information they need to manage it effectively.

Catchpoint enables you to perform traceroute tests to identify the path a probe takes from source to destination. This monitor type offers hop-by-hop insights, including latency, packet loss percentage, IP addresses, hostnames (if available), ASNs, and countries. When ECN is enabled, our traceroute tests also track the ECN status at each hop. Assuming both the source and destination are ECN-enabled, the tests show the hops at which probes still carry ECT(0) or ECT(1). More importantly, you can detect hops that do not support ECN and reset the field to Not-ECT.
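One general way a traceroute can see the ECN field at each hop relies on the ICMPv4 Time Exceeded reply. That reply quotes the IP header of the probe as the hop received it, and the TOS/DS byte is the second byte of the quoted header. The Python sketch below only illustrates that idea. It is not Catchpoint's implementation, and both function names are hypothetical. The returned TTL is the first hop at which the probe arrived bleached, so the device that cleared the field sits between that hop and the previous one.

```python
from typing import Optional

NOT_ECT = 0b00


def quoted_ecn(icmp_time_exceeded: bytes) -> int:
    """ECN bits of the probe as received by the hop that sent this ICMP error.

    ICMPv4 Time Exceeded (type 11): 8-byte ICMP header, then the original
    IPv4 header, whose second byte is the TOS/DS byte.
    """
    if len(icmp_time_exceeded) < 8 + 20 or icmp_time_exceeded[0] != 11:
        raise ValueError("not an ICMPv4 Time Exceeded message with a quoted IPv4 header")
    return icmp_time_exceeded[8 + 1] & 0b11


def first_bleaching_hop(sent_ecn: int, per_hop_ecn: list[Optional[int]]) -> Optional[int]:
    """TTL of the first hop whose quoted ECN field is Not-ECT.

    `per_hop_ecn[i]` is the quoted ECN value for TTL i + 1, or None if that
    hop did not reply. Returns None if the probe was never ECN-capable or
    was never seen bleached.
    """
    if sent_ecn == NOT_ECT:
        return None
    for ttl, ecn in enumerate(per_hop_ecn, start=1):
        if ecn == NOT_ECT:
            return ttl
    return None
```

A CE value at some hop is not bleaching. It only means a router before that hop was congested, and the probe is still ECN-capable.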

The example below shows a single traceroute run and the exact hop at which ECN was bleached (the ECN field was reset to Not-ECT).

[Screenshot: a single traceroute run showing the hop where ECN was bleached]

In addition to looking at individual runs, Catchpoint enables you to analyze trends over time and group data using various filters and breakdowns.

In the example below, over a sample period of one hour, about 32% of the traceroute requests have bleached ECN paths.

[Graph: share of traceroute runs with bleached ECN paths over one hour]

In the example below, traceroute trends are recorded for scenarios with and without ECN support. The overall ping round-trip time is higher when ECN is not supported. Without ECN support, congestion leads to packet drops and retransmissions, potentially increasing latency. With ECN in place, packets are marked CE rather than being dropped, so senders can reduce their congestion window without packet loss. Meanwhile, traceroute shows where along the path ECN is preserved or bleached.

[Graph: ping round-trip time with and without end-to-end ECN support]

In this dataset, when ECN is supported end to end, round trips are about 47% faster overall.

Optimizing network performance: The impact of ECN and enhanced visibility on real-time applications

In an era where low latency and high reliability are crucial for applications like AR/VR, streaming, gaming, and autonomous vehicles, managing network congestion efficiently is more important than ever. ECN offers a sophisticated way to handle congestion by signaling it without dropping packets, thereby improving throughput, reducing latency, and enhancing overall user experience. Catchpoint’s Network Experience solution takes this a step further by providing full visibility into traffic routing decisions around network congestion. With advanced traceroute capabilities, network administrators can identify and address congestion points, ensuring smoother and faster data delivery. By leveraging these tools, organizations can stay ahead in delivering exceptional real-time applications and maintaining a superior network experience.

Read our report on global ECN bleaching rates and how they affect your network.