Fragmentation Revisited: For What It's Worth

Boeing Research & Technology

P.O. Box 3707 Seattle WA 98124 USA fltemplin@acm.org

Internet-Draft

Internet Protocol (IP) fragmentation and reassembly have served as core elements of the architecture from the very earliest days, but they have been subject to negative publicity by studies that have declared them "harmful" and "fragile". These warning labels have resonated deeply within the community in a way that fosters the enemies of sound engineering: fear, uncertainty and doubt. This document revisits IP fragmentation and shows that a properly engineered alternative solution is both practical and necessary to provide a robust service for the future of Internetworking.

Internet Protocol (IP) fragmentation and reassembly have served as core elements of the architecture from the very earliest days, but they have been subject to negative publicity by studies that have declared them "harmful" and "fragile". This is true for both IPv4 and IPv6, which differ only in the length of the IP Identification field. Beginning in the late 1980s, "Fragmentation Considered Harmful" inspired an investigation into path Maximum Transmission Unit (MTU) discovery that has only recently reached successful conclusions. Still more recently, "IP Fragmentation Considered Fragile" documented enough protocol and operational concerns to merit BCP status. However, these studies failed to observe that the shortcomings identified can be addressed by sound engineering solutions in support of a more robust service. The warning labels have instead inspired myths, folklore and urban legends that have caused deeply-embedded misgivings to carry forward from generation to generation in the Internet engineering community. This document examines IP fragmentation and reassembly within the context of the larger multi-layer Internetworking architecture for transferring data between peer end systems. A systematic examination, rather than an exclusive focus on the IP layer services viewed in isolation, is needed to set the appropriate context. The document concludes that an improved IP fragmentation and reassembly service is both practical and necessary while citing a specification that offers a robust alternative solution.

Consider a common architecture in which an Original Source host connects to End User Network (EUN) A and a Final Destination host connects to EUN B. An Encapsulation Source for EUN A in turn connects to an Encapsulation Destination for EUN B via a virtual link spanning any intermediate Internetworks.

Applications on the Original Source prepare user data buffers for presentation to lower layers. In environments where delay or disruption may be significant, user data buffers may be based on "bundles" according to the Delay Tolerant Networking (DTN) Bundle Protocol (BP). The user data buffers (or bundles) may range in size from very small for interactive communications such as Internet telephony to very large for transactional services such as large file transfer. Original Source applications present user data to transport layer protocols such as TCP, QUIC and others, which divide them into segments. Traditional wisdom suggests that each segment should be no larger than the path MTU and that each IP packet should carry exactly one segment. But a more robust IP fragmentation and reassembly service would permit segment sizes to exceed the path MTU, and IP parcels would permit a single IP packet to carry multiple segments. The resulting services can support improved performance profiles for common operational environments. Following segmentation, the transport layer protocol may present either a single segment or a parcel of multiple segments to the network layer for packetization. If each packet will contain a single segment, the network layer applies traditional packetization procedures while engaging a fragmentation-like service known as Generic Segment Offload (GSO) to packetize multi-segment parcels if necessary. If each packet will contain multiple segments, the network layer instead prepares the packet with a multi-segment transport layer parcel to form an IP parcel. The network layer then applies IP fragmentation if necessary and forwards the IP packets or fragments into EUN A. Standard IP forwarding within the local EUN will then deliver the packets/fragments either directly to the Final Destination or to an Encapsulation Source when the Final Destination resides in a remote EUN.
Unlike the Original Source, the Encapsulation Source operates only at the network layer and below. The Encapsulation Source encapsulates any original packets or fragments bound for a remote destination, then forwards the resulting packets to an appropriate Encapsulation Destination as an adaptation layer service. As a source, the Encapsulation Source can apply adaptation layer IP fragmentation following encapsulation to ensure that the resulting fragments will not be lost due to a size restriction in an intermediate Internetwork. The Encapsulation Destination is then responsible for reassembling these fragments. After the Encapsulation Destination reassembles at the adaptation layer, it decapsulates to obtain the original IP packet or fragments, which it then forwards to the Final Destination. The Final Destination then reassembles at the network layer if necessary using IP reassembly and/or Generic Receive Offload (GRO) according to the manner in which the Original Source applied fragmentation. The Final Destination then delivers the resulting segment or parcel to the transport layer, which delivers the resulting user data to the application layer.

To efficiently engage fragmentation, each source requires a means to determine the per-flow path MTU. From the above architecture, the contributing elements include: 1) the path MTU from the Original Source to the Encapsulation Source, 2) the path MTU from the Encapsulation Source to the Encapsulation Destination, and 3) the path MTU from the Encapsulation Destination to the Final Destination. In common use cases, however, EUNs often comprise limited domains with robust link MTUs while intervening Internetworks may be arbitrarily complex including heterogeneous links with widely varying MTUs possibly as small as the minimum IP link MTU. More specifically, the EUN path MTUs are visible at the network layer while the path MTU between the encapsulation endpoints is visible at the adaptation layer and presents the appearance of a single virtual link to the network layer. This suggests a multi-layer path MTU probing discipline is required, where both the Original Source probes the Final Destination at the network layer and the Encapsulation Source probes the Encapsulation Destination at the adaptation layer. The former allows the original source to determine the largest original IP packet or fragment size that can traverse the entire path to the destination while the latter allows the encapsulation source to determine the maximum adaptation layer fragment size for the encapsulation destination. In common deployments, the Original Source is often positioned close to or possibly even co-resident on the same physical platform as the Encapsulation Source. The same is true of the Encapsulation Destination and Final Destination. When the Original Source and Encapsulation Source are co-resident, multiple layers of fragmentation may be needed for the same original packet before the resulting fragment packets are transmitted over the physical or virtual data link media. 
When the Encapsulation Destination and Final Destination are co-resident, multiple layers of reassembly may be needed before the reassembled segments or parcels are delivered to upper layers. This may require the operating system to perform expensive buffer linearization following initial stages of reassembly before presenting them to additional stages. The original IP and encapsulation services could alternatively be located in separate virtual machines on the same physical platform so that each virtual machine engages only a single layer of fragmentation or reassembly. While the Encapsulation Source is fragmenting at the adaptation layer, it should probe the forward path to the Encapsulation Destination to determine the largest fragment size that can traverse the intermediate Internetworks for each flow. At the same time, the Encapsulation Source can represent an unbounded MTU to the network layer in order to accommodate all original IP packets up to 65535 octets with fragmentation-assured delivery while allowing larger original IP packets to proceed without fragmentation based on best-effort delivery. Alternatively, the Encapsulation Source could refrain from performing fragmentation and adaptation layer path MTU probing while forwarding all encapsulated packets to the Encapsulation Destination based on best-effort delivery regardless of their size. This means that the network layer path MTU probing between the Original Source and Final Destination would also have the effect of probing the path between the Encapsulation Source and Destination over the intermediate Internetworks. The drawback of this approach is that the link MTUs in the intermediate Internetworks are often beyond the control of the source and destination endpoints and will often configure MTU sizes that are significantly smaller than those in the EUNs. Such an arrangement will often fail to benefit from the larger native MTU sizes of the EUNs.
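
The interaction between the two probing layers described above can be illustrated with a small sketch. It is not taken from any specification: the function names, the 48-octet encapsulation overhead (an IPv6 header plus a UDP header) and the example MTU values are assumptions chosen only for illustration. The sketch treats the virtual link as assuring delivery of packets up to 65535 octets when the Encapsulation Source fragments at the adaptation layer, and as exposing the intermediate Internetwork path MTU minus the encapsulation overhead otherwise.

```python
# Illustrative sketch only; names and numeric values are assumptions.

ASSURED_VIRTUAL_LINK_MTU = 65535  # fragmentation-assured size cited in the text


def virtual_link_mtu(internetwork_path_mtu, encap_overhead, adaptation_fragmentation):
    """MTU the virtual link presents to the network layer."""
    if adaptation_fragmentation:
        # The Encapsulation Source fragments after encapsulation, so the
        # intermediate Internetwork MTU is hidden from the network layer.
        return ASSURED_VIRTUAL_LINK_MTU
    # No adaptation layer fragmentation: the encapsulated packet must fit
    # within the intermediate Internetwork path MTU.
    return internetwork_path_mtu - encap_overhead


def network_layer_path_mtu(eun_a_mtu, link_mtu, eun_b_mtu):
    """Largest packet the Original Source can send end to end without loss."""
    return min(eun_a_mtu, link_mtu, eun_b_mtu)


# Hypothetical deployment: 9000-octet EUNs joined across a 1280-octet path.
with_frag = network_layer_path_mtu(9000, virtual_link_mtu(1280, 48, True), 9000)
without_frag = network_layer_path_mtu(9000, virtual_link_mtu(1280, 48, False), 9000)
print(with_frag, without_frag)
```

Under these assumed values the Original Source sees a 9000-octet path when the Encapsulation Source fragments at the adaptation layer, but only 1232 octets when it does not, which is the loss of benefit from the larger EUN MTUs noted above.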

Considerable evidence suggests that aspects of standard fragmentation procedures for both IPv4 and IPv6 introduce serious performance and/or security issues in some environments. For example, IPv4 provides only a 16-bit Identification field, which means that a source must limit its transmission rate to avoid wrapping the Identification value within the Maximum Datagram Lifetime (MDL), a rate that may be multiple orders of magnitude too slow for modern networks. IPv6 addresses this issue by providing a 32-bit Identification field, but reuse within the MDL is still possible when the source resets the Identification sequence frequently to avoid predictable values. Additionally, both IPv4 and IPv6 fragmentation can produce non-final fragments of differing lengths as small as 8 octets (while the final fragment may be smaller still), and the offsets of later fragments may overlap with those of earlier fragments. This (offset, length) relationship further allows intermediate systems to perform gratuitous fragmentation on individual fragments in isolation without first (virtually) reassembling all fragments of the same packet. This is true for IPv4 even when the DF bit is set to 1 and also true for IPv6 even though the standard deprecates intermediate system fragmentation. Accordingly, there is no assurance that the fragments produced by the source will be the same size as the fragments that arrive at the destination. Finally, the number of fragments per packet is bounded only by the size of the original packet divided by the 8 octet minimum fragment size. For example, an 8KB packet could be fragmented into as many as 1024 fragments - far too many to support efficient reassembly procedures. Plus, the loss of a single fragment would result in retransmission of all fragments of the original packet. These issues suggest that the original IP fragmentation and reassembly design has shortcomings that can be addressed through a well-engineered alternative solution.
However, the earlier publications raised alarms that resonated both broadly and deeply throughout the Internetworking industry. This has resulted in a state of paralysis where little progress to address the issues has been made even in the modern era.
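
The arithmetic behind two of the concerns above can be worked through in a few lines. The following is a rough sketch rather than a statement about any particular implementation: the 120-second MDL and the 1500-octet packet size are assumed example values, the function names are hypothetical, and header octets are ignored when counting fragments.

```python
# Back-of-the-envelope arithmetic; MDL and packet size are assumed values.

def max_bit_rate_without_id_reuse(id_bits, mdl_seconds, packet_octets):
    """Highest sending rate (bits/s) that never repeats an Identification
    value within the Maximum Datagram Lifetime."""
    return (2 ** id_bits) * packet_octets * 8 / mdl_seconds


def worst_case_fragment_count(packet_octets, min_fragment_octets=8):
    """Upper bound on fragments when every fragment carries the minimum
    size (ceiling division; headers ignored)."""
    return -(-packet_octets // min_fragment_octets)


# 16-bit IPv4 Identification, assumed 120 s MDL, assumed 1500-octet packets.
print(max_bit_rate_without_id_reuse(16, 120, 1500))  # 6553600.0 bits/s
# The 8KB example from the text, taken as 8192 octets.
print(worst_case_fragment_count(8192))               # 1024
```

With these assumptions the 16-bit Identification field limits a flow to roughly 6.5 Mbit/s, which is the sense in which the safe rate falls orders of magnitude short of modern link speeds.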

As discussed above, standard IP fragmentation has a number of issues that have been widely known but left unaddressed for many decades. This condition can be corrected by applying the classic IETF process of problem statement leading to solution. This document in conjunction with "IP Fragmentation Considered Fragile" should therefore be considered as a problem statement, with a solution found in the cited solution space document. The offered solution addresses IP fragmentation issues by establishing an Extended Fragment Header (EFH) intended for use instead of the standard IPv4 and IPv6 fragmentation headers and procedures. The EFH is based on a 64-bit Identification value, mandates no more than 64 fragments per packet, mandates a minimum non-final fragment size of 1024 octets, mandates an identical size for all non-final fragments and eliminates any possibility for fragment overlap. Additionally, intermediate systems are unable to alter the size of individual fragments in isolation without first (virtually) reassembling the entire packet. The size of the fragments produced by the source will therefore be the same as the size of fragments that arrive at the destination unless an intermediate system performs the onerous task of (virtual) reassembly and re-fragmentation. The EFH solution also has provisions for managing the loss unit with respect to the retransmission unit. In networks where loss is rare, this means that the source may send large packets at high data rates even if fragmentation with a non-final fragment size as small as 1024 octets is necessary. When loss becomes significant, the EFH solution provides a means for the destination to advise the source to reduce the size of its packets, resulting in fewer fragments. The EFH solution is based on an IPv6 Destination Option that appears instead of the IPv6 Fragment Header. The EFH may also appear in IPv4 packets if they are able to transit the path.
When (UDP)/IP encapsulation is applied, however, IP packets containing the EFH can often transit limited domains without loss at intermediate systems that filter packets with IPv6 extension headers.
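
The EFH sizing constraints listed above (at most 64 fragments, non-final fragments of identical size and at least 1024 octets, no overlap) can be illustrated with a minimal sketch. This is not the algorithm of the cited specification: the function name and error handling are hypothetical, header octets are ignored, and any alignment rules the specification may impose are not modeled.

```python
# Minimal sketch of the stated EFH sizing constraints; not the specified algorithm.

MAX_FRAGMENTS = 64          # "no more than 64 fragments per packet"
MIN_NONFINAL_OCTETS = 1024  # "minimum non-final fragment size of 1024 octets"


def plan_fragments(payload_octets, max_fragment_octets):
    """Return fragment sizes: equal-sized non-final fragments followed by a
    final fragment that carries the remainder."""
    if payload_octets <= max_fragment_octets:
        return [payload_octets]  # fits without fragmentation
    if max_fragment_octets < MIN_NONFINAL_OCTETS:
        raise ValueError("non-final fragments would be smaller than 1024 octets")
    full, remainder = divmod(payload_octets, max_fragment_octets)
    count = full + (1 if remainder else 0)
    if count > MAX_FRAGMENTS:
        raise ValueError("packet would need more than 64 fragments")
    sizes = [max_fragment_octets] * full
    if remainder:
        sizes.append(remainder)
    return sizes


print(plan_fragments(9000, 1024))        # eight 1024-octet fragments plus 808
print(len(plan_fragments(65535, 1024)))  # 64
```

Because 64 x 1024 = 65536, the two limits together are just sufficient for a 65535-octet packet even at the smallest permitted non-final fragment size, while a larger payload at that fragment size would be rejected by this sketch.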

Standard IP fragmentation has well known issues that were presented in ways that caused the community to become paralyzed with uncertainty rather than move forward in confidence according to the time-proven IETF process of problem statement leads to solution. While the earlier publications succeeded in articulating the issues, their titles inspired fear, uncertainty and doubt instead of promoting a well-balanced engineering approach toward a robust solution. The time for such a solution has now arrived.

This document is an informational problem statement and does not in itself request any IANA actions. IANA considerations can be found in the cited solution space document.

This document is an informational problem statement and does not in itself address security. Security considerations can be found in the cited solution space document.

Performance maximization efforts in the Internet engineering community have produced foundational improvements. Those who contributed are acknowledged. Honoring life, liberty and the pursuit of happiness.

Fragmentation Considered Harmful, SIGCOMM '87: Proceedings of the ACM workshop on Frontiers in computer communications technology, DOI 10.1145/55482.55524, http://www.hpl.hp.com/techreports/Compaq-DEC/WRL-87-3.pdf.

Retrospective on "Fragmentation Considered Harmful", ACM SIGCOMM Computer Communication Review, https://ccronline.sigcomm.org/wp-content/uploads/2019/10/acmdl19-328.pdf.

The Encapsulation Source operates on already-packaged IP packets supplied by the Original Source and can therefore only adapt their sizes by applying IP fragmentation following encapsulation. The Encapsulation Destination in turn applies IP reassembly prior to decapsulation. Conversely, the Original Source and Final Destination apply packaging and reassembly at multiple architectural layers and can selectively apply or avoid IP fragmentation of the original IP packets. This section presents additional considerations for upper layer packaging.

IP fragmentation and reassembly using the Extended Fragment Header (EFH) operate on IP packets that include one or more upper layer protocol (ULP) segments with the corresponding ULP headers. This means that even for very large ULP segments only a single instance of ULP headers appears in the resulting sequence of interdependent fragments, even if the segment size exceeds the path MTU. Generic Segment Offload (GSO) with its counterpart Generic Receive Offload (GRO) are widely-known services that perform fragmentation and reassembly according to the same algorithm specified for the EFH but with ULP segment sizes no larger than the path MTU. GSO produces fragment sequences (independent packets, actually) that include a separate instance of the ULP headers in each packet instead of a single instance for the entire sequence. With a nominal ULP header size of 20 octets for TCP, this means that a 64-packet sequence would need to carry 1260 redundant octets (63 additional header instances) for each GSO/GRO transaction - a significant increase in overhead. When transport layer security encapsulations such as TLS/SSL are present, the ULP header overhead is greater still.

ULP use of EFH fragmentation and reassembly in contrast with GSO/GRO therefore requires an adaptive consideration of the packet loss profile for a given flow. Assuming a nominal path MTU (e.g., 1280 octets, 1500 octets, etc.) and with minimal packet loss, EFH with larger ULP segment sizes offers efficiency advantages in comparison with GSO/GRO with MTU-sized segment sizes. When packet loss levels increase, however, ULPs that use EFH should adaptively reduce their segment sizes to compensate. When loss levels become significant, ULPs that use both EFH and GSO/GRO may need to reduce their transmission rates until loss profiles improve. These adaptations are necessary to dynamically balance the flow's loss unit in relation to the retransmission unit under the current loss profile. For larger path MTUs (e.g., 4500 octets, 9000 octets, or larger still), the two services converge to offer similar performance profiles at segment sizes no larger than the path MTU, while EFH can advance to still larger segment sizes for improved efficiency. EFH can also transport IP parcels and Advanced Jumbos (following IP encapsulation) even if the underlying path does not support them natively.
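
The trade-off between header overhead and loss sensitivity described above can be made concrete with a short sketch. It is only an illustration: the function names are hypothetical, and the assumption that fragments are lost independently with a fixed probability is a simplification introduced here, not a model taken from the source.

```python
# Illustrative arithmetic; the independent-loss model is an assumption.

def gso_redundant_header_octets(packet_count, ulp_header_octets=20):
    """ULP header octets that GSO repeats beyond the single instance that an
    EFH fragment sequence would carry."""
    return (packet_count - 1) * ulp_header_octets


def efh_delivery_probability(fragment_count, fragment_loss_rate):
    """Probability that every fragment of one packet arrives, so that no
    retransmission of the whole segment is needed (independent losses)."""
    return (1.0 - fragment_loss_rate) ** fragment_count


print(gso_redundant_header_octets(64))     # 1260, matching the figure in the text
print(efh_delivery_probability(64, 0.0))   # 1.0 on a loss-free path
print(efh_delivery_probability(64, 0.01))  # roughly 0.53 at 1% fragment loss
```

Under this simplified model, a 64-fragment segment that is efficient on a loss-free path completes intact only about half the time at 1% fragment loss, which illustrates why a ULP using EFH would shrink its segment size as loss increases.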

<< RFC Editor - remove prior to publication >> Differences from earlier versions: First draft publication.