<?xml version="1.0" encoding="US-ASCII"?>
<!-- This template is for creating an Internet Draft using xml2rfc,
     which is available here: http://xml.resource.org. -->
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<!-- used by XSLT processors -->
<!-- For a complete list and description of processing instructions (PIs), 
     please see http://xml.resource.org/authoring/README.html. -->
<?rfc strict="yes" ?>
<!-- give errors regarding ID-nits and DTD validation -->
<!-- control the table of contents (ToC) -->
<?rfc toc="yes"?>
<!-- generate a ToC -->
<?rfc tocdepth="3"?>
<!-- the number of levels of subsections in ToC. default: 3 -->
<!-- control references -->
<?rfc symrefs="yes"?>
<!-- use symbolic references tags, i.e, [RFC2119] instead of [1] -->
<?rfc sortrefs="yes" ?>
<!-- sort the reference entries alphabetically -->
<!-- control vertical white space 
     (using these PIs as follows is recommended by the RFC Editor) -->
<?rfc compact="yes" ?>
<!-- do not start each main section on a new page -->
<?rfc subcompact="no" ?>
<!-- keep one blank line between list items -->
<!-- end of list of popular I-D processing instructions -->
<rfc category="info" docName="draft-song-opsawg-ifit-framework-18"
     ipr="trust200902">
  <front>
    <title abbrev="IFIT">A Framework for In-situ Flow Information Telemetry</title>

    <author fullname="Haoyu Song" initials="H." surname="Song">
      <organization>Futurewei</organization>

      <address>
        <postal>
          <street>2330 Central Expressway</street>

          <city>Santa Clara</city>

          <country>USA</country>
        </postal>

        <email>haoyu.song@futurewei.com</email>
      </address>
    </author>

    <author fullname="Fengwei Qin" initials="F." surname="Qin">
      <organization>China Mobile</organization>

      <address>
        <postal>
          <street>No. 32 Xuanwumenxi Ave., Xicheng District</street>

          <city>Beijing, 100032</city>

          <country>P.R. China</country>
        </postal>

        <email>qinfengwei@chinamobile.com</email>
      </address>
    </author>

    <author fullname="Huanan Chen" initials="H." surname="Chen">
      <organization>China Telecom</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <country>P.R. China</country>
        </postal>

        <email>chenhuan6@chinatelecom.cn</email>
      </address>
    </author>

    <author fullname="Jaehwan Jin" initials="J." surname="Jin">
      <organization>LG U+</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <country>South Korea</country>
        </postal>

        <email>daenamu1@lguplus.co.kr</email>
      </address>
    </author>

    <author fullname="Jongyoon Shin" initials="J." surname="Shin">
      <organization>SK Telecom</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <country>South Korea</country>
        </postal>

        <email>jongyoon.shin@sk.com</email>
      </address>
    </author>

    <date day="6" month="September" year="2022"/>

    <area>Operations and Management Area</area>

    <workgroup>OPSAWG</workgroup>

    <keyword>IFIT</keyword>

    <abstract>
      <t>As network scale increases and network operation becomes more
   sophisticated, existing Operation, Administration, and Maintenance
   (OAM) methods are no longer sufficient to meet the monitoring and
   measurement requirements.  Emerging data-plane on-path telemetry
   techniques which provide high-precision flow insight and which issue
   notifications in real time can supplement existing proactive and
   reactive methods that run in active and passive modes. These new
   approaches are collectively known as in-situ flow information
   telemetry (IFIT). They enable quality of experience for users and
   applications, and identification of network faults and deficiencies.</t>

   <t>This document outlines a high-level framework for IFIT to collect and correlate performance measurement information from the network. It identifies the components that coordinate existing protocol tools and telemetry mechanisms, and addresses deployment challenges for flow-oriented on-path telemetry techniques, especially in carrier networks.</t>

   <t>The document is a guide for system designers applying the referenced techniques. It is also intended to motivate further work to enhance the OAM ecosystem.</t>
    </abstract>
  </front>

  <middle>
    <section title="Introduction">
      <t>Efficient network operation increasingly relies on high-quality
      data-plane telemetry to provide the necessary visibility into the behavior of traffic flows and network resources. Existing
      Operation, Administration, and Maintenance (OAM) methods, which include
      proactive and reactive techniques running in both active and passive
      modes, are no longer sufficient to meet the monitoring and measurement
      requirements as networks become more autonomous <xref target="RFC8993" /> and
      application-aware <xref target="I-D.li-apn-framework" />. The complexity of today's networks and service
      quality requirements demand new high-precision and real-time OAM
      techniques.</t>

      <t>Operators are expected to expedite network failure detection, fault
      localization, and recovery, particularly in the case of soft
      failures or path degradation, without causing service
      disruption. Emerging on-path telemetry techniques can provide high-precision
      flow insight and real-time network issue notification (e.g., jitter,
      latency, packet loss, significant bit error variations, and unequal
      load-balancing). On-Path Telemetry (OPT) refers to data-plane telemetry
      techniques that directly tap and measure network traffic by embedding
      instructions or metadata into user packets. The data provided by on-path
      telemetry are especially useful for verifying Service Level Agreement (SLA) compliance, user experience
      enhancement, service path enforcement, fault diagnosis, and network
      resource optimization. A variety of on-path telemetry techniques,
      including <xref target="RFC9197">In-situ
      OAM (IOAM)</xref>, <xref target="I-D.ietf-ippm-ioam-direct-export">IOAM
      Direct Export (DEX)</xref>, <xref
      target="I-D.song-ippm-postcard-based-telemetry">Marking-based
      Postcard-based Telemetry (PBT-M)</xref>, <xref
      target="I-D.zhou-ippm-enhanced-alternate-marking">Enhanced Alternate
      Marking (EAM)</xref>, and <xref
      target="I-D.mirsky-ippm-hybrid-two-step">Hybrid Two-Step (HTS)</xref>,
      have been developed or proposed. These techniques can provide flow information on the entire
      forwarding path on a per-packet basis in real time. The aforementioned
      on-path telemetry techniques differ from the active and passive OAM
      schemes in that they directly modify and monitor the
      user packets in networks so as to achieve high measurement accuracy.
      Formally, these on-path telemetry techniques can be classified as the
      OAM hybrid type I, since they involve "augmentation or modification of
      the stream of interest, or employment of methods that modify the
      treatment of the streams", according to <xref target="RFC7799"/>. We refer to these techniques collectively as "In-situ Flow Information Telemetry" (IFIT).</t>

      <t>On-path telemetry is useful for application-aware networking
      operations, not only in data center and enterprise networks, but also in
      carrier networks which may cross multiple domains. The techniques
      can provide benefits for carrier network operators in various
      scenarios. For example, it is critical for operators who offer
      high-bandwidth, latency-sensitive, and loss-sensitive services such as video
      streaming and online gaming to closely monitor the relevant flows in
      real time as the basis for any further optimizations.</t>

      <t>This framework document is intended to guide system designers
      attempting to use the referenced techniques as well as to motivate
      further work to enhance the telemetry ecosystem. It highlights
      requirements and challenges, outlines important techniques that are
      applicable, and provides examples of how these might be applied for
      critical use cases.</t>

      <t>The document scope is discussed in <xref target="sec:scope"/>.</t>

      <section anchor="sec:mode"
               title="Classification and Modes of On-path Telemetry">
        <t>The operation of IFIT differs from both active OAM and
        passive OAM as defined in <xref target="RFC7799"/>. It does not
        generate any active probe packets or passively observe unmodified
        user packets. Instead, it modifies selected user packets in order to
        collect useful information about them. Therefore, the operation is
        categorized as the hybrid OAM type I method per <xref
        target="RFC7799"/>.</t>

        <t>This hybrid OAM type I method can be further partitioned into two modes <xref
        target="passport-postcard"/>. In the passport mode, each node on the
        path can add telemetry data to the user packets (i.e., stamp the
        passport). The accumulated data trace is exported at a configured end
        node. In the postcard mode, each node directly exports the telemetry
        data using an independent packet (i.e., sends a postcard) while the
        user packets are unmodified. It is possible to combine the two modes
        together in one solution. We call this the hybrid mode.</t>
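
        <t>The difference between the passport and postcard modes can be
        illustrated with a minimal Python sketch. It is illustrative only
        and is not taken from any of the referenced specifications: the
        names Packet, forward_passport, and forward_postcard, and the
        placeholder "hop-data" value, are assumptions introduced here.</t>

        <t><figure>
            <artwork><![CDATA[
```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    """Hypothetical user packet with room for passport stamps."""
    payload: bytes
    trace: list = field(default_factory=list)


def forward_passport(pkt, path):
    """Passport mode: every node stamps the packet itself and
    the accumulated trace is exported once at the end node."""
    for node in path:
        pkt.trace.append((node, "hop-data"))
    exported = [list(pkt.trace)]  # a single export for the path
    pkt.trace.clear()             # telemetry data is removed
    return exported


def forward_postcard(pkt, path):
    """Postcard mode: every node exports its own record and the
    user packet is left unmodified."""
    return [[(node, "hop-data")] for node in path]
```
]]></artwork>
          </figure></t>

        <t>For a three-node path, the passport sketch yields one export
        carrying three per-hop entries, whereas the postcard sketch yields
        three exports of one entry each. This mirrors the trade-off between
        packet size and export overhead.</t>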

        <t><xref target="figure_0"/> shows the classification of the
        on-path telemetry techniques.</t>

        <t><figure anchor="figure_0"
            title="On-path Telemetry Technique Classification">
            <artwork><![CDATA[
                
 +-----------+-------------+------------+--------------------+
 |  Mode     | Passport    | Postcard   | Hybrid             |
 +-----------+-------------+------------+--------------------+
 |           | IOAM Trace  | IOAM DEX   | Multicast Telemetry|   
 | Technique | IOAM E2E    | PBT-M      | HTS                |
 |           |             | EAM        |                    |
 +-----------+-------------+------------+--------------------+ 
	]]></artwork>
          </figure></t>

        <t>IOAM Trace and E2E options are described in <xref target="RFC9197"/>. </t>
		<t>EAM is described in <xref target="I-D.zhou-ippm-enhanced-alternate-marking"/>. </t>
		<t>IOAM DEX option is described in <xref target="I-D.ietf-ippm-ioam-direct-export"/>.</t>
        <t>PBT-M is described in <xref target="I-D.song-ippm-postcard-based-telemetry"/>. </t>
		<t>Multicast Telemetry is described in <xref target="I-D.ietf-mboned-multicast-telemetry"/>. </t>
        <t>HTS is described in <xref target="I-D.mirsky-ippm-hybrid-two-step"/>.</t>

        <t>The advantages of the passport mode include:</t>

        <t><list style="symbols">
            <t>It automatically retains the telemetry data correlation along
            the entire path. The self-describing feature simplifies the data
            consumption.</t>

            <t>The on-path data for a packet is only exported once so the data
            export overhead is low.</t>

            <t>Only the head and tail nodes of the paths need to be configured for header insertion and removal, so the configuration overhead is low.</t>
          </list></t>

        <t>The disadvantages of the passport mode include:</t>

        <t><list style="symbols">
            <t>The telemetry data carried by user packets inflate the packet
            size, which may be undesirable or prohibitive.</t>

            <t>Approaches for encapsulating the instruction header and data in
            transport protocols need to be standardized.</t>

            <t>Sensitive data carried along the path are vulnerable to
            security and privacy breaches.</t>

            <t>If a packet is dropped on the path, the data collected are also
            lost.</t>
          </list></t>

        <t>The postcard mode complements the passport mode. The advantages of
        the postcard mode include:</t>

        <t><list style="symbols">
            <t>Either there is no packet header overhead (e.g., PBT-M) or the
            overhead is small and fixed (e.g., IOAM DEX).</t>

            <t>The encapsulation requirement may be avoided (e.g., PBT-M).</t>

            <t>The telemetry data can be secured before export.</t>

            <t>Even if a packet is dropped on the path, the partial data
            collected are still available.</t>
          </list></t>

        <t>The disadvantages of the postcard mode include:</t>

        <t><list style="symbols">
            <t>Telemetry data are spread in multiple postcards so extra effort
            is needed to correlate the data.</t>

            <t>Every node exports a postcard for a packet which increases the
            data export overhead.</t>

            <t>In case of PBT-M, every node on the path needs to be
            configured, so the configuration overhead is high.</t>

            <t>In case of IOAM DEX, the transport encapsulation requirement remains.</t>
          </list></t>

        <t>The hybrid mode is either tailored to a specific application
        scenario (e.g., Multicast Telemetry) or provides an alternative
        approach (e.g., HTS). A postcard can be sent per segment of a path or the telemetry data can be carried in a companion packet following each monitored user packet. The hybrid mode combines the advantages of both the passport mode and the postcard mode, but it may incur extra processing complexity.</t>
		
      </section>

      <section anchor="sec:challenge" title="Requirements and Challenges">
        <t>Although on-path telemetry is beneficial, successfully applying
        such techniques in carrier networks requires attention to performance,
        deployability, and flexibility. Specifically, we need to address the
        following practical deployment challenges:</t>

        <t><list style="symbols">
            <t>C1: On-path telemetry incurs extra packet processing which may
            cause stress on the network data plane. The potential impact on
            the forwarding performance creates an unfavorable "observer
            effect" (where the actions of performing on-path
            telemetry may change the behavior of the traffic being measured). This not only damages the fidelity of the
            measurement, but also defeats the purpose of the measurement.</t>

            <t>C2: On-path telemetry can generate a considerable amount of
            data which may claim too much transport bandwidth and inundate the
            servers for data collection, storage, and analysis. For
            example, if the technique is applied to all the traffic, one node may
            collect a few tens of bytes as telemetry data for each packet. The
            whole forwarding path might accumulate telemetry data with a size
            similar to or even exceeding that of the original packet.</t>

            <t>C3: The collectible data defined currently are essential but
			limited.  This, in turn, limits the management and operational
			techniques that can be applied.  Flexibility and extensibility
			of data definition, aggregation, acquisition, and filtering,
			must be considered.</t>

            <t>C4: Applying only a single underlying on-path telemetry
            technique may miss some important events or lead to incorrect results. For example, a packet drop
            can cause the loss of the flow telemetry data, and the packet drop
            location and reason remain unknown if only the In-situ OAM trace
            option is used. A comprehensive solution needs the flexibility to
            switch between different underlying techniques and adjust the
            configurations and parameters at runtime. Thus, system-level
            orchestration is needed.</t>

            <t>C5: We must provide solutions to support an incremental deployment
			strategy. That is, we need to support
            established encapsulation schemes for various predominant
            protocols such as Ethernet, IPv6, and MPLS with backward
            compatibility and properly handle various transport tunnels.</t>

            <t>C6: The development of simplified on-path telemetry primitives
            and models for configuration and queries is essential. Telemetry
            models may be utilized via an API-based telemetry service for
            external applications, for end-to-end performance measurement and
            application performance monitoring. Standards-based protocols
            and methods are needed for network configuration and programming,
            and telemetry data pre-processing and export, to provide
            interoperability.</t>
          </list></t>
      </section>

      <section anchor="sec:scope" title="Scope">
        <t>Following the network telemetry framework discussed in <xref
        target="RFC9232"/>, this document focuses on the on-path
        telemetry, a specific class of data-plane telemetry techniques, and
        provides a high-level framework which addresses the challenges for
		deployment listed in <xref target="sec:challenge" />, especially in carrier networks.</t>

        <t>This document clarifies the problem space and essential
        requirements, and summarizes best practices and general system design
        considerations. It also provides some examples to illustrate novel
        network telemetry applications under the framework.</t>

        <t>As an informational document, it describes an open framework with a
        few key components. The framework does not enforce any specific
        implementation on each component, nor does it define interfaces
        (e.g., API, protocol) between components. The choice of underlying
        on-path telemetry techniques and other implementation details is
        determined by the application implementer. Therefore, the framework is not
        a solution specification. It only provides a high-level overview and
        is not necessarily a mandatory recommendation for on-path telemetry
        applications.</t>

        <t>The standardization of the underlying techniques and interfaces
        mentioned in this document is undertaken by various working groups.
        Due to the limited scope and intended status of this document, it has
        no overlap or conflict with those works.</t>
      </section>

      <section title="Relationship with Network Telemetry Framework (NTF)">
        <t><xref target="RFC9232"/> describes a Network Telemetry
        Framework (NTF). One dimension used by NTF to partition network
        telemetry techniques and systems is based on the three planes in
        networks (i.e., control plane, management plane, and forwarding plane) 
		and external data sources. IFIT fits in the category of
        forwarding-plane telemetry and deals with the specific on-path
        technical branch of the forwarding-plane telemetry.</t>

        <t>According to NTF, an on-path telemetry application mainly
        subscribes to event-triggered or streaming data. The key functional
        components of IFIT described in <xref target="sec:comp" /> 
		match the general components in NTF with more specific functions. 
		"On-demand
        Technique Selection and Integration" is an application layer function,
        matching the "Data Query, Analysis, and Storage" component in NTF;
        "Flexible Flow, Packet, and Data Selection" matches the "Data
        Configuration and Subscription" component; "Flexible Data Export"
        matches the "Data Encoding and Export" component; "Dynamic Network
        Probe" matches the "Data Generation and Processing" component.</t>
      </section>


      <section title="Glossary">
        <t>This section defines and explains the acronyms and terms used in
        this document.</t>

        <t><list style="hanging">
            <t hangText="On-path Telemetry:">Remotely acquiring performance
            and behavior data about network flows on a per-packet basis on the
            packet's forwarding path. The term refers to a class of data-plane
            telemetry techniques, including IOAM, PBT, EAM, and HTS. Such
            techniques may need to mark user packets, or insert instruction/
            metadata into the headers of user packets.</t>

            <t hangText="IFIT:">In-situ Flow Information Telemetry is a
            high-level reference framework that shows how network data-plane
            monitoring and measurement applications can address the deployment challenges of
            the flow-oriented on-path telemetry techniques.</t>

            <t hangText="Reflective Telemetry:">Reflective telemetry functions in a dynamic, closed-loop fashion: a new telemetry action is provisioned as a result of self-knowledge acquired through prior telemetry actions.</t>
          </list></t>
      </section>

  <!--    <section title="Requirements Language">
        <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
        "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
        "OPTIONAL" in this document are to be interpreted as described in BCP
        14 <xref target="RFC2119"/><xref target="RFC8174"/> when, and only
        when, they appear in all capitals, as shown here.</t>
      </section>  -->
	  
    </section>

    <section title="Architectural Concepts and Key Components">
      <t>To address the challenges mentioned in <xref target="sec:challenge"/>, a high-level framework
      which can help to build a workable and efficient on-path telemetry
      application is presented. In-situ Flow Information Telemetry (IFIT) is
      dedicated to on-path telemetry data about user and application traffic
      flows. It covers a class of on-path telemetry techniques and works at a
      level higher than any specific underlying technique. The framework
      comprises several key functional components (<xref target="sec:comp"/>).
      By assembling these components, IFIT supports reflective telemetry that
      enables autonomous network operations (<xref target="sec:app"/>).</t>

      <section title="Reference Deployment">
        <t><xref target="figure_1"/> shows a reference deployment scenario of
        on-path telemetry.</t>

        <t><figure anchor="figure_1" title="Deployment Scenario">
            <artwork><![CDATA[     

                  On-path Telemetry Application
               +----------------------------------+
               |            Controller            |
               | +------------+     +-----------+ | 
               | | Configure  |     | Collector | | 
               | |     &      |<----|     &     | | 
               | | Control    |     | Analyzer  | | 
               | +-----:------+     +-----------+ | 
               |       :                  ^       |
               +-------:------------------|-------+
                       :configuration     |telemetry data
                       :& action          | 
          .............:..................|.........
          :           :               :   |        :
          :   +-------:---+-----------:---+--------:---+
          :   |       :   |           :   |        :   |
          V   |       V   |           V   |        V   |
       +------+-+   +-----+--+     +------+-+   +------+-+  +--------+
packets| Head   |   | Path   |     | Path   |   | Tail   |  |Node out|
     =>| Node   |==>| Node   |=//=>| Node   |==>| Node   |=>|of OPT  |=>
       |        |   | A      |     | Z      |   |        |  |domain  |
       +--------+   +--------+     +--------+   +--------+  +--------+
  
       |<---       On-path Telemetry Domain          --->|
     ]]></artwork>
          </figure></t>

        <t>An on-path telemetry application can conduct network data-plane 
		monitoring and measurement tasks over a limited
        domain <xref target="RFC8799" /> by applying one or more underlying techniques. The application
        contains multiple elements, including configuring the network
        nodes and processing the telemetry data. The application usually uses
        a logically centralized controller for
        configuring the network nodes in the domain, and collecting and
        analyzing telemetry data. The configuration determines which
        underlying technique is used, what telemetry data are of interest,
        which flows and packets are of concern, how the telemetry data are
        collected, etc. The process can be dynamic and interactive: after
        processing and analyzing the telemetry data, the application may instruct
        the controller to modify the configuration of the nodes, which affects
        the future telemetry data collection.</t>

        <t>From the system-level view, it is recommended to use
        standardized configuration and data collection interfaces, regardless
        of the underlying technique. The specification of these
        interfaces and the implementation of the controller are out of scope
        for this document.</t>

        <t>The on-path telemetry domain encompasses the head nodes and the end
        nodes, and may cross multiple network domains. The head nodes are
        responsible for enabling the on-path telemetry functions and the end
        nodes are responsible for terminating them. All capable nodes in this
        domain can execute the instructed on-path telemetry
        function. It is important to note that any application must, through
        configuration and policy, guarantee that any packet with on-path
        telemetry header and metadata will not leak out of the domain. </t>

        <t>The underlying on-path telemetry techniques covered by the IFIT
        framework can operate in any of the modes discussed in <xref
        target="sec:mode"/>.</t>
      </section>



      <section anchor="sec:comp" title="Key Components">
        <t>The key components of IFIT to address the challenges listed in
   <xref target="sec:challenge" /> are as follows. The components are described in more
   detail in the sections that follow.</t>

        <t><list style="symbols">
            <t>Flexible flow, packet, and data selection policy, addressing
            the challenge C1 described in <xref target="sec:challenge"/>;</t>

            <t>Flexible data export, addressing the challenge C2;</t>

            <t>Dynamic network probe, addressing C3;</t>

            <t>On-demand technique selection and integration, addressing
            C4.</t>
          </list></t>

        <t>Note that the challenges C5 and C6 are mostly standards-related,
        and are fundamental to IFIT. We discuss the protocol implications
        and guidance for solution developers in <xref target="sec:gap"/>.</t>


        <section title="Flexible Flow, Packet, and Data Selection">
          <t>In most cases, it is impractical to enable data collection
          for all the flows and for all the packets in a flow due to the
          potential performance and bandwidth impact. Therefore, a workable
          solution usually needs to select only a subset of flows and flow
          packets on which to enable data collection, even though this means the
          loss of some information and accuracy.</t>

          <t>In the data plane, a flow filter like those used for
          an Access Control List (ACL) provides an
          ideal means to determine the subset of flows. An application can
          set a sample rate or probability for a flow to allow only a subset of
          flow packets to be monitored, collect a different set of data for
          different packets, and disable or enable data collection on any
          specific network node. An application can further allow any node to
          accept or deny the data collection process in full or in part.</t>

          <t>Based on these flexible mechanisms, IFIT allows applications to
          apply flexible flow and data selection policies to suit their
          requirements. The applications can dynamically change the policies
          at any time based on the network load, processing capability, focus
          of interest, and any other criteria.</t>

          <section title="Block Diagram">
            <t><figure anchor="figure_3"
                title="Flexible Flow, Packet, and Data Selection">
                <artwork><![CDATA[
            +----------------------------+
            | +----------+  +----------+ |  
            | |Flow      |  |Data      | |
            | |Selection |  |Selection | |
            | +----------+  +----------+ |
            | +----------+               |
            | |Packet    |               |  
            | |Selection |               |
            | +----------+               |                             
            +----------------------------+         
          ]]></artwork>
              </figure></t>

            <t><xref target="figure_3"/> shows the block diagram of this
            component. The flow selection block defines the policies to choose
            target flows for monitoring. Flows can be defined at different
            granularities. A basic flow is defined by the IP 5-tuple (source
            and destination addresses, source and destination ports, and
            protocol). Flows can also be aggregated at the interface level,
            tunnel level, protocol level, and so on. The packet selection
            block defines the policies to choose packets from a target flow.
            The policy can be a sampling interval, a sampling probability, or
            a specific packet signature. The data selection block defines the
            set of data to be collected. This can be changed on a per-packet
            or per-flow basis.</t>
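
            <t>The three selection blocks can be sketched in Python as
            follows. This is a minimal illustration under stated
            assumptions, not an interface defined by this framework: the
            names FlowKey, Policy, and select, the field names, and the
            injectable random source rng are all hypothetical.</t>

            <t><figure>
                <artwork><![CDATA[
```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowKey:
    """Hypothetical 5-tuple flow identifier."""
    src: str
    dst: str
    sport: int
    dport: int
    proto: int


@dataclass
class Policy:
    flows: set          # flow selection: monitored 5-tuples
    sample_prob: float  # packet selection: sampling probability
    fields: tuple       # data selection: items to collect


def select(policy, key, node_data, rng=random.random):
    """Return the data to collect for one packet, or None."""
    if key not in policy.flows:
        return None     # flow is not selected
    if rng() >= policy.sample_prob:
        return None     # packet is not sampled
    return {f: node_data[f]
            for f in policy.fields if f in node_data}
```
]]></artwork>
              </figure></t>

            <t>Because the policy is plain data in this sketch, an
            application could replace the flow set, the sampling
            probability, or the field list at any time without changing
            the per-packet logic.</t>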
          </section>

          <section title="Example: Sketch-guided Elephant Flow Selection">
            <t>Network operators are usually more interested in elephant flows
            which consume more resources and are sensitive to changes in
            network conditions. A <xref target="CMSketch">CountMin
            Sketch</xref> can be used on the data path of the head nodes,
            which identifies and reports the elephant flows periodically. The
            controller maintains a current set of elephant flows and
            dynamically enables the on-path telemetry for only these
            flows.</t>
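
            <t>A count-min sketch and a threshold-based elephant flow
            report can be sketched in Python as follows. This is a
            simplified illustration rather than a data-path
            implementation: the class and function names, the table
            dimensions, the use of salted BLAKE2b as the per-row hash,
            and the fixed threshold are assumptions made for the
            example.</t>

            <t><figure>
                <artwork><![CDATA[
```python
import hashlib


class CountMinSketch:
    """Fixed-size counter table; estimates never undercount."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth  # one hash row per level, at most 255
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # A different salt per row gives independent-looking hashes.
        digest = hashlib.blake2b(
            key.encode(), digest_size=8, salt=bytes([row])
        ).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, key, count=1):
        for r in range(self.depth):
            self.rows[r][self._index(r, key)] += count

    def estimate(self, key):
        # The minimum over rows limits the effect of collisions.
        return min(self.rows[r][self._index(r, key)]
                   for r in range(self.depth))


def elephant_flows(sketch, candidates, threshold):
    """Report candidate flows whose estimate meets the threshold."""
    return {k for k in candidates
            if sketch.estimate(k) >= threshold}
```
]]></artwork>
              </figure></t>

            <t>In this sketch the head node would call add() per packet
            (for example, with the packet length as the count) and
            periodically report the result of elephant_flows() to the
            controller. Hash collisions can only inflate an estimate, so
            a small flow may occasionally be reported, but a true
            elephant flow is not missed.</t>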
          </section>

          <section title="Example: Adaptive Packet Sampling">
            <t>Applying on-path telemetry to all packets of the selected flows can
            still be impractical. A sample rate should be set for these flows
            and telemetry should only be enabled on the sampled packets. However, the
            head nodes do not know the proper sampling rate in advance. An overly
            high rate would exhaust network resources and even cause packet
            drops; an overly low rate, on the contrary, would result in the
            loss of information and inaccuracy of measurements.</t>

            <t>An adaptive approach can be used to dynamically adjust the
            sampling rate based on the network conditions. Every node
            gives user traffic forwarding higher priority than telemetry data
            export. In case of network congestion, the telemetry can sense
            some signals from the data collected (e.g., deep buffer size, long
            delay, packet drop, and data loss). The controller may use these
            signals to adjust the packet sampling rate. In each adjustment
            period (i.e., RTT of the feedback loop), the sampling rate is
            either decreased or increased in response to the signals. An
            Additive Increase/Multiplicative Decrease (AIMD) policy, similar
            to the TCP congestion control mechanism, can be used for rate
            adjustment.</t>
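
            <t>One possible AIMD adjustment step is sketched below in
            Python. It is only an illustration under stated assumptions: the
            function name, the additive step, the multiplicative factor, and
            the rate bounds are hypothetical values chosen for the example
            and are not specified by this framework.</t>

            <t><figure>
                <artwork><![CDATA[
```python
def adjust_sampling_rate(rate, congested, step=0.01, factor=0.5,
                         min_rate=0.001, max_rate=1.0):
    """Return the sampling rate for the next adjustment period.

    rate      -- current fraction of packets sampled (0 to 1)
    congested -- True if congestion signals were seen this period
    """
    if congested:
        rate *= factor   # multiplicative decrease
    else:
        rate += step     # additive increase
    # Keep the rate inside the configured bounds.
    return max(min_rate, min(max_rate, rate))
```
]]></artwork>
              </figure></t>

            <t>The controller would call such a function once per adjustment
            period. The rate backs off quickly when congestion signals
            appear and recovers slowly afterwards, which keeps telemetry
            export from competing with user traffic.</t>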
          </section>
        </section>

        <section title="Flexible Data Export">
          <t>The flow telemetry data can catch the dynamics of the network and
          the interactions between user traffic and network. Nevertheless, the
          data may contain redundancy. It is advisable to remove the
          redundancy from the data in order to reduce the data transport
          bandwidth and server processing load.</t>

          <t>In addition to efficient export data encoding (e.g., <xref
          target="RFC7011">IPFIX</xref> or <eref
          target="https://developers.google.com/protocol-buffers/">protobuf</eref>),
          nodes have several other ways to reduce the export data by taking
          advantage of network devices' capability and programmability. Nodes
          can cache the data and send the accumulated data in batches if the
          data is not time sensitive. Various deduplication and compression
          techniques can be applied on the batched data.</t>

          <t>From the application perspective, an application may only be
          interested in some special events which can be derived from the
          telemetry data. For example, if only the cases in which the forwarding
          delay of a packet exceeds a threshold or a flow changes its forwarding
          path are of interest, it is unnecessary to send the original raw data to the
          data collecting and processing servers. Rather, IFIT takes advantage
          of the in-network computing capability of network devices to process
          the raw data and push only the event notifications to the
          subscribing applications.</t>

          <t>Such events can be expressed as policies. A policy can request
          data export only on change, on exception, on timeout, or on
          threshold.</t>

          <section title="Block Diagram">
            <t><figure anchor="figure_4" title="Flexible Data Export">
                <artwork><![CDATA[
            +-------------------------------------------+
            | +-----------+ +-----------+ +-----------+ | 
            | |Data       | |Data       | |Export     | | 
            | |Encoding   | |Batching   | |Protocol   | |
            | +-----------+ +-----------+ +-----------+ |
            | +-----------+ +-----------+ +-----------+ |
            | |Data       | |Data       | |Data       | |
            | |Compression| |Dedup.     | |Filter     | |
            | +-----------+ +-----------+ +-----------+ |   
            | +-----------+ +-----------+               |
            | |Data       | |Data       |               | 
            | |Computing  | |Aggregation|               | 
            | +-----------+ +-----------+               |
            +-------------------------------------------+         
          ]]></artwork>
              </figure></t>

            <t><xref target="figure_4"/> shows the block diagram of this
            component. The data encoding block defines the method to encode
            the telemetry data. The data batching block defines the size of
            batch data buffered at the device side before export. The export
            protocol block defines the protocol used for telemetry data
            export. The data compression block defines the algorithm to
            compress the raw data. The data deduplication block defines the
            algorithm to remove the redundancy in the raw data. The data
            filter block defines the policies to filter the needed data. The
            data computing block defines the policies to preprocess the raw
            data and generate some new data. The data aggregation block
            defines the procedure to combine and synthesize the data.</t>
          </section>

          <section title="Example: Event-based Anomaly Monitor">
            <t>Network operators are interested in anomalies such as path
            change, network congestion, and packet drop. Such anomalies are
            hidden in raw telemetry data (e.g., path trace, timestamp). Such
            anomalies can be described as events and programmed into the
            device data plane. Only the triggered events are exported. For
            example, if a new flow appears at any node, a path change event is
            triggered; if the packet delay exceeds a predefined threshold in a
            node, the congestion event is triggered; if a packet is dropped
            due to buffer overflow, a packet drop event is triggered.</t>
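
            <t>The three example events above can be sketched in Python as
            follows. This is a simplified illustration, not a data plane
            implementation: the record layout (the keys "flow", "delay_us",
            and "dropped"), the event names, and the function name are
            assumptions made for this example.</t>

            <t><figure>
                <artwork><![CDATA[
```python
def detect_events(record, known_flows, delay_threshold_us):
    """Return the events triggered by one per-node packet record.

    record      -- dict with keys "flow", "delay_us", "dropped"
    known_flows -- set of flows already seen at this node; it is
                   updated in place when a new flow appears
    """
    events = []
    if record["flow"] not in known_flows:
        # A flow that is new to this node implies a path change.
        known_flows.add(record["flow"])
        events.append("path-change")
    if record["delay_us"] > delay_threshold_us:
        events.append("congestion")
    if record["dropped"]:
        events.append("packet-drop")
    return events
```
]]></artwork>
              </figure></t>

            <t>Only records for which the returned list is non-empty would
            need to be exported; all other raw data can stay on the
            node.</t>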

            <t>The export data reduction due to such optimization is
            substantial. For example, given a single 5-hop 10 Gbps path, assume
            that a moderate rate of 1 million packets per second is monitored
            and that the telemetry data plus the export packet overhead consume
            about 30 bytes per hop. Without such optimization, the
            bandwidth consumed by the telemetry data can exceed 1 Gbps
            (more than 10% of the path bandwidth). When the optimization is
            used, the bandwidth consumed by the telemetry data is negligible.
            Moreover, the pre-processed telemetry data greatly simplify the
            work of data analyzers.</t>
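
            <t>The arithmetic behind this estimate can be checked with a few
            lines of Python. The function name is hypothetical, and the
            calculation takes 30 bytes per hop as the working figure.</t>

            <t><figure>
                <artwork><![CDATA[
```python
def export_bandwidth_bps(packets_per_second, hops, bytes_per_hop):
    """Raw telemetry export bandwidth in bits per second."""
    return packets_per_second * hops * bytes_per_hop * 8


raw = export_bandwidth_bps(1_000_000, 5, 30)
print(raw / 1_000_000_000)        # 1.2 (Gbps)
print(100 * raw / 10_000_000_000)  # 12.0 (% of a 10 Gbps path)
```
]]></artwork>
              </figure></t>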
          </section>
        </section>

        <section title="Dynamic Network Probe">
          <t>Due to limited data plane resources and network bandwidth, it is
          unlikely that one can monitor all the data all the time. On the other
          hand, the data needed by applications may be arbitrary but
          ephemeral. It is critical to meet the dynamic data requirements with
          limited resources.</t>

          <t>Fortunately, data plane programmability allows new data probes to be
          dynamically loaded. These on-demand probes are called
          Dynamic Network Probes (DNP). DNP is the technique to enable probes
          for customized data collection in different network planes. When
          working with an on-path telemetry technique, DNP is loaded into the data plane through
          incremental programming or configuration. The DNP can effectively
          conduct data generation, processing, and aggregation.</t>

          <t>DNP introduces flexibility and extensibility to IFIT. It
          can implement the optimizations for export data reduction mentioned
          in the previous section. It can also generate custom data as
          required by today's and tomorrow's applications.</t>

          <section title="Block Diagram">
            <t><figure anchor="figure_5" title="Dynamic Network Probes">
                <artwork><![CDATA[
            +----------------------------+
            | +----------+  +----------+ |  
            | |Active    |  |YANG      | |
            | |Packet    |  |Model     | |
            | |Filter    |  |          | |
            | +----------+  +----------+ |
            | +----------+  +----------+ |
            | |Hardware  |  |Software  | |  
            | |Function  |  |Function  | |
            | +----------+  +----------+ |                             
            +----------------------------+         
          ]]></artwork>
              </figure></t>

            <t><xref target="figure_5"/> shows the block diagram of this
            component. The active packet filter block is available in
            most hardware, and it defines DNPs by dynamically updating the
            packet filtering policies (including flow selection and action). YANG models
            can be dynamically deployed to enable different data processing
            and filtering functions. Some hardware allows dynamically loading
            hardware-based functions into the forwarding path at runtime
            through mechanisms such as reserved pipelines and function stubs.
            Dynamically loadable software functions can be implemented in the
            control processors in capable nodes.</t>
          </section>

          <section title="Examples">
            <t>The following are some possible DNPs that can be dynamically
            deployed to support applications.</t>

            <t><list style="hanging">
                <t hangText="On-demand Flow Sketch:">A flow sketch is a
                compact online data structure (usually a variation of a
                multi-hashing table) for approximate estimation of multiple
                flow properties. It can be used to facilitate flow selection.
                The aforementioned <xref target="CMSketch">CountMin
                Sketch</xref> is such an example. Since a sketch consumes data
                plane resources, it should only be deployed when actually
                needed.</t>

                <t hangText="Smart Flow Filter:">The policies that choose
                flows and packet sampling rate can change during the lifetime
                of an application.</t>

                <t hangText="Smart Statistics:">An application may need to
                count flows based on different flow granularity or maintain
                hit counters for selected flow table entries.</t>

                <t hangText="Smart Data Reduction:">DNP can be used to program
                the events that conditionally trigger data export.</t>
              </list></t>
          </section>
        </section>

        <section title="On-demand Technique Selection and Integration">
          <t>With multiple underlying data collection and export techniques at
          its disposal, IFIT can flexibly adapt to different network
          conditions and different application requirements.</t>

          <t>For example, depending on the types of data that are of interest,
          IFIT may choose either passport or postcard mode to collect the data; if an
          application needs to track down where the packets are lost,
          switching from passport mode to postcard mode should be supported.</t>

          <t>IFIT can further integrate multiple data plane monitoring and
          measurement techniques together and present a comprehensive data
          plane telemetry solution.</t>

          <t>Based on the application requirements and the real-time telemetry
          data analysis results, new configurations and actions can be
          deployed.</t>

          <section title="Block Diagram">
            <t><figure anchor="figure_7"
                title="Technique Selection and Integration">
                <artwork><![CDATA[
            +----------------------------------------------+
            | +------------+  +-------------+  +---------+ |
            | |Application |  |Configuration|  |Telemetry| |
            | |Requirements|->|& Action     |<-|Data     | |
            | |            |  |             |  |Analysis | |
            | +------------+  +-------------+  +---------+ |                                                 
            +----------------------------------------------+
            | Passport Mode:                               | 
            | +----------+   +----------+                  |  
            | |IOAM E2E  |   |IOAM Trace|                  |
            | +----------+   +----------+                  |
            | Postcard Mode:                               |
            | +----------+   +----------+   +----------+   |
            | |PBT-M     |   |IOAM DEX  |   |EAM       |   |
            | +----------+   +----------+   +----------+   |
            | Hybrid Mode:                                 | 
            | +----------+   +----------+                  | 
            | |HTS       |   |Multicast |                  |
            | |          |   |Telemetry |                  |
            | +----------+   +----------+                  |
            +----------------------------------------------+         
          ]]></artwork>
              </figure></t>

            <t><xref target="figure_7"/> shows the block diagram of this
            component, which lists the candidate on-path telemetry
            techniques.</t>

            <t>Located in the logically centralized controller, this component
            dynamically applies all the control and configuration decisions to
            the capable nodes in the domain, which affects the telemetry data
            collected afterwards.
            The configuration and action decisions are based on the inputs
            from the application requirements and the real-time telemetry data
            analysis results. Note that here the telemetry data source is not
            limited to the data plane. The data can come from all the sources
            mentioned in <xref target="RFC9232"/>, including
            external data sources.</t>
          </section>
        </section>
      </section>
	  
	  

      <section anchor="sec:app" title="IFIT for Reflective Telemetry">
        <t>The components described in <xref target="sec:comp" /> can work together to support reflective
        telemetry, as shown in <xref target="figure_8"/>.</t>

        <t><figure anchor="figure_8" title="IFIT-based Reflective Telemetry">
            <artwork><![CDATA[
                        +---------------------+
                        |  On-path Telemetry  |
                 +------+    Applications     |<------+
                 |      |                     |       |
                 |      +---------------------+       |
                 |         Technique Selection        | 
                 |         and Integration            | 
                 |                                    | 
                 |Flexible                   Flexible |     
                 |Flow,     reflection-loop      Data |
                 |Packet,                       Export| 
                 |and Data                            | 
                 |Selection                      +----+----+  
                 V                              +---------+|   
           +----------+ Encapsulation          +---------+|| 
           |  Head    | and Tunneling          |  Path   ||| 
           |  Node    |----------------------->|  Nodes  ||+ 
           |          |                        |         |+ 
           +----------+                        +---------+ 
               DNP                                DNP  

]]></artwork>
          </figure></t>

        <t>An application may pick a suite of telemetry techniques based on
        its requirements and apply an initial technique to the data plane. It
        then configures the head nodes with the initial target
        flows/packets and telemetry data set, selects the encapsulation and
        tunneling scheme based on the underlying network architecture, and
        configures the IFIT-capable nodes with the initial telemetry data
        export policy.
        Based on the network condition and the analysis results of the
        telemetry data, the application can change the telemetry technique,
        the flow/data selection policy, and the data export approach in real
        time without breaking the normal network operation. Many such
        dynamic changes can be done through loading and unloading DNPs.</t>

        <t>The reflective telemetry enabled by IFIT allows numerous new
        applications. Two examples are provided below.</t>
      

      <section title="Intelligent Multipoint Performance Monitoring">
        <t><xref target="RFC8889"/> describes an intelligent performance
        management approach based on the network condition. The idea is to split the
        monitoring network into clusters. The cluster partition, which can be
        applied to every type of network graph, and the possibility of combining
        clusters at different levels enable the so-called Network Zooming. It
        allows a controller to calibrate the network telemetry, so that it can
        start without examining in depth and monitor the network as a whole.
        In case of necessity (packet loss or too high delay), an immediate
        detailed analysis can be reconfigured. In particular, the controller,
        which is aware of the network topology, can set up the most suitable
        cluster partition by changing the traffic filter or activating new
        measurement points, and the problem can be localized with a
        step-by-step process.</t>

        <t>An application on top of the controllers can manage such a mechanism,
        whose dynamic and reflective operations are supported by the IFIT framework.</t>
      </section>

      <section title="Intent-based Network Monitoring">
        <t><figure anchor="figure_9" title="Intent-based Monitoring">
            <artwork><![CDATA[

                     1.User Intents
                            |
                            V         5.Per-packet
             4.Packet +------------+   Telemetry 
               Filter |            |   Data 
             +--------+ Controller |<--------+
             |        |            |         |
             |        +--+---------+         |
             |           |       ^           |
             |         2.|DNPs 3.|Network    |
             |           |       |Information|
             |           V       |           |
      +------+-------------------+-----------+---+
      |      |                                   |
      |      V                      +------+     |
      | +-------+                  +------+|     | 
      | | Head  |                 +------+||     |
      | | Node  |                 |Path  ||+     |
      | |       |                 |Nodes |+      |
      | +-------+                 +------+       |
      +------------------------------------------+
     ]]></artwork>
          </figure></t>

        <t>In this example, a user can express high-level intents for network
        monitoring. The controller translates an intent and configures the
        corresponding DNPs in capable nodes, which collect the necessary network
        information. Based on the real-time information feedback, the
        controller runs a local algorithm to determine the suspicious flows.
        It then deploys specific packet filters to the head node to initiate the high-precision
        per-packet on-path telemetry for these flows.</t>
      </section>
    </section>

    </section>

    <section anchor="sec:gap" title="Guidance for Solution Developers">
			 
	  <t>Having a high-level framework covering a class of related techniques
      promotes a holistic approach to standards development and helps to
      avoid duplicated efforts and piecemeal solutions that focus only on a
      specific technique while omitting the compatibility and extensibility
      issues, which are important to a healthy ecosystem for network
      telemetry.</t>
	  
      <t>A complete IFIT-based solution needs standard interfaces for
      configuration and data extraction, and standard encapsulation on various
      transport protocols. It may also need standard APIs and primitives for
      application programming and deployment. <xref
      target="I-D.ietf-ippm-ioam-deployment"/> summarizes some techniques for 
	  encapsulation and data export for IOAM. Solution developers need to
      consider the aspects set out in the following subsections.</t>

      

      <section title="Encapsulation in Transport Protocols">
        <t>Since the introduction of IOAM, the IOAM option header
        encapsulation schemes in various network protocols have been defined
		(e.g., <xref target="I-D.ietf-ippm-ioam-ipv6-options" />).
        Similar encapsulation schemes are needed to cover the other
        on-path telemetry techniques. Meanwhile, the on-path telemetry header/data encapsulation
        schemes in some popular protocols, such as MPLS and SRv6, are
        also needed. <xref
        target="I-D.song-ippm-postcard-based-telemetry">PBT-M</xref> does not
        introduce new headers to the packets so the trouble of encapsulation
        for a new header is avoided. While there are some proposals which
        allow new header encapsulation in MPLS packets (e.g., <xref
        target="I-D.song-mpls-extension-header"/>) or in SRv6 packets (e.g.,
        <xref target="I-D.song-spring-siam"/>), they are still in their infancy
        and require further work. Before standards are available, in a
        confined domain, pre-standard encapsulation approaches may be
        applied.</t>
      </section>

      <section title="Tunneling Support">
        <t>In carrier networks, it is common for user traffic to traverse
        various tunnels for QoS, traffic engineering, or security. Both the
        uniform mode and the pipe mode for tunnel support are required and
        described in <xref target="I-D.song-ippm-ioam-tunnel-mode"/>. The uniform mode treats the nodes in a
        tunnel in the same way as the nodes outside of the tunnel on a path.
        In contrast, the pipe mode abstracts all the nodes between the
        tunnel ingress and egress as a circuit so that no node in the tunnel is
        visible to the nodes outside of the tunnel. With
        such flexibility, the operator can either gain a true end-to-end
        visibility or apply a hierarchical approach which isolates the
        monitoring domain between customer and provider.</t>
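
        <t>The difference between the two modes can be illustrated with a
        short Python sketch. It is a simplification made for this example
        and is not taken from the referenced draft: the function name and
        node names are hypothetical, and the sketch assumes that the tunnel
        ingress and egress nodes themselves remain visible in pipe
        mode.</t>

        <t><figure>
            <artwork><![CDATA[
```python
def visible_nodes(path, tunnel_ingress, tunnel_egress, mode):
    """Return the nodes that appear in the on-path telemetry trace.

    path is an ordered list of node names; the tunnel spans the
    nodes from tunnel_ingress to tunnel_egress inclusive.
    """
    if mode == "uniform":
        # Every node on the path, inside or outside the tunnel.
        return list(path)
    if mode != "pipe":
        raise ValueError(f"unknown tunnel mode: {mode}")
    start = path.index(tunnel_ingress)
    end = path.index(tunnel_egress)
    # Interior tunnel nodes are hidden; only the two tunnel
    # endpoints remain, like the two ends of a circuit.
    return path[:start + 1] + path[end:]
```
]]></artwork>
          </figure></t>

        <t>For a path A, B, C, D, E with a tunnel from B to D, the uniform
        mode reports all five nodes, while the pipe mode reports A, B, D,
        and E.</t>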
      </section>

      <section title="Deployment Automation">
        <t>Standard approaches that automate the function
        configuration, and capability query and advertisement, could be
        deployed in either a centralized or a distributed fashion. The draft <xref
        target="I-D.ietf-ippm-ioam-yang"/> provides a YANG model for IOAM
        configuration. Similar models need to be defined for other
        techniques. It is also helpful to provide standards-based approaches
        for configuration in various network environments. For example, in
        Segment Routing (SR) networks, extensions to BGP or Path Computation Element Communication Protocol (PCEP) can be defined to
        distribute SR policies carrying on-path telemetry information, so that
        telemetry behavior can be enabled automatically when the SR policy is
        applied. <xref target="I-D.chen-pce-sr-policy-ifit"/> defines extensions
		to PCEP to configure SR policies for on-path telemetry. <xref target="I-D.ietf-idr-sr-policy-ifit"/>
        defines extensions to BGP for the same purpose. Additional
        capability discovery and dissemination will be needed for other types
        of networks.</t>

        <t>To realize the potential of on-path telemetry, programming and
        deploying DNPs is important. <xref target="RFC5810">ForCES</xref> is
        a standard protocol for network device programming, which can be used
        for DNP deployment. Currently, some related works such as <xref
        target="I-D.wwx-netmod-event-yang"/> and <xref
        target="I-D.bwd-netmod-eca-framework"/> propose using YANG
        models to define the smart policies which can be used to implement
        DNPs. In the future, other approaches for hardware and software-based
        functions can be developed to enhance the programmability and
        flexibility.</t>
      </section>
    </section>

    <section anchor="Security" title="Security Considerations">
      <t>In addition to the specific security issues discussed in each
      individual document on on-path telemetry, this document considers the
      overall security issues at the system level. This should serve as a
      guide to the on-path telemetry application developers and users. 
	  General security and privacy considerations for any network telemetry system are also discussed in <xref target="RFC9232"/>.</t>
	  
	  <t>Since the on-path telemetry techniques work on the network forwarding plane, the IFIT framework poses some security risks. Important and sensitive information about a network could be exposed to an attacker. Further, the on-path telemetry data might swamp various
	  parts of the network, leading to a possible DoS attack. </t>
	  
	  <t>Fortunately, security measures can be enforced on various parts of the framework to mitigate such threats. For example, the configuration can filter and rate limit the monitored traffic; encryption and authentication can be applied on the exported telemetry data; different underlying techniques can be chosen to adapt to the different network conditions.</t>  
    </section>

    <section anchor="IANA" title="IANA Considerations">
      <t>This document includes no request to IANA.</t>
    </section>

    <section anchor="Contributors" title="Contributors">
      <t>Other major contributors to this document include Giuseppe Fioccola,
      Daniel King, Zhenqiang Li, Zhenbin Li, Tianran Zhou, and James
      Guichard.</t>
    </section>

    <section anchor="Acknowledgments" title="Acknowledgments">
      <t>We thank Diego Lopez, Shwetha Bhandari, Joe Clarke, Adrian Farrel,
      Frank Brockners, Al Morton, Alex Clemm, Alan DeKok, Benoit Claise, and Warren Kumari
      for their constructive suggestions for improving this document.</t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      <?rfc include='reference.RFC.2119'?>

      <?rfc include='reference.RFC.8174'?>

      <?rfc include='reference.RFC.7799'?>

      <?rfc include='reference.RFC.8799'?>
    </references>

    <references title="Informative References">
      <?rfc include='reference.RFC.7011'?>

      <?rfc include='reference.RFC.8889'?>

      <?rfc include='reference.RFC.5810'?>
	  
	  <?rfc include='reference.RFC.8993'?>

      <?rfc include='reference.RFC.9197'?>
	  
	  <?rfc include='reference.I-D.li-apn-framework'?>

      <?rfc include='reference.I-D.ietf-ippm-ioam-direct-export'?>

      <?rfc include='reference.I-D.song-mpls-extension-header'?>

      <?rfc include='reference.I-D.song-ippm-ioam-tunnel-mode'?>

      <?rfc include='reference.I-D.ietf-mboned-multicast-telemetry'?>

      <?rfc include='reference.I-D.song-ippm-postcard-based-telemetry'?>

      <?rfc include='reference.I-D.zhou-ippm-enhanced-alternate-marking'?>

      <?rfc include='reference.I-D.mirsky-ippm-hybrid-two-step'?>

      <?rfc include='reference.I-D.ietf-ippm-ioam-deployment'?>

      <?rfc include='reference.I-D.herbert-ipv4-eh'?>

      <?rfc include='reference.I-D.wwx-netmod-event-yang'?>

      <?rfc include='reference.RFC.9232'?>

      <?rfc include='reference.I-D.ietf-ippm-ioam-yang'?>

      <?rfc include='reference.I-D.bwd-netmod-eca-framework'?>

      <?rfc include='reference.I-D.chen-pce-sr-policy-ifit'?>

      <?rfc include='reference.I-D.ietf-idr-sr-policy-ifit'?>
	  
	  <?rfc include='reference.I-D.ietf-ippm-ioam-ipv6-options'?>
	  
	  <?rfc include='reference.I-D.song-spring-siam'?>

      <reference anchor="CMSketch"
                 target="http://dx.doi.org/10.1016/j.jalgor.2003.12.001">
        <front>
          <title>An improved data stream summary: the count-min sketch and its
          applications</title>

          <author initials="G." surname="Cormode"/>

          <author initials="S." surname="Muthukrishnan"/>

          <date year="2005"/>
        </front>
      </reference>

      <reference anchor="passport-postcard"
                 target="https://doi.org/10.1145/2342441.2342453">
        <front>
          <title>Where is the debugger for my software-defined
          network?</title>

          <author initials="N." surname="Handigol"/>

          <author initials="B." surname="Heller"/>

          <author initials="V." surname="Jeyakumar"/>

          <author initials="D." surname="Mazieres"/>

          <author initials="N." surname="McKeown"/>

          <date year="2012"/>
        </front>
      </reference>
    </references>
  </back>
</rfc>
