<?xml version="1.0" encoding="US-ASCII"?>
<!-- This template is for creating an Internet Draft using xml2rfc,
     which is available here: http://xml.resource.org. -->
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!-- One method to get references from the online citation libraries.
     There has to be one entity for each item to be referenced.
     An alternate method (rfc include) is described in the references. -->
<!ENTITY RFC2119 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY RFC6241 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.6241.xml">
<!ENTITY RFC7950 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.7950.xml">
<!ENTITY RFC7149 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.7149.xml">
<!ENTITY RFC7426 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.7426.xml">
<!ENTITY RFC8299 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.8299.xml">
<!ENTITY RFC8309 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.8309.xml">
<!ENTITY RFC8340 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.8340.xml">
<!ENTITY RFC8453 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.8453.xml">
<!ENTITY RFC8174 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.8174.xml">
<!ENTITY RFC8345 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.8345.xml">
]>
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<!-- used by XSLT processors -->
<!-- For a complete list and description of processing instructions (PIs),
     please see http://xml.resource.org/authoring/README.html. -->
<!-- Below are generally applicable Processing Instructions (PIs) that most I-Ds might want to use.
     (Here they are set differently than their defaults in xml2rfc v1.32) -->
<?rfc strict="yes" ?>
<!-- give errors regarding ID-nits and DTD validation -->
<!-- control the table of contents (ToC) -->
<?rfc toc="yes"?>
<!-- generate a ToC -->
<?rfc tocdepth="4"?>
<!-- the number of levels of subsections in ToC. default: 3 -->
<!-- control references -->
<?rfc symrefs="yes"?>
<!-- use symbolic references tags, i.e, [RFC2119] instead of [1] -->
<?rfc sortrefs="yes" ?>
<!-- sort the reference entries alphabetically -->
<!-- control vertical white space
     (using these PIs as follows is recommended by the RFC Editor) -->
<?rfc compact="yes" ?>
<!-- do not start each main section on a new page -->
<?rfc subcompact="no" ?>
<!-- keep one blank line between list items -->
<!-- end of list of popular I-D processing instructions -->
<rfc category="std"
     docName="draft-he-adaptive-collecting-problem-usecases-00"
     ipr="trust200902">
  <front>
    <title abbrev="Adaptive Traffic Data Collection">Problem Statement and Use
    Cases of Adaptive Traffic Data Collection</title>

    <author fullname="Xiaoming He" initials="X." surname="He">
      <organization>China Telecom</organization>

      <address>
        <email>hexm4@chinatelecom.cn</email>
      </address>
    </author>

    <author fullname="Dongfeng Mao" initials="D." surname="Mao">
      <organization>China Telecom</organization>

      <address>
        <email>maodf@chinatelecom.cn</email>
      </address>
    </author>

    <author fullname="Qiufang Ma" initials="Q." surname="Ma">
      <organization>Huawei</organization>

      <address>
        <postal>
          <street>101 Software Avenue, Yuhua District</street>

          <city>Nanjing</city>

          <region>Jiangsu</region>

          <code>210012</code>

          <country>China</country>
        </postal>

        <email>maqiufang1@huawei.com</email>
      </address>
    </author>

    <author fullname="Tianran Zhou" initials="T." surname="Zhou">
      <organization>Huawei</organization>

      <address>
        <email>zhoutianran@huawei.com</email>
      </address>
    </author>

    <date year="2022"/>

    <area>ops</area>

    <keyword>Adaptive Telemetry</keyword>

    <abstract>
      <t>IP carrier networks need to provide real-time traffic visibility to
      help network operators quickly and accurately locate network congestion
      and packet loss, and to make timely path adjustments for deterministic
      services in order to avoid congestion. It is essential to explore
      adaptive traffic data collection mechanisms that capture real-time
      network state at minimum resource consumption.</t>

      <t>This document summarizes the problems currently faced by network
      operators when attempting to provide timely traffic data collection for
      the various scenarios that require real-time network state and traffic
      visibility, and aggregates the requirements for an adaptive traffic
      data collection mechanism from a variety of deployment scenarios.</t>
    </abstract>
  </front>

  <middle>
    <section anchor="Introduction" title="Introduction">
      <t>With the advent of cloud computing, big data, and AI, as well as the
      large-scale deployment of 5G mobile communication technology, a large
      number of uRLLC services such as AR/VR, the industrial Internet, and
      computing power networks have emerged, which place higher requirements
      on the service quality of IP carrier networks. IP carrier networks need
      to provide real-time traffic visibility to help network operators
      quickly and accurately locate network congestion and packet loss, and to
      make timely path adjustments for services that require deterministic
      delay in order to avoid congested nodes and links. For such business
      scenarios, the network needs to provide traffic sampling at sub-second
      or even millisecond intervals so as to obtain real-time network
      state.</t>

      <t>For decades, SNMP and MIBs have been widely deployed and have been
      the de facto choice for many monitoring solutions, especially for
      collecting interface traffic. Arguably the biggest shortcoming of SNMP
      for those applications is its reliance on periodic polling, which
      introduces additional load on the network and devices and is brittle if
      polling cycles are missed. As a result, SNMP cannot practically deliver
      real-time traffic sampling at sub-second or millisecond intervals.
      Telemetry, a data acquisition technique based on a push mechanism that
      is able to deliver object changes as they happen, overcomes the
      limitations of SNMP such as "slow speed, low efficiency and more demands
      for processing capacity". Nevertheless, capturing real-time network
      state by persistently sampling interface traffic at millisecond
      intervals generates a considerable amount of data, which may consume too
      much transport bandwidth and overload the servers used for data
      collection, storage, and analysis. Increasing the data handling capacity
      is technically feasible but expensive, and is difficult to deploy at
      large scale in operators' networks. It is essential to explore adaptive
      traffic data collection mechanisms that capture real-time network state
      at minimum resource consumption.</t>

      <t>This document summarizes the problems currently faced by network
      operators when attempting to provide timely traffic data collection to
      satisfy the requirements of the aforementioned new scenarios that
      require real-time network state and traffic visibility. This document
      also aggregates the requirements for an adaptive traffic data
      collection mechanism from a variety of deployment scenarios.</t>

      <section title="Abbreviations">
        <t><list style="hanging">
            <t hangText="AI: ">Artificial Intelligence<vspace
            blankLines="1"/></t>

            <t hangText="AR: ">Augmented Reality<vspace blankLines="1"/></t>

            <t hangText="VR: ">Virtual Reality<vspace blankLines="1"/></t>

            <t hangText="IP RAN: ">IP Radio Access Network<vspace
            blankLines="1"/></t>

            <t hangText="DetNet: ">Deterministic Networking<vspace
            blankLines="1"/></t>

            <t hangText="QoE: ">Quality of Experience<vspace
            blankLines="1"/></t>

            <t hangText="SLA: ">Service Level Agreement<vspace
            blankLines="1"/></t>

            <t hangText="uRLLC: ">Ultra-Reliable and Low-Latency
            Communication<vspace blankLines="1"/></t>

            <t hangText="NMS: ">Network Management System<vspace
            blankLines="1"/></t>

            <t hangText="IDC: ">Internet Data Center<vspace
            blankLines="1"/></t>

            <t hangText="SNMP: ">Simple Network Management Protocol<vspace
            blankLines="1"/></t>

            <t hangText="MIB: ">Management Information Base<vspace
            blankLines="1"/></t>
          </list></t>
      </section>
    </section>

    <section anchor="terminology" title="Terminology">
      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
      "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
      "OPTIONAL" in this document are to be interpreted as described in BCP 14
      <xref target="RFC2119"/> <xref target="RFC8174"/> when, and only when,
      they appear in all capitals, as shown here.</t>

      <t>The following terms are defined in this document:<list
          style="hanging">
          <t hangText="adaptive traffic data collection: ">A mechanism that
          allows servers to automatically switch between different telemetry
          sampling periods when collecting traffic data, depending on whether
          a monitored value has crossed a configured threshold.<vspace
          blankLines="1"/></t>
        </list></t>
    </section>

    <section title="Problem Statement">
      <t>As is well known, IP networks are based on a statistical multiplexing
      model and therefore exhibit bursty traffic. For a long time, operators
      have obtained traffic visibility from the Network Management System
      (NMS) and have been satisfied with 30-40% bandwidth utilization. Despite
      such low link usage, many complaints are still received about poor QoE
      for applications that are sensitive to delay and packet loss. The
      fundamental cause is that the observed average network traffic masks
      traffic bursts, given that SNMP is widely employed in operators'
      networks to collect network traffic at 5-minute intervals.</t>
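
      <t>As a rough illustration of this masking effect, the following
      Python sketch averages synthetic utilization samples over one 5-minute
      interval. The 30% baseline is taken from the text above; the single
      100-millisecond line-rate burst and the 10-millisecond sample spacing
      are assumed values chosen only for illustration.</t>

      <figure>
        <artwork><![CDATA[
```python
# Synthetic utilization trace for one 5-minute SNMP interval.
# All burst parameters are assumptions for illustration.
BASE_UTIL = 0.30             # steady load, as in the text
BURST_UTIL = 1.00            # assumed line-rate microburst
BURST_MS = 100               # assumed burst duration
WINDOW_MS = 5 * 60 * 1000    # one SNMP polling interval
STEP_MS = 10                 # assumed sample spacing

samples = [
    BURST_UTIL if t < BURST_MS else BASE_UTIL
    for t in range(0, WINDOW_MS, STEP_MS)
]

average = sum(samples) / len(samples)
peak = max(samples)

print(f"{average:.4f}")  # 0.3002 (what a 5-minute poll reports)
print(f"{peak:.2f}")     # 1.00   (what a 10 ms sample reveals)
```
]]></artwork>
      </figure>

      <t>Under these assumptions the 5-minute average moves by less than 0.1
      percentage point, while the peak sample shows the interface at line
      rate.</t>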

      <t>A large quantity of laboratory and operational data indicates that
      microbursts occur frequently in operators' carrier networks, such as IP
      RAN, IP metropolitan networks, IP backbone networks, and IDCs. The
      typical duration of a microburst is tens to hundreds of milliseconds,
      which can easily cause instantaneous congestion of the output queue.
      Network congestion amplifies queuing delay and jitter, and may even give
      rise to packet loss. Solving the problem of network congestion has long
      been a major challenge for IP networks. Microbursts are therefore
      harmful to deterministic-delay applications. Because microbursts are
      difficult to eliminate, operators must instead attempt to avoid
      them.</t>

      <t>Although the mechanism that causes microbursts is not well
      understood, this does not prevent them from being detected. Telemetry
      (e.g., YANG-Push <xref target="RFC8639"/> <xref target="RFC8641"/> or
      gNMI <xref target="gNMI"/>) is able to collect interface traffic at a
      higher frequency, i.e., at millisecond intervals. By means of telemetry,
      the complete profile of a microburst can therefore be captured. However,
      it is impractical to obtain real-time traffic visibility at the cost of
      persistent sampling at millisecond intervals. For example, in order to
      capture a microburst on an interface, a sampling cycle of 10
      milliseconds or shorter is necessary. Compared with today's widely
      employed 5-minute SNMP sampling cycle, this increases the number of
      samples, and hence the required resources, by a factor of 30,000 (300 s
      / 10 ms).</t>
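
      <t>The arithmetic behind this comparison can be checked with a few
      lines of Python. The two sampling periods come from the text; the
      per-day sample counts are derived here only to give a sense of scale
      for a single interface counter.</t>

      <figure>
        <artwork><![CDATA[
```python
# Sampling periods taken from the text, in milliseconds.
SNMP_PERIOD_MS = 5 * 60 * 1000   # 5-minute SNMP polling cycle
BURST_PERIOD_MS = 10             # 10-millisecond telemetry cycle


def samples_per_day(period_ms: int) -> int:
    """Samples one interface counter yields in 24 hours."""
    return 24 * 60 * 60 * 1000 // period_ms


ratio = SNMP_PERIOD_MS // BURST_PERIOD_MS

print(ratio)                             # 30000
print(samples_per_day(SNMP_PERIOD_MS))   # 288
print(samples_per_day(BURST_PERIOD_MS))  # 8640000
```
]]></artwork>
      </figure>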

      <t>It is essential to investigate adaptive traffic data collection
      mechanisms that capture real-time network state at minimum resource
      consumption. That is to say, under normal non-congested network
      conditions, which hold more than 95% of the time, a minutes-level
      sampling cycle is sufficient. However, when a congestion state or
      congestion trend is detected, the sampling period must be promptly
      reduced to milliseconds to capture the microburst on the interface. A
      congestion state or congestion trend on an interface is manifested by
      packet loss due to queue overflow, queue depth beyond a threshold, or
      excessively high link utilization, all of which can be defined as
      event-triggered data. Such data can be actively pushed through
      subscription or passively polled through query. Although microbursts
      occur frequently, they are transient, so an online detection tool is
      preferable in order to find them in time. The traditional method of
      querying through the CPU on the main control board consumes considerable
      processing resources; the network device therefore needs built-in
      hardware designed specifically to monitor these conditions.</t>
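
      <t>The switching rule described above can be sketched as follows. This
      is a minimal illustration, not a specified mechanism: the names
      InterfaceState, Thresholds, is_congested and next_period_ms are
      hypothetical, and the threshold values and the 5-minute normal period
      are assumptions.</t>

      <figure>
        <artwork><![CDATA[
```python
from dataclasses import dataclass

NORMAL_PERIOD_MS = 5 * 60 * 1000  # assumed minutes-level cycle
BURST_PERIOD_MS = 10              # millisecond-level cycle


@dataclass
class InterfaceState:
    dropped_packets: int  # queue-overflow drops since last check
    queue_depth: int      # output queue depth, in packets
    utilization: float    # link utilization, 0.0 to 1.0


@dataclass
class Thresholds:
    # Illustrative values only; real ones are deployment-specific.
    queue_depth: int = 1000
    utilization: float = 0.8


def is_congested(s: InterfaceState, th: Thresholds) -> bool:
    """True if any of the three congestion indications holds."""
    return (
        s.dropped_packets > 0
        or s.queue_depth > th.queue_depth
        or s.utilization > th.utilization
    )


def next_period_ms(s: InterfaceState, th: Thresholds) -> int:
    """Choose the sampling period for the next interval."""
    if is_congested(s, th):
        return BURST_PERIOD_MS
    return NORMAL_PERIOD_MS


th = Thresholds()
print(next_period_ms(InterfaceState(0, 120, 0.35), th))   # 300000
print(next_period_ms(InterfaceState(0, 1500, 0.35), th))  # 10
print(next_period_ms(InterfaceState(3, 120, 0.35), th))   # 10
```
]]></artwork>
      </figure>

      <t>A deployed mechanism would likely also need a hold-down time before
      returning to the normal period, so that the sampling period does not
      oscillate around a threshold; that refinement is omitted here.</t>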

      <t>In order to reduce the excessive resource consumption caused by
      sending each millisecond-level sample individually, a batch of data,
      such as hundreds of traffic samples from an interface, can be packaged
      into a single telemetry packet and sent to the collector. Every traffic
      sample requires a timestamp so that the collector can visualize the
      interface traffic trend. The collector must visualize traffic in real
      time so that operators can observe it immediately.</t>
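
      <t>The batching idea can be sketched as below. The JSON layout, the
      field names "ts_ms" and "octets", the function name pack_batches, and
      the interface name "eth0" are all assumptions made for illustration; an
      actual telemetry encoding would be defined by the protocol in use.</t>

      <figure>
        <artwork><![CDATA[
```python
import json
from typing import List, Tuple

Sample = Tuple[int, int]  # (timestamp in ms, octet counter)


def pack_batches(
    interface: str, samples: List[Sample], batch_size: int
) -> List[str]:
    """Group timestamped samples into JSON telemetry messages."""
    messages = []
    for start in range(0, len(samples), batch_size):
        batch = samples[start:start + batch_size]
        body = {
            "interface": interface,
            "samples": [
                {"ts_ms": t, "octets": v} for t, v in batch
            ],
        }
        messages.append(json.dumps(body))
    return messages


# 250 samples taken every 10 ms, packed 100 per message.
# "eth0" and the counter values are made-up example data.
samples = [(1000 + 10 * i, 5000 * i) for i in range(250)]
messages = pack_batches("eth0", samples, batch_size=100)

print(len(messages))                             # 3
print(len(json.loads(messages[-1])["samples"]))  # 50
print(json.loads(messages[0])["samples"][1])
# {'ts_ms': 1010, 'octets': 5000}
```
]]></artwork>
      </figure>

      <t>Because each sample carries its own timestamp, the collector can
      reconstruct the 10-millisecond traffic curve even though only three
      messages were sent in this example.</t>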
    </section>

    <section title="Scenarios for Adaptive Traffic Data Collection">
      <t>This section presents several typical scenarios which require
      adaptive traffic data collection to gain real-time network state and
      traffic visibility at minimum resource consumption.</t>

      <section title="Multi-dimensional real-time portrait of interface traffic characteristics">
        <t>Interface traffic data collection is one of the most important
        functions of an NMS. Today, more and more applications are
        latency-sensitive and loss-sensitive, and real-time traffic visibility
        can help operators better understand network performance so as to meet
        SLA guarantees. In addition, obtaining the complete and genuine
        characteristics of interface traffic is a basic requirement of the
        statistical multiplexing model of IP networks, and is of great
        significance for traffic prediction, network planning, network
        capacity expansion, network optimization, etc. However, a traditional
        NMS based on SNMP cannot depict the genuine characteristics of
        interface traffic, so interface traffic data collection based on
        telemetry techniques is preferable.</t>

        <t>It is essential to exploit adaptive traffic data collection
        techniques to depict a multi-dimensional real-time portrait of
        interface traffic characteristics at minimum resource consumption.
        That is to say, under normal non-congested network conditions, which
        hold more than 95% of the time, a minutes-level sampling cycle is
        sufficient. However, when a congestion state or congestion trend is
        detected, the sampling cycle must be promptly reduced to milliseconds
        to capture the microburst on the interface. Such an adaptive traffic
        data collection technique not only reflects coarse-grained interface
        traffic characteristics, but also captures the congestion state of the
        interface at a finer time granularity. It will be an important tool
        for DetNet to obtain real-time network performance. Because of its
        lower cost, it can be deployed at large scale in operators'
        networks.</t>
      </section>

      <section title="Microburst traffic detection">
        <t>Microburst traffic, an instantaneous congestion phenomenon that
        occurs frequently in IP carrier networks, causes severe delay jitter
        and even packet loss, which seriously affects the QoE of
        latency-sensitive and loss-sensitive applications. The ability to
        detect microburst traffic on an interface helps network operators
        quickly and accurately locate network congestion and packet loss, and
        make timely path adjustments for deterministic-delay services in order
        to avoid congested nodes and links.</t>

        <t>Because the typical duration of a microburst is generally tens to
        hundreds of milliseconds, a sampling cycle of 10 milliseconds or
        shorter is necessary. Although microbursts occur frequently, they
        account for only a very small fraction of each 24-hour day, so
        observing them through persistent millisecond-level sampling is not a
        good approach. Preferably, a microburst is captured as soon as it
        occurs, to ensure that important diagnostic data is not missed.
        Because a microburst is transient, an online detection tool is
        required to find it in time. Triggered by promptly detected events
        such as packet loss or queue depth beyond a threshold, the sampling
        period must be promptly reduced to milliseconds to capture the
        microburst on the interface. In short, it is of practical significance
        to explore microburst detection technology that minimizes resource
        consumption.</t>
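
        <t>Once millisecond-level samples are available, locating a
        microburst in them is straightforward. The following sketch assumes a
        list of per-sample link utilization values and a utilization
        threshold of 0.9; the function name find_bursts, the threshold, and
        the sample trace are hypothetical and serve only as an example.</t>

        <figure>
          <artwork><![CDATA[
```python
from typing import List, Tuple


def find_bursts(
    util: List[float], period_ms: int, threshold: float = 0.9
) -> List[Tuple[int, int]]:
    """Return (start_ms, duration_ms) of each run above threshold."""
    bursts = []
    start = None
    for i, u in enumerate(util):
        if u > threshold and start is None:
            start = i                  # burst begins
        elif u <= threshold and start is not None:
            n = i - start
            bursts.append((start * period_ms, n * period_ms))
            start = None               # burst ends
    if start is not None:              # burst runs to end of trace
        n = len(util) - start
        bursts.append((start * period_ms, n * period_ms))
    return bursts


# Made-up trace sampled every 10 ms: two bursts of 50 and 120 ms.
util = [0.3] * 20 + [0.98] * 5 + [0.3] * 10 + [1.0] * 12 + [0.3] * 3
print(find_bursts(util, period_ms=10))
# [(200, 50), (350, 120)]
```
]]></artwork>
        </figure>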
      </section>

      <section title="Congestion avoidance for deterministic services">
        <t>Network congestion rapidly increases queuing delay and jitter, and
        may even give rise to packet loss, which seriously affects the QoE of
        delay-sensitive and loss-sensitive applications. The goal of network
        optimization is to reduce the occurrence of network congestion as much
        as possible.</t>

        <t>It is a complicated problem for network operators to accurately
        predict the trend of network congestion and adjust the network in
        advance. Real-time traffic visibility based on adaptive traffic data
        collection techniques can be used to predict long-term congestion and
        to quickly capture instantaneous congestion (i.e., microbursts) on an
        interface. By means of real-time traffic visibility, an automatic
        optimization tool (e.g., one based on AI) can make timely path
        adjustments for key traffic flows. For example, based on the real-time
        traffic visibility and microburst events (e.g., packet loss, queue
        depth) collected, the controller can predict the congestion trend of
        an interface and promptly redirect the traffic of deterministic-delay
        applications to a non-congested interface.</t>
      </section>

      <section title="On-path telemetry based on adaptive traffic sampling">
        <t>On-path telemetry is useful for application-aware networking
        operations. For example, it is critical for operators who offer
        high-bandwidth, latency-sensitive, and loss-sensitive services such as
        video streaming and online gaming to closely monitor the relevant
        flows in real time as the basis for any further optimization. Applying
        on-path telemetry to all packets of the selected flows can still be
        out of reach. A sampling rate should be set for these flows, and
        telemetry enabled only on the sampled packets. However, too high a
        rate would exhaust network resources and even cause packet drops; an
        overly low rate, on the contrary, would result in loss of information
        and inaccurate measurements.</t>

        <t>An adaptive approach can be used to dynamically adjust the sampling
        rate based on network conditions. In the normal network state, a low
        sampling rate is enough to reflect network performance. In case of
        network congestion, however, the controller becomes aware of it from
        the real-time traffic visibility and event data collected (e.g.,
        packet loss, queue depth), and promptly raises the packet sampling
        rate to a very high level. Even all packets of the selected flows may
        be sampled so as to acquire real-time measurement data such as
        latency, jitter, and packet loss.</t>
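
        <t>A minimal sketch of such rate adaptation is shown below, using a
        simple 1-in-N packet counter. The class name FlowSampler, its
        methods, and the 1-in-1000 normal rate are assumptions for
        illustration; how the congestion indication reaches the sampling
        point is outside the scope of the sketch.</t>

        <figure>
          <artwork><![CDATA[
```python
class FlowSampler:
    """Marks one packet in every `interval` packets for telemetry."""

    def __init__(self, normal_interval: int = 1000) -> None:
        # 1-in-1000 is an assumed default, not a value from the text.
        self.normal_interval = normal_interval
        self.interval = normal_interval
        self.count = 0

    def set_congested(self, congested: bool) -> None:
        # Under congestion every packet of the flow is sampled.
        self.interval = 1 if congested else self.normal_interval
        self.count = 0

    def should_sample(self) -> bool:
        self.count += 1
        if self.count >= self.interval:
            self.count = 0
            return True
        return False


sampler = FlowSampler(normal_interval=1000)
normal = sum(sampler.should_sample() for _ in range(5000))
sampler.set_congested(True)
burst = sum(sampler.should_sample() for _ in range(5000))
print(normal, burst)  # 5 5000
```
]]></artwork>
        </figure>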
      </section>
    </section>

    <section anchor="IANA" title="IANA Considerations">
      <t>This document does not include an IANA request.</t>
    </section>

    <section anchor="scecurity" title="Security Considerations">
      <t>This document describes the problems and use cases that motivate an
      adaptive telemetry mechanism for minimizing resource consumption. The
      increased complexity of network telemetry may give rise to security
      concerns. For example, persistent traffic collection at a very high rate
      (e.g., at millisecond intervals) induced by misconfiguration or spurious
      triggering might exhaust the resources of the forwarding plane and the
      control plane of a network device, as well as those of the collector;
      inappropriate threshold settings should therefore be avoided. Telemetry
      data is highly sensitive and exposes a lot of information about the
      network and its configuration. Some of that information can make
      designing attacks against the network much easier (e.g., exact details
      of what software and patches have been installed), and allows an
      attacker to determine whether a device may be subject to unprotected
      security vulnerabilities.</t>

      <t>In addition, regarding the security of telemetry interfaces, NETCONF
      or gNMI must provide authentication, data integrity, confidentiality,
      and replay protection. Further study of the security issues will be
      required.</t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      <?rfc include="reference.RFC.2119.xml"?>

      <?rfc include="reference.RFC.3688.xml"?>

      <?rfc include="reference.RFC.6020.xml"?>

      <?rfc include="reference.RFC.6242.xml"?>

      <?rfc include="reference.RFC.8639.xml"?>

      <?rfc include="reference.RFC.8641.xml"?>
    </references>

    <references title="Informative References">
      <?rfc include="reference.RFC.8174.xml"?>

      <reference anchor="gNMI" target="https://github.com/openconfig/gnmi">
        <front>
          <title>gRPC Network Management Interface (gNMI)</title>

          <author>
            <organization/>
          </author>

          <date/>
        </front>
      </reference>
    </references>
  </back>
</rfc>
