<?xml version="1.0" encoding="US-ASCII"?>
<?xml-stylesheet type="text/xsl" href="rfc2629.xslt" ?>
<!-- used by XSLT processors -->
<!-- OPTIONS, known as processing instructions (PIs) go here. -->
<!-- For a complete list and description of PIs,
     please see http://xml.resource.org/authoring/README.html. -->
<!-- Below are generally applicable PIs that most I-Ds might want to use. -->
<?rfc strict="yes" ?>
<!-- give errors regarding ID-nits and DTD validation -->
<!-- control the table of contents (ToC): -->
<?rfc toc="yes"?>
<!-- generate a ToC -->
<?rfc tocdepth="3"?>
<!-- the number of levels of subsections in ToC. default: 3 -->
<!-- control references: -->
<?rfc symrefs="yes"?>
<!-- use symbolic references tags, i.e, [RFC2119] instead of [1] -->
<?rfc sortrefs="yes" ?>
<!-- sort the reference entries alphabetically -->
<!-- control vertical white space:
     (using these PIs as follows is recommended by the RFC Editor) -->
<?rfc compact="yes" ?>
<!-- do not start each main section on a new page -->
<?rfc subcompact="no" ?>
<!-- keep one blank line between list items -->
<!-- end of popular PIs -->
<rfc category="std" docName="draft-ietf-pce-state-sync-07" ipr="trust200902"
     obsoletes="" sortRefs="true" submissionType="IETF" symRefs="true"
     tocDepth="3" tocInclude="true" updates="" version="3" xml:lang="en">
  <!-- xml2rfc v2v3 conversion 3.10.0 -->

  <front>
    <title abbrev="state-sync">Inter Stateful Path Computation Element (PCE)
    Communication Procedures</title>

    <seriesInfo name="Internet-Draft" value="draft-ietf-pce-state-sync-07"/>

    <author fullname="Stephane Litkowski" initials="S" surname="Litkowski">
      <organization>Cisco</organization>

      <address>
        <!-- postal><street/><city/><region/><code/><country/></postal -->

        <!-- <phone/> -->

        <!-- <facsimile/> -->

        <email>slitkows.ietf@gmail.com</email>

        <!-- <uri/> -->
      </address>
    </author>

    <author fullname="Siva Sivabalan" initials="S." surname="Sivabalan">
      <organization>Ciena Corporation</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <region/>

          <code/>

          <country/>
        </postal>

        <email>msiva282@gmail.com</email>
      </address>
    </author>

    <author fullname="Cheng Li" initials="C" surname="Li">
      <organization>Huawei Technologies</organization>

      <address>
        <postal>
          <street>Huawei Campus, No. 156 Beiqing Rd.</street>

          <city>Beijing</city>

          <region/>

          <code>100095</code>

          <country>China</country>
        </postal>

        <email>c.l@huawei.com</email>
      </address>
    </author>

    <author fullname="Haomian Zheng" initials="H." surname="Zheng">
      <organization>Huawei Technologies</organization>

      <address>
        <postal>
          <street>H1, Huawei Xiliu Beipo Village, Songshan Lake</street>

          <city>Dongguan</city>

          <region>Guangdong</region>

          <code>523808</code>

          <country>China</country>
        </postal>

        <phone/>

        <email>zhenghaomian@huawei.com</email>

        <uri/>
      </address>
    </author>

    <date/>

    <area>Routing</area>

    <workgroup>PCE Working Group</workgroup>

    <!-- <keyword/> -->

    <!-- <keyword/> -->

    <!-- <keyword/> -->

    <!-- <keyword/> -->

    <abstract>
      <t>The Path Computation Element (PCE) Communication Protocol (PCEP)
      provides mechanisms for PCEs to perform path computation in response to
      a Path Computation Client (PCC) request. The Stateful PCE extensions
      allow stateful control of Multi-Protocol Label Switching (MPLS) Traffic
      Engineering (TE) Label Switched Paths (LSPs) using PCEP.</t>

      <t>A Path Computation Client (PCC) can synchronize LSP state
      information to a stateful PCE, and it can maintain PCEP sessions
      towards multiple PCEs. In some use cases, stateful inter-PCE
      communication brings additional resiliency to the design, for
      instance when a PCC-PCE session fails.</t>

      <t>This document describes the procedures that allow stateful
      communication between PCEs for various use cases, as well as the
      procedures that prevent computation loops.</t>
    </abstract>
  </front>

  <middle>
    <section anchor="intro" numbered="true" toc="default">
      <name>Introduction and Problem Statement</name>

      <t>The Path Computation Element Communication Protocol (PCEP) <xref
      format="default" target="RFC5440"/> provides mechanisms for Path
      Computation Elements (PCEs) to perform path computations in response to
      requests from Path Computation Clients (PCCs).</t>

      <t>A stateful PCE <xref format="default" target="RFC8231"/> is capable
      of considering, for the purposes of path computation, not only the
      network state in terms of links and nodes (referred to as the Traffic
      Engineering Database or TED) but also the status of active services
      (previously computed paths and currently reserved resources, stored in
      the Label Switched Paths Database (LSP-DB)).</t>

      <t><xref format="default" target="RFC8051"/> describes general
      considerations for a stateful PCE deployment and examines its
      applicability and benefits, as well as its challenges and limitations
      through a number of use cases.</t>

      <t>A PCC can synchronize LSP state information to a stateful PCE. The
      stateful PCE extension allows a redundancy scenario in which a PCC has
      redundant PCEP sessions towards multiple PCEs. In such a case, the PCC
      gives control of an LSP to a single PCE only, and only that PCE is
      responsible for path computation for the delegated LSP.</t>

      <t>In some use cases, stateful inter-PCE communication brings
      additional resiliency to the design, for instance when a PCC-PCE
      session fails. Stateful inter-PCE communication may also provide a
      faster update of the LSP states when such an event occurs. Finally, in
      a redundant PCE scenario, when a set of paths that are part of a group
      must be computed (so there is a dependency between the paths), the
      computation of all paths in the group may not be handled by the same
      PCE: this situation is called a split-brain. A split-brain scenario
      may lead to computation loops between PCEs or to suboptimal path
      computation.</t>


      <t>In the scope of this document, the term 'computation loop' describes PCEP message exchanges that loop between a PCC and a PCE, or between PCEs, and cause frequent path calculations, path reports, and path updates to the network. The result is a constant load on the PCEs and oscillation of data plane traffic after each path update.</t>

      <t>This document describes the procedures that allow stateful
      communication between PCEs for various use cases, as well as the
      procedures that prevent computation loops.</t>

      <t>The examples in this section are for illustrative purposes only;
      they showcase the need for inter-PCE stateful PCEP sessions.</t>



      <section anchor="Language" numbered="true" toc="default">
        <name>Requirements Language</name>

        <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
        "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
        "OPTIONAL" in this document are to be interpreted as described in BCP
        14 <xref format="default" target="RFC2119"/> <xref format="default"
        target="RFC8174"/> when, and only when, they appear in all capitals,
        as shown here.</t>
      </section>

      <section anchor="lsp-changes" numbered="true" toc="default">
        <name>Reporting LSP Changes</name>

        <t>When using a stateful PCE (<xref format="default"
        target="RFC8231"/>), a PCC can synchronize LSP state information to
        the stateful PCE. If the PCC grants control of the LSP to the PCE
        (called delegation <xref format="default" target="RFC8231"/>), the PCE
        can update the LSP parameters at any time.</t>

        <t>In a multi-PCE deployment (redundancy, load balancing, etc.), with
        the current specification defined in <xref format="default"
        target="RFC8231"/>, when a PCE makes an update, the PCC is in charge
        of reporting the resulting LSP parameter change to all PCEs. This
        brings additional hops and delays in notifying the other PCEs of the
        LSP parameter change.</t>

        <t>This delay may affect the reaction time of the other PCEs if they
        need to take action after being notified of the LSP parameter
        change.</t>

        <t>Apart from the synchronization from the PCC, it is also useful to
        have a synchronization mechanism between the stateful PCEs. As a
        stateful PCE makes changes to its delegated LSPs, these changes
        (pending LSPs and the sticky resources <xref format="default"
        target="RFC7399"/>) can be synchronized immediately to the other
        PCEs.</t>

        <artwork align="left" alt="" name="" type=""><![CDATA[

          +----------+
          |   PCC1   |  LSP1
          +----------+
             /    \
            /      \
   +---------+    +---------+
   |  PCE1   |    |  PCE2   |
   +---------+    +---------+
           \       /
            \     /
          +----------+
          |   PCC2   |  LSP2
          +----------+
    ]]></artwork>

        <t>In the figure above, we consider a load-balanced PCE architecture,
        so PCE1 is responsible for computing paths for PCC1 and PCE2 is
        responsible for computing paths for PCC2. When PCE1 triggers an LSP
        update for LSP1, it sends a PCUpd message to PCC1 containing the new
        parameters for LSP1. PCC1 takes the parameters into account and
        sends a PCRpt message to PCE1 and PCE2 reflecting the changes.
        PCE2 is thus notified of the change only after receiving the PCRpt
        message from PCC1.</t>

        <t>Let's consider that the LSP1 parameters changed in such a way that
        LSP1 will take over resources from LSP2 with a higher priority. After
        receiving the report from PCC1, PCE2 will therefore try to find a new
        path for LSP2. If we consider that there is a round-trip delay of
        about 150 milliseconds (ms) between the PCEs and PCC1 and a round-trip
        delay of 10 ms between the two PCEs, it will take more than 150 ms for
        PCE2 to be notified of the change.</t>

        <t>Adding a PCEP session between PCE1 and PCE2 may reduce the
        synchronization time, so that PCE2 can react more quickly by taking
        the pending LSPs and attached resources into account during path
        computation and re-optimization.</t>
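
        <t>As a rough sketch of the arithmetic in this example, the snippet
        below compares the two notification paths. It assumes symmetric
        one-way delays (half of each round-trip time) and ignores any
        processing time at PCC1; the function name is illustrative only and
        is not part of PCEP.</t>

        <sourcecode type="python"><![CDATA[
```python
def notification_delay_ms(rtt_pce_pcc_ms, rtt_pce_pce_ms):
    """Return (via_pcc, direct) notification delays in milliseconds.

    ASSUMPTION: one-way delay is half the round-trip time and
    processing time is zero, so both values are lower bounds.
    """
    # Via the PCC: PCUpd from PCE1 to PCC1, then PCRpt from PCC1 to PCE2.
    via_pcc = rtt_pce_pcc_ms / 2 + rtt_pce_pcc_ms / 2
    # Direct: one message from PCE1 to PCE2 over the inter-PCE session.
    direct = rtt_pce_pce_ms / 2
    return via_pcc, direct


via_pcc, direct = notification_delay_ms(150, 10)
print(f"via PCC1: at least {via_pcc} ms, direct: about {direct} ms")
```
]]></sourcecode>

        <t>Under these assumptions, the path through PCC1 costs at least
        150 ms, whereas a direct inter-PCE notification costs about 5 ms.</t>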
      </section>

      <section anchor="split-brain" numbered="true" toc="default">
        <name>Split-Brain</name>

        <t>In a resiliency case, a PCC has redundant PCEP sessions towards
        multiple PCEs. In such a case, the PCC gives control of an LSP to a
        single PCE only, and only this PCE is responsible for the path
        computation for the delegated LSP: the PCC achieves this by setting
        the D flag only towards the active PCE <xref format="default"
        target="RFC8231"/> selected for delegation. The election of the active
        PCE to which an LSP is delegated is controlled by each PCC. The PCC
        usually elects the active PCE by a locally configured policy (by
        setting a priority). Upon PCEP session failure, or active PCE failure,
        the PCC may decide to elect a new active PCE by sending a new PCRpt
        message with the D flag set to this new active PCE. When the failed
        PCE or PCEP session comes back online, it is up to the implementation
        whether to perform preemption. Preemption may lead to some
        disruption on the existing path if the path results from both PCEs
        are not exactly the same. In a network with multiple PCCs and
        multiple stateful PCEs deployed for redundancy purposes, there is
        no guarantee that all the PCCs delegate their LSPs to the same PCE at
        any given time.</t>

        <artwork align="left" alt="" name="" type=""><![CDATA[

          +----------+
          |   PCC1   |  LSP1
          +----------+
             /    \
            /      \
   +---------+    +---------+
   |  PCE1   |    |  PCE2   |
   +---------+    +---------+
           \       /
   *fail*   \     /
          +----------+
          |   PCC2   |  LSP2
          +----------+
    ]]></artwork>

        <t>In the example above, we consider that, by configuration, both PCCs
        initially delegate their LSPs to PCE1. So, PCE1 is responsible for
        computing a path for both LSP1 and LSP2. If the PCEP session between
        PCC2 and PCE1 fails, PCC2 delegates LSP2 to PCE2. PCE1 then remains
        responsible only for the LSP1 path computation, while PCE2 is
        responsible for the path computation of LSP2. When the PCC2-PCE1
        session is back online, PCC2 keeps using PCE2 as its active PCE (no
        preemption is assumed in this example). The result is a permanent
        situation where each PCE is responsible for a subset of the path
        computations.</t>
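
        <t>The delegation behavior in this example can be sketched as
        follows. This is a minimal model, not an implementation of PCEP: the
        class, its method names, and the convention that a lower priority
        value is preferred are all assumptions made for illustration.</t>

        <sourcecode type="python"><![CDATA[
```python
class Pcc:
    """Toy PCC that delegates its LSP to one active PCE at a time."""

    def __init__(self, name, pce_priority, preempt=False):
        self.name = name
        # PCE name -> local priority (ASSUMPTION: lower value is preferred).
        self.pce_priority = pce_priority
        self.preempt = preempt
        self.up = set(pce_priority)  # PCEs with a working PCEP session
        self.active = self._best()   # PCE that currently holds the delegation

    def _best(self):
        candidates = [p for p in self.pce_priority if p in self.up]
        if not candidates:
            return None
        return min(candidates, key=self.pce_priority.get)

    def session_down(self, pce):
        self.up.discard(pce)
        if self.active == pce:
            # Re-delegate: PCRpt with the D flag set towards the new PCE.
            self.active = self._best()

    def session_up(self, pce):
        self.up.add(pce)
        # Without preemption, the current active PCE is kept.
        if self.preempt or self.active is None:
            self.active = self._best()


priorities = {"PCE1": 1, "PCE2": 2}
pcc1 = Pcc("PCC1", priorities)
pcc2 = Pcc("PCC2", priorities)

pcc2.session_down("PCE1")  # PCC2-PCE1 session fails
pcc2.session_up("PCE1")    # session comes back, no preemption

print(pcc1.active, pcc2.active)
```
]]></sourcecode>

        <t>After the failure and recovery, the two PCCs in this sketch
        delegate to different PCEs, which is the split-brain situation
        described in this section.</t>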

        <t>This situation is called a split-brain scenario, as there are
        multiple computation brains running at the same time, whereas some
        deployments and use cases require a central computation unit.</t>

        <t>Further, there are use cases where the path computation of one LSP
        is linked to the path computation of another LSP: the most common
        use cases are path disjointness (see <xref format="default"
        target="RFC8800"/>) and bidirectional LSPs (see <xref format="default"
        target="RFC9059"/>). The LSPs that depend on each other may start
        from different head-ends.</t>

        <artwork align="left" alt="" name="" type=""><![CDATA[
      _________________________________________
     /                                         \
    /        +------+            +------+       \
   |         | PCE1 |            | PCE2 |        |
   |         +------+            +------+        |
   |                                             |
   | +------+                          +------+  |
   | | PCC1 | ---------------------->  | PCC2 |  |
   | +------+                          +------+  |
   |                                             |
   |                                             |
   | +------+                          +------+  |
   | | PCC3 | ---------------------->  | PCC4 |  |
   | +------+                          +------+  |
   |                                             |
    \                                           /
     \_________________________________________/


      _________________________________________
     /                                         \
    /        +------+            +------+       \
   |         | PCE1 |            | PCE2 |        |
   |         +------+            +------+        |
   |                                             |
   | +------+           10             +------+  |
   | | PCC1 | ----- R1 ---- R2 ------- | PCC2 |  |
   | +------+       |        |         +------+  |
   |                |        |                   |
   |                |        |                   |
   | +------+       |        |         +------+  |
   | | PCC3 | ----- R3 ---- R4 ------- | PCC4 |  |
   | +------+                          +------+  |
   |                                             |
    \                                           /
     \_________________________________________/


    ]]></artwork>

        <t>In the figure above, the requirement is to create two link-disjoint
        LSPs: PCC1-&gt;PCC2 and PCC3-&gt;PCC4. In the topology, the cost
        metric of every link is set to 1 except for the link 'R1-R2', which
        has a metric of 10. The PCEs are responsible for the path computation,
        and PCE1 is the active primary PCE for all PCCs in the nominal
        case.</t>

        <t>The rest of this section lists various scenarios for illustrative purposes; there are many other cases where the solution defined in this document is applicable.</t>

        <t>Scenario 1:</t>

        <t>In the normal case (PCE1 as active primary PCE), consider that the
        PCC1-&gt;PCC2 LSP is configured first with the link disjointness
        constraint. PCE1 sends a PCUpd message to PCC1 with the ERO:
        R1-&gt;R3-&gt;R4-&gt;R2-&gt;PCC2 (shortest path). PCC1 signals and
        installs the path. When PCC3-&gt;PCC4 is configured, the PCEs already
        know the path of PCC1-&gt;PCC2 and can compute a link-disjoint path:
        the solution requires moving PCC1-&gt;PCC2 onto a new path to make
        room for the new LSP. PCE1 sends a PCUpd message to PCC1 with the new
        ERO: R1-&gt;R2-&gt;PCC2 and a PCUpd to PCC3 with the following ERO:
        R3-&gt;R4-&gt;PCC4. In the normal case, there is no issue for PCE1 to
        compute a link-disjoint path.</t>

        <t>Scenario 2:</t>

        <t>Consider that PCC1 lost its PCEP session with PCE1 (all other PCEP
        sessions are UP). PCC1 delegates its LSP to PCE2.</t>

        <artwork align="left" alt="" name="" type=""><![CDATA[

          +----------+
          |   PCC1   |  LSP: PCC1->PCC2
          +----------+
                  \
                   \ D=1
   +---------+    +---------+
   |  PCE1   |    |  PCE2   |
   +---------+    +---------+
       D=1 \       / D=0
            \     /
          +----------+
          |   PCC3   |  LSP: PCC3->PCC4
          +----------+
    ]]></artwork>

        <t>Consider that the PCC1-&gt;PCC2 LSP is configured first with the
        link disjointness constraint. PCE2 (which is the new active primary
        PCE for PCC1) sends a PCUpd message to PCC1 with the ERO:
        R1-&gt;R3-&gt;R4-&gt;R2-&gt;PCC2 (shortest path). When PCC3-&gt;PCC4
        is configured, PCE1 is not aware of LSPs from PCC1 any more, so it
        cannot compute a disjoint path for PCC3-&gt;PCC4 and sends a PCUpd
        message to PCC3 with the shortest path ERO: R3-&gt;R4-&gt;PCC4. When
        the PCC3-&gt;PCC4 LSP is reported to PCE2 by PCC3, PCE2 ensures
        disjointness and correctly moves PCC1-&gt;PCC2 (as it owns the
        delegation for this LSP) onto the following path:
        R1-&gt;R2-&gt;PCC2. With this sequence of events and these PCEP
        sessions, disjointness is ensured.</t>

        <t>Scenario 3:</t>

        <artwork align="left" alt="" name="" type=""><![CDATA[

          +----------+
          |   PCC1   |  LSP: PCC1->PCC2
          +----------+
            /     \
       D=1 /       \ D=0
   +---------+    +---------+
   |  PCE1   |    |  PCE2   |
   +---------+    +---------+
                   / D=1
                  /
          +----------+
          |   PCC3   |  LSP: PCC3->PCC4
          +----------+
    ]]></artwork>

        <t>Consider the above PCEP sessions and that the PCC1-&gt;PCC2 LSP is
        configured first with the link disjointness constraint. PCE1 computes
        the shortest path, as this is the only LSP in the disjoint association
        group that it is aware of: R1-&gt;R3-&gt;R4-&gt;R2-&gt;PCC2 (shortest
        path). When PCC3-&gt;PCC4 is configured, PCE2 must compute a disjoint
        path for this LSP. The only solution found is to move the
        PCC1-&gt;PCC2 LSP onto another path, but PCE2 cannot do so as it does
        not have the delegation for this LSP. In this set-up, the PCEs are
        not able to find a disjoint path.</t>

        <t>Scenario 4:</t>

        <artwork align="left" alt="" name="" type=""><![CDATA[

          +----------+
          |   PCC1   |  LSP: PCC1->PCC2
          +----------+
            /     \
       D=1 /       \ D=0
   +---------+    +---------+
   |  PCE1   |    |  PCE2   |
   +---------+    +---------+
        D=0 \      / D=1
             \    /
          +----------+
          |   PCC3   |  LSP: PCC3->PCC4
          +----------+
    ]]></artwork>

        <t>Consider the above PCEP sessions and that the PCEs are configured
        to fall back to the shortest path if disjointness cannot be found, as
        described in <xref format="default" target="RFC8800"/>. The
        PCC1-&gt;PCC2 LSP is configured first. PCE1 computes the shortest
        path, as this is the only LSP in the disjoint association group that
        it is aware of: R1-&gt;R3-&gt;R4-&gt;R2-&gt;PCC2 (shortest path). When
        PCC3-&gt;PCC4 is configured, PCE2 must compute a disjoint path for
        this LSP. The only solution found is to move the PCC1-&gt;PCC2 LSP
        onto another path, but PCE2 cannot do so as it does not have the
        delegation for this LSP. PCE2 then provides the shortest path for
        PCC3-&gt;PCC4: R3-&gt;R4-&gt;PCC4. When PCC3 receives the ERO, it
        reports it back to both PCEs. When PCE1 becomes aware of the
        PCC3-&gt;PCC4 path, it reruns the Constrained Shortest Path First
        (CSPF) algorithm and provides a new path for PCC1-&gt;PCC2:
        R1-&gt;R2-&gt;PCC2. The new path is reported back to all PCEs by
        PCC1. PCE2 also reruns CSPF to take the newly reported path into
        account. The new computation does not lead to any path update.</t>

        <t>Scenario 5:</t>

        <artwork align="left" alt="" name="" type=""><![CDATA[
      _____________________________________
     /                                     \
    /        +------+        +------+       \
   |         | PCE1 |        | PCE2 |        |
   |         +------+        +------+        |
   |                                         |
   | +------+         100          +------+  |
   | |      | -------------------- |      |  |
   | | PCC1 | ----- R1 ----------- | PCC2 |  |
   | +------+       |              +------+  |
   |    |           |                  |     |
   |  6 |           | 2                | 2   |
   |    |           |                  |     |
   | +------+       |              +------+  |
   | | PCC3 | ----- R3 ----------- | PCC4 |  |
   | +------+               10     +------+  |
   |                                         |
    \                                       /
     \_____________________________________/


    ]]></artwork>

        <t>Now, consider a new network topology with the same PCEP sessions as
        the previous example. Suppose that both LSPs are configured at almost
        the same time. PCE1 computes a path for PCC1-&gt;PCC2 while PCE2
        computes a path for PCC3-&gt;PCC4. As neither PCE is aware of the
        path of the other LSP in the association group (it is not reported
        yet), each PCE computes the shortest path for its LSP. PCE1 computes
        ERO: R1-&gt;PCC2 for PCC1-&gt;PCC2 and PCE2 computes ERO:
        R3-&gt;R1-&gt;PCC2-&gt;PCC4 for PCC3-&gt;PCC4. When these shortest
        paths are reported to each PCE, each PCE recomputes disjointness.
        PCE1 provides a new path for PCC1-&gt;PCC2 with ERO: PCC1-&gt;PCC2.
        PCE2 also provides a new path for PCC3-&gt;PCC4 with ERO:
        R3-&gt;PCC4. When those new paths are reported to both PCEs, this
        triggers CSPF again. PCE1 provides a new, more optimal path for
        PCC1-&gt;PCC2 with ERO: R1-&gt;PCC2 and PCE2 also provides a more
        optimal path for PCC3-&gt;PCC4 with ERO:
        R3-&gt;R1-&gt;PCC2-&gt;PCC4. So we come back to the initial state.
        When those paths are reported to both PCEs, this triggers CSPF
        again. An infinite loop of CSPF computation then occurs, with a
        permanent flap of paths, because of the split-brain situation.</t>

        <t>Another common example is two LSPs with link-diverse paths that share a common node but are delegated to different PCEs. If the common node fails, both PCEs detect the failure, and each could independently compute a new path; both new paths might then use the same link.</t>

        <t>This permanent computation loop comes from the inconsistency
        between the state of the LSPs as seen by each PCE due to the
        split-brain: each PCE tries to modify its delegated path at the same
        time, based on the last received path information, which de facto
        invalidates that received path information.</t>
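
        <t>The loop in Scenario 5 can be reproduced with a small simulation.
        The sketch below is illustrative only: it models CSPF as a plain
        Dijkstra search that excludes the links used by the other LSP, it
        lets both PCEs recompute in lockstep from the paths reported in the
        previous round, and it assumes that every link without a label in
        the Scenario 5 figure has a metric of 1. All function names are
        hypothetical.</t>

        <sourcecode type="python"><![CDATA[
```python
import heapq

# Link metrics read from the Scenario 5 figure.
# ASSUMPTION: links without a label in the figure cost 1.
LINKS = {
    ("PCC1", "PCC2"): 100,
    ("PCC1", "R1"): 1,
    ("R1", "PCC2"): 1,
    ("PCC1", "PCC3"): 6,
    ("R1", "R3"): 2,
    ("PCC2", "PCC4"): 2,
    ("PCC3", "R3"): 1,
    ("R3", "PCC4"): 10,
}


def path_links(path):
    """Return the set of undirected links used by a node path."""
    return {frozenset(hop) for hop in zip(path, path[1:])}


def cspf(src, dst, avoid):
    """Shortest path from src to dst that uses no link in `avoid`."""
    adj = {}
    for (a, b), cost in LINKS.items():
        if frozenset((a, b)) not in avoid:
            adj.setdefault(a, []).append((b, cost))
            adj.setdefault(b, []).append((a, cost))
    queue = [(0, [src])]
    visited = set()
    while queue:
        dist, path = heapq.heappop(queue)
        node = path[-1]
        if node == dst:
            return path
        if node in visited:
            continue
        visited.add(node)
        for nxt, cost in adj.get(node, []):
            if nxt not in visited:
                heapq.heappush(queue, (dist + cost, path + [nxt]))
    return None  # no path satisfies the constraint


def simulate(rounds):
    """Both PCEs recompute in parallel from the last reported paths."""
    lsp1, lsp2 = None, None  # nothing has been reported yet
    history = []
    for _ in range(rounds):
        # PCE1 owns PCC1->PCC2, PCE2 owns PCC3->PCC4.
        new1 = cspf("PCC1", "PCC2", path_links(lsp2) if lsp2 else set())
        new2 = cspf("PCC3", "PCC4", path_links(lsp1) if lsp1 else set())
        lsp1, lsp2 = new1, new2
        history.append((lsp1, lsp2))
    return history


history = simulate(4)
for number, (lsp1, lsp2) in enumerate(history, start=1):
    print(number, "->".join(lsp1), "|", "->".join(lsp2))
```
]]></sourcecode>

        <t>In this sketch, the pair of paths computed in round 3 is the same
        as in round 1, and round 4 repeats round 2, so the two PCEs never
        settle on a stable pair of paths.</t>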

        <t>Scenario 6: multi-domain</t>

        <artwork align="left" alt="" name="" type=""><![CDATA[
         Domain/Area 1        Domain/Area 2
      ________________      ________________
     /                \    /                \
    /        +------+ |   |  +------+        \
   |         | PCE1 | |   |  | PCE3 |        |
   |         +------+ |   |  +------+        |
   |                  |   |                  |
   |         +------+ |   |  +------+        |
   |         | PCE2 | |   |  | PCE4 |        |
   |         +------+ |   |  +------+        |
   |                  |   |                  |
   | +------+         |   |        +------+  |
   | | PCC1 |         |   |        | PCC2 |  |
   | +------+         |   |        +------+  |
   |                  |   |                  |
   |                  |   |                  |
   | +------+         |   |        +------+  |
   | | PCC3 |         |   |        | PCC4 |  |
   | +------+         |   |        +------+  |
    \                 |   |                  |
     \_______________/     \________________/


    ]]></artwork>

        <t>In the example above, suppose that disjoint LSPs from PCC1 to
        PCC2 and from PCC4 to PCC3 are created. All the PCEs have
        knowledge of both domain topologies (e.g., using BGP-LS <xref
        format="default" target="RFC9552"/>). For operational and management
        reasons, each domain uses its own group of redundant PCEs. PCE1/PCE2
        in domain 1 have PCEP sessions with PCC1 and PCC3, while PCE3/PCE4 in
        domain 2 have PCEP sessions with PCC2 and PCC4. As PCE1/PCE2 do not
        know about LSPs from PCC2/PCC4 and PCE3/PCE4 do not know about LSPs
        from PCC1/PCC3, it is not possible to satisfy the disjointness
        constraint. This scenario can also be seen as a split-brain scenario.
        This multi-domain architecture (with multiple groups of PCEs) can also
        be used in a single domain, where an operator wants to limit the
        failure domain by creating multiple groups of PCEs, each maintaining a
        subset of PCCs. As in the multi-domain example, it is not possible to
        compute disjoint paths starting from head-ends managed by different
        PCE groups.</t>

        <t>This document specifies a solution that makes it possible to
        compute paths with LSP-association-based constraints (like
        disjointness) in split-brain scenarios while preventing computation
        loops.</t>
      </section>

      <section anchor="H-PCE" numbered="true" toc="default">
        <name>Applicability to H-PCE</name>

        <t><xref format="default" target="RFC8751"/> describes general
        considerations and use cases for the deployment of Stateful PCE(s)
        using the Hierarchical PCE <xref format="default" target="RFC6805"/>
        architecture. In this architecture, there is a clear need to
        communicate between a child stateful PCE and a parent stateful PCE.
        The procedures and extensions as described in <xref format="default"
        target="procedures"/> are equally applicable to the H-PCE
        scenario.</t>
      </section>
    </section>

    <section anchor="solution" numbered="true" toc="default">
      <name>Solution</name>

      <t>The solution specified in this document is based on:</t>

      <ul spacing="normal">
        <li>The creation of the inter-PCE stateful PCEP session with specific
        procedures.</li>

        <li>A Primary/Secondary relationship between stateful PCEs.</li>
      </ul>

      <t>The solution builds upon the protocol extensions for stateful PCE in
      <xref format="default" target="RFC8231"/>, synchronization optimizations in <xref format="default" target="RFC8232"/>, and PCE-initiation in <xref format="default" target="RFC8281"/>.</t>

      <section anchor="state-sync-session" numbered="true" toc="default">
        <name>State-sync Session</name>

        <t>This document specifies a mechanism to set up a PCEP session
        between stateful PCEs. Creating such a session is already allowed in
        multiple scenarios, like the ones described in <xref format="default"
        target="RFC4655"/> (multiple PCEs that each handle part of the path
        computation) and <xref format="default" target="RFC6805"/>
        (hierarchical PCE), but those scenarios focus only on stateless PCEP
        sessions. As a stateful PCE brings additional features (LSP state
        synchronization, path update, delegation, etc.), some new
        behaviors need to be defined.</t>

        <t>This inter-PCE PCEP session allows the exchange of LSP states
        between PCEs, which helps in scenarios where PCEP sessions between a
        PCC and a PCE are lost. This inter-PCE PCEP session is henceforth
        called a state-sync session.</t>

        <t>For example, in the scenario below, there is no possibility to
        compute disjointness as there is no PCE that is aware of both
        LSPs.</t>

        <artwork align="left" alt="" name="" type=""><![CDATA[

          +----------+
          |   PCC1   |  LSP: PCC1->PCC2
          +----------+
            /
       D=1 /
   +---------+       +---------+
   |  PCE1   |       |  PCE2   |
   +---------+       +---------+
                     / D=1
                    /
          +----------+
          |   PCC3   |  LSP: PCC3->PCC4
          +----------+
    ]]></artwork>

        <t>If we add a state-sync session, PCE1 will be able to do state
        synchronization via PCRpt messages for its LSP to PCE2 and PCE2 will
        do the same. All the PCEs will be aware of all LSPs even if a
        PCC-&gt;PCE session is down. PCEs will then be able to compute
        disjoint paths.</t>

        <artwork align="left" alt="" name="" type=""><![CDATA[

          +----------+
          |   PCC1   |  LSP : PCC1->PCC2
          +----------+
            /
       D=1 /
   +---------+ PCEP  +---------+
   |  PCE1   | ----- |  PCE2   |
   +---------+       +---------+
                     / D=1
                    /
          +----------+
          |   PCC3   |  LSP : PCC3->PCC4
          +----------+
    ]]></artwork>

        <t>The procedures associated with this state-sync session are defined
        in <xref format="default" target="procedures"/>.</t>

        <t>Adding this state-sync session alone does not ensure that a
        path with LSP-association-based constraints can always be computed,
        nor does it prevent computation loops, but it increases resiliency and
        ensures that PCEs have the state information for all LSPs. This
        session also allows a PCE to update the other PCEs, providing a
        faster synchronization mechanism than relying on PCCs only.</t>
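
        <t>The effect of a state-sync session on the LSP databases can be
        sketched as follows. This is a simplified model under stated
        assumptions: the class and method names are hypothetical, a report is
        reduced to an LSP name and a path, and a report learned over a
        state-sync session is stored but not relayed any further.</t>

        <sourcecode type="python"><![CDATA[
```python
class Pce:
    """Toy stateful PCE with an LSP-DB and state-sync peers."""

    def __init__(self, name):
        self.name = name
        self.lsp_db = {}  # LSP name -> last reported path
        self.peers = []   # PCEs reachable over a state-sync session

    def on_pcc_report(self, lsp, path):
        """Store a PCRpt received from a PCC and relay it to the peers."""
        self.lsp_db[lsp] = path
        for peer in self.peers:
            peer.on_state_sync_report(lsp, path)

    def on_state_sync_report(self, lsp, path):
        """Store a PCRpt received over a state-sync session.

        ASSUMPTION: such a report is not relayed to other peers.
        """
        self.lsp_db[lsp] = path


pce1, pce2 = Pce("PCE1"), Pce("PCE2")
pce1.peers.append(pce2)  # state-sync session between PCE1 and PCE2
pce2.peers.append(pce1)

# Each PCC only has a session with one PCE, as in the figure above.
pce1.on_pcc_report("PCC1->PCC2", ["R1", "R3", "R4", "R2", "PCC2"])
pce2.on_pcc_report("PCC3->PCC4", ["R3", "R4", "PCC4"])

print(sorted(pce1.lsp_db), sorted(pce2.lsp_db))
```
]]></sourcecode>

        <t>In this sketch, both PCEs end up with both LSPs in their LSP-DB
        even though each PCC reports to a single PCE.</t>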
      </section>

      <section anchor="masterslave" numbered="true" toc="default">
        <name>Primary/Secondary Relationship between PCEs</name>

        <t>As seen in <xref format="default" target="intro"/>, performing a
        path computation in a split-brain scenario (multiple PCEs responsible
        for computation) may provide a non-optimal LSP placement, no path, or
        computation loops. To provide the best efficiency, an LSP association
        constraint-based computation requires that a single PCE performs the
        path computation for all LSPs in the association group. Depending on
        the deployment scenario, the set of LSPs delegated to a single PCE
        could be all LSPs belonging to a particular association group, all
        LSPs from a particular PCC, or all LSPs in the network.</t>

        <t>This document specifies a priority mechanism between PCEs to
        elect a single computing 'primary' PCE. Using this priority mechanism,
        PCEs can agree on the PCE that will be responsible for the computation
        for a particular association group, or set of LSPs. The priority could
        be set per association, per PCC, or for all LSPs. The rest of
        the text considers the association group as an example.</t>

        <t>When a single PCE is performing the computation for a particular
        association group, no computation loop can happen and an optimal
        placement will be provided. The other PCEs will only act as state
        collectors and forwarders.</t>

        <t>In the scenario described in <xref format="default"
        target="state-sync-session"/>, PCE1 and PCE2 will decide that PCE1
        will be responsible for the path computation of both LSPs. If we first
        configure PCC1-&gt;PCC2, PCE1 computes the shortest path, as it is the
        only LSP in the disjoint-group that it is aware of:
        R1-&gt;R3-&gt;R4-&gt;R2-&gt;PCC2. When PCC3-&gt;PCC4
        is configured, PCE2 does not perform the computation even though it
        holds the delegation; instead, it forwards the delegation via a PCRpt
        message to PCE1 through the state-sync session. PCE1 then performs the
        disjointness computation, moves PCC1-&gt;PCC2 onto R1-&gt;R2-&gt;PCC2,
        and provides an ERO to PCE2 for PCC3-&gt;PCC4: R3-&gt;R4-&gt;PCC4.
        PCE2 then updates PCC3 with the new path.</t>
      </section>
    </section>

    <section anchor="procedures" numbered="true" toc="default">
      <name>Procedures and Protocol Extensions</name>

      <section anchor="open" numbered="true" toc="default">
        <name>Opening a state-sync session</name>

        <section anchor="capability" numbered="true" toc="default">
          <name>Capability Advertisement</name>

          <t>A PCE indicates its support of state-sync procedures during the
          PCEP Initialization phase <xref format="default" target="RFC5440"/>.
          The OPEN object in the Open message MUST contain the "Stateful PCE
          Capability" TLV defined in <xref format="default"
          target="RFC8231"/>. A new P (INTER-PCE-CAPABILITY) flag is
          introduced to indicate the support of state-sync.</t>

          <!--<t>
            The format of the STATEFUL-PCE-CAPABILITY TLV is shown in the following figure:
            <figure>
            <artwork>
      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |               Type            |            Length=4           |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |              Flags                              |P|F|D|T|I|S|U|
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            </artwork>
            </figure>-->

          <t>This document adds a new bit in the Flags field:</t>

          <ul spacing="normal">
            <li>P (INTER-PCE-CAPABILITY - 1 bit - TBD4): If set to 1 by a PCEP
            Speaker, the PCEP Speaker indicates that the session MUST follow
            the state-sync procedures as described in this document. The P bit
            MUST be set by both speakers: if a PCEP Speaker receives a
            STATEFUL-PCE-CAPABILITY TLV with P=0 while it advertised P=1, or if
            both speakers set the P flag to 0, the session SHOULD be set up but
            the state-sync procedures MUST NOT be applied on this session.</li>
          </ul>

          <t>The U flag <xref format="default" target="RFC8231"/> MUST be set
          when sending the STATEFUL-PCE-CAPABILITY TLV with the P flag set. In
          case the U flag is not set along with the P flag, the state sync
          capability is not enabled and it is considered as if the P flag is
          not set. The S flag MAY be set if optimized synchronization is
          required as per <xref format="default" target="RFC8232"/>.</t>
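
          <t>As a non-normative illustration, the negotiation outcome
          described above could be sketched as follows. The class and
          function names are hypothetical and are not defined by this
          document; only the P and U flag semantics come from the text.</t>

          <sourcecode name="" type="python"><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatefulCapability:
    """Flags of interest from the STATEFUL-PCE-CAPABILITY TLV."""
    p_flag: bool  # INTER-PCE-CAPABILITY (bit TBD4 in this document)
    u_flag: bool  # LSP-UPDATE-CAPABILITY (RFC 8231)


def effective_p(cap: StatefulCapability) -> bool:
    """P without U is treated as if P were not set."""
    return cap.p_flag and cap.u_flag


def state_sync_enabled(local: StatefulCapability,
                       remote: StatefulCapability) -> bool:
    """State-sync applies only when both speakers effectively set P."""
    return effective_p(local) and effective_p(remote)
```
]]></sourcecode>

          <t>In this sketch, a session where either side omits P (or sets P
          without U) is still established, but is handled as an ordinary
          PCEP session.</t>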
        </section>
      </section>

      <section anchor="sync" numbered="true" toc="default">
        <name>State Synchronization</name>

        <t>When the state sync capability has been negotiated between stateful
        PCEs, each PCEP speaker will behave as a PCE and as a PCC at the same
        time regarding the state synchronization as defined in <xref
        format="default" target="RFC8231"/>. This means that each PCEP
        Speaker:</t>

        <ul spacing="normal">
          <li>MUST send a PCRpt message towards its neighbor with S flag set
          for each LSP in its LSP database learned from a PCC. (PCC role)</li>

          <li>MUST send the End Of Synchronization Marker towards its neighbor
          when all LSPs have been reported. (PCC role)</li>

          <li>MUST wait for the LSP synchronization from its neighbor to end
          (receiving an End Of Synchronization Marker). (PCE role)</li>
        </ul>

        <t>The process of synchronization runs in parallel on each PCE (with
        no defined order).</t>

        <t>The optimized state synchronization procedures MAY be used, as
        defined in <xref format="default" target="RFC8232"/>.</t>

        <t>When a PCEP Speaker sends a PCRpt on a state-sync session, it MUST
        add the SPEAKER-ENTITY-ID TLV (defined in <xref format="default"
        target="RFC8232"/>) in the LSP Object; the value used refers to
        the 'owner' PCC of the LSP. If a PCEP Speaker receives a PCRpt on a
        state-sync session without this TLV, it MUST discard the PCRpt message
        and MUST reply with a PCErr message using error-type=6 (Mandatory
        Object missing) and error-value=TBD1 (SPEAKER-ENTITY-ID TLV
        missing).</t>
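
        <t>The receive-side check for the SPEAKER-ENTITY-ID TLV might look
        like the following non-normative sketch. The function name and the
        return convention are assumptions made for illustration; the
        error-value is kept as the string "TBD1" because no code point has
        been assigned.</t>

        <sourcecode name="" type="python"><![CDATA[
```python
from typing import Optional, Tuple

ERROR_TYPE_MANDATORY_OBJECT_MISSING = 6
# Placeholder: the actual error-value is not yet assigned (TBD1).
ERROR_VALUE_SPEAKER_ENTITY_ID_MISSING = "TBD1"


def check_state_sync_report(
    speaker_entity_id: Optional[bytes],
) -> Optional[Tuple[int, str]]:
    """Validate a PCRpt received on a state-sync session.

    Returns None when the report can be processed, or the
    (error-type, error-value) pair to send in a PCErr when the
    SPEAKER-ENTITY-ID TLV is absent and the report is discarded.
    """
    if speaker_entity_id is None:
        return (ERROR_TYPE_MANDATORY_OBJECT_MISSING,
                ERROR_VALUE_SPEAKER_ENTITY_ID_MISSING)
    return None
```
]]></sourcecode>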
      </section>

      <section anchor="updates" numbered="true" toc="default">
        <name>Incremental Updates and Report Forwarding Rules</name>

        <t>During the life of an LSP, its state may change (path, constraints,
        operational state...) and a PCC will advertise a new PCRpt to the PCE
        for each such change.</t>

        <t>When propagating LSP state changes from a PCE to other PCEs, it is
        mandatory to ensure that a PCE always uses the freshest state coming
        from the PCC.</t>

        <t>When a PCE receives a new PCRpt from a PCC that carries an
        LSP-DB-VERSION TLV, the PCE MUST forward the PCRpt to all its
        state-sync sessions and MUST add the appropriate SPEAKER-ENTITY-ID TLV
        in the PCRpt. In addition, it MUST add a new ORIGINAL-LSP-DB-VERSION
        TLV (described below), which contains the LSP-DB-VERSION coming from
        the PCC.</t>

        <t>When a PCE receives a new PCRpt from a PCC without the
        LSP-DB-VERSION TLV, it SHOULD NOT forward the PCRpt on any state-sync
        session and SHOULD log such an event on its first occurrence.</t>

        <t>When a PCE receives a new PCRpt from a PCC with the R flag (Remove)
        set and an LSP-DB-VERSION TLV, the PCE MUST forward the PCRpt to all
        its state-sync sessions keeping the R flag set (Remove) and MUST add
        the appropriate SPEAKER-ENTITY-ID TLV and ORIGINAL-LSP-DB-VERSION TLV
        in the PCRpt message.</t>

        <t>When a PCE receives a PCRpt from a state-sync session, it MUST NOT
        forward the PCRpt to other state-sync sessions. This helps to prevent
        message loops between PCEs. As a consequence, a full mesh of PCEP
        sessions between PCEs is REQUIRED.</t>
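
        <t>The forwarding rules above can be summarized by the following
        non-normative sketch. The names are hypothetical, and the sketch
        treats the SHOULD NOT for reports lacking an LSP-DB-VERSION as a
        strict rule, which is a simplifying assumption.</t>

        <sourcecode name="" type="python"><![CDATA[
```python
from enum import Enum


class SessionKind(Enum):
    PCC = "pcc"                # regular PCEP session with a PCC
    STATE_SYNC = "state-sync"  # inter-PCE state-sync session


def should_forward_report(received_on: SessionKind,
                          has_lsp_db_version: bool) -> bool:
    """Decide whether a received PCRpt is relayed to state-sync peers."""
    if received_on is SessionKind.STATE_SYNC:
        # Never re-forward a report learned from another PCE; this is
        # what prevents message loops and why a full mesh is needed.
        return False
    # Reports from a PCC are relayed only when they carry the version
    # needed to build the ORIGINAL-LSP-DB-VERSION TLV.
    return has_lsp_db_version
```
]]></sourcecode>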

        <t>When a PCRpt is forwarded, all the original objects and values are
        kept. As an example, the PLSP-ID used in the forwarded PCRpt will be
        the same as the original one used by the PCC. Thus an implementation
        supporting this document MUST consider SPEAKER-ENTITY-ID TLV and
        PLSP-ID together to uniquely identify an LSP on the state-sync
        session.</t>

        <t>The ORIGINAL-LSP-DB-VERSION TLV is encoded as follows and MUST
        always contain the LSP-DB-VERSION received from the owner PCC of the
        LSP:</t>

        <artwork align="left" alt="" name="" type=""><![CDATA[
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Type=TBD2           |            Length=8           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 LSP State DB Version Number                   |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        ]]></artwork>

        <t>Using the ORIGINAL-LSP-DB-VERSION TLV allows a PCE to keep using
        optimized synchronization (<xref format="default" target="RFC8232"/>)
        with another PCE. In such a case, the PCE will send a PCRpt to another
        PCE with both ORIGINAL-LSP-DB-VERSION TLV and LSP-DB-VERSION TLV. The
        ORIGINAL-LSP-DB-VERSION TLV will contain the version number as
        allocated by the PCC while the LSP-DB-VERSION will contain the version
        number allocated by the local PCE.</t>
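
        <t>As a non-normative illustration of the encoding shown above, the
        TLV could be serialized with a 2-byte Type, a 2-byte Length, and a
        64-bit version in network byte order. The type value used here is a
        hypothetical placeholder, since the real code point is TBD2, and the
        function names are assumptions.</t>

        <sourcecode name="" type="python"><![CDATA[
```python
import struct

# Hypothetical placeholder; the real code point is TBD2.
PLACEHOLDER_TLV_TYPE = 0xFF00
MAX_VERSION = 0xFFFFFFFFFFFFFFFF


def encode_original_lsp_db_version(
        version: int, tlv_type: int = PLACEHOLDER_TLV_TYPE) -> bytes:
    """Build the 12-byte TLV: Type, Length=8, 64-bit version."""
    if not 0 <= version <= MAX_VERSION:
        raise ValueError("version must fit in 64 bits")
    return struct.pack("!HHQ", tlv_type, 8, version)


def decode_original_lsp_db_version(
        data: bytes, tlv_type: int = PLACEHOLDER_TLV_TYPE) -> int:
    """Parse the TLV and return the version allocated by the PCC."""
    if len(data) != 12:
        raise ValueError("unexpected TLV size")
    got_type, length, version = struct.unpack("!HHQ", data)
    if got_type != tlv_type or length != 8:
        raise ValueError("not an ORIGINAL-LSP-DB-VERSION TLV")
    return version
```
]]></sourcecode>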
      </section>

      <section anchor="maintenance" numbered="true" toc="default">
        <name>Maintaining LSP States from Different Sources</name>

        <t>When a PCE receives a PCRpt on a state-sync session, it stores the
        LSP information into the original PCC address context (as the LSP
        belongs to the PCC). A PCE SHOULD maintain a single state for a
        particular LSP and SHOULD maintain the list of sources it learned a
        particular state from.</t>

        <t>A PCEP speaker may receive state information for a particular LSP
        from different sources: the PCC that owns the LSP (through a regular
        PCEP session) and some PCEs (through PCEP state-sync sessions). A PCEP
        speaker MUST always keep the freshest state in its LSP database,
        overriding the previously received information.</t>

        <t>A PCE, receiving a PCRpt from a PCC, updates the state of the LSP
        in its LSP-DB with the newly received information. When receiving a
        PCRpt from another PCE, a PCE SHOULD update the LSP state only if the
        ORIGINAL-LSP-DB-VERSION present in the PCRpt is newer than the
        ORIGINAL-LSP-DB-VERSION of the stored LSP state, taking wrap-around
        into account. This ensures that a PCE never overwrites its stored LSP
        state with older information. Each time a PCE updates an LSP state in
        its LSP-DB, it SHOULD reset the source list associated with the LSP
        state and SHOULD add the source speaker address to the source list.
        When a PCE receives a PCRpt whose ORIGINAL-LSP-DB-VERSION (if coming
        from a PCE) or LSP-DB-VERSION (if coming from the PCC) is equal to the
        ORIGINAL-LSP-DB-VERSION of the stored LSP state, it SHOULD add the
        source speaker address to the source list.</t>

        <t>When a PCE receives a PCRpt requesting an LSP deletion from a
        particular source, it SHOULD remove this particular source from the
        list of sources associated with this LSP.</t>

        <t>When the list of sources becomes empty for a particular LSP, the
        LSP state MUST be removed. This means that all the sources must send a
        PCRpt with R=1 for an LSP to make the PCE remove the LSP state.</t>
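
        <t>The source-list bookkeeping described above might be sketched as
        follows (non-normative). All names are hypothetical. Two
        simplifications are assumed: the same version comparison is applied
        to every source, including the owner PCC, and wrap-around is handled
        with a serial-number style comparison modulo 2^64, which is one
        possible approach and is not mandated by this document.</t>

        <sourcecode name="" type="python"><![CDATA[
```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

MODULUS = 1 << 64


def is_newer(candidate: int, stored: int) -> bool:
    """Assumed wrap-around rule: newer if ahead by less than 2**63."""
    diff = (candidate - stored) % MODULUS
    return 0 < diff < MODULUS // 2


@dataclass
class LspState:
    version: int  # version allocated by the owner PCC
    sources: Set[str] = field(default_factory=set)


class LspDb:
    """LSP states keyed by (owner PCC, PLSP-ID)."""

    def __init__(self) -> None:
        self.states: Dict[Tuple[str, int], LspState] = {}

    def on_report(self, owner: str, plsp_id: int, version: int,
                  source: str, remove: bool = False) -> None:
        key = (owner, plsp_id)
        state = self.states.get(key)
        if remove:
            # R=1: drop this source; delete the LSP once none remain.
            if state is not None:
                state.sources.discard(source)
                if not state.sources:
                    del self.states[key]
            return
        if state is None or is_newer(version, state.version):
            # Fresher state: replace it and reset the source list.
            self.states[key] = LspState(version, {source})
        elif version == state.version:
            # Same state learned from one more source.
            state.sources.add(source)
        # Otherwise the report is stale and is ignored.
```
]]></sourcecode>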

        <t>Note that a PCC uses the Open message exchange during PCEP session establishment to inform the PCE about its capabilities and parameters. Currently, there is no mechanism to pass that information to other PCEs via the state-sync session.</t>
      </section>

      <section anchor="priority" numbered="true" toc="default">
        <name>Computation Priority between PCEs and Sub-delegation</name>

        <t>A computation priority is necessary to ensure that a single PCE
        will perform the computation for all the LSPs in an association group:
        this will allow for a more optimized LSP placement and will prevent
        computation loops.</t>

        <t>All PCEs in the network that are handling LSPs in a common LSP
        association group SHOULD be aware of each other, including the
        computation priority of each PCE. Note that there is no need for a PCC
        to be aware of this. The computation priority is a number, and the PCE
        having the highest priority MUST be responsible for the computation.
        If several PCEs have the same priority value, their IP addresses MUST
        be used as a tie-breaker to provide a rank: the PCE with the highest
        IP address has the higher priority.</t>
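
        <t>The election rule above could be sketched as follows
        (non-normative). The function name and the input shape are
        assumptions, and the sketch assumes that all candidate addresses
        belong to the same address family, since this document does not say
        how IPv4 and IPv6 addresses would be ranked against each other.</t>

        <sourcecode name="" type="python"><![CDATA[
```python
import ipaddress
from typing import Iterable, Tuple


def elect_computing_pce(pces: Iterable[Tuple[str, int]]) -> str:
    """Return the address of the PCE responsible for computation.

    Each entry is (IP address, computation priority). The highest
    priority wins; ties are broken by the numerically highest address.
    """
    winner = max(
        pces,
        key=lambda pce: (pce[1], ipaddress.ip_address(pce[0])),
    )
    return winner[0]
```
]]></sourcecode>

        <t>Addresses are compared numerically rather than as strings, so
        that 192.0.2.10 ranks above 192.0.2.9.</t>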

        <t>The computation priorities could be set through local
        configuration. The priority for local and remote PCEs could be set at
        a global level, so that the highest priority PCE handles all path
        computations, or at a more granular level, so that a PCE has the
        highest priority for only a subset of LSPs or association groups. See
        <xref target="control"/> for more details. In the future, PCEs could
        also advertise and discover these parameters via PCEP; those details
        are out of the scope of this document and left for future
        specification.</t>

        <t>A PCEP Speaker that receives a PCRpt from a PCC with the D flag set
        and that does not have the highest computation priority SHOULD forward
        the PCRpt on all state-sync sessions (as per <xref format="default"
        target="updates"/>) and SHOULD set the D flag on the state-sync session
        towards the highest priority PCE; the D flag is unset on all other
        state-sync sessions. This behavior is similar to the delegation
        behavior handled at the PCC side and is called a sub-delegation (the
        PCE sub-delegates the control of the LSP to another PCE). When a PCEP
        Speaker sub-delegates an LSP to another PCE, it loses control of the
        LSP and can no longer update it by its own decision. When a PCE
        receives a PCRpt with the D flag set on a state-sync session, it is
        granted control over the LSP, as a regular PCE would be.</t>

        <t>If the highest priority PCE fails or if the state-sync session
        between the local PCE and the highest priority PCE fails, the local
        PCE MAY decide to delegate the LSP to the next highest priority PCE or
        to take back control of the LSP. This is a local policy decision.</t>

        <t>When a PCE has the delegation for an LSP and needs to update this
        LSP, it MUST send a PCUpd message to all state-sync sessions and to
        the PCC session on which it received the delegation. The D-Flag would
        be unset in the PCUpd for state-sync sessions whereas the D-Flag would
        be set for the PCC. In the case of sub-delegation, the computing PCE
        will send the PCUpd only to all state-sync sessions (as it has no
        direct delegation from a PCC). The D-Flag would be set for the
        state-sync session to the PCE that sub-delegated this LSP and the
        D-Flag would be unset for other state-sync sessions.</t>
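
        <t>Both cases above follow one pattern: the PCUpd goes to every
        state-sync session plus the session that granted the delegation, and
        only that session sees the D flag set. A non-normative sketch, with
        hypothetical names and session identifiers represented as plain
        strings, could be:</t>

        <sourcecode name="" type="python"><![CDATA[
```python
from typing import Dict, Iterable


def pcupd_d_flags(state_sync_peers: Iterable[str],
                  delegated_by: str) -> Dict[str, bool]:
    """Map each session that receives the PCUpd to its D flag.

    delegated_by is the PCC session for a direct delegation, or the
    state-sync peer that sub-delegated the LSP.
    """
    targets = list(state_sync_peers)
    if delegated_by not in targets:
        # Direct delegation: the PCC session is an extra target.
        targets.append(delegated_by)
    return {target: target == delegated_by for target in targets}
```
]]></sourcecode>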

        <t>The PCUpd sent over a state-sync session MUST contain the
        SPEAKER-ENTITY-ID TLV in the LSP Object (the value used must identify
        the target PCC). The PLSP-ID used is the original PLSP-ID generated by
        the PCC and learned from the forwarded PCRpt. If a PCE receives a
        PCUpd on a state-sync session without the SPEAKER-ENTITY-ID TLV, it
        MUST discard the PCUpd and MUST reply with a PCErr message using
        error-type=6 (Mandatory Object missing) and error-value=TBD1
        (SPEAKER-ENTITY-ID TLV missing).</t>

        <t>When a PCE receives a valid PCUpd on a state-sync session, it
        SHOULD forward the PCUpd to the appropriate PCC (identified based on
        the SPEAKER-ENTITY-ID TLV value) that delegated the LSP originally and
        SHOULD remove the SPEAKER-ENTITY-ID TLV from the LSP Object. The
        acknowledgment of the PCUpd is done through a cascaded mechanism, and
        the PCC alone is responsible for triggering the acknowledgment:
        when the PCC receives the PCUpd from the local PCE, it acknowledges it
        with a PCRpt as per <xref format="default" target="RFC8231"/>. When
        receiving the new PCRpt from the PCC, the local PCE uses the defined
        forwarding rules on the state-sync session so the acknowledgment is
        relayed to the computing PCE.</t>

<section anchor="pri-assoc" numbered="true" toc="default">
        <name>Association Group</name>

        <t>All LSPs belonging to the same association group SHOULD have the
        same computation priorities for the PCEs. A PCE SHOULD NOT compute a
        path using an association-group constraint if it has delegation for
        only a subset of the LSPs in the association group. In this case, an
        implementation MAY use a local policy on the PCE to decide whether the
        PCE computes no path at all for this set of LSPs or computes a path by
        relaxing the association-group constraint.</t>
      </section>

      </section>

      <section anchor="passive" numbered="true" toc="default">
        <name>Passive Stateful Procedures</name>

        <t>In the passive stateful PCE architecture, the PCC is responsible
        for triggering a path computation request using a PCReq message to its
        PCE. The PCRpt message remains unchanged for passive mode. Similarly
        to the PCRpt handling, if a PCE receives a PCReq for an LSP and finds
        that it does not have the highest computation priority for this LSP or
        its association groups, it MUST forward the PCReq message to the
        highest priority PCE over the state-sync session. When the highest
        priority PCE receives the PCReq, it computes the path and generates a
        PCRep message towards the PCE that made the request. This PCE then
        forwards the PCRep to the requesting PCC. The handling of the LSP
        object and the SPEAKER-ENTITY-ID TLV in PCReq and PCRep is similar to
        that in PCRpt/PCUpd messages.</t>
      </section>

      <section anchor="init" numbered="true" toc="default">
        <name>PCE Initiation Procedures</name>

        <t>It is possible that a PCE does not have a PCEP session with the
        headend to initiate an LSP as per <xref format="default"
        target="RFC8281"/>. Such a PCE could send the PCInitiate message on a
        state-sync session to another PCE to request that it create a
        PCE-Initiated LSP on its behalf. If the receiving PCE is able to
        initiate the LSP, it reports it on the state-sync session via a PCRpt
        message. If the receiving PCE does not have a session to the headend,
        it MUST send a PCErr message with Error-type=24 (PCE instantiation
        error) and Error-value=TBD5 (No PCEP session with the headend). The
        requesting PCE could then try to initiate the LSP via another
        state-sync PCE, if available.</t>
      </section>
    </section>

    <section anchor="examples" numbered="true" toc="default">
      <name>Examples</name>

      <t>The examples in this section are for illustrative purposes only, to
      show how the state-sync inter-PCE session behaves.</t>

      <section anchor="example1" numbered="true" toc="default">
        <name>Example 1 - Successful disjoint paths (requiring reroute)</name>

        <artwork align="left" alt="" name="" type=""><![CDATA[
      _________________________________________
     /                                         \
    /        +------+            +------+       \
   |         | PCE1 |            | PCE2 |        |
   |         +------+            +------+        |
   |                                             |
   | +------+           10             +------+  |
   | | PCC1 | ----- R1 ---- R2 ------- | PCC2 |  |
   | +------+       |        |         +------+  |
   |                |        |                   |
   |                |        |                   |
   | +------+       |        |         +------+  |
   | | PCC3 | ----- R3 ---- R4 ------- | PCC4 |  |
   | +------+                          +------+  |
   |                                             |
    \                                           /
     \_________________________________________/


          +----------+
          |   PCC1   |  LSP : PCC1->PCC2
          +----------+
            /
       D=1 /
   +---------+    +---------+
   |  PCE1   |----|  PCE2   |
   +---------+    +---------+
                   / D=1
                  /
          +----------+
          |   PCC3   |  LSP : PCC3->PCC4
          +----------+

PCE1 computation priority 100
PCE2 computation priority 200
    ]]></artwork>

        <t>Consider the PCEP sessions as shown above, where the computation
        priority is global for all the LSPs and link-disjoint paths are
        required for LSPs PCC1-&gt;PCC2 and PCC3-&gt;PCC4.</t>

        <t>Suppose PCC1-&gt;PCC2 is configured first and PCC1 delegates
        the LSP to PCE1. As PCE1 does not have the highest computation
        priority, it sub-delegates the LSP to PCE2 by sending a PCRpt with D=1
        and including the SPEAKER-ENTITY-ID TLV over the state-sync session.
        PCE2 receives the PCRpt and, as it has delegation for this LSP, it
        computes the shortest path: R1-&gt;R3-&gt;R4-&gt;R2-&gt;PCC2. It then
        sends a PCUpd to PCE1 (including the SPEAKER-ENTITY-ID TLV) with the
        computed ERO. PCE1 forwards the PCUpd to PCC1 (removing the
        SPEAKER-ENTITY-ID TLV). PCC1 acknowledges the PCUpd with a PCRpt to
        PCE1, and PCE1 forwards the PCRpt to PCE2.</t>

        <t>When PCC3-&gt;PCC4 is configured, PCC3 delegates the LSP to PCE2.
        PCE2 can compute a disjoint path as it has knowledge of both LSPs and
        holds delegation for both. The only solution found is to move the
        PCC1-&gt;PCC2 LSP onto another path, which PCE2 can do as it has
        sub-delegation for it. PCE2 sends a new PCUpd with a new ERO,
        R1-&gt;R2-&gt;PCC2, towards PCE1, which forwards it to PCC1. PCE2 also
        sends a PCUpd to PCC3 with the path R3-&gt;R4-&gt;PCC4.</t>

        <t>In this setup, the PCEs are able to find disjoint paths, which
        they could not do without state-sync and computation priority.</t>
      </section>

      <section anchor="example2" numbered="true" toc="default">
        <name>Example 2 - Successful disjoint paths (simultaneous turnup)</name>

        <artwork align="left" alt="" name="" type=""><![CDATA[
      _____________________________________
     /                                     \
    /        +------+        +------+       \
   |         | PCE1 |        | PCE2 |        |
   |         +------+        +------+        |
   |                                         |
   | +------+         100          +------+  |
   | |      | -------------------- |      |  |
   | | PCC1 | ----- R1 ----------- | PCC2 |  |
   | +------+       |              +------+  |
   |    |           |                  |     |
   |  6 |           | 2                | 2   |
   |    |           |                  |     |
   | +------+       |              +------+  |
   | | PCC3 | ----- R3 ----------- | PCC4 |  |
   | +------+               10     +------+  |
   |                                         |
    \                                       /
     \_____________________________________/


          +----------+
          |   PCC1   |  LSP : PCC1->PCC2
          +----------+
            /     \
       D=1 /       \ D=0
   +---------+    +---------+
   |  PCE1   |----|  PCE2   |
   +---------+    +---------+
        D=0 \      / D=1
             \    /
          +----------+
          |   PCC3   |  LSP : PCC3->PCC4
          +----------+

PCE1 computation priority 200
PCE2 computation priority 100
    ]]></artwork>

        <t>In this example, suppose both LSPs are configured almost at the
        same time. As PCE1 has the highest computation priority, PCE2
        sub-delegates PCC3-&gt;PCC4 to PCE1 while PCE1 keeps the delegation
        for PCC1-&gt;PCC2. PCE1 computes paths for both PCC1-&gt;PCC2 and
        PCC3-&gt;PCC4 and can easily achieve disjointness. No computation
        loop happens in this case.</t>
      </section>

      <section anchor="example3" numbered="true" toc="default">
        <name>Example 3 - Unfeasible disjoint paths (insufficient state-sync sessions)</name>

        <artwork align="left" alt="" name="" type=""><![CDATA[
      _________________________________________
     /                                         \
    /        +------+            +------+       \
   |         | PCE1 |            | PCE2 |        |
   |         +------+            +------+        |
   |                                             |
   | +------+           10             +------+  |
   | | PCC1 | ----- R1 ---- R2 ------- | PCC2 |  |
   | +------+       |        |         +------+  |
   |                |        |                   |
   |                |        |                   |
   | +------+       |        |         +------+  |
   | | PCC3 | ----- R3 ---- R4 ------- | PCC4 |  |
   | +------+                          +------+  |
   |                                             |
    \                                           /
     \_________________________________________/


          +----------+
          |   PCC1   |  LSP : PCC1->PCC2
          +----------+
            /
       D=1 /
   +---------+    +---------+    +---------+
   |  PCE1   |----|  PCE2   |----|  PCE3   |
   +---------+    +---------+    +---------+
                   / D=1
                  /
          +----------+
          |   PCC3   |  LSP : PCC3->PCC4
          +----------+

PCE1 computation priority 100
PCE2 computation priority 200
PCE3 computation priority 300
    ]]></artwork>

        <t>With the PCEP sessions as shown above, consider the need to have
        link disjoint LSPs PCC1-&gt;PCC2 and PCC3-&gt;PCC4.</t>

        <t>Suppose PCC1-&gt;PCC2 is configured first. PCC1 delegates the LSP
        to PCE1, but as PCE1 does not have the highest computation priority,
        it will sub-delegate the LSP to PCE2 (as it is not aware of PCE3 and
        has no way to reach it). PCE2 cannot compute a path for PCC1-&gt;PCC2
        as it does not have the highest priority, and it is not allowed to
        sub-delegate the LSP again towards PCE3 as per <xref format="default"
        target="procedures"/>.</t>

        <t>When PCC3-&gt;PCC4 is configured, PCC3 delegates the LSP to PCE2
        that performs sub-delegation to PCE3. As PCE3 will have knowledge of
        only one LSP in the group, it cannot compute disjointness and can
        decide to fall-back to a less constrained computation to provide a
        path for PCC3-&gt;PCC4. In this case, it will send a PCUpd to PCE2
        that will be forwarded to PCC3.</t>

        <t>Disjointness cannot be achieved in this scenario because of the
        lack of a state-sync session between PCE1 and PCE3, but no computation
        loop happens. Thus, all PCEs that support state-sync are required to
        have a full mesh of sessions between each other.</t>
      </section>
    </section>

    <section anchor="scaling" numbered="true" toc="default">
      <name>Using Primary/Secondary Computation and State-sync Sessions to
      increase Scaling</name>

      <t>The Primary/Secondary computation and state-sync sessions
      architecture can be used to increase the scaling of the PCE
      architecture. If the number of PCCs is very high, it may be too
      resource-consuming for a single PCE instance to maintain all the PCEP
      sessions while at the same time performing all path computations. Using
      primary/secondary computation and state-sync sessions makes it possible
      to create groups of PCEs that each manage a subset of the PCCs and
      perform some or no path computations. Decoupling PCEP session
      maintenance from computation allows the PCE architecture to scale
      further.</t>

      <artwork align="left" alt="" name="" type=""><![CDATA[

            +----------+
            |  PCC500  |
          +----------+-+
          |   PCC1   |
          +----------+
            /     \
           /       \
   +---------+   +---------+
   |  PCE1   |---|  PCE2   |
   +---------+   +---------+
        |    \  /    |
        |     \/     |
        |     /\     |
        |    /  \    |
   +---------+   +---------+
   |  PCE3   |---|  PCE4   |
   +---------+   +---------+
           \       /
            \     /
          +----------+
          |  PCC501  |
          +----------+-+
            |  PCC1000 |
            +----------+

    ]]></artwork>

      <t>In the figure above, two groups of PCEs are created: PCE1/2 maintain
      PCEP sessions with PCC1 up to PCC500, while PCE3/4 maintain PCEP
      sessions with PCC501 up to PCC1000. A granular primary/secondary policy
      is set up as follows to load-share computation between PCEs:</t>

      <ul spacing="normal">
        <li>PCE1 has priority 200 for association ID 1 up to 300, association
        source 0.0.0.0. All other PCEs have a decreasing priority for those
        associations.</li>

        <li>PCE3 has priority 200 for association ID 301 up to 500,
        association source 0.0.0.0. All other PCEs have a decreasing priority
        for those associations.</li>
      </ul>

      <t>If some PCCs delegate LSPs with association ID 1 up to 300 and
      association source 0.0.0.0, the receiving PCE (if not PCE1) will
      sub-delegate the LSPs to PCE1. PCE1 becomes responsible for the
      computation of these LSP associations while PCE3 is responsible for the
      computation of another set of associations.</t>

      <t>The procedures described in this document could help greatly in
      load-sharing between a group of stateful PCEs.</t>
    </section>

    <section anchor="loop-avoidance" numbered="true" toc="default">
      <name>PCEP-PATH-VECTOR TLV</name>

      <t>This specification allows PCEP messages to be propagated among PCEP
      speakers. It may be useful to track information about the propagation of
      the messages. One of the use cases is a message loop detection
      mechanism, but other use cases, such as hop-by-hop information
      recording, may also be implemented in the future.</t>

      <t>This document introduces the PCEP-PATH-VECTOR TLV (type TBD3) to
      be encoded in the LSP Object with
      the following format:</t>

      <artwork align="left" alt="" name="" type=""><![CDATA[
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               Type=TBD3       |            Length             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|              PCEP-SPEAKER-INFORMATION#1                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|              ...                                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|              ...                                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|              PCEP-SPEAKER-INFORMATION#n                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    ]]></artwork>

      <t>The TLV format and padding rules are as per <xref format="default"
      target="RFC5440"/>.</t>

      <t>The PCEP-SPEAKER-INFORMATION field has the following format:</t>

      <artwork align="left" alt="" name="" type=""><![CDATA[
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Length                    |      ID Length                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
//              Speaker Entity identity (variable)             //
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
//              Sub-TLVs (optional)                            //
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        ]]></artwork>

      <ul spacing="normal">
        <li>Length: defines the total length of the PCEP-SPEAKER-INFORMATION
        field.</li>

        <li>ID Length: defines the actual (unpadded) length of the Speaker
        Entity identity field.</li>

        <li>Speaker Entity identity: same possible values as the
        SPEAKER-ENTITY-ID TLV <xref format="default" target="RFC8232"/>.
        Padded with trailing zeros to a 4-byte boundary.</li>

        <li>The PCEP-SPEAKER-INFORMATION may also carry some optional sub-TLVs
        so each PCEP speaker can add local information that could be recorded.
        This document does not define any sub-TLV.</li>
      </ul>
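
      <t>The layout above can be illustrated with the following non-normative
      sketch. Several points are assumptions made for the example rather than
      statements of this specification: both Length fields are treated as
      16-bit values in network byte order, as drawn in the figure; the Length
      of a PCEP-SPEAKER-INFORMATION is taken to include its own 4-byte header,
      the padded identity, and any sub-TLVs; the value 0xFFFF stands in for
      the unassigned TBD3 code point; and all function names are
      hypothetical.</t>

      <sourcecode type="python"><![CDATA[
```python
import struct
from typing import Iterable, List

# Placeholder only: TBD3 has not been assigned by IANA.
PATH_VECTOR_TLV_TYPE = 0xFFFF


def _pad4(data: bytes) -> bytes:
    """Pad with trailing zeros to a 4-byte boundary."""
    return data + b"\x00" * (-len(data) % 4)


def encode_speaker_info(speaker_id: bytes, sub_tlvs: bytes = b"") -> bytes:
    """Encode one PCEP-SPEAKER-INFORMATION entry."""
    body = _pad4(speaker_id) + sub_tlvs
    total = 4 + len(body)  # assumption: Length counts the 4-byte header
    return struct.pack("!HH", total, len(speaker_id)) + body


def encode_path_vector(speaker_ids: Iterable[bytes]) -> bytes:
    """Encode a PCEP-PATH-VECTOR TLV from an ordered list of speaker IDs."""
    value = b"".join(encode_speaker_info(s) for s in speaker_ids)
    # RFC 5440 TLV header: 16-bit Type, 16-bit Length of the value portion.
    return struct.pack("!HH", PATH_VECTOR_TLV_TYPE, len(value)) + value


def decode_path_vector(tlv: bytes) -> List[bytes]:
    """Return the ordered, unpadded speaker IDs carried in the TLV."""
    tlv_type, length = struct.unpack_from("!HH", tlv, 0)
    if tlv_type != PATH_VECTOR_TLV_TYPE:
        raise ValueError("not a PCEP-PATH-VECTOR TLV")
    value = tlv[4:4 + length]
    if len(value) != length:
        raise ValueError("truncated TLV")
    ids: List[bytes] = []
    offset = 0
    while offset < len(value):
        if len(value) - offset < 4:
            raise ValueError("truncated PCEP-SPEAKER-INFORMATION header")
        total, id_len = struct.unpack_from("!HH", value, offset)
        padded = id_len + (-id_len % 4)
        if total < 4 + padded or offset + total > len(value):
            raise ValueError("inconsistent PCEP-SPEAKER-INFORMATION length")
        ids.append(value[offset + 4:offset + 4 + id_len])
        offset += total  # skips padding and any sub-TLVs
    return ids
```
]]></sourcecode>

      <t>In this sketch, a 5-byte identity occupies 8 bytes on the wire after
      padding, so its PCEP-SPEAKER-INFORMATION Length is 12 while its ID
      Length remains 5.</t>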

      <t>The PCEP-PATH-VECTOR TLV MAY be carried in the LSP Object. Its usage
      is purely optional.</t>

      <t>If a PCEP speaker receives a message with a PCEP-PATH-VECTOR TLV
      and finds its own speaker information already present in that TLV,
      it MUST ignore the PCEP message and SHOULD log it as an error.</t>

      <t>The list of speakers within the PCEP-PATH-VECTOR TLV MUST be ordered.
      When sending a PCEP message (PCRpt, PCUpd, or PCInitiate), a PCEP
      speaker MAY add the PCEP-PATH-VECTOR TLV with a PCEP-SPEAKER-INFORMATION
      containing its own information. If the PCEP message sent is the result
      of a previously received PCEP message, and if the PCEP-PATH-VECTOR TLV
      was already present in the received message, the PCEP speaker MAY append
      a new PCEP-SPEAKER-INFORMATION containing its own information.</t>
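
      <t>The loop check and append behavior can be sketched as follows. This
      is a non-normative illustration: the path vector is modeled as a plain
      ordered list of speaker identities rather than as an encoded TLV, the
      names LoopDetected, check_path_vector, and forward_path_vector are
      hypothetical, and the sketch chooses to always add the local identity
      even though doing so is optional.</t>

      <sourcecode type="python"><![CDATA[
```python
import logging
from typing import List, Optional

log = logging.getLogger("pcep.path_vector")


class LoopDetected(Exception):
    """The local speaker is already listed in the received path vector."""


def check_path_vector(received: Optional[List[bytes]],
                      local_id: bytes) -> None:
    """Raise LoopDetected if the received message must be ignored."""
    if received is not None and local_id in received:
        # The message MUST be ignored; logging the event is a SHOULD.
        log.error("PCEP message loop detected for speaker %r", local_id)
        raise LoopDetected(local_id)


def forward_path_vector(received: Optional[List[bytes]],
                        local_id: bytes) -> List[bytes]:
    """Build the path vector for a message triggered by a received one."""
    check_path_vector(received, local_id)
    if received is None:
        # No TLV in the received message: start a new vector (optional).
        return [local_id]
    # Append at the end so that the list stays in propagation order.
    return list(received) + [local_id]
```
]]></sourcecode>

      <t>A caller would catch LoopDetected, drop the received message, and
      send nothing further for it.</t>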


    </section>

    <section anchor="Security" numbered="true" toc="default">
      <name>Security Considerations</name>

      <t>The security considerations described in <xref format="default"
      target="RFC8231"/> and <xref format="default" target="RFC5440"/> apply
      to the extensions described in this document as well. State
      synchronization and sub-delegation between stateful PCEs raise
      additional considerations, because these exchanges could be spoofed and
      used as an attack vector. An attacker could attempt to create an
      excessive amount of state in order to overload the PCEP peer. The PCEP
      peer could respond with a PCErr message as described in <xref
      format="default" target="RFC8231"/>. An attacker could also disrupt LSP
      operations by creating bogus state. Further, state synchronization
      between stateful PCEs could give an adversary the opportunity to
      eavesdrop on the network. Thus, securing the PCEP session using
      Transport Layer Security (TLS) <xref format="default"
      target="RFC8253"/>, as per the recommendations and best current
      practices in <xref format="default" target="RFC9325"/>, is
      RECOMMENDED.</t>
    </section>

    <section anchor="Imp">
      <name>Implementation Status</name>

      <t>[Note to the RFC Editor - remove this section before publication, as
      well as remove the reference to RFC 7942.]</t>

      <t>This section records the status of known implementations of the
      protocol defined by this specification at the time of posting of this
      Internet-Draft, and is based on a proposal described in <xref
      target="RFC7942"/>. The description of implementations in this section
      is intended to assist the IETF in its decision processes in progressing
      drafts to RFCs. Please note that the listing of any individual
      implementation here does not imply endorsement by the IETF. Furthermore,
      no effort has been spent to verify the information presented here that
      was supplied by IETF contributors. This is not intended as, and must not
      be construed to be, a catalog of available implementations or their
      features. Readers are advised to note that other implementations may
      exist.</t>

      <t>According to <xref target="RFC7942"/>, "this will allow reviewers and
      working groups to assign due consideration to documents that have the
      benefit of running code, which may serve as evidence of valuable
      experimentation and feedback that have made the implemented protocols
      more mature. It is up to the individual working groups to use this
      information as they see fit".</t>

      <t>At the time of posting the -06 version of this document, there are no
      known implementations of this mechanism. It is believed that some
      vendors are considering implementations, but these plans are too vague
      to make any further assertions.</t>
    </section>

    <section toc="default">
      <name>Manageability Considerations</name>

      <section toc="default" anchor="control">
        <name>Control of Function and Policy</name>

        <t>An implementation MUST allow the operator to configure support for
        state-sync procedures on an inter-PCE session. It MUST allow
        configuration of the computation priority of the local and remote
        PCEs at the global level. It MAY also allow configuration of the
        computation priority of the local and remote PCEs per association (or
        range of associations). Further, it MAY allow configuration of the
        computation priority per PCC (or range of PCCs). An implementation
        MAY support other such configuration levels for the computation
        priority of the local and remote PCEs.</t>
      </section>

      <section toc="default">
        <name>Information and Data Models</name>

        <t>An implementation SHOULD allow the operator to view the capability
        defined in this document. To serve this purpose, the PCEP YANG module
        <xref target="I-D.ietf-pce-pcep-yang"/> could be extended in the
        future.</t>
      </section>

      <section toc="default">
        <name>Liveness Detection and Monitoring</name>

        <t>Mechanisms defined in this document do not imply any new liveness
        detection and monitoring requirements in addition to those already
        listed in <xref target="RFC5440"/>.</t>
      </section>

      <section toc="default">
        <name>Verify Correct Operations</name>

        <t>Mechanisms defined in this document do not imply any new operation
        verification requirements in addition to those already listed in <xref
        target="RFC5440"/>.</t>
      </section>

      <section toc="default">
        <name>Requirements On Other Protocols</name>

        <t>Mechanisms defined in this document do not imply any new
        requirements on other protocols.</t>
      </section>

      <section toc="default">
        <name>Impact On Network Operations</name>

        <t>The mechanisms defined in this document improve network operations
        by alleviating the problems described in <xref target="intro"/>.</t>
      </section>
    </section>

    <section anchor="Acknowledgements" numbered="true" toc="default">
      <name>Acknowledgements</name>

      <t>Thanks to <xref format="default" target="I-D.knodel-terminology"/>
      for urging better use of terms.</t>
    </section>

    <section anchor="IANA" numbered="true" toc="default">
      <name>IANA Considerations</name>

      <t>This document requests IANA actions to allocate code points for the
      protocol elements defined in this document.</t>

      <section anchor="IANA-error" numbered="true" toc="default">
        <name>PCEP-Error Object</name>

        <t>IANA is requested to allocate one new Error-value for each of
        Error-Types 6 and 24.</t>

        <table align="left">
          <thead>
            <tr>
              <th align="center">Error-Type</th>

              <th align="left">Meaning</th>

              <th align="left">Reference</th>
            </tr>
          </thead>

          <tbody>
            <tr>
              <td align="center">6</td>

              <td align="left">Mandatory Object Missing</td>

              <td align="left">
                <xref format="default" target="RFC5440"/>
              </td>
            </tr>

            <tr>
              <td align="center"/>

              <td align="left">Error-value=TBD1: SPEAKER-ENTITY-ID TLV
              missing</td>

              <td align="left">This document</td>
            </tr>

            <tr>
              <td align="center">24</td>

              <td align="left">LSP instantiation error</td>

              <td align="left">
                <xref format="default" target="RFC8281"/>
              </td>
            </tr>

            <tr>
              <td align="center"/>

              <td align="left">Error-value=TBD5: No PCEP session with the
              headend</td>

              <td align="left">This document</td>
            </tr>
          </tbody>
        </table>
      </section>

      <section anchor="IANA-TLV" numbered="true" toc="default">
        <name>PCEP TLV Type Indicators</name>

        <t>IANA is requested to allocate new TLV Type Indicator values within
        the "PCEP TLV Type Indicators" sub-registry of the PCEP Numbers
        registry, as follows:</t>

        <table align="left">
          <thead>
            <tr>
              <th align="center">Value</th>

              <th align="center">Meaning</th>

              <th align="center">Reference</th>
            </tr>
          </thead>

          <tbody>
            <tr>
              <td align="center">TBD2</td>

              <td align="center">ORIGINAL-LSP-DB-VERSION TLV</td>

              <td align="center">This document</td>
            </tr>

            <tr>
              <td align="center">TBD3</td>

              <td align="center">PCEP-PATH-VECTOR TLV</td>

              <td align="center">This document</td>
            </tr>
          </tbody>
        </table>
      </section>

      <section anchor="IANA-cap" numbered="true" toc="default">
        <name>STATEFUL-PCE-CAPABILITY TLV</name>

        <t>IANA is requested to allocate a new bit value in the
        STATEFUL-PCE-CAPABILITY TLV Flag Field sub-registry.</t>

        <table align="left">
          <thead>
            <tr>
              <th align="center">Bit</th>

              <th align="center">Description</th>

              <th align="center">Reference</th>
            </tr>
          </thead>

          <tbody>
            <tr>
              <td align="center">TBD4</td>

              <td align="center">INTER-PCE-CAPABILITY</td>

              <td align="center">This document</td>
            </tr>
          </tbody>
        </table>
      </section>
    </section>
  </middle>

  <back>
    <references>
      <name>References</name>

      <references>
        <name>Normative References</name>

        <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>

        <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.5440.xml"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>

        <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>

        <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8231.xml"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>

        <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8232.xml"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>

        <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8253.xml"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>
      </references>

      <references>
        <name>Informative References</name>

        <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.4655.xml"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>

        <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.6805.xml"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>

        <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.7399.xml"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>

        <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.9325.xml"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>

        <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.9552.xml"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>

        <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.7942.xml"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>

        <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8051.xml"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>

        <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8281.xml"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>

        <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8751.xml"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>

        <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8800.xml"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>

        <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.9059.xml"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>

        <xi:include href="https://datatracker.ietf.org/doc/bibxml3/draft-knodel-terminology.xml"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>

        <xi:include href="https://datatracker.ietf.org/doc/bibxml3/draft-ietf-pce-pcep-yang.xml"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>
      </references>
    </references>

    <section numbered="true" toc="default">
      <name>Contributors</name>

      <artwork align="left" alt="" name="" type=""><![CDATA[
Dhruv Dhody
Huawei
India

Email: dhruv.ietf@gmail.com
    ]]></artwork>
    </section>
  </back>
</rfc>
