<?xml version="1.0" encoding="utf-8"?>
<!-- 
     draft-rfcxml-general-template-standard-00
  
     This template includes examples of the most commonly used features of RFCXML with comments 
     explaining how to customise them. This template can be quickly turned into an I-D by editing 
     the examples provided. Look for [REPLACE], [REPLACE/DELETE], [CHECK] and edit accordingly.
     Note - 'DELETE' means delete the element or attribute, not just the contents.
     
     Documentation is at https://authors.ietf.org/en/templates-and-schemas
-->
<?xml-model href="rfc7991bis.rnc"?>  <!-- Required for schema validation and schema-aware editing -->
<!-- <?xml-stylesheet type="text/xsl" href="rfc2629.xslt" ?> -->
<!-- This third-party XSLT can be enabled for direct transformations in XML processors, including most browsers -->


<!DOCTYPE rfc [
  <!ENTITY nbsp    "&#160;">
  <!ENTITY zwsp   "&#8203;">
  <!ENTITY nbhy   "&#8209;">
  <!ENTITY wj     "&#8288;">
]>
<!-- If further character entities are required then they should be added to the DOCTYPE above.
     Use of an external entity file is not recommended. -->

<rfc
  xmlns:xi="http://www.w3.org/2001/XInclude"
  category="std"
  docName="draft-buethe-opus-speech-coding-enhancement-00"
  ipr="trust200902"
  obsoletes=""
  updates="6716"
  submissionType="IETF"
  xml:lang="en"
  version="3">
<!-- [REPLACE] 
       * docName with name of your draft
     [CHECK] 
       * category should be one of std, bcp, info, exp, historic
       * ipr should be one of trust200902, noModificationTrust200902, noDerivativesTrust200902, pre5378Trust200902
       * updates can be an RFC number as NNNN
       * obsoletes can be an RFC number as NNNN 
-->

  <front>
    <title abbrev="Opus Speech Coding Enhancement">Integration of Speech Codec Enhancement Methods into the Opus Codec</title>
    <!--  [REPLACE/DELETE] abbrev. The abbreviated title is required if the full title is longer than 39 characters -->

    <seriesInfo name="Internet-Draft" value="draft-buethe-opus-speech-coding-enhancement-00"/>
   
    <author fullname="Jan Buethe" initials="J." role="editor" surname="Buethe">
      <organization>Amazon</organization>
      <address>
        <postal>
          <country>DE</country>
          <!-- Uses two letter country code -->
        </postal>        
        <email>jbuethe@amazon.com</email> 
      </address>
    </author>

    <author fullname="Jean-Marc Valin" initials="J.-M." surname="Valin">
      <organization>Amazon</organization>
      <address>
        <postal>
          <country>CA</country>
          <!-- Uses two letter country code -->
        </postal>        
        <email>jmvalin@amazon.com</email>  
      </address>
    </author>

    <date year="2023"/>
    <!-- On draft submission:
         * If only the current year is specified, the current day and month will be used.
         * If the month and year are both specified and are the current ones, the current day will
           be used
         * If the year is not the current one, it is necessary to specify at least a month and day="1" will be used.
    -->

    <area>Applications and Real-Time</area>
    <workgroup>Internet Engineering Task Force</workgroup>
    <!-- "Internet Engineering Task Force" is fine for individual submissions.  If this element is 
          not present, the default is "Network Working Group", which is used by the RFC Editor as 
          a nod to the history of the RFC Series. -->

    <keyword>Opus</keyword>
    <keyword>RFC6716</keyword>
    <!-- [REPLACE/DELETE]. Multiple allowed.  Keywords are incorporated into HTML output files for 
         use by search engines. -->

    <abstract>
      <t>This document proposes a way to integrate speech codec enhancement methods into the Opus codec <xref target="RFC6716"/>.</t>
    </abstract>
 
  </front>

  <middle>
    
    <section>
      <name>Introduction</name>
      <t>
          Since the original Opus codec was specified <xref target="RFC6716"/>, new data-driven speech codec enhancement methods have emerged that outperform classical enhancement methods by a large margin.
          This document proposes a way to integrate such enhancement methods into the Opus decoder, together with a set of requirements that ensure
      </t>
      <ol type="(%d)">
        <li>consistent performance of the enhancement method itself, </li>
        <li>preservation of decoder performance (e.g., seamless mode switching), and</li>
        <li>preservation of basic interoperability when tuning the Opus encoder for use with the enhanced decoder.</li>
      </ol>
      <t>
          The document also describes the linear-adaptive coding enhancer (LACE) and its integration into the Opus decoder as an illustrative example.
      </t>
      
      <section>
        <name>Requirements Language</name>
        <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL",
          "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT
          RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
          interpreted as described in BCP 14 <xref target="RFC2119"/>
          <xref target="RFC8174"/> when, and only when, they appear in
          all capitals, as shown here.</t>
      </section>
      <!-- [CHECK] The 'Requirements Language' section is optional -->

    </section>

    <section anchor="LACE">
      <name>An Illustrative Example</name>
      <t>
        We use the linear-adaptive coding enhancer (LACE) <xref target="lace-paper"/> as an illustrative example to highlight the specific challenges of integrating a speech codec enhancement method into the Opus decoder.
        LACE is trained to enhance the output signal of the SILK decoder, which handles the speech coding mode of Opus. <xref target="opus-with-lace"/> gives a high-level overview of the Opus decoder with LACE added as
        an enhancement model.
      </t>
      <t>
        The first requirement for a speech coding enhancement method concerns the performance of the method itself. In this example, the question is how the LACE output compares to the SILK decoder output.
        In <xref target="lace-paper"/> this was evaluated on clean speech samples using a P.808 listening test as well as the objective method PESQ; the evaluation showed consistent improvement at all tested bitrates.
        For a general enhancement method it will be necessary to specify test material and performance criteria to prevent unintended quality degradation of the Opus codec.
      </t>
      <t>
        The second requirement concerns the performance of the Opus decoder as a whole. Depending on the bitstream, the decoder may have to perform mode switching, e.g., between SILK and CELT, or it may combine the SILK and CELT outputs
        when the codec operates in hybrid mode. Changes to the SILK output signal introduced by an enhancement method, such as added delay, phase shifts, or level alterations, can therefore degrade the performance
        of the Opus decoder even if the first requirement is met. LACE avoids this problem by adding no delay and by approximately preserving phase and level. However, since many enhancement methods are non-causal and
        do not preserve phase, these requirements may be too strict for a general enhancement method.
      </t>

      <t>
        The third requirement concerns interoperability. The Opus specification leaves considerable freedom for tuning the encoder, and the presence of an enhancement method in the decoder may change the optimal encoding choices
        substantially. In the present example, encoding wideband content at 6 kb/s still yields fair-to-good quality with the LACE-enhanced decoder, whereas the quality with a legacy decoder is significantly worse.
        To make full use of these new enhancement methods, such encoder tunings should be allowed, but basic interoperability with legacy decoders and with other enhanced decoders needs to be ensured.
      </t>

      <figure anchor="opus-with-lace">
        <name>Simplified Opus decoder diagram with LACE as the enhancement module</name>
        <artwork type="ascii-art" name="lace.txt">
      <![CDATA[
                 ┌──────────────────────────────┐
                 │           Bitstream          │
                 └─────┬──────────────────┬─────┘
                       │                  │
                       ▼                  ▼
                 ┌───────────┐      ┌───────────┐
                 │   CELT    │      │   SILK    │
                 │  decoder  │      │  decoder  │
                 └─────┬─────┘      └─────┬─────┘
                       │                  │
                       │                  ▼
                       │            ┌───────────┐
                       │            │   LACE    │
                       │            └─────┬─────┘
                       │                  │
                       │                  ▼
                       │            ┌───────────┐
                       │            │ Resampler │
                       │            └─────┬─────┘
                       │                  │
                       ▼                  ▼
                 ┌──────────────────────────────┐
                 │        Mode Handling         │
                 └──────────────┬───────────────┘
                                │
                                ▼
                         decoded  signal
      ]]>
        </artwork>
      </figure>

    </section>

    <section anchor="enhancement">
      <name>Requirements on the Enhancement Method</name>
      <t>
        TBD
      </t>
    </section>

    <section anchor="decoder">
      <name>Requirements for Opus Decoder Integration</name>
      <t>
        TBD
      </t>
    </section>

    <section anchor="interop">
      <name>Interoperability</name>
      <t>
        TBD
      </t>
    </section>

    <section anchor="IANA">
    <!-- All drafts are required to have an IANA considerations section. See RFC 8126 for a guide.-->
      <name>IANA Considerations</name>
      <t>The decoder should be able to signal the presence of an enhancement method to the encoder via SDP. The exact mechanism is TBD; the following options are
      open for discussion:
      </t>
      <ol type="(%d)">
        <li>Update the audio/opus media type registration <xref target="RFC7587"/> to include a parameter speech_enhancement with possible values 0 and 1.</li>
        <li>Assign an extension ID, e.g., 33, from the registry defined in <xref target="opus-extension"/> to speech coding enhancement. This has the
        advantage of dual use: the extension ID can be used both to signal the decoder capability to the encoder and to transmit side information
        from the encoder to the decoder to guide a speech enhancement method. However, it remains to be shown that such side information is useful.</li>
        <li>Update <xref target="opus-extension"/> to include extension IDs beyond 127 for data-less extensions.</li>
      </ol>
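      <t>As a purely illustrative sketch of option (1), and not a proposal for the final syntax, a receiver with an enhanced decoder might announce the capability in its SDP as follows. The port number, the payload type 101, and the accompanying useinbandfec parameter are arbitrary example values in the style of <xref target="RFC7587"/>; only the parameter name speech_enhancement and its values 0 and 1 are taken from the option above.</t>
      <sourcecode><![CDATA[
m=audio 54312 RTP/AVP 101
a=rtpmap:101 opus/48000/2
a=fmtp:101 useinbandfec=1; speech_enhancement=1
]]></sourcecode>
      <t>In this sketch, the absence of the parameter would presumably be treated like speech_enhancement=0, so that existing endpoints are unaffected; this default is an assumption and is not yet specified.</t>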
    </section>
    
    <section anchor="Security">
      <!-- All drafts are required to have a security considerations section. See RFC 3552 for a guide. -->
      <name>Security Considerations</name>
      <t>TBD</t>
    </section>
    
  </middle>

  <back>
    <references>
      <name>References</name>
      <references>
        <name>Normative References</name>
        <xi:include href="https://www.rfc-editor.org/refs/bibxml/reference.RFC.2119.xml"/>
        <xi:include href="https://www.rfc-editor.org/refs/bibxml/reference.RFC.8174.xml"/>
        <xi:include href="https://www.rfc-editor.org/refs/bibxml/reference.RFC.6716.xml"/>
        <xi:include href="https://www.rfc-editor.org/refs/bibxml/reference.RFC.7587.xml"/>
        <!-- The recommended and simplest way to include a well known reference -->
        <reference anchor="opus-extension">
        <!-- Manually added reference -->
          <front>
            <title>Extension Formatting for the Opus Codec (draft-valin-opus-extension)</title>
            <author initials="J.-M." surname="Valin" fullname="Jean-Marc Valin">
              <organization/>
            </author>
            <date year="2023" month="April"/>
            <abstract>
              <t>Opus extension format.
              </t>
            </abstract>
          </front>
        </reference>
      </references>
 
      <references>
        <name>Informative References</name>    
        <reference anchor="lace-paper">
          <front>
            <title>LACE: A light-weight, causal Model for enhancing coded Speech through Adaptive Convolutions</title>
            <author initials="J." surname="Buethe"/>
            <author initials="J.-M." surname="Valin"/>
            <author initials="A." surname="Mustafa"/>
            <date year="2023"/>
          </front>
        </reference>
      </references>
    </references>
    

 </back>
</rfc>
