<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rfc [
  <!ENTITY nbsp    "&#160;">
  <!ENTITY zwsp   "&#8203;">
  <!ENTITY nbhy   "&#8209;">
  <!ENTITY wj     "&#8288;">
]>
<rfc xmlns:xi="http://www.w3.org/2001/XInclude"
     category="std"
     docName="draft-trans-pages-01"
     ipr="trust200902"
     obsoletes=""
     updates=""
     submissionType="IETF"
     xml:lang="en"
     tocInclude="true"
     tocDepth="4"
     symRefs="true"
     sortRefs="true"
     version="3">

  <front>
    <title abbrev="CT Pages Extension">Certificate Transparency Pages Extension</title>
    <seriesInfo name="Internet-Draft" value="draft-trans-pages-01"/>
    
    <author fullname="Pierre Barre" initials="P." surname="Barre" role="editor">
      <organization>Merklemap</organization>
      <address>
        <postal>
          <street>320 rue saint honoré</street>
          <city>Paris</city>
          <code>75001</code>
          <country>FR</country>
        </postal>
        <email>pierre@barre.sh</email>
      </address>
    </author>
    
    <date year="2025" month="June" day="28"/>
    
    <area>Security</area>
    
    <keyword>certificate transparency</keyword>
    <keyword>TLS</keyword>
    <keyword>PKI</keyword>
    
    <abstract>
      <t>This document specifies an extension to RFC 6962 Certificate
      Transparency (CT) logs that enables efficient caching and batch
      retrieval through page-based access patterns. The extension
      introduces a binary format that eliminates base64 encoding overhead
      and certificate chain duplication while maintaining full backward
      compatibility with existing RFC 6962 implementations.</t>
    </abstract>
  </front>
  
  <middle>
    <section numbered="true" toc="default">
      <name>Introduction</name>
      <t>Certificate Transparency (CT) <xref target="RFC6962"/> provides a framework for
      publicly logging the existence of Transport Layer Security (TLS)
      certificates as they are issued or observed. The current
      specification defines a "get-entries" endpoint that accepts
      arbitrary start and end parameters for retrieving log entries.</t>
      
      <section numbered="true" toc="default">
        <name>Motivation</name>
        <t>The current RFC 6962 design presents several challenges:</t>
        <ul spacing="normal">
          <li>Arbitrary range requests make caching difficult or impossible,
          as responses for overlapping ranges cannot be efficiently cached
          or reused.</li>
          <li>Base64 encoding of certificates adds approximately 33% overhead
          to response sizes.</li>
          <li>Certificate chains are duplicated in full for each entry, even
          when many entries share the same intermediate certificates.</li>
          <li>Variable response sizes complicate client implementation and
          server resource planning.</li>
        </ul>
        <t>This extension addresses these issues by introducing fixed-size
        pages with an efficient binary format, while maintaining full
        backward compatibility with existing CT infrastructure.</t>
      </section>
      
      <section numbered="true" toc="default">
        <name>Requirements Language</name>
        <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
        "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
        "OPTIONAL" in this document are to be interpreted as described in
        BCP 14 <xref target="RFC2119"/> <xref target="RFC8174"/> when, and only when, they appear in all
        capitals, as shown here.</t>
      </section>
    </section>
    
    <section numbered="true" toc="default">
      <name>Page-Based Entry Retrieval</name>
      <t>This extension introduces a page-based mechanism for retrieving log
      entries in fixed-size batches.</t>
      
      <section numbered="true" toc="default">
        <name>Request Format</name>
        <t>Clients request pages using the following HTTP GET request:</t>
        <artwork><![CDATA[
GET /ct-pages/v1/page/{page_number}
]]></artwork>
        <t>Where:</t>
        <ul spacing="normal">
          <li>page_number: A non-negative integer representing the zero-indexed
          page number.</li>
        </ul>
        <t>Pages accessed by number always contain exactly page_size entries,
        where page_size is the value advertised by the log's discovery endpoint.
        Partial pages (those with fewer than page_size entries) are not
        accessible via numbered endpoints until they become complete.</t>
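        <t>The arithmetic implied by fixed-size pages can be sketched in Python
        as follows. This is a non-normative illustration. It assumes that page 0
        starts at log entry 0 and that pages are contiguous, which the text above
        implies but does not state, and it uses the tree_size value from an
        RFC 6962 signed tree head as the number of entries in the log. The
        function names are hypothetical.</t>
        <sourcecode type="python"><![CDATA[
def page_for_entry(entry_index: int, page_size: int) -> int:
    """Zero-indexed page number that holds the given log entry."""
    if entry_index < 0 or page_size <= 0:
        raise ValueError("entry_index must be >= 0 and page_size > 0")
    return entry_index // page_size


def complete_page_count(tree_size: int, page_size: int) -> int:
    """Number of pages that are complete, i.e. reachable by number."""
    if tree_size < 0 or page_size <= 0:
        raise ValueError("tree_size must be >= 0 and page_size > 0")
    return tree_size // page_size


def page_entry_range(page_number: int, page_size: int) -> range:
    """Log indices covered by a complete page (assumes page 0 starts at 0)."""
    if page_number < 0 or page_size <= 0:
        raise ValueError("page_number must be >= 0 and page_size > 0")
    first = page_number * page_size
    return range(first, first + page_size)
]]></sourcecode>
        <t>Under these assumptions, a log with 2500 entries and a page_size of
        1000 exposes pages 0 and 1 by number, while entries 2000 through 2499
        are only reachable through the partial page.</t>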
      </section>
      
      <section numbered="true" toc="default">
        <name>Response Format</name>
        
        <section numbered="true" toc="default">
          <name>HTTP Headers</name>
          <t>Successful responses include the following HTTP headers:</t>
          <artwork><![CDATA[
Content-Type: application/octet-stream
Cache-Control: public, max-age=31536000, immutable
]]></artwork>
          <t>Responses carrying a partially filled page, which are only served by
          the latest-page endpoint, instead use:</t>
          <artwork><![CDATA[
Cache-Control: no-store
]]></artwork>
          <t>Where:</t>
          <ul spacing="normal">
            <li>Cache-Control: Complete pages (those containing exactly page_size entries)
            are immutable and can be cached indefinitely. Partial pages (those
            containing fewer than page_size entries) MUST NOT be cached, as they
            may receive additional entries.</li>
          </ul>
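          <t>A server-side sketch of this header selection in Python is shown
          below. It is illustrative only. The function name and the via_latest
          parameter are hypothetical, and the sketch assumes that the
          latest-page endpoint always answers with "no-store", as described
          for that endpoint elsewhere in this document.</t>
          <sourcecode type="python"><![CDATA[
IMMUTABLE = "public, max-age=31536000, immutable"
NO_STORE = "no-store"


def cache_control_for(entry_count: int, page_size: int,
                      via_latest: bool = False) -> str:
    """Pick the Cache-Control value for a page response."""
    if page_size <= 0 or not 0 <= entry_count <= page_size:
        raise ValueError("entry_count must be within 0..page_size")
    # Partial pages may still grow, and the latest endpoint is never cached.
    if via_latest or entry_count < page_size:
        return NO_STORE
    return IMMUTABLE
]]></sourcecode>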
        </section>
        
        <section numbered="true" toc="default">
          <name>Binary Response Structure</name>
          <t>The response body uses the following binary format, expressed using
          the TLS presentation language defined in Section 3 of <xref target="RFC8446"/>:</t>
          <sourcecode type="tls-presentation"><![CDATA[
enum { v1(0), (255) } Version;

struct {
    Version format_version;
    uint64 entry_count;
    uint64 first_entry_index;
    PageEntry entries[entry_count];
} EntriesPage;

struct {
    TimestampedEntry timestamped_entry;
    uint16 chain_length;
    opaque issuer_hashes[chain_length][32];
} PageEntry;
]]></sourcecode>
          <t>Where:</t>
          <ul spacing="normal">
            <li>format_version: Set to v1(0) for this specification.</li>
            <li>entry_count: The number of entries in this page.</li>
            <li>first_entry_index: The log index of the first entry in this page.</li>
            <li>timestamped_entry: The TimestampedEntry structure as defined in
            Section 3.4 of <xref target="RFC6962"/>.</li>
            <li>chain_length: The number of certificates in the chain (excluding
            the leaf certificate, which is included in timestamped_entry), and
            therefore the number of 32-byte elements in issuer_hashes.</li>
            <li>issuer_hashes: SHA-256 hashes of the DER-encoded issuer
            certificates in the chain, ordered from the leaf's issuer to
            the root.</li>
          </ul>
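          <t>To make the layout concrete, the following Python sketch decodes an
          EntriesPage body. It is a non-normative illustration under stated
          assumptions: all integers are big-endian as in the TLS presentation
          language, chain_length counts hashes rather than bytes, and the
          TimestampedEntry is delimited by walking its RFC 6962 fields
          (timestamp, entry_type, a 24-bit length-prefixed certificate or
          TBSCertificate preceded by a 32-byte issuer_key_hash for precert
          entries, and 16-bit length-prefixed extensions). The class and
          function names are hypothetical.</t>
          <sourcecode type="python"><![CDATA[
from dataclasses import dataclass
from typing import List

X509_ENTRY = 0     # RFC 6962 LogEntryType values
PRECERT_ENTRY = 1


@dataclass
class PageEntry:
    timestamped_entry: bytes     # raw RFC 6962 TimestampedEntry bytes
    issuer_hashes: List[bytes]   # SHA-256 values, leaf's issuer first


@dataclass
class EntriesPage:
    format_version: int
    first_entry_index: int
    entries: List[PageEntry]


class _Reader:
    """Bounds-checked cursor over the response body."""

    def __init__(self, buf: bytes) -> None:
        self.buf = buf
        self.pos = 0

    def take(self, n: int) -> bytes:
        end = self.pos + n
        if end > len(self.buf):
            raise ValueError("truncated page")
        chunk = self.buf[self.pos:end]
        self.pos = end
        return chunk

    def uint(self, n: int) -> int:
        return int.from_bytes(self.take(n), "big")


def _read_timestamped_entry(r: _Reader) -> bytes:
    start = r.pos
    r.take(8)                       # uint64 timestamp
    entry_type = r.uint(2)          # LogEntryType
    if entry_type == PRECERT_ENTRY:
        r.take(32)                  # issuer_key_hash
    elif entry_type != X509_ENTRY:
        raise ValueError(f"unknown entry_type {entry_type}")
    r.take(r.uint(3))               # ASN.1Cert or TBSCertificate
    r.take(r.uint(2))               # CtExtensions
    return r.buf[start:r.pos]


def parse_entries_page(body: bytes) -> EntriesPage:
    r = _Reader(body)
    version = r.uint(1)
    if version != 0:                # v1(0) is the only defined version
        raise ValueError(f"unsupported format_version {version}")
    entry_count = r.uint(8)
    first_entry_index = r.uint(8)
    entries = []
    for _ in range(entry_count):
        timestamped_entry = _read_timestamped_entry(r)
        chain_length = r.uint(2)
        hashes = [r.take(32) for _ in range(chain_length)]
        entries.append(PageEntry(timestamped_entry, hashes))
    if r.pos != len(body):
        raise ValueError("trailing bytes after last entry")
    return EntriesPage(version, first_entry_index, entries)
]]></sourcecode>
          <t>The entry count is available as len(page.entries) on the returned
          object. A production parser would additionally bound entry_count by
          the advertised page_size before iterating.</t>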
        </section>
      </section>
      
      <section numbered="true" toc="default">
        <name>Certificate Resolution</name>
        <t>To retrieve the actual certificate data for entries in the chain,
        clients make separate requests:</t>
        <artwork><![CDATA[
GET /ct-pages/v1/certificate/{base64url_sha256_hash}
]]></artwork>
        <t>Where base64url_sha256_hash is the base64url encoding (without
        padding) of the SHA-256 hash of the DER-encoded certificate, i.e., the
        same 32-byte value that appears in issuer_hashes.</t>
        <t>Successful responses return:</t>
        <artwork><![CDATA[
Content-Type: application/pkix-cert
Cache-Control: public, max-age=31536000, immutable

[binary certificate data]
]]></artwork>
        <t>This mechanism allows efficient deduplication of commonly used
        intermediate certificates across many log entries.</t>
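        <t>A client-side sketch of this step in Python is shown below. It
        builds the request path from a 32-byte issuer hash and checks that the
        returned bytes hash back to the requested value. It is illustrative
        only and assumes the hash covers the DER encoding of the certificate.
        The function names are hypothetical.</t>
        <sourcecode type="python"><![CDATA[
import base64
import hashlib


def certificate_path(sha256_digest: bytes) -> str:
    """Request path for the certificate with the given SHA-256 digest."""
    if len(sha256_digest) != 32:
        raise ValueError("expected a 32-byte SHA-256 digest")
    token = base64.urlsafe_b64encode(sha256_digest).rstrip(b"=")
    return "/ct-pages/v1/certificate/" + token.decode("ascii")


def certificate_matches(der: bytes, expected_digest: bytes) -> bool:
    """True if the fetched certificate hashes to the requested digest."""
    return hashlib.sha256(der).digest() == expected_digest
]]></sourcecode>
        <t>Because each hash names exactly one certificate, a client can keep a
        local map from hash to certificate and only issue a request the first
        time a given issuer hash is encountered.</t>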
      </section>
      
      <section numbered="true" toc="default">
        <name>Latest Page Retrieval</name>
        <t>To retrieve the page currently being filled, which contains the most
        recent entries, clients make the following request:</t>
        <artwork><![CDATA[
GET /ct-pages/v1/latest
]]></artwork>
        <t>The response uses the same binary format as regular page requests
        (see Section 2.2.2), with the following HTTP headers:</t>
        <artwork><![CDATA[
Content-Type: application/octet-stream
Cache-Control: no-store
]]></artwork>
        <t>This endpoint returns the current page being filled with entries,
        which may be partial (containing fewer than page_size entries) or
        complete (containing exactly page_size entries). This page does not
        have a page number until it becomes complete.</t>
        <t>Note: The response from the /ct-pages/v1/latest endpoint may temporarily
        duplicate the highest-numbered page in two scenarios:</t>
        <ul spacing="normal">
          <li>When a page has just been completed (exactly page_size entries)</li>
          <li>During the transition when a completed page is being assigned a number</li>
        </ul>
        <t>Clients MUST handle potential duplication by using the first_entry_index
        and entry_count fields to identify unique pages. Clients MUST NOT cache
        responses from this endpoint as the content may change as new entries
        are added to the log.</t>
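        <t>One way a client might handle this duplication is to track the index
        of the next entry it has not yet processed and skip anything below it.
        The Python sketch below is non-normative. The name next_index and the
        function name are hypothetical, and the sketch assumes that entries
        within a page are stored in ascending log-index order starting at
        first_entry_index.</t>
        <sourcecode type="python"><![CDATA[
def unseen_offsets(first_entry_index: int, entry_count: int,
                   next_index: int) -> range:
    """Offsets within a page of the entries not yet processed.

    next_index is the log index of the first entry the client still needs.
    An empty range means the page is a duplicate of data already seen.
    """
    if first_entry_index > next_index:
        raise ValueError("gap: page starts after the next needed entry")
    start = min(next_index - first_entry_index, entry_count)
    return range(start, entry_count)
]]></sourcecode>
        <t>For example, a client that has already consumed numbered page 2
        (entries 2000 through 2999 with a page_size of 1000) and then receives
        the same 1000 entries from the latest endpoint obtains an empty range
        and can discard the response.</t>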
      </section>
      
      <section numbered="true" toc="default">
        <name>Discovery Mechanism</name>
        <t>Logs implementing this extension MUST provide a discovery endpoint:</t>
        <artwork><![CDATA[
GET /ct-pages/v1/discover
]]></artwork>
        <t>The response is a JSON object:</t>
        <artwork><![CDATA[
Content-Type: application/json

{
  "page_size": 1000,
  "static_endpoint": "https://static.example.com",
  "last_page_at_static": false
}
]]></artwork>
        <t>Where:</t>
        <ul spacing="normal">
          <li>page_size: The fixed number of entries per complete page.</li>
          <li>static_endpoint: (OPTIONAL) An alternative base URL for fetching
          pages and certificates. If provided, clients MUST use this
          endpoint for all complete page and certificate requests. The same
          path structure is used as with the main log server (e.g., if
          static_endpoint is "https://static.example.com", then page 0 would
          be fetched from "https://static.example.com/ct-pages/v1/page/0").
          This field SHOULD be omitted if the static content is served from
          the same host as the log endpoints.</li>
          <li>last_page_at_static: (OPTIONAL) A boolean indicating whether the
          latest page should be fetched from the static_endpoint (true) or
          from the main log server (false). Defaults to false if not specified.
          This field MUST NOT be present if static_endpoint is not provided.</li>
        </ul>
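        <t>The rules above can be sketched in Python as follows. This is a
        non-normative illustration: it parses the discovery object, rejects a
        last_page_at_static field that appears without static_endpoint, and
        derives the URL for a complete page. The log_base parameter (the base
        URL of the main log server) and all function and class names are
        hypothetical.</t>
        <sourcecode type="python"><![CDATA[
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Discovery:
    page_size: int
    static_endpoint: Optional[str] = None
    last_page_at_static: bool = False


def parse_discovery(body: str) -> Discovery:
    """Parse and validate the JSON discovery object."""
    doc = json.loads(body)
    page_size = doc.get("page_size")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(page_size, bool) or not isinstance(page_size, int):
        raise ValueError("page_size must be an integer")
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    static_endpoint = doc.get("static_endpoint")
    if static_endpoint is None and "last_page_at_static" in doc:
        raise ValueError("last_page_at_static requires static_endpoint")
    last_at_static = bool(doc.get("last_page_at_static", False))
    return Discovery(page_size, static_endpoint, last_at_static)


def page_url(discovery: Discovery, log_base: str, page_number: int) -> str:
    """URL of a complete page, preferring static_endpoint when present."""
    base = (discovery.static_endpoint or log_base).rstrip("/")
    return f"{base}/ct-pages/v1/page/{page_number}"
]]></sourcecode>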
      </section>
    </section>
    
    <section numbered="true" toc="default">
      <name>Backward Compatibility</name>
      <t>This extension maintains full backward compatibility with RFC 6962:</t>
      <ul spacing="normal">
        <li>All original RFC 6962 endpoints remain unchanged and continue
        to function.</li>
        <li>The extension introduces new endpoints under the "/ct-pages/v1/"
        path prefix.</li>
        <li>Clients unaware of this extension continue to work with the
        original endpoints.</li>
        <li>Logs can implement this extension without breaking existing
        clients.</li>
      </ul>
      <t>Unless a separate static_endpoint is specified in the discovery
      response, the pages endpoints MUST be served on the same host as
      the main log endpoints.</t>
    </section>
    
    <section numbered="true" toc="default">
      <name>Operational Considerations</name>
      
      <section numbered="true" toc="default">
        <name>Page Size Stability</name>
        <t>Once a log begins serving pages with a particular page size, it
        MUST NOT change this size. Changing the page size would:</t>
        <ul spacing="normal">
          <li>Invalidate all cached responses</li>
          <li>Break client assumptions about page boundaries</li>
          <li>Complicate client implementation significantly</li>
        </ul>
        <t>Logs SHOULD choose a page size that balances response size with
        the number of requests needed to retrieve large ranges of entries.
        A page size of 1000 entries is RECOMMENDED as a reasonable default.</t>
      </section>
      
      <section numbered="true" toc="default">
        <name>Static Deployment</name>
        <t>Since pages become immutable once they contain page_size entries,
        logs can:</t>
        <ul spacing="normal">
          <li>Pre-generate page files for all complete pages</li>
          <li>Serve pages from static file hosting or CDN infrastructure</li>
          <li>Significantly reduce computational load on log servers</li>
          <li>Improve response times and reliability</li>
        </ul>
        <t>The last_page_at_static flag in the discovery response provides
        deployment flexibility:</t>
        <ul spacing="normal">
          <li>When set to false (default), the latest page is always fetched
          from the main log server, ensuring real-time accuracy while complete
          pages are served from static infrastructure.</li>
          <li>When set to true, even the latest page is served from the static
          endpoint. This requires the static infrastructure to be updated
          frequently but allows for complete offloading of read traffic.</li>
        </ul>
        <t>Clients retrieving the latest page using the /ct-pages/v1/latest
        endpoint MUST respect the last_page_at_static setting when a
        static_endpoint is configured. When last_page_at_static is true,
        the request is sent to "{static_endpoint}/ct-pages/v1/latest";
        otherwise, it is sent to the main log server.</t>
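        <t>The selection of the host for the latest page can be sketched in
        Python as shown below. The sketch is illustrative only. The log_base
        parameter (the base URL of the main log server) and the function name
        are hypothetical.</t>
        <sourcecode type="python"><![CDATA[
from typing import Optional


def latest_url(log_base: str, static_endpoint: Optional[str] = None,
               last_page_at_static: bool = False) -> str:
    """URL to request for the latest (possibly partial) page."""
    if static_endpoint is None and last_page_at_static:
        raise ValueError("last_page_at_static requires static_endpoint")
    # Only use the static host when the log explicitly opts in.
    base = static_endpoint if last_page_at_static else log_base
    return base.rstrip("/") + "/ct-pages/v1/latest"
]]></sourcecode>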
      </section>
    </section>
    
    <section numbered="true" toc="default">
      <name>Security Considerations</name>
      <t>This extension does not alter the security properties of Certificate
      Transparency as defined in <xref target="RFC6962"/>. The cryptographic proofs and
      append-only properties of the log remain unchanged.</t>
      <t>Clients MUST verify that certificate hashes match the actual
      certificates retrieved. This ensures that a compromised CDN or
      static hosting provider cannot substitute different certificates.</t>
      <t>The use of SHA-256 for certificate identification is consistent
      with its use in RFC 6962 for Merkle tree operations.</t>
    </section>
    
    <section numbered="true" toc="default">
      <name>IANA Considerations</name>
      <t>This document has no IANA actions.</t>
    </section>
  </middle>
  
  <back>
    <references>
      <name>References</name>
      <references>
        <name>Normative References</name>
        <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"/>
        <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.6962.xml"/>
        <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml"/>
        <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8446.xml"/>
      </references>
      <references>
        <name>Informative References</name>
        <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.7231.xml"/>
      </references>
    </references>
  </back>
</rfc>
