<?xml version="1.0" encoding="US-ASCII"?>
<!-- vim: set et ts=2 sw=2 : -->
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY RFC2119 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY RFC3629 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3629.xml">
<!ENTITY RFC5234 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5234.xml">
<!ENTITY IEEE.754.1985 SYSTEM "https://xml2rfc.tools.ietf.org/public/rfc/bibxml-ieee/reference.IEEE.754.1985.xml">
]>
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<?rfc strict="yes" ?>
<?rfc toc="yes"?>
<?rfc tocdepth="4"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes" ?>
<?rfc compact="yes" ?>
<?rfc subcompact="no" ?>
<rfc category="info" docName="draft-devault-bare-00" ipr="trust200902">
  <front>
    <title abbrev="BARE">Binary Application Record Encoding (BARE)</title>
    <author fullname="Drew DeVault" initials="D." surname="DeVault">
      <organization>SourceHut</organization>
      <address>
        <postal>
          <street>454 E. Girard Ave #2R</street>
          <city>Philadelphia</city>
          <region>PA</region>
          <code>19125</code>
          <country>US</country>
        </postal>
        <phone>+1 719 213 5473</phone>
        <email>sir@cmpwn.com</email>
      </address>
    </author>

    <date year="2020" />
  
    <area>General</area>
    <workgroup>Internet Engineering Task Force</workgroup>
    <keyword>encoding</keyword>
    <keyword>bare</keyword>
  
    <abstract>
      <t>
        The Binary Application Record Encoding (BARE) is a data format used to
        represent application records for storage or transmission between
        programs. BARE messages are concise and have a well-defined schema, and
        implementations may be simple and broadly compatible. A schema language
        is also provided to express message schemas out-of-band.
     </t>
    </abstract>

    <note title="Comments">
      <t>
        Comments are solicited and should be addressed to the mailing list at
        ~sircmpwn/public-inbox@lists.sr.ht and/or the author(s).
      </t>
    </note>
  </front>

  <middle>
    <section title="Introduction">
      <t>
        The purpose of the BARE message encoding, like hundreds of others, is to
        encode application messages. The goals of such encodings vary (leading to
        their proliferation); BARE's goals are the following:
      </t>
      <t>
        <list style="symbols">
          <t>Concise messages</t>
          <t>A well-defined message schema</t>
          <t>Broad compatibility with programming environments</t>
          <t>Simplicity of implementation</t>
        </list>
  
      </t>
      <t>
        This document specifies the BARE message encoding, as well as a schema
        language which may be used to describe the layout of a BARE message. The
        schema of a message must be agreed upon in advance by each party
        exchanging a BARE message; message structure is not encoded into the
        representation. The schema language is useful for this purpose, but not
        required.
      </t>
  
      <section title="Terminology">
        <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
        "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
        document are to be interpreted as described in <xref
        target="RFC2119">RFC 2119</xref>.</t>
      </section>
    </section>

    <section title="Specification of the BARE Message Encoding">
      <t>
        A BARE message is a single value of a pre-defined type, which may be
        of an aggregate type enclosing multiple values. Unless otherwise
        specified there is no additional container or structure around the
        value; it is encoded plainly.
      </t>
      <t>
        A BARE message does not necessarily have a fixed length, but a schema
        author who requires fixed-length messages may restrict the schema to
        types of well-defined length (for example, u32 rather than uint, or
        data&lt;16&gt; rather than data).
      </t>
      <t>
        The names for each type are provided to establish a vocabulary for
        describing a BARE message schema out-of-band, by parties who plan to
        exchange BARE messages. The type names used here are provided for this
        informative purpose, but are more rigorously specified by the schema
        language specification in <xref target="schema_language" />.
      </t>

      <section title="Primitive Types">
        <t>
          Primitive types represent exactly one value.
        </t>

        <t>
          <list hangIndent="8" style="hanging">
            <t hangText="uint">
              An unsigned integer with a variable-length encoding. Each octet of
              the encoded value has the most-significant bit set, except for the
              last octet. The remaining bits are the integer value in 7-bit
              groups, least-significant first.
              <vspace blankLines="1" />
              The maximum precision of such a number is 64 bits. The maximum
              length of an encoded uint is therefore 10 octets, since 64 bits
              require ten 7-bit groups.
            </t>

            <t hangText="int">
              A signed integer with a variable-length encoding. Signed integers
              are represented as uint using a "zig-zag" encoding: non-negative
              values x are written as 2x + 0, and negative values are written
              as 2(^x) + 1, where ^x is the bitwise complement of x. In other
              words, negative numbers are complemented, and whether to
              complement is encoded in bit 0.
              <vspace blankLines="1" />
              The maximum precision of such a number is 64 bits. The maximum
              length of an encoded int is therefore 10 octets.
            </t>

            <t hangText="u8, u16, u32, u64">
              <vspace blankLines="1" />
              Unsigned integers of a fixed precision, respectively 8, 16, 32,
              and 64 bits. They are encoded in little-endian order (least
              significant octet first).
            </t>

            <t hangText="i8, i16, i32, i64">
              <vspace blankLines="1" />
              Signed integers of a fixed precision, respectively 8, 16, 32, and
              64 bits. They are encoded in little-endian order (least
              significant octet first), using two's complement notation.
            </t>

            <t hangText="f32, f64">
              Floating-point numbers represented with the <xref
              target="IEEE.754.1985">IEEE 754</xref> binary32 and binary64
              floating point number formats.
            </t>

            <t hangText="bool">
              A boolean value, either true or false, encoded as a u8 with a
              value of one for true or zero for false.
              <vspace blankLines="1" />
              If a value other than one or zero is found in the u8
              representation of the bool, the message is considered invalid,
              and the decoder SHOULD raise an error if it encounters such a
              value.
            </t>

            <t hangText="enum">
              An unsigned integer value from a set of possible values agreed
              upon in advance, encoded with the uint type.
              <vspace blankLines="1" />
              An enum whose uint value is not a member of the values agreed
              upon in advance is considered invalid, and the decoder SHOULD
              raise an error if it encounters such a value.
              <vspace blankLines="1" />
              Note that this makes the enum type unsuitable for representing
              several enum values which have been combined with a bitwise OR
              operation, since such a combination is generally not itself a
              member of the agreed-upon set of values.
            </t>

            <t hangText="string">
              A string of text. The length of the text in octets is encoded
              first as a uint, followed by the text data represented with the
              <xref target="RFC3629">UTF-8 encoding</xref>.
              <vspace blankLines="1" />
              If the data is found to contain invalid UTF-8 sequences, it is
              considered invalid, and the decoder SHOULD raise an error if it
              encounters such a value.
            </t>

            <t hangText="data&lt;length&gt;">
              Arbitrary data with a fixed "length" in octets, e.g.
              data&lt;16&gt;. The data is encoded literally in the message, and
              MUST NOT be greater than 18,446,744,073,709,551,615 octets in
              length (the maximum value of a u64).
            </t>

            <t hangText="data">
              Arbitrary data of a variable length in octets. The length is
              encoded first as a uint, followed by the data itself encoded
              literally.
            </t>

            <t hangText="void">
              A type with zero length. It is not encoded into BARE messages.
            </t>
          </list>
        </t>
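        <figure>
          <preamble>
            The following Python sketch, which is illustrative rather than
            normative, shows one way to implement the uint, int,
            fixed-precision integer, bool, and string encodings described
            above. The function names are hypothetical and are not defined by
            this specification. Each decoder returns the decoded value
            together with the offset of the next unread octet, and the "&lt;"
            prefix of Python's struct formats selects little-endian order with
            standard sizes.
          </preamble>
          <artwork type="python"><![CDATA[import struct


def encode_uint(value: int) -> bytes:
    """Encode a BARE uint: 7-bit groups, least-significant first."""
    if not 0 <= value < 1 << 64:
        raise ValueError("uint out of range")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # MSB set: more octets follow
        value >>= 7
    out.append(value)  # MSB clear: last octet
    return bytes(out)


def decode_uint(buf: bytes, offset: int = 0) -> tuple[int, int]:
    value = 0
    for i in range(10):  # an encoded uint is at most 10 octets
        if offset >= len(buf):
            raise ValueError("truncated uint")
        octet = buf[offset]
        offset += 1
        value |= (octet & 0x7F) << (7 * i)
        if not octet & 0x80:
            if value >= 1 << 64:
                raise ValueError("uint exceeds 64 bits")
            return value, offset
    raise ValueError("uint longer than 10 octets")


def encode_int(value: int) -> bytes:
    """Zig-zag: x >= 0 becomes 2x, x < 0 becomes 2(~x) + 1."""
    if not -(1 << 63) <= value < 1 << 63:
        raise ValueError("int out of range")
    return encode_uint(2 * value if value >= 0 else 2 * ~value + 1)


def decode_int(buf: bytes, offset: int = 0) -> tuple[int, int]:
    raw, offset = decode_uint(buf, offset)
    half = raw >> 1
    return (~half if raw & 1 else half), offset


def encode_u32(value: int) -> bytes:
    return struct.pack("<I", value)  # 4 octets, little-endian


def encode_i64(value: int) -> bytes:
    return struct.pack("<q", value)  # 8 octets, two's complement


def decode_bool(buf: bytes, offset: int = 0) -> tuple[bool, int]:
    if offset >= len(buf):
        raise ValueError("truncated bool")
    if buf[offset] not in (0, 1):
        raise ValueError("invalid bool")
    return buf[offset] == 1, offset + 1


def encode_string(text: str) -> bytes:
    data = text.encode("utf-8")
    return encode_uint(len(data)) + data  # octet length, then UTF-8


def decode_string(buf: bytes, offset: int = 0) -> tuple[str, int]:
    length, offset = decode_uint(buf, offset)
    end = offset + length
    if end > len(buf):
        raise ValueError("truncated string")
    # bytes.decode raises UnicodeDecodeError on invalid UTF-8
    return buf[offset:end].decode("utf-8"), end]]></artwork>
          <postamble>
            For example, encode_uint(300) yields the octets ac 02, and
            encode_int(-1) yields the single octet 01.
          </postamble>
        </figure>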
      </section>

      <section title="Aggregate Types">
        <t>
          Aggregate types may store zero or more primitive or aggregate values.
        </t>

        <t>
          <list hangIndent="8" style="hanging">
            <t hangText="optional&lt;type&gt;">
              <vspace blankLines="1" />
              A value of "type" which may or may not be present, e.g.
              optional&lt;u32&gt;. Represented as either a u8 with a value of
              zero, indicating that the optional value is unset; or a u8 with a
              value of one, followed by the encoded data of the optional type.
              <vspace blankLines="1" />
              An optional value whose initial u8 is set to a number other than
              zero or one is considered invalid, and the decoder SHOULD raise
              an error if it encounters such a value.
            </t>

            <t hangText="[length]type">
              <vspace blankLines="1" />
              A list of "length" values of "type", e.g. [10]uint. The length is
              not encoded into the message. The encoded values of each member of
              the list are concatenated to form the encoded list.
            </t>

            <t hangText="[]type">
              A variable-length list of values of "type", e.g. []string. The
              length of the list (in values) is encoded as a uint, followed by
              the encoded values of each member of the list concatenated.
            </t>

            <t hangText="map[type A]type B">
              An associative list of values of type B keyed by values of type A,
              e.g. map[u32]string. The encoded representation of a map begins
              with the number of key/value pairs as a uint, followed by the
              encoded key/value pairs concatenated. Each key/value pair is
              encoded as the encoded key concatenated with the encoded value.
              <vspace blankLines="1" />
              A message with repeated keys is considered invalid, and the
              decoder SHOULD raise an error if it encounters such a value.
            </t>

            <t hangText="(type | type | ...)">
              A tagged union whose value may be one of any type from a set of
              types, e.g. (int | uint | string). Each type in the set is
              assigned a numeric identifier.  The value is encoded as the
              selected type's identifier represented with the uint encoding,
              followed by the encoded value of that type.
              <vspace blankLines="1" />
              A union with a tag value that does not have a corresponding type
              assigned is considered invalid, and the decoder SHOULD raise an
              error if it encounters such a value.
            </t>

            <t hangText="struct">
              A set of values of arbitrary types, concatenated in an order
              agreed upon in advance. Each value is referred to as a "field",
              and each field has a name and a type.
            </t>
          </list>
        </t>
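        <figure>
          <preamble>
            Continuing in the same non-normative style, the following Python
            sketch encodes the aggregate types. Each encoder takes the encoder
            for its member type as a parameter. The helper names, and the use
            of Python's None to represent an unset optional value, are
            assumptions of this example; encode_uint and encode_string are
            repeated so that the sketch is self-contained.
          </preamble>
          <artwork type="python"><![CDATA[import struct
from typing import Any, Callable, Optional, Sequence

Encoder = Callable[[Any], bytes]


def encode_uint(value: int) -> bytes:
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def encode_string(text: str) -> bytes:
    data = text.encode("utf-8")
    return encode_uint(len(data)) + data


def encode_u32(value: int) -> bytes:
    return struct.pack("<I", value)


def encode_optional(value: Optional[Any], enc: Encoder) -> bytes:
    # u8 0 means unset; u8 1 is followed by the encoded value
    return b"\x00" if value is None else b"\x01" + enc(value)


def encode_fixed_list(values: Sequence, length: int,
                      enc: Encoder) -> bytes:
    if len(values) != length:  # [length]type: length is not encoded
        raise ValueError("wrong number of list members")
    return b"".join(enc(v) for v in values)


def encode_list(values: Sequence, enc: Encoder) -> bytes:
    return encode_uint(len(values)) + b"".join(enc(v) for v in values)


def encode_map(items: dict, enc_key: Encoder,
               enc_val: Encoder) -> bytes:
    # A dict cannot hold repeated keys, so none are emitted.
    pairs = b"".join(enc_key(k) + enc_val(v)
                     for k, v in items.items())
    return encode_uint(len(items)) + pairs


def encode_union(tag: int, value: Any, enc: Encoder) -> bytes:
    return encode_uint(tag) + enc(value)  # tag, then the value


def encode_struct(fields: Sequence[tuple[Any, Encoder]]) -> bytes:
    # Fields are given in schema order; names are never encoded.
    return b"".join(enc(value) for value, enc in fields)]]></artwork>
          <postamble>
            For example, encode_optional(5, encode_u32) yields 01 05 00 00 00,
            and encode_list(["a", "bc"], encode_string) yields
            02 01 61 02 62 63.
          </postamble>
        </figure>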
      </section>

      <section title="User-Defined Types">
        <t>
          A user-defined type gives a name to another type. This creates a
          distinct type whose representation is equivalent to the named type.
          An arbitrary number of user-defined types may be used for the same
          underlying type; each is distinct from the others.
        </t>
      </section>

      <section title="Invariants">
        <t>
          The following invariants are specified:
          <list style="symbols">
            <t>
              Any type which is ultimately a void type (either directly or via a
              user-defined type) MUST NOT be used as an optional type, struct
              member, list member, map key, or map value. Void types may only be
              used as members of the set of types in a tagged union.
            </t>
            <t>The lengths of fixed-length lists and of data&lt;length&gt; types MUST be at least one.</t>
            <t>Structs MUST have at least one field.</t>
            <t>Unions MUST have at least one type, and each type MUST NOT be repeated.</t>
            <t>Map keys MUST be of a primitive type which is not data or data&lt;length&gt;.</t>
            <t>Each named value of an enum type MUST have a unique value.</t>
          </list>
        </t>
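        <figure>
          <preamble>
            The invariants above lend themselves to a mechanical check. The
            following Python sketch validates a schema held in a hypothetical
            in-memory representation, described in its comments; that
            representation and all function names are assumptions of this
            example. It also assumes that enum types may be used as map keys,
            since enum is listed among the primitive types, and that
            user-defined types are resolved to their underlying type before a
            map key or void check is made.
          </preamble>
          <artwork type="python"><![CDATA[from typing import Any

# Hypothetical schema representation, assumed for this sketch only:
#   "u32", "void", "Address"   primitive or user-defined type name
#   ("data", n)                data<n>
#   ("optional", t)            optional<t>
#   ("list", t, n)             [n]t, or []t when n is None
#   ("map", k, v)              map[k]v
#   ("union", [t, ...])        (t | ...)
#   ("struct", [(name, t)])    struct with named fields
#   ("enum", [(name, value)])  enum with resolved values
NOT_MAP_KEYS = {"data", "void"}


def resolve(t: Any, user_types: dict) -> Any:
    """Follow user-defined type names to the type they name."""
    seen = set()
    while isinstance(t, str) and t in user_types:
        if t in seen:
            raise ValueError(f"cyclic user-defined type {t}")
        seen.add(t)
        t = user_types[t]
    return t


def check(t: Any, user_types: dict, errors: list) -> None:
    """Append a message to errors for each violated invariant."""
    def no_void(inner: Any, where: str) -> None:
        if resolve(inner, user_types) == "void":
            errors.append(f"void used as {where}")
        check(inner, user_types, errors)

    if isinstance(t, str):
        return  # a name is checked where its type is defined
    kind = t[0]
    if kind == "data" and t[1] < 1:
        errors.append("data length must be at least one")
    elif kind == "optional":
        no_void(t[1], "optional type")
    elif kind == "list":
        if t[2] is not None and t[2] < 1:
            errors.append("list length must be at least one")
        no_void(t[1], "list member")
    elif kind == "map":
        key = resolve(t[1], user_types)
        if isinstance(key, str):
            key_ok = key not in NOT_MAP_KEYS
        else:
            key_ok = key[0] == "enum"
        if not key_ok:
            errors.append("invalid map key type")
        check(t[1], user_types, errors)
        no_void(t[2], "map value")
    elif kind == "union":
        if not t[1]:
            errors.append("union must have at least one type")
        for i, member in enumerate(t[1]):
            if member in t[1][:i]:
                errors.append("union type repeated")
            check(member, user_types, errors)  # void is allowed
    elif kind == "struct":
        if not t[1]:
            errors.append("struct must have at least one field")
        for name, field_type in t[1]:
            no_void(field_type, f"struct field {name}")
    elif kind == "enum":
        values = [value for _, value in t[1]]
        if len(values) != len(set(values)):
            errors.append("enum values must be unique")


def check_schema(user_types: dict) -> list:
    errors: list = []
    for t in user_types.values():
        check(t, user_types, errors)
    return errors]]></artwork>
        </figure>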
      </section>
    </section>

    <section anchor="schema_language" title="BARE Schema Language Specification">
      <t>
        The use of the schema language is optional. Implementations SHOULD
        support decoding arbitrary BARE messages without a schema document,
        by allowing the schema to be defined instead with the native tools of
        the programming environment.
      </t>
      <t>
        However, it may be useful to have a schema document for use with code
        generation, documentation, or interoperability. A domain-specific
        language is provided for this purpose.
      </t>

      <section title="Lexical Analysis">
        <t>
          During lexical analysis, "#" is used for comments; if encountered,
          the "#" character and any subsequent characters are discarded until a
          line feed (%x0A) is found.
        </t>
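        <figure>
          <preamble>
            Because the grammar contains no string literals, a "#" always
            begins a comment. The following Python sketch of this rule (the
            function name is hypothetical) keeps the terminating line feed,
            which the grammar treats as whitespace.
          </preamble>
          <artwork type="python"><![CDATA[def strip_comments(source: str) -> str:
    """Discard each "#" and the rest of its line, keeping the LF."""
    return "\n".join(line.split("#", 1)[0]
                     for line in source.split("\n"))]]></artwork>
        </figure>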
      </section>

      <section title="ABNF Grammar">
        <figure>
          <preamble>
            The syntax of the schema language is provided here in
            <xref target="RFC5234">Augmented Backus-Naur form</xref>.
            However, this grammar differs from <xref target="RFC5234" /> in
            that quoted strings are case-sensitive rather than
            case-insensitive (e.g. "type" does not match "TypE").
          </preamble>
   
          <artwork type="abnf"><![CDATA[schema = [WS] user-types [WS]

user-type =  "type" WS user-type-name WS non-enum-type
user-type =/ "enum" WS user-type-name WS enum-type
user-types = user-type / (user-types WS user-type)

type            = non-enum-type / enum-type
non-enum-type   = primitive-type / aggregate-type / user-type-name

user-type-name  = UPPER *(ALPHA / DIGIT) ; First letter is uppercase

primitive-type  =  "int" / "i8"  / "i16" / "i32" / "i64"
primitive-type  =/ "uint" / "u8"  / "u16" / "u32" / "u64"
primitive-type  =/ "f32" / "f64"
primitive-type  =/ "bool"
primitive-type  =/ "string"
primitive-type  =/ "data" / ("data<" integer ">")
primitive-type  =/ "void"

enum-type       =  "{" [WS] enum-values [WS] "}"
enum-values     =  enum-value / (enum-values WS enum-value)
enum-value      =  enum-value-name
enum-value      =/ (enum-value-name [WS] "=" [WS] integer)
enum-value-name =  UPPER *(UPPER / DIGIT / "_")

aggregate-type  =  optional-type
aggregate-type  =/ array-type
aggregate-type  =/ map-type
aggregate-type  =/ union-type
aggregate-type  =/ struct-type

optional-type   = "optional<" type ">"

array-type      = "[" [integer] "]" type
integer         = 1*DIGIT

map-type        = "map[" type "]" type

union-type      =  "(" union-members ")"
union-members   =  union-member
union-members   =/ (union-members [WS] "|" [WS] union-member)
union-member    =  type [[WS] "=" [WS] integer]

struct-type     = "{" [WS] fields [WS] "}"
fields          = field / (fields WS field)
field           = 1*ALPHA [WS] ":" [WS] type

UPPER           = %x41-5A ; uppercase ASCII letters
ALPHA           = %x41-5A / %x61-7A ; A-Z / a-z
DIGIT           = %x30-39 ; 0-9

WS              = 1*(%x0A / %x09 / " ") ; whitespace]]></artwork>
        </figure>

        <t>
          See <xref target="appx_example_schema" /> for an example schema
          written in this language.
        </t>
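        <figure>
          <preamble>
            As an illustration, the three name productions in the grammar
            translate directly into Python regular expressions. The pattern
            and function names are hypothetical; the character classes are
            deliberately ASCII-only, matching the UPPER, ALPHA, and DIGIT
            rules.
          </preamble>
          <artwork type="python"><![CDATA[import re

# user-type-name  = UPPER *(ALPHA / DIGIT)
USER_TYPE_NAME = re.compile(r"[A-Z][A-Za-z0-9]*")
# enum-value-name = UPPER *(UPPER / DIGIT / "_")
ENUM_VALUE_NAME = re.compile(r"[A-Z][A-Z0-9_]*")
# field names, from: field = 1*ALPHA [WS] ":" [WS] type
FIELD_NAME = re.compile(r"[A-Za-z]+")


def is_valid(pattern: re.Pattern, name: str) -> bool:
    # fullmatch: the whole name must match, not just a prefix
    return pattern.fullmatch(name) is not None]]></artwork>
        </figure>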
      </section>

      <section title="Semantic Elements">
        <t>
          The names of fields and user-defined types are informational: they are
          not represented in BARE messages. They may be used by code generation
          tools to inform the generation of field and type names in the native
          programming environment.
        </t>
        <t>
          The names of enum values are also informational; only the integer
          values are encoded. Values without an integer token are assigned
          automatically in the order that they appear, starting from zero and
          incrementing for each subsequent unassigned value. If a value is
          explicitly specified, automatic assignment continues from that value
          plus one for subsequent enum values.
        </t>
        <t>
          Union type members are assigned a tag in the order that they appear,
          starting from zero and incrementing for each subsequent type. If a tag
          value is explicitly specified, automatic assignment continues from
          that value plus one for subsequent values.
        </t>
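        <figure>
          <preamble>
            The assignment rules for enum values and union tags are the same,
            and can be sketched in Python as follows. The function name and
            its input format, a list of (name, explicit value or None) pairs
            in declaration order, are assumptions of this example. It also
            rejects duplicate values, which the invariants forbid for enums
            and which would make union tags ambiguous.
          </preamble>
          <artwork type="python"><![CDATA[from typing import Optional, Sequence


def assign_values(
        entries: Sequence[tuple[str, Optional[int]]]) -> dict[str, int]:
    """Assign integers to enum values or union members in order."""
    assigned: dict[str, int] = {}
    next_value = 0
    for name, explicit in entries:
        value = next_value if explicit is None else explicit
        if value in assigned.values():
            raise ValueError(f"duplicate value {value} for {name}")
        assigned[name] = value
        next_value = value + 1  # continue from the last value plus one
    return assigned]]></artwork>
          <postamble>
            Applied to the Department enum of the example schema, this
            assigns 0 through 3 to ACCOUNTING through DEVELOPMENT and 99 to
            JSMITH; applied to (MessageV2 = 1 | MessageV3), it assigns tags 1
            and 2.
          </postamble>
        </figure>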
      </section>
    </section>

    <section title="Application Considerations">
      <t>
        Message authors who wish to design a schema which is backwards- and
        forwards-compatible with future messages are encouraged to use union
        types for this purpose. New types may be appended to the members of a
        union type while retaining backwards compatibility with older message
        types. The choice to do this must be made from the first message
        version: moving a struct into a union <spanx style="emph">does
        not</spanx> produce a backwards-compatible message, because the
        union's tag is prepended to the encoded struct.
      </t>

      <figure>
        <preamble>The following schema provides an example:</preamble>
        <artwork><![CDATA[type Message (MessageV1 | MessageV2 | MessageV3)

type MessageV1 ...

type MessageV2 ...

type MessageV3 ...]]>
        </artwork>
        <postamble>
          An updated schema which adds a MessageV4 type would still be able to
          decode versions 1, 2, and 3.
        </postamble>
      </figure>

      <t>
        If a message version is later deprecated, it may be removed while
        remaining compatible with existing messages of versions 2 and 3,
        provided that the tag of the first remaining member is specified
        explicitly:
      </t>

      <figure>
        <artwork><![CDATA[type Message (MessageV2 = 1 | MessageV3)]]>
        </artwork>
      </figure>
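      <figure>
        <preamble>
          On the decoding side, an implementation can dispatch on the union
          tag. In the Python sketch below, decode_message and the per-version
          decoders are hypothetical; because the schema above elides the
          message bodies, each decoder simply returns the remaining octets. A
          message carrying the removed MessageV1 tag (0) is reported as an
          error, as recommended for unknown union tags.
        </preamble>
        <artwork type="python"><![CDATA[from typing import Any, Callable

Decoder = Callable[[bytes, int], Any]


def decode_uint(buf: bytes, offset: int = 0) -> tuple[int, int]:
    value = 0
    for i in range(10):  # an encoded uint is at most 10 octets
        if offset >= len(buf):
            raise ValueError("truncated uint")
        octet = buf[offset]
        offset += 1
        value |= (octet & 0x7F) << (7 * i)
        if not octet & 0x80:
            if value >= 1 << 64:
                raise ValueError("uint exceeds 64 bits")
            return value, offset
    raise ValueError("uint longer than 10 octets")


def decode_message(buf: bytes, decoders: dict[int, Decoder]) -> Any:
    tag, offset = decode_uint(buf)
    if tag not in decoders:  # e.g. tag 0 once MessageV1 is removed
        raise ValueError(f"unknown union tag {tag}")
    return decoders[tag](buf, offset)


# Hypothetical decoders for: type Message (MessageV2 = 1 | MessageV3)
# The bodies are elided in the schema, so each returns the raw octets.
DECODERS: dict[int, Decoder] = {
    1: lambda buf, off: ("MessageV2", buf[off:]),
    2: lambda buf, off: ("MessageV3", buf[off:]),
}]]></artwork>
      </figure>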
    </section>

    <section title="Future Considerations">
      <t>
        To ensure message compatibility between implementations and backwards-
        and forwards-compatibility of messages, constraints on vendor extensions
        are required. This specification is final, and new types or extensions
        will not be added in the future. Implementors MUST NOT define
        extensions to this specification.
      </t>
      <t>
        To support the encoding of novel data structures, the implementor SHOULD
        make use of user-defined types in combination with the data or
        data&lt;length&gt; types.
      </t>
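      <figure>
        <preamble>
          For example, a schema might represent a UUID with the hypothetical
          user-defined type "type Uuid data&lt;16&gt;". Since
          data&lt;16&gt; is written literally with no length prefix, the
          Python sketch below (function names are illustrative) needs only
          the UUID's 16 octets, here in the big-endian field order that
          Python's uuid module produces.
        </preamble>
        <artwork type="python"><![CDATA[import uuid

# Assumed schema (hypothetical): type Uuid data<16>


def encode_uuid(value: uuid.UUID) -> bytes:
    return value.bytes  # exactly 16 octets, no length prefix


def decode_uuid(buf: bytes, offset: int = 0) -> tuple[uuid.UUID, int]:
    end = offset + 16
    if end > len(buf):
        raise ValueError("truncated Uuid")
    return uuid.UUID(bytes=buf[offset:end]), end]]></artwork>
      </figure>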
    </section>

    <section anchor="IANA" title="IANA Considerations">
      <t>This memo includes no request to IANA.</t>
    </section>

    <section anchor="Security" title="Security Considerations">
      <t>
        Message parsers are common vectors for security vulnerabilities. BARE
        addresses this by making the message format as simple as possible.
        However, the parser MUST be prepared to handle a number of error cases
        when decoding untrusted messages, such as a union type with an invalid
        tag, or an enum with an invalid value. Such errors may also arise by
        mistake, for example when attempting to decode a message with the wrong
        schema.
      </t>
      <t>
        Support for data types of an arbitrary, message-defined length (lists,
        maps, strings, etc.) is commonly exploited to cause the implementation
        to exhaust its resources while decoding a message. However, legitimate
        use-cases for extremely large data types (possibly larger than the
        system has the resources to store all at once) do exist. The decoder
        MUST manage its resources accordingly, and SHOULD provide the
        application with a means of supplying its own decoder implementation
        for values which are expected to be large.
      </t>
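      <figure>
        <preamble>
          A decoder working on an in-memory buffer can reject many
          resource-exhaustion attempts before allocating anything, by checking
          a length or count against both an application-chosen limit and the
          remaining input. Because every type that may appear as a list member
          or map entry encodes to at least one octet under the invariants, a
          count larger than the remaining input can never be valid. The
          Python sketch below is illustrative; the function names and the
          max_length and max_count limits are hypothetical.
        </preamble>
        <artwork type="python"><![CDATA[def decode_uint(buf: bytes, offset: int = 0) -> tuple[int, int]:
    value = 0
    for i in range(10):  # an encoded uint is at most 10 octets
        if offset >= len(buf):
            raise ValueError("truncated uint")
        octet = buf[offset]
        offset += 1
        value |= (octet & 0x7F) << (7 * i)
        if not octet & 0x80:
            if value >= 1 << 64:
                raise ValueError("uint exceeds 64 bits")
            return value, offset
    raise ValueError("uint longer than 10 octets")


def read_length_prefixed(buf: bytes, offset: int,
                         max_length: int) -> tuple[bytes, int]:
    """Read string or data contents with a bounded length."""
    length, offset = decode_uint(buf, offset)
    if length > max_length:
        raise ValueError("length exceeds the configured limit")
    if length > len(buf) - offset:
        raise ValueError("length exceeds the remaining input")
    return buf[offset:offset + length], offset + length


def read_count(buf: bytes, offset: int,
               max_count: int) -> tuple[int, int]:
    """Read a list or map count before allocating any members."""
    count, offset = decode_uint(buf, offset)
    # Each member encodes to at least one octet, so a count larger
    # than the remaining input cannot describe a valid message.
    if count > max_count or count > len(buf) - offset:
        raise ValueError("count exceeds limit or remaining input")
    return count, offset]]></artwork>
        <postamble>
          A streaming decoder, which cannot see the remaining input in
          advance, would rely on the configured limits alone.
        </postamble>
      </figure>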
      <t>
        There is only one valid interpretation of a BARE message for a given
        schema, and different decoders and encoders should be expected to
        provide that interpretation. If an implementation has limitations
        imposed from the programming environment (such as limits on numeric
        precision), the implementor MUST document these limitations, and prevent
        conflicting interpretations from causing undesired behavior.
      </t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      &RFC2119;

      &RFC5234;

      &RFC3629;

      &IEEE.754.1985;
    </references>

    <section anchor="appx_example_schema" title="Example Message Schema">
      <figure>
        <preamble>
          The following is an example of a schema written in the BARE schema
          language.
        </preamble>
 
        <artwork><![CDATA[type PublicKey data<128>
type Time string # ISO 8601

enum Department {
  ACCOUNTING
  ADMINISTRATION
  CUSTOMER_SERVICE
  DEVELOPMENT

  # Reserved for the CEO
  JSMITH = 99
}

type Customer {
  name: string
  email: string
  address: Address
  orders: []{
    orderId: i64
    quantity: i32
  }
  metadata: map[string]data
}

type Employee {
  name: string
  email: string
  address: Address
  department: Department
  hireDate: Time
  publicKey: optional<PublicKey>
  metadata: map[string]data
}

type TerminatedEmployee void

type Person (Customer | Employee | TerminatedEmployee)

type Address {
  address: [4]string
  city: string
  state: string
  country: string
}]]></artwork>
      </figure>
    </section>

    <section title="Example Messages">
      <t>
        Some basic example messages in hexadecimal are provided for the schema
        specified in <xref target="appx_example_schema" />.
      </t>

      <t>
        A "Person" value of type "Customer" with the following values:
        <list hangIndent="12" style="hanging">
          <t hangText="name">James Smith</t>
          <t hangText="email">jsmith@example.org</t>
          <t hangText="address">
            123 Main St; Philadelphia; PA; United States
          </t>
          <t hangText="orders (1)">
            orderId: 4242424242; quantity: 5
          </t>
          <t hangText="metadata">(empty)</t>
        </list>
        <figure>
          <preamble>Encoded BARE message:</preamble>
          <artwork><![CDATA[00 0b 4a 61 6d 65 73 20 53 6d 69 74 68 12 6a 73
6d 69 74 68 40 65 78 61 6d 70 6c 65 2e 6f 72 67
0b 31 32 33 20 4d 61 69 6e 20 53 74 00 00 00 0c
50 68 69 6c 61 64 65 6c 70 68 69 61 02 50 41 0d
55 6e 69 74 65 64 20 53 74 61 74 65 73 01 b2 41
de fc 00 00 00 00 05 00 00 00 00]]></artwork>
        </figure>
      </t>

      <t>
        A "Person" value of type "Employee" with the following values:
        <list hangIndent="12" style="hanging">
          <t hangText="name">Tiffany Doe</t>
          <t hangText="email">tiffanyd@acme.corp</t>
          <t hangText="address">
            123 Main St; Philadelphia; PA; United States
          </t>
          <t hangText="department">ADMINISTRATION</t>
          <t hangText="hireDate">2020-06-21T21:18:05+00:00</t>
          <t hangText="publicKey">(unset)</t>
          <t hangText="metadata">(empty)</t>
        </list>
        <figure>
          <preamble>Encoded BARE message:</preamble>
          <artwork><![CDATA[01 0b 54 69 66 66 61 6e 79 20 44 6f 65 12 74 69
66 66 61 6e 79 64 40 61 63 6d 65 2e 63 6f 72 70
0b 31 32 33 20 4d 61 69 6e 20 53 74 00 00 00 0c
50 68 69 6c 61 64 65 6c 70 68 69 61 02 50 41 0d
55 6e 69 74 65 64 20 53 74 61 74 65 73 01 19 32
30 32 30 2d 30 36 2d 32 31 54 32 31 3a 31 38 3a
30 35 2b 30 30 3a 30 30 00 00]]></artwork>
        </figure>
      </t>

      <t>
        A "Person" value of type "TerminatedEmployee":
        <figure>
          <preamble>Encoded BARE message:</preamble>
          <artwork><![CDATA[02]]></artwork>
        </figure>
      </t>
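      <figure>
        <preamble>
          The first message above can be reproduced with a short Python
          sketch. The helper functions are hypothetical and hard-code the
          Customer layout from the example schema rather than being driven by
          the schema itself.
        </preamble>
        <artwork type="python"><![CDATA[import struct


def encode_uint(value: int) -> bytes:
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def encode_string(text: str) -> bytes:
    data = text.encode("utf-8")
    return encode_uint(len(data)) + data


def encode_customer(name: str, email: str, address_lines: list,
                    city: str, state: str, country: str,
                    orders: list, metadata: dict) -> bytes:
    out = bytearray(encode_uint(0))  # Person tag 0: Customer
    out += encode_string(name) + encode_string(email)
    if len(address_lines) != 4:  # Address.address is [4]string
        raise ValueError("expected exactly four address lines")
    for text in (*address_lines, city, state, country):
        out += encode_string(text)
    out += encode_uint(len(orders))  # []{orderId: i64 quantity: i32}
    for order_id, quantity in orders:
        out += struct.pack("<qi", order_id, quantity)
    out += encode_uint(len(metadata))  # map[string]data
    for key, value in metadata.items():
        out += encode_string(key) + encode_uint(len(value)) + value
    return bytes(out)


message = encode_customer(
    "James Smith", "jsmith@example.org", ["123 Main St", "", "", ""],
    "Philadelphia", "PA", "United States", [(4242424242, 5)], {})
print(message.hex(" "))]]></artwork>
        <postamble>
          The printed octets match the first encoded message above.
        </postamble>
      </figure>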

    </section>
  </back>
</rfc>
