<?xml version="1.0" ?>

<!--
xxx make n2t recognize the ?info so that example works with n2t
    ?xxx check with Mark P. that UNT example is ok
xxx mxerxes normalize
xxx remove arks embedded in language due to (a) ineffective rules given
  differences in language-specific orthographic practice (eg, French space
  before ' ?') and (b) there being too few short inflections to reserve that
  are _not_ already used widely for end-of-sentence punctuation.

XXXXXXzzz !! have introduced "shoulder" in ANATOMY but need to define it
zzz DONE
  Julien Raemy: change 12025 in examples to 12345? The FAQ does this
XXXXXXXXXzzz
> From: Mario Xerxes Castelán Castro «Ksenia» via ucop.edu Aug 12, 2019, 3:34 PM
> Thanks for the feedback. According to the intentions of the ARK spec
> (latest draft  https://tools.ietf.org/html/draft-kunze-ark-22), the two
> example ARKs you gave,
> 
>    ark:/12345/a%2db
>    ark:/12345/a-b
> 
> are actually not different, since the first normalizes to become the
> second, and in that turn, normalizes to become
> 
>    ark:/12345/ab
> 
> due to the special case of hyphens (being valid but "identity inert").
> It may be that the spec is unclear. If you agree, would you point out
> where it could be improved?

Hello. Prior to your clarification, my interpretation of the
specification was that percent-encoding special characters within ARKs
could be used to make them non-special (much like %2F can be used in
URIs to avoid it being interpreted as separating path components) and
thus different from special characters that appear in encoded form. Here
is the fragment that suggest that interpretation:

“To keep ARK string variation to a minimum, no reserved ARK characters
should be %-encoded _unless it is deliberately to conceal their reserved
meanings._”

> Regarding your example with ".", I agree with your approach,
> 
>     /My proposal: That “.” in native ARK maps to/
>     /“.” in the corresponding IRI but “%2e” in native ARK maps to
>     “%252e” and/
>     /“%2d” in native ARK maps to “%252d” in the corresponding URI. Under my/
>     /proposal ark:/12345/a%2db and ark:/12345/a-b are still valid and/
>     /different native ARKs and map to different URIs./
> 
> 
> and, again, that was the intention, but it could probably be made more
> explicit. I don't think there's a sentence describing a procedure for
> converting normalized ARKs with embedded %-encoding into URIs.

I see. My interpretation based on the ASCII-art diagram in page 9 of the
current draft <https://tools.ietf.org/html/draft-kunze-ark-22#page-9>
was that ARKs could be converted into URIs by simply appending the NMA
part.

I suggest to define a *normative* algorithm to normalize any ARK, and
then declare that ARKs are equivalent at the syntactic level iff they
have the same normal form. Also to define an algorithm to convert any
ARK to an URI given a NMA that covers corner cases like ARKs with
percent-encoded characters. The first step should be normalizing the
ARK, so that equivalent ARKs lead to equivalent URIs. I would suggest
that this algorithm converts the hexadecimal digits used for
percent-encoding to uppercase so that it always generates canonical URIs
given a canonical NMA (RFC 3986 allows both but prefers uppercase).

-->

<!-- See http://xml.ietf.org/ for formatting tools that can deal with
     this RFC2629 (and beyond) XML format.
     -->

<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [

  <!ENTITY mdash '&#8212;' >

  <!ENTITY rfc0854 PUBLIC '' 'http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.0854.xml'>
  <!ENTITY rfc1034 PUBLIC '' 'http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.1034.xml'>
  <!ENTITY rfc1321 PUBLIC '' 'http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.1321.xml'>
  <!ENTITY rfc2141 PUBLIC '' 'http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2141.xml'>
  <!ENTITY rfc2288 PUBLIC '' 'http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2288.xml'>
  <!ENTITY rfc2611 PUBLIC '' 'http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2611.xml'>
  <!ENTITY rfc2616 PUBLIC '' 'http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2616.xml'>
  <!ENTITY rfc2822 PUBLIC '' 'http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2822.xml'>
  <!ENTITY rfc2915 PUBLIC '' 'http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2915.xml'>
  <!ENTITY rfc3986 PUBLIC '' 'http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.3986.xml'>
  <!ENTITY rfc5013 PUBLIC '' 'http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.5013.xml'>

]>

<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<?rfc strict="yes" ?>
<?rfc comments="no"?>
<?rfc inline="yes"?>
<?rfc symrefs="yes"?>
<?rfc toc="yes"?>

<rfc category="info" docName="draft-kunze-ark-26" ipr="trust200902"
     submissionType="independent">

 <front>
  <title abbrev="ARK">
   The ARK Identifier Scheme
  </title>

  <author initials="J." surname="Kunze" 
          fullname="John A. Kunze"> 
   <organization>
    California Digital Library 
   </organization>
   <address>
    <postal>
     <street>1111 Franklin Street</street>
     <city>Oakland</city> <region>CA</region>
     <code>94607</code>
     <country>USA</country>
    </postal>
    <email>jak@ucop.edu</email>
   </address>
  </author>

  <author initials="E." surname="Bermès" 
          fullname="Emmanuelle Bermès"> 
   <organization>
    Bibliothèque nationale de France
   </organization>
   <address>
    <postal>
     <street>Quai François Mauriac</street>
     <city>Paris</city>
     <code>75706</code>
     <country>France</country>
    </postal>
    <email>emmanuelle.bermes@bnf.fr</email>
   </address>
  </author>

<!-- Rick ok to be removed as co-author as of convo 2018 June 12
  <author initials="R.P.C." surname="Rodgers" 
          fullname="R. P. C. Rodgers"> 
   <organization>
    University of California San Francisco
   </organization>
   <address>
    <postal>
     <street>Box 0134, 185 Berry, China Basin, Lobby 6 290</street>
     <city>San Francisco</city> <region>CA</region>
     <code>94143-0134</code>
     <country>USA</country>
    </postal>
    <email>rodgers@arborvitae.com</email>
   </address>
  </author>
-->

<!-- NB: !! all elements of date are required for IETF submissions -->

  <date day="31" month="January" year="2021" />

<!--
     submission form at https://datatracker.ietf.org/idst/upload.cgi
     differencing tool
     https://www.ietf.org/rfcdiff?url1=draft-kunze-ark-18.txt&url2=draft-kunze-ark-24.txt
     https://www.ietf.org/rfcdiff?url1=draft-kunze-ark-23.txt&url2=draft-kunze-ark-24.txt
     https://tools.ietf.org/html/draft-kunze-ark-18

     xxx see note in file juha20190729 query strings from Juha Hakala and
         URN components
     xxx should declare query strings not part of identity, eg, so that
     thump "was()" and "accessed on <date>" don't participate?
     -->

<!-- XXX reviewers include:
XXXX want to add to all drafts a link of the updated HTML version
key consists of blade and bow (confirm at Rex key)
planchet: metal disk to be stamped into a coin (also "blank")
U. Chicago, C. Blair
-->
<!--
to do:
DONE 0. easy: update http:// to https://      zzz DONE
DONE 0. easy: update NMAH to NMA        zzz DONE
0. up front say that ARK deviates from the URI spec for the
   following reasons
0. up front say that ARK inflections pose a small risk by reserving
   meaning for certain query strings, but that such risk is mitigated
   for ARK clients because they should not trust an ARK-compliant
   response without seeing a THUMP response header
DONE 0. make OPTIONAL the first "/" in "ark:/12345", so "ark:12345" works
2018.03.22 experts day bnf
   -> this means normalization before comparison should change to remove it
2018.03.22 experts day bnf
   -> however, implementations are free to store arks in either form (provided
      of course that for normalization, the "/" is removed
0. change default inflection output format from ANVL to YAML ??
!!    change THUMP spec too.
0. change: remove terminal structural chars only after inflection check
     - this resolver-only behavior, right? (or registration behavior?)
     (makes final "/" safe as inflection)
2018.03.22 experts day bnf
     terminal structural chars set aside for inflection check on resolution
     terminal structural chars discarded for other purposes, eg, registration
        or normalization

0. add no-op inflection? ".", a terminal "." ends the ARK, thus
     <ark>.? can be used in a sentence without activating the ?
     inflection
0. change: internal structural chars "/." normalize to "."
actually normalize "/." to "." (internal only?)
... or don't normalize and allow local interpretation? no - too important
Consider "." as inflection for immersive (or computable) experience
Consider "!" as inflection for reporting broken link to a resolver
Consider "?!" as inflection for reporting out of date metadata to a resolver
Consider "..." as THUMP to report broken links
Consider "..." as inflection to return multiple targets
In section title="ARKS Embedded in Language", consider adding a sentence
   explaining that inflections kind of mess up this goal.

"Third, structural characters (slash and period) are normalized.
Initial and final occurrences are removed, and two structural
characters in a row (e.g., // or ./) are replaced by the first
character, iterating until each occurrence has at least one
non-structural character on either side.  Finally, if there are
"

   (note that the "/" after the ":" in "ark:/" was intended,
   similar to :// in http://, to make this string _much_
   less likely to occur randomly in the beginning of a URL path)

1. make the NAAN more flexible - not just 5 digits or 9 digits, but any
   "beta-numeric" string with no runs of adjacent letters longer than two
2. introduce "base identifier" concept to diagram?
3. introduce reporting file at well-known location? with YAML fields

   Updated: yyyy-mm-dd <update time of stats (not file, since may be dynamic)
   # one or many Prefix entries
   <Prefix>:	# scheme, naan, or shoulder ark:12345, 
     available: < approx number of ids exposed to public >
     total: < approx total number of ids including those exposed, not exposed
              (eg, behind paywall, or not ready for release, or
	      embargoed, or withdrawn),
     url_end_point: < end resolver url base serving them >
     prefix_metadata_url: < url to descriptive file, preferably with
              schema.org or bioschema.org tags >
     additive_source: < optional >
     authoritative_source: < optional >

   # a prefix once reported by one source, will be associated with that
   # source by harvester as authoritative; other sources that appear will
   # be marked non-authoritative until intervention (eg, two prefixes 
   # considered additive for a split namespace), or the original prefix
   # names another source as "authoritative", or "additive"

   [ pursue generic approach with Sarala and Melissa/Julie? justify by
   referencing my not-yet paper on identifier concepts that spells out
   namespace hierarchies? ]
   [ pursue generic approach in context of governance with Henning? ]
   [ name of approach:?  prefix reporting, prefix stats,
       prefix plumbing (plumbing/measuring the depth),

4. ** introduce arks.org as the place to learn of latest developments
   (for future proofing)

=== older ===
1. officially establish n2t.net
   support nordutch proof-of-concept
   establish n2t as registry of permanent names for cultural memory orgs
2. establish ark/noid mailing discussion list
   create training and marketing materials
   support for iso standardization
   developer support for noid/bind 1.0
      (UC and UC partner (eg, datanet,ndiipp) minters of preservation ready ids)
      (guids, extended sets, bignums, less bdb)
      (GUI for binder/resolver/multiple resolution)
3. 
TO DO:  
        Genericize NMA and NAA concepts to live outside ARK (eg, n2t)
	Name concept of "ARK(-like) probes/inflections" so other id schemes to reference
	Name concept of "ARK(-like) naming principles"
	Name concept of "ARK(-like) char repertoires"
(12/17/03) changed HKMP to THUMP
(12/17/03) squeezed space from dates in ERCs
TO DO: - expand on fact that NAAN is "permanent", never re-assigned
TO DO: - worry about the succession of NAA's and their policies
TO DO: - Change SDSC NAA cyberinfrastructure? (more stable org)
TO DO: - Read up on "ProQuest durable links" and "Gale InfoMark".
TO DO: - idea of sub-publishing via suffixes (/ and .)?

TO DO:  ?? let NMA set own policy re. granularity and inclusion,
            which includes ability to semi-generate new names based
            on a new "format" of originally named doc
TO DO:  ?? how can ARKs usefully disclose (a) concrete vs abstract
            (where there's no object to access) and (b) blob vs
            queryable group?  [all given NMA's ability to set policy
            on granularity and inclusion ]


XXX .ie \ne (http://www.cdlib.org/inside/diglib/ark/arkspec.pdf)
XXX .el (http://www.ietf.org/internet-drafts/\*(Id)

XXX  Copyright (C) The Internet Society (\*(Cp).  All Rights Reserved.
XXX  Creative Commons Copyright (CC) The Internet Society (2005).  Public Domain.
XXX Distribution of this document is unlimited.  Please send comments to
XXX jak@ucop.edu.
-->

  <abstract>

<t>
The ARK (Archival Resource Key) naming scheme is designed to facilitate
the high-quality and
persistent identification of information objects. A founding principle
of the ARK is that persistence is purely a matter of service and is
neither inherent in an object nor conferred on it by a particular
naming syntax. The best that an identifier can do is to lead users to
the services that support robust reference. The term ARK itself refers both
to the scheme and to any single identifier that conforms to it.
An ARK has five components:
</t>

<t>
    [https://NMA/]ark:[/]NAAN/Name[Qualifier]
</t>

<t>
an optional and mutable Name Mapping Authority (usually a
hostname), the "ark:" label, the Name Assigning Authority Number (NAAN),
the assigned Name, and an optional and possibly mutable Qualifier
supported by the NMA.  The NAAN and Name together form the immutable
persistent identifier for the object independent of the URL hostname.
An ARK is a special kind of URL that connects users to three things:
the named object, its metadata, and the provider's promise about its
persistence. When entered into the location field of a Web browser,
the ARK leads the user to the named object. That same ARK, inflected by
appending `?info', returns a metadata record that
is both human- and machine-readable. The returned record contains core
metadata and a commitment statement from the current provider.
Tools exist for minting, binding, and resolving ARKs.
</t>
  </abstract>

 </front>

 <middle>

<!--
.\" The ARK (Archival Resource Key) is a scheme that facilitates the
.\" persistent naming and retrieval of information objects.  It comprises an
.\" identifier syntax and three services.  An ARK has four components:
.\" .Cs
.\"                [http://NMA/]ark:/NAAN/Name
.\" .Ce
.\" an optional and mutable Name Mapping Authority Hostport part (NMAH,
.\" where "hostport" is a hostname followed optionally by a colon and port
.\" number), the "ark:" label, the Name Assigning Authority Number (NAAN),
.\" and the assigned Name.  The NAAN and Name together form the immutable
.\" persistent identifier for the object.
.\" 
.\" An ARK request is an ARK with a service request and a question mark
.\" appended to it.  Use of an ARK request proceeds in two steps.
.\" First, the NMAH, if not specified, is discovered based on the
.\" NAAN.  Two discovery methods are proposed:  one is file based, the
.\" other based on the DNS NAPTR record.  Second, the ARK request is
.\" submitted to the NMAH.  Three ARK services are defined, gaining access
.\" to:  (1) the object (or a sensible substitute), (2) a description of the
.\" object (metadata), and (3) a description of the commitment made by the
.\" NMA regarding the persistence of the object (policy).  These services are
.\" defined initially to use the HTTP protocol.
.\" When the NMAH is specified, the ARK is a valid URL that can gain access
.\" to ARK services using an unmodified Web client.
-->

<section title="Introduction">

<t>
[ Note about this transitional draft. The ARKsInTheOpen.org Technical Working
Group (https://wiki.duraspace.org/display/ARKs/Technical+Working+Group) is in
the process of revising the ARK spec via a series of Internet-Drafts.
This draft contains many minor but noisy changes (lots of diffs but not much
real change). While the spec is in transition, new implementors should follow
https://datatracker.ietf.org/doc/html/draft-kunze-ark-18.
]
</t>

<t>
This document describes a scheme for the high-quality naming of
information resources.  The scheme, called the Archival Resource
Key (ARK), is well suited to long-term access and identification
of any information resources that accommodate reasonably regular
electronic description.  This includes digital documents, databases,
software, and websites, as well as physical objects (books, bones,
statues, etc.) and intangible objects (chemicals, diseases,
vocabulary terms, performances).  Hereafter the term "object" refers
to an information resource.  The term ARK itself refers both to the
scheme and to any single identifier that conforms to it.  A reasonably
concise and accessible overview and rationale for the scheme is
available at <xref target="ARK"/>.
</t>

<t>
Schemes for persistent identification of network-accessible objects
are not new.  In the early 1990s, the design of the Uniform Resource
Name <xref target="RFC2141"/> responded to the observed failure rate of URLs by
articulating an indirect, non-hostname-based naming scheme and the
need for responsible name management.  Meanwhile, promoters of the
Digital Object Identifier <xref target="DOI"/> succeeded in building a community of
providers around a mature software system <xref target="Handle"/> that supports name
management.  The Persistent Uniform Resource Locator <xref
target="PURL"/> was another
scheme that had the advantage of working with unmodified web
browsers.  ARKs represent an approach that attempts to build on
the strengths and to avoid the weaknesses of these schemes.
</t>

<t>
A founding principle of the ARK is that persistence is purely a matter
of service.  Persistence is neither inherent in an object nor conferred
on it by a particular naming syntax.  Nor is the technique of name
indirection &mdash; upon which URNs, Handles, DOIs, and PURLs are
founded &mdash;
of central importance.  Name indirection is an ancient and well-understood
practice; new mechanisms for it keep appearing and distracting practitioner
attention, with the Domain Name System (DNS) <xref target="RFC1034"/>
being a particularly dazzling
and elegant example.  What is often forgotten is that maintenance of an
indirection table is an unavoidable cost to the
organization providing persistence, and that cost is equivalent across
naming schemes.  That indirection has always been a native part of the web
while being so lightly utilized for the persistence of web-based objects
indicates how unsuited most organizations will probably be to the task of
table maintenance and to the much more fundamental challenge of keeping
the objects themselves viable.
</t>

<t>
Persistence is achieved
through a provider's successful stewardship of objects and their
identifiers.  The highest level of persistence will be reinforced by a
provider's robust contingency, redundancy, and succession strategies.
It is further safeguarded to the extent that a provider's mission is
shielded from funding and political instabilities.  These are by far
the major challenges confronting persistence providers, and no
identifier scheme has any direct impact on them.  In fact, some
schemes may actually be liabilities for persistence because they create
short- and long-term dependencies for every object access on complex,
special-purpose infrastructures, parts of which are
proprietary and all of which increase the carry-forward burden for
the preservation community.  It is for this reason that the ARK scheme
relies only on educated name assignment and light use of general-purpose
infrastructures that are maintained mostly by the internet community at
large (the DNS, web servers, and web browsers).
</t>

<section title="Reasons to Use ARKs">

<t>
If no persistent identifier scheme contributes directly to persistence,
why not just use URLs?  A particular URL may be as durable an identifier
as it is possible to have, but nothing distinguishes it from an ordinary
URL to the recipient who is wondering if it is suitable for long-term
reference.  An ARK embedded in a URL provides some of the necessary
conditions for credible persistence, inviting access to not one, but to
three things:  to the object, to its metadata, and to a nuanced statement
of commitment from the provider in question (the NMA, described below)
regarding the object.  Existence of the extra service can be probed
automatically by appending `?info' to the ARK.
</t>

<t>
The form of the ARK also supports the natural separation of naming
authorities into the original name assigning authority and the diverse
multiple name mapping (or servicing) authorities that in succession and
in parallel will take over custodial responsibilities from the original
assigner (assuming the assigner ever held that responsibility) for the
large majority of a long-term object's archival lifetime.  The name
mapping authority, indicated by the hostname part of the URL that
contains the ARK, serves to launch the ARK into cyberspace.  Should it
ever fail (and there is no reason why a well-chosen hostname for a
100-year-old cultural memory institution shouldn't last as long as the
DNS), that hostname is considered disposable and replaceable.  Again,
the form of the ARK helps because it defines exactly how to recover the
core immutable object identity, and simple algorithms (one based on the
URN model) or even a by-hand internet query can be used for locating
another mapping authority.
</t>
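
<t>
As a minimal sketch of that recovery step (an illustration, not part of
this specification), the following Python fragment drops whatever
precedes the "ark:" label and re-attaches the remaining string to a
different mapping authority.  The function names and the replacement
hostname "mirror.example.net" are hypothetical, and the sketch assumes
that "ark:" does not occur earlier in the URL than the label itself.
</t>

<figure>
 <artwork><![CDATA[
```python
def strip_nma(ark_url: str) -> str:
    """Return the part of ark_url that starts at the "ark:" label."""
    # Assumption: the first "ark:" in the string is the ARK label.
    start = ark_url.find("ark:")
    if start < 0:
        raise ValueError("no 'ark:' label found in %r" % ark_url)
    return ark_url[start:]


def rehost(ark_url: str, new_nma: str) -> str:
    """Attach the hostname-independent ARK string to another NMA."""
    return "https://" + new_nma + "/" + strip_nma(ark_url)


old = "https://example.org/ark:12345/x54xz321"
print(strip_nma(old))
print(rehost(old, "mirror.example.net"))
```
]]></artwork>
</figure>

<t>
In this sketch the hostname is treated as the only replaceable part;
everything from the label onward is carried over unchanged.
</t>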

<t>
There are tools to assist in generating ARKs and other identifiers, such
as <xref target="NOID"/> and "uuidgen", both of which rely for uniqueness
on human-maintained registries.  This document also contains some
guidelines and considerations for managing namespaces and choosing
hostnames with persistence in mind.
</t>

</section>

<section title="Three Requirements of ARKs">

<t>
The first requirement of an ARK is to give users a link from an object to
a promise of stewardship for it.  That promise is a multi-faceted covenant
that binds the word of an identified service provider to a specific set of
responsibilities.  It is critical for the promise to come from a current
provider and almost irrelevant, over a long period of time, what the
original assigner's intentions were.  No one can tell if successful
stewardship will take place because no one can predict the future.
Reasonable conjecture, however, may be based on past performance.  There
must be a way to tie a promise of persistence to a provider's
demonstrated or perceived ability &mdash; its reputation &mdash; in that
arena.  Provider reputations would then rise and fall as promises are
observed variously to be kept and broken.  This is perhaps the best way
we have for gauging the strength of any persistence promise.
</t>

<t>
The second requirement of an ARK is to give users a link from an object
to a description of it.  The problem with a naked identifier is that
without a description real identification is incomplete.  Identifiers
common today are relatively opaque, though some contain ad hoc clues
reflecting assertions that were briefly true, such as where in a
filesystem hierarchy an object lived during a short stay.  Possession of
both an identifier and an object is some improvement, but positive
identification may still be uncertain since the object itself might not
include a matching identifier or might not carry evidence obvious enough
to reveal its identity without significant research.  In either case,
what is called for is a record bearing witness to the identifier's
association with the object, as supported by a recorded set of object
characteristics.  This descriptive record is partly an identification
"receipt" with which users and archivists can verify an object's identity
after brief inspection and a plausible match with recorded
characteristics such as title and size.
<!--
.\" Among the recorded
.\" characteristics, a checksum (e.g., <xref target="RFC1321"/>) recorded at the time of last
.\" handling may assist automated identification of digital objects (although
.\" checksums will require recomputation periodically if extremely persistent
.\" objects' bitstreams change as predicted due to inevitable media migration).
-->
</t>

<t>
The final requirement of an ARK is to give users a link to the object
itself (or to a copy) if at all possible.  High-quality access is the
central duty of an ARK.  Persistent identification plays a vital
supporting role but, strictly speaking, it can be construed as no more
than a record attesting to the original assignment of a never-reassigned
identifier.  Object access may not be feasible for various reasons, such
as a transient service outage, a catastrophic loss, a licensing agreement
that keeps an archive "dark" for a period of years, or when an object's
own lack of tangible existence confuses normal concepts of access (e.g.,
a vocabulary term might be "accessed" through its definition).  In such
cases the ARK's identification role assumes a much higher profile.  But
attempts to simplify the persistence problem by decoupling access from
identification and concentrating exclusively on the latter are of
questionable utility.  A perfect system for assigning forever unique
identifiers might be created, but if it did so without reducing access
failure rates, no one would be interested.  The central issue &mdash; which
may be summed up as the "HTTP 404 Not Found" problem &mdash; would not have
been addressed.
</t>

<t>
ARK resolvers must support the `?info' inflection for requesting metadata.
Older versions of this specification distinguished between two minimal
inflections: `?' (brief metadata) and `??' (more metadata).
These older inflections are still reserved, but because they have proven
hard to recognize in some environments, supporting them is optional.
</t>
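
<t>
One way a resolver might separate a terminal inflection from the ARK it
modifies is sketched below in Python.  This is only an illustration
under stated assumptions: the function name "split_inflection" is
hypothetical, only the three inflections named above are recognized,
and the handling of any other query string is left out of scope.
</t>

<figure>
 <artwork><![CDATA[
```python
# Checked in this order so that "??" is not mistaken for "?".
INFLECTIONS = ("?info", "??", "?")


def split_inflection(request: str):
    """Split a requested ARK string into (ark, inflection).

    The inflection is "" when the request ends with none of the
    suffixes listed in INFLECTIONS.
    """
    for suffix in INFLECTIONS:
        if request.endswith(suffix):
            return request[: -len(suffix)], suffix
    return request, ""


for req in ("ark:12345/x54xz321?info",
            "ark:12345/x54xz321??",
            "ark:12345/x54xz321?",
            "ark:12345/x54xz321"):
    print(split_inflection(req))
```
]]></artwork>
</figure>

<t>
A resolver following this sketch would answer `?info' with the metadata
record and could choose whether to honor the two older forms.
</t>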

</section>

<section title="Organizing Support for ARKs:  Our Stuff vs. Their Stuff">

<t>
An organization and the user community it serves can often be seen to
struggle with two different areas of persistent identification: the Our
Stuff problem and the Their Stuff problem.  In the Our Stuff problem,
we in the organization want our own objects to acquire persistent names.
Since we possess or control these objects, our organization tackles the
Our Stuff problem directly.  Whether or not the objects are named by ARKs,
our organization is the responsible party, so it can plan for, maintain,
and make commitments about the objects.
</t>

<t>
In the Their Stuff problem, we in the organization want others' objects
to acquire persistent names.  These are objects that we do not own or
control, but some of which are critically important to us.  Because their
support is beyond our influence, however, creating and maintaining
persistent identifiers for Their Stuff is neither especially purposeful
nor feasible for us.  There is little that we can
do about someone else's stuff except encourage their uptake or adoption
of persistence services.
</t>

<t>
Co-location of persistent access and identification services is natural.
Any organization that undertakes ongoing support of true persistent
identification (which includes description) is well-served if it controls,
owns, or otherwise has clear internal access to the identified objects,
and this gives it an advantage if it wishes also to support persistent
access to outsiders.  Conversely, persistent access to outsiders requires
orderly internal collection management procedures that include monitoring,
acquisition, verification, and change control over objects, which in turn
requires object identifiers persistent enough to support auditable
record-keeping practices.
</t>

<t>
Although organizing ARK support under one roof thus tends to make sense,
object hosting can successfully be separated from name mapping.  An
example is when a name mapping authority centrally provides uniform
resolution services via a protocol gateway on behalf of organizations that
host objects behind a variety of access protocols.  It is also reasonable
to build value-added description services that rely on the underlying
services of a set of mapping authorities.
</t>

<t>
Supporting ARKs is not for every organization.  By requiring specific,
revealed commitments to preservation, to object access, and to description,
the bar for providing ARK services is higher than for some other identifier
schemes.  On the other hand, it would be hard to grant credence to a
persistence promise from an organization that could not muster the minimum
ARK services.  This does not rule out
a business model for an ARK-like, description-only service built on top
of another organization's full complement of ARK services.  For example,
there might be competition at the description level for abstracting and
indexing a body of scientific literature archived in a combination of
open and fee-based repositories.  The description-only service would
have no direct commitment to the objects, but would act as an intermediary,
forwarding commitment statements from object hosting services to requestors.
<!--
.\" xxx new section here on Name categories? eg, NAA-assigned, NMA-assigned
.\"     extended names ("service units"), NMA-allowed "subnames [ better
.\"     word? ] or addresses.
-->
</t>


</section>

<section title="Definition of Identifier">

<t>
An identifier is not a string of character data &mdash; an identifier is an
association between a string of data and an object.  This abstraction
is necessary because without it a string is just data.  It's
nonsense to talk about a string's breaking, or about its being strong,
maintained, and authentic.  But as a representative of an association,
a string can do, metaphorically, the things that we expect of it.
</t>

<t>
Without regard to whether an object is physical, digital, or conceptual,
to identify it is to claim an association between it and a representative
string, such as "Jane" or "ISBN 0596000278".  What gives a claim
credibility is a set of verifiable assertions, or metadata, about the
object, such as age, height, title, or number of pages.  In other words,
the association is made manifest by a record (e.g., a cataloging or other
metadata record) that vouches for it.
</t>

<t>
In the complete absence of any testimony (metadata) regarding an
association, a would-be identifier string is a meaningless sequence of
characters.  To keep an externally visible but otherwise internal string
from being perceived as an identifier by outsiders, for example, it suffices
for an organization not to disclose the nature of its association.  For
our immediate purpose, actual existence of an association record is more
important than its authenticity or verifiability, which are outside the
scope of this specification.
</t>

<t>
It is a gift to the identification process if an object carries its own
name as an inseparable part of itself, such as an identifier imprinted
on the first page of a document or embedded in a data structure element
of a digital document header.  In cases where the object is large, unwieldy,
or unavailable (such as when licensing restrictions are in effect), a
metadata record that includes the identifier string will usually suffice.
That record becomes a conveniently manipulable object surrogate, acting
as both an association "receipt" and "declaration".
</t>

<t>
Note that our definition of identifier extends the one in use for Uniform
Resource Identifiers <xref target="RFC3986"/>.  The present document still sometimes
(ab)uses the terms "ARK" and "identifier" as shorthand for the string
part of an identifier, but the context should make the meaning clear.
</t>

</section>

</section>

<section title="ARK Anatomy">

<t>
An ARK is represented by a sequence of characters (a string) that
contains the label, "ark:", optionally preceded by the beginning
part of a URL.  Here is a diagrammed example.
</t>

<figure>
 <artwork>
ARK ANATOMY         
===========          
                      
      Resolver Service   Base Object Name    Qualifiers
     __________________  ________________  _____________
    /                  \/                \/             \
    https://example.org/ark:12345/x54xz321/s3/f8.05v.tiff
            \_________/ \__/\___/\_/\____/\____/\_______/
                |        |    |   |  blade   |       |
                |      Label  |  shoulder  Sub-parts Variants
                |             |  \_______/
Name Mapping Authority (NMA)  |  Assigned Name
                              |
               Name Assigning Authority Number (NAAN) </artwork>
</figure>

<!--
     Resolver Service   Base Object Name     Qualifier
             |                  |                |
     ________|________  ________|_______  _______|_____

.\"          http://foobar.zaf.org/ark:/12345/654xz321/s3/master.tiff
.\"          \___________________/ \__/ \___/ \______/ \____________/
.\"            (replaceable)        |     |      |        Qualifier (NMA)
.\"                 |         ARK Label   |    Name (assigned by the NAA)
.\"                 |                     | 
.\"   Name Mapping Authority             Name Assigning Authority
.\"          Hostport (NMA)              Number (NAAN)
-->

<t>
The ARK syntax can be summarized,
</t>

<figure>
 <artwork>
               [https://NMA/]ark:[/]NAAN/Name[Qualifier] </artwork>
</figure>
<t>
where the NMA, '/', and Qualifier parts are in brackets to indicate that
they are optional.
The Base Object Name is the substring comprising the "ark:" label,
the NAAN and the assigned Name. The Resolver Service is replaceable and
makes the ARK actionable for a period of time. Without the Resolver Service
part, what remains is the Core Immutable Identity (the "persistible")
part of the ARK.
</t>
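<t>
As a non-normative illustration, the summary form above can be sketched
as a regular expression that splits an ARK string into its parts.  The
names ARK_PATTERN and split_ark are hypothetical, and the sketch assumes
that the Name contains no slash or period, so that the Qualifier begins
at the first such character after the Name.
</t>
<figure>
 <artwork><![CDATA[
```python
import re

# Hypothetical pattern for [https://NMA/]ark:[/]NAAN/Name[Qualifier].
# Assumption: the Name holds no '/' or '.', so the Qualifier starts at
# the first structural character that follows the Name.
ARK_PATTERN = re.compile(
    r"^(?:(?P<scheme>https?)://(?P<nma>[^/]+)/)?"  # optional Resolver Service
    r"(?P<label>ark:)/?"                           # new or old form of label
    r"(?P<naan>[^/]+)/"                            # NAAN, up to the next slash
    r"(?P<name>[^/.]+)"                            # assigned Name
    r"(?P<qualifier>[/.].*)?$",                    # optional Qualifier
    re.IGNORECASE,
)


def split_ark(ark):
    """Return a dict of the ARK's parts; absent optional parts are None."""
    m = ARK_PATTERN.match(ark)
    if m is None:
        raise ValueError("not recognizable as an ARK: %r" % ark)
    return m.groupdict()


parts = split_ark("https://example.org/ark:12345/x54xz321/s3/f8.05v.tiff")
# The Base Object Name is the label, the NAAN, and the assigned Name.
base_object_name = "ark:%s/%s" % (parts["naan"], parts["name"])
```
]]></artwork>
</figure>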

<section title="The Name Mapping Authority (NMA)">

<t>
Before the "ark:" label may appear an optional Name Mapping Authority
(NMA) that is a temporary address where ARK service requests
may be sent. Preceded by a URI-type protocol designation such as "https://",
it specifies a Resolver Service. The NMA itself is an Internet hostname
or host/port combination
having the same format and semantics as the host/port part of a URL.
The most important thing about the NMA is that it is "identity inert"
from the point of view of object identification.  In other words, ARKs
that differ only in the optional NMA part identify the same object.
Thus, for example, the following three ARKs are synonyms for just
one information object:
</t>
<figure>
 <artwork>
                 https://loc.gov/ark:12345/x54xz321
             https://rutgers.edu/ark:12345/x54xz321
                                 ark:12345/x54xz321 </artwork>
</figure>
<t>
Strictly speaking, in the realm of digital objects, these ARKs may lead
over time to somewhat different or diverging instances of the originally
named object.  In an ideal world, divergence of persistent objects is
not desirable, but it is widely believed that digital preservation efforts
will inevitably lead to alterations in some original objects (e.g., a format
migration in order to preserve the ability to display a document).  If any
of those objects are held redundantly in more than one organization
(a common preservation strategy), chances are small that all holding
organizations will perform the same precise transformations and all
maintain the same object metadata.  More significant divergence would
be expected when the holding organizations serve different audiences or
compete with each other.
</t>

<t>
The NMA part makes an ARK into an actionable URL.  As with many internet
parameters, it is helpful to approach the NMA being liberal in what you
accept and conservative in what you propose.  From the recipient's point
of view, the NMA part should be treated as temporary, disposable, and
replaceable.  From the NMA's point of view, it should be chosen with the
greatest concern for longevity.  A carefully chosen NMA should be at
least as permanent as the providing organization's own hostname.
In the case of a national or university library, for example, there is
no reason why the NMA should not be considerably more permanent than
soft-funded proxy hostnames such as hdl.handle.net, dx.doi.org, and
purl.org.  In general and over time, however, it is not unexpected for
an NMA eventually to stop working and require replacement with the NMA
of a currently active service provider.
</t>

<t>
This replacement relies on a mapping authority "resolver" discovery
process, of which two alternate methods are outlined in a later section.
The ARK, URN, Handle, and DOI schemes all use a resolver discovery model
that sooner or later requires matching the original assigning authority
with a current provider servicing that authority's named objects; once
found, the resolver at that provider performs what amounts to a redirect
to a place where the object is currently held.  All the schemes rely on
the ongoing functionality of currently mainstream technologies such as the
Domain Name System <xref target="RFC1034"/> and web browsers.  The Handle and DOI schemes
in addition require that the Handle protocol layer and global server grid
be available at all times.
</t>

<t>
The practice of prepending "https://" and an NMA to an ARK is a way of
creating an actionable identifier by a method that is itself temporary.
Assuming that infrastructure supporting <xref target="RFC2616"/> information retrieval will
no longer be available one day, ARKs will then have to be converted into
new kinds of actionable identifiers.  By that time, if ARKs see widespread
use, web browsers would presumably evolve to perform this (currently
simple) transformation automatically.
</t>

</section>

<section title="The ARK Label Part (ark:)">

<t>
The label part distinguishes an ARK from an ordinary identifier.
There is a new form of the label, "ark:", and an old form, "ark:/",
both of which must be recognized in perpetuity. Implementations should
generate new ARKs in the new form (without the "/") and resolvers must
always treat received ARKs as equivalent if they differ only in regard
to new form versus old form labels. Thus these two ARKs are equivalent:
</t>
<figure>
 <artwork>
                          ark:/12345/x54xz321
                           ark:12345/x54xz321 </artwork>
</figure>
<t>
In a URL found in the wild, the label indicates that the URL stands a
reasonable chance of being an ARK.
If the context warrants, verification that it actually is an ARK
can be done by testing it for existence of the three ARK services.
</t>

<t>
Since nothing about an identifier syntax directly affects persistence,
the "ark:" label (like "urn:", "doi:", and "hdl:") cannot tell you
whether the identifier is persistent or whether the object is available.
It does tell you that the original Name Assigning Authority (NAA) had
some sort of hopes for it, but it doesn't tell you whether that NAA is
still in existence, or whether a decade ago it ceased to have any
responsibility for providing persistence, or whether it ever had any
responsibility beyond naming.
</t>

<t>
Only a current provider can say for certain what sort of commitment it
intends, and the ARK label suggests that you can query the NMA directly
to find out exactly what kind of persistence is promised.  Even if what
is promised is impersistence (i.e., a short-term identifier), saying so
is valuable information to the recipient.  Thus an ARK is a high-functioning
identifier in the sense that it provides access to the object, the
metadata, and a commitment statement, even if the commitment is
explicitly very weak.
</t>

</section>

<section title="The Name Assigning Authority Number (NAAN)">

<t>
Recalling that the general form of the ARK is,
</t>
<figure>
 <artwork>
               [https://NMA/]ark:[/]NAAN/Name[Qualifier] </artwork>
</figure>
<t>
the part of the ARK directly following the "ark:" (or older "ark:/") label
is the Name Assigning Authority Number (NAAN), up to but not including
the next `/' (slash) character.  This part is always required, as
it identifies the organization that originally assigned
the Name of the object. Typically the organization is an institution,
a department, a laboratory, or any group that conducts a stable,
policy-driven name assigning effort.
<!-- xxxzzz review -->
It is used to discover a currently valid NMA and to provide
top-level partitioning of the space of all ARKs.
</t>

<t>
An organization may request a NAAN from the ARK Maintenance Agency <xref
target="ARKagency"/> (described in <xref target="agency"/>) by filling out
the form at <xref target="NAANrequest"/>. NAANs are opaque strings of one
or more "betanumeric" characters, specifically,
</t>
<figure>
 <artwork>
    0123456789bcdfghjkmnpqrstvwxz </artwork>
</figure>
<t>
which consists of digits and consonants, minus the letter 'l'. Restricting
NAANs to betanumerics (alphanumerics without vowels or 'l') serves two goals.
It reduces the chances that words -- past, present, and future -- will appear
in NAANs and carry unintended semantics. It also helps usability by not mixing
commonly confused characters ('0' and 'O', '1' and 'l') and by being compatible
with strong transcription error detection (eg, the <xref target="NOID"/> check
digit algorithm). Since 2001, every assigned NAAN has consisted of
exactly five digits.
</t>
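<t>
A minimal sketch of a betanumeric test follows.  The function name
is_betanumeric is hypothetical, and the sketch assumes that only the
lower case letters listed above are acceptable.
</t>
<figure>
 <artwork><![CDATA[
```python
# The betanumeric repertoire: digits plus consonants, without 'l'
# (and without vowels, 'y' included).
BETANUMERIC = frozenset("0123456789bcdfghjkmnpqrstvwxz")


def is_betanumeric(s):
    """True if s is one or more betanumeric characters.

    Assumption: upper case letters are not accepted.
    """
    return len(s) > 0 and all(c in BETANUMERIC for c in s)
```
]]></artwork>
</figure>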

<t>
The NAAN designates a top-level ARK namespace.  Once registered for a
namespace, a NAAN is never re-registered.  It is possible, however,
for there to be a succession of organizations that manage an ARK
namespace.
</t>

</section>

<section title="The Name Part">

<t>
The part of the ARK just after the NAAN is the Name assigned by the NAA,
and it is also required.  Semantic opaqueness in the Name part is
strongly encouraged in order to reduce an ARK's vulnerability to era- and
language-specific change.  Identifier strings containing linguistic
fragments can create support difficulties down the road.  No matter how
appropriate or even meaningless they are today, such fragments may one day
create confusion, give offense, or infringe on a trademark as the semantic
environment around us and our communities evolves.
</t>

<t>
Names that look more or less like numbers avoid common problems that
defeat persistence and international acceptance.  The use of digits is
highly recommended.  Mixing in non-vowel alphabetic characters (eg,
betanumerics) a couple at a time is a relatively safe and easy way to
achieve a denser namespace (more possible names for a given length of
the name string). Such names have a chance of aging and traveling well.
The absence of recognizable words makes typos harder to detect in opaque
strings, so a common mitigation is to add a check character. Tools exist that
mint, bind, and resolve opaque identifiers, with or without check characters
<xref target="NOID"/>. More on naming considerations is given in a subsequent
section.
</t>

<section title="Optional: Shoulder and Blade">
<t>
Just as the ARK namespace is subdivided by NAANs reserved for NAAs,
each NAAN is a namespace that can be subdivided into "shoulders",
where each shoulder is reserved for an internal department or unit.
Like the NAAN, which is a string of characters that follows the "ark:"
label, a shoulder is a string of characters (starting with a "/")
that extends the NAAN. The base object name assigned by the NAA consists
of the NAAN, the shoulder, and a final string known as the "blade". (The shoulder
plus blade terminology mirrors locksmith jargon describing the
information-bearing parts of a key.)
</t>

<t>
The blade string is chosen by the NAA such that the string created by
concatenating the NAAN plus shoulder plus blade becomes the unique base object
name. Otherwise the blade may come from any source; for example, it might
come from a counter, a timestamp, a <xref target="NOID"/> minter, 
a legacy 100-year-old accession number, etc. If there is a check digit,
it is expected to appear at the end of the blade and to be computed over
the base object name, which is generally the most important part of an
ARK to make opaque. In particular, check digits are not expected to
cover qualifiers, which often name subobjects of a persistent object
that are less stable and less opaquely named than the parent object
(for example, ten years hence, the object's thumbnail image will be of
a higher resolution and the OCR text file will be re-derived with improved
algorithms).
</t>

<t>
It is important not to use any delimiter between the shoulder string
and blade string, especially not a "/" since it declares an object
boundary (see the section on ARKs that reveal object hierarchy). This
little bit of discretion shields organizations from end users making
inferences about expected levels of support based on recognizable
shoulders. To help in-house ARK administrators reliably know where the
shoulder ends, it is recommended to use the "first-digit convention"
so that shoulders are "primordinal". A primordinal shoulder is a
sequence of one or more betanumeric characters ending in a digit. This means
that the shoulder is all consonant letters (often just one) after the NAAN
and "/" up to and including the first digit encountered after the NAAN.
One property of primordinal shoulders is that there is an infinite number
of them possible under any NAAN.
</t>
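<t>
The first-digit convention can be sketched as follows, as an
illustration only.  The function name split_shoulder is hypothetical;
its input is assumed to be the string that follows the NAAN and its "/",
without any qualifier.
</t>
<figure>
 <artwork><![CDATA[
```python
import re

# A primordinal shoulder: zero or more betanumeric letters followed by
# the first digit encountered.  Everything after that is the blade.
PRIMORDINAL = re.compile(r"^([bcdfghjkmnpqrstvwxz]*[0-9])(.*)$")


def split_shoulder(name):
    """Return (shoulder, blade), or None if the name does not begin
    with a primordinal shoulder."""
    m = PRIMORDINAL.match(name)
    if m is None:
        return None
    return m.group(1), m.group(2)


# With the name from the anatomy diagram, the shoulder is "x5" and
# the blade is "4xz321".
shoulder, blade = split_shoulder("x54xz321")
```
]]></artwork>
</figure>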

<t>
To help manage each namespace into the future, NAAs are encouraged to
create shoulders, even if there is only one to start with. There are
four NAANs (99999, 12345, 99152, 99166, XXX describe these) that are shared
across organizations. Creating a shoulder on one of them requires
a registration process (XXX).
</t>

</section>

</section>

<section title="The Qualifier Part">

<t>
The part of the ARK following the NAA-assigned Name is an optional
Qualifier.  It is a string that extends the base ARK in order to create
a kind of service entry point into the object named by the NAA.  At the
discretion of the providing NMA, such a service entry point permits an
ARK to support access to individual hierarchical components and
subcomponents of an object, and to variants (versions, languages, formats)
of components.  A Qualifier may be invented by the NAA or by any NMA
servicing the object.
</t>

<t>
In form, the Qualifier is a ComponentPath, or a VariantPath, or a
ComponentPath followed by a VariantPath.  A VariantPath is introduced
and subdivided by the reserved character `.', and a ComponentPath is
introduced and subdivided by the reserved character `/'.  In this
example,
</t>

<figure>
 <artwork>
    https://example.org/ark:12345/x54xz321/s3/f8.05v.tiff </artwork>
</figure>
<t>
the string "/s3/f8" is a ComponentPath and the string ".05v.tiff" is a
VariantPath.  The ARK Qualifier is a
formalization of some currently mainstream URL syntax conventions.
This formalization specifically reserves meanings that permit
recipients to make strong inferences about logical sub-object containment
and equivalence based only on the form of the received identifiers;
there is great efficiency in not having to inspect metadata records
to discover such relationships.  NMAs are free not to disclose any of
these relationships merely by avoiding the reserved characters above.
Hierarchical components and variants are discussed further in the
next two sections.
</t>

<t>
The Qualifier, if present, differs from the Name in several
important respects.  First, a Qualifier may have been assigned either by
the NAA or later by the NMA.  The assignment of a Qualifier by an NMA
effectively amounts to an act of publishing a service entry point within
the conceptual object originally named by the NAA.  For our purposes,
an ARK extended with a Qualifier assigned by an NMA will be called an
NMA-qualified ARK.
</t>

<t>
Second, a Qualifier assignment on the part of an NMA is made in fulfillment
of its service obligations and may reflect changing service expectations
and technology requirements.  NMA-qualified ARKs could therefore be
transient, even if the base, unqualified ARK is persistent.  For example,
it would be reasonable for an NMA to support access to an image object
through an actionable ARK that is considered persistent even if the
experience of that access changes as linking, labeling, and presentation
conventions evolve and as format and security standards are updated.
For an image "thumbnail", that NMA could also support an NMA-qualified
ARK that is considered impersistent because the thumbnail will be
replaced with higher resolution images as network bandwidth and CPU
speeds increase.  At the same time, for an originally scanned,
high-resolution master, the NMA could publish an
NMA-qualified ARK that is itself considered persistent.  Of course, the
NMA must be able to return its separate commitments to unqualified,
NAA-assigned ARKs, to NMA-qualified ARKs, and to any NAA-qualified ARKs
that it supports.
</t>

<t>
A third difference between a Qualifier and a Name concerns the semantic
opaqueness constraint.  When an NMA-qualified ARK is to be used as a
transient service entry point into a persistent object, the priority
given to semantic opaqueness observed by the NAA in the Name part may be
relaxed by the NMA in the Qualifier part.  If service priorities in the
Qualifier take precedence over persistence, short-term usability 
considerations may recommend somewhat semantically laden Qualifier
strings.
</t>

<t>
Finally, not only is the set of Qualifiers supported by an NMA mutable,
but different NMAs may support different Qualifier sets for the same
NAA-identified object.  In this regard the NMAs act independently
of each other and of the NAA.
</t>

<t>
The next two sections describe how ARK syntax may be used to declare,
or to avoid declaring, certain kinds of relatedness among qualified ARKs.
</t>

<section title="ARKs that Reveal Object Hierarchy">

<t>
An NAA or NMA may choose to reveal the presence of a hierarchical
relationship between objects using the `/' (slash) character after
the Name part of an ARK.  Some authorities will choose not to
disclose this information, while others will go ahead and disclose
so that manipulators of large sets of ARKs can infer object
relationships by simple identifier inspection; for example, this
makes it possible for a system to present a collapsed view of a
large search result set.
</t>

<t>
If the ARK contains an internal slash after the NAAN, the piece to its
left indicates a containing object.  For example, publishing an ARK of
the form,
</t>
<figure>
 <artwork>
                    ark:12345/x54/xz/321 </artwork>
</figure>
<t>
is equivalent to publishing three ARKs,
</t>
<figure>
 <artwork>
                    ark:12345/x54/xz/321
                    ark:12345/x54/xz
                    ark:12345/x54 </artwork>
</figure>
<t>
together with a declaration that the first object is contained in the
second object, and that the second object is contained in the third.
</t>
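<t>
The containment inference can be sketched as below.  This is an
illustration only; the function name containing_arks is hypothetical,
and the input is assumed to have no NMA part, no variant suffixes, and
no leading, trailing, or doubled slashes.
</t>
<figure>
 <artwork><![CDATA[
```python
def containing_arks(ark):
    """List the ARK followed by each ARK it implies, innermost first.

    Assumptions: no NMA part, no suffixes, and slashes already
    normalized.  The old "ark:/" label form is accepted.
    """
    _label, rest = ark.split(":", 1)
    rest = rest.lstrip("/")                 # tolerate the old label form
    naan, _sep, path = rest.partition("/")
    parts = path.split("/")
    return [
        "ark:%s/%s" % (naan, "/".join(parts[:i]))
        for i in range(len(parts), 0, -1)
    ]


implied = containing_arks("ark:12345/x54/xz/321")
```
]]></artwork>
</figure>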

<t>
Revealing the presence of hierarchy is completely up to the assigner
(NMA or NAA).  It is hard enough to commit to one object's name, let alone
to three objects' names and to a specific, ongoing relatedness among them.
Thus, regardless of whether hierarchy was present initially, the assigner,
by not using slashes, reveals no shared inferences about
hierarchical or other inter-relatedness in the following ARKs:
</t>
<figure>
 <artwork>
                    ark:12345/x54_xz_321
                    ark:12345/x54_xz
                    ark:12345/x54xz321
                    ark:12345/x54xz
                    ark:12345/x54 </artwork>
</figure>
<t>
Note that slashes around the ARK's NAAN (/12345/ in these examples) are
not part of the ARK's Name and therefore do not indicate the existence
of some sort of NAAN super object containing all objects in its namespace.
A slash must have at least one non-structural character (one that
is neither a slash nor a period) on both sides in order for it to
separate recognizable structural components.  So initial or final
slashes may be removed, and double slashes may be converted into
single slashes.
</t>

</section>

<section title="ARKs that Reveal Object Variants">

<t>
An NAA or NMA may choose to reveal the possible presence of variant
objects or object components using the `.' (period) character after the
Name part of an ARK.  Some authorities will choose not to disclose
this information, while others will go ahead and disclose
so that manipulators of large sets of ARKs can infer object
relationships by simple identifier inspection; for example, this
makes it possible for a system to present a collapsed view of a
large search result set.
</t>

<t>
If the ARK contains an internal period after the Name, the piece to its
left is a root name and the piece to its right, up to the end of the
ARK or to the next period, is a suffix.  A Name may have more than one
suffix, for example,
</t>
<figure>
 <artwork>
                    ark:12345/x54.24
                    ark:12345/x4z/x54.24
                    ark:12345/x54.20v.78g.f55 </artwork>
</figure>
<!--
.\" g=glottus?=language... since lower case l looks too much like a 1
-->
<t>
There are two main rules.  First, if two ARKs share the same root
name but have different suffixes, the corresponding objects were
considered variants of each other (different formats, languages,
versions, etc.) by the assigner (NMA or NAA).  Thus, the following
ARKs are variants of each other:
</t>
<figure>
 <artwork>
                    ark:12345/x54.20v.78g.f55
                    ark:12345/x54.321xz
                    ark:12345/x54.44 </artwork>
</figure>
<t>
Second, publishing an ARK with a suffix implies the existence of at
least one variant identified by the ARK without its suffix.  The ARK
otherwise permits no further assumptions about what variants might exist.
So publishing the ARK,
</t>
<figure>
 <artwork>
                    ark:12345/x54.20v.78g.f55 </artwork>
</figure>
<t>
is equivalent to publishing the four ARKs,
</t>
<figure>
 <artwork>
                    ark:12345/x54.20v.78g.f55
                    ark:12345/x54.20v.78g
                    ark:12345/x54.20v
                    ark:12345/x54 </artwork>
</figure>
<t>
Revealing the possibility of variants is completely up to the assigner.
It is hard enough to commit to one object's name, let alone
to multiple variants' names and to a specific, ongoing relatedness among
them.  The assigner is the sole arbiter of what constitutes
a variant within its namespace, and whether to reveal that kind of
relatedness by using periods within its names.
</t>
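<t>
The second rule can be sketched as follows, as an illustration only.
The function name implied_variants is hypothetical, and the input is
assumed to have no NMA part (hostnames contain periods) and to use
periods only to introduce suffixes at the end of the ARK.
</t>
<figure>
 <artwork><![CDATA[
```python
def implied_variants(ark):
    """List the ARK followed by each ARK obtained by dropping
    trailing suffixes, ending with the bare root name.

    Assumptions: no NMA part, and every period introduces a suffix.
    """
    root, *suffixes = ark.split(".")
    return [
        ".".join([root] + suffixes[:i])
        for i in range(len(suffixes), -1, -1)
    ]


published = implied_variants("ark:12345/x54.20v.78g.f55")
```
]]></artwork>
</figure>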

<t>
A period must have at least one non-structural character (one that
is neither a slash nor a period) on both sides in order for it to
separate recognizable structural components.  So initial or final
periods may be removed, and adjacent periods may be converted
into a single period.  Multiple suffixes should be arranged in sorted
order (pure ASCII collating sequence) at the end of an ARK.
<!-- XXXXXXXXXX
Hayward Lam's advice about an example here (or at least in 2.7)
+ Additional comment/question for the draft 15 spec:
+ 
+ Section 2.7 : it seems to say that with object variance, the order is not
+ important. i.e.
+ 
+ ark:/12345/x54.20v.78g.f55
+ ark:/12345/x54.78g.20v.f55
+ 
+ These are referring to the same object. Maybe an example in the spec would
+ be helpful.

Yes, an example would help clarify that (and you've supplied one :-).
One thing that we'd been on the fence about was the statement (in 2.7):
"It is also permissible to throw out ARKs for which the suffixes are not
sorted."  It's a fairly consequential in its implications, but not
prominently stated; the idea was to make negative lexical comparison more
reliable, ie, different strings should mean different identifiers rather
than maybe different forms of the same identifier).  The example in the
spec still has an old formulation (to my discomfort) of a possible style
of naming variants; instead of

    ark:/12345/x54.20v.78g.f55
    ark:/12345/x54.78g.20v.f55

it would be better as

    ark:/12345/x54.v20.g78.f55
    ark:/12345/x54.g78.v20.f55

Here, v20 might be "version 20", g78 might be "language 78", and f55
might be "format 55".  The only reason for the old formulation was
to let format (obliquely suggested) sort to the end, where it usually
occurs in URLs.  But that had the rather unpleasant consequence that as
the numbers preceding 'g' or 'v' changed, their order in the id string
might have to be adjusted.  It's probably best to permit the last
variant string to lie outside the sorted order rule.  (You can ignore
this if I've confused things now.)

-John

-->
</t>

</section>

</section>

<section title="Character Repertoires">

<!--
XXX remove/increase the 128 byte restriction?
XXX in =#*+@_$, 
XXX    disallow #
XXX    but add ~
XXX    add any others?
-->
<t>
The Name and Qualifier parts are strings of visible ASCII characters.
For received ARKs, implementations must support a minimum length of 255 octets
for the string composed of the Base ARK plus Qualifier. Implementations
generating strings exceeding this length should understand that receiving
implementations may not be able to index such ARKs properly.
Characters may be letters, digits, or any of these seven characters:
</t>
<figure>
 <artwork>
    =   ~   *   +   @   _   $ </artwork>
</figure>
<t>
The following characters may also be used, but their meanings are reserved:
</t>
<figure>
 <artwork>
    %   -   .   / </artwork>
</figure>
<!-- ignore / . at end? -->
<t>
The characters `/' and `.' are ignored if either appears as the last
character of an ARK.  If used internally, they allow a name assigner
to reveal object hierarchy and object variants as previously described.
</t>

<t>
<!-- also hyphens now greatly confused with unicode dashes of many kinds  -->
Hyphens are considered to be insignificant and are always ignored in ARKs.
A `-' (hyphen) may appear in an ARK for readability, or it may have crept
in during the formatting and wrapping of text, but it must be ignored in
lexical comparisons.  As in a telephone number, hyphens have no meaning
in an ARK.  It is always safe for an NMA that receives an ARK to remove
any hyphens found in it.  As a result, like the NMA, hyphens are
"identity inert" in comparing ARKs for equivalence.  For example, the
following ARKs are equivalent for purposes of comparison and ARK service
access:
</t>
<figure>
 <artwork>
                            ark:12345/x5-4-xz-321
   https://sneezy.dopey.com/ark:12345/x54--xz32-1
                            ark:12345/x54xz321 </artwork>
</figure>
<t>
The `%' character is reserved for %-encoding all
other octets that would appear in the ARK string, in the same manner as
for URIs <xref target="RFC3986"/>.  A %-encoded octet consists of a `%' followed by two hex
digits; for example, "%7d" stands in for `}'.  Lower case hex digits are
preferred to reduce the chances of false acronym recognition; thus it is
better to use "%acT" instead of "%ACT".  The character `%' itself must be
represented using "%25".  As with URNs, %-encoding permits ARKs to
support legacy namespaces (e.g., ISBN, ISSN, SICI) that have less
restricted character repertoires <xref target="RFC2288"/>.
</t>

</section>

<section title="Normalization and Lexical Equivalence">

<!--
xxx document in footnote, but otherwise drop
.\" Note that the first "/" after "ark:" is important to keep, because
.\" (a) it improves recognizability in wider textual contexts and
.\" (b) reserves the option of a possible extra distinguished component
.\"     between "ark:" and "/", such as a modifier, qualifier, or hostname
.\"     e.g., ark:foo.bar.zaf/13030/lksjdf
** could that component be a non-opaque bit? or should we just use
     THUMP-style "was"
.\" (c) reserves the option of _relative_ ARKs (like relative URLs),
.\"     e.g., ark:xyz =same-as= ark:/13030/xyz  where "base" is known
.\" Recommendation to implementors:  consider the very first occurrence
.\" of "ark:/" in an identifier to be the breaking point (in case there
.\" are multiple occurrences, recognize the first).
-->

<t>
To determine if two or more ARKs identify the same object, the ARKs
are compared for lexical equivalence after first being normalized.
Since ARK strings may appear in various forms (e.g., having different
NMAs), normalizing them minimizes the chances that comparing two ARK
strings for equality will fail unless they actually identify different
objects.  In a specified-host ARK (one having an NMA), the NMA never
participates in such comparisons. Normalization described here serves
to define lexical equivalence but does not restrict how implementors
normalize ARKs locally for storage.
</t>

<t>
Normalization of a received ARK for the purpose of octet-by-octet equality
comparison with another ARK consists of the following steps.

<!--
    xxx no mention made of what to do with illegal chars - reject ARK?
    xxx no mention made of whitespace - delete?
    xxx no mention made of alternate unicodey hyphens and other nonword chars
        - map?
    xxx no mention made of alternate unicodey letters and diacritics
        - map?
-->
<list style="numbers">
<t>The NMA part (eg, everything from an initial "https://" up to
the next slash), if present, is removed.</t>
<t>Any URI query string is removed (everything from the first literal
'?' to the end of the string).</t>
<t>The first case-insensitive match on "ark:/" or "ark:" is converted
to "ark:" (replacing any upper case letters and removing any terminal '/').</t>
<t>In the string that remains, the two characters following every
occurrence of `%' are converted to lower case.  The case of all other
letters in the ARK string must be preserved.</t>
<t>All hyphens are removed.</t>
<t>If normalization is being done as part of a resolution step, and if
the end of the remaining string matches a known inflection, the inflection
is noted and removed.</t>
<t>Structural characters (slash and period) are normalized:
initial and final occurrences are removed, and two structural
characters in a row (e.g., // or ./) are replaced by the first
character, iterating until each occurrence has at least one
non-structural character on both sides.</t>
<t>If there are
any components with a period on the left and a slash on the right,
either the component and the preceding period must be moved to the
end of the Name part or the ARK must be thrown out as malformed.</t>
<!--
.\" XXX need example of these normalizations
xxx change these in face of possible new inflections
-->
<t>The final step is to arrange the suffixes in ASCII collating
sequence (that is, to sort them) and to remove duplicate suffixes, if any.
It is also permissible to throw out ARKs for which the suffixes
are not sorted.</t>
<!--
.\" XXX tough talk - keep? or back up and mention it earlier during
.\"      intro to variants
.\" XXX need example of these normalizations
-->
</list>
</t>

<t>
The resulting ARK string is now normalized.  Comparisons between
normalized ARKs are case-sensitive, meaning that upper case letters
are considered different from their lower case counterparts.
</t>
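
<t>
As a non-normative illustration, the first five steps of the list above
might be sketched in Python as follows.  The function name is
hypothetical, the NMA part is assumed to begin with a URI scheme such as
"https://", and the later steps (inflections, structural characters, and
suffix ordering) are deliberately left out.
</t>
<figure>
 <artwork><![CDATA[
```python
import re

def normalize_ark_partial(s: str) -> str:
    """Apply only steps 1-5 of the normalization list (a sketch)."""
    # Step 1: drop an NMA part such as "https://example.org/", if present.
    # Assumption: the NMA part starts with a URI scheme followed by "://".
    s = re.sub(r'^[A-Za-z][A-Za-z0-9+.-]*://[^/]*/', '', s)
    # Step 2: drop any query string (first literal '?' to the end).
    s = s.split('?', 1)[0]
    # Step 3: rewrite the first "ark:/" or "ark:" (any case) as "ark:".
    s = re.sub(r'ark:/?', 'ark:', s, count=1, flags=re.IGNORECASE)
    # Step 4: lower-case the two characters that follow every '%'.
    s = re.sub(r'%(..)', lambda m: '%' + m.group(1).lower(), s)
    # Step 5: remove all hyphens.
    return s.replace('-', '')

print(normalize_ark_partial("https://n2t.net/ARK:/12345/a-b%2Dc?info"))
# prints: ark:12345/ab%2dc
```
]]></artwork>
</figure>
<t>
Note that the sketch never decodes a %-encoded character; it only
lower-cases the two characters that follow the `%'.
</t>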

<t>
To keep ARK string variation to a minimum, no reserved ARK characters
should be %-encoded unless it is deliberately to conceal their reserved
meanings.  No non-reserved ARK characters should ever be %-encoded.
Finally, no %-encoded character should ever appear in an ARK in its
decoded form.
</t>

</section>

</section>

<section title="Naming Considerations">

<t>
The most important threats faced by persistence providers include such
things as funding loss, natural disaster, political and social upheaval,
processing faults, and errors in human oversight.  There is nothing that
an identifier scheme can do about such things.  Still, a few observed
identifier failures and inconveniences can be traced back to naming
practices that we now know to be less than optimal for persistence.
</t>

<section title="ARKs Embedded in Language">

<t>
The ARK has different goals from the URI, so it has different character
set requirements.  Because linguistic constructs imperil persistence,
non-ASCII character support is unimportant for ARKs.  ARKs and URIs share
goals of transcribability and transportability within web documents, so
characters are required to be visible, non-conflicting with HTML/XML
syntax, and not subject to tampering during transmission across common
transport gateways.  Add the goal of making an undelimited ARK
recognizable in running prose, as in ark:12345/=@_22*$, and certain
punctuation characters (e.g., comma, period) end up being excluded from
the ARK lest the end of a phrase or sentence be mistaken for part of the
ARK.
</t>

<t>
This consideration has more direct effect on ARK usability in a natural
language context than it has on ARK persistence.  The same is true of
the rule preventing hyphens from having lexical significance.  It is
fine to publish ARKs with hyphens in them (e.g., the output of
UUID/GUID generators), but the uniform treatment of hyphens as
insignificant reduces the possibility of users transcribing identifiers
that will have been broken through unpredictable hyphenation by word
processors.  Any measure that reduces user irritation with an identifier
will increase its chances of survival.
</t>

</section>

<section title="Objects Should Wear Their Identifiers">

<t>
A valuable technique for provision of persistent objects is to try to
arrange for the complete identifier to appear on, with, or near its
retrieved object.
An object encountered at a moment in time when its discovery context has
long since disappeared could then easily be traced back to its metadata,
to alternate versions, to updates, etc.  This has seen reasonable
success, for example, in book publishing and software distribution.
An identifier string only has meaning when its association is known,
and this is a very sure, simple, and low-tech method of reminding everyone
exactly what that association is.
</t>

</section>

<section title="Names are Political, not Technological">

<t>
If persistence is the goal, a deliberate local strategy for systematic
name assignment is crucial.  Names must be chosen with great care.
Poorly chosen and managed names will devastate any persistence strategy,
and they do not discriminate by identifier scheme.  Whether a mistakenly
re-assigned name is a URN, DOI, PURL, URL, or ARK, the damage &mdash;
failed access and confusion &mdash; is not mitigated more in one scheme than
in another.  Conversely, in-house efforts to manage names responsibly
will go much further towards safeguarding persistence than any choice of
naming scheme or name resolution technology.
</t>

<t>
Branding (e.g., at the corporate or departmental level) is important for
funding and visibility, but substrings representing brands and
organizational names should be given a wide berth except when absolutely
necessary in the hostname (the identity-inert part) of the ARK.
These substrings are not only unstable because organizations change
frequently, but they are also dangerous because successor
organizations often have political or legal reasons to actively
suppress predecessor names and brands.  Any measure that reduces the
chances of future political or legal pressure on an identifier will
decrease the chances that our descendants will be obliged to deliberately
break it.
</t>

</section>

<section title="Choosing a Hostname or NMA">

<t>
Hostnames appearing in any identifier meant to be persistent must be
chosen with extra care.  The tendency in hostname selection has
traditionally been to choose a token with recognizable attributes, such
as a corporate brand, but that tendency wreaks havoc with persistence
that is supposed to outlive brands, corporations, subject classifications,
and natural language semantics (e.g., what did the three letters "gay" mean
in 1958, 1978, and 1998?).  Today's recognized and correct attributes are
tomorrow's stale or incorrect attributes.  In making hostnames (any
names, actually) long-term persistent, it helps to eliminate recognizable
attributes to the extent possible.  This affects selection of any name
based on URLs, including PURLs and the explicitly disposable NMAs.
</t>

<t>
There is no excuse for a provider that manages its internal names
impeccably not to exercise the same care in choosing what could be an
exceptionally durable hostname, especially if it would form the prefix
for all the provider's URL-based external names.  Registering an opaque
hostname in the ".org" or ".net" domain would not be a bad start.
Another way is to publish your ARKs with an organizational domain name
that will be mapped by DNS to an appropriate NMA host.  This makes for
shorter names with less branding vulnerability.
</t>

<t>
It is a mistake to think that hostnames are inherently unstable.
If you require brand visibility, that may be a fact of life.
But things are easier if yours is the brand of a long-lived cultural
memory institution such as a national or university library or archive.
Well-chosen hostnames
from organizations that are sheltered from the direct effects of a
volatile marketplace can easily provide longer-lived global
resolvers than the domain names explicitly or implicitly used as
starting points for global resolution by indirection-based
persistent identifier schemes.  For example, it is hard to imagine
circumstances under which the Library of Congress' domain name would
disappear sooner than, say, "handle.net".
</t>

<t>
For smaller libraries, archives, and preservation organizations,
there is a natural concern about whether they will be able to keep
their web servers and domain names in the face of uncertain funding.
One option is to form or join a consortium <xref target="N2T"/> of like-minded
organizations with the purpose of providing mutual preservation support.
The first goal of such a consortium would be to perpetually rent a
hostname on which to establish a web server that simply redirects
incoming member
organization requests to the appropriate member server; using ARKs,
for example, a 150-member consortium could run a very small server (24x7)
that contained nothing more than 150 rewrite rules in its configuration
file.  Even more helpful would be additional consortial support for a
member organization that was unable to continue providing services
and needed to find a successor archival organization.  This would be a
low-cost, low-tech way to publish ARKs (or URLs) under highly persistent
hostnames.
</t>
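
<t>
To make the scale of such a redirecting server concrete, the following
Python sketch shows one way its rules could be expressed as a table
keyed by NAAN.  It is an illustration only:  the member hostnames, the
NAANs, and the function name are hypothetical, and a production server
would more likely express the same mapping as rewrite rules in its
configuration file.
</t>
<figure>
 <artwork><![CDATA[
```python
import re
from typing import Optional

# Hypothetical member table: NAAN -> base URL of the member's server.
MEMBERS = {
    "12345": "https://archive-a.example.org",
    "67890": "https://library-b.example.org",
}

# Matches "/ark:NAAN/..." or "/ark:/NAAN/...", ignoring case.
ARK_PATH = re.compile(r"^/?(ark:/?([^/]+)/.*)$", re.IGNORECASE)

def redirect_target(path: str) -> Optional[str]:
    """Return the member URL to redirect to, or None if no rule applies."""
    m = ARK_PATH.match(path)
    if m is None:
        return None          # not an ARK request
    base = MEMBERS.get(m.group(2))
    if base is None:
        return None          # NAAN not served by this consortium
    # Pass the ARK (and any inflection) through unchanged.
    return base + "/" + m.group(1)

print(redirect_target("/ark:12345/x9t38rk45c"))
# prints: https://archive-a.example.org/ark:12345/x9t38rk45c
```
]]></artwork>
</figure>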

<t>
There are no obvious reasons why the organizations registering DNS names,
URN Namespaces, and DOI publisher IDs should have among them one that
is intrinsically more fallible than the next.  Moreover, it is a
misconception that the demise of DNS and of HTTP need adversely affect
the persistence of URLs.  At such a time, certainly URLs from the present
day might not then be actionable by our present-day mechanisms, but
resolution systems for future non-actionable URLs are no harder to
imagine than resolution systems for present-day non-actionable URNs and
DOIs.  There is no more stable a namespace than one that is dead and
frozen, and that would then characterize the space of names bearing the
"http://" or "https://" prefix.
It is useful to remember that just because hostnames
have been carelessly chosen in their brief history does not mean that
they are unsuitable in NMAs (and URLs) intended for use in situations
demanding the highest level of persistence available in the Internet
environment.  A well-planned name assignment strategy is everything.
</t>

</section>

<section title="Assigners of ARKs">

<t>
A Name Assigning Authority (NAA) is an organization that creates
(or delegates creation of) long-term associations between identifiers
and information objects.  Examples of NAAs include national libraries,
national archives, and publishers.  An NAA may arrange with an external
organization for identifier assignment.  The US Library of Congress,
for example, allows OCLC (the Online Computer Library Center, a major
world cataloger of books) to create associations between Library of
Congress Control Numbers (LCCNs) and the books that OCLC processes.
A cataloging record is generated that testifies to each association,
and the identifier is included by the publisher, for example, in the
front matter of a book.
<!--
.\" xxx check
-->
</t>

<t>
An NAA does not so much create an identifier as create an association.
The NAA first draws an unused identifier string from its namespace,
which is the set of
all identifiers under its control.  It then records the assignment of the
identifier to an information object having sundry witnessed characteristics,
such as a particular author and modification date.  A namespace is usually
reserved for an NAA by agreement with recognized community organizations
(such as IANA and ISO) that all names containing a particular string be
under its control.  In the ARK an NAA is represented by the Name
Assigning Authority Number (NAAN).
</t>

<t>
The ARK namespace reserved for an NAA is the set of names bearing its
particular NAAN.  For example, all strings beginning with "ark:12345/"
are under control of the NAA registered under 12345, which might be the
National Library of Finland.  Because each NAA has a different NAAN,
names from one namespace cannot conflict with those from another.
Each NAA is free to assign names from its namespace (or delegate
assignment) according to its own policies.  These policies must be
documented in a manner similar to the declarations required for URN
Namespace registration <xref target="RFC2611"/>.
</t>

<t>
Organizations can request or update a NAAN by filling out a form
<xref target="NAANrequest"/>.
</t>

</section>

<section title="NAAN Namespace Management">

<t>
Every NAA must have a namespace management strategy.  A time-honored
technique is to hierarchically partition a namespace into subnamespaces
using prefixes that guarantee non-collision of names in different
partitions.  This practice is strongly encouraged for all NAAs, especially
when subnamespace management will be delegated to other departments,
units, or projects within an organization.  For example, with a NAAN that
is assigned to a university and managed by its main library, care should
be taken to reserve semantically opaque prefixes that will set aside
large parts of the unused namespace for future assignments.  Prefix-based
partition management is an important responsibility of the NAA.
</t>

<t>
This sort of delegation by prefix is well-used in the formation of
DNS names and ISBN identifiers.  An important difference is that in
the former, the hierarchy is deliberately exposed and in the latter
it is hidden.  Rather than using lexical boundary markers such as the
period (`.') found in domain names, the ISBN uses a publisher prefix
but doesn't disclose where the prefix ends and the publisher's
assigned name begins.  This practice of non-disclosure, borrowed from the
ISBN and ISSN schemes, is encouraged in assigning ARKs, because it reduces
the visibility of an assertion that is probably not important now and may
become a vulnerability later.
</t>

<t>
Reasonable prefixes for assigned names usually consist of consonants and
digits and are 1-5 characters in length.  For example, the constant
prefix "x9t" might be delegated to a book digitization project that
creates identifiers such as
</t>
<figure>
 <artwork>
        https://444.berkeley.edu/ark:28722/x9t38rk45c </artwork>
</figure>
<t>
If longevity is the goal, it is important to keep the prefixes free of
recognizable semantics; for example, using an acronym representing a
project or a department is discouraged.  At the same time, you may wish
to set aside a subnamespace for testing purposes under a prefix such
as "fk..." that can serve as a visual clue and reminder to maintenance
staff that this "fake" identifier was never published.
</t>

<t>
There are other measures one can take to avoid user confusion,
transcription errors, and the appearance of accidental semantics when
creating identifiers.  If you are generating identifiers automatically,
pure numeric identifiers are likely to be semantically opaque enough,
but it's probably useful to avoid leading zeroes because some users
mistakenly treat them as optional, thinking (arithmetically) that they
don't contribute to the "value" of the identifier.
</t>

<t>
If you need lots of identifiers and you don't want them to get too long,
you can mix digits with consonants (but avoid vowels since they might
accidentally spell words) to get more identifiers without increasing the
string length.  In this case you may not want more than two letters in
a row, because limiting runs of letters reduces the chance of generating
acronyms.  Generator
tools such as <xref target="NOID"/> provide support for these sorts of identifiers, and
can also add a computed check character as a guarantee against the most
common transcription errors.
</t>
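
<t>
As an illustration of the advice above, the following Python sketch
mints random names from digits and consonants while never allowing more
than two letters in a row.  It is not the algorithm of any particular
generator tool:  the function name, the exact consonant set, and the
default length are assumptions, and no check character is computed.
</t>
<figure>
 <artwork><![CDATA[
```python
import secrets

DIGITS = "0123456789"
# Consonants only; also omitting 'l' and 'y' is an assumption made here.
CONSONANTS = "bcdfghjkmnpqrstvwxz"

def mint(prefix: str = "", length: int = 8) -> str:
    """Return prefix plus `length` random digits and consonants."""
    # Count letters already ending the prefix so that a run of
    # letters cannot grow across the prefix boundary.
    run = 0
    for ch in reversed(prefix):
        if not ch.isalpha():
            break
        run += 1
    out = []
    for _ in range(length):
        # After two letters in a row, force the next character to be a digit.
        pool = DIGITS if run >= 2 else DIGITS + CONSONANTS
        ch = secrets.choice(pool)
        run = run + 1 if ch in CONSONANTS else 0
        out.append(ch)
    return prefix + "".join(out)

# Example use with the "x9t" prefix mentioned earlier; output is random.
print("ark:28722/" + mint("x9t", 7))
```
]]></artwork>
</figure>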

</section>

<section title="Sub-Object Naming">

<t>
As mentioned previously, semantically opaque identifiers are very useful
for long-term naming of abstract objects; however, it may be appropriate to
extend these names with less opaque extensions that reference contemporary
service entry points (sub-objects) in support of the object.  Sub-object
extensions beginning with a digit or underscore (`_') are reserved for the
possibility of developing a future registry of canonical service points
(e.g., numeric references to versions, formats, languages, etc).
</t>

</section>

</section>

<section title="Finding a Name Mapping Authority">

<t>
In order to derive an actionable identifier (these days, a URL)
from an ARK, a hostname (or hostname plus port combination) for a working Name
Mapping Authority (NMA) must be found.  An NMA is a service that is able to
respond to basic ARK service requests.  Relying on registration
and client-side discovery, NMAs make known which NAAs' identifiers they
are willing to service.
<!--
.\" xxx Each NMA has a standard code/number?
-->
</t>

<t>
Upon encountering an ARK, a user (or client software) looks inside it
for the optional NMA part (the host part of the NMA's ARK service).
If it contains an NMA that is working, this NMA discovery step may be
skipped; the NMA effectively uses the beginning of an ARK to cache
the results of a prior mapping authority discovery process.  If a new
<!--
.\" xxx illustrate this:  before and after
-->
NMA needs to be found, the client looks inside the ARK again for the NAAN
(Name Assigning Authority Number).  Querying a global database, it then
uses the NAAN to look up all current NMAs that service ARKs issued by
the identified NAA.
</t>

<t>
The global database is key, and ideally the lookup would be automatic
and transparent to the user.
For this, the most promising method is probably the Name-to-Thing (N2T)
Resolver <xref target="N2T"/> at n2t.net.
It is a proposed low-cost, highly reliable, consortially
maintained NMA that simply exists to support actionable HTTP-based URLs
for as long as HTTP is used.  One of its big advantages over the other
two methods and the URN, Handle, DOI, and PURL methods, is that N2T
addresses the namespace splitting problem.  When objects maintained by
one NMA are inherited by more than one successor NMA, until now one of
those successors would be required to maintain forwarding tables on
behalf of the other successors.
</t>

<t>
There are two other ways to discover an NMA, one of them described in a
subsection below.  Another way, described in an appendix, is based on a
simplification of the URN resolver discovery method, itself very similar
in principle to the resolver discovery method used by Handles and DOIs.
None of these methods does more than what can be done with a very small,
consortially maintained web server such as <xref target="N2T"/>. 
</t>

<t>
In the interests of long-term persistence, however, ARK mechanisms are
first defined in high-level, protocol-independent terms so that
mechanisms may evolve and be replaced over time without compromising
fundamental service objectives.  Either or both specific methods given
here may eventually be supplanted by better methods since, by design,
the ARK scheme does not depend on a particular method, but only on having
some method to locate an active NMA.
</t>

<t>
At the time of issuance, at least one NMA for an ARK should be
prepared to service it.  That NMA may or may not be administered by the
Name Assigning Authority (NAA) that created it.  Consider the following
hypothetical example of providing long-term access to a cancer research
journal.  The publisher wishes to turn a profit and the
National Library of Medicine wishes to preserve the scholarly
record.  An agreement might be struck whereby the publisher would act
as the NAA and the national library would archive
the journal issue when it appears, but without providing direct access
for the first six months.  During the first six months of peak
commercial viability, the publisher would retain exclusive delivery
rights and would charge access fees.  Again, by agreement, both the
library and the publisher would act as NMAs, but during that initial
period the library would redirect requests for issues less than six
months old to the publisher.  At the end of the waiting period, the
library would then begin servicing requests for issues older than six
months by tapping directly into its own archives.  Meanwhile, the
publisher might routinely redirect incoming requests for older issues
to the library.  Long-term access is thereby preserved, and so is the
commercial incentive to publish content.
</t>

<t>
Although it will be common for an NAA also to run an NMA service,
it is never a requirement.  Over time NAAs and NMAs will come and go.
One NMA will succeed another, and there might be
many NMAs serving the same ARKs simultaneously (e.g., as mirrors or as
competitors).  There might also be asymmetric but coordinated NMAs as
in the library-publisher example above.
</t>

<section title="Looking Up NMAs in a Globally Accessible File">

<t>
This subsection describes a way to look up NMAs using a simple name
authority table represented as a plain text file.
For efficient access the file may be stored in a local filesystem, but
it needs to be reloaded periodically to incorporate updates.  It is not
expected that the size of the file or frequency of update should impose
an undue maintenance or searching burden any time soon, for even a
primitive linear search of a file with ten thousand NAAs is a subsecond
operation on modern server machines.  The proposed file strategy is
similar to the /etc/hosts file strategy that supported Internet host
address lookup for a period of years before the advent of DNS.
</t>

<t>
The name authority table file is updated on an ongoing basis and is
available for copying over the Internet from a number of mirror sites
<xref target="NAANregistry"/>.
The file contains comment lines (lines that begin with `#') explaining
the format and giving the file's modification time, reloading address,
and NAA registration instructions.
</t>
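
<t>
The kind of linear search described above can be sketched in a few lines
of Python.  The record layout used here (a NAAN followed by one or more
NMA URLs on a single line) is purely hypothetical and is not the format
of the actual table file; only the treatment of `#' comment lines is
taken from the description above.
</t>
<figure>
 <artwork><![CDATA[
```python
from typing import List

def lookup_nmas(table_text: str, naan: str) -> List[str]:
    """Linear search: return every NMA listed for the given NAAN."""
    found = []
    for line in table_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split()
        # Hypothetical layout: NAAN, then one or more NMA URLs.
        if len(fields) >= 2 and fields[0] == naan:
            found.extend(fields[1:])
    return found

# Hypothetical sample table; not the real registry format.
SAMPLE = """\
# modified: (date)   reload from: (address)
12345  https://archive-a.example.org
67890  https://library-b.example.org https://mirror.example.net
"""

print(lookup_nmas(SAMPLE, "12345"))
# prints: ['https://archive-a.example.org']
```
]]></artwork>
</figure>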

<!--
.\" xxx (see http://ark.nlm.nih.gov/naapolicyeg.html for an example).
XXXXXXXXXXXXX update this table!
-->

</section>

</section>

<section title="Generic ARK Service Definition">

<t>
An ARK request's output is delivered information; examples include the
object itself, a policy declaration (e.g., a promise of support), a
descriptive metadata record, or an error message.  The experience of
object delivery is expected to be an evolving mix of information that
reflects changing service expectations and technology requirements;
contemporary examples include such things as an object summary and
component links formatted for human consumption.  ARK services must be
couched in high-level, protocol-independent terms if persistence is to
outlive today's networking infrastructural assumptions.  The high-level
ARK service definitions listed below are followed in the next section by
a concrete method (one of many possible methods) for delivering these
services with today's technology.
Note that some services may be invoked in one operation, such as when
an '?info' inflection returns both a description and a permanence
declaration for an object.
</t>

<section title="Generic ARK Access Service (access, location)">

<t>
Returns (a copy of) the object or a redirect to the same, although a
sensible object proxy may be substituted.  Examples of sensible
substitutes include,

<list style="symbols">
<t> a table of contents instead of a large complex document, </t>
<t> a home page instead of an entire web site hierarchy, </t>
<t> a rights clearance challenge before accessing protected data, </t>
<t> directions for access to an offline object (e.g., a book), </t>
<t> a description of an intangible object (a disease, an event), or </t>
<t> an applet acting as "player" for a large multimedia object. </t>
</list>

May also return a discriminated list of alternate object locators.
If access is denied, returns an explanation of the
object's current (perhaps permanent) inaccessibility.
</t>

<section title="Generic Policy Service (permanence, naming, etc.)">

<t>
Returns declarations of policy and support commitments for given ARKs.
Declarations are returned in either a structured metadata format or a
human readable text format; sometimes one format may serve both purposes.
Policy subareas may be addressed in separate requests, but the following
areas should be covered:  object permanence, object naming, object
fragment addressing, and operational service support.
</t>

<t>
The permanence declaration for an object is a rating defined with respect
to an identified permanence provider (guarantor), which will be the NMA.
It may include the following aspects.
</t>

<t>
<list>
<t>
(a) "object availability" &mdash; whether and how access to the
object is supported (e.g., online 24x7, or offline only),
</t>

<t>
(b) "identifier validity" &mdash; under what conditions the identifier
will be or has been re-assigned,
</t>

<t>
(c) "content invariance" &mdash; under what conditions the content of
the object is subject to change, and
</t>

<t>
(d) "change history" &mdash; access to corrections, migrations, and revisions,
whether through links to the changed objects themselves or through a
document summarizing the change history.
</t>
</list>
</t>

<t>
A recent approach to persistence statements, conceived independently from ARKs,
can be found at <xref target="PStatements"/>, with ongoing work available at
<xref target="ARKagency"/>.
An older approach to a permanence rating framework is given in
<xref target="NLMPerm"/>, which identified the following "permanence levels":
</t>

<t>
<list>
<t>
Not Guaranteed: 
No commitment has been made to retain this resource.  It could become
unavailable at any time.  Its identifier could be changed. 
</t>

<t>
Permanent: Dynamic Content:
A commitment has been made to keep this resource permanently available.
Its identifier will always provide access to the resource.  Its content
could be revised or replaced. 
</t>

<t>
Permanent: Stable Content:
A commitment has been made to keep this resource permanently available.
Its identifier will always provide access to the resource.  Its content
is subject only to minor corrections or additions.
</t>

<t>
Permanent: Unchanging Content:
A commitment has been made to keep this resource permanently available.
Its identifier will always provide access to the resource.  Its content
will not change.
</t>
</list>
</t>

<t>
Naming policy for an object includes an historical description of the
NAA's (and its successor NAA's) policies regarding differentiation
of objects.  Since it is the NMA that responds to requests for policy
statements, it is useful for the NMA to be able to produce or
summarize these historical NAA documents.  Naming policy may include
the following aspects.
</t>

<t>
<list>
<t>
(i) "similarity" &mdash; (or "unity") the limit, defined by the NAA,
to the level of dissimilarity beyond which two similar objects
warrant separate identifiers but before which they share one single
identifier, and
</t>

<t>
(ii) "granularity" &mdash; the limit, defined by the NAA, to the level
of object subdivision beyond which sub-objects do not warrant
separately assigned identifiers but before which sub-objects are
assigned separate identifiers.
</t>
</list>
</t>

<!--
.\" xxx derivative objects
-->
<t>
Subnaming policy for an object describes the qualifiers that the NMA,
in fulfilling its ongoing and evolving service obligations, allows as
extensions to an NAA-assigned ARK.  To the conceptual object that the
NAA named with an ARK, the NMA may add component access points and
derivatives (e.g., format migrations in aid of preservation) in order
to provide both basic and value-added services.
</t>

<t>
Addressing policy for an object includes a description of how, during
access, object components (e.g., paragraphs, sections) or views (e.g.,
image conversions) may or may not be "addressed", in other words, how
the NMA permits arguments or parameters to modify the object delivered
as the result of an ARK request.  If supported, these sorts of
operations would provide things like byte-ranged fragment delivery and
open-ended format conversions, or any set of possible transformations
that would be too numerous to list or to identify with separately
assigned ARKs.
</t>

<t>
Operational service support policy includes a description of general
operational aspects of the NMA service, such as after-hours staffing and
trouble reporting procedures.
</t>

</section>

<section title="Generic Description Service">

<t>
Returns a description of the object.  Descriptions are returned in a
structured metadata format, a human-readable text format, or in one
format that serves both purposes (such as human-readable HTML with
embedded machine-readable metadata, or perhaps YAML). A description must at
a minimum answer the who, what, when, and where questions ("where" being the
long-term identifier as opposed to a transient redirect target) concerning an
expression of the object.  Standalone descriptions should be accompanied by the
modification date and source of the description itself.  May also return
discriminated lists of ARKs that are related to the given ARK.
</t>

</section>

</section>

<section title="Overview of The HTTP URL Mapping Protocol (THUMP)">

<t>
The HTTP URL Mapping Protocol (THUMP) is a way of taking a key (any
identifier) and asking such questions as, what information does this
identify and how permanent is it?  <xref target="THUMP"/> is in fact one
specific method
under development for delivering ARK services.  The protocol runs over
HTTP to exploit the web browser's current pre-eminence as user interface
to the Internet.  THUMP is designed so that a person can enter ARK
requests directly into the location field of current browser interfaces.
Because it runs over HTTP, THUMP can be simulated and tested via
keyboard-based interactions <xref target="RFC0854"/>.
</t>

<t>
The asker (a person or client program) starts with an identifier,
such as an ARK or a URL.  The
identifier reveals to the asker (or allows the asker to infer) the
Internet host name and port number of a server system that responds to
questions.  Here, this is just the NMA that is obtained by inspection
and possibly lookup based on the ARK's NAAN.  The asker then sets up an
HTTP session with the server system, sends a question via a THUMP request
(contained within an HTTP request), receives an answer via a THUMP
response (contained within an HTTP response), and closes the session.
That concludes the connected portion of the protocol.
</t>

<t>
A THUMP request is a string of characters beginning with a `?' (question mark)
that is appended to the identifier string.  The resulting string is sent as an
argument to HTTP's GET command.  Request strings too long for GET may be sent
using HTTP's POST command. The two most common requests correspond to two
degenerate special cases. First, a simple key with no request at all is the
same as an ordinary access request. Thus a plain ARK entered into a browser's
location field behaves much like a plain URL, and returns access to the primary
identified object, for instance, an HTML document.
</t>

<t>
The second special case is a minimal ARK description request string
consisting of just "?info".  For example, entering the string,
</t>
<figure>
 <artwork>
        n2t.net/ark:67531/metadc107835?info </artwork>
</figure>
<t>
into the browser's location field directly precipitates a request for
a metadata record describing the object identified by ark:67531/metadc107835.
The browser, unaware of THUMP, prepares and sends an HTTP GET request in the
same manner as for a URL.  THUMP is designed so that the response (indicated by
the returned HTTP content type) is normally displayed, whether the output is
structured for machine processing (text/plain) or formatted for human
consumption (text/html). In addition to '?info', this specification reserves
both '?' and '??' (originally older forms) for future use.
</t>

<t>
The following example THUMP session assumes metadata being returned by
a resolver (as server) to a browser client. Each line has been annotated to
include a line number and whether it was the client or server that sent
it.  Without going into much depth, the session has three pieces separated from
each other by blank lines:  the client's piece (lines 1-3), the server's
HTTP/THUMP response headers (4-7), and the body of the server's response
(8-18).  The first and last lines (1 and 18) correspond to the client's steps
to start the TCP session and the server's steps to end it, respectively.
</t>
<figure>
 <artwork>
 1  C: [opens session]
    C: GET https://n2t.net/ark:67531/metadc107835?info HTTP/1.1
    C: 
    S: HTTP/1.1 200 OK
 5  S: Content-Type: text/plain
    S: THUMP-Status: 0.6 200 OK
    S: 
    S: erc:
    S: who:   Austin, Larry
10  S: what:  A Study of Rhythm in Bach's Orgelbüchlein
    S: when:  1952
    S: where: https://digital.library.unt.edu/ark:/67531/metadc107835
    S: erc-support:
    S: who:   University of North Texas Libraries
15  S: what:  Permanent: Stable Content:
    S: when:  20081203
    S: where: https://digital.library.unt.edu/ark:/67531/
    S: [closes session] </artwork>
</figure>
<t>
The first two server response lines (4-5) above are typical of HTTP.
The next line (6) is peculiar to THUMP, and indicates the THUMP version
and a normal return status.
</t>

<t>
The balance of the response consists of a single metadata record (8-17)
that comprises the ARK description service response.  The returned record
is in the format of an Electronic Resource Citation <xref target="ERC"/>,
which is discussed in overview in the next section. For now, note that it
contains four elements that answer the top priority questions regarding an
expression of the object:  who played a major role in expressing it, what the
expression was called, when it was created, and where the expression may be
found (note that "where" is preferably a persistent, citable identifier rather
than an unstable URL sometimes mistakenly referred to as a "location").
This quartet of elements comes up again and again in ERCs.
Lines 13-17 contain a minimal persistence statement.
<!--
.\" xxx add ERC.pm reference???
-->
</t>
<t>
Each segment in an ERC tells a different story relating to the object,
so although the same four questions (elements) appear in each, the answers
depend on the segment's story type.  While the first segment tells the
story of an expression of the object, the second segment tells the story
of the support commitment made to it:  who made the commitment, what the
nature of the commitment was, when it was made, and where a fuller
explanation of the commitment may be found.
</t>
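<t>
The two-segment structure can be made concrete with a short Python sketch that
groups the returned elements by segment.  It is a simplification made for
illustration, not the full ERC or ANVL grammar:  it assumes that a segment
begins with a label that starts with "erc" and has an empty value, and it
ignores continuation lines and comments.  The function name parse_erc is
hypothetical.
</t>
<figure>
 <artwork><![CDATA[
```python
def parse_erc(body):
    """Group 'label: value' lines under the segment label preceding them."""
    segments = {}
    current = None
    for line in body.splitlines():
        if not line.strip():
            continue
        # Split at the first colon only, since values may contain colons.
        label, _, value = line.partition(":")
        label, value = label.strip(), value.strip()
        if value == "" and label.startswith("erc"):
            current = segments.setdefault(label, {})   # new segment
        elif current is not None:
            current[label] = value
    return segments


BODY = """\
erc:
who:   Austin, Larry
what:  A Study of Rhythm in Bach's Orgelbüchlein
when:  1952
where: https://digital.library.unt.edu/ark:/67531/metadc107835
erc-support:
who:   University of North Texas Libraries
what:  Permanent: Stable Content:
when:  20081203
where: https://digital.library.unt.edu/ark:/67531/
"""

record = parse_erc(BODY)
for name, elements in record.items():
    print(name, sorted(elements))
```
]]></artwork>
</figure>
<t>
Both segments answer the same four questions, so a recipient can read
record["erc"]["who"] and record["erc-support"]["who"] with the same code and
interpret each answer according to its segment.
</t>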

</section>

<section title="The Electronic Resource Citation (ERC)">

<t>
An Electronic Resource Citation (or ERC, pronounced e-r-c)
<xref target="ERC"/> is a kind of object description that uses
Dublin Core Kernel metadata elements <xref target="DCKernel"/>.
The ERC with Kernel elements provides a simple, compact, and printable
record for holding data associated with an information resource.
As originally designed <xref target="Kernel"/>, Kernel metadata
balances the needs for expressive power, very simple machine
processing, and direct human manipulation.
The ERC sense of "citation" is not limited to the traditional referencing of
a result or information fixed in time on a printed page, but to a more general
kind of reference, both backward, to digital material that cannot be known to
be fixed in time (true of virtually all online information), and forward, to
material that is all the more valuable for improving or evolving over time.
</t>

<t>
The previous section shows two limited examples of what is fully
described elsewhere <xref target="ERC"/>.  The rest of this short
section provides some of the background and rationale for this
record format.
</t>

<t>
A founding principle of Kernel metadata is that direct human contact
with metadata will be a necessary and sufficient condition for the
near-term rapid development of metadata standards, systems, and services.
Thus the machine-processable Kernel elements must only minimally strain
people's ability to read, understand, change, and transmit ERCs without
their relying on intermediation with specialized software tools.  The
basic ERC needs to be succinct, transparent, and trivially parseable by
software.
</t>

<t>
Borrowing from the data structuring format that underlies the successful spread
of email and web services, the ERC format uses <xref target="ANVL"/>, which is
based on email and HTTP headers <xref target="RFC2822"/>.  There is a
naturalness to ANVL's label-colon-value format (seen in the previous section)
that barely needs explanation to a person beginning to enter ERC metadata.
</t>

<t>
While ANVL elements are expected at the top level and don't themselves support
hierarchy, the value of an ANVL element may be an arbitrary encoded hierarchy
of JSON or XML. Typically, the name of such an ANVL element ends in "json" or
"xml", for example, "json" or "geojson". Care should be taken to escape
structural characters that appear in element names and values, specifically,
line terminators (both newlines ("\n") and carriage returns ("\r")) and,
in element names, colons (":").
</t>
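<t>
The following Python sketch illustrates one way to carry a JSON hierarchy as
the value of a single ANVL element.  The escaping mechanism shown
(percent-encoding of the structural characters) is an assumption made for this
example and should be checked against the ANVL specification; the helper names
and the coordinate values are likewise invented for illustration.
</t>
<figure>
 <artwork><![CDATA[
```python
import json


def _escape(text, chars):
    """Percent-encode the given characters (assumed escaping convention)."""
    # Encode '%' first so that an escaped string decodes unambiguously.
    for ch in "%" + chars:
        text = text.replace(ch, "%{:02X}".format(ord(ch)))
    return text


def anvl_element(name, value):
    """Format one top-level element as 'name: value' on a single line."""
    # Names escape colons and line terminators; values only terminators.
    return _escape(name, ":\r\n") + ": " + _escape(value, "\r\n")


# Made-up GeoJSON value.  json.dumps emits one line by default, and any
# newline inside a JSON string is already written as the two characters \n.
point = {"type": "Point", "coordinates": [-97.15, 33.21]}
line = anvl_element("geojson", json.dumps(point))
print(line)
```
]]></artwork>
</figure>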

<t>
Besides simplicity of ERC system implementation and data entry mechanics,
ERC semantics (what the record and its constituent parts mean) must also
be easy to explain.  ERC semantics are based on a reformulation and
extension of the Dublin Core <xref target="RFC5013"/> hypothesis, which
suggests that the
fifteen Dublin Core metadata elements have a key role to play in
cross-domain resource description.  The ERC design recognizes that the
Dublin Core's primary contribution is the international,
interdisciplinary consensus that identified fifteen semantic buckets
(element categories), regardless of how they are labeled.  The ERC then
adds a definition for a record and some minimal compliance rules.  In
pursuing the limits of simplicity, the ERC design combines and relabels
some Dublin Core buckets to isolate a tiny kernel (subset) of four
elements for basic cross-domain resource description.
</t>

<t>
For the cross-domain kernel, the ERC uses the four basic elements &mdash;
who, what, when, and where &mdash; to pretend that every object in the
universe can have a uniform minimal description.  Each has a name or other
identifier, a locator (a means to access it), some responsible person or party,
and a date. It doesn't matter what type of object it is, or whether one plans
to read it, interact with it, smoke it, wear it, or navigate it.  Of course,
this approach is flawed because uniformity of description for some object
types requires more semantic contortion and sacrifice than for others.
That is why at the beginning of this document, the ARK was said to be
suited to objects that accommodate reasonably regular electronic
description.
</t>

<t>
While insisting on uniformity at the most basic level provides powerful
cross-domain leverage, the semantic sacrifice is great for many
applications.  So the ERC also permits a semantically rich and nuanced
description to co-exist in a record along with a basic description.
In that way both sophisticated and naive recipients of the record can
extract the level of meaning from it that best suits their needs and
abilities.  Key to unlocking the richer description is a controlled
vocabulary of ERC record types (not explained in this document) that
permit knowledgeable recipients to apply defined sets of additional
assumptions to the record.
</t>

</section>

<section title="Advice to Web Clients">

<t>
ARKs are envisaged to appear wherever durable object references are
planned.  Library cataloging records, literature citations, and
bibliographies are important examples.  In many of these places URLs
(Uniform Resource Locators) are currently used, and inside some of
those URLs are embedded URNs, Handles, and DOIs.  Unfortunately,
there's no suggestion of a way to probe for extra services that
would build confidence in those identifiers; in other words, there's
no way to tell whether any of those identifiers is any better managed
than the average URL.
</t>

<t>
ARKs are also envisaged to appear in hypertext links (where they are not
normally shown to users) and in rendered text (displayed or printed).
A normal HTML link for which the URL is not displayed looks like this.
</t>

<figure>
 <artwork><![CDATA[
<a href = "https://example.org/index.htm"> Click Here </a> ]]></artwork>
</figure>

<t>
A URL with an embedded ARK invites access (via `?info') to extra services:
</t>

<figure>
 <artwork><![CDATA[
<a href = "https://example.org/ark:14697/b12345x"> Click Here </a> ]]></artwork>
</figure>

<t>
Using the <xref target="N2T"/> resolver to provide
identifier-scheme-agnostic protection against hostname
instability, this ARK could be published as:
</t>

<figure>
 <artwork><![CDATA[
<a href = "https://n2t.net/ark:14697/b12345x"> Click Here </a> ]]></artwork>
</figure>
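<t>
A client that wants to re-home such a link onto a resolver could do so
mechanically, as in the following Python sketch.  The function names and the
pattern used to find the embedded ARK are assumptions made for illustration;
the pattern is deliberately loose and does not validate the NAAN or the rest
of the ARK.
</t>
<figure>
 <artwork><![CDATA[
```python
import re

# An "ark:" label that follows a slash, with or without the older "ark:/"
# form, then a NAAN, a slash, and the rest of the identifier.
_ARK = re.compile(r"/(ark:/?[^/\s]+/\S+)")


def embedded_ark(url):
    """Return the ARK embedded in a URL, or None if there is none."""
    match = _ARK.search(url)
    return match.group(1) if match else None


def via_resolver(url, resolver="n2t.net"):
    """Rewrite a URL carrying an ARK so that it points at a resolver."""
    ark = embedded_ark(url)
    if ark is None:
        return None          # not an ARK-bearing URL; nothing to rewrite
    return f"https://{resolver}/{ark}"


print(via_resolver("https://example.org/ark:14697/b12345x"))
```
]]></artwork>
</figure>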

<t>
An NAA will typically make known the associations it creates by
publishing them in catalogs, actively advertising them, or simply leaving
them on web sites for visitors (e.g., users, indexing spiders) to stumble
across in browsing.
</t>

</section>

<section title="Security Considerations">

<t>
The ARK naming scheme poses no direct risk to computers and networks.
Implementors of ARK services need to be aware of security issues when
querying networks and filesystems for Name Mapping Authority services,
and the concomitant risks from spoofing and obtaining incorrect
information.  These risks are no greater for ARK mapping authority
discovery than for other kinds of service discovery.  For example,
recipients of ARKs with a specified host (NMA) should treat it
like a URL and be aware that the identified ARK service may no longer
be operational.
</t>

<t>
Apart from mapping authority discovery, ARK clients and servers subject
themselves to all the risks that accompany normal operation of the
protocols underlying mapping services (e.g., HTTP, Z39.50).  As
a specialization of such protocols, an ARK service may limit exposure to
the usual risks.  Indeed, ARK services may enhance a kind of security
by helping users identify long-term reliable references to information
objects.
</t>

</section>

</section>

</middle>

<back>

 <references>

 <reference anchor="ANVL"
   target="https://n2t.net/ark:/13030/c7x921j3h">
  <front>
    <title>A Name-Value Language </title>
    <author initials="J." surname="Kunze" fullname="John Kunze" />
    <author initials="B." surname="Kahle" fullname="Brewster Kahle" />
    <author initials="J." surname="Masanes" fullname="Julien Masanes" />
    <author initials="G." surname="Mohr" fullname="Gordon Mohr" />
    <date month="" year="2005" />
  </front>
  <format type="HTML"
   target="https://n2t.net/ark:/13030/c7x921j3h" />
 </reference>

 <reference anchor="ARK"
   target="https://n2t.net/ark:/13030/c7n00zt1z">
  <front>
    <title>Towards Electronic Persistence Using ARK Identifiers</title>
    <author initials="J." surname="Kunze" fullname="John Kunze" />
    <date month="August" year="2003" />
  </front>
  <seriesInfo name="IWAW/ECDL Annual Workshop Proceedings" value="3rd" />
  <format type="PDF"
   target="https://n2t.net/ark:/13030/c7n00zt1z" />
 </reference>

 <reference anchor="ARKagency"
   target="https://arks.org">
  <front>
    <title>ARK Maintenance Agency</title>
    <author fullname="ARK Alliance" />
    <date year="2021" />
  </front>
  <format type="HTML"
   target="https://arks.org" />
 </reference>

 <reference anchor="DCKernel"
   target="https://dublincore.org/groups/kernel/">
  <front>
    <title>Kernel Metadata Working Group</title>
    <author surname="DCMI" fullname="Dublin Core Metadata Initiative" />
    <date month="" year="2001-2008" />
  </front>
  <format type="HTML"
   target="https://dublincore.org/groups/kernel/" />
 </reference>

 <reference anchor="DOI"
   target="https://dx.doi.org/10.1000/203">
  <front>
    <title>The Digital Object Identifier (DOI) System</title>
    <author surname="IDF" fullname="International DOI Foundation" />
    <date month="February" year="2001" />
  </front>
  <format type="HTML"
   target="https://dx.doi.org/10.1000/203" />
 </reference>

 <reference anchor="ERC"
   target="https://n2t.net/ark:/13030/c7sn0141m">
  <front>
    <title>Kernel Metadata and Electronic Resource Citations</title>
    <author initials="J." surname="Kunze" fullname="John Kunze" />
    <author initials="A." surname="Turner" fullname="Adrian Turner" />
    <date month="October" year="2007" />
  </front>
  <format type="HTML"
   target="https://n2t.net/ark:/13030/c7sn0141m" />
 </reference>

 <reference anchor="Handle"
   target="https://eric.ed.gov/?id=ED450775">
  <front>
    <title>Handle System Overview</title>
    <author initials="L." surname="Lannom" fullname="Larry Lannom" />
    <date month="April" year="1999" />
  </front>
  <seriesInfo name="ICSTI Forum" value="No. 30" />
  <format type="HTML"
   target="https://eric.ed.gov/?id=ED450775" />
 </reference>

 <reference anchor="Kernel"
   target="https://n2t.net/ark:/13030/c7rr1pm49">
  <front>
    <title>A Metadata Kernel for Electronic Permanence</title>
    <author initials="J." surname="Kunze" fullname="John Kunze" />
    <date month="January" year="2002" />
  </front>
  <seriesInfo name="Journal of Digital Information" value="Vol 2, Issue 2" />
  <seriesInfo name="ISSN" value="1368-7506" />
  <format type="PDF"
   target="https://n2t.net/ark:/13030/c7rr1pm49" />
 </reference>

 <reference anchor="N2T"
   target="https://n2t.net">
  <front>
    <title>Name-to-Thing Resolver</title>
    <author surname="ARK Alliance" fullname="ARK Alliance" />
    <date month="August" year="2006" />
  </front>
  <format type="HTML"
   target="https://n2t.net" />
 </reference>

 <reference anchor="NAANregistry"
   target="https://arks.org/e/pub/naan_registry.txt">
  <front>
    <title>NAAN Registry</title>
    <author fullname="ARKs.org" />
    <date year="2019" />
  </front>
  <format type="TXT"
   target="https://n2t.net/e/pub/naan_registry.txt" />
 </reference>

 <reference anchor="NAANrequest"
   target="https://n2t.net/e/naan_request">
  <front>
    <title>NAAN Request Form</title>
    <author fullname="ARKs.org" />
    <date year="2018" />
  </front>
  <format type="HTML"
   target="https://n2t.net/e/naan_request" />
 </reference>

 <!-- XXX broken link
   target="http://www.arl.org/newsltr/212/nlm.html">
    <date month="October" year="2000" />
    <title>Defining NLM's Commitment to the Permanence of Electronic Information</title>
    <author initials="M." surname="Byrnes" fullname="Margaret Byrnes" />
    <date month="October" year="2000" />
  </front>
  <seriesInfo name="ARL" value="212:8-9" />
 -->

 <reference anchor="NLMPerm"
   target="https://www.nlm.nih.gov/pubs/techbull/ma05/ma05_archive.html">
  <front>
    <title>Permanence Levels and the Archives for NLM's Permanent Web Documents</title>
    <author initials="M." surname="Byrnes" fullname="Margaret Byrnes" />
    <date month="March" year="2005" />
  </front>
  <format type="HTML"
   target="https://www.nlm.nih.gov/pubs/techbull/ma05/ma05_archive.html" />
 </reference>

 <reference anchor="NOID"
   target="https://metacpan.org/pod/distribution/Noid/noid">
  <front>
    <title>Nice Opaque Identifiers</title>
    <author initials="J." surname="Kunze" fullname="John Kunze" />
    <date month="April" year="2006" />
  </front>
  <format type="HTML"
   target="https://metacpan.org/pod/distribution/Noid/noid" />
 </reference>

 <reference anchor="PStatements"
   target="https://n2t.net/ark:/13030/c7833mx7t">
  <front>
    <title>Persistence statements: describing digital stickiness</title>
    <author initials="J." surname="Kunze" fullname="J. Kunze, et al" />
    <date month="October" year="2016" />
  </front>
  <format type="HTML"
   target="https://n2t.net/ark:/13030/c7833mx7t" />
 </reference>


<!-- XXX broken target="http://purl.oclc.org/OCLC/PURL/INET96" -->
 <reference anchor="PURL"
   target="https://www.internetsociety.org/inet96/proceedings/a4/a4_1.htm">
  <front>
    <title>Introduction to Persistent Uniform Resource Locators</title>
    <author initials="K." surname="Shafer" fullname="K. Shafer, et al" />
    <date month="" year="1996" />
  </front>
  <format type="HTML"
   target="https://www.internetsociety.org/inet96/proceedings/a4/a4_1.htm" />
 </reference>

 &rfc0854; <!-- telnet -->
 &rfc1034; <!-- DNS -->
 <!-- &rfc1321; --> <!-- MD5 -->
 &rfc2141; <!-- R. Moats, "URN Syntax", RFC 2141, May 1997 -->
 &rfc2288; <!-- C. Lynch, et al, "Using Existing Bibliographic Identifiers as Uniform Resource Names", RFC 2288, February 1998. -->
 &rfc2611; <!-- URN Namespaces -->
 &rfc2616; <!-- HTTP -->
 &rfc2822; <!-- email headers -->
 &rfc2915; <!-- NAPTR -->
 &rfc3986; <!-- URI -->
 &rfc5013; <!-- Dublin Core -->

<!--
 <reference anchor="TEMPER"
   target="http://www.cdlib.org/inside/diglib/ark/temperspec.pdf">
  <front>
    <title>Temporal Enumerated Ranges</title>
    <author initials="J." surname="Kunze" fullname="John Kunze" />
    <date month="" year="2008" />
  </front>
  <format type="PDF"
   target="http://www.cdlib.org/inside/diglib/ark/temperspec.pdf" />
 </reference>
-->

<!-- XXX link to expired draft-->
 <reference anchor="THUMP"
   target="https://www.ietf.org/archive/id/draft-kunze-thump-03.txt">
  <front>
    <title>The HTTP URL Mapping Protocol</title>
    <author initials="K." surname="Gamiel" fullname="Kevin Gamiel" />
    <author initials="J." surname="Kunze" fullname="John Kunze" />
    <date month="August" year="2007" />
  </front>
  <format type="TXT"
   target="https://www.ietf.org/archive/id/draft-kunze-thump-03.txt" />
 </reference>

 </references>

<section anchor="agency" title="ARK Maintenance Agency: arks.org">

<t>
The ARK Maintenance Agency <xref target="ARKagency"/> at arks.org has several
functions.

<list style="symbols">
<t>
To manage the registry of organizations that will be assigning ARKs.
Organizations can request or update a NAAN by filling out a form
<xref target="NAANrequest"/>.
</t>
<t>
To be a clearinghouse for information about ARKs, such as best practices,
introductory documentation, tutorials, community forums, etc. These
supplemental resources help ARK implementors in high-level applications across
different sectors and disciplines, and with a variety of metadata standards.
</t>
<t>
To be a locus of discussion about future versions of the ARK specification.
</t>
</list>
</t>

</section>

<section title="Looking up NMAs Distributed via DNS">

<t>
This subsection introduces an older method for looking up NMAs that is
based on the method for discovering URN resolvers described in <xref
target="RFC2915"/>.  It
relies on querying the DNS system already installed in the background
infrastructure of most networked computers.  A query is submitted to DNS
asking for a list of resolvers that match a given NAAN.  DNS distributes
the query to the particular DNS servers that can best provide the answer,
unless the answer can be found more quickly in a local DNS cache as a
side-effect of a recent query.  Responses come back inside Name Authority
Pointer (NAPTR) records.  The normal result is one or more candidate NMAs.
</t>

<t>
In its full generality the <xref target="RFC2915"/> algorithm ambitiously accommodates
a complex set of preferences, orderings, protocols, mapping services,
regular expression rewriting rules, and DNS record types.  This subsection
proposes a drastic simplification of it for the special case of ARK
mapping authority discovery.  The simplified algorithm is called Maptr.
It uses only one DNS record type (NAPTR) and restricts most of its field
values to constants.  The following hypothetical excerpt from a DNS
data file for the NAAN known as 12026 shows three example NAPTR records
ready to use with the Maptr algorithm.
</t>
<figure>
 <artwork>
  12026.ark.arpa.
  ;; US Library of Congress
  ;;       order pref flags service regexp  replacement
   IN NAPTR  0     0   "h"  "ark"   "USLC"  lhc.nlm.nih.gov:8080
   IN NAPTR  0     0   "h"  "ark"   "USLC"  foobar.zaf.org
   IN NAPTR  0     0   "h"  "ark"   "USLC"  sneezy.dopey.com </artwork>
</figure>
<!--
.\" XXX define what the regexp code is used for!! if at all!!
-->
<t>
All the fields are held constant for Maptr except for the "flags", "regexp",
and "replacement" fields.  The "service" field contains the constant value
"ark" so that NAPTR records participating in the Maptr algorithm will not
be confused with other NAPTR records.  The "order" and "pref" fields are
held to 0 (zero) and otherwise ignored for now; the algorithm may evolve
to use these fields for ranking decisions when usage patterns and local
administrative needs are better understood.
</t>

<t>
When a Maptr query returns a record with a flags field of "h" (for
host, a Maptr extension to the NAPTR flags), the replacement field
contains the NMA (host) of an ARK service provider.  When a query
returns a record with a flags field of "" (the empty string), the client
needs to submit a new query containing the domain name found in the
replacement field.  This second sort of record exploits the distributed
nature of DNS by redirecting the query to another domain name.  It looks
like this.
</t>
<figure>
 <artwork>
  12345.ark.arpa.
  ;; Digital Library Consortium
  ;;       order pref flags service regexp replacement
   IN NAPTR  0     0    ""  "ark"     ""   dlc.spct.org. </artwork>
</figure>
<t>
Here is the Maptr algorithm for ARK mapping authority discovery.
In it replace &lt;NAAN&gt; with the NAAN from the ARK for which an NMA
is sought.
</t>

<t>
<list style="numbers">
<t>
Initialize the DNS query:  type=NAPTR, query=&lt;NAAN&gt;.ark.arpa.
</t>

<t>
Submit the query to DNS and retrieve (NAPTR) records, discarding any
record that does not have "ark" for the service field.
</t>

<t>
All remaining records with a flags field of "h" contain candidate
NMAs in their replacement fields.  Set them aside, if any.
</t>

<t>
Any record with an empty flags field ("") has a replacement field
containing a new domain name to which a subsequent query should be
redirected.  For each such record, set query=&lt;replacement&gt; then go to
step (2).  When all such records have been recursively exhausted,
go to step (5).
</t>

<t>
All redirected queries have been resolved and a set of candidate
NMAs has been accumulated from step (3).  If there are zero NMAs,
exit &mdash; no mapping authority was found.  If there is one or more NMA,
choose one using any criteria you wish, then exit.
</t>
</list>
</t>

<t>
A Perl script that implements this algorithm is included here.
</t>
<!--
.\" XXX untested!  doesn't use regexp field!
-->
<figure>
 <artwork>
#!/usr/bin/perl

use strict;
use warnings;
use Net::DNS;                           # include simple DNS package

my $qtype = "NAPTR";                    # initialize query type
my $naan = shift;                       # get NAAN script argument
my $mad = Net::DNS::Resolver-&gt;new;   # mapping authority discovery

maptr("$naan.ark.arpa");                # call maptr - that's it

sub maptr {                             # recursive maptr algorithm
        my $dname = shift;              # domain name as argument
        my $query = $mad-&gt;query($dname, $qtype);
        return                          # non-productive query
                if (! $query || ! $query-&gt;answer);
        foreach my $rr ($query-&gt;answer) {
                next                    # skip records of wrong type
                        if ($rr-&gt;type ne $qtype);
                next                    # step 2: keep only "ark" service
                        if (lc($rr-&gt;service) ne "ark");
                # use the NAPTR accessors; the raw rdata text quotes
                # its string fields, so comparing it to "h" would fail
                my $flags = lc($rr-&gt;flags);
                my $replacement = $rr-&gt;replacement;
                if ($flags eq "") {
                        maptr($replacement);    # recurse
                } elsif ($flags eq "h") {
                        print "$replacement\n"; # candidate NMA
                }
        }
} </artwork>
</figure>
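<t>
The steps of the Maptr algorithm can also be exercised without a live DNS.
The Python sketch below runs them against a small in-memory table that stands
in for NAPTR lookups.  The table contents are invented for illustration
(hostnames are borrowed from the hypothetical excerpts above), the names ZONE
and maptr are assumptions, and the guard against redirection loops is an
addition that the algorithm as stated does not require.
</t>
<figure>
 <artwork><![CDATA[
```python
# Stand-in for DNS: domain name -> list of (flags, service, replacement)
# values as they would appear in NAPTR records.  Invented test data.
ZONE = {
    "12345.ark.arpa": [("", "ark", "dlc.spct.org.")],
    "dlc.spct.org": [
        ("h", "ark", "foobar.zaf.org"),
        ("h", "ark", "sneezy.dopey.com"),
        ("h", "other", "ignored.example.org"),
    ],
}


def maptr(naan, lookup=ZONE.get):
    """Return the candidate NMAs for a NAAN, following steps (1)-(5)."""
    candidates = []
    seen = set()                        # loop guard (not in the algorithm)
    pending = [f"{naan}.ark.arpa"]      # step 1: initial query name
    while pending:
        dname = pending.pop()
        if dname in seen:
            continue
        seen.add(dname)
        for flags, service, replacement in lookup(dname) or []:
            if service != "ark":
                continue                # step 2: discard other services
            if flags == "h":
                candidates.append(replacement)            # step 3
            elif flags == "":
                pending.append(replacement.rstrip("."))   # step 4
    return candidates                   # step 5: caller chooses one


print(maptr("12345"))
```
]]></artwork>
</figure>
<t>
An empty result corresponds to the "no mapping authority was found" outcome of
step (5).  Replacing the lookup argument with a function that performs real
NAPTR queries would leave the rest of the sketch unchanged.
</t>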
<t>
The global database thus distributed via DNS and the Maptr algorithm
can easily be seen to mirror the contents of the Name Authority Table
file described in the previous section.
</t>

</section>

</back>

</rfc>
