4 Syntax

A SWHID consists of two parts: a mandatory core_identifier, which can identify any software artifact, and an optional list of qualifiers, which specify the context in which the object is meant to be seen and can point to a subpart of the object itself.

Syntactically, SWHIDs are generated by the <identifier> entry point in the following grammar (which uses notation defined by RFC-5234):

<identifier> ::= <core_identifier> [ <qualifiers> ] ;

<core_identifier> ::=
    "swh" ":" <scheme_version> ":" <object_type> ":" <object_id> ;
<scheme_version> ::= "1" ;
<object_type> ::=
    "snp"  (* snapshot *)
  | "rel"  (* release *)
  | "rev"  (* revision *)
  | "dir"  (* directory *)
  | "cnt"  (* content *)
  ;
<object_id> ::= 40 * <hex_digit> ; (* intrinsic id, hex-encoded *)
<dec_digit> ::=
    "0" | "1" | "2" | "3" | "4"
  | "5" | "6" | "7" | "8" | "9" ;
<hex_digit> ::=
    <dec_digit>
  | "a" | "b" | "c" | "d" | "e" | "f" ;

<qualifiers> ::= ";" <qualifier> [ <qualifiers> ] ;
<qualifier> ::=
    <context_qualifier>
  | <fragment_qualifier>
  ;
<context_qualifier> ::=
    <origin_ctxt>
  | <visit_ctxt>
  | <anchor_ctxt>
  | <path_ctxt>
  ;
<origin_ctxt> ::= "origin" "=" <url_escaped> ;
<visit_ctxt> ::= "visit" "=" <core_identifier> ;
<anchor_ctxt> ::= "anchor" "=" <core_identifier> ;
<path_ctxt> ::= "path" "=" <path_absolute_escaped> ;
<fragment_qualifier> ::= "lines" "=" <range> | "bytes" "=" <range> ;
<range> ::= <number> ["-" <number>] ;
<number> ::= <dec_digit> + ;
<url_escaped> ::= (* RFC 3987 IRI *)
<path_absolute_escaped> ::= (* RFC 3987 absolute path *)
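
As an illustration of how the grammar above can be applied, the following Python sketch transcribes <core_identifier> into a regular expression and splits off the qualifiers. The function name parse_swhid, the sample identifier, and the example.org URL are hypothetical and chosen only for illustration. The sketch checks the shape of visit, anchor, lines, and bytes values, but accepts origin and path values without validating them against RFC 3987.

```python
import re

# <core_identifier>: scheme version "1", one of the five object types,
# then exactly 40 lowercase hexadecimal digits.
CORE_RE = re.compile(r"swh:1:(snp|rel|rev|dir|cnt):([0-9a-f]{40})")

# <range>: a decimal number, optionally followed by "-" and a second number.
RANGE_RE = re.compile(r"[0-9]+(-[0-9]+)?")


def parse_swhid(text):
    """Return (object_type, object_id, qualifiers) or raise ValueError.

    Hypothetical helper. Qualifiers come back as a list of (key, value)
    pairs in their original order.
    """
    # Splitting on ";" is safe because ";" must be percent-encoded
    # inside qualifier values.
    core, *raw_qualifiers = text.split(";")
    match = CORE_RE.fullmatch(core)
    if match is None:
        raise ValueError(f"invalid core identifier: {core!r}")

    qualifiers = []
    for raw in raw_qualifiers:
        key, sep, value = raw.partition("=")
        if not sep:
            raise ValueError(f"qualifier without '=': {raw!r}")
        if key in ("visit", "anchor"):
            # These qualifiers carry a nested core identifier.
            if CORE_RE.fullmatch(value) is None:
                raise ValueError(f"invalid {key} value: {value!r}")
        elif key in ("lines", "bytes"):
            if RANGE_RE.fullmatch(value) is None:
                raise ValueError(f"invalid {key} range: {value!r}")
        elif key not in ("origin", "path"):
            raise ValueError(f"unknown qualifier: {key!r}")
        qualifiers.append((key, value))

    return match.group(1), match.group(2), qualifiers


# Sample input: a content identifier with an origin and a line range.
sample = (
    "swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2"
    ";origin=https://example.org/repo;lines=9-15"
)
object_type, object_id, qualifiers = parse_swhid(sample)
```

A regular expression is enough for the core identifier because that part of the grammar has a fixed shape; a stricter validator would additionally parse the origin and path values as IRIs.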

The last two symbols in the grammar, <url_escaped> and <path_absolute_escaped>, are defined as:

  • <url_escaped> is an IRI as defined in RFC-3987; and
  • <path_absolute_escaped> is an ipath-absolute from RFC-3987.

In both of these, all occurrences of ; (and of %, as required by the RFC) have been percent-encoded (as %3B and %25 respectively). Other characters may also be percent-encoded, for example to improve the readability or embeddability of SWHIDs in other contexts.
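
The mandatory part of this escaping can be sketched in a few lines of Python. The helper names escape_qualifier_value and unescape_qualifier_value are hypothetical, and the sketch assumes the input is a raw, not yet escaped origin or path string. Decoding relies on urllib.parse.unquote from the Python standard library, which reverses all percent-escapes, including any optional ones.

```python
from urllib.parse import unquote


def escape_qualifier_value(value):
    """Percent-encode only the two characters that must be escaped.

    Hypothetical helper. '%' is replaced first; otherwise the '%'
    introduced by '%3B' would itself be encoded again.
    """
    return value.replace("%", "%25").replace(";", "%3B")


def unescape_qualifier_value(value):
    """Decode every percent-escape in a qualifier value."""
    return unquote(value)


# A path containing both reserved characters.
raw_path = "/src/a;b%c.txt"
escaped_path = escape_qualifier_value(raw_path)
```

After escaping, the value contains no literal ; character, so a complete SWHID can be split into its qualifiers on that character.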