6 Qualified identifiers
Qualifiers
A qualified, or full, SWHID is composed of a core SWHID identifier, and a sequence of qualifiers.
Qualifiers may be:
- fragment qualifiers (see 6.1), that identify subparts of a software artifact; or
- context qualifiers (see 6.2), that provide additional context on the software artifact.
Each qualifier is specified as a key-value pair, using an =
character as a separator.
Qualifiers are separated from the core identifier and from each other by using a ;
character.
Some qualifiers are valid for specific object types, and the validity of some qualifiers depends on the presence of other qualifiers. Conformant implementation MUST not generate invalid qualifiers or qualifier combinations and MUST ignore them if present, as detailed in the following sections.
6.1 Fragment qualifiers
There are two fragment qualifiers, lines
and bytes
.
Each fragment qualifier MUST appear at most once.
Fragment qualifiers are only valid for objects of type content.
Each valid SWHID must have at most one fragment qualifier.
A conformant implementation MAY accept a SWHID that violates this constraint,
by ignoring the lines
qualifier when the bytes
qualifier is present.
6.1.1 Lines qualifier
A "line" in the context of a file content refers to a sequence of characters that ends with a line break. This line can contain text, code, or any other form of data. In this specification, the line break is the ASCII LF character.
The "lines" qualifier allows to designate a line range inside a content.
The range can be a single line number, or a pair of line numbers separated by the ASCII -
character.
Line numbers start from 1, and the range is inclusive, that is, the fragment includes both the lines numbered as the start and end of the range.
For example, swh:1:cnt:4d99d2d18326621ccdd70f5ea66c2e2ac236ad8b;lines=9-15
designates the function generate_input_stream
that is found at lines 9 to 15 of the content with core SWHID swh:1:cnt:4d99d2d18326621ccdd70f5ea66c2e2ac236ad8b
.
Notice that the notion of "line number" is not always meaningful: the content may be a binary file, or a file that uses non standard line termination character(s).
6.1.2 Bytes qualifier
To overcome the limitations of the lines qualifier, the bytes qualifier allows
designation of a byte range inside a content. The range can be a single byte number, or a pair of byte numbers separated by -
.
Byte numbers start from 0, and the range is inclusive, that is, the fragment includes both the bytes numbered as the start and end of the range.
If the range is a single byte number, it designates the byte at that specific position.
For example, swh:1:cnt:4d99d2d18326621ccdd70f5ea66c2e2ac236ad8b;bytes=154-315
designates the same function generate_input_stream
as in the example above, but
does not rely on any convention about line numbers.
6.2 Context qualifiers
There are four context qualifiers, origin
, visit
, path
and anchor
.
Each context qualifier MUST appear at most once.
6.2.1 Origin qualifier
This qualifier allows declaration of the software origin where the object has been found or observed, as an URI.
For example, swh:1:cnt:4d99d2d18326621ccdd70f5ea66c2e2ac236ad8b;origin=https://gitorious.org/ocamlp3l/ocamlp3l_cvs.git
indicates that the content seen previously with the function generate_input_stream
has
been seen in the Git repository at https://gitorious.org/ocamlp3l/ocamlp3l_cvs.git
This qualifier may help to get hold of the full repository where a content has been found, but there is no guarantee of success, as an origin can change or disappear over time (as is the case in the example above, since gitorious.org was shut down in 2015).
6.2.2 Visit qualifier
This qualifier allows addition of the core SWHID identifier of the snapshot of the repository where the object has been found or observed.
For example, swh:1:cnt:4d99d2d18326621ccdd70f5ea66c2e2ac236ad8b;origin=https://gitorious.org/ocamlp3l/ocamlp3l_cvs.git;visit=swh:1:snp:d7f1b9eb7ccb596c2622c4780febaa02549830f9
indicates that the content seen previously with the function generate_input_stream
has
been seen in the Git repository at https://gitorious.org/ocamlp3l/ocamlp3l_cvs.git
, when
its full state had the SWHID core identifier swh:1:snp:d7f1b9eb7ccb596c2622c4780febaa02549830f9
.
This qualifier is only valid when the origin
qualifier is also present.
Otherwise, it MUST be ignored.
6.2.3 Path qualifier
This qualifier allows declaration of the absolute file path, from the root
directory associated to the anchor node, to the object designated by the core
SWHID identifier; when the anchor denotes a directory, a revision or a release,
the root directory is uniquely determined; when the anchor denotes a snapshot,
the root directory is the first directory reachable from the HEAD
branch,
and undefined if such a reference is missing.
For example, swh:1:cnt:4d99d2d18326621ccdd70f5ea66c2e2ac236ad8b;origin=https://gitorious.org/ocamlp3l/ocamlp3l_cvs.git;visit=swh:1:snp:d7f1b9eb7ccb596c2622c4780febaa02549830f9;anchor=swh:1:rev:2db189928c94d62a3b4757b3eec68f0a4d4113f0;path=/Examples/SimpleFarm/simplefarm.ml
indicates that the content seen previously with the function generate_input_stream
has
been seen in the Git repository at https://gitorious.org/ocamlp3l/ocamlp3l_cvs.git
, when
its full state had the SWHID core identifier swh:1:snp:d7f1b9eb7ccb596c2622c4780febaa02549830f9
, and that it is named simplefarm.ml
in the directory Simplefarm
contained in the directory Examples
contained in the root directory associated to the revision with core SWHID swh:1:rev:2db189928c94d62a3b4757b3eec68f0a4d4113f0
.
This qualifier is only valid when the object type is not content. Otherwise, it MUST be ignored.
6.2.4 Anchor qualifier
This qualifier is used in conjunction with the path
qualifier.
It allows identification of a node in the Merkle DAG relative to which
a path to the object is specified, as the core identifier of a directory,
a revision, a release or a snapshot. See the example provided for the
path
qualifier.
This qualifier is only valid when the path
qualifier is also present.
Otherwise, it MUST be ignored.
6.3 Comparing qualified SWHIDs
One can determine whether two software artifacts are identical (bit by bit) by comparing their core SWHIDs, ignoring all qualifiers. If the core SWHIDs are equal, the software artifacts they represent are identical.
To determine if two SWHIDs represent the same software artifact (or fragment thereof) in the same context, one must also compare their qualifiers. Two SWHIDs are considered equivalent in context if:
- They both have the same set of qualifiers.
- The values of these qualifiers are identical. For instance, if both SWHIDs
have an
anchor
qualifier, the core SWHID values of these qualifiers are identical. Similarly, if both have alines
qualifier, their values are identical.
Note that the order of the qualifiers does not matter for comparison purposes.
6.4 Recommendations
We recommend equipping identifiers meant for sharing with as many
qualifiers as possible. While qualifiers may be listed in any order, it
is good practice to present them in the following canonical order:
origin
, visit
, anchor
, path
, lines
or bytes
.
By adhering to this order, it becomes easier to visually inspect and compare SWHIDs, especially when dealing with a large number of identifiers.
Redundant information should be omitted: for example, if the visit is present, and the path is relative to the snapshot indicated there, then the anchor qualifier is superfluous; similarly, if the path is empty, it may be omitted.