Public and Publishing
Overview
NAN distinguishes between public and published datasets, as described below. All published datasets are also public, and public datasets are viewable by everyone, including people who are not signed in with an NMRhub account.
Public Datasets
A dataset marked as public is viewable through the data browser by anyone, including users without a NAN account. Authorized users may continue to edit public datasets.
- Users with proper permissions may manually mark datasets as public at any time.
- By policy, datasets are automatically made public three years after archival.
- During the six months before the scheduled public release date, users may opt to postpone the release by one additional year.
- The public release date is displayed in the data browser and can be used for sorting and filtering.
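The release policy above can be sketched as simple date arithmetic. This is only an illustration, not NAN's implementation: the function names are hypothetical, and it assumes that the extension window opens six months before the scheduled release date and closes on that date, and that a date with no counterpart in the target month (such as February 29) is clamped to the last day of that month.

```python
import calendar
from datetime import date

RELEASE_DELAY_MONTHS = 36      # three years after archival (from the policy above)
EXTENSION_WINDOW_MONTHS = 6    # assumed: window opens six months before release
EXTENSION_MONTHS = 12          # one additional year


def shift_months(d: date, months: int) -> date:
    """Move a date by a number of months, clamping the day to the month's end."""
    month_index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(month_index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def scheduled_release(archived_on: date) -> date:
    """Hypothetical helper: date on which a dataset becomes public by policy."""
    return shift_months(archived_on, RELEASE_DELAY_MONTHS)


def can_extend(release_on: date, today: date) -> bool:
    """True while today falls in the assumed six-month window before release."""
    window_opens = shift_months(release_on, -EXTENSION_WINDOW_MONTHS)
    return window_opens <= today < release_on


def extend_release(release_on: date) -> date:
    """Postpone the release date by one additional year."""
    return shift_months(release_on, EXTENSION_MONTHS)
```

For example, under these assumptions a dataset archived on 10 May 2021 would be scheduled for release on 10 May 2024, and the extension option would open on 10 November 2023.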
Published Datasets
Publishing a dataset performs the following:
- Makes the dataset public
- Creates an immutable, versioned snapshot of the dataset
- Assigns an ARK persistent unique identifier (PUI)
The published version includes:
- All dataset files
- Metadata and database records
- Associated samples
- Any supplemental data
The original (parent) dataset remains fully editable. A reference is maintained between the parent and all published versions. If a user requests to publish a dataset that has not changed since the last publication, a new version is not created.
Publishing Dataset Collections
Dataset collections may also be published. When a collection is published:
- Each dataset within the collection is individually published following the same process
- The collection itself receives a dedicated ARK identifier
- The ARK resolves to the published collection view in the data browser
Versioning and Provenance
- When a dataset is re-published, a new version number is created (e.g. V1, V2, V3).
- If no changes have been made to the dataset since it was last published, a new version is not created.
- All changes to datasets are tracked in a provenance record.
- All published versions and the original dataset are linked together so that users can see every version of a given dataset.
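The versioning rules above can be illustrated with a small sketch. How NAN actually detects that a dataset has changed is not described here, so comparing a SHA-256 fingerprint of the dataset content is purely an assumption, and `Dataset`, `PublishedVersion`, `fingerprint`, and `publish` are hypothetical names.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PublishedVersion:
    label: str          # e.g. "V1", "V2"
    fingerprint: str    # hash of the content at publication time


@dataclass
class Dataset:
    content: dict                                  # stand-in for files, metadata, samples
    versions: list = field(default_factory=list)   # published snapshots, oldest first


def fingerprint(content: dict) -> str:
    """Hash a canonical JSON form of the content (assumed change detector)."""
    canonical = json.dumps(content, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def publish(dataset: Dataset) -> PublishedVersion:
    """Create a new version only if the content changed since the last one."""
    current = fingerprint(dataset.content)
    if dataset.versions and dataset.versions[-1].fingerprint == current:
        return dataset.versions[-1]   # unchanged: reuse the latest version
    version = PublishedVersion(f"V{len(dataset.versions) + 1}", current)
    dataset.versions.append(version)  # the parent keeps links to all versions
    return version
```

In this sketch, publishing twice without edits returns the same V1 both times, while publishing after an edit to the parent dataset produces V2 and leaves V1 untouched.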
ARK Identifiers
- We issue ARKs as our persistent identifiers. ARKs behave similarly to DOIs. Documentation on how to interpret an ARK identifier is available here.
- The key point is to copy the entire string when you take an ARK identifier from the NAN website. In other words, do not strip the domain (technically the "name mapping authority") from the ARK when publishing.
- An example full ARK: https://usnan.org/ark:/83454/c16be2f687-3f23-4986-9181-d1fd71cffc56
- Note that it is possible to track down a resource with just the `ark:/83454/c16be2f687-3f23-4986-9181-d1fd71cffc56` portion of the identifier, but this requires extra manual steps.
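To make the parts of an ARK concrete, the following sketch splits the example identifier into its resolver domain (the name mapping authority), the number that follows `ark:/`, and the remaining name. The helper names `parse_ark` and `to_url` are hypothetical, and using `https://usnan.org` as the default resolver for a bare ARK is an assumption based only on the example above.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlparse


class Ark(NamedTuple):
    resolver: Optional[str]   # e.g. "https://usnan.org"; None for a bare ARK
    naan: str                 # the number after "ark:/", e.g. "83454"
    name: str                 # the rest of the identifier


def parse_ark(identifier: str) -> Ark:
    """Split a full or bare ARK string into resolver, NAAN, and name."""
    rest = identifier.strip()
    resolver = None
    if "://" in rest:
        parsed = urlparse(rest)
        resolver = f"{parsed.scheme}://{parsed.netloc}"
        rest = parsed.path.lstrip("/")
    if not rest.startswith("ark:"):
        raise ValueError(f"not an ARK identifier: {identifier!r}")
    body = rest[len("ark:"):].lstrip("/")
    naan, sep, name = body.partition("/")
    if not sep or not naan or not name:
        raise ValueError(f"incomplete ARK identifier: {identifier!r}")
    return Ark(resolver, naan, name)


def to_url(ark: Ark, default_resolver: str = "https://usnan.org") -> str:
    """Rebuild a resolvable URL; the default resolver is an assumption."""
    return f"{ark.resolver or default_resolver}/ark:/{ark.naan}/{ark.name}"
```

Parsing the bare `ark:/83454/...` form yields a `resolver` of `None`, which is the information a reader would have to recover by hand; this is why citing the full string is preferable.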