Public and Publishing
Page created by Mmaciejewski.
Latest revision as of 20:39, 7 January 2026
Overview
NAN distinguishes between public and published datasets, as described below. All published datasets are also public, and public datasets are viewable by everyone, including users who are not authenticated with an NMRhub account.
Public Datasets
A dataset marked as public is viewable through the data browser by anyone, including users without a NAN account. Authorized users may continue to edit public datasets.
- Users with proper permissions may manually mark datasets as public at any time.
- By policy, datasets are automatically made public three years after archival.
- Within six months of the scheduled public release date, users may opt to postpone the release by one additional year.
- The public release date is displayed in the data browser and can be used for sorting and filtering.
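The release-date rules above amount to simple date arithmetic. The sketch below is a minimal illustration, not NAN's implementation: it assumes calendar-month arithmetic (with Feb 29 clamped to Feb 28 in non-leap years), that only a single one-year extension is allowed, and that the six-month extension window closes on the scheduled release date. The function names `add_months`, `scheduled_release`, and `extension_window_open` are hypothetical.

```python
import calendar
from datetime import date

# Hypothetical helpers illustrating the public-release policy; NAN's actual
# date handling is not documented on this page.


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the target month's length."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    day = min(d.day, calendar.monthrange(year, month0 + 1)[1])
    return date(year, month0 + 1, day)


def scheduled_release(archived: date, extended: bool = False) -> date:
    """Default release is three years after archival; an extension adds one year (assumed single extension)."""
    return add_months(archived, 36 + (12 if extended else 0))


def extension_window_open(release: date, today: date) -> bool:
    """True when today falls within the six months before the scheduled release."""
    return add_months(release, -6) <= today < release
```

For a dataset archived on 15 January 2023, `scheduled_release` gives 15 January 2026, and under these assumptions an extension request could be made from 15 July 2025 up to the day before release.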
Published Datasets
Publishing a dataset performs the following:
- Makes the dataset public
- Creates an immutable, versioned snapshot of the dataset
- Assigns an ARK persistent unique identifier (PUI)
The published version includes:
- All dataset files
- Metadata and database records
- Associated samples
- Any supplemental data
The original (parent) dataset remains fully editable, and a link is maintained between the parent and each of its published versions. Publishing a dataset that has not changed since its last publication does not create a new version.
Publishing Dataset Collections
Dataset collections may also be published. When a collection is published:
- Each dataset within the collection is individually published following the same process
- The collection itself receives a dedicated ARK identifier
- The ARK resolves to the published collection view in the data browser
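The collection rules above can be pictured as a small object model. This is a hedged sketch only: the classes, the `mint_ark` function, and the UUID-based naming are hypothetical stand-ins (the real minting service is not described here), the NAAN `83454` is borrowed from the example ARK on this page, and it assumes each published dataset version gets its own ARK. Change detection is omitted for brevity, so every dataset publish here creates a new version.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional

NAAN = "83454"  # the NAAN that appears in this page's example ARK


def mint_ark() -> str:
    """Hypothetical stand-in for NAN's ARK minting service."""
    return f"ark:/{NAAN}/{uuid.uuid4()}"


@dataclass
class Dataset:
    name: str
    version_arks: List[str] = field(default_factory=list)

    def publish(self) -> str:
        # Simplified: no change detection, so every call mints a new version ARK.
        ark = mint_ark()
        self.version_arks.append(ark)
        return ark


@dataclass
class Collection:
    name: str
    datasets: List[Dataset]
    ark: Optional[str] = None

    def publish(self) -> str:
        for dataset in self.datasets:
            dataset.publish()  # each member dataset is published individually
        self.ark = mint_ark()  # the collection receives its own, separate ARK
        return self.ark
```

The key point the sketch illustrates is that the collection's ARK is distinct from the ARKs of its member datasets.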
Versioning and Provenance
- Each time a dataset is re-published, it receives a new version number (e.g., V1, V2, V3).
  - If the dataset has not changed since it was last published, no new version is created.
- All changes to datasets are tracked in a provenance record.
- All published versions are linked to the original dataset, so users can see every version of a given dataset.
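The versioning rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not NAN's implementation: the dataset is modeled as a hypothetical mapping of file names to contents, and "has it changed?" is approximated by hashing a canonical JSON encoding of that mapping. `ParentDataset`, `PublishedVersion`, and `fingerprint` are illustrative names.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PublishedVersion:
    """Immutable snapshot of a dataset at publication time."""
    number: int
    fingerprint: str
    files: Tuple[Tuple[str, str], ...]

    @property
    def label(self) -> str:
        return f"V{self.number}"


@dataclass
class ParentDataset:
    files: Dict[str, str]  # editable working copy: file name -> content (hypothetical model)
    versions: List[PublishedVersion] = field(default_factory=list)  # link from parent to every version

    def fingerprint(self) -> str:
        # Hypothetical change detection: hash a canonical JSON encoding of the files.
        blob = json.dumps(self.files, sort_keys=True).encode("utf-8")
        return hashlib.sha256(blob).hexdigest()

    def publish(self) -> PublishedVersion:
        fp = self.fingerprint()
        if self.versions and self.versions[-1].fingerprint == fp:
            return self.versions[-1]  # unchanged since last publication: no new version
        version = PublishedVersion(len(self.versions) + 1, fp, tuple(sorted(self.files.items())))
        self.versions.append(version)
        return version
```

Publishing twice without edits returns the existing V1; editing the working copy and publishing again yields V2, while V1's snapshot stays untouched.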
ARK Identifiers
- We issue ARK identifiers as our persistent identifiers. ARKs behave similarly to DOIs. Documentation on how to interpret an ARK identifier is available at https://arks.org/about/.
- When you copy an ARK identifier from the NAN website, provide the entire string. In particular, do not strip off the domain (technically the "name mapping authority") when you publish or cite the ARK.
  - Example of a full ARK: https://usnan.org/ark:/83454/c16be2f687-3f23-4986-9181-d1fd71cffc56
- Note that it is possible to track down a resource with just the `ark:/83454/c16be2f687-3f23-4986-9181-d1fd71cffc56` portion of the identifier, but this requires extra manual steps.
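The difference between a full ARK and the bare `ark:/...` portion can be made concrete with a small parser. This sketch loosely follows the general ARK shape `[https://NMA/]ark:[/]NAAN/Name` and is a simplified pattern, not a full validator. `parse_ark` and `resolvable_url` are hypothetical helpers; the fallback to https://n2t.net, a general-purpose identifier resolver, is one way to handle a bare ARK when the original name mapping authority has been stripped.

```python
import re
from typing import NamedTuple, Optional

# Simplified pattern for [https://NMA/]ark:[/]NAAN/Name; not a complete ARK validator.
ARK_PATTERN = re.compile(
    r"^(?:(?P<nma>https?://[^/]+)/)?ark:/?(?P<naan>[^/]+)/(?P<name>.+)$"
)


class ParsedArk(NamedTuple):
    nma: Optional[str]  # name mapping authority, e.g. "https://usnan.org"; None if absent
    naan: str           # name assigning authority number, e.g. "83454"
    name: str           # the object's name under that NAAN

    @property
    def compact(self) -> str:
        return f"ark:/{self.naan}/{self.name}"


def parse_ark(identifier: str) -> ParsedArk:
    match = ARK_PATTERN.match(identifier.strip())
    if match is None:
        raise ValueError(f"not an ARK identifier: {identifier!r}")
    return ParsedArk(match.group("nma"), match.group("naan"), match.group("name"))


def resolvable_url(identifier: str, resolver: str = "https://n2t.net") -> str:
    """Keep the original NMA when present; otherwise fall back to a general resolver."""
    ark = parse_ark(identifier)
    return f"{ark.nma or resolver}/{ark.compact}"
```

A full ARK such as the usnan.org example passes through unchanged, whereas the bare `ark:/83454/...` form is turned into an n2t.net URL, which is the extra step that keeping the full string avoids.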