Public and Publishing

From Network for Advanced NMR
Latest revision as of 20:39, 7 January 2026

Overview

NAN distinguishes between public and published datasets as described below. All published datasets are also public, and public datasets are viewable by everyone, even those who are not authenticated with an NMRhub account.

Public Datasets

A dataset marked as public is viewable through the data browser by anyone, including users without a NAN account. Authorized users may continue to edit public datasets.

  • Users with proper permissions may manually mark datasets as public at any time.
  • By policy, datasets are automatically made public three years after archival.
  • In the six months before the scheduled public release date, users may opt to postpone the release by one additional year.
  • The public release date is displayed in the data browser and can be used for sorting and filtering.
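The release-date rules above come down to calendar arithmetic. The minimal Python sketch below assumes that "three years" and "one additional year" are calendar years counted from the archival date, that "within six months" means the six calendar months before the scheduled date, and that a day missing from the target month is clamped (29 February becomes 28 February). The constants and function names are hypothetical and not part of NAN; the page also does not say whether extensions can be repeated, so the sketch does not limit them.

```python
import calendar
from datetime import date

# Hypothetical constants mirroring the policy text; not NAN configuration values.
EMBARGO_MONTHS = 36           # public three years after archival
EXTENSION_WINDOW_MONTHS = 6   # extension may be requested in the last six months
EXTENSION_MONTHS = 12         # each extension postpones release by one year

def add_months(d: date, months: int) -> date:
    """Shift d by whole calendar months, clamping the day to the target month's length."""
    total = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(total, 12)
    last_day = calendar.monthrange(year, month_index + 1)[1]
    return date(year, month_index + 1, min(d.day, last_day))

def scheduled_release(archived_on: date, extensions: int = 0) -> date:
    """Release date after the default embargo plus any one-year extensions."""
    return add_months(archived_on, EMBARGO_MONTHS + EXTENSION_MONTHS * extensions)

def can_extend(archived_on: date, today: date, extensions: int = 0) -> bool:
    """True if today falls in the six months before the currently scheduled release."""
    release = scheduled_release(archived_on, extensions)
    return add_months(release, -EXTENSION_WINDOW_MONTHS) <= today < release

archived = date(2023, 1, 15)
print(scheduled_release(archived))                 # 2026-01-15
print(can_extend(archived, date(2025, 8, 1)))      # True
print(can_extend(archived, date(2025, 6, 1)))      # False
print(scheduled_release(archived, extensions=1))   # 2027-01-15
```

Counting extensions from the archival date keeps the arithmetic in one place; the only difference from chaining one-year steps appears for 29 February archival dates, where clamping can shift the result by a day.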

Published Datasets

Publishing a dataset performs the following:

  • Makes the dataset public
  • Creates an immutable, versioned snapshot of the dataset
  • Assigns an ARK persistent unique identifier (PUI)

The published version includes:

  • All dataset files
  • Metadata and database records
  • Associated samples
  • Any supplemental data

The original (parent) dataset remains fully editable. A reference is maintained between the parent and all published versions. If a user publishes a dataset that has not changed since its last publication, no new version is created.
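One way to picture the relationship between an editable parent and its published snapshots is a mutable record copied into a frozen one. The minimal Python sketch below uses a hypothetical `Dataset` / `PublishedVersion` model (the names and fields are illustrative, not NAN's schema). It copies every container into an immutable type so later edits to the parent cannot alter a snapshot, and each snapshot keeps a `parent_id` back-reference.

```python
from collections.abc import Mapping
from dataclasses import FrozenInstanceError, dataclass, field
from types import MappingProxyType

@dataclass
class Dataset:
    """Editable parent dataset (hypothetical model, not the NAN schema)."""
    dataset_id: str
    files: list[str] = field(default_factory=list)
    metadata: dict[str, str] = field(default_factory=dict)
    samples: list[str] = field(default_factory=list)
    supplemental: list[str] = field(default_factory=list)

@dataclass(frozen=True)
class PublishedVersion:
    """Immutable snapshot of a parent dataset at publication time."""
    parent_id: str                 # reference back to the editable parent
    version: int                   # 1 for "V1", 2 for "V2", ...
    files: tuple[str, ...]
    metadata: Mapping[str, str]    # read-only view over a private copy
    samples: tuple[str, ...]
    supplemental: tuple[str, ...]

def snapshot(ds: Dataset, version: int) -> PublishedVersion:
    # Copy every mutable container so later edits to the parent cannot leak in.
    return PublishedVersion(
        parent_id=ds.dataset_id,
        version=version,
        files=tuple(ds.files),
        metadata=MappingProxyType(dict(ds.metadata)),
        samples=tuple(ds.samples),
        supplemental=tuple(ds.supplemental),
    )

ds = Dataset("ds-001", files=["fid"], metadata={"nucleus": "1H"})
v1 = snapshot(ds, version=1)
ds.files.append("fid2")        # the parent remains editable...
print(v1.files)                # ('fid',)  ...but the snapshot does not change
try:
    v1.version = 2             # frozen dataclass: reassignment is rejected
except FrozenInstanceError:
    print("snapshot is immutable")
```

A frozen dataclass only blocks attribute reassignment; converting lists to tuples and wrapping a copy of the metadata in `MappingProxyType` is what keeps the snapshot's contents read-only as well.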

Publishing Dataset Collections

Dataset collections may also be published. When a collection is published:

  • Each dataset within the collection is individually published following the same process
  • The collection itself receives a dedicated ARK identifier
  • The ARK resolves to the published collection view in the data browser
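Collection publishing composes the single-dataset process: each member goes through the same publication step, and the collection then receives its own identifier. In this sketch, `publish_dataset` is a caller-supplied function standing in for the per-dataset process, and `mint_placeholder_identifier` fabricates a `demo:` string. Both are hypothetical, since real ARKs are issued by NAN.

```python
import uuid
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass(frozen=True)
class PublishedCollection:
    identifier: str                       # the collection's own identifier
    member_identifiers: tuple[str, ...]   # one identifier per published dataset

def mint_placeholder_identifier() -> str:
    # Hypothetical stand-in: real ARKs are issued by NAN, not generated client-side.
    return f"demo:{uuid.uuid4()}"

def publish_collection(
    dataset_ids: Iterable[str],
    publish_dataset: Callable[[str], str],
) -> PublishedCollection:
    """Publish each member individually, then give the collection its own identifier."""
    member_ids = tuple(publish_dataset(ds_id) for ds_id in dataset_ids)
    return PublishedCollection(
        identifier=mint_placeholder_identifier(),
        member_identifiers=member_ids,
    )

collection = publish_collection(
    ["ds-001", "ds-002", "ds-003"],
    publish_dataset=lambda ds_id: f"demo:{ds_id}-V1",
)
print(collection.member_identifiers)  # ('demo:ds-001-V1', 'demo:ds-002-V1', 'demo:ds-003-V1')
```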

Versioning and Provenance

  • Each time a dataset is republished, a new version number is assigned (e.g. V1, V2, V3).
    • If the dataset has not changed since it was last published, no new version is created.
  • All changes to datasets are tracked in a provenance record.
  • All published versions and the original dataset are linked together so that users can see all versions of a given dataset.
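The rule that republishing an unchanged dataset does not create a new version can be implemented by comparing a content fingerprint against that of the latest published version. The sketch below is a hypothetical illustration, not NAN's implementation: it hashes a canonical JSON encoding with SHA-256, labels versions V1, V2, and so on, and keeps a simple append-only provenance list. `VersionHistory`, `record_change`, and the event strings are invented for the example.

```python
import hashlib
import json
from dataclasses import dataclass, field

def fingerprint(content: dict) -> str:
    """Hash a canonical JSON encoding so identical content yields identical digests."""
    canonical = json.dumps(content, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

@dataclass
class VersionHistory:
    parent_id: str
    versions: list[tuple[str, str]] = field(default_factory=list)  # (label, digest)
    provenance: list[str] = field(default_factory=list)            # append-only log

    def record_change(self, description: str) -> None:
        """Log an edit to the parent dataset."""
        self.provenance.append(description)

    def publish(self, content: dict) -> str:
        """Return the version label, creating a new one only if content changed."""
        digest = fingerprint(content)
        if self.versions and self.versions[-1][1] == digest:
            # Unchanged since the last publication: reuse the existing version.
            return self.versions[-1][0]
        label = f"V{len(self.versions) + 1}"
        self.versions.append((label, digest))
        self.provenance.append(f"published {label}")
        return label

h = VersionHistory("ds-001")
print(h.publish({"files": ["fid"]}))           # V1
print(h.publish({"files": ["fid"]}))           # V1 (unchanged, no new version)
h.record_change("added fid2")
print(h.publish({"files": ["fid", "fid2"]}))   # V2
```

Sorting keys before hashing makes the fingerprint independent of dictionary insertion order, while list order still counts as a change. Because the comparison is only against the most recent version, restoring an older state after V2 would produce V3.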

ARK Identifiers

  • We issue ARK identifiers as our persistent identifiers. ARKs behave similarly to DOIs. Documentation on how to interpret an ARK identifier is available at https://arks.org/about/.
  • When you copy an ARK identifier from the NAN website, provide the entire string. In other words, don't strip off the domain (technically the "name mapping authority") from the ARK when you cite it.
    • An example full ARK: https://usnan.org/ark:/83454/c16be2f687-3f23-4986-9181-d1fd71cffc56
  • It is possible to track down a resource with just the `ark:/83454/c16be2f687-3f23-4986-9181-d1fd71cffc56` portion of the identifier, but this requires extra manual steps.
    • For example, you can use https://n2t.net/ as the "name mapping authority" to resolve these ARK fragments (example: https://n2t.net/ark:/83454/c16be2f687-3f23-4986-9181-d1fd71cffc56) as well as DOIs (example: https://n2t.net/doi:10.3390/s23218689) and other identifiers.
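Moving between a full ARK and its bare `ark:/...` portion is plain string handling. The helpers below (hypothetical names, not a NAN or N2T API) extract the fragment and prefix a name mapping authority, assuming `https://usnan.org/` is NAN's authority and `https://n2t.net/` the generic resolver. They only build strings and do not check that the identifier actually resolves.

```python
NAN_NMA = "https://usnan.org/"      # assumed NAN name mapping authority
N2T_RESOLVER = "https://n2t.net/"   # generic resolver for ARKs, DOIs, and more

def ark_fragment(identifier: str) -> str:
    """Return the 'ark:/NAAN/name' portion of a full ARK URL or a bare fragment."""
    start = identifier.find("ark:/")
    if start == -1:
        raise ValueError(f"not an ARK: {identifier!r}")
    return identifier[start:].strip()

def full_ark(identifier: str, authority: str = NAN_NMA) -> str:
    """Prefix a name mapping authority to the ARK fragment to get a resolvable URL."""
    return authority.rstrip("/") + "/" + ark_fragment(identifier)

bare = "ark:/83454/c16be2f687-3f23-4986-9181-d1fd71cffc56"
print(full_ark(bare))   # https://usnan.org/ark:/83454/c16be2f687-3f23-4986-9181-d1fd71cffc56
print(full_ark(bare, authority=N2T_RESOLVER))
```

Because `ark_fragment` accepts either form, the same call normalizes a full ARK copied from the NAN website or a bare fragment found elsewhere.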