Public and Publishing: Difference between revisions

From Network for Advanced NMR
Latest revision as of 14:24, 13 June 2025

Overview

NAN distinguishes between public and published datasets, as described below. All published datasets are also public, and public datasets are viewable by everyone, including users who are not authenticated with an NMRhub account.

Public Datasets

A dataset marked as public is viewable through the data browser by anyone, including users without a NAN account. Authorized users may continue to edit public datasets.

  • Users with proper permissions may manually mark datasets as public at any time.
  • By policy, datasets are automatically made public three years after archival.
  • Within six months of the scheduled public release date, users may opt to postpone the release by one additional year.
  • The public release date is displayed in the data browser and can be used for sorting and filtering.
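
The release policy above can be illustrated with a short date calculation. This is a minimal sketch, not NAN's implementation: the function names are hypothetical, and it assumes that "within six months" means the six months leading up to the scheduled release date. Whether a release can be postponed more than once is not stated above, so the sketch does not model it.

```python
import calendar
from datetime import date


def shift_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the month's length."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


def public_release_date(archived_on: date) -> date:
    """Policy: datasets become public three years after archival."""
    return shift_months(archived_on, 36)


def can_postpone(release_on: date, today: date) -> bool:
    """Assumption: the option opens six months before the release date."""
    return shift_months(release_on, -6) <= today < release_on


def postponed_release_date(release_on: date) -> date:
    """Postpone the scheduled release by one additional year."""
    return shift_months(release_on, 12)
```

For example, a dataset archived on 13 June 2025 would be scheduled for release on 13 June 2028, and under the stated assumption the option to postpone would open on 13 December 2027.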

Published Datasets

Publishing a dataset performs the following:

  • Makes the dataset public
  • Creates an immutable, versioned snapshot of the dataset
  • Assigns an ARK persistent identifier

The published version includes:

  • All dataset files
  • Metadata and database records
  • Associated samples
  • Any supplemental data

The original (parent) dataset remains fully editable. A reference is maintained between the parent and all published versions. If a user requests to publish a dataset that has not changed since the last publication, a new version is not created.
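
The relationship between an editable parent and its published versions can be sketched as follows. This is an illustrative model only: the class and function names are hypothetical, the change check uses a SHA-256 fingerprint of the content as an assumed mechanism, and ARK assignment is left out.

```python
import hashlib
import json
from dataclasses import dataclass, field
from types import MappingProxyType


@dataclass(frozen=True)
class PublishedVersion:
    """A snapshot whose fields cannot be reassigned after creation."""
    version: int
    fingerprint: str
    content: MappingProxyType  # read-only view of the top-level mapping


@dataclass
class Dataset:
    """The parent dataset, which stays editable after publication."""
    content: dict
    is_public: bool = False
    versions: list = field(default_factory=list)


def fingerprint(content: dict) -> str:
    """Hash a canonical JSON form so identical content gives identical digests."""
    canonical = json.dumps(content, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def publish(dataset: Dataset) -> PublishedVersion:
    dataset.is_public = True  # publishing always makes the dataset public
    digest = fingerprint(dataset.content)
    # Unchanged since the last publication: return it instead of a new version.
    if dataset.versions and dataset.versions[-1].fingerprint == digest:
        return dataset.versions[-1]
    # Deep-copy through JSON so later edits to the parent do not reach the snapshot.
    frozen = MappingProxyType(json.loads(json.dumps(dataset.content)))
    snapshot = PublishedVersion(len(dataset.versions) + 1, digest, frozen)
    dataset.versions.append(snapshot)
    return snapshot
```

Publishing twice without edits returns the same version object, while editing the parent and publishing again yields version 2 and leaves version 1 untouched. Note that `MappingProxyType` only protects the top-level mapping; a real system would need stronger immutability guarantees.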

Publishing Dataset Collections

Dataset collections may also be published. When a collection is published:

  • Each dataset within the collection is individually published following the same process
  • The collection itself receives a dedicated ARK identifier
  • The ARK resolves to the published collection view in the data browser
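
The collection workflow above might be modeled roughly as below. All names here are hypothetical, and the identifier minting is a stand-in: `99999` is a placeholder NAAN rather than NAN's real one, and the random suffix is an assumption about how names could be generated.

```python
import uuid
from dataclasses import dataclass
from typing import Callable

PLACEHOLDER_NAAN = "99999"  # placeholder value, not NAN's actual NAAN


def mint_ark() -> str:
    """Build an ARK-shaped identifier with a random name (illustrative only)."""
    return f"ark:/{PLACEHOLDER_NAAN}/{uuid.uuid4().hex}"


@dataclass(frozen=True)
class PublishedCollection:
    ark: str             # dedicated identifier for the collection itself
    dataset_arks: tuple  # identifiers of the individually published datasets


def publish_collection(
    dataset_ids: list[str],
    publish_dataset: Callable[[str], str],
) -> PublishedCollection:
    """Publish every member dataset, then give the collection its own ARK."""
    dataset_arks = tuple(publish_dataset(ds_id) for ds_id in dataset_ids)
    return PublishedCollection(ark=mint_ark(), dataset_arks=dataset_arks)
```

Passing the per-dataset publishing step in as a function keeps the sketch independent of how individual datasets are actually published.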

Versioning and Provenance

  • When a dataset is re-published, a new version number is assigned (e.g., V1, V2, V3).
    • If no changes have been made to the dataset since it was last published, a new version is not created.
  • All changes to datasets are tracked in a provenance record.
  • All published versions and the original dataset are linked together, so users can see every version of a given dataset.