Arbitrary Dataset Upload

From Network for Advanced NMR
Revision as of 17:05, 10 June 2025 by Mmaciejewski (talk | contribs)

Arbitrary Data Upload

While NDTS provides automated data harvesting from NAN-connected instruments, there are cases where data must be uploaded manually. The Arbitrary Data Upload Tool supports this need, offering a structured workflow for uploading datasets that were not harvested in real time.

This tool is useful when:

  • Automatic NDTS harvesting was unintentionally disabled
  • Data was collected before the instrument was connected to NAN
  • Legacy datasets are stored on spectrometer workstations or lab computers

Limitations of NDTS Manual Harvesting

The NDTS GUI includes a manual upload feature for harvesting individual experiments. However, this feature is not designed for bulk uploads or complex directory structures. The Arbitrary Data Upload Tool addresses this limitation by enabling users to submit archives containing multiple datasets.

Upload Workflow

The Arbitrary Data Upload process consists of the following steps:

  1. Users create a `.tar` or `.zip` archive containing the data directory. The archive may include nested subdirectories.
  2. The archive is uploaded to the NAN portal under the user's NAN account.
  3. A background service on the portal scans the uploaded file and identifies valid experimental datasets.
  4. Identified datasets are displayed in a staging table, which resembles the standard Data Browser. Users may fill in required metadata fields and edit existing values.
  5. Once datasets are prepared, users select which ones to finalize and upload to the NAN archive.
  6. Selected datasets are parsed and fully integrated into the archive.
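
Steps 1 and 3 above can be sketched in Python as a rough illustration. The portal's actual scanning rules are not described on this page, so the marker files below (`acqus`, found in Bruker experiment directories, and `procpar`, found in Varian/Agilent ones) are only an assumption about what might identify a dataset. The function names `make_archive`, `find_candidate_datasets`, and `demo` are hypothetical and not part of any NAN tool.

```python
import tarfile
import tempfile
from pathlib import Path, PurePosixPath

# Assumption: files whose presence suggests an NMR experiment directory.
# The portal's real detection rules are not documented here.
MARKER_FILES = {"acqus", "procpar"}


def make_archive(data_dir: Path, archive_path: Path) -> Path:
    """Pack data_dir, including nested subdirectories, into a .tar archive."""
    with tarfile.open(archive_path, "w") as tar:
        tar.add(data_dir, arcname=data_dir.name)
    return archive_path


def find_candidate_datasets(archive_path: Path) -> list[str]:
    """Return archive directories that contain at least one marker file."""
    found = set()
    with tarfile.open(archive_path, "r") as tar:
        for member in tar.getmembers():
            path = PurePosixPath(member.name)
            if member.isfile() and path.name in MARKER_FILES:
                found.add(str(path.parent))
    return sorted(found)


def demo() -> list[str]:
    """Build a tiny placeholder data tree, archive it, and scan the archive."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "nmr_data"
        for rel in ("projA/1/acqus", "projA/1/fid",
                    "projB/exp7/procpar", "notes/readme.txt"):
            target = root / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text("placeholder")
        archive = make_archive(root, Path(tmp) / "nmr_data.tar")
        return find_candidate_datasets(archive)


if __name__ == "__main__":
    print(demo())
```

Passing `arcname=data_dir.name` keeps the top-level directory name inside the archive without embedding the absolute path of the workstation it came from. The same approach would work for `.zip` archives using the standard `zipfile` module.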

Features and Safeguards

  • The staging area is cached, allowing users to pause and return later. Sessions persist across logins.
  • A full history of all arbitrary uploads is maintained per user.
  • The system detects duplicates based on unique identifiers and alerts users to prevent redundant entries.
  • Datasets that do not meet minimum completeness standards, for example those missing critical files or metadata, are rejected.
  • Only datasets from instruments registered in the NAN portal are accepted.
  • Successfully uploaded datasets are marked with a Transfer Mode of `arbitrary`.
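
The safeguards listed above could be combined into a single review pass over the staged datasets, roughly as sketched below. This is an illustration only and not the portal's implementation: the `StagedDataset` class, the `review` function, the required-file set, and the instrument name are all hypothetical, and the page does not specify the order in which the checks are applied.

```python
from dataclasses import dataclass

# Hypothetical values for illustration; the real completeness rules and
# instrument registry live in the NAN portal and are not documented here.
REQUIRED_FILES = {"acqus", "fid"}
REGISTERED_INSTRUMENTS = {"spectrometer-600"}


@dataclass
class StagedDataset:
    identifier: str          # unique identifier used for duplicate detection
    instrument: str          # instrument the dataset was acquired on
    files: set[str]          # file names found in the dataset directory
    transfer_mode: str = ""  # set once the dataset is accepted


def review(datasets, archived_ids):
    """Return (accepted datasets, list of (identifier, reason) issues)."""
    accepted, issues = [], []
    seen = set(archived_ids)
    for ds in datasets:
        if ds.identifier in seen:
            issues.append((ds.identifier, "duplicate"))
        elif ds.instrument not in REGISTERED_INSTRUMENTS:
            issues.append((ds.identifier, "unregistered instrument"))
        elif not REQUIRED_FILES <= ds.files:
            issues.append((ds.identifier, "incomplete"))
        else:
            ds.transfer_mode = "arbitrary"
            seen.add(ds.identifier)
            accepted.append(ds)
    return accepted, issues
```

Tracking accepted identifiers in `seen` means a dataset that appears twice within the same upload is flagged as a duplicate, in addition to datasets that already exist in the archive.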

This workflow ensures consistency, quality control, and traceability of data manually added to the NAN archive.