NAN Data Transport System

From Network for Advanced NMR
Latest revision as of 15:30, 25 June 2025


Overview

The Network for Advanced NMR Data Transport System (NDTS) enables automated harvesting of NMR acquisition data from spectrometer workstations and delivers it securely to the NAN Repository. Facility Managers are responsible for installing and managing the local components of the system, ensuring connectivity, and supporting user access to collected data.

NDTS Components

NDTS consists of local and central components that work together to collect, transfer, store, and index NMR datasets.

Component | Location | Role
Daemon | Spectrometer Workstation
  • Detects completed experiments, associates metadata, and sends data to the Gateway.
  • Sends heartbeat information to the Gateway.
  • Pulls user information from the Gateway.
NDTS GUI | Spectrometer Workstation
  • Shows data-harvesting status.
  • Allows the operator to change the NAN user and enter additional metadata.
Gateway | Within NMR facility network
  • Receives data from all Daemons and relays it to the Receiver.
  • Pulls user information from the NAN Receiver.
  • Should be a dedicated computer, but has minimal CPU, memory, and storage requirements. The exception is a Gateway set up to archive all datasets as a backup: it then needs enough storage to hold that data, ideally on a mounted external storage device.
Receiver | UCHC Data Center
  • Accepts experiment data and metadata from Gateways.
  • Packages user information destined for the spectrometer workstations.
Parser | UCHC Data Center
  • Parses datasets to extract additional metadata, writes database entries, and stores dataset files on the primary and disaster recovery storage appliances.
PostgreSQL Database | UCHC Data Center
  • Stores datasets and their structured metadata.
Primary Storage | UCHC Data Center
  • Stores copies of all collected experimental data.
Disaster Recovery Storage | Geo-dispersed
  • Maintains redundant backups of all experimental data in a WORM (write once, read many) S3 bucket.
Elasticsearch Database | UCHC Data Center
  • Indexes statistics about harvested datasets and heartbeat information for visualization in the virtual NAN Operation Center (vNOC).
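The table does not say how the Daemon decides that an experiment is complete. One common approach for file-based harvesting is to wait until no file in the experiment directory has changed for a settling period. The sketch below shows only that heuristic and is hypothetical. The function names and the 300-second default are assumptions, not NDTS settings, and the real Daemon may instead rely on vendor-specific completion markers.

```python
import time
from pathlib import Path
from typing import Optional


def newest_mtime(experiment_dir: Path) -> Optional[float]:
    """Return the latest modification time of any file under experiment_dir, or None if empty."""
    mtimes = [p.stat().st_mtime for p in experiment_dir.rglob("*") if p.is_file()]
    return max(mtimes) if mtimes else None


def looks_complete(experiment_dir: Path, settle_seconds: float = 300.0,
                   now: Optional[float] = None) -> bool:
    """Hypothetical heuristic: treat an experiment as complete once no file
    has changed for at least settle_seconds."""
    latest = newest_mtime(experiment_dir)
    if latest is None:
        return False  # nothing has been written yet, so there is nothing to harvest
    now = time.time() if now is None else now
    return now - latest >= settle_seconds
```

A settling window trades latency for safety. A longer window delays harvesting, but it lowers the risk of sending a dataset that the spectrometer software is still writing.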

Data Flow Summary

  1. A user completes an acquisition on a spectrometer.
  2. The Daemon detects the completed experiment and sends it to the Gateway.
  3. The Gateway transmits the data to the Receiver at UCHC.
  4. The Receiver accepts the data and hands it off to the Parser.
  5. The Parser extracts metadata and stores it in the PostgreSQL and Elasticsearch databases.
  6. The experiment data is stored in primary storage and backed up to disaster recovery storage.
  7. The data becomes visible in the NAN Portal (e.g., Data Browser, vNOC) within seconds.
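The numbered steps describe a strictly ordered pipeline. As a hypothetical illustration, each dataset can be modeled as moving through a fixed sequence of stages that are never skipped. The stage names below are assumptions made for this sketch and are not NDTS identifiers.

```python
from enum import Enum
from typing import List


class Stage(Enum):
    """Hypothetical lifecycle stages mirroring the numbered data flow steps."""
    ACQUIRED = 1    # step 1: acquisition finished on the spectrometer
    AT_GATEWAY = 2  # step 2: Daemon delivered the dataset to the Gateway
    RECEIVED = 3    # steps 3-4: Receiver accepted it and handed it to the Parser
    PARSED = 4      # step 5: metadata written to PostgreSQL and Elasticsearch
    STORED = 5      # step 6: files on primary storage, backed up to disaster recovery
    VISIBLE = 6     # step 7: dataset appears in the NAN Portal


def advance(current: Stage) -> Stage:
    """Move a dataset to the next stage, one step at a time."""
    if current is Stage.VISIBLE:
        raise ValueError("dataset is already visible; there is no later stage")
    return Stage(current.value + 1)


def full_lifecycle() -> List[Stage]:
    """Walk a dataset from acquisition to visibility."""
    stages = [Stage.ACQUIRED]
    while stages[-1] is not Stage.VISIBLE:
        stages.append(advance(stages[-1]))
    return stages
```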

Failures at any stage result in data being spooled locally and retried automatically.
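Here is a minimal sketch of spool-and-retry, assuming an in-memory queue and exponential backoff. This page does not document the real NDTS spool format, retry intervals, or error handling, so `Spool`, `SpoolEntry`, and the delay values below are hypothetical. The `send` callable stands in for whatever transfer the Daemon or Gateway actually performs.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List


@dataclass
class SpoolEntry:
    dataset_id: str
    attempts: int = 0
    next_try: float = 0.0  # earliest time (seconds) this entry may be retried


class Spool:
    """Hypothetical local spool: failed sends stay queued and are retried later."""

    def __init__(self, send: Callable[[str], None],
                 base_delay: float = 30.0, max_delay: float = 3600.0) -> None:
        self.send = send
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.queue: Deque[SpoolEntry] = deque()

    def add(self, dataset_id: str) -> None:
        self.queue.append(SpoolEntry(dataset_id))

    def flush(self, now: float) -> List[str]:
        """Try every due entry once; return the IDs delivered successfully."""
        delivered: List[str] = []
        for _ in range(len(self.queue)):
            entry = self.queue.popleft()
            if entry.next_try > now:
                self.queue.append(entry)  # not due yet; keep it spooled
                continue
            try:
                self.send(entry.dataset_id)
                delivered.append(entry.dataset_id)
            except OSError:  # includes ConnectionError and timeouts
                entry.attempts += 1
                # Exponential backoff: 30 s, 60 s, 120 s, ... capped at max_delay.
                delay = min(self.base_delay * 2 ** (entry.attempts - 1), self.max_delay)
                entry.next_try = now + delay
                self.queue.append(entry)
        return delivered
```

Backoff keeps a Daemon or Gateway from repeatedly hammering a peer that is down. Because a failed entry is never dropped, it is delivered once the peer recovers.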

Facility Manager Responsibilities

Facility Managers are expected to:

  • Purchase the Gateway computer and install a current Linux distribution (preferably Ubuntu, Xubuntu, Mint, or another Debian-based OS)
  • Install and configure Gateway and Daemon software
  • Manage facility users through the Facility Dashboard
  • Reassign “unselected” or misattributed data through the Dataset Browser
  • Monitor the health of NDTS for their facility, including heartbeats, through the virtual NAN Operating Center (vNOC)
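Heartbeat monitoring comes down to noticing when a Daemon has gone quiet. The vNOC's actual logic and thresholds are not described here. The function below is a hypothetical sketch that flags any Daemon whose last heartbeat is older than an assumed 600-second limit, given a mapping from Daemon ID to last-seen timestamp.

```python
from typing import Dict, List


def stale_daemons(last_seen: Dict[str, float], now: float,
                  max_age: float = 600.0) -> List[str]:
    """Return the IDs of Daemons whose most recent heartbeat is older than
    max_age seconds, sorted for stable display."""
    return sorted(daemon_id for daemon_id, ts in last_seen.items()
                  if now - ts > max_age)
```

For example, `stale_daemons({"ws-600": 0.0, "ws-800": 950.0}, now=1000.0)` returns `["ws-600"]`. The workstation IDs are made up.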

Security

  • Out-of-date operating systems on spectrometer workstations may lack modern encryption. To mitigate this risk, NDTS employs a dedicated Gateway computer between the workstations and the NDTS Receiver. The Gateway runs a current Linux distribution, and users are expected to apply security updates promptly.
  • Transfers from a workstation to the Gateway use an unencrypted channel. Because the Gateway resides on the same internal network as the workstations, encryption on this local hop is generally unnecessary.
  • All outbound communication originates from the Gateway; NAN datacenter services never initiate connections to facility Gateways. Transfers from the Gateway to the Receiver are fully encrypted, and mutual TLS certificates ensure the Gateway is connected to the correct Receiver. Checksums protect every transmission, and any failed transfer (workstation-to-Gateway or Gateway-to-Receiver) is queued locally for automatic retry.
  • Upon arrival at the Receiver, each dataset is replicated across two independent storage systems. After ingestion, the data is stored redundantly in two additional locations, each offering high durability.
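The page states that checksums protect every transmission but does not name the algorithm. For illustration only, the sketch below assumes SHA-256 from Python's standard `hashlib`. The sender computes a digest, the receiving side recomputes it over the bytes it got, and a mismatch counts as a failed transfer. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large datasets need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_transfer(path: Path, expected_hex: str) -> bool:
    """Return True if the received file matches the sender's digest.
    A False result would mean the transfer is re-queued for retry."""
    return sha256_of(path) == expected_hex.lower()
```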