NAN Archive

From Network for Advanced NMR
Revision as of 19:29, 1 August 2025 by Mmaciejewski (talk | contribs)

This page is currently under development - please excuse any issues

Overview

While no system can guarantee perfect security or trust, NAN goes to great lengths to safeguard data in the NAN Archive.

The NAN archive is composed of:

  • A Postgres database, which holds all metadata records across the entire NAN portal and NDTS
  • Network attached storage, which holds all files associated with datasets
  • Disaster recovery storage, which holds an immutable copy of all datasets

NAN Postgres database

  • The Postgres database is hosted as a virtual machine with a virtual datastore
  • The Postgres database is replicated in near real-time to a second virtual machine hosted on a different physical server in a physically separate datacenter. The replica's datastore is also virtual and is stored on a separate network attached storage system from the primary database's.
    • In the unlikely event of a failure of the primary Postgres database, the replica can be promoted to primary in a few minutes.
  • The Postgres database is backed up hourly to a separate network attached storage system
    • In the very unlikely event that both the primary database and the replica fail, the entire NAN database may be rebuilt from the backups, though recovery would take hours to complete.
  • Multiple network attached storage systems are used for the VM datastores and for the database backups. These systems are monitored continuously, provide high levels of data durability to prevent data loss due to disk or node failures, and are under vendor hardware and software support. In addition, they all take snapshots at least daily, with snapshots retained for weeks, to recover from any data loss that might occur. The snapshots can also be used to recover the virtual machine that hosts the Postgres database, providing two additional backups of the database current to within the snapshot interval.
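The hourly-backup and daily-snapshot scheme above amounts to a tiered retention policy. A minimal sketch, assuming hypothetical retention windows (every hourly backup kept for two days, one copy per day kept for four weeks; the actual NAN windows and tooling are not specified here):

```python
from datetime import datetime, timedelta

def backups_to_retain(backup_times, now, hourly_days=2, daily_weeks=4):
    """Decide which backup timestamps to keep: every hourly backup
    for `hourly_days` days, then the newest copy per calendar day
    for `daily_weeks` weeks. Window lengths are illustrative."""
    keep = set()
    days_covered = set()
    for t in sorted(backup_times, reverse=True):  # newest first
        age = now - t
        if age <= timedelta(days=hourly_days):
            keep.add(t)                # recent: keep all hourly copies
        elif age <= timedelta(weeks=daily_weeks) and t.date() not in days_covered:
            keep.add(t)                # older: keep the newest copy per day
            days_covered.add(t.date())
    return keep
```

Pruning everything outside the returned set bounds the backup footprint while keeping a fine-grained recent history and a coarse long-term history.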

Metadata Provenance Tracking

  • Changes to dataset metadata are captured in immutable audit tables in the NAN database, preserving a complete change history.
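The audit-table pattern can be sketched with triggers that record every change and reject edits to the history itself. This is a hypothetical illustration using SQLite for self-containment (the NAN database is Postgres), and the table and column names are invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dataset (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE dataset_audit (
    audit_id   INTEGER PRIMARY KEY,
    dataset_id INTEGER,
    old_title  TEXT,
    new_title  TEXT,
    changed_at TEXT DEFAULT CURRENT_TIMESTAMP
);
-- Capture every metadata change...
CREATE TRIGGER dataset_update AFTER UPDATE ON dataset
BEGIN
    INSERT INTO dataset_audit (dataset_id, old_title, new_title)
    VALUES (OLD.id, OLD.title, NEW.title);
END;
-- ...and make the history itself immutable.
CREATE TRIGGER audit_no_update BEFORE UPDATE ON dataset_audit
BEGIN
    SELECT RAISE(ABORT, 'audit rows are immutable');
END;
CREATE TRIGGER audit_no_delete BEFORE DELETE ON dataset_audit
BEGIN
    SELECT RAISE(ABORT, 'audit rows are immutable');
END;
""")
con.execute("INSERT INTO dataset VALUES (1, 'HSQC run 1')")
con.execute("UPDATE dataset SET title = 'HSQC run 1 (reprocessed)' WHERE id = 1")
history = con.execute("SELECT old_title, new_title FROM dataset_audit").fetchall()
```

In Postgres the same shape is typically achieved with a trigger function plus revoked UPDATE/DELETE privileges on the audit table.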

Data Storage

NAN Primary Storage (Dell PowerScale (Isilon) A3000)

  • NAN utilizes four A3000 nodes, each with 400 TB raw capacity, for a total raw capacity of 1.6 PB
  • The system utilizes erasure coding for data protection, reducing the total usable capacity to 900 TB
  • The A3000 uses a distributed, fully symmetric clustered architecture with OneFS and can scale to 252 nodes (100 PB)
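The capacity figures check out as follows (decimal units assumed; the per-node figure is inferred from the stated 1.6 PB total across four nodes):

```python
nodes = 4
raw_per_node_tb = 400                   # implied by 1.6 PB total / 4 nodes
raw_total_pb = nodes * raw_per_node_tb / 1000
usable_tb = 900                         # stated usable capacity after protection
# Fraction of raw capacity consumed by erasure-coding protection
protection_overhead = 1 - usable_tb / (nodes * raw_per_node_tb)
```

So roughly 44% of the raw capacity is spent on erasure-coding redundancy, a typical trade for surviving multiple simultaneous disk or node failures.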

NAN Disaster Recovery Storage (HP Scality RING)

  • Scality RING is a software platform that uses a peer-to-peer architecture, distributing data and metadata across multiple nodes and datacenters to ensure high availability and eliminate single points of failure.
    • NAN utilizes the UConn HPC facility's system, which is GeoSpread across four separate datacenters: two in Farmington, CT and two in Storrs, CT
    • The system provides 14 nines (99.999999999999%) of data durability through erasure coding, replication, and self-healing capability
  • NAN utilizes a WORM (Write-Once-Read-Many) S3 bucket for disaster recovery. The WORM capability protects all data from accidental or malicious deletion: every file has a WORM lease that auto-renews quarterly, and data cannot be removed by users, by system administrators, or by the hardware vendor performing maintenance. WORM also protects against ransomware attacks, since data cannot be modified, preventing a malicious actor from encrypting the files.
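To give the 14-nines durability figure a concrete sense of scale, here is a back-of-the-envelope calculation with a hypothetical object count (the archive's actual object count is not stated here):

```python
import math

nines = 14                        # "14 nines" = 99.999999999999% durability
p_annual_loss = 10.0 ** -nines    # implied per-object annual loss probability
objects = 10_000_000              # hypothetical number of archived objects
expected_losses_per_year = objects * p_annual_loss
```

Even with ten million stored objects, the expected number lost per year is about 1e-7, i.e. on the order of one object per ten million years.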

Landing Zone (Qumulo)

  • NDTS transfers data from Gateway computers to the NAN receiver's landing zone. The landing zone is a disk partition hosted on a Qumulo NAS and is replicated to a second Qumulo NAS in real time. No data is removed from the landing zone until copies of the data have been verified on both the NAN primary and disaster recovery storage. Thus, from the moment data arrives in the datacenter, there are at all times at least two independent copies of the data.
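The verify-before-delete rule can be sketched as a checksum comparison across the three copies. This is a minimal illustration, assuming SHA-256 verification and hypothetical function names (NDTS's actual verification mechanism is not specified here):

```python
import hashlib
import os

def sha256(path):
    """Stream a file's SHA-256 (landing-zone files can be large)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def release_from_landing_zone(landing, primary, dr):
    """Delete the landing-zone copy only after the primary and
    disaster-recovery copies both verify against it."""
    ref = sha256(landing)
    if sha256(primary) == ref and sha256(dr) == ref:
        os.remove(landing)          # safe: two verified copies remain
        return True
    return False                    # keep the landing-zone copy
```

Because deletion happens only after both downstream checksums match, at least two verified, independent copies exist at every point in the pipeline.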

NSF Trusted CI Center of Excellence

  • As an NSF Trusted CI Center of Excellence, NAN undergoes periodic third-party reviews and aligns its policies with research cyberinfrastructure best practices.


Security and Access Control

Access to NAN components is limited through role-based Active Directory groups. Key-based SSH is required for all logins. Code changes are version-controlled with Git and can be rolled back if needed. Systems are continuously monitored for suspicious activity using the CrowdStrike Falcon agent. Server logs are centrally aggregated and retained for auditing.