NAN Archive

From Network for Advanced NMR
Revision as of 18:45, 1 August 2025 by Mmaciejewski

While no system can guarantee perfect security or trust, NAN goes to great lengths to safeguard data in the NAN Archive.

The NAN Archive is composed of:

  • A Postgres database, which holds all metadata records across the entire NAN portal and NDTS
  • Network attached storage, which holds all files associated with datasets
  • Disaster recovery storage, which holds an immutable copy of all datasets

Postgres database

  • The Postgres database is hosted as a virtual machine with a virtual datastore.
  • The Postgres database is replicated in near real-time to a second virtual machine hosted on a different physical server in a physically separate datacenter. The replica's datastore is also virtual and is stored on a different network attached storage system from the one used by the primary database.
    • In the unlikely event of a failure of the primary Postgres database, the replica database can be promoted to primary in a few minutes.
  • The Postgres database is backed up hourly to a separate network attached storage system.
    • In the very unlikely event that both the primary database and the replica fail, the entire NAN database can be rebuilt from the backups, but recovery would take hours to complete.
  • Multiple network attached storage systems are used for the virtual machine datastores and for the database backups. These systems are monitored continuously, have high levels of data durability to prevent data loss from disk or node failures, and are under vendor hardware and software support. In addition, they all take snapshots at least daily and keep them for weeks, so that any data loss that does occur can be recovered. The snapshots can also be used to recover the virtual machine that hosts the Postgres database, which provides two additional backups of the database, each current to within the snapshot interval.
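
The recovery layers described above can be read as an ordered fallback: promote the replica, then restore from the hourly backup, then recover the virtual machine from a storage snapshot. The following Python sketch illustrates that ordering only; it is not NAN's actual tooling. The names `RecoveryTier` and `choose_recovery` are hypothetical, and the data-loss windows are rough upper bounds derived from the "hourly" and "at least daily" figures in the text.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class RecoveryTier:
    name: str
    # Approximate worst-case window of lost writes if this tier is used.
    # None means the source does not quantify it.
    max_data_loss: Optional[timedelta]


# Ordered from least to most disruptive. "Near real-time" replication is
# not quantified in the text, so the replica's window is left as None.
REPLICA = RecoveryTier("promote replica", None)
BACKUP = RecoveryTier("restore from hourly backup", timedelta(hours=1))
SNAPSHOT = RecoveryTier("restore VM from storage snapshot", timedelta(days=1))


def choose_recovery(replica_ok: bool, backup_ok: bool, snapshot_ok: bool) -> RecoveryTier:
    """Return the first usable recovery tier after a primary failure."""
    candidates = ((replica_ok, REPLICA), (backup_ok, BACKUP), (snapshot_ok, SNAPSHOT))
    for usable, tier in candidates:
        if usable:
            return tier
    raise RuntimeError("no recovery source available")
```

For example, `choose_recovery(False, True, True)` returns the hourly-backup tier, which matches the case in which both the primary and the replica have failed.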


NAN's layered security strategy combines technical safeguards with tightly controlled administrative workflows. Access to NAN components is limited through role-based Active Directory groups. Key-based SSH is required for all logins. Code changes are version-controlled with Git and can be rolled back if needed. Systems are continuously monitored for suspicious activity using the CrowdStrike Falcon agent. Server logs are centrally aggregated and retained for auditing. As an NSF Trusted CI Center of Excellence, NAN undergoes periodic third-party reviews and aligns its policies with research cyberinfrastructure best practices.

All datasets are stored on a fault-tolerant Dell PowerEdge storage appliance with active monitoring. Daily snapshots enable recovery from accidental deletion. A disaster recovery copy of all datasets is written to a WORM (Write Once Read Many) S3 bucket on a Scality RING object storage cluster. Each object receives a renewable quarterly retention lease, during which it cannot be modified or deleted, even by administrators. Leases renew automatically unless a dataset is flagged to be purged. Object versioning maintains a complete history of dataset changes.
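
The renewable lease can be pictured as a small rule applied on each renewal pass. The Python sketch below models only that rule and makes no calls to the storage system. The function names are hypothetical, and the 90-day lease length is an assumption standing in for "quarterly", since the exact period is not stated here.

```python
from datetime import date, timedelta

# Assumption: a "quarterly" lease is modelled as 90 days.
LEASE_LENGTH = timedelta(days=90)


def renewed_retain_until(retain_until: date, today: date, flagged_for_purge: bool) -> date:
    """Return the retention date an object should carry after a renewal pass."""
    if flagged_for_purge:
        # Flagged datasets are not renewed; the current lease simply runs out.
        return retain_until
    # A retention date is only ever extended, never shortened.
    return max(retain_until, today + LEASE_LENGTH)


def is_deletable(retain_until: date, today: date) -> bool:
    """An object may be removed only once its lease has expired."""
    return today > retain_until
```

Under this model, a dataset flagged for purge still cannot be deleted immediately: it becomes deletable only after its current lease expires.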

The PostgreSQL database replicates in real time to a secondary datacenter for high availability. Daily backups support full recovery if needed. Critical services are distributed across two restricted-access datacenters to eliminate single points of failure. Changes to dataset metadata are captured in immutable audit tables in the NAN database, preserving a complete change history.
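
One common way to build an append-only audit table is with database triggers. The sketch below shows the idea using Python's built-in `sqlite3` module as a stand-in for PostgreSQL, so that it runs without a server. It is not NAN's schema: the table, column, and trigger names are hypothetical, and a PostgreSQL deployment would use its own trigger functions and privileges instead.

```python
import sqlite3

SCHEMA = """
CREATE TABLE dataset_metadata (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE dataset_metadata_audit (
    audit_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    dataset_id INTEGER NOT NULL,
    old_title  TEXT,
    new_title  TEXT,
    changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Record every metadata change as a new audit row.
CREATE TRIGGER log_title_change AFTER UPDATE ON dataset_metadata
BEGIN
    INSERT INTO dataset_metadata_audit (dataset_id, old_title, new_title)
    VALUES (OLD.id, OLD.title, NEW.title);
END;
-- Reject any attempt to rewrite or remove audit rows.
CREATE TRIGGER audit_no_update BEFORE UPDATE ON dataset_metadata_audit
BEGIN
    SELECT RAISE(ABORT, 'audit rows are immutable');
END;
CREATE TRIGGER audit_no_delete BEFORE DELETE ON dataset_metadata_audit
BEGIN
    SELECT RAISE(ABORT, 'audit rows are immutable');
END;
"""


def demo() -> list:
    """Change one record, try to erase the history, and return the audit rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO dataset_metadata (id, title) VALUES (1, 'draft')")
    conn.execute("UPDATE dataset_metadata SET title = 'final' WHERE id = 1")
    try:
        conn.execute("DELETE FROM dataset_metadata_audit")
    except sqlite3.DatabaseError:
        pass  # the trigger blocked the delete, so the history is intact
    return conn.execute(
        "SELECT dataset_id, old_title, new_title FROM dataset_metadata_audit"
    ).fetchall()
```

Calling `demo()` returns the single audit row for the title change, which shows that the attempted delete left the history in place.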