NAN Archive
This page is currently under development; please excuse any issues.
Overview
While no system can guarantee perfect security or trust, NAN goes to great lengths to safeguard data in the NAN Archive.
The NAN Archive is composed of:
- A Postgres database, which holds all metadata records across the entire NAN portal and NDTS
- Network-attached storage, which holds all files associated with datasets
- Disaster recovery storage, which holds an immutable copy of all datasets
NAN Postgres database
- The Postgres database is hosted as a virtual machine with a virtual datastore.
- The Postgres database is replicated in near real time to a second virtual machine hosted on a different physical server in a physically separate datacenter. The replica's datastore is also virtual and is stored on a separate network-attached storage system from the primary database's.
- In the unlikely event of a failure of the primary Postgres database, the replica can be promoted to primary in a few minutes.
- The Postgres database is backed up hourly to a separate network-attached storage system.
- In the very unlikely event that both the primary database and the replica fail, the entire NAN database can be rebuilt from the backups, but recovery would take hours to complete.
- Multiple network-attached storage systems are used for the VM datastores and for the database backups. These systems are monitored continuously, provide high levels of data durability to prevent data loss from disk or node failures, and are under vendor hardware and software support. They also take snapshots at least daily, retained for weeks, to recover from any data loss that may occur. These snapshots can likewise be used to recover the virtual machine that hosts the Postgres database, providing two additional backups of the database current to within the snapshot time frame.
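The replication and failover behavior above can be observed with standard PostgreSQL system views. A minimal sketch, assuming PostgreSQL 10+ for the lag views and 12+ for `pg_promote()`; the `lag_status` helper and its thresholds are illustrative, not NAN's actual tooling:

```python
# Run on the primary: per-replica streaming replication lag (PostgreSQL 10+).
PRIMARY_LAG_SQL = """
SELECT application_name, state,
       EXTRACT(EPOCH FROM replay_lag) AS replay_lag_s
FROM pg_stat_replication;
"""

# Run on the replica: seconds since the last replayed transaction.
REPLICA_LAG_SQL = """
SELECT EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp());
"""

# Promoting the replica to primary (PostgreSQL 12+) is a single call:
#   SELECT pg_promote();
# which is what makes the "few minutes" failover described above feasible.

def lag_status(lag_seconds, warn=5.0, crit=60.0):
    """Classify replication lag in seconds; thresholds are illustrative."""
    if lag_seconds is None:          # no replica connected / no data yet
        return "UNKNOWN"
    if lag_seconds >= crit:
        return "CRIT"
    return "WARN" if lag_seconds >= warn else "OK"
```

In this pattern a monitoring job runs the lag query on a schedule and alerts when `lag_status` leaves `OK`, so a stale replica is caught long before a failover is needed.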
Data Storage
NAN Primary Storage (Dell PowerScale (Isilon) A3000)
- NAN utilizes four A3000 nodes, each with 400 TB of raw capacity, for a total raw capacity of 1.6 PB
- The system utilizes erasure coding for data protection, reducing the total usable capacity to 900 TB
- The A3000 uses a distributed, fully symmetric clustered architecture with OneFS and can scale to 252 nodes (100 PB)
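The usable-to-raw ratio quoted above follows directly from the erasure-coding layout. A quick sanity check of the arithmetic (the specific k+m layout shown is hypothetical; this page does not state the A3000's actual OneFS protection level):

```python
def ec_efficiency(k: int, m: int) -> float:
    """Usable fraction of raw capacity for k data chunks + m parity chunks."""
    return k / (k + m)

raw_tb = 1600        # 1.6 PB total raw capacity across the four nodes
usable_tb = 900      # usable capacity quoted above
observed = usable_tb / raw_tb   # 0.5625, i.e. ~56% of raw capacity is usable

# For comparison only: a hypothetical 9+7 code yields exactly this ratio,
# since 9 / (9 + 7) = 0.5625.
```

The remaining ~44% of raw capacity is the price of parity: any m chunks of a stripe can be lost without losing data.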
NAN Disaster Recovery Storage (HP Scality RING)
- Scality RING is a software platform that uses a peer-to-peer architecture, distributing data and metadata across multiple nodes and datacenters to ensure high availability and eliminate single points of failure.
- NAN utilizes the UConn HPC facility's system, which is GeoSpread across four separate datacenters: two in Farmington, CT and two in Storrs, CT
- The system provides fourteen nines (99.999999999999%) of data durability through erasure coding, replication, and self-healing capabilities
- NAN utilizes a WORM (Write Once, Read Many) S3 bucket for disaster recovery. The WORM capability protects all data from accidental or malicious deletion: every file carries a WORM lease that auto-renews quarterly, and data cannot be removed by users, system administrators, or the hardware vendor performing maintenance. WORM also protects against ransomware attacks, since data cannot be modified, preventing a malicious actor from encrypting the files.
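The WORM lease described above maps onto S3 Object Lock retention in compliance mode, which Scality RING exposes through its S3-compatible API. A minimal sketch of what such a write could look like; the bucket and key names are hypothetical, the 92-day lease merely approximates a quarter, and this is not NAN's actual ingest code:

```python
from datetime import datetime, timedelta, timezone

def worm_put_params(bucket: str, key: str, body: bytes, lease_days: int = 92):
    """Build parameters for an S3 put_object call with a compliance-mode
    Object Lock retention, i.e. a WORM lease nobody can shorten or remove."""
    retain_until = datetime.now(timezone.utc) + timedelta(days=lease_days)
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        "ObjectLockMode": "COMPLIANCE",            # immutable even to admins
        "ObjectLockRetainUntilDate": retain_until,
    }

# Upload:   boto3.client("s3").put_object(**worm_put_params(...))
# Renewal:  before a lease expires, a scheduled job extends it, e.g.
#   s3.put_object_retention(Bucket=..., Key=..., VersionId=...,
#       Retention={"Mode": "COMPLIANCE", "RetainUntilDate": <new date>})
params = worm_put_params("nan-dr-example", "datasets/example.tar", b"...")
```

Letting the scheduled renewal skip a flagged object is what eventually allows a purge: once the last lease expires, the object becomes deletable again.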
NAN's layered security strategy combines technical safeguards with tightly controlled administrative workflows. Access to NAN components is limited through role-based Active Directory groups. Key-based SSH is required for all logins. Code changes are version-controlled with Git and can be rolled back if needed. Systems are continuously monitored for suspicious activity using the CrowdStrike Falcon agent. Server logs are centrally aggregated and retained for auditing. As an NSF Trusted CI Center of Excellence, NAN undergoes periodic third-party reviews and aligns its policies with research cyberinfrastructure best practices.
All datasets are stored on a fault-tolerant Dell PowerScale storage appliance with active monitoring. Daily snapshots enable recovery from accidental deletion. A disaster recovery copy of all datasets is written to a WORM (Write Once, Read Many) S3 bucket on a Scality RING object storage cluster. Each object receives a renewable quarterly retention lease, during which it cannot be modified or deleted, even by administrators. Leases renew automatically unless a dataset is flagged to be purged. Object versioning maintains a complete history of dataset changes.
The PostgreSQL database replicates in near real time to a secondary datacenter for high availability. Hourly backups support full recovery if needed. Critical services are distributed across two restricted-access datacenters to eliminate single points of failure. Changes to dataset metadata are captured in immutable audit tables in the NAN database, preserving a complete change history.
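The immutable audit tables follow the standard trigger-based pattern: every metadata change appends the old row to an audit table, and further triggers reject any update or delete of audit rows. A minimal sketch using SQLite for portability; NAN's real schema is PostgreSQL, and the table and column names here are hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dataset (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE dataset_audit (
    id INTEGER, old_title TEXT, changed_at TEXT DEFAULT CURRENT_TIMESTAMP
);
-- Capture every metadata change as an appended audit row.
CREATE TRIGGER dataset_update AFTER UPDATE ON dataset
BEGIN
    INSERT INTO dataset_audit (id, old_title) VALUES (OLD.id, OLD.title);
END;
-- Make the audit table immutable: reject UPDATE and DELETE outright.
CREATE TRIGGER audit_no_update BEFORE UPDATE ON dataset_audit
BEGIN SELECT RAISE(ABORT, 'audit rows are immutable'); END;
CREATE TRIGGER audit_no_delete BEFORE DELETE ON dataset_audit
BEGIN SELECT RAISE(ABORT, 'audit rows are immutable'); END;
""")

con.execute("INSERT INTO dataset VALUES (1, 'Sample A')")
con.execute("UPDATE dataset SET title = 'Sample A (rev 2)' WHERE id = 1")
history = con.execute("SELECT old_title FROM dataset_audit").fetchall()
# history == [('Sample A',)] -- the pre-change title is preserved
```

Because the audit table only ever grows, it preserves the complete change history even if the live metadata row is later edited many times.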