NAN Archive
From Network for Advanced NMR
Latest revision as of 19:35, 1 August 2025 by Mmaciejewski
Overview
While no system can guarantee perfect security or trust, NAN goes to great lengths to safeguard data in the NAN Archive.
The NAN Archive consists of:
- A Postgres database holding all metadata records across the NAN portal and NDTS
- Network-attached storage (NAS) for all files associated with datasets
- Disaster recovery storage containing immutable dataset backups
NAN Postgres Database
- Hosted as a virtual machine (VM) with a virtual datastore
- Replicated in near real-time to a second VM on a different physical server in a separate datacenter
  - The secondary VM uses a distinct NAS system for added resilience
  - If the primary fails, the replica can be promoted to primary within minutes
- Backed up hourly to a separate NAS system
  - In the unlikely event that both the primary and the replica fail, the database can be restored from these backups (recovery may take several hours)
- All NAS systems supporting the database and backups:
  - Are continuously monitored
  - Provide high data durability
  - Are covered by vendor hardware and software support
  - Take snapshots daily (or more often) and retain them for weeks, so VMs and data can be restored to recent points in time
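The recovery tiers above run in order: promote the replica, then fall back to the hourly backups, then to NAS snapshots. They can be sketched as a small decision function. This is a hypothetical illustration. `ClusterState`, `choose_recovery`, and the action names are invented here and are not NAN's tooling. It also assumes that the newest backup's completion time bounds the changes at risk. In PostgreSQL itself, the promotion step is performed with `pg_ctl promote` or, from version 12, the `pg_promote()` function.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ClusterState:
    """Hypothetical snapshot of database health used to pick a recovery path."""
    primary_healthy: bool
    replica_healthy: bool
    replica_lag: timedelta           # how far the replica trails the primary
    last_backup: Optional[datetime]  # completion time of the newest hourly backup


def choose_recovery(state: ClusterState, now: datetime) -> tuple[str, Optional[timedelta]]:
    """Return (action, worst-case window of lost changes); None means unknown."""
    if state.primary_healthy:
        return ("none", timedelta(0))
    if state.replica_healthy:
        # Fastest path (minutes): only changes not yet replicated are at risk.
        return ("promote_replica", state.replica_lag)
    if state.last_backup is not None:
        # Slower path (hours): changes made since the last hourly backup are at risk.
        return ("restore_from_backup", now - state.last_backup)
    # Last resort in this sketch: roll VMs back using NAS snapshots.
    return ("restore_from_snapshot", None)
```

For example, if both database VMs are lost at 12:30 and the last hourly backup finished at 12:00, the sketch returns `("restore_from_backup", timedelta(minutes=30))`. That means at most 30 minutes of metadata changes would need to be recovered by other means.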
Metadata Provenance Tracking
- Changes to dataset metadata are stored in immutable audit tables
- Complete change history is preserved to ensure traceability
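An immutable audit table records every change as a new row and never updates or deletes existing rows. This lets any past state be rebuilt by replaying the history. The sketch below models that idea in memory. The class and field names are hypothetical, and this is not NAN's schema. In PostgreSQL, such immutability is commonly enforced by revoking `UPDATE` and `DELETE` privileges on the audit table or by a trigger that rejects them. The source does not say which approach NAN uses.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class AuditEntry:
    """One metadata change; frozen, so fields cannot be reassigned after creation."""
    version: int
    dataset_id: str
    field: str
    old_value: Any
    new_value: Any
    changed_by: str
    changed_at: datetime


class MetadataAuditLog:
    """Append-only log: the API allows adding and reading entries, never editing them."""

    def __init__(self) -> None:
        self._entries: list[AuditEntry] = []

    def record(self, dataset_id: str, field: str, old: Any, new: Any, user: str) -> AuditEntry:
        entry = AuditEntry(
            version=len(self._entries) + 1,  # monotonically increasing across the log
            dataset_id=dataset_id,
            field=field,
            old_value=old,
            new_value=new,
            changed_by=user,
            changed_at=datetime.now(timezone.utc),
        )
        self._entries.append(entry)
        return entry

    def history(self, dataset_id: str) -> tuple[AuditEntry, ...]:
        # A tuple prevents callers from reordering or dropping stored entries.
        return tuple(e for e in self._entries if e.dataset_id == dataset_id)

    def state_at(self, dataset_id: str, version: int) -> dict[str, Any]:
        """Replay changes up to and including `version` to rebuild past metadata."""
        state: dict[str, Any] = {}
        for e in self.history(dataset_id):
            if e.version > version:
                break
            state[e.field] = e.new_value
        return state
```

Because each entry stores both the old and new value, along with who made the change and when, the log answers both "what did this record say at version N?" and "who changed it?".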
Data Storage
Primary Storage: Dell PowerScale (Isilon) A3000
- Four A3000 nodes, each with 400 TB raw capacity (1.6 PB total)
- Erasure coding reduces usable capacity to ~900 TB
- OneFS clustered architecture scales to 252 nodes (up to 100 PB)
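The capacity figures above follow from simple arithmetic. Four nodes at 400 TB give 1.6 PB raw. The quoted ~900 TB usable implies about 56% storage efficiency. At the 252-node maximum, 400 TB nodes would give roughly 100 PB. The sketch below reproduces these numbers. An idealized k+m erasure code is included only for comparison: the 4+2 example is hypothetical, and real OneFS overhead also depends on the configured protection policy and reserved space, which the source does not state.

```python
def raw_capacity_tb(nodes: int, tb_per_node: float) -> float:
    """Total raw capacity before any data-protection overhead."""
    return nodes * tb_per_node


def storage_efficiency(usable_tb: float, raw_tb: float) -> float:
    """Fraction of raw capacity that remains available for data."""
    return usable_tb / raw_tb


def erasure_code_efficiency(data_units: int, parity_units: int) -> float:
    """Ideal efficiency of a k+m erasure code: k of every k+m stored units hold data."""
    return data_units / (data_units + parity_units)


raw = raw_capacity_tb(4, 400)        # 1600 TB = 1.6 PB
eff = storage_efficiency(900, raw)   # 0.5625, i.e. about 56% usable
max_raw = raw_capacity_tb(252, 400)  # 100800 TB, about 100 PB at full scale
ideal_4_2 = erasure_code_efficiency(4, 2)  # hypothetical 4+2 layout, about 67%

print(f"raw={raw} TB, efficiency={eff:.1%}, max raw={max_raw / 1000:.1f} PB")
```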
Disaster Recovery Storage: HP Scality RING
- Distributed, peer-to-peer architecture spanning four datacenters in Farmington and Storrs, CT
- 14 nines (99.999999999999%) of data durability through erasure coding, replication, and self-healing
- A WORM (Write-Once-Read-Many) S3 bucket provides:
  - Protection from accidental or malicious deletion
  - Automatic lease renewal
  - No file deletions by users, administrators, or vendors
  - Resilience against ransomware encryption attempts
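The WORM guarantees above can be modeled as a bucket with three rules:
- Objects can never be overwritten.
- Objects cannot be deleted while a retention lease is active.
- Leases are renewed automatically, so they never lapse.

The in-memory sketch below is a hypothetical model of those rules, not the Scality or S3 API. `WormBucket`, `renew_leases`, and the 30-day lease are invented for illustration. In Amazon S3 terms, the closest concept is Object Lock in compliance mode, where no user, including the account root, can delete a locked object version before its retain-until date. The source does not state NAN's actual retention settings.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


class WormViolation(Exception):
    """Raised when an operation would overwrite or delete a protected object."""


@dataclass
class StoredObject:
    data: bytes
    retain_until: datetime


class WormBucket:
    """Hypothetical write-once-read-many bucket with renewable retention leases."""

    def __init__(self, lease: timedelta) -> None:
        self.lease = lease
        self._objects: dict[str, StoredObject] = {}

    def put(self, key: str, data: bytes, now: datetime) -> None:
        if key in self._objects:
            # Write-once: ransomware cannot replace an object with an encrypted copy.
            raise WormViolation(f"{key} already exists and cannot be overwritten")
        self._objects[key] = StoredObject(data, now + self.lease)

    def get(self, key: str) -> bytes:
        return self._objects[key].data

    def delete(self, key: str, now: datetime) -> None:
        obj = self._objects[key]
        if now < obj.retain_until:
            # No caller identity is checked: users, admins, and vendors are refused alike.
            raise WormViolation(f"{key} is retained until {obj.retain_until}")
        del self._objects[key]

    def renew_leases(self, now: datetime) -> None:
        """Extend every object's retention; run periodically so leases never lapse."""
        for obj in self._objects.values():
            obj.retain_until = max(obj.retain_until, now + self.lease)
```

As long as `renew_leases` runs more often than the lease length, `delete` always raises. This is how automatic renewal turns a time-limited lock into effectively permanent retention.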
Landing Zone: Qumulo NAS
- Data from NDTS Gateways arrives at the Landing Zone
- Hosted on Qumulo NAS and replicated in real-time to a second NAS
- Data is deleted from the Landing Zone only after its transfer to both primary and disaster recovery storage has been verified
- At least two independent copies exist from the moment data arrives
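The "delete only after verified transfer" rule can be sketched as a checksum comparison that gates deletion. This is a minimal sketch under stated assumptions. Both destination copies are treated as local file paths, whereas in practice the disaster recovery copy lives in an S3 bucket. SHA-256 is assumed as the checksum, and the function names are hypothetical. The source does not specify how NDTS verifies transfers.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def safe_to_purge(landing_file: Path, primary_copy: Path, dr_copy: Path) -> bool:
    """True only if both downstream copies exist and match the landing-zone file."""
    if not (primary_copy.is_file() and dr_copy.is_file()):
        return False
    expected = sha256_of(landing_file)
    return sha256_of(primary_copy) == expected and sha256_of(dr_copy) == expected


def purge_if_verified(landing_file: Path, primary_copy: Path, dr_copy: Path) -> bool:
    """Delete the landing-zone copy only after both transfers are verified."""
    if safe_to_purge(landing_file, primary_copy, dr_copy):
        landing_file.unlink()
        return True
    return False  # keep the landing copy; at least two copies still exist
```

A missing or truncated destination copy makes `purge_if_verified` return `False` and leaves the Landing Zone file in place. A failed transfer therefore never reduces the number of copies.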
Access Control & Monitoring
- Access is restricted using role-based Active Directory (AD) groups
- All logins require key-based SSH
- All code changes are Git-versioned with rollback support
- CrowdStrike Falcon agent monitors systems for suspicious activity
- Server logs are centrally aggregated and retained for audit
- These measures ensure operational integrity and compliance with security protocols
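Role-based access through AD groups means that permissions attach to groups rather than to individual users. A user's rights are the union of the rights of the groups they belong to, and anything not granted is denied. The sketch below illustrates that model. The group names, the `Action` values, and the mapping are all hypothetical. In a real deployment, group membership would come from Active Directory, for example via an LDAP lookup, which is not modeled here.

```python
from enum import Enum


class Action(Enum):
    READ_METADATA = "read_metadata"
    WRITE_METADATA = "write_metadata"
    MANAGE_STORAGE = "manage_storage"


# Hypothetical AD group names mapped to the actions each role grants.
ROLE_PERMISSIONS: dict[str, frozenset[Action]] = {
    "nan-archive-readers": frozenset({Action.READ_METADATA}),
    "nan-archive-curators": frozenset({Action.READ_METADATA, Action.WRITE_METADATA}),
    "nan-archive-admins": frozenset(Action),  # every defined action
}


def allowed_actions(user_groups: set[str]) -> frozenset[Action]:
    """Union of the permissions of every recognized group the user belongs to."""
    granted: frozenset[Action] = frozenset()
    for group in user_groups:
        granted |= ROLE_PERMISSIONS.get(group, frozenset())
    return granted


def authorize(user_groups: set[str], action: Action) -> bool:
    # Deny by default: unrecognized groups contribute no permissions.
    return action in allowed_actions(user_groups)
```

Keeping the role-to-permission mapping in one place, and granting access only through group membership, means that removing a user from an AD group revokes their access everywhere the mapping is checked.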
NSF Trusted CI Center of Excellence
- NAN participates in Trusted CI, the NSF Cybersecurity Center of Excellence
- Undergoes periodic third-party reviews
- Aligns policies with research cyberinfrastructure best practices