NAN Archive
From Network for Advanced NMR
Latest revision as of 19:35, 1 August 2025
Overview
While no system can guarantee perfect security or trust, NAN goes to great lengths to safeguard data in the NAN Archive.
The NAN Archive consists of:
- A Postgres database holding all metadata records across the NAN portal and NDTS
- Network-attached storage (NAS) for all files associated with datasets
- Disaster recovery storage containing immutable dataset backups
NAN Postgres Database
- Hosted as a virtual machine (VM) with a virtual datastore
- Replicated in near real time to a second VM on a different physical server in a separate datacenter
  - The secondary VM uses a distinct NAS system for added resilience
  - In case of failure, the replica can be promoted to primary within minutes
- Hourly backups to a separate NAS system
  - In the unlikely event both the primary and replica fail, the system can be restored from backups (recovery may take several hours)
- All NAS systems supporting the database and backups:
  - Are continuously monitored
  - Feature high data durability
  - Are under vendor hardware/software support
  - Employ daily (or more frequent) snapshots retained for weeks, allowing recovery of VMs and recent states
Metadata Provenance Tracking
- Changes to dataset metadata are stored in immutable audit tables
- Complete change history is preserved to ensure traceability
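An append-only audit table can be made immutable at the database layer itself, so that even privileged application code cannot rewrite history. The sketch below illustrates the idea with SQLite triggers for the sake of a self-contained demo; the NAN Archive uses Postgres, and the table and column names here are hypothetical, not NAN's actual schema.

```python
import sqlite3

# Hypothetical audit table for dataset metadata changes (SQLite used
# here only so the example is self-contained; NAN runs Postgres).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dataset_metadata_audit (
    audit_id   INTEGER PRIMARY KEY,
    dataset_id TEXT NOT NULL,
    field      TEXT NOT NULL,
    old_value  TEXT,
    new_value  TEXT,
    changed_at TEXT DEFAULT CURRENT_TIMESTAMP
);

-- These triggers make the table effectively immutable:
-- rows can be inserted, but never rewritten or removed.
CREATE TRIGGER audit_no_update BEFORE UPDATE ON dataset_metadata_audit
BEGIN SELECT RAISE(ABORT, 'audit rows are immutable'); END;

CREATE TRIGGER audit_no_delete BEFORE DELETE ON dataset_metadata_audit
BEGIN SELECT RAISE(ABORT, 'audit rows are immutable'); END;
""")

# Recording a metadata change appends a row...
conn.execute(
    "INSERT INTO dataset_metadata_audit (dataset_id, field, old_value, new_value) "
    "VALUES (?, ?, ?, ?)",
    ("DS-0001", "title", "Old title", "New title"),
)

# ...but any attempt to erase history is rejected by the trigger.
try:
    conn.execute("DELETE FROM dataset_metadata_audit")
except sqlite3.IntegrityError:
    pass  # deletion blocked; the change history survives
```

In Postgres the same effect is typically achieved with `BEFORE UPDATE OR DELETE` triggers that raise an exception, combined with revoking `UPDATE`/`DELETE` privileges on the audit table.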
Data Storage
Primary Storage: Dell PowerScale (Isilon) A3000
- Four A3000 nodes, each with 400 TB raw capacity (1.6 PB total)
- Erasure coding reduces usable capacity to ~900 TB
- OneFS clustered architecture scales to 252 nodes (up to 100 PB)
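The usable-capacity figure above can be sanity-checked with simple arithmetic. The exact OneFS protection level is not stated, so the ratio below is treated as the combined overhead of erasure coding plus filesystem reserve, not a precise protection-scheme calculation:

```python
# Back-of-the-envelope check of the numbers above:
# 4 nodes x 400 TB raw = 1600 TB (1.6 PB); ~900 TB usable.
raw_tb = 4 * 400
usable_tb = 900
efficiency = usable_tb / raw_tb  # fraction of raw capacity usable

print(f"raw: {raw_tb} TB, usable: {usable_tb} TB, "
      f"storage efficiency: {efficiency:.0%}")
```

The implied ~56% efficiency is in the normal range for archive-grade clusters that trade raw capacity for node-loss tolerance.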
Disaster Recovery Storage: HP Scality RING
- Distributed, peer-to-peer architecture across four datacenters (Farmington and Storrs, CT)
- 14 nines (99.999999999999%) data durability via erasure coding, replication, and self-healing
- WORM (Write-Once-Read-Many) S3 bucket ensures:
  - Protection from accidental/malicious deletion
  - Automatic lease renewal
  - No file deletions by users, admins, or vendors
  - Resilience against ransomware encryption attempts
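To put 14 nines of durability in perspective, the complement of that figure is an annual per-object loss probability of 10⁻¹⁴, so the expected number of losses stays tiny even at large object counts. The object count below is illustrative, not a NAN figure:

```python
# 14 nines of durability means the annual probability of losing
# any given object is the complement: 1 - 0.99999999999999 = 1e-14.
annual_loss_prob = 1e-14

# Illustrative (hypothetical) archive size: one billion objects.
objects = 1_000_000_000

# Expected objects lost per year = count x per-object loss probability.
expected_losses = objects * annual_loss_prob

print(f"expected losses per year: {expected_losses:.1e}")
```

Even at a billion objects, the expectation is on the order of 10⁻⁵ losses per year, i.e. roughly one object lost per hundred thousand years at that scale.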
Landing Zone: Qumulo NAS
- Data from NDTS Gateways arrives at the Landing Zone
- Hosted on Qumulo NAS and replicated in real time to a second NAS
- Data is only deleted after verified transfer to both primary and disaster recovery storage
- At minimum, two independent copies exist from the moment of data arrival
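The verify-before-delete rule above can be sketched as a checksum workflow: copy the file to both storage tiers, independently verify each copy against the source checksum, and only then release the landing-zone original. The sketch below uses local directories to stand in for the NAS tiers; the paths, filenames, and helper function are hypothetical, not NAN's actual tooling:

```python
import hashlib
import pathlib
import shutil
import tempfile

def sha256(path: pathlib.Path) -> str:
    """Checksum used to verify a copy before the source is released."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

# Local directories standing in for the three tiers (hypothetical).
root = pathlib.Path(tempfile.mkdtemp())
landing, primary, dr = (root / t for t in ("landing", "primary", "dr"))
for tier in (landing, primary, dr):
    tier.mkdir()

# A dataset file arriving in the Landing Zone from an NDTS Gateway.
src = landing / "dataset.tar"
src.write_bytes(b"example NMR dataset payload")
expected = sha256(src)

# Copy to both primary and disaster-recovery storage...
copies = [shutil.copy2(src, tier / src.name) for tier in (primary, dr)]

# ...and verify each copy independently against the source checksum.
verified = all(sha256(pathlib.Path(c)) == expected for c in copies)

# Only after BOTH copies verify is the landing-zone original deleted,
# so at least two independent copies exist at every moment.
if verified:
    src.unlink()
```

The invariant this enforces is the one stated above: from arrival onward there is never a moment with fewer than two verified, independent copies of the data.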
Access Control & Monitoring
- Access is restricted using role-based Active Directory (AD) groups
- All logins require key-based SSH
- All code changes are Git-versioned with rollback support
- CrowdStrike Falcon agent monitors systems for suspicious activity
- Server logs are centrally aggregated and retained for audit
- These measures ensure operational integrity and compliance with security protocols
NSF Trusted CI Center of Excellence
- NAN engages with Trusted CI, the NSF Cybersecurity Center of Excellence
- Undergoes periodic third-party reviews
- Aligns policies with research cyberinfrastructure best practices