NAN Archive

From Network for Advanced NMR

Overview

While no system can guarantee perfect security or trust, NAN goes to great lengths to safeguard data in the NAN Archive.

The NAN Archive consists of:

  • A Postgres database holding all metadata records across the NAN portal and NDTS
  • Network-attached storage (NAS) for all files associated with datasets
  • Disaster recovery storage containing immutable dataset backups

NAN Postgres Database

  • Hosted as a virtual machine (VM) with a virtual datastore
  • Replicated in near real time to a second VM on a different physical server in a separate datacenter
    • The secondary VM uses a distinct NAS system for added resilience
    • In case of failure, the replica can be promoted to primary within minutes
  • Hourly backups to a separate NAS system
    • In the unlikely event both the primary and replica fail, the system can be restored from backups (recovery may take several hours)
  • All NAS systems supporting the database and backups:
    • Are continuously monitored
    • Feature high data durability
    • Are under vendor hardware/software support
    • Take snapshots at least daily and retain them for weeks, so VMs can be restored to recent states
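
The recovery order described above (use the primary, else promote the replica, else restore from backup) can be sketched as a small decision function. This is only an illustration of the logic in the list; the names Recovery and choose_recovery are hypothetical and do not describe NAN's actual tooling.

```python
from enum import Enum


class Recovery(Enum):
    """Hypothetical labels for the recovery paths described in the text."""
    NONE_NEEDED = "primary is healthy"
    PROMOTE_REPLICA = "promote the replica (minutes)"
    RESTORE_FROM_BACKUP = "restore from an hourly backup (hours)"


def choose_recovery(primary_ok: bool, replica_ok: bool) -> Recovery:
    """Pick the least disruptive recovery path that is still available."""
    if primary_ok:
        return Recovery.NONE_NEEDED
    if replica_ok:
        # The replica is kept nearly current, so promotion loses little data.
        return Recovery.PROMOTE_REPLICA
    # Both VMs are gone: fall back to the most recent hourly backup.
    return Recovery.RESTORE_FROM_BACKUP


if __name__ == "__main__":
    print(choose_recovery(primary_ok=False, replica_ok=True).value)
```

With hourly backups, a restore from backup could lose up to about an hour of metadata changes, which is why replica promotion is the preferred path.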

Metadata Provenance Tracking

  • Changes to dataset metadata are stored in immutable audit tables
  • Complete change history is preserved to ensure traceability
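
One common way to build an append-only audit table is with database triggers. The sketch below is a minimal illustration, not NAN's schema: it uses Python's built-in sqlite3 module instead of Postgres so that it runs without a server, and the table, column, and trigger names are all hypothetical.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical audited table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dataset (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE dataset_audit (
            audit_id   INTEGER PRIMARY KEY AUTOINCREMENT,
            dataset_id INTEGER NOT NULL,
            old_title  TEXT,
            new_title  TEXT,
            changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        );
        -- Record every title change as a new audit row.
        CREATE TRIGGER dataset_log_update AFTER UPDATE OF title ON dataset
        BEGIN
            INSERT INTO dataset_audit (dataset_id, old_title, new_title)
            VALUES (OLD.id, OLD.title, NEW.title);
        END;
        -- Reject any attempt to rewrite or delete history.
        CREATE TRIGGER dataset_audit_no_update BEFORE UPDATE ON dataset_audit
        BEGIN
            SELECT RAISE(ABORT, 'audit rows are immutable');
        END;
        CREATE TRIGGER dataset_audit_no_delete BEFORE DELETE ON dataset_audit
        BEGIN
            SELECT RAISE(ABORT, 'audit rows are immutable');
        END;
    """)
    return conn


if __name__ == "__main__":
    db = make_db()
    db.execute("INSERT INTO dataset (id, title) VALUES (1, 'draft')")
    db.execute("UPDATE dataset SET title = 'final' WHERE id = 1")
    print(db.execute("SELECT old_title, new_title FROM dataset_audit").fetchall())
```

In a Postgres deployment the same effect is usually achieved with trigger functions plus restricted table privileges; how NAN implements it is not stated here.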

Data Storage

Primary Storage: Dell PowerScale (Isilon) A3000

  • Four A3000 nodes, each with 400 TB raw capacity (1.6 PB total)
  • Erasure coding reduces usable capacity to ~900 TB
  • OneFS clustered architecture scales to 252 nodes (up to 100 PB)
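
The capacity figures above imply a storage efficiency that can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in the list and assumes decimal units (1 PB = 1000 TB); the function names are illustrative.

```python
def raw_capacity_tb(nodes: int, raw_tb_per_node: float) -> float:
    """Total raw capacity across all nodes, in TB."""
    return nodes * raw_tb_per_node


def storage_efficiency(usable_tb: float, raw_tb: float) -> float:
    """Fraction of raw capacity left for data after protection overhead."""
    return usable_tb / raw_tb


if __name__ == "__main__":
    raw_tb = raw_capacity_tb(nodes=4, raw_tb_per_node=400)
    efficiency = storage_efficiency(usable_tb=900, raw_tb=raw_tb)
    print(f"raw: {raw_tb} TB, usable fraction: {efficiency:.0%}")
```

With the quoted figures, roughly 56% of raw capacity is usable and about 700 TB goes to protection overhead and reserves.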

Disaster Recovery Storage: HPE Scality RING

  • Distributed, peer-to-peer architecture across four datacenters (Farmington and Storrs, CT)
  • Fourteen nines (99.999999999999%) of data durability via erasure coding, replication, and self-healing
  • WORM (Write-Once-Read-Many) S3 bucket ensures:
    • Protection from accidental/malicious deletion
    • Automatic lease renewal
    • No file deletions by users, admins, or vendors
    • Resilience against ransomware encryption attempts
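
To give a feel for what fourteen nines of durability means, the sketch below converts it into an expected number of lost objects per year. It assumes the figure is an annual, per-object durability, which is the usual vendor convention but is not stated in the text; the object count in the example is hypothetical.

```python
def annual_loss_probability(nines: int) -> float:
    """Per-object probability of loss in a year for a given number of nines."""
    return 10.0 ** -nines


def expected_objects_lost_per_year(object_count: int, nines: int = 14) -> float:
    """Expected yearly losses, assuming independent per-object failures."""
    return object_count * annual_loss_probability(nines)


if __name__ == "__main__":
    # Hypothetical archive size of one billion stored objects.
    print(expected_objects_lost_per_year(1_000_000_000))
```

Under these assumptions, an archive of a billion objects would expect on the order of one lost object every hundred thousand years.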

Landing Zone: Qumulo NAS

  • Data from NDTS Gateways arrives at the Landing Zone
  • Hosted on Qumulo NAS and replicated in real time to a second NAS
  • Data is only deleted after verified transfer to both primary and disaster recovery storage
  • At minimum, two independent copies exist from the moment of data arrival
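
The delete-only-after-verified-transfer rule can be sketched as a checksum comparison. This is a minimal illustration under stated assumptions: the text does not say how NAN verifies transfers, SHA-256 is chosen here for the example, local file paths stand in for the primary and disaster recovery copies, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def delete_if_verified(source: Path, primary_copy: Path, dr_copy: Path) -> bool:
    """Delete the landing-zone file only if both copies match it exactly.

    Returns True if the source was deleted, False if it was kept.
    """
    expected = sha256_of(source)
    for copy in (primary_copy, dr_copy):
        if not copy.is_file() or sha256_of(copy) != expected:
            return False  # Keep the source: a copy is missing or differs.
    source.unlink()
    return True
```

Returning False instead of raising keeps the source file in place, so a later retry can re-check once the missing or damaged copy has been transferred again.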

Access Control & Monitoring

  • Access is restricted using role-based Active Directory (AD) groups
  • All logins require key-based SSH
  • All code changes are Git-versioned with rollback support
  • CrowdStrike Falcon agent monitors systems for suspicious activity
  • Server logs are centrally aggregated and retained for audit
  • Together, these measures support operational integrity and compliance with security protocols

NSF Trusted CI Center of Excellence

  • NAN participates as an NSF Trusted CI Center of Excellence
  • Undergoes periodic third-party reviews
  • Aligns policies with research cyberinfrastructure best practices