NDTS Daemon Operation: Difference between revisions

Revision as of 16:25, 3 June 2025

Running and Monitoring the Daemon

This page explains how to control the data-transport-daemon service, verify connectivity, and interpret the daemon’s log and audit files on every spectrometer workstation.

Service Control

# Start the daemon
sudo /sbin/service data-transport-daemon start

# Stop the daemon
sudo /sbin/service data-transport-daemon stop

# Restart (reloads configuration)
sudo /sbin/service data-transport-daemon restart

# Check status
sudo /sbin/service data-transport-daemon status

Note: the daemon will not start if another instance is already running.
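A second instance will not start, so a wrapper script may want to check status before starting. The sketch below makes several assumptions. The init script is assumed to follow the LSB convention, where `status` exits with 0 for a running service and non-zero otherwise. The function names `daemon_is_running` and `start_if_stopped` are hypothetical. The service commands are the ones listed above.

```python
import subprocess
from typing import Callable

# Base command taken from the Service Control section.
SERVICE = ["sudo", "/sbin/service", "data-transport-daemon"]

# Any subprocess.run-compatible callable; injectable so the logic can be tested.
Runner = Callable[..., subprocess.CompletedProcess]


def daemon_is_running(run: Runner = subprocess.run) -> bool:
    """Return True if `service ... status` exits with code 0.

    ASSUMPTION: the init script follows the LSB convention
    (0 = running, non-zero = not running or unknown).
    """
    result = run(SERVICE + ["status"], capture_output=True, text=True, check=False)
    return result.returncode == 0


def start_if_stopped(run: Runner = subprocess.run) -> str:
    """Start the daemon only when no instance is reported as running."""
    if daemon_is_running(run):
        return "already running"
    run(SERVICE + ["start"], check=True)  # raises CalledProcessError on failure
    return "started"
```

Calling `start_if_stopped()` from a login or boot script avoids a start attempt that the daemon would refuse anyway.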

Heartbeats and Connectivity

At a regular interval (every 10 minutes by default), each NDTS daemon sends a heartbeat message to the Gateway. Heartbeats serve as a continuous health check, confirming that the daemon is active and communicating. Each heartbeat carries diagnostic and identity information used for system monitoring and troubleshooting, including:

Workstation hostname
Current local datetime on the workstation
Workstation IP address
Currently selected NMRhub user
Daemon version
Facility and spectrometer identifiers
Operating system details
Uptime and system load metrics

The Gateway receives the heartbeat, appends its own identifying information (including Gateway UUID and timestamps), and forwards the full message to the NDTS Receiver. These heartbeats are recorded in the NAN Repository and are viewable by Facility Managers via the virtual NAN Operations Center (vNOC).
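The heartbeat flow described above can be sketched as a workstation-side record that the Gateway enriches before forwarding. This is illustrative only. The field names, the JSON encoding, and the `gateway_forward` helper are hypothetical; the source lists the content of a heartbeat but not its wire format.

```python
import json
import uuid
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class Heartbeat:
    """Workstation-side heartbeat.

    Field names are HYPOTHETICAL; the content mirrors the documented list.
    """
    hostname: str
    local_datetime: str
    ip_address: str
    nmrhub_user: str
    daemon_version: str
    facility_id: str
    spectrometer_id: str
    os_details: str
    uptime_seconds: float
    load_average: float


def gateway_forward(hb: Heartbeat, gateway_uuid: uuid.UUID) -> str:
    """Append the Gateway's identity and a receipt timestamp, then serialise
    the message that would be forwarded to the NDTS Receiver.

    ASSUMPTION: JSON is used here only for illustration.
    """
    message = asdict(hb)
    message["gateway_uuid"] = str(gateway_uuid)
    message["gateway_received_utc"] = datetime.now(timezone.utc).isoformat()
    return json.dumps(message)
```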

Slack Notifications

When heartbeats stop, the Receiver alerts the facility via the associated Slack channel.

Automated actions based on heartbeat status
Condition	Time-out	Receiver Action	Slack Message
First missed heartbeat	> 20 min	Mark workstation offline	offline
Still missing at next poll	+ 8 min	Repeat offline (max 3)	offline
Heartbeat resumes	–	Mark workstation online	online

In practice, if you subscribe to your facility’s Slack channel, you will be notified once a daemon has been silent for more than 20 minutes. To avoid flooding the channel with the same message, at most three offline messages are posted for a single outage, and a single online message appears when heartbeats resume.
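With the default 10-minute heartbeat interval, the 20-minute time-out corresponds to about two consecutive missed heartbeats. The sketch below models the notification schedule from the table. The constants come from the table. The `slack_messages` function is hypothetical and is not the Receiver's code. It treats the thresholds as exact minute marks, and it assumes no online message is posted if the workstation was never marked offline.

```python
FIRST_TIMEOUT_MIN = 20    # first missed heartbeat: more than 20 min of silence
REPEAT_INTERVAL_MIN = 8   # re-check interval while heartbeats are still missing
MAX_OFFLINE_MESSAGES = 3  # cap on repeated "offline" posts


def slack_messages(outage_min: float) -> list[tuple[float, str]]:
    """Return (minute, message) pairs posted during one heartbeat gap.

    Simplification: thresholds are exact minute marks.
    ASSUMPTION: no "online" message if "offline" was never posted.
    """
    messages: list[tuple[float, str]] = []
    for k in range(MAX_OFFLINE_MESSAGES):
        minute = FIRST_TIMEOUT_MIN + k * REPEAT_INTERVAL_MIN
        if minute >= outage_min:
            break  # heartbeats resumed before this check
        messages.append((minute, "offline"))
    if messages:
        messages.append((outage_min, "online"))
    return messages
```

For a two-hour outage this yields offline posts at minutes 20, 28 and 36, followed by one online post when heartbeats resume.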

Slack channels (one per facility):

ccrc-ndts-notifications
nmrfam-ndts-notifications
uchc-ndts-notifications

Version Tracking

When the daemon starts, it records the active version number in two places:

/opt/nan-dtdaemon/running_workstation_version-X.Y.Z

a file that records the version number and a timestamp of when the daemon started;

/opt/nan-dtdaemon/logs/nan-dtdaemon.log

the runtime log, which receives a time-stamped INFO message stating the version number.
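Because the version appears in both places, a quick consistency check can compare them. The sketch below makes two assumptions: X.Y.Z is three numeric components, and the log message keeps the wording shown in the log example on this page. The function names are hypothetical.

```python
import re
from typing import Iterable, Optional

# Marker filename from this section; X.Y.Z assumed to be numeric.
MARKER_RE = re.compile(r"^running_workstation_version-(\d+\.\d+\.\d+)$")
# INFO message wording as it appears in the example log.
LOG_RE = re.compile(r"\bINFO NDTD Workstation version is (\S+)")


def versions_from_marker_names(names: Iterable[str]) -> list[str]:
    """Return every X.Y.Z encoded in running_workstation_version-X.Y.Z names."""
    found = []
    for name in names:
        match = MARKER_RE.match(name)
        if match:
            found.append(match.group(1))
    return sorted(found)


def version_from_log(lines: Iterable[str]) -> Optional[str]:
    """Return the version reported by the most recent start in the log."""
    latest = None
    for line in lines:
        match = LOG_RE.search(line)
        if match:
            latest = match.group(1)
    return latest
```

For example, pass `os.listdir("/opt/nan-dtdaemon")` to the first function and the lines of nan-dtdaemon.log to the second. Then check that both report the same single version.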

Experiment Transfer Audit

Each processed experiment appends a line to

/opt/nan-dtdaemon/logs/ndtd_audit.txt

Fields: timestamp • workstation user • NMRhub user (or unselected) • start/end • path • daemon version • action (one of sent, spooled, sent-spooled, skipped-trivial, skipped-disabled)
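Action is the last field in each audit line, so tallying actions across the file gives a quick view of how experiments were handled. This is a minimal sketch only. The field separator used in ndtd_audit.txt is not stated here, so the tab default is an assumption; inspect a real line before relying on it. `count_actions` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable

# Action values listed in the documentation.
ACTIONS = {"sent", "spooled", "sent-spooled", "skipped-trivial", "skipped-disabled"}


def count_actions(lines: Iterable[str], delimiter: str = "\t") -> Counter:
    """Tally the action (last) field of each audit line.

    ASSUMPTION: fields are separated by `delimiter`; the real separator
    in ndtd_audit.txt is not documented on this page.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        action = line.split(delimiter)[-1].strip()
        counts[action if action in ACTIONS else "unrecognized"] += 1
    return counts
```

A large `unrecognized` count usually means the delimiter guess is wrong.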

Daemon Logging

NDTS writes two workstation logs:

nan-dtdaemon.log — runtime events, heartbeats, errors
ndtd_audit.txt — one-line summary per experiment

This section focuses on nan-dtdaemon.log.

Log Levels

Each line begins with a timestamp followed by a level tag (for example, INFO). The log_level parameter in ndtd_configuration.dat controls how verbose the log is.

Level	Verbosity	Typical Use
fatal	Highest-priority, least frequent	Events that make the daemon shut down and cannot be auto-recovered
error	Critical problems	Failures that stop normal operation but daemon continues running
warning	Important but non-fatal issues	Conditions worth attention; daemon recovers automatically
info	Default	Unusual or noteworthy events; normal operations generate very little output
debug	Diagnostic detail	Ongoing list of major operations; log grows steadily
trace	Maximum detail	Every internal step; use only for short troubleshooting sessions
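When reviewing a busy log, it helps to keep only the more severe lines. The sketch below ranks the levels in the order of the table. It assumes each line starts with the five-token timestamp shown in the log example, followed by the tag. It also assumes the non-INFO tags are spelled like the table entries, ignoring case; only INFO and LOG_START are confirmed by the example. The function names are hypothetical.

```python
from typing import Iterable, Iterator

# Levels from the table, most to least severe.
LEVELS = ["fatal", "error", "warning", "info", "debug", "trace"]
RANK = {name: i for i, name in enumerate(LEVELS)}


def level_of(line: str) -> str:
    """Return the tag that follows the 5-token timestamp, lower-cased."""
    parts = line.split(maxsplit=6)
    return parts[5].lower() if len(parts) > 5 else ""


def at_or_above(lines: Iterable[str], minimum: str) -> Iterator[str]:
    """Yield lines at least as severe as `minimum`.

    Tags not in the table (e.g. LOG_START) are always kept so that
    daemon restarts remain visible.
    """
    limit = RANK[minimum]
    for line in lines:
        if not line.strip():
            continue
        tag = level_of(line)
        if tag not in RANK or RANK[tag] <= limit:
            yield line
```

For example, `at_or_above(open("/opt/nan-dtdaemon/logs/nan-dtdaemon.log"), "warning")` would show restarts, warnings, errors and fatal events.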

Log File Example

The fragment below is reproduced verbatim from the PDF (pp. 14-15):

Thu Sep 28 13:17:03 2023 LOG_START Started dtd logger.
Thu Sep 28 13:17:03 2023 INFO NDTD Workstation version is 1.0.15
Thu Sep 28 13:17:03 2023 INFO *** This is a Topspin Workstation ***
Thu Sep 28 13:17:03 2023 INFO Ndtd Control Processor listening.
Thu Sep 28 13:17:03 2023 INFO Entering polling loop...
Thu Sep 28 13:17:03 2023 INFO Workstation user has changed!
Thu Sep 28 13:17:03 2023 INFO workstation user is nmradmin
Thu Sep 28 13:17:03 2023 INFO User nmradmin is included in NAN data collection!
Thu Sep 28 13:17:03 2023 INFO Harvesting setting for user nmradmin is on
Thu Sep 28 13:17:03 2023 INFO Topspin program has been detected and is running.
Thu Sep 28 13:17:03 2023 INFO Setting directory to watch to /opt/topspin4.2.0/prog/curdir/nmradmin/shmem

Analysis of the Example

LOG_START: the first line written by a new daemon instance, stamped with the date and time it started.
“NDTD Workstation version is 1.0.15”: the daemon is running and reports its version number.
“*** This is a Topspin Workstation ***”: the daemon has identified the workstation as a Topspin workstation.
“Ndtd Control Processor listening.”: the daemon is listening for incoming control commands.
“Entering polling loop…”: the daemon has entered the acquisition polling loop.
“Workstation user has changed!” and the next three lines: the workstation user is now nmradmin, nmradmin is included in NAN data collection, and harvesting is on for that user.
“Topspin program has been detected and is running.”: the daemon detected that the Topspin program is running.
“Setting directory to watch …”: the Topspin directory that the daemon will watch for file modifications marking the start and end of an acquisition.
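The walkthrough above can also be applied mechanically to pull the current user, harvesting state and watched directory out of a log. This is a sketch that assumes the message wording stays as in the 1.0.15 example; `startup_summary` and its keys are hypothetical. It resets at each LOG_START so that only facts from the latest daemon instance are reported.

```python
import re
from typing import Iterable

# Message patterns copied from the example log.
PATTERNS = {
    "workstation_user": re.compile(r"INFO workstation user is (\S+)"),
    "included_user": re.compile(r"INFO User (\S+) is included in NAN data collection!"),
    "harvesting": re.compile(r"INFO Harvesting setting for user \S+ is (\w+)"),
    "watch_dir": re.compile(r"INFO Setting directory to watch to (.+)"),
}


def startup_summary(lines: Iterable[str]) -> dict[str, str]:
    """Collect the most recent value of each fact since the last LOG_START."""
    summary: dict[str, str] = {}
    for line in lines:
        if " LOG_START " in line:
            summary.clear()  # new daemon instance: discard older facts
        for key, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                summary[key] = match.group(1).strip()
    return summary
```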

Troubleshooting Checklist

Symptom	What to Check
No new data reaches NAN	• `sudo /sbin/service data-transport-daemon status` • Latest heartbeat in vNOC • Gateway log for incoming files
Repeated offline Slack alerts	Workstation powered off? Network drop? Firewall blocking port 60195?
Log grows rapidly	`log_level` left at trace – reset to info
Experiments remain spooled	Gateway unreachable → verify IP/port and gateway service status
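For the network-related rows of the checklist, a plain TCP connection attempt quickly separates "port blocked or service down" from other problems. The following is a hedged sketch. It assumes port 60195 is a TCP port the workstation must reach; the checklist does not say which machine listens on it, so supply the host accordingly. `can_connect` is a hypothetical helper.

```python
import socket


def can_connect(host: str, port: int = 60195, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    ASSUMPTION: 60195 (from the checklist) is a TCP port; which host
    listens on it is not documented here.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks and timeouts.
        return False
```

A False result from the workstation but True from another machine on the same network points toward a local firewall rule.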

Next Step

Return to NDTS Overview or continue to Accessing Collected Data.

