NDTS Daemon Operation
Revision as of 19:52, 2 June 2025
Running and Monitoring the Daemon
This page explains how to control the data-transport-daemon service, verify connectivity, and interpret the daemon’s log and audit files on every spectrometer workstation.
Service Control
# Start the daemon
sudo /sbin/service data-transport-daemon start
# Stop the daemon
sudo /sbin/service data-transport-daemon stop
# Restart (reloads configuration)
sudo /sbin/service data-transport-daemon restart
# Check status
sudo /sbin/service data-transport-daemon status
- The daemon refuses to start if another instance is already running.
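Because the daemon refuses to start while another instance is running, a wrapper can check the status before attempting a start. The sketch below assumes the init script follows the common convention that `status` exits with 0 when the service is running; this page does not confirm that. The function name `ensure_daemon_running` and the `SERVICE_CMD` override are hypothetical.

```shell
# Hypothetical helper: start the daemon only when `status` reports it is not running.
# Assumption: `status` exits 0 when the service is running (common init-script convention).
# SERVICE_CMD can be overridden so the logic can be exercised without root access.
SERVICE_CMD="${SERVICE_CMD:-sudo /sbin/service}"

ensure_daemon_running() {
    if $SERVICE_CMD data-transport-daemon status >/dev/null 2>&1; then
        echo "already running"
    else
        # Print "started" only if the start command itself succeeds.
        $SERVICE_CMD data-transport-daemon start >/dev/null 2>&1 && echo "started"
    fi
}
```

Calling `ensure_daemon_running` prints which branch was taken, so it can be used from a cron job or a login script without producing an error when the daemon is already up.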
Heartbeat and Connectivity
- The daemon sends a heartbeat to the Gateway every 10 minutes.
- The Gateway forwards that heartbeat to the NDTS Receiver; entries are visible in vNOC.
Slack Notifications
When heartbeats stop, the Receiver posts to the facility’s Slack channel.
Condition | Time-out | Receiver Action | Slack Message |
---|---|---|---|
First missed heartbeat | > 20 min | Mark workstation offline | offline |
Still missing at next poll | + 8 min | Repeat offline (max 3) | offline |
Heartbeat resumes | – | Mark workstation online | online |
Slack channels (one per facility):
ccrc-ndts-notifications
nmrfam-ndts-notifications
uchc-ndts-notifications
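The first row of the table can be expressed as a simple time comparison. The following is a minimal sketch of that rule only, not the Receiver's actual code; the function name `heartbeat_state` and the use of epoch seconds are assumptions made for illustration.

```shell
# Sketch of the "first missed heartbeat" rule: a workstation counts as offline
# once more than 20 minutes have passed since its last heartbeat.
# Assumption: both arguments are Unix epoch seconds.
HEARTBEAT_TIMEOUT_SECS=$((20 * 60))

heartbeat_state() {
    last=$1   # time of the last heartbeat received
    now=$2    # current time
    if [ $((now - last)) -gt "$HEARTBEAT_TIMEOUT_SECS" ]; then
        echo offline
    else
        echo online
    fi
}
```

For example, `heartbeat_state "$last_seen" "$(date +%s)"` classifies a workstation given a stored timestamp in the hypothetical variable `last_seen`.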
Version Tracking
- On start-up the daemon writes its version to the log.
- A file named /opt/nan-dtdaemon/running_workstation_version-X.Y.Z records the version and start time.
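Since the version is encoded in the file name, it can be read without opening the file. The sketch below strips the fixed prefix from any matching file name; the function name `running_version` is hypothetical, and the sketch prints every match on the assumption that more than one such file could exist.

```shell
# Print the X.Y.Z suffix of each running_workstation_version-* file in a directory.
# The directory defaults to /opt/nan-dtdaemon and can be overridden for testing.
running_version() {
    dir=${1:-/opt/nan-dtdaemon}
    for f in "$dir"/running_workstation_version-*; do
        [ -e "$f" ] || continue          # skip the literal pattern when nothing matches
        name=${f##*/}                    # drop the directory part
        echo "${name#running_workstation_version-}"
    done
}
```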
Experiment Transfer Audit
Each processed experiment appends a line to /opt/nan-dtdaemon/logs/ndtd_audit.txt
Fields: timestamp • workstation user • NMRhub user (or unselected) • start/end • path • daemon version • action (sent • spooled • sent-spooled • skipped-trivial • skipped-disabled)
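A quick tally of actions can show whether experiments are being sent or are piling up as spooled. The field delimiter of the audit file is not documented on this page, so the sketch below rests on an assumption: that the action is the last whitespace-separated field of each line, as the field order above suggests. Adjust the `awk` field selection if the real layout differs. The function name `audit_action_counts` is hypothetical.

```shell
# Count audit lines per action.
# ASSUMPTION: the action (sent, spooled, ...) is the last whitespace-separated field.
audit_action_counts() {
    awk 'NF { count[$NF]++ } END { for (a in count) print count[a], a }' \
        "${1:-/opt/nan-dtdaemon/logs/ndtd_audit.txt}" | sort -k2
}
```

Each output line is a count followed by the action name, sorted by action.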
Daemon Logging
NDTS writes two workstation logs:
- nan-dtdaemon.log — runtime events, heartbeats, errors
- ndtd_audit.txt — one-line summary per experiment
This page focuses on **nan-dtdaemon.log**.
Log Levels
Each log line carries a level tag after its timestamp. The logging threshold is set by the log_level parameter in ndtd_configuration.dat.
Level | Verbosity | Typical Use |
---|---|---|
fatal | Highest-priority, least frequent | Events that make the daemon shut down and cannot be auto-recovered |
error | Critical problems | Failures that stop normal operation but daemon continues running |
warning | Important but non-fatal issues | Conditions worth attention; daemon recovers automatically |
info | Default | Unusual or noteworthy events; normal operations generate very little output |
debug | Diagnostic detail | Ongoing list of major operations; log grows steadily |
trace | Maximum detail | Every internal step; use only for short troubleshooting sessions |
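When reviewing a long log, it helps to pull out only the entries at warning level or above. The sketch below assumes the line layout seen in the log fragment on this page (a five-field timestamp followed by the level tag) and assumes that the tags are written in upper case as FATAL, ERROR, and WARNING; only INFO and LOG_START are confirmed by that fragment. The function name `log_problems` is hypothetical, and the log path must be supplied because its location is not stated here.

```shell
# Print log lines whose level tag (6th whitespace-separated field) is WARNING or worse.
# ASSUMPTION: tags are upper case and spelled FATAL, ERROR, WARNING.
log_problems() {
    awk '$6 == "FATAL" || $6 == "ERROR" || $6 == "WARNING"' "$1"
}
```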
Log File Example
The fragment below is reproduced verbatim from the PDF (pp. 14-15):
Thu Sep 28 13:17:03 2023 LOG_START Started dtd logger.
Thu Sep 28 13:17:03 2023 INFO NDTD Workstation version is 1.0.15
Thu Sep 28 13:17:03 2023 INFO *** This is a Topspin Workstation ***
Thu Sep 28 13:17:03 2023 INFO Ndtd Control Processor listening.
Thu Sep 28 13:17:03 2023 INFO Entering polling loop...
Thu Sep 28 13:17:03 2023 INFO Workstation user has changed!
Thu Sep 28 13:17:03 2023 INFO workstation user is nmradmin
Thu Sep 28 13:17:03 2023 INFO User nmradmin is included in NAN data collection!
Thu Sep 28 13:17:03 2023 INFO Harvesting setting for user nmradmin is on
Thu Sep 28 13:17:03 2023 INFO Topspin program has been detected and is running.
Thu Sep 28 13:17:03 2023 INFO Setting directory to watch to /opt/topspin4.2.0/prog/curdir/nmradmin/shmem
Analysis of the Example
- LOG_START: the first line of a new daemon instance; it records the date and time the daemon started.
- "Workstation version is 1.0.15": the daemon is running, and this is its version number.
- "*** This is a Topspin Workstation ***": the daemon has identified the host as a Topspin workstation.
- "Ndtd Control Processor listening.": the daemon is listening for incoming control commands.
- "Entering polling loop...": the daemon has entered the acquisition polling loop.
- "Workstation user has changed!" and the next three lines: the workstation user has changed to nmradmin, nmradmin is included in NAN data collection, and the harvesting setting for that user is on.
- "Topspin program has been detected and is running.": the daemon has detected that the Topspin program is running.
- "Setting directory to watch ...": the Topspin directory that the daemon watches for file modifications indicating the start and end of an acquisition.
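The two most useful facts in a start-up sequence, the daemon version and the watched directory, can be extracted with `sed`. This is a minimal sketch that relies on the message wording shown in the example; the function names `log_version` and `log_watch_dir` are hypothetical, and the log path is passed as an argument because its location is not stated here.

```shell
# Print the most recently logged daemon version.
log_version() {
    sed -n 's/^.*NDTD Workstation version is //p' "$1" | tail -n 1
}

# Print the most recently logged watch directory.
log_watch_dir() {
    sed -n 's/^.*Setting directory to watch to //p' "$1" | tail -n 1
}
```

Using `tail -n 1` returns the value from the latest daemon start when the log contains several start-up sequences.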
Troubleshooting Checklist
Symptom | What to Check |
---|---|
No new data reaches NAN | • service data-transport-daemon status • Latest heartbeat in vNOC • Gateway log for incoming files |
Repeated offline Slack alerts | Workstation powered off? Network drop? Firewall blocking port 60195? |
Log grows rapidly | log_level left at trace; reset to info |
Experiments remain spooled | Gateway unreachable → verify IP/port and gateway service status |
Next Step
Return to NDTS Overview or continue to Accessing Collected Data.