NDTS Daemon Operation: Difference between revisions
From Network for Advanced NMR
Mmaciejewski: Created page
Mmaciejewski: No edit summary
Revision as of 19:23, 2 June 2025
Running and Monitoring the Daemon
This page explains how to control the data-transport-daemon service, verify connectivity, and interpret the daemon’s log and audit files on every spectrometer workstation.
Service Control
# Start the daemon
sudo /sbin/service data-transport-daemon start
# Stop the daemon
sudo /sbin/service data-transport-daemon stop
# Restart (reloads configuration)
sudo /sbin/service data-transport-daemon restart
# Check status
sudo /sbin/service data-transport-daemon status
- The daemon refuses to start if another instance is already running.
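Because a second instance will not start, a wrapper can check the status before issuing a start. The following is a minimal sketch, not part of the NDTS distribution. It assumes the init script's status action exits with 0 when the daemon is running (the usual LSB convention, which this page does not confirm). The function name ensure_running and the SERVICE_CMD variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: start data-transport-daemon only if it is not already running.
# Assumption: "status" exits 0 when the daemon is running (LSB init-script convention).

SERVICE_CMD="${SERVICE_CMD:-sudo /sbin/service}"
DAEMON="data-transport-daemon"

ensure_running() {
    # $SERVICE_CMD is intentionally unquoted so "sudo /sbin/service" splits into words.
    if $SERVICE_CMD "$DAEMON" status >/dev/null 2>&1; then
        echo "$DAEMON already running"
    else
        $SERVICE_CMD "$DAEMON" start
    fi
}
```

Calling ensure_running after sourcing the file leaves a running daemon untouched and starts a stopped one.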
Heartbeat and Connectivity
- The daemon sends a heartbeat to the Gateway every 10 minutes.
- The Gateway forwards that heartbeat to the NDTS Receiver; entries are visible in vNOC.
Slack Notifications
When heartbeats stop, the Receiver posts to the facility’s Slack channel.
Condition | Time-out | Receiver Action | Slack Message |
---|---|---|---|
First missed heartbeat > 20 min | ≈ 20 min | Mark workstation offline | offline |
Heartbeat still missing (next poll) | +8 min | Re-post offline (max 3) | offline |
Heartbeat resumes | – | Mark workstation online | online |
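The offline rule in the table can be illustrated with a small timestamp comparison. This is only a sketch of the 20-minute threshold, not the Receiver's actual code. The function name heartbeat_state is hypothetical, and the re-post schedule is left out.

```shell
#!/bin/sh
# Illustrative only: classify a workstation from the age of its last heartbeat.
# The 20-minute threshold comes from the table above; the function name is hypothetical.

OFFLINE_AFTER_SECONDS=$((20 * 60))

# Usage: heartbeat_state LAST_HEARTBEAT_EPOCH [NOW_EPOCH]
heartbeat_state() {
    last=$1
    now=${2:-$(date +%s)}
    if [ $((now - last)) -gt "$OFFLINE_AFTER_SECONDS" ]; then
        echo offline
    else
        echo online
    fi
}
```

With heartbeats sent every 10 minutes, a workstation crosses this threshold only after missing at least one heartbeat.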
Slack channel names (one per facility):
- ccrc-ndts-notifications
- nmrfam-ndts-notifications
- uchc-ndts-notifications
Version Tracking
- On start-up, the daemon writes its version to the log.
- A file named /opt/nan-dtdaemon/running_workstation_version-X.Y.Z contains the version and start timestamp.
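Since the version is part of the marker file's name, it can be read without opening the file. A minimal sketch, assuming the X.Y.Z suffix is the running version. The function name version_from_marker is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: extract X.Y.Z from a running_workstation_version-X.Y.Z path.

version_from_marker() {
    name=${1##*/}   # strip the directory part
    case $name in
        running_workstation_version-*)
            echo "${name#running_workstation_version-}" ;;
        *)
            return 1 ;;
    esac
}

# Report the version of every marker file that exists on this workstation.
for marker in /opt/nan-dtdaemon/running_workstation_version-*; do
    if [ -e "$marker" ]; then
        version_from_marker "$marker"
    fi
done
```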
Experiment Transfer Audit
Every processed experiment appends one line to
/opt/nan-dtdaemon/logs/ndtd_audit.txt
Fields:
1. Timestamp
2. Workstation (Linux) user
3. Selected NMRhub user (or unselected)
4. Experiment start & end time
5. Path to experiment data
6. Daemon version
7. Action (sent • spooled • sent-spooled • skipped-trivial • skipped-disabled)
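A quick tally of actions shows how many experiments were sent, spooled, or skipped. This page does not specify the field delimiter, so the sketch below assumes the action is the last whitespace-separated token on each line. Adjust it if the real file uses a different layout. The function name count_audit_actions is hypothetical.

```shell
#!/bin/sh
# Hypothetical tally of audit actions.
# Assumption: the action is the last whitespace-separated field of each audit line.

# Usage: count_audit_actions [FILE...]   (reads stdin when no file is given)
count_audit_actions() {
    awk '
        $NF ~ /^(sent|spooled|sent-spooled|skipped-trivial|skipped-disabled)$/ { n[$NF]++ }
        END { for (a in n) print a, n[a] }
    ' "$@" | sort
}
```

For example, `count_audit_actions /opt/nan-dtdaemon/logs/ndtd_audit.txt` prints one line per action with its count.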
Daemon Log File
- Main log: /opt/nan-dtdaemon/logs/nan-dtdaemon.log
- Verbosity is set by log_level in ndtd_configuration.dat (fatal < error < warning < info < debug < trace).
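The ordering of the levels can be made concrete with a small comparison. The sketch below assumes the usual threshold semantics: a message is written when its level is at or below the configured log_level in the ordering above. This page does not state that the daemon works this way, and both function names are hypothetical.

```shell
#!/bin/sh
# Rank the levels in the order listed above (fatal lowest, trace highest).
level_rank() {
    case $1 in
        fatal)   echo 0 ;;
        error)   echo 1 ;;
        warning) echo 2 ;;
        info)    echo 3 ;;
        debug)   echo 4 ;;
        trace)   echo 5 ;;
        *)       return 1 ;;
    esac
}

# Succeeds when a message at MESSAGE_LEVEL would be written at CONFIGURED_LEVEL.
# Assumption: standard threshold semantics. Returns 2 for an unknown level name.
# Usage: is_logged MESSAGE_LEVEL CONFIGURED_LEVEL
is_logged() {
    msg=$(level_rank "$1") || return 2
    cfg=$(level_rank "$2") || return 2
    [ "$msg" -le "$cfg" ]
}
```

Under this assumption, `is_logged debug info` fails while `is_logged warning info` succeeds, which is why leaving trace enabled produces the most output.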
Example start-up excerpt (level INFO):
Thu Sep 28 13:17:03 2023 LOG_START Started dtd logger.
Thu Sep 28 13:17:03 2023 INFO NDTD Workstation version is 1.0.15
Thu Sep 28 13:17:03 2023 INFO *** This is a Topspin Workstation ***
Thu Sep 28 13:17:03 2023 INFO Ndtd Control Processor listening.
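Log lines in this layout are easy to filter with standard tools. The sketch below assumes every line follows the excerpt's format: a five-word timestamp followed by the level. Both function names are hypothetical.

```shell
#!/bin/sh
# Assumes the line layout of the excerpt above: a five-word timestamp, then the level.

# Print the lines whose level (sixth whitespace-separated field) equals $1.
lines_at_level() {
    awk -v lvl="$1" '$6 == lvl'
}

# Print the most recent workstation version reported in the log text on stdin.
latest_logged_version() {
    sed -n 's/.*NDTD Workstation version is \([0-9][0-9.]*\).*/\1/p' | tail -n 1
}
```

Both functions read standard input, for example `latest_logged_version < /opt/nan-dtdaemon/logs/nan-dtdaemon.log`.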
Troubleshooting Checklist
Symptom | What to Check |
---|---|
No new data reaches NAN | • service data-transport-daemon status • Latest heartbeat timestamp in vNOC • Gateway log for incoming files |
Repeated offline Slack alerts | Workstation powered off? Network drop? Firewall still allowing port 60195? |
Log growing rapidly | log_level trace left enabled → set back to info |
Experiments stay spooled | Gateway unreachable → verify IP/port and Gateway service status |
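The firewall and Gateway checks can be approximated from the workstation with a plain TCP connection test. This is a hedged sketch: it assumes port 60195 is the port the Gateway listens on, and that bash and the coreutils timeout command are installed. The function name check_gateway is hypothetical, and the Gateway host must be supplied by the operator.

```shell
#!/usr/bin/env bash
# Hypothetical reachability check for the Gateway port named in the checklist above.
# Assumptions: bash (for /dev/tcp) and coreutils "timeout" are available, and
# 60195 is the port the Gateway listens on. The host must be supplied by the caller.

# Usage: check_gateway HOST [PORT]
check_gateway() {
    host=${1:-}
    port=${2:-60195}
    if [ -z "$host" ]; then
        echo "usage: check_gateway HOST [PORT]" >&2
        return 2
    fi
    # Try to open a TCP connection; give up after 5 seconds.
    if timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
        echo "reachable: $host:$port"
    else
        echo "unreachable: $host:$port"
        return 1
    fi
}
```

An unreachable result only shows that a TCP connection could not be opened. It does not tell a blocked firewall apart from a stopped Gateway service.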
Next Step
- Return to NDTS Overview or proceed to Accessing Collected Data.