NDTS Daemon Operation
From Network for Advanced NMR
Revision by Mmaciejewski
Revision as of 19:26, 2 June 2025
Running and Monitoring the Daemon
This page explains how to control the data-transport-daemon service, verify connectivity, and interpret the daemon’s log and audit files on every spectrometer workstation.
Service Control
    # Start the daemon
    sudo /sbin/service data-transport-daemon start

    # Stop the daemon
    sudo /sbin/service data-transport-daemon stop

    # Restart (reloads configuration)
    sudo /sbin/service data-transport-daemon restart

    # Check status
    sudo /sbin/service data-transport-daemon status

- The daemon refuses to start if another instance is already running.
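Because the daemon refuses to start when another instance is running, a wrapper can check the status before issuing a start. The following is a minimal sketch, not part of the NDTS distribution. The function name `ensure_daemon_running` and the `SERVICE_CMD` override are hypothetical, and the sketch assumes the init script's `status` action exits with 0 when the daemon is running (the usual LSB convention, not confirmed for this service).

```shell
#!/bin/sh
# Hypothetical helper: start the daemon only when it is not already running.
# SERVICE_CMD is an assumed override hook so the function can be exercised
# without touching the real service; it defaults to /sbin/service.
SERVICE_NAME="data-transport-daemon"

ensure_daemon_running() (
    svc="${SERVICE_CMD:-/sbin/service}"
    # Assumption: "status" exits 0 when the daemon is running.
    if "$svc" "$SERVICE_NAME" status >/dev/null 2>&1; then
        echo "already running"
    elif "$svc" "$SERVICE_NAME" start >/dev/null 2>&1; then
        echo "started"
    else
        echo "start failed" >&2
        exit 1   # leaves only the function's subshell
    fi
)
```

Source the file and call `ensure_daemon_running` from a root shell, since the function does not invoke `sudo` itself.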
Heartbeat and Connectivity
- The daemon sends a heartbeat to the Gateway every 10 minutes.
- The Gateway forwards that heartbeat to the NDTS Receiver; entries are visible in vNOC.
Slack Notifications
When heartbeats stop, the Receiver posts to the facility’s Slack channel.
Condition | Time-out | Receiver Action | Slack Message |
---|---|---|---|
First missed heartbeat | > 20 min | Mark workstation offline | offline |
Still missing at next poll | + 8 min | Repeat offline (max 3) | offline |
Heartbeat resumes | – | Mark workstation online | online |
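With a heartbeat every 10 minutes, the 20-minute time-out corresponds to two consecutive missed heartbeats. The sketch below illustrates only that threshold rule; it is not the Receiver's code, and the function name `heartbeat_state` is hypothetical. Repeat alerts and the "max 3" limit are deliberately left out.

```shell
#!/bin/sh
# Sketch of the "> 20 min" time-out rule from the table above.
OFFLINE_AFTER_SECONDS=$((20 * 60))

# heartbeat_state LAST_EPOCH [NOW_EPOCH]
# Prints "offline" when the last heartbeat is more than 20 minutes old,
# otherwise "online". Both arguments are Unix timestamps in seconds;
# NOW_EPOCH defaults to the current time.
heartbeat_state() (
    last="$1"
    now="${2:-$(date +%s)}"
    age=$((now - last))
    if [ "$age" -gt "$OFFLINE_AFTER_SECONDS" ]; then
        echo "offline"
    else
        echo "online"
    fi
)
```

For example, `heartbeat_state 0 1201` prints `offline`, while `heartbeat_state 0 1200` prints `online` because the age must exceed the threshold.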
Slack channels (one per facility):
- ccrc-ndts-notifications
- nmrfam-ndts-notifications
- uchc-ndts-notifications
Version Tracking
- On start-up the daemon writes its version to the log.
- A file named /opt/nan-dtdaemon/running_workstation_version-X.Y.Z records the version and start time.
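The running version can be read back from the marker file's name. The sketch below strips the `running_workstation_version-` prefix from any matching file; the function name `running_version` is hypothetical, and the sketch does not assume anything about the file's contents.

```shell
#!/bin/sh
# running_version [DIR]
# Prints the X.Y.Z suffix of each running_workstation_version-* file in DIR
# (default: /opt/nan-dtdaemon). Prints nothing if no such file exists.
running_version() (
    dir="${1:-/opt/nan-dtdaemon}"
    prefix="running_workstation_version-"
    for f in "$dir"/"$prefix"*; do
        [ -e "$f" ] || continue        # the glob matched nothing
        name=${f##*/}                  # drop the directory part
        echo "${name#"$prefix"}"       # drop the fixed prefix
    done
)
```

More than one line of output would suggest a stale marker file left over from an earlier version, which is worth checking by hand.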
Experiment Transfer Audit
Each processed experiment appends a line to /opt/nan-dtdaemon/logs/ndtd_audit.txt
Fields: timestamp • workstation user • NMRhub user (or unselected) • start/end • path • daemon version • action
Action values: sent • spooled • sent-spooled • skipped-trivial • skipped-disabled
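A quick way to see how many experiments ended in a given state is to count audit lines by action. The sketch below assumes the action is the final field on each line, as the field order above suggests; the field delimiter is not documented here, so the pattern accepts any non-alphanumeric separator. The function name `count_audit_action` is hypothetical.

```shell
#!/bin/sh
# count_audit_action ACTION [FILE]
# Prints the number of audit lines whose last field is exactly ACTION.
# Assumption: the action ends the line. The leading group keeps "spooled"
# from also matching "sent-spooled".
count_audit_action() (
    action="$1"
    file="${2:-/opt/nan-dtdaemon/logs/ndtd_audit.txt}"
    grep -cE "(^|[^[:alnum:]-])${action}[[:space:]]*\$" "$file"
)
```

For example, `count_audit_action spooled` reports how many experiments were spooled. Note that `grep -c` exits with a non-zero status when the count is 0, although it still prints `0`.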
Daemon Log File
- Main log: /opt/nan-dtdaemon/logs/nan-dtdaemon.log
- Verbosity controlled by log_level in ndtd_configuration.dat (fatal < error < warning < info < debug < trace).
Example start-up excerpt (INFO):
    Thu Sep 28 13:17:03 2023 LOG_START Started dtd logger.
    Thu Sep 28 13:17:03 2023 INFO NDTD Workstation version is 1.0.15
    Thu Sep 28 13:17:03 2023 INFO *** This is a Topspin Workstation ***
    Thu Sep 28 13:17:03 2023 INFO Ndtd Control Processor listening.
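In the excerpt, each entry starts with a five-field timestamp followed by a tag such as `LOG_START` or `INFO`. Assuming every entry follows that layout, the log can be filtered by tag with `awk`. The function name `log_lines_at_level` is hypothetical, and tags other than those shown in the excerpt (for example `ERROR`) are an assumption about how the other levels are written.

```shell
#!/bin/sh
# log_lines_at_level LEVEL [FILE]
# Prints the log lines whose sixth whitespace-separated field equals LEVEL.
# Fields 1-5 are the timestamp, e.g. "Thu Sep 28 13:17:03 2023".
log_lines_at_level() (
    level="$1"
    file="${2:-/opt/nan-dtdaemon/logs/nan-dtdaemon.log}"
    awk -v lvl="$level" '$6 == lvl' "$file"
)
```

For example, `log_lines_at_level INFO | tail -n 20` shows the most recent INFO entries.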
Troubleshooting Checklist
Symptom | What to Check |
---|---|
No new data reaches NAN | • service data-transport-daemon status • Latest heartbeat in vNOC • Gateway log for incoming files |
Repeated offline Slack alerts | Workstation powered off? Network drop? Firewall blocking port 60195? |
Log grows rapidly | log_level left at trace – reset to info |
Experiments remain spooled | Gateway unreachable → verify IP/port and gateway service status |
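The "log grows rapidly" symptom can be caught early with a periodic size check. The sketch below compares the main log's size against a byte limit; the function name `log_exceeds_size` and any particular limit are hypothetical choices, not NDTS settings.

```shell
#!/bin/sh
# log_exceeds_size LIMIT_BYTES [FILE]
# Exit status 0 if FILE is larger than LIMIT_BYTES, 1 if it is not,
# and 2 if FILE does not exist.
log_exceeds_size() (
    limit="$1"
    file="${2:-/opt/nan-dtdaemon/logs/nan-dtdaemon.log}"
    [ -f "$file" ] || exit 2
    # tr strips the padding some wc implementations add around the count
    size=$(wc -c < "$file" | tr -d ' ')
    [ "$size" -gt "$limit" ]
)
```

A cron job could call `log_exceeds_size 104857600` (100 MiB, an arbitrary example) and, on success, remind the operator to check whether log_level was left at trace.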
Next Step
Return to NDTS Overview or continue to Accessing Collected Data.