NDTS Daemon Operation: Difference between revisions

From Network for Advanced NMR
Revision as of 19:26, 2 June 2025

Running and Monitoring the Daemon

This page explains how to control the data-transport-daemon service, verify connectivity, and interpret the daemon’s log and audit files on every spectrometer workstation.

Service Control

# Start the daemon
sudo /sbin/service data-transport-daemon start

# Stop the daemon
sudo /sbin/service data-transport-daemon stop

# Restart (reloads configuration)
sudo /sbin/service data-transport-daemon restart

# Check status
sudo /sbin/service data-transport-daemon status
  • The daemon refuses to start if another instance is already running.
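
The commands above can also be issued from a script. The following is a minimal sketch, assuming Python 3 is available on the workstation and that the calling account may use sudo; the helper names `service_command` and `run_service` are illustrative and not part of NDTS.

```python
import subprocess
from typing import List

SERVICE = "data-transport-daemon"
# The four actions listed on this page.
ACTIONS = ("start", "stop", "restart", "status")


def service_command(action: str) -> List[str]:
    """Build the argument list for one of the documented service actions."""
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return ["sudo", "/sbin/service", SERVICE, action]


def run_service(action: str) -> int:
    """Run the command and return its exit code (hypothetical helper)."""
    return subprocess.run(service_command(action), check=False).returncode
```

Calling `run_service("status")` returns the exit code of the service script. What each exit code means depends on the init script and is not documented here.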

Heartbeat and Connectivity

  • The daemon sends a heartbeat to the Gateway every 10 minutes.
  • The Gateway forwards that heartbeat to the NDTS Receiver; entries are visible in vNOC.

Slack Notifications

When heartbeats stop, the Receiver posts to the facility’s Slack channel.

Condition                     Time-out    Receiver Action             Slack Message
First missed heartbeat        > 20 min    Mark workstation offline    offline
Still missing at next poll    + 8 min     Repeat offline (max 3)      offline
Heartbeat resumes             –           Mark workstation online     online
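
The timing in the table can be worked through with a short sketch. It is not the Receiver's actual code. It assumes that "max 3" means up to three repeats after the first offline message, and that polls fall exactly 8 minutes apart once the 20-minute limit has passed. The function name `offline_posts_due` is illustrative.

```python
from datetime import datetime, timedelta

OFFLINE_AFTER = timedelta(minutes=20)  # first "offline" once silence exceeds this
REPEAT_EVERY = timedelta(minutes=8)    # interval between follow-up polls
MAX_REPEATS = 3                        # assumption: three repeats after the first post


def offline_posts_due(last_heartbeat: datetime, now: datetime) -> int:
    """Return how many 'offline' messages would have been posted by `now`."""
    silence = now - last_heartbeat
    if silence <= OFFLINE_AFTER:
        return 0
    # Whole 8-minute polls elapsed since the 20-minute limit was crossed.
    repeats = (silence - OFFLINE_AFTER) // REPEAT_EVERY
    return 1 + min(repeats, MAX_REPEATS)
```

Under these assumptions, a workstation silent for 29 minutes would have produced two offline messages, and the count stops growing at four.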

Slack channels (one per facility):

  • ccrc-ndts-notifications
  • nmrfam-ndts-notifications
  • uchc-ndts-notifications

Version Tracking

  • On start-up the daemon writes its version to the log.
  • A file named /opt/nan-dtdaemon/running_workstation_version-X.Y.Z records the version and start time.
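
Because the version is part of the file name, a script can read it without opening the log. The sketch below assumes at most one such file is normally present and picks the most recently modified one if several exist. The function name `running_version` is illustrative.

```python
from pathlib import Path
from typing import Optional

PREFIX = "running_workstation_version-"


def running_version(install_dir: str = "/opt/nan-dtdaemon") -> Optional[str]:
    """Return the X.Y.Z suffix of the version marker file, or None if absent."""
    matches = list(Path(install_dir).glob(PREFIX + "*"))
    if not matches:
        return None
    # Assumption: if several marker files exist, the newest one is current.
    newest = max(matches, key=lambda p: p.stat().st_mtime)
    return newest.name[len(PREFIX):]
```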

Experiment Transfer Audit

Each processed experiment appends a line to

/opt/nan-dtdaemon/logs/ndtd_audit.txt

Fields: timestamp • workstation (Linux) user • selected NMRhub user (or unselected) • experiment start and end times • path to experiment data • daemon version • action (one of: sent, spooled, sent-spooled, skipped-trivial, skipped-disabled)
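
The field list can be modelled as a small record type for scripts that summarise the audit file. This is only a sketch: the page does not state the on-disk delimiter or timestamp format, so no parser is shown and every field is kept as a string. The names `AuditRecord` and `count_actions` are illustrative.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable

# The five action values listed on this page.
AUDIT_ACTIONS = frozenset(
    {"sent", "spooled", "sent-spooled", "skipped-trivial", "skipped-disabled"}
)


@dataclass(frozen=True)
class AuditRecord:
    timestamp: str
    workstation_user: str
    nmrhub_user: str      # "unselected" when no NMRhub user was chosen
    start: str
    end: str
    path: str
    daemon_version: str
    action: str

    def __post_init__(self) -> None:
        # Reject any action that is not one of the documented values.
        if self.action not in AUDIT_ACTIONS:
            raise ValueError(f"unknown audit action: {self.action!r}")


def count_actions(records: Iterable[AuditRecord]) -> Counter:
    """Tally how many experiments ended in each action."""
    return Counter(r.action for r in records)
```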

Daemon Log File

  • Main log:
    /opt/nan-dtdaemon/logs/nan-dtdaemon.log
  • Verbosity is controlled by log_level in ndtd_configuration.dat. Levels in increasing order of detail: fatal < error < warning < info < debug < trace.
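
The level ordering can be expressed as a simple comparison. The sketch below assumes the usual convention that a message is written when its level is at or below the configured log_level. The page does not spell this rule out, and `is_logged` is an illustrative name.

```python
# Levels in the order given on this page, least to most verbose.
LEVELS = ("fatal", "error", "warning", "info", "debug", "trace")


def is_logged(message_level: str, configured_level: str) -> bool:
    """True if a message at `message_level` is written under `configured_level`."""
    return LEVELS.index(message_level.lower()) <= LEVELS.index(configured_level.lower())
```

With log_level set to info, `is_logged("debug", "info")` is false while `is_logged("error", "info")` is true. This fits the troubleshooting note that a log left at trace grows quickly.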

Example start-up excerpt (INFO):

Thu Sep 28 13:17:03 2023 LOG_START Started dtd logger.
Thu Sep 28 13:17:03 2023 INFO NDTD Workstation version is 1.0.15
Thu Sep 28 13:17:03 2023 INFO *** This is a Topspin Workstation ***
Thu Sep 28 13:17:03 2023 INFO Ndtd Control Processor listening.
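
Lines in this excerpt follow the pattern "weekday month day time year TAG message". A minimal parsing sketch is shown below. It assumes every log line has this shape and that month and weekday names are in English. The names `LogEntry` and `parse_log_line` are illustrative.

```python
from datetime import datetime
from typing import NamedTuple


class LogEntry(NamedTuple):
    when: datetime
    tag: str       # e.g. LOG_START or INFO
    message: str


def parse_log_line(line: str) -> LogEntry:
    """Split one log line into timestamp, tag, and message."""
    # Five timestamp tokens, one tag token, then the rest of the line.
    parts = line.strip().split(None, 6)
    if len(parts) < 6:
        raise ValueError(f"unrecognised log line: {line!r}")
    when = datetime.strptime(" ".join(parts[:5]), "%a %b %d %H:%M:%S %Y")
    message = parts[6] if len(parts) > 6 else ""
    return LogEntry(when, parts[5], message)
```

Applied to the second line of the excerpt, this yields the tag INFO and the message "NDTD Workstation version is 1.0.15".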

Troubleshooting Checklist

Symptom                          What to Check
No new data reaches NAN          • service data-transport-daemon status
                                 • Latest heartbeat in vNOC
                                 • Gateway log for incoming files
Repeated offline Slack alerts    Workstation powered off? Network drop? Firewall blocking port 60195?
Log grows rapidly                log_level left at trace → reset to info
Experiments remain spooled       Gateway unreachable → verify IP/port and gateway service status
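
For the firewall and "Gateway unreachable" checks, a plain TCP connection test is often enough to tell a blocked port from a stopped service. The sketch below assumes the workstation opens a TCP connection to the Gateway on port 60195. This page names the port but not the direction or protocol, so treat that as an assumption. The function name `can_connect` is illustrative.

```python
import socket


def can_connect(host: str, port: int = 60195, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

A call such as `can_connect("gateway.example.org")` uses a placeholder host name; substitute the Gateway address configured for the facility.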

Next Step

Return to NDTS Overview or continue to Accessing Collected Data.