NDTS Daemon Operation: Difference between revisions

From Network for Advanced NMR
Created page with "= Running and Monitoring the Daemon = This page explains how to control the **data-transport-daemon** service, verify connectivity, and interpret the logs produced on each spectrometer workstation. == '''Starting, Stopping, and Checking Status''' == <pre> # Start the daemon sudo /sbin/service data-transport-daemon start # Stop the daemon sudo /sbin/service data-transport-daemon stop # Restart (reloads configuration) sudo /sbin/service data-transport-daemon restart #..."
 
No edit summary
Line 1: Line 1:
= Running and Monitoring the Daemon =
= Running and Monitoring the Daemon =


This page explains how to control the **data-transport-daemon** service, verify connectivity, and interpret the logs produced on each spectrometer workstation.
This page explains how to control the '''data-transport-daemon''' service, verify connectivity, and interpret the daemon’s log and audit files on every spectrometer workstation.


== '''Starting, Stopping, and Checking Status''' ==
== '''Service Control''' ==
<pre>
<pre>
# Start the daemon
# Start the daemon
Line 17: Line 17:
sudo /sbin/service data-transport-daemon status
sudo /sbin/service data-transport-daemon status
</pre>
</pre>
*The daemon will refuse to start if an instance is already running on the workstation.*
''The daemon refuses to start if another instance is already running.''


== '''Heartbeat and Connectivity''' ==
== '''Heartbeat and Connectivity''' ==
* By default the daemon sends a **heartbeat** to the Gateway every **10 minutes**.   
* The daemon sends a heartbeat to the Gateway every '''10&nbsp;minutes'''.   
* The Gateway forwards that heartbeat to the NDTS Receiver, where it is logged in the NAN Repository and surfaced in vNOC.
* The Gateway forwards that heartbeat to the NDTS Receiver; entries are visible in vNOC.


=== Slack Notifications ===
=== Slack Notifications ===
When a heartbeat is missed, the Receiver posts alerts to the facility’s Slack channel.
When heartbeats stop, the Receiver posts to the facility’s Slack channel.


{| class="wikitable"
{| class="wikitable"
! Condition !! Time-out !! Action !! Slack message
! Condition !! Time-out !! Receiver Action !! Slack Message
|-
|-
| Missed heartbeat &gt; 20 min || ≈ 20 min || Daemon marked '''offline''' || “*offline*” message (repeats once)
| First missed heartbeat > 20 min || ≈ 20 min || Mark workstation '''offline''' || ''offline''
|-
|-
| Heartbeat resumes || – || Daemon marked '''online''' || “*online*” message
| Heartbeat still missing (next poll) || +8 min || Re-post '''offline''' (max 3) || ''offline''
|-
| Heartbeat resumes || – || Mark workstation '''online''' || ''online''
|}
|}


Channels are named:
Slack channel names (one per facility):


* <code>ccrc-ndts-notifications</code>   
* <code>ccrc-ndts-notifications</code>   
Line 40: Line 42:
* <code>uchc-ndts-notifications</code>
* <code>uchc-ndts-notifications</code>


== '''Version Information''' ==
== '''Version Tracking''' ==
* On daemon start-up, the version is written to the log file (see below).
* On start-up, the daemon writes its version to the log.
* A file named **<code>/opt/nan-dtdaemon/running_workstation_version-X.Y.Z</code>** is created, timestamped with the start time.
* A file named
  <pre>/opt/nan-dtdaemon/running_workstation_version-X.Y.Z</pre>
  contains the version and start timestamp.


== '''Experiment Transfer Audit''' ==
== '''Experiment Transfer Audit''' ==
Every processed experiment adds one line to   
Every processed experiment appends one line to   
<pre>/opt/nan-dtdaemon/logs/ndtd_audit.txt</pre>
<pre>/opt/nan-dtdaemon/logs/ndtd_audit.txt</pre>


Fields:
Fields:


# Timestamp  # Workstation user  # NMRhub user (or ‘‘unselected’’)  # Start & End time   
# Timestamp   
# Path to data  # Daemon version  # Action
# Workstation (Linux) user   
(sent | spooled | sent-spooled | skipped-trivial | skipped-disabled)
# Selected NMRhub user (or ''unselected'')   
# Experiment start & end time   
# Path to experiment data   
# Daemon version   
# Action (sent • spooled • sent-spooled • skipped-trivial • skipped-disabled)


== '''Daemon Logs''' ==
== '''Daemon Log File''' ==
* Main log: <pre>/opt/nan-dtdaemon/logs/nan-dtdaemon.log</pre>
* Main log: <pre>/opt/nan-dtdaemon/logs/nan-dtdaemon.log</pre>
* **log_level** is set in <code>ndtd_configuration.dat</code>   
* Verbosity is set by '''log_level''' in <code>ndtd_configuration.dat</code>   
   (fatal &lt; error &lt; warning &lt; info &lt; debug &lt; trace).
   (fatal < error < warning < '''info''' < debug < trace).


Example start-up excerpt (level INFO):
Example start-up excerpt (level INFO):
Line 69: Line 77:
== '''Troubleshooting Checklist''' ==
== '''Troubleshooting Checklist''' ==
{| class="wikitable"
{| class="wikitable"
! Symptom !! Check
! Symptom !! What to Check
|-
|-
| No new data in NAN | • <code>service data-transport-daemon status</code>
| No new data reaches NAN || • <code>service data-transport-daemon status</code><br/>• Latest heartbeat timestamp in vNOC<br/>• Gateway log for incoming files
Heartbeat timestamp in vNOC
• Gateway log for incoming files
|-
|-
| Slack “offline” alerts | Workstation powered off? Network drop? Firewall blocking port 60195?
| Repeated ''offline'' Slack alerts || Workstation powered off? Network drop? Firewall still allowing port&nbsp;60195?
|-
|-
| Log file grows rapidly | <code>log_level trace</code> left enabled → reset to '''info'''
| Log growing rapidly || <code>log_level trace</code> left enabled → set back to '''info'''
|-
|-
| Experiments marked ‘‘spooled’’ only | Gateway unreachable → verify IP/port and gateway service status
| Experiments stay ''spooled'' || Gateway unreachable → verify IP/port and Gateway service status
|}
|}
== '''Next Step''' ==
''Return to [[NDTS Overview|NDTS Overview]] or proceed to [[NDTS_Data_Access|Accessing Collected Data]].''

Revision as of 19:23, 2 June 2025

Running and Monitoring the Daemon

This page explains how to control the data-transport-daemon service, verify connectivity, and interpret the daemon’s log and audit files on every spectrometer workstation.

Service Control

# Start the daemon
sudo /sbin/service data-transport-daemon start

# Stop the daemon
sudo /sbin/service data-transport-daemon stop

# Restart (reloads configuration)
sudo /sbin/service data-transport-daemon restart

# Check status
sudo /sbin/service data-transport-daemon status
  • The daemon refuses to start if another instance is already running.
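
Because a second instance is refused, a script that wants the daemon up can check the status first and start it only when needed. The following is a minimal sketch, not a documented procedure. It assumes the `status` action exits 0 when the daemon is running and non-zero otherwise (the usual init-script convention, not confirmed here). `SERVICE_CMD` and `ensure_daemon_running` are hypothetical names introduced for illustration.

```shell
# Hypothetical override so the function can be dry-run against a stub;
# the default is the command documented above.
SERVICE_CMD="${SERVICE_CMD:-sudo /sbin/service}"

# Start the daemon only if the status check reports that it is not running.
# Assumption: "status" exits 0 when running, non-zero otherwise.
ensure_daemon_running() {
    if $SERVICE_CMD data-transport-daemon status >/dev/null 2>&1; then
        echo "already running"
    elif $SERVICE_CMD data-transport-daemon start >/dev/null 2>&1; then
        echo "started"
    else
        echo "start failed" >&2
        return 1
    fi
}
```

Calling `ensure_daemon_running` from a login or cron script would then be safe to repeat, since it never issues `start` against a daemon that already reports as running.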

Heartbeat and Connectivity

  • The daemon sends a heartbeat to the Gateway every 10 minutes.
  • The Gateway forwards that heartbeat to the NDTS Receiver; entries are visible in vNOC.

Slack Notifications

When heartbeats stop, the Receiver posts to the facility’s Slack channel.

Condition | Time-out | Receiver Action | Slack Message
First missed heartbeat > 20 min | ≈ 20 min | Mark workstation offline | offline
Heartbeat still missing (next poll) | +8 min | Re-post offline (max 3) | offline
Heartbeat resumes | – | Mark workstation online | online
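
The timing in the table can be turned into a small calculation of how many offline messages to expect for a given period of silence. This is only one reading of the table, sketched under stated assumptions: the first post happens once silence exceeds 20 minutes, one re-post follows for each further 8 minutes, and "max 3" caps the total number of offline posts (it may instead cap the re-posts). The function name and variables are hypothetical.

```shell
# Assumed values taken from the table above.
OFFLINE_AFTER_MIN=20   # silence before the first "offline" post
REPOST_EVERY_MIN=8     # interval between re-posts
MAX_POSTS=3            # assumption: cap on total offline posts

# offline_posts MINUTES_SINCE_LAST_HEARTBEAT
# Prints how many "offline" messages would have been posted so far.
offline_posts() {
    silence=$1
    if [ "$silence" -le "$OFFLINE_AFTER_MIN" ]; then
        echo 0
        return 0
    fi
    posts=$(( (silence - OFFLINE_AFTER_MIN) / REPOST_EVERY_MIN + 1 ))
    if [ "$posts" -gt "$MAX_POSTS" ]; then
        posts=$MAX_POSTS
    fi
    echo "$posts"
}
```

Under these assumptions, a workstation that has been silent for 30 minutes would have produced two offline messages, and the count stops growing after the third.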

Slack channel names (one per facility):

  • ccrc-ndts-notifications
  • nmrfam-ndts-notifications
  • uchc-ndts-notifications

Version Tracking

  • On start-up, the daemon writes its version to the log.
  • A file named /opt/nan-dtdaemon/running_workstation_version-X.Y.Z contains the version and start timestamp.
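
Since the version is part of the marker file's name, it can be read without opening the log. The sketch below strips the fixed prefix from any matching file name. `running_version` is a hypothetical helper, and whether older marker files are removed on upgrade is not stated here, so the function prints every match it finds.

```shell
# running_version [DIR]
# Prints the X.Y.Z suffix of each running_workstation_version-* file in DIR
# (default: /opt/nan-dtdaemon). Prints nothing if no marker file exists.
running_version() {
    dir="${1:-/opt/nan-dtdaemon}"
    for f in "$dir"/running_workstation_version-*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$f" ] || continue
        echo "${f##*running_workstation_version-}"
    done
}
```

Running `running_version` with no argument checks the documented directory; passing a directory makes it easy to try against a copy.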

Experiment Transfer Audit

Every processed experiment appends one line to

/opt/nan-dtdaemon/logs/ndtd_audit.txt

Fields:

  1. Timestamp
  2. Workstation (Linux) user
  3. Selected NMRhub user (or unselected)
  4. Experiment start & end time
  5. Path to experiment data
  6. Daemon version
  7. Action (sent • spooled • sent-spooled • skipped-trivial • skipped-disabled)
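
A quick tally of the Action field shows at a glance whether experiments are being sent or piling up as spooled. The field separator used in ndtd_audit.txt is not documented on this page, so the sketch below assumes the action is the last whitespace-separated token on each line. Adjust it if the file uses another delimiter. `audit_action_counts` is a hypothetical name.

```shell
# audit_action_counts AUDIT_FILE
# Prints "<action> <count>" for each distinct action, sorted by action name.
# Assumption: the action is the last whitespace-separated token on each line.
audit_action_counts() {
    awk 'NF { count[$NF]++ } END { for (a in count) print a, count[a] }' "$1" | sort
}
```

For example, `audit_action_counts /opt/nan-dtdaemon/logs/ndtd_audit.txt` would list each action alongside the number of experiments recorded with it.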

Daemon Log File

  • Main log:
    /opt/nan-dtdaemon/logs/nan-dtdaemon.log
  • Verbosity is set by log_level in ndtd_configuration.dat; the levels, from least to most verbose, are fatal < error < warning < info < debug < trace.

Example start-up excerpt (level INFO):

Thu Sep 28 13:17:03 2023 LOG_START Started dtd logger.
Thu Sep 28 13:17:03 2023 INFO NDTD Workstation version is 1.0.15
Thu Sep 28 13:17:03 2023 INFO *** This is a Topspin Workstation ***
Thu Sep 28 13:17:03 2023 INFO Ndtd Control Processor listening.
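
The version line in this excerpt has a fixed wording, which makes it easy to pull the most recently started version out of the log. The sketch below relies only on the message text shown above ("NDTD Workstation version is"). `last_logged_version` is a hypothetical helper name.

```shell
# last_logged_version LOG_FILE
# Prints the version from the last "NDTD Workstation version is" line,
# i.e. the version reported at the most recent start-up in that file.
last_logged_version() {
    awk '/NDTD Workstation version is/ { v = $NF }
         END { if (v != "") print v }' "$1"
}
```

Applied to the excerpt above, `last_logged_version /opt/nan-dtdaemon/logs/nan-dtdaemon.log` prints 1.0.15.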

Troubleshooting Checklist

Symptom | What to Check
No new data reaches NAN | • service data-transport-daemon status • Latest heartbeat timestamp in vNOC • Gateway log for incoming files
Repeated offline Slack alerts | Workstation powered off? Network drop? Firewall still allowing port 60195?
Log growing rapidly | log_level trace left enabled → set back to info
Experiments stay spooled | Gateway unreachable → verify IP/port and Gateway service status
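
For the rapidly growing log, the configured level can be read straight from the configuration file. The sketch below assumes entries take the form `log_level <value>` separated by whitespace, as the `log_level trace` example suggests. The location of ndtd_configuration.dat is not given on this page, so the path is passed as an argument. `current_log_level` is a hypothetical helper.

```shell
# current_log_level CONFIG_FILE
# Prints the value of the last log_level entry in the file, if any.
# Assumption: entries look like "log_level <value>" (whitespace separated).
current_log_level() {
    awk '$1 == "log_level" { level = $2 }
         END { if (level != "") print level }' "$1"
}
```

If the function prints trace or debug on a production workstation, setting the level back to info and restarting the daemon (restart reloads the configuration) should slow the log growth.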

Next Step