Datasets: Difference between revisions

From Network for Advanced NMR
No edit summary
No edit summary
 
(45 intermediate revisions by one other user not shown)
Line 1: Line 1:
{{DataBrowser}}
{{Datasets}}
= Data Browser: Datasets =
= Data Browser: Datasets =


The Dataset Browser allows users to explore and manage datasets they are authorized to access. Access permissions are granted either because the dataset is ''public'' or through ''lab-based'', ''user-based'', or ''collaborative'' permissions authorized by a Principal Investigator (PI).
The Dataset Browser allows users to explore and manage datasets they are authorized to access. Access [[Lab Permissions|permissions]] are granted either because the dataset is ''public'' or through ''lab-based'', ''user-based'', or ''collaborative'' permissions authorized by a Principal Investigator (PI).


The Dataset Browser includes:
The Dataset Browser includes:
* A '''Navigation Pane''' on the left side for switching between dataset views and hierarchical organization of My & Lab data described below.
* A '''[[Datasets#Navigation Pane|Navigation Pane]]''' on the left side for switching between dataset views and hierarchical organization of My & Lab data described below.
* A '''Dataset''' '''Table''' for displaying rows of datasets with columns representing metadata that may be sorted and filtered.
* A [[Datasets#Dataset Table|'''Dataset''' '''Table''']] for displaying rows of datasets with columns representing metadata that may be sorted and filtered.
* '''Customization Tools''' in the upper-right corner to configure columns, saved views, and filters.
* '''[[Datasets#Customization Tools|Customization Tools]]''' in the upper-right corner to configure columns, saved views, and filters.
* Advanced '''Selection, Filtering and Sorting''' of datasets
* Advanced '''[[Datasets#Selection, Sorting, and Filtering|Selection, Sorting, and Filtering]]''' of datasets
* An '''Upload Datasets''' button to submit datasets that were not harvested by NDTS.
* Ability to perform [[Datasets#Actions|'''Actions''']] quickly and easily from a context menu to one or more datasets
* An '''[[Datasets#Upload Datasets|Upload Datasets]]''' button to submit datasets that were not harvested by NDTS.


== Navigation Pane ==
== Navigation Pane ==
The Navigation Pane allows users to quickly access datasets across different categories. Unauthenticated users will only see ''All Public Datasets'' and ''Knowledgebase Datasets''.
[[File:Navigation pane.png|thumb|Navigation pane]]
The Navigation Pane allows users to quickly access datasets across different categories. Unauthenticated users will only see '''''All Public Datasets''''' and '''''Knowledgebase Datasets'''''.


=== All Datasets ===
=== All Datasets ===
Line 24: Line 29:
=== My & Lab Data ===
=== My & Lab Data ===
: Displays datasets accessible via user- or lab-based permissions (excluding datasets that are visible only due to being public).
: Displays datasets accessible via user- or lab-based permissions (excluding datasets that are visible only due to being public).
: This section includes a hierarchical organization mirroring a file system:
: This section includes a hierarchical organization mirroring a file system including user defined Collections and Lab controlled [[Project, Studies, and Collections|Projects --> Studies --> Collections]]
:* '''My Collections''' – personal collections created by the user.
:* '''My Collections''' – personal collections created by the user.
:* '''Projects''' – high-level groupings for data organization
:* '''Projects''' – high-level groupings for data organization
:** '''Studies''' – reside inside Projects to allow datasets from a given study to be grouped
:** '''Studies''' – reside inside Projects to allow datasets from a given study to be grouped
:*** '''Collections''' – reside inside Studies to allow fine-grained dataset groupings\
:*** '''Collections''' – reside inside Studies to allow fine-grained dataset groupings


== Dataset Table ==
== Dataset Table ==
Line 37: Line 42:


=== Table Columns ===
=== Table Columns ===
Columns represent different metadata fields for the NAN dataset. There is a default list of columns that are displayed, but users can toggle different columns on and off as desired by selecting the wrench icon in the upper right hand corner of the dataset browser. Columns may be re-order by dragging them. The columns to be displayed, along with their order, is saved in the NAN database as a user preference and will persist across sessions, browsers, and computers. See [[Dataset Columns]] for a complete list of columns with a short description.
Columns represent different metadata fields for the NAN dataset. There is a default list of columns that are displayed, but users can toggle different columns on and off as desired by selecting the wrench icon in the upper right hand corner of the dataset browser. Columns may be re-order by dragging them. The columns to be displayed, along with their order, is saved in the NAN database as a user preference and will persist across sessions, browsers, and computers. See [[Dataset Columns]] for a complete list of columns with a short description and the types of filters that may be applied.


==== Redundant Status Column ====
* By default, the Data Browser displays only '''Preferred''' datasets to provide a clean and responsive interface.
* Users can identify datasets with redundancies via an icon on the preferred dataset. Clicking the icon opens the full set for review (see Icon badges below)
* A '''Redundant''' column may be enabled in the browser view. When this column is active, the Data Browser shows both preferred and redundant datasets for complete visibility.
==== Display Name / Dataset Name ====
==== Display Name / Dataset Name ====
When a dataset is harvested by the NAN Data Transport System it is stored in the NAN database with a unique UUID (hidden from the user) and is given a Dataset Name (non-editable) that matches the experimental directory from the NMR spectrometer.
When a dataset is harvested by the NAN Data Transport System it is stored in the NAN database with a unique UUID (hidden from the user) and is given a Dataset Name (non-editable) that matches the experimental directory from the NMR spectrometer.
Line 58: Line 67:


== Customization Tools ==
== Customization Tools ==
[[File:Customization setting-tools.png|thumb|Tools to customize and utilize views and filters]]
[[File:Customization setting-tools.png|thumb|Tools to customize and utilize views and filters|400x400px]]


=== Wrench Icon ===
=== Wrench Icon ===


* Brings up a pull-down menu to toggle which columns are shown in the dataset table.
* Brings up a pull-down menu to toggle which columns are shown in the dataset table.
* Allows a View to be Created, overwritten, or deleted. Note that the columns that are displayed are saved as a user preference and are not tied to a View. A View defines the applied filters and sorts to the columns and is independent on which columns are visible. For example, maybe you had a project where all the datasets were collected between two dates so you define a View to filter only datasets from specific users involved in the project that lie between two dates so that you can quickly see those datasets without the need to reapply the filters.
* Allows a View to be created, overwritten, or deleted. Note that the columns that are displayed are saved as a user preference and are not tied to a View. A View defines the applied filters and sorts to the columns and is independent on which columns are visible. For example, maybe you had a project where all the datasets were collected between two dates so you define a View to filter only datasets from specific users involved in the project that lie between two dates so that you can quickly see those datasets without the need to reapply the filters.


=== Saved Views ===
=== Saved Views ===
Line 88: Line 97:


* Shown as a circle with a line through it. The Icon become visible when one or more datasets are selected and pressing it will clear all selections. Can be very handy when datasets are selected, but not visible on the screen.
* Shown as a circle with a line through it. The Icon become visible when one or more datasets are selected and pressing it will clear all selections. Can be very handy when datasets are selected, but not visible on the screen.
== Selection, Sorting, and Filtering ==
[[File:Filter on NAN User.png|thumb|Text filter with two rules]]
=== Selection ===
Datasets can be selected by clicking on the ''Display Name''. The Dataset Browser supports multi-selection, with a checkbox next to each Display Name indicating selection status.
To select multiple datasets:
* Hold the ''Ctrl'' key (or ''Cmd'' on Mac) and click on Display Names to toggle individual selections.
* Hold the ''Shift'' key to select a range from the last selected to the current dataset.
'''''IMPORTANT NOTE''''': There is inconsistent behavior when using ''Shift'' and ''Ctrl'' keys with the checkboxes. It is strongly recommended to use the Display Name for selection. Treat checkboxes only as visual indicators.


== Selection, Filtering, and Sorting ==
By default, datasets are sorted by date, with the most recent shown first. Sorting and filtering are available for all columns.
The default sorting of datasets on the dataset browser is by date with the most recent dataset on top. However, users may sort and filter based on any column.


Each column has a sort button that when pressed will sort the column in ascending order. Clicking it again will sort in descending order.
=== Sorting ===


Each column header includes a sort button (up/down arrows). Click once to sort in ascending order; click again to sort in descending order.


<nowiki>***</nowiki>
=== Filtering ===


=== How to Download Experiments ===
Each column header includes a filter icon that opens a filtering dialog. The available filter types depend on the column's data type. The table below summarizes available filters. See [[Dataset Columns]] for which filter type apply to each column.
# Select one or more experiments using the checkbox icon or by right-clicking.
# Right-click and choose '''Download'''.
# Select the download format:
#* '''Organized for Topspin''' – maintains Bruker format hierarchy.
#* '''Organized by Experiment''' – each experiment in its own folder.


----
{| class="wikitable"
|+ Filter Types
! Date !! Boolean !! Text !! Number !! Controlled List<br />''Classification'' !! Controlled List<br />''Transfer Mode'' !! Tags
|-
| equals || yes || equals || equals || || || includes
|-
| before || no || does not equal || does not equal || || || does not include
|-
| after || is unset || contains || greater than || || ||
|-
|  || is set || does not contain || less than || || ||
|-
|  ||  || similar to ||  || || ||
|-
|  ||  || starts with ||  || || ||
|-
|  ||  || ends with ||  || || ||
|-
|  ||  || is unset ||  || || ||
|}


=== How to Link Datasets to a Sample ===
For all filter types except Boolean, users can add multiple filter rules per column. If multiple rules are added, the user must specify whether to "Match All" (AND) or "Match Any" (OR). This setting is ignored if only one rule is applied.
There are two ways:
# '''From the Dataset Editor'''
#* Double-click a dataset you have '''write access''' to.
#* Click '''Find & Link Sample''', select a sample, and click '''Save'''.


# '''From the Table View'''
Advanced filters can also span multiple columns. While building complex filters may take effort, users can save views for reuse. See [[Datasets#Customization Tools|Customization Tools]] for details.
#* Select datasets, right-click, and choose '''Link Sample'''.
#* Select a sample and click '''Save'''.


----
== Actions ==
[[File:Dataset Context Menu.png|thumb|364x364px|Context menu for dataset actions]]


=== Context Menu ===
The <nowiki>'''</nowiki>Actions<nowiki>'''</nowiki> menu is accessed by right-clicking on a dataset row in the Dataset Browser. For multiple selections, right-click on any of the selected rows to perform bulk actions. Available actions depend on user permissions—actions unavailable to the user will appear grayed out.


Right-clicking a dataset opens the context menu. Available actions depend on the user’s permissions; unavailable actions appear grayed out.
Below is a table of available actions with some providing links to a page with additional details.


Available options may include:
{| class="wikitable"
|+
! Action
! Bulk Action Capable
! Description
|-
| [[Dataset Editing|View / Edit Dataset]]
| Possibly
| Opens a modal window to view or edit the selected dataset.
|-
| [[Dataset reassignment|Reassign]]
| Yes
| Assigns or reassigns a dataset to a ''NAN user''. Facility managers can reassign datasets to any user without time restrictions. Standard users can reassign within their lab group for up to three months after harvesting.
|-
| [[Download Datasets|Download]]
| Yes
| Downloads datasets in a variety of organizational layouts.
|-
| [[NMRbox Integration]]
| Yes
| Copies a dataset from the NAN archive to the user’s NMRbox home folder in a predefined location. Also enables retrieval of post-acquisition files from NMRbox back into the NAN archive.
|-
| [[Supplemental Data]]
| No
| Adds or views supplemental data associated with a dataset.
|-
| [[Redundant Datasets|Redundancy]]
| Yes
| Sets the dataset’s ''redundancy status'' as “preferred” or “redundant.” By default, the most recent experiment in a redundant set is marked as preferred.
|-
| [[Public and Publishing|Make Public]]
| Yes
| Marks the dataset as publicly available.
|-
| [[Link Sample]]
| Yes
| Links a dataset to a sample.
|-
| [[Dataset Classification|Classification]]
| Yes
| Allows uses to classify datasets from a controlled list. Allows NMR facility managers to target a dataset to be removed from the NAN archive.
|-
| Tags
| Yes
| Allows users to assign arbitrary tags to datasets.
|-
| Notes
| Yes
| Allows users to add notes to datasets.
|-
| Unlink from Collection
| Yes
| Removes a dataset from a collection.
|-
| [[Public and Publishing|Publish]]
| Yes
| Publishes a dataset.
|-
|Copy Dataset Link
|No
|Copies the URL of a dataset to the Clipboard
|}


# '''Edit Dataset'''
== Upload Datasets ==
#* Update metadata, classification, or redundancy status.
# '''Reassign'''
#* Assign to another lab user or reject misaligned data.
#* Rejected data (within 3 months) goes to the facility manager.
# '''Download'''
#* Download dataset(s).
# '''NMRbox Integration'''
#* Copy dataset to NMRbox home directory.
# '''Supplemental Data'''
#* Upload related data files.
# '''Redundancy'''
#* Mark as preferred/redundant.
# '''Link Sample'''
#* Associate with a sample (shows up in the Sample column).
# '''Classification'''
#* Label as:
#** Calibration experiment 
#** Failed – sample, instrument, or setup related 
#** Successful experiment 
#** Test experiment 
# '''Tags'''
#* Add searchable tags.
# '''Notes'''
#* Add or edit descriptive notes.
# '''Unlink from Collection'''
#* Remove from a dataset collection.
# '''Make Public'''
#* Permanently make dataset public (cannot be undone).
# '''Publish'''
#* Permanently publish dataset (cannot be undone or edited).


See the [[Arbitrary Dataset Upload]] page for details
[[Category:Data Browser]]
[[Category:Data Browser]]

Latest revision as of 18:48, 23 June 2025

Data Browser: Datasets

The Dataset Browser allows users to explore and manage datasets they are authorized to access. Access permissions are granted either because the dataset is public or through lab-based, user-based, or collaborative permissions authorized by a Principal Investigator (PI).

The Dataset Browser includes:

  • A Navigation Pane on the left side for switching between dataset views and hierarchical organization of My & Lab data described below.
  • A Dataset Table for displaying rows of datasets with columns representing metadata that may be sorted and filtered.
  • Customization Tools in the upper-right corner to configure columns, saved views, and filters.
  • Advanced Selection, Sorting, and Filtering of datasets.
  • The ability to perform Actions on one or more datasets quickly and easily from a context menu.
  • An Upload Datasets button to submit datasets that were not harvested by NDTS.

Navigation Pane

Navigation pane

The Navigation Pane allows users to quickly access datasets across different categories. Unauthenticated users will only see All Public Datasets and Knowledgebase Datasets.

All Datasets

Displays all datasets the user can access, including public and permission-granted datasets.

All Public Datasets

Displays only datasets marked as public.

Knowledgebase Datasets

Displays a curated subset of public datasets that are highly annotated and intended to aid users in experimental planning and analysis.

My & Lab Data

Displays datasets accessible via user- or lab-based permissions (excluding datasets that are visible only due to being public).
This section includes a hierarchical organization mirroring a file system, with user-defined Collections and lab-controlled Projects --> Studies --> Collections:
  • My Collections – personal collections created by the user.
  • Projects – high-level groupings for data organization.
    • Studies – reside inside Projects to allow datasets from a given study to be grouped.
      • Collections – reside inside Studies to allow fine-grained dataset groupings.
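The nesting described above can be pictured as a small tree. The following is only an illustrative sketch: the class names, field names, and sample project are hypothetical and do not reflect how the NAN database actually stores this hierarchy.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    name: str
    dataset_names: list[str] = field(default_factory=list)


@dataclass
class Study:
    name: str
    collections: list[Collection] = field(default_factory=list)


@dataclass
class Project:
    name: str
    studies: list[Study] = field(default_factory=list)


def collection_paths(projects):
    """Return file-system-like paths of the form Project/Study/Collection."""
    return [
        f"{p.name}/{s.name}/{c.name}"
        for p in projects
        for s in p.studies
        for c in s.collections
    ]


# Hypothetical example data.
projects = [
    Project("Enzyme kinetics", [
        Study("Mutant screen", [Collection("HSQC"), Collection("NOESY")]),
    ]),
]
paths = collection_paths(projects)
```

Each path reads like a folder path, which is the sense in which the hierarchy mirrors a file system.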

Dataset Table

The Dataset Table displays all datasets for which the user has at least read access.

Table Rows

Each row represents a dataset in the NAN archive.

Table Columns

Columns represent different metadata fields for the NAN dataset. A default list of columns is displayed, but users can toggle columns on and off by selecting the wrench icon in the upper-right corner of the dataset browser. Columns may be reordered by dragging them. The displayed columns, along with their order, are saved in the NAN database as a user preference and persist across sessions, browsers, and computers. See Dataset Columns for a complete list of columns with a short description and the types of filters that may be applied.

Redundant Status Column

  • By default, the Data Browser displays only Preferred datasets to provide a clean and responsive interface.
  • Users can identify datasets with redundancies via an icon on the preferred dataset. Clicking the icon opens the full set for review (see Icon badges below).
  • A Redundant column may be enabled in the browser view. When this column is active, the Data Browser shows both preferred and redundant datasets for complete visibility.
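The display rule above can be sketched as a simple filter. This is an assumption-laden illustration, not the Data Browser's implementation: the `status` field and its values are hypothetical, and datasets that belong to no redundant set are assumed to be shown in both modes.

```python
def visible_datasets(datasets, redundant_column_enabled):
    """Return the rows to show, depending on whether the Redundant column is on."""
    if redundant_column_enabled:
        # Column enabled: preferred and redundant datasets are both shown.
        return list(datasets)
    # Default: hide anything explicitly marked redundant (assumed field name).
    return [d for d in datasets if d["status"] != "redundant"]


# Hypothetical example data.
datasets = [
    {"name": "exp1", "status": "preferred"},
    {"name": "exp2", "status": "redundant"},
    {"name": "exp3", "status": None},  # not part of a redundant set
]
default_names = [d["name"] for d in visible_datasets(datasets, False)]
all_names = [d["name"] for d in visible_datasets(datasets, True)]
```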

Display Name / Dataset Name

When a dataset is harvested by the NAN Data Transport System it is stored in the NAN database with a unique UUID (hidden from the user) and is given a Dataset Name (non-editable) that matches the experimental directory from the NMR spectrometer.

  • VNMRJ: "expN"
  • Bruker: "experiment/N"

As the Dataset Name is generally not a useful description of the experiment, we also create a Display Name that users can edit to give the dataset a more descriptive and meaningful label. When downloading, the dataset is saved using the original Dataset Name and the Display Name is saved in a CSV file within the dataset folder. Note that the Display Name is fixed as the first column of the dataset table and cannot be moved. The Dataset Name column is not displayed by default, but can be toggled on if desired.

Icon badges

Icon badges in select columns represent additional information about the dataset and provide navigation links as described here:

  • Display Name icon badges
    • A circle with a star indicates that the dataset is marked as the "preferred" dataset of a redundant set. The total number of datasets in the redundant set is shown in parentheses. Clicking the star shows all the datasets in the redundant set and provides a breadcrumb to navigate back to the default view.
    • A small clock icon indicates that the dataset has been published. Clicking the icon will allow you to navigate to previous published versions or the original dataset.
  • Sample icons
    • A link icon indicates that a sample has been linked to the dataset and clicking the link icon will bring you to the Sample Browser filtered on the linked sample.

Pagination

At the bottom of the table is a pagination control. Users can move between pages and adjust the number of rows displayed per page: 25, 50, 100, or 500 datasets per page.
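As a rough illustration of how the page size affects navigation, the sketch below computes the number of pages and the rows on a given page. The function names are hypothetical; only the four page sizes come from the text above.

```python
import math

PAGE_SIZES = (25, 50, 100, 500)  # rows-per-page options listed above


def page_count(total_rows, page_size):
    """Number of pages needed to show total_rows (at least one page)."""
    if page_size not in PAGE_SIZES:
        raise ValueError(f"unsupported page size: {page_size}")
    return max(1, math.ceil(total_rows / page_size))


def page_rows(rows, page, page_size):
    """Rows shown on a 1-based page number."""
    start = (page - 1) * page_size
    return rows[start:start + page_size]


pages = page_count(1234, 50)
last_page = page_rows(list(range(120)), 3, 50)
```

With 1234 datasets and 50 rows per page, for example, the final page holds only the remaining 34 rows.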

Customization Tools

Tools to customize and utilize views and filters

Wrench Icon

  • Brings up a pull-down menu to toggle which columns are shown in the dataset table.
  • Allows a View to be created, overwritten, or deleted. Note that the displayed columns are saved as a user preference and are not tied to a View. A View defines the filters and sorts applied to the columns and is independent of which columns are visible. For example, if all the datasets for a project were collected between two dates, you can define a View that shows only datasets from the users involved in the project that fall between those dates, so that you can quickly return to those datasets without reapplying the filters.
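The separation between column preferences and Views can be sketched as two independent records. Everything here is hypothetical (class names, field names, the rule strings, and the sample column names); the point is only that a View can filter or sort on a column that is not currently visible.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnPreference:
    """Per-user setting: which columns are shown, in display order."""
    visible_columns: list[str]


@dataclass
class View:
    """Saved filters and sorts; holds no information about visible columns."""
    name: str
    filters: dict[str, list[str]] = field(default_factory=dict)  # column -> rules
    sorts: dict[str, str] = field(default_factory=dict)          # column -> "asc" or "desc"


def hidden_filtered_columns(view, prefs):
    """Columns the view filters or sorts on that are not currently visible."""
    used = list(view.filters) + [c for c in view.sorts if c not in view.filters]
    return [c for c in used if c not in prefs.visible_columns]


# Hypothetical example: a project view applied while "NAN User" is hidden.
prefs = ColumnPreference(visible_columns=["Display Name", "Date"])
view = View(
    name="Project X",
    filters={
        "Date": ["after 2025-01-01", "before 2025-03-31"],
        "NAN User": ["equals alice"],
    },
    sorts={"Date": "desc"},
)
hidden = hidden_filtered_columns(view, prefs)
```

Applying the view leaves `prefs` untouched, and `hidden` lists the columns whose filters are active without being visible.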

Saved Views

  • Pull-down list of saved views (defined filters and sorts).

Quick Filters

Quick filters apply predefined views to narrow down datasets. Current Quick Filters include:

  • Successful Datasets Only – shows datasets marked as successful.
  • Hide Failed Datasets – hides datasets marked as failed.
  • My Data – datasets owned by the logged-in user.
  • KB Datasets – datasets published in the Knowledgebase.

Note that the successful and failed dataset filters rely on datasets being classified correctly.
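The dependence on classification can be seen in a small sketch. The `classification` field and its values are hypothetical stand-ins for the controlled list; the sketch only shows why an unclassified dataset behaves differently under the two quick filters.

```python
# Hypothetical example data; exp3 has never been classified.
datasets = [
    {"name": "exp1", "classification": "successful"},
    {"name": "exp2", "classification": "failed"},
    {"name": "exp3", "classification": None},
]


def successful_only(rows):
    """Keep only datasets explicitly marked as successful."""
    return [d["name"] for d in rows if d["classification"] == "successful"]


def hide_failed(rows):
    """Drop only datasets explicitly marked as failed."""
    return [d["name"] for d in rows if d["classification"] != "failed"]


shown_successful = successful_only(datasets)
shown_not_failed = hide_failed(datasets)
```

Under these assumptions an unclassified dataset is hidden by Successful Datasets Only but still shown by Hide Failed Datasets.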

Remove Filters Icon

  • When no filters are applied to any column, the icon appears faded and is not selectable.
  • When the icon is not faded, pressing it clears all applied filters and sorts.
  • When the icon contains an exclamation point, filters or sorts are active for a non-visible column. Pressing the icon prompts whether to remove all filters and sorts or only those for the non-visible columns.

Selection Icon

  • Shown as a circle with a line through it. The icon becomes visible when one or more datasets are selected, and pressing it clears all selections. This is handy when datasets are selected but not visible on the screen.

Selection, Sorting, and Filtering

Text filter with two rules

Selection

Datasets can be selected by clicking on the Display Name. The Dataset Browser supports multi-selection, with a checkbox next to each Display Name indicating selection status.

To select multiple datasets:

  • Hold the Ctrl key (or Cmd on Mac) and click on Display Names to toggle individual selections.
  • Hold the Shift key and click a Display Name to select the range from the last selected dataset to the clicked one.

IMPORTANT NOTE: There is inconsistent behavior when using Shift and Ctrl keys with the checkboxes. It is strongly recommended to use the Display Name for selection. Treat checkboxes only as visual indicators.
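The click behavior described above can be modeled roughly as follows. This is a sketch under stated assumptions, not the Dataset Browser's code: it assumes a plain click replaces the current selection and that a Shift-click adds the range to whatever is already selected, neither of which is confirmed by this page.

```python
class Selection:
    """Minimal model of Display Name clicks with Ctrl and Shift modifiers."""

    def __init__(self, row_ids):
        self.row_ids = list(row_ids)  # rows in on-screen order
        self.selected = set()
        self.anchor = None            # last clicked row

    def click(self, row_id, ctrl=False, shift=False):
        if shift and self.anchor is not None:
            # Select every row between the last clicked row and this one.
            i, j = sorted((self.row_ids.index(self.anchor),
                           self.row_ids.index(row_id)))
            self.selected.update(self.row_ids[i:j + 1])
        elif ctrl:
            # Toggle this row without touching the others.
            self.selected ^= {row_id}
        else:
            # Assumption: a plain click selects only this row.
            self.selected = {row_id}
        self.anchor = row_id

    def clear(self):
        """Equivalent of pressing the Selection icon."""
        self.selected.clear()
        self.anchor = None


sel = Selection(["a", "b", "c", "d", "e", "f"])
sel.click("b")
sel.click("e", shift=True)   # selects b through e
sel.click("c", ctrl=True)    # toggles c off
result = sorted(sel.selected)
```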

By default, datasets are sorted by date, with the most recent shown first. Sorting and filtering are available for all columns.

Sorting

Each column header includes a sort button (up/down arrows). Click once to sort in ascending order; click again to sort in descending order.

Filtering

Each column header includes a filter icon that opens a filtering dialog. The available filter types depend on the column's data type. The table below summarizes the available filters. See Dataset Columns for which filter types apply to each column.

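{| class="wikitable"
|+ Filter Types
! Date !! Boolean !! Text !! Number !! Controlled List<br />''Classification'' !! Controlled List<br />''Transfer Mode'' !! Tags
|-
| equals || yes || equals || equals || || || includes
|-
| before || no || does not equal || does not equal || || || does not include
|-
| after || is unset || contains || greater than || || ||
|-
|  || is set || does not contain || less than || || ||
|-
|  ||  || similar to ||  || || ||
|-
|  ||  || starts with ||  || || ||
|-
|  ||  || ends with ||  || || ||
|-
|  ||  || is unset ||  || || ||
|}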

For all filter types except Boolean, users can add multiple filter rules per column. If multiple rules are added, the user must specify whether to "Match All" (AND) or "Match Any" (OR). This setting is ignored if only one rule is applied.
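The difference between "Match All" and "Match Any" can be sketched for a handful of the text operators. The function and dictionary names are hypothetical, comparisons are assumed to be case-sensitive, and "similar to" and "is unset" are omitted because their exact semantics are not described here.

```python
# A subset of the text operators from the Filter Types table.
TEXT_OPERATORS = {
    "equals": lambda value, arg: value == arg,
    "does not equal": lambda value, arg: value != arg,
    "contains": lambda value, arg: arg in value,
    "does not contain": lambda value, arg: arg not in value,
    "starts with": lambda value, arg: value.startswith(arg),
    "ends with": lambda value, arg: value.endswith(arg),
}


def matches(value, rules, mode="all"):
    """Apply (operator, argument) rules to one cell value.

    mode="all" corresponds to "Match All" (AND),
    mode="any" corresponds to "Match Any" (OR).
    """
    results = [TEXT_OPERATORS[op](value, arg) for op, arg in rules]
    return all(results) if mode == "all" else any(results)


# Hypothetical values for a text column such as NAN User.
users = ["alice", "albert", "bob"]
rules = [("starts with", "al"), ("ends with", "e")]
match_all = [u for u in users if matches(u, rules, "all")]
match_any = [u for u in users if matches(u, rules, "any")]
```

With a single rule, both modes return the same rows, which is why the setting is ignored in that case.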

Advanced filters can also span multiple columns. While building complex filters may take effort, users can save views for reuse. See Customization Tools for details.

Actions

Context menu for dataset actions

The Actions menu is accessed by right-clicking on a dataset row in the Dataset Browser. For multiple selections, right-click on any of the selected rows to perform bulk actions. Available actions depend on user permissions; actions unavailable to the user appear grayed out.

The table below lists the available actions; some have a dedicated page with additional details.

{| class="wikitable"
! Action !! Bulk Action Capable !! Description
|-
| View / Edit Dataset || Possibly || Opens a modal window to view or edit the selected dataset.
|-
| Reassign || Yes || Assigns or reassigns a dataset to a NAN user. Facility managers can reassign datasets to any user without time restrictions. Standard users can reassign within their lab group for up to three months after harvesting.
|-
| Download || Yes || Downloads datasets in a variety of organizational layouts.
|-
| NMRbox Integration || Yes || Copies a dataset from the NAN archive to the user’s NMRbox home folder in a predefined location. Also enables retrieval of post-acquisition files from NMRbox back into the NAN archive.
|-
| Supplemental Data || No || Adds or views supplemental data associated with a dataset.
|-
| Redundancy || Yes || Sets the dataset’s redundancy status as “preferred” or “redundant.” By default, the most recent experiment in a redundant set is marked as preferred.
|-
| Make Public || Yes || Marks the dataset as publicly available.
|-
| Link Sample || Yes || Links a dataset to a sample.
|-
| Classification || Yes || Allows users to classify datasets from a controlled list. Allows NMR facility managers to target a dataset to be removed from the NAN archive.
|-
| Tags || Yes || Allows users to assign arbitrary tags to datasets.
|-
| Notes || Yes || Allows users to add notes to datasets.
|-
| Unlink from Collection || Yes || Removes a dataset from a collection.
|-
| Publish || Yes || Publishes a dataset.
|-
| Copy Dataset Link || No || Copies the URL of a dataset to the clipboard.
|}
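The time restriction on the Reassign action can be illustrated with a short check. This is a sketch only: the role names and the function are hypothetical, and "three months" is approximated here as 90 days, which may not match how the NAN portal counts the window.

```python
from datetime import date, timedelta

# Assumption: "three months" approximated as 90 days.
REASSIGN_WINDOW = timedelta(days=90)


def can_reassign(role, harvested_on, today, same_lab=True):
    """Whether a user may reassign a dataset under the rules in the table."""
    if role == "facility_manager":
        # Facility managers: any user, no time restriction.
        return True
    # Standard users: within their lab group and within the window.
    return same_lab and (today - harvested_on) <= REASSIGN_WINDOW


harvested = date(2025, 1, 1)
within_window = can_reassign("user", harvested, date(2025, 3, 1))
after_window = can_reassign("user", harvested, date(2025, 6, 1))
manager_late = can_reassign("facility_manager", harvested, date(2025, 6, 1))
```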

Upload Datasets

See the Arbitrary Dataset Upload page for details.