flowchart LR A[Efficiency] --> B[Collaboration] B --> C[Reproducibility]
A guide for researchers
Agenda
Why do we care about sharing data?
Current issues with data
Principles of sharing data
General guidelines for dataset deposits
Data submission checklist
Canadian generalist repositories
Some reasons to share research data:
Avoid unnecessary or costly experiments by using previous research results.
Validate findings: Independent verification of research results.
Repurpose data for new research questions, in combination with other datasets or as an educational resource.
Build on previous work to accelerate scientific discovery and meta-analysis based on raw data and reproducible research.
The Goverment of Canada promotes RDM in its Tri-Agency Research Data Management Policy.
Through its federal funding agencies, the Government of Canada seeks to implement data management plans (DMPs) and sharing of research data to maximize the benefits to society.
Data sharing is NOT ONLY to meet the requirements of funding agencies and journals. It is an ethical and professional responsibility of researchers to ensure access and reuse of data in the context of open reproducible science.
flowchart LR A[Efficiency] --> B[Collaboration] B --> C[Reproducibility]
flowchart LR A[Rigorosity] --> B[Validation] B --> C[Open science]
flowchart LR A["Transparency"] --> B["Accountability"] B--> C["Return on investment"]
Agenda
Why do we care about sharing data?
Current issues with data
Principles of sharing data
General guidelines for dataset deposits
Data submission checklist
Canadian generalist repositories
In most research environments, data are stored across a wide range of locations, and often, in an uncoordinated way.
Although many research articles claim that data are “available upon request,” the reality often tells a different story.
Data availability statement
“The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.”
When we shared data typically:
Lacks comprehensive metadata and README file(s) explaining the context, methodology, and structure of the dataset.
Unstructured and lacks naming conventions, making it impossible to reuse it.
Are considered supplement to research articles.
Agenda
Why do we care about sharing data?
Current issues with data
Principles of sharing data
General guidelines for dataset deposits
Data submission checklist
Canadian generalist repositories
This means that the dataset should be:
Well documented: Others should be able to understand the dataset without needing access to the original researchers or article.
Organized: File structures, naming conventions, and folder hierarchies should be logical and easy to follow.
Discoverable: The dataset should be accesible and reusable by the community.
In general, datasets must follow the FAIR principles:
Findable
Accessible
Interoperable
Reusable
Agenda
Why do we care about sharing data?
Current issues with data
Principles of sharing data
General guidelines for dataset deposits
Data submission checklist
Canadian generalist repositories
Provide a descriptive title, summary and keywords that reflect the content of the dataset.
Include a READMEfile with metadata.
Define a dataset structure and consistent naming conventions.
Make sure that the title reflects the nature and content of the dataset.
Original title: PiPaw2.0
Improved title: Home cage based motor learning platform PiPaw2.0
Original title: Foliar Functional Trait Mapping
Improved title: Foliar Functional Trait Mapping of a mixed temperate forest using imaging spectroscopy
Original title: Covariation in Width and Depth in Bedrock Rivers Data Archive
Improved title: Data archive for width and depth covariation within the bedrock Fraser Canyon, British Columbia, Canada
Caution
The title of the dataset IS NOT the same as the title of the associated research article.
The description of the dataset shall contain sufficient elements to understand its nature, origin and content.
Original description: This dataset provides climate data (19 bioclimate variables as defined by worldclim) that were generated using the Biosim 11 software at a spatial resolution of 9 km across Canada between 1980-2020.
Improved description: This dataset provides climate data (19 bioclimate variables as defined by worldclim) that were generated using the Biosim 11 software at a spatial resolution of 9 km across Canada between 1980-2020. Please refer to https://www.worldclim.org/data/bioclim.html for information about the variables. The dataset contains: the annual mean temperature, mean diurnal range, isothermality, temperature seasonality, maximum temperature of warmest month, minimum temperature of coldest month, temperature annual range, mean temperature of wettest quarter, mean temperature of driest quarter, mean temperature of warmest quarter, mean temperature of coldest quarter, annual precipitation, precipitation of wettest month, precipitation of driest month, precipitation seasonality (coefficient of variation), precipitation of wettest quarter, precipitation of driest quarter, precipitation of warmest quarter, precipitation of coldest quarter.
Original description: Exposure to neuromodulatory chemicals in the polychaete marine worm, Capitella teleta, has been used to assess changes in locomotory behavior in adult and juvenile life stages.Worms were exposed to nicotine, fluoxetine, apomorphine, and phenobarbital and had their distance moved, maximum velocity, time to/at the edge of the arena, and time to first move measured.
Improved description: The presence of compounds such as pharmaceuticals and pesticides act as neurochemicals in aquatic organisms. This repository contains the raw data from a study investigating the effects of neuromodulatory chemicals in the marine polychaete worm Capitella teleta. We investigated the effects of nicotine, fluoxetine, apomorphine and phenobarbital, which are known to interact with acetylcholine, serotonin, dopamine and GABA pathways. We measured locomotory behavior using a high throughput multi-well plate assay, using parameters such as total distance moved, time spent moving, time spent at the edge and maximum velocity. We also performed RNA extraction and sequencing with juvenile and adult worms to determine if genes in the pathway were expressed. We share gene sequences, alignments, motif searching, and phylogenetic analysis files for each receptor (with acetylcholine, serotonin, dopamine and GABA) and videos, together with raw .fasta files for RNA sequencing and R code for processing/analysis.
Tip
Use numerous keywords to increase the discoverability of the dataset.
To find relevant keywords, ask yourself the following question:
What terms can a reuser use in a search box to find my data?
The README file is a guide to understanding the dataset and enabling its reuse.
FRDR users can use our text or [web] templates to generate a README file for submission to FRDR.
Additional resources are:
- Creating a README file
- README.so
- README.ai
Typically, a README file includes:
Dataset identifiers, like title, authors, date of collection, and geographical information.
A folder tree, defining the hierarchy of folders and files.
A description of the naming conventions used for files and folders.
Methods for data collection/generation, analysis, and experimental conditions.
Caution
The dataset is an independent object. Methods and tools for data collection MUST NOT be relegated to the associated research article.
A set of instructions and software required to open and handle the files.
Sharing and access information, detailing permissions and terms of use.
A structured dataset is the key to understanding and reusing it.
Define a structure for the data at the beginning (best) or during the course of the research.
Think about
Overall, ensure that the dataset structure is logical and consistent, understandable to external users.
TIER 4.0 is a project template to standardize datasets.
Download the project structure and adapt it to specific cases.
Shared data includes:
Tip
Visit our resource on handling and organizing research data.
Despite being the most common file type (.xls) for recording/storing data, tables are the most poorly organized and unusable objects in research.
Among the most problematic formatting issues are: merged cells that confuse parsing tools, inconsistent structures where multiple variables are crammed into a single column or row, and mixed content, such as tables combined with graphs or formatting.
While most scientists may be more comfortable with GUIs, the current research landscape requires the use of scripts and code to ensure reproducible research results.
Tip
Coding should be considered an essential skill like other research methods.
Logical naming conventions are the key to linking the raw data, processing scripts, and analysis data.
Agenda
Why do we care about sharing data?
Current issues with data
Principles of sharing data
General guidelines for dataset deposits
Data submission checklist
Canadian generalist repositories
When sharing data, make sure the dataset meets these characteristics:
Folders and files are clearly structured.
Files and folders have consistent naming conventions.
Open file formats (e.g., CSV, TIFF) make it easy to open and reuse data.
The README file contains sufficient information to understand the nature, content, and origin of the data.
When do I start organizing my data for sharing?
We recommend implementing RDM practices early and throughout the research process. Organizing data after years of chaotic data management is challenging.
When do I share my data?
Data can be shared at any time during the research process. There is no need to wait until an article is published to share data.
What if my dataset does not fit into protocols such as TIER 4?
Most importantly, the dataset must be well documented, logically organized and have consistent naming conventions that make it understandable to potential reusers.
Is my data citable?
Of course it is! The dataset gets a DOI, and the dataset can even be cited in the research article.
How can others use my dataset?
That depends on the license. We recommend a CC-BY 4.0 license, which allows broad reuse of the data.
Where do I share my data?
You can share your data in specialized or generalist repositories like The Federated Research Data Repository (FRDR) or Borealis.
A dataset is an independent research object that that can be used (and cited) independently of the research article.
Better yet, think of articles as supplements to your dataset!
Agenda
Why do we care about sharing data?
Current issues with data
Principles of sharing data
General guidelines for dataset deposits
Data submission checklist
Canadian generalist repositories
The Federated Research Data Repository (FRDR) is a national platform for Canadian researchers to discover, store, and share research data.
Our goals:
Improve data discoverability (in partnership with Lunaris).
Promote open science practices and the reuse of research data.
Ensure the long-term preservation of valuable research data.
FRDR is for Canadian researchers
FRDR supports a wide range of disciplines and data types, providing a robust infrastructure for management and dissemination of research data across Canada.
FRDR ensures the long-term preservation, accessibility and usability of datasets through its curation and preservation team.
FRDR supports funding agencies requirements related to open access to data (and research data management plans).
FRDR supports large datasets, making it an ideal repository for data-intensive research.
FRDR supports researchers in data management best practices.
FRDR supports researchers and institutions
FRDR has competent staff to guide researchers and institutions to ensure that datasets are valuable and comply with FAIR principles.
Borealis is a Canadian research data repository supported by academic libraries, research institutions, and the Digital Research Alliance of Canada.
Features:
Built on Dataverse open-source software hosted by Scholars Portal / University of Toronto Libraries.
Integrated with single sign-on login for Canadian Institutions (Canadian Access Federation).
Indexed in DataCite search, Google dataset search, Lunaris for discoverability.
Each point represents a participating academic institution that has integrated Borealis into its research data services.
File preview to explore files directly in the browser.
Data explorer tool to visualize variables in tabular data files (e.g., SPSS, Excel, CSV).
Github integration using GitHub actions.
Contact us to ensure that your data are well prepared and can be effectively shared with the research community.
Sharing and depositing data - FRDR curation team