Handling biomedical images and sharing reproducible workflows

A primer for researchers

FRDR curation team

Digital Research Alliance of Canada

Understanding BioImaging

Agenda

  1. Understanding BioImaging

  2. Planning and acquiring images

  3. Gathering image metadata

  4. Organizing and sharing image data

  5. Sharing reproducible workflows

A diagram illustrating different types of biological imaging techniques and their applications. The top section, labeled 'Sample Imaging,' includes icons representing various research fields such as molecular biology, genetics, microbiology, and anatomy. Below, three imaging modalities are depicted: 'Electron microscopy' (in light blue), 'Light microscopy' (in dark blue), and 'Human bioimaging' (in dark red), indicating their areas of application. At the bottom, 'Image data analysis' is shown, emphasizing the computational aspect of image processing. The diagram highlights the role of imaging across multiple research disciplines.

Images contain countless data

A black-and-white cartoon illustration depicting a therapy session. A patient is lying on a couch, expressing their concerns to a therapist who is taking notes. The patient has a thought bubble showing a massive wave labeled 'DATA' about to crash over a small, helpless figure. The cartoon humorously represents the overwhelming feeling of dealing with large amounts of data, a common challenge in research and data science. The drawing is signed by Henning Falk.

Henning Falk (2022) - NumFOCUS

Bioimaging has entered the realm of big data, with increasingly large and complex datasets. Two challenges stand out: proper data handling and management, and the creation and sharing of reproducible image analysis workflows.

BioImage life cycle

A circular diagram illustrating the bioimaging research data management (RDM) lifecycle, titled 'Core Facility: Infrastructure and Bioimaging RDM Support.' At the center, a researcher icon represents the user interacting with different stages of the workflow. The lifecycle is divided into segments: 'Image Acquisition' (purple), 'Image File Storage & Access' (yellow), 'Processing & Bioimage Analysis' (green), '(Data-) Publication' (blue), 'Archiving, Long-Term Storage' (light gray), and 'Planning, Data Search & Re-use' (gray). Around the outer circle, key supporting infrastructure elements are represented, including cloud storage, file system storage, efficient data transfer, formats, standards, tools, permission systems, data security, web access, connectivity, and data integration. The diagram highlights the structured approach to managing and supporting bioimaging research data.

BioImage life cycle from Schmidt et al. 2024

BioImages have the potential to drive scientific discovery beyond their original acquisition purpose when handled according to the FAIR principles (Schmidt et al. 2024; see Wilkinson et al. 2016).

Planning and acquiring images

Overview

Conventional file-system-based storage is quickly reaching its limits. Before the data are generated, researchers must consider how they will be stored, moved, documented, and analyzed during (and after) the project’s lifetime.

A screenshot of the Windows File Explorer 'This PC' section, displaying storage devices and their available disk space. The image highlights two key areas: (1) the 'This PC' navigation option on the left panel, and (2) the 'Devices and drives' section, showing multiple hard drives and partitions, some nearly full with red-colored storage bars. This image illustrates disk space management challenges and overcapacity warnings in Windows.

Current directions:

  • Leverage research data management plans (DMPs).
  • Cloud-based storage and platforms (e.g., OMERO) will become a must.
  • Need for standardized formats (NGFFs).

Planning imaging experiments

Imaging hardware

  • Imaging device (light, confocal, electron)
  • Define imaging parameters - suitable for image analysis

Storage of images

  • Image size - gigabytes or terabytes
  • Image complexity - Single or multiple focal planes, Live or time-lapse images

How will you handle your images?

While planning an imaging experiment, consider the most suitable approach given your capabilities for data handling and analysis.

Data acquisition

  • Appropriate instrument setup, calibration, and imaging parameters (size, bit depth, saturation, etc.).
  • Prioritize open file formats (.TIFF) over proprietary (.CZI).
  • Recording of metadata.

Acquisition parameters are key

During imaging, it is necessary to find a compromise between the parameters needed to answer the research question (magnification, size, bit depth) and the available resources (storage, computing power).
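To make this compromise concrete, the following is a minimal sketch that estimates the uncompressed size of an acquisition from its parameters. The function names and the multi-channel time-lapse values are assumptions for illustration; only the pixel dimensions (2752 x 2208) come from the example mouse brain section in this section. File headers, metadata, and compression are ignored.

```python
def estimate_image_bytes(width, height, bit_depth, channels=1, z_planes=1, timepoints=1):
    """Return the uncompressed pixel-data size in bytes (headers and metadata excluded)."""
    bytes_per_sample = (bit_depth + 7) // 8  # 8-bit -> 1 byte; 12- or 16-bit -> 2 bytes
    return width * height * bytes_per_sample * channels * z_planes * timepoints


def human_readable(n_bytes):
    """Format a byte count using decimal units (1 MB = 10**6 bytes)."""
    size = float(n_bytes)
    for unit in ("B", "kB", "MB", "GB", "TB"):
        if size < 1000 or unit == "TB":
            return f"{size:.1f} {unit}"
        size /= 1000


# Single plane, single channel, at two bit depths.
single_8bit = estimate_image_bytes(2752, 2208, bit_depth=8)
single_16bit = estimate_image_bytes(2752, 2208, bit_depth=16)

# Hypothetical experiment: 3 channels, 20 focal planes, 50 time points.
stack_16bit = estimate_image_bytes(
    2752, 2208, bit_depth=16, channels=3, z_planes=20, timepoints=50
)

for label, size in [("8-bit plane", single_8bit),
                    ("16-bit plane", single_16bit),
                    ("16-bit time-lapse stack", stack_16bit)]:
    print(f"{label}: {human_readable(size)}")
```

Doubling the bit depth doubles the storage needed, and every extra channel, focal plane, or time point multiplies it again, so these choices are worth settling before acquisition starts.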

What parameters should I use to answer my research question?

A fluorescence microscopy image of a coronal mouse brain section displayed in an image viewer. The image is captured from a `.czi` file and shown as an 8-bit TIFF format with a size of 23MB. The fluorescence signal appears in green, highlighting specific structures within the brain tissue. The metadata in the image viewer indicates dimensions of 9086.62 x 7290.43 microns, corresponding to a resolution of 2752 x 2208 pixels. This type of imaging is commonly used in neuroscience research to visualize protein expression, cellular structures, or anatomical regions.

8 bit image - Courtesy of Daniel Manrique-Castano

The same image as the previous one, displayed at 16-bit depth.

16 bit image - Courtesy of Daniel Manrique-Castano

Consider that…

A humorous research-themed illustration combining text and a drawing of a microscope. On the left side, bold black text reads 'BRO, DO YOU EVEN SCOPE?' with an illustration of a compound microscope below it. On the right side, within a light blue background, the text emphasizes the importance of training in microscopy: 'A microscope is as good as the person that operates it. Experiment quality can be boosted with training.' The words 'microscope' and 'Experiment quality' are highlighted in blue. The image humorously promotes proper training for obtaining high-quality research results using microscopes.

Transforming file formats

After acquiring original images in proprietary formats (e.g., .CZI or .LIF), researchers can use different tools to open the images and convert them to open formats (e.g., .TIFF).

A screenshot of the Bio-Formats 8.1.0 download page, a software tool for reading and converting life sciences image file formats. The background contains a fluorescence microscopy image of cells stained with blue, green, and yellow markers. The main heading, 'Bio-Formats 8.1.0 Downloads,' is displayed in large white text. Bio-Formats is an essential tool for researchers working with multidimensional microscopy data.

Bio-Formats from Open Microscopy Environment (compatible with FIJI)

A screenshot of the 'python-bioformats 4.1.0' package release page. The page has a blue background and prominently displays the package name and version. Below, a command snippet shows how to install the package using pip: 'pip install python-bioformats,' with a copy button next to it. This package enables Python users to work with Bio-Formats, a library for reading and processing research image data.

Bio-Formats for Python

The OME-TIFF format

OME-TIFF is a format developed by the Open Microscopy Environment (OME), based on the TIFF specification. It incorporates:

  • OME-XML metadata.
  • Support for multidimensional, multi-image data (see examples).
  • Support for complex multidimensional and high-content screening data through BigTIFF, which overcomes the 4 GB size limit of classic TIFF.
  • Distribution of the data across multiple files (see example).
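As a rough illustration of the size limit mentioned above, this sketch checks whether a dataset is likely to need BigTIFF. It assumes the limit comes from the 32-bit offsets of classic TIFF (2**32 bytes); the function name and the 5% safety margin are hypothetical choices, not part of any specification or library.

```python
CLASSIC_TIFF_LIMIT = 2**32  # classic TIFF uses 32-bit offsets, i.e. about 4 GiB


def needs_bigtiff(pixel_bytes, metadata_bytes=0, safety_margin=0.05):
    """Return True if the projected file size reaches the classic TIFF limit.

    safety_margin is an assumed allowance for headers and tags.
    """
    projected = (pixel_bytes + metadata_bytes) * (1 + safety_margin)
    return projected >= CLASSIC_TIFF_LIMIT


# One 16-bit plane of 2752 x 2208 pixels fits comfortably in classic TIFF.
print(needs_bigtiff(2752 * 2208 * 2))

# A hypothetical 3-channel, 20-plane, 50-time-point stack does not.
print(needs_bigtiff(2752 * 2208 * 2 * 3 * 20 * 50))
```

In practice, writing libraries often make this decision automatically; the point here is only that large multidimensional acquisitions cross the limit quickly.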

Next-Generation File Format OME-Zarr

A schematic illustration of an image pyramid, a concept used in image processing and bioimaging. The pyramid represents different resolution levels of the same image, with a pink-stained brain tissue sample shown at each level. The base of the pyramid represents 'Full resolution, Full area,' while the middle and upper levels correspond to '1/2 resolution, 1/4 area' and '1/4 resolution, 1/16 area,' respectively. This visualization highlights how images are stored at multiple scales to optimize processing and analysis in whole-slide imaging and multi-resolution microscopy.

Image Pyramid
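The pyramid idea can be sketched in a few lines: each level halves the resolution of the previous one, so it covers one quarter of the pixels. The function below only computes the dimensions of each level; it does not read or resample any image. The `min_size` stopping criterion is an assumption for illustration, not a value taken from the OME-Zarr specification.

```python
def pyramid_levels(width, height, min_size=256):
    """List the dimensions of each pyramid level, halving until the image fits in min_size."""
    levels = []
    level = 0
    w, h = width, height
    while True:
        levels.append({"level": level, "width": w, "height": h, "scale": 1 / 2**level})
        if max(w, h) <= min_size:
            break
        # Halve each axis, rounding up so no pixel row or column is lost.
        w, h = max(1, (w + 1) // 2), max(1, (h + 1) // 2)
        level += 1
    return levels


for entry in pyramid_levels(2752, 2208):
    pixels = entry["width"] * entry["height"]
    print(f"level {entry['level']}: {entry['width']} x {entry['height']} "
          f"(scale {entry['scale']}, {pixels} pixels)")
```

A viewer can then load a small level for an overview and fetch the full-resolution data only for the region being inspected.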

Take home message

Strategic planning and data management, cloud-ready platforms and formats, and institutional support ensure scalable, accessible, and reusable imaging data.

Gathering image metadata

Overview

High-quality metadata is crucial for making imaging data FAIR.

A vibrant abstract digital artwork featuring a fragmented human face interwoven with geometric shapes, colorful textures, and technological elements. The composition includes a large, expressive eye surrounded by streaks of color, circuits, and intersecting lines. Additional elements, such as a stylized mouth and curved, ribbon-like structures, contribute to the surreal aesthetic.

Image from DeviantArt

Current challenges:

  • A fragmented ecosystem without clear metadata standards, which leads to inconsistent documentation.
  • Journals and institutions lack uniform guidelines.

A structured table outlining the different metadata modules required for standardized bioimaging data documentation. The table consists of several sections: 'Study,' 'Study Component,' 'Biosample,' 'Specimen,' 'Image Acquisition,' 'Image Data,' 'Image Correlation,' and 'Analyzed Data.' Each row specifies an attribute (e.g., study type, imaging method, biological entity), a description of its significance, the data entry method (text, ontology, extracted data), and related ontologies (such as EDAM-BIOIMAGING, FBbi, OME, and EFO). This table highlights essential metadata elements needed to ensure reproducibility, interoperability, and structured data sharing in bioimaging research.

Download the template

Use controlled vocabulary

Use controlled vocabulary (ontologies) to specify objects, their categories and relationships.

Name               Ontology
Organisms          NCBI
Genes              NCBI
Proteins           UniProt
Imaging methods    FBbi
Exp. factors       EFO
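One way to put the table into practice is to check that every annotation in a metadata record points to a term from the recommended ontology. The sketch below is an assumption about how such a check could look: the category keys, the record layout, and the function name are hypothetical, and only the category-to-ontology mapping comes from the table. The organism identifier 10090 is the NCBI Taxonomy ID for Mus musculus.

```python
# Recommended ontology per category, as listed in the table above.
RECOMMENDED_ONTOLOGY = {
    "organism": "NCBI",
    "gene": "NCBI",
    "protein": "UniProt",
    "imaging_method": "FBbi",
    "experimental_factor": "EFO",
}


def check_annotations(annotations):
    """Return a list of problems; an empty list means every annotation passes."""
    problems = []
    for category, entry in annotations.items():
        expected = RECOMMENDED_ONTOLOGY.get(category)
        if expected is None:
            problems.append(f"{category}: no recommended ontology listed")
        elif entry.get("ontology") != expected:
            problems.append(f"{category}: expected a term from {expected}")
        elif not entry.get("term_id"):
            problems.append(f"{category}: missing term identifier")
    return problems


# Hypothetical record: the organism is annotated, the imaging method is free text.
annotations = {
    "organism": {"label": "Mus musculus", "ontology": "NCBI", "term_id": "10090"},
    "imaging_method": {"label": "confocal microscopy", "ontology": None, "term_id": None},
}

print(check_annotations(annotations))
```

Free-text entries such as the imaging method above are flagged so they can be replaced with an ontology term before the dataset is shared.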

Record metadata from imaging devices

Screenshot of the Micro Meta App, a microscopy metadata management tool. The top section displays the app's banner with the title 'MICRO META APP' and the tagline 'Microscopy Metadata for the Real World.' Below, two main sections allow users to 'Manage Instrument' and 'Manage Settings,' providing options to document hardware components and acquisition settings. The lower half of the image shows detailed interface views, including labeled microscope components such as objectives, cameras, filter sets, and laser settings, highlighting metadata fields like manufacturer, gain, magnification, and bit depth.

Micro-Meta App

Visit the Micro-Meta App and the associated research article

MethodsJ2 plugin

Screenshot of a text document generated using the MethodsJ2 tool based on user input and a Micro-Meta App hardware file. The document details the imaging setup, including a Zeiss Axiovert 200M inverted microscope configured for Widefield Epifluorescence microscopy, controlled with Zen software. The text specifies the optical components, imaging parameters such as voxel size (0.14 μm), and details of fluorescence excitation using an X-Cite 120 LED light source with various excitation/emission filters and dichroic mirrors. The acknowledgments section credits the Advanced BioImaging Facility (ABIF) at McGill and Joel Ryan for assistance.

Example from MethodsJ2 repository

The MethodsJ2 Fiji plugin generates text for the materials and methods section of microscopy studies by extracting information from metadata (Micro-Meta App file). Visit the GitHub repository or the associated research article.

OMERO: One of the best choices

OMERO incorporates MDEmic (MetaData Editor for microscopy), a tool that provides an easy way to explore and edit image metadata.

Metadata to consider

We need to move toward:

  • Better consolidation of metadata standards.
  • Development of tools for automated metadata collection (including collaboration with microscope manufacturers).
  • Research infrastructures and BioImaging support for proper metadata recording.

Organizing and sharing image data

Images could be in many places

A laptop with a black screen placed on a wooden table in a café setting. A cup of coffee on a saucer is next to it. The background has a wire mesh window with plants visible outside.

Laptops of students and postdocs

A close-up of a network switch with multiple Ethernet cables plugged in. The cables are orange and blue, neatly organized and connected to the switch. A label with the text 'NorthC' is visible on the top right corner of the device.

Institute network

A serene view of the sky filled with soft, fluffy white clouds stretching across the horizon. The upper part of the image shows a gradient of blue, transitioning from deep to light as it meets the clouds.

The cloud (Google Drive)

Biomedical image datasets (big data) can eventually reach terabytes or petabytes in size, exceeding the capacity of most standard file-sharing solutions.

Tip

Effective image storage requires infrastructure, optimization of processing workflows, and standardized sharing protocols.

Considerations for storage solutions

When selecting a storage solution, keep in mind that, as researchers, we do not simply want to store the dataset somewhere; we want to make it accessible and usable.

We want our images to be:

  • Findable (persistent identifiers, indexed in a searchable resource)
  • Accessible (software requirements, open file formats)
  • Interoperable (rich metadata, standardized parameters)
  • Reusable (descriptive metadata, clear license and usage rights)

Where do I share my images?

A collection of logos representing various research data repositories, including OSF, Dryad, FRDR, DFDR, BioStudies, Zenodo, and Borealis. These repositories support open science by providing platforms for data sharing and preservation.

Examples of generalist repositories

Features of some generalist repositories

Specialized image repositories

Other solutions

These instances are installed on dedicated network (core facility) space for long-term storage and sharing.

Characteristics of shared images

  • Raw and processed images (processing operations should not alter the original image).
  • Uncompressed and lossless open file formats.
  • OME-TIFF retains the original (e.g., .czi) metadata.
  • PNG is preferred for images with annotations.
  • Accurate and descriptive (machine-readable) naming conventions (e.g., Subject_Group_Area_Marker): use grouping factors to name images.
  • Creation of README files to contextualize and describe the content and methods used in the dataset.
  • Use CC-BY or CC0 licenses.
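The naming convention listed above (Subject_Group_Area_Marker) is machine readable only if it is applied consistently. Here is a minimal sketch, assuming four underscore-separated fields that contain only letters, digits, or hyphens; the helper names, the default extension, and the example values (M01, Sham, Ctx, GFAP) are hypothetical.

```python
import re
from typing import NamedTuple


class ImageName(NamedTuple):
    subject: str
    group: str
    area: str
    marker: str


# Underscores are reserved as separators, so they are not allowed inside a field.
FIELD = re.compile(r"^[A-Za-z0-9-]+$")


def build_name(subject, group, area, marker, extension=".ome.tiff"):
    """Compose a file name following Subject_Group_Area_Marker."""
    parts = [subject, group, area, marker]
    for part in parts:
        if not FIELD.match(part):
            raise ValueError(f"Invalid field {part!r}: use letters, digits, or hyphens only")
    return "_".join(parts) + extension


def parse_name(filename):
    """Recover the grouping factors from a file name built with build_name."""
    stem = filename.split(".", 1)[0]
    fields = stem.split("_")
    if len(fields) != 4:
        raise ValueError(f"Expected 4 fields in {filename!r}, found {len(fields)}")
    return ImageName(*fields)


name = build_name("M01", "Sham", "Ctx", "GFAP")
print(name)
print(parse_name(name))
```

Because the grouping factors can be recovered from the file name, analysis scripts can assign each image to its experimental group without a separate lookup table.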

Tip

“We strongly discourage author statements that images ‘are available upon request’, as this has been shown to be inefficient” (Schmied et al. 2023)

Sharing reproducible workflows

Image processing and analysis

In any research workflow, the analysis of images must be:

Objective

Illustration of a book with a magnifying glass placed over it. The book has a yellow cover with a label on the top, and the magnifying glass has a red rim with a blue lens, symbolizing research, analysis, or study.

Reliable

Close-up of a metallic carabiner with the word 'RELIABILITY' engraved on it. The carabiner is securely fastened with a golden locking mechanism and is attached to a strong fabric strap.

From https://nexxis.com.au/

Reproducible

Illustration of a researcher in a white lab coat painting on a canvas, which contains an image of himself painting on a canvas in an infinite recursive loop. The background is red, creating a striking contrast. The artwork symbolizes scientific replication, self-referential processes, and the iterative nature of research.

From https://med.stanford.edu/

Image analysis from RDM perspective

From an RDM perspective, analysis of biomedical images ideally entails:

  • Access to large datasets
  • Records of image processing (code, scripts)
  • Sharing of results (images, tables, graphs)

Tip

Accurate, descriptive naming conventions and README files with metadata or codebooks are vital to ensure the integrity of analysis pipelines.

Modular approach for image analysis

A modular pipeline breaks the main image analysis task down into independent subtasks.

flowchart LR
A[Nuclei detection] --> B[Quantification] 
B --> C[Spatial analysis]

Tip

Modularity makes it possible to build complex analysis pipelines from independent components that work together, which promotes the reuse of individual modules.
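The three-step flow above can be sketched as independent functions chained by a small runner. This is a toy illustration, not a real analysis: thresholding single pixels stands in for nuclei detection, and a centroid stands in for spatial analysis. All function names, the threshold, and the 4 x 4 example image are assumptions.

```python
def detect_nuclei(image, threshold=100):
    """Toy detection: return (row, col) of every pixel brighter than the threshold."""
    return [(r, c)
            for r, row in enumerate(image)
            for c, value in enumerate(row)
            if value > threshold]


def quantify(detections):
    """Count the detected objects and keep their coordinates for later steps."""
    return {"count": len(detections), "coordinates": detections}


def spatial_analysis(measurements):
    """Toy spatial summary: centroid of all detected objects."""
    coords = measurements["coordinates"]
    if not coords:
        return {**measurements, "centroid": None}
    n = len(coords)
    centroid = (sum(r for r, _ in coords) / n, sum(c for _, c in coords) / n)
    return {**measurements, "centroid": centroid}


def run_pipeline(data, steps):
    """Feed the output of each step into the next one."""
    for step in steps:
        data = step(data)
    return data


image = [
    [0, 0, 0, 0],
    [0, 200, 0, 0],
    [0, 0, 0, 180],
    [0, 0, 0, 0],
]

result = run_pipeline(image, [detect_nuclei, quantify, spatial_analysis])
print(result)
```

Each step can be tested, replaced, or reused on its own; for example, a different detection function can be swapped in without touching the quantification or spatial analysis code.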

Hierarchy of image analysis tasks

Low-level (technical knowledge)

Transform images into other images or data:

  • Preprocessing (e.g. deconvolution, noise removal)
  • Object detection and segmentation (e.g. cells, intracellular organelles)
  • Particle/object tracking

High-level (disciplinary knowledge)

Transform outputs from low-level tasks into information with biological meaning:

  • Data visualization
  • Fitting of statistical models
  • Statistical inference and uncertainty measurements

Image analysis workflow

Prioritize open/free software

There are dozens of open/free options to analyze research images and share reproducible workflows:

A great resource for image analysis: BioImage.IO

BioImage.IO is a community-driven AI model repository that provides access to pretrained AI models through a wide range of open/free partner software.

Screenshot of the BioImage Model Zoo, displaying multiple AI models for bioimage segmentation and analysis. Each model has a thumbnail, title, description, tags, and download count. Some models focus on nucleus, cell, and mitochondria segmentation using different deep learning techniques. The interface features icons for compatibility, licensing, and additional functionalities.

From BioImage.IO

Image analysis workflows

Why use code?

“The mouse is antisocial. The GUI is antisocial. So, what’s that mean? You have a problem to solve, and you solve it with a GUI. What do you have? A problem solved. But when you solve it with a command line interface in a scripting environment, you have an artifact. And all of the sudden, that artifact can be shared with someone” Jeffrey Snover

Use CODE not the mouse!

Overall

Tip

  • Keep track of all changes and analysis procedures performed on images.
  • Organize and link files (using naming conventions) throughout processing to avoid errors.
  • Select final or intermediate results to share, considering aspects such as storage space and long-term preservation.
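One possible way to keep track of processing steps is to append an entry to a log file each time an image is transformed, including checksums of the input and output files. The sketch below uses only the Python standard library; the function names, the JSON Lines log layout, and the field names are assumptions, not an established standard.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def log_step(log_path, input_path, output_path, operation, parameters):
    """Append one processing record (one JSON object per line) to the log file."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "operation": operation,
        "parameters": parameters,
        "input": {"file": Path(input_path).name, "sha256": sha256_of(input_path)},
        "output": {"file": Path(output_path).name, "sha256": sha256_of(output_path)},
    }
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return entry
```

A call such as `log_step("processing_log.jsonl", raw_file, processed_file, "background_subtraction", {"radius": 50})` (hypothetical file names and parameters) records what was done and to which file. Recomputing the checksum of a raw image later confirms that processing did not alter the original.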

Publication of analysis results

Sharing research objects in public repositories for active research management, such as the Open Science Framework (OSF), is an excellent strategy to promote open, reproducible research. Please consider sharing illustrative images, figures, and tables used in publications.

Tip

In general, these are lower-resolution images/figures (.png) used not for analysis but for illustration in research reports (theses, articles).

Examples of results images

Click to see examples of published results images and figures.

Data are generally not shared

‘Data is available on request’ statements in publications are often found to be unreliable in practice (Schmidt et al. 2024).

Open Science Principle

Share the data as openly as possible and keep it only as closed as necessary.

Sharing data is a professional responsibility

Depositing a dataset in a repository is NOT ONLY an exercise in meeting the requirements of funding agencies and journals. It is an ethical and professional responsibility of researchers to ensure that research is reproducible and that research data can be accessed and reused.

Therefore, research needs to move towards

  • Researchers competent in RDM and data analysis.
  • Standardized approaches to sharing raw data and analysis code to support research findings.
  • Researchers with a commitment to transparency and best research practices to ensure integrity.

Logos of two Canadian research data repositories: FRDR (Federated Research Data Repository) and Borealis. The FRDR logo features a geometric pattern of yellow squares forming a diamond shape, with the repository name in black and gold text. The Borealis logo includes an artistic depiction of the Northern Lights over mountains and a lake, with the repository name in bold white text.

Visit FRDR or Borealis

Resources and support

A QR code image that redirects to the presentation located in a GitHub repository.

This presentation is available here (English or French)

Support Services:

Contact us to ensure that your data is well prepared and can be effectively shared with the research community.

  • Email: rdm-gdr@alliancecan.ca
  • https://www.frdr-dfdr.ca/repo/