17 NetCDF Files

Author

Natalie Williams

Published

October 16, 2025

17.1 Overview

NetCDF (Network Common Data Form) is a self-describing binary format for multidimensional scientific data, and it has been adopted as an Open Geospatial Consortium (OGC) standard.

Curation Goal

Validate the self-describing nature of scientific data. Our objective is to ensure NetCDF files follow community conventions (CF Conventions) and contain sufficient metadata (units, dimensions, and CRS) for reliable future reuse.

Preservation Risk

NetCDF defines how to store data, but not what it represents. Non-compliance with the Climate and Forecast (CF) conventions renders the data contextually unusable, as software cannot reliably interpret physical units or spatial alignment.

This notebook evaluates NetCDF files on three levels:

Metadata Compliance: Checking for global attributes like Conventions or institution.
Spatial Awareness: Detecting Coordinate Reference Systems (CRS).
Data Health: Scanning for empty datasets (100% NaNs).

17.2 Setup

Before running this notebook, you need to ensure the required R packages are installed.

17.2.1 R Packages

The following R packages are required. If any are missing, uncomment the line below and run it once in your R console:

Code

# install.packages(c("tidyverse", "tidync", "ncmeta", "rstudioapi"))

17.3 Load Libraries

This chunk loads all the necessary libraries for the session.

Code

library(tidyverse)
library(tidync)     # Tidy interface for NetCDF
library(ncmeta)     # Low-level metadata extraction
library(rstudioapi) # For directory selection

17.4 Select target directory with NetCDF files

This section identifies the directory to be analyzed and finds all .nc files within it.

Note: If you are running this interactively in RStudio, a dialog box will appear. If you are rendering this document (where no user interaction is possible), it will default to the params$target_dir defined in the YAML header.

Code

# 1. Try to select interactively if in RStudio
if (interactive() && rstudioapi::isAvailable()) { 
  selected_dir <- rstudioapi::selectDirectory(caption = "Select NetCDF Directory")
} else {
  selected_dir <- NULL
}

# 2. Logic to determine final directory (Interactive vs Parameter)
if (!is.null(selected_dir)) {
  target_dir <- selected_dir
} else {
  target_dir <- params$target_dir
}

print(paste("Analyzing directory:", target_dir))

[1] "Analyzing directory: data/Inspect_nc/"

17.4.1 Find NetCDF files

Now we scan the selected directory for all .nc files.

Code

# Find all NetCDF files
nc_files <- list.files(
  path = target_dir,
  pattern = "\\.nc$",
  recursive = TRUE,
  full.names = TRUE,
  ignore.case = TRUE
)

# Print the number of files found and show the first few paths
print(paste("Found", length(nc_files), "NetCDF files."))

[1] "Found 6 NetCDF files."

Code

head(nc_files)

[1] "data/Inspect_nc//Averaged_exceedance_TP_p99p0_ERA5.nc"
[2] "data/Inspect_nc//Geopotential_orography.nc"           
[3] "data/Inspect_nc//TPp99p0_2001_2020_ERA5.nc"           
[4] "data/Inspect_nc//TPp99p0_2001_2020_IMERG.nc"          
[5] "data/Inspect_nc//TPp99p0_2001_2020_IMERG025grid.nc"   
[6] "data/Inspect_nc//WSp99p0_2001_2020_ERA5.nc"
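
The pattern `\\.nc$` with `ignore.case = TRUE` picks up `.nc` and `.NC`, but not other extensions that NetCDF files sometimes carry, such as `.nc4` or `.cdf`. The sketch below illustrates this on a throwaway directory of empty files; the directory and file names are made up for the demonstration and are not part of the dataset.

```r
# Build a throwaway directory with hypothetical file names (empty files, not real NetCDF).
demo_dir <- file.path(tempdir(), "nc_pattern_demo")
dir.create(file.path(demo_dir, "sub"), recursive = TRUE, showWarnings = FALSE)
demo_names <- c("a.nc", "b.NC", "c.nc4", "d.cdf", "notes.nc.txt", file.path("sub", "e.nc"))
invisible(file.create(file.path(demo_dir, demo_names)))

# Same pattern and flags as the chunk above, returning relative paths for readability.
matched <- list.files(demo_dir, pattern = "\\.nc$", recursive = TRUE, ignore.case = TRUE)

# Everything the pattern skipped.
skipped <- setdiff(list.files(demo_dir, recursive = TRUE), matched)

print(matched)
print(skipped)
```

If a collection uses other extensions, the pattern would need to be widened accordingly, for example `"\\.(nc|nc4|cdf)$"`.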

17.5 Usability scan

This phase evaluates the “fitness for use.” We check if the files contain valid spatial coordinates (CRS) and if the data variables actually contain numbers (Data Health).

Code

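# Re-list the files with the same options as above so this chunk can run on its own
nc_files <- list.files(target_dir, pattern = "\\.nc$", full.names = TRUE, recursive = TRUE, ignore.case = TRUE)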

# Helper: Usability Scan
inspect_nc_inventory <- function(fp) {
  fname <- basename(fp)
  
  # 1. Safe Load
  tnc <- tryCatch(tidync(fp), error = function(e) NULL)
  
  if (is.null(tnc)) {
    return(tibble(
      FileName = fname,
      Status = "Corrupt/Unreadable",
      DimsSummary = NA,
      VarCount = NA,
      HasCRS = NA,
      DataHealth = NA
    ))
  }
  
  # 2. Extract Metadata Summary
  # Active grid dimensions
  dims <- tnc %>% hyper_dims()
  dims_str <- paste(dims$name, collapse = " x ")
  
  # Variables
  vars <- tnc %>% hyper_vars()
  var_count <- length(vars$name)
  
  # 3. Spatial Check (CRS)
  # Look for standard lat/lon names or "grid_mapping" attribute
  # (a bare x or y must match the whole name, so "year" or "max" do not count)
  has_lat <- any(str_detect(dims$name, "(?i)lat|^y$"))
  has_lon <- any(str_detect(dims$name, "(?i)lon|^x$"))
  
  # Robust attribute check using ncmeta
  all_atts <- ncmeta::nc_atts(fp)
  has_grid_mapping <- any(all_atts$name == "grid_mapping")
  
  spatial_status <- if (has_grid_mapping || (has_lat && has_lon)) "Georeferenced" else "No Spatial Grid"
  
  # 4. Data Health Check (Sparsity)
  # Read the first variable on the active grid and test whether every value is missing.
  # This loads the whole variable into memory; the label stays "Unknown" if the read fails.
  is_empty_label <- "Unknown"
  try({
    first_var <- vars$name[1]
    # hyper_array() returns a list with one array per selected variable
    sample_data <- tnc %>% 
      hyper_array(select_var = first_var)
    
    if (all(is.na(sample_data[[1]]))) {
      is_empty_label <- "⚠️ All NaNs (Empty)"
    } else {
      is_empty_label <- "Contains Data"
    }
  }, silent = TRUE)
  
  return(tibble(
    FileName = fname,
    Status = "Valid",
    DimsSummary = dims_str,
    VarCount = var_count,
    HasCRS = spatial_status,
    DataHealth = is_empty_label
  ))
}

# Run Inventory
message(paste("Scanning", length(nc_files), "files for usability..."))
inventory_results <- map_dfr(nc_files, inspect_nc_inventory)

print(paste("Inventory complete for", nrow(inventory_results), "files."))

[1] "Inventory complete for 6 files."
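
A name-based spatial check is only as good as its regular expression. As a rough illustration, the sketch below compares an unanchored pattern such as `(?i)lat|y` with an anchored alternative on a handful of hypothetical dimension names. It uses base R `grepl()` with `perl = TRUE` so that the inline `(?i)` flag is honoured; the anchored pattern is one possible choice, not a CF requirement.

```r
# Hypothetical dimension names, chosen to expose false positives.
dim_names <- c("latitude", "lon", "y", "year", "platform", "time")

# Unanchored: any name containing "lat" or the letter "y" counts as latitude.
loose_lat <- grepl("(?i)lat|y", dim_names, perl = TRUE)

# Anchored: the whole name must be "lat", "latitude", or "y".
anchored_lat <- grepl("(?i)^(lat|latitude|y)$", dim_names, perl = TRUE)

print(data.frame(dim_names, loose_lat, anchored_lat))
```

Here "year" and "platform" pass the loose test but not the anchored one. A stricter pattern can in turn miss legitimate variants (for example rotated-grid names), so the result is best read as a hint for manual review.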

Code

head(inventory_results)

# A tibble: 6 × 6
  FileName                         Status DimsSummary VarCount HasCRS DataHealth
  <chr>                            <chr>  <chr>          <int> <chr>  <chr>     
1 Averaged_exceedance_TP_p99p0_ER… Valid  longitude …        2 Geore… Unknown   
2 Geopotential_orography.nc        Valid  longitude …        1 Geore… Unknown   
3 TPp99p0_2001_2020_ERA5.nc        Valid  longitude …        1 Geore… Unknown   
4 TPp99p0_2001_2020_IMERG.nc       Valid  longitude …        1 Geore… Unknown   
5 TPp99p0_2001_2020_IMERG025grid.… Valid  lon x lat          1 Geore… Unknown   
6 WSp99p0_2001_2020_ERA5.nc        Valid  longitude …        1 Geore… Unknown
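
The data health check is all-or-nothing: a variable is flagged only when every value is missing. A possible refinement is to report the share of missing values as well. The following is a minimal base R sketch on synthetic arrays; `classify_health` and its labels are hypothetical and are not part of tidync or of the scan above.

```r
# Hypothetical helper: label an array by its share of missing values.
classify_health <- function(values) {
  if (length(values) == 0) {
    return(list(label = "No values", na_share = NA_real_))
  }
  na_share <- mean(is.na(values))  # is.na() is TRUE for both NA and NaN
  label <- if (na_share == 1) {
    "All NaNs (Empty)"
  } else if (na_share > 0) {
    "Contains Data (some missing)"
  } else {
    "Contains Data"
  }
  list(label = label, na_share = na_share)
}

empty_grid  <- array(NaN, dim = c(4, 3))          # stands in for an all-NaN variable
masked_grid <- array(c(NA, 1:11), dim = c(4, 3))  # one masked cell out of twelve
full_grid   <- array(1:12, dim = c(4, 3))         # no missing values

print(classify_health(empty_grid)$label)
print(classify_health(masked_grid)$label)
print(classify_health(full_grid)$label)
```

A high but incomplete share of missing values is common for masked fields (for example land-only or ocean-only variables), so it should not be read as a defect by itself.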

17.6 Metadata Extraction

Now, we perform a deep extraction of all attributes to create detailed documentation.

Code

# Create a "safely" version of tidync to handle potentially corrupt files
safe_tidync <- purrr::safely(tidync)

# 1. Process all files
processed_files <- purrr::map(nc_files, ~safe_tidync(.x)) %>% 
  set_names(nc_files)

# 2. Separate successful results
successful_results <- purrr::map(processed_files, "result") %>% 
  purrr::compact()

errors <- purrr::map(processed_files, "error") %>% 
  purrr::compact()

if (length(errors) > 0) {
  message("The following files failed deep extraction:")
  walk(names(errors), message)
}

# 3. Extract Raw Components
nc_dimensions <- purrr::map(successful_results, ~.x$dimension) %>% 
  bind_rows(.id = "FileName") %>% mutate(FileName = basename(FileName))

nc_variables <- purrr::map(successful_results, ~.x$variable) %>% 
  bind_rows(.id = "FileName") %>% mutate(FileName = basename(FileName))

nc_attributes <- purrr::map(successful_results, ~.x$attribute) %>% 
  bind_rows(.id = "FileName") %>% mutate(FileName = basename(FileName))

print("Deep extraction complete.")

[1] "Deep extraction complete."
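
To see how the `safely()` pattern separates successes from failures, here is a small sketch that does not need any NetCDF files. `read_or_fail` is a hypothetical stand-in for `tidync()` that errors on anything not ending in `.nc`; the file names are made up.

```r
library(purrr)

# Hypothetical reader: succeeds for *.nc names, errors otherwise.
read_or_fail <- function(path) {
  if (!grepl("\\.nc$", path)) stop("not a NetCDF file: ", path)
  paste("opened", path)
}

# safely() wraps the function so it always returns list(result = , error = ).
safe_read <- safely(read_or_fail)

paths <- c("good.nc", "bad.txt")
out <- set_names(map(paths, safe_read), paths)

results <- compact(map(out, "result"))  # drops the NULL results of failed reads
errors  <- compact(map(out, "error"))   # drops the NULL errors of successful reads

print(names(results))
print(conditionMessage(errors[["bad.txt"]]))
```

Because each element keeps its name, the failed inputs can be reported by path while the successful ones continue through the pipeline.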

17.7 Reshape metadata for comparison

We reshape the metadata into two summary tables: one for Global Attributes (file-level) and one for Variable Attributes (variable-level).

Code

# A. Global Attributes Summary
nc_attributes_global <- nc_attributes %>%
  filter(variable == "NC_GLOBAL") %>%
  pivot_wider(
    id_cols = FileName,
    names_from = name,
    values_from = value,
    values_fn = ~paste(., collapse = "; ")
  )

print("--- Global Attributes Summary ---")

[1] "--- Global Attributes Summary ---"

Code

glimpse(nc_attributes_global)

Rows: 1
Columns: 3
$ FileName    <chr> "Geopotential_orography.nc"
$ Conventions <chr> "CF-1.6"
$ history     <chr> "2024-04-18 11:32:43 GMT by grib_to_netcdf-2.25.1: /opt/ec…

Code

# B. Variable Attributes Summary
nc_variables_with_attributes <- nc_variables %>%
  left_join(
    filter(nc_attributes, variable != "NC_GLOBAL"),
    by = c("name" = "variable", "FileName")
  ) %>%
  pivot_wider(
    names_from = name.y, 
    values_from = value,
    values_fn = ~paste(., collapse = "; ")
  )

print("--- Variables and Attributes Summary ---")

[1] "--- Variables and Attributes Summary ---"

Code

head(nc_variables_with_attributes)

# A tibble: 6 × 20
  FileName       id.x name  type  ndims natts dim_coord active  id.y `NA`  units
  <chr>         <int> <chr> <chr> <int> <int> <lgl>     <lgl>  <dbl> <chr> <chr>
1 Averaged_exc…     0 long… NC_F…     1     0 TRUE      FALSE     NA NULL  <NA> 
2 Averaged_exc…     1 lati… NC_F…     1     0 TRUE      FALSE     NA NULL  <NA> 
3 Averaged_exc…     2 seaE… NC_F…     3     0 FALSE     TRUE      NA NULL  <NA> 
4 Averaged_exc…     3 seaE… NC_F…     3     0 FALSE     TRUE      NA NULL  <NA> 
5 Averaged_exc…     4 Exco… NC_F…     2     0 FALSE     FALSE     NA NULL  <NA> 
6 Averaged_exc…     5 Exce… NC_F…     2     0 FALSE     FALSE     NA NULL  <NA> 
# ℹ 9 more variables: long_name <chr>, calendar <chr>, scale_factor <chr>,
#   add_offset <chr>, `_FillValue` <chr>, missing_value <chr>,
#   standard_name <chr>, coordinates <chr>, axis <chr>
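
The long-to-wide reshape is easier to follow on a toy table. The sketch below assumes a simplified attribute table with plain character values (the real `value` column may be a list column) and shows how `values_fn` collapses repeated attributes into one cell. All values are invented for illustration.

```r
library(tidyr)

# Toy attribute table in long form: one row per file and attribute.
atts_long <- data.frame(
  FileName = c("a.nc", "a.nc", "b.nc", "b.nc", "b.nc"),
  name     = c("Conventions", "history", "Conventions", "history", "history"),
  value    = c("CF-1.6", "created", "CF-1.8", "created", "regridded"),
  stringsAsFactors = FALSE
)

# One row per file, one column per attribute; duplicates are joined with "; ".
atts_wide <- pivot_wider(
  atts_long,
  id_cols     = FileName,
  names_from  = name,
  values_from = value,
  values_fn   = function(v) paste(v, collapse = "; ")
)

print(atts_wide)
```

Files that lack an attribute get `NA` in that column, which is what makes the wide table convenient for spotting gaps.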

17.8 Save Summary Reports

Finally, we save the most useful summary tables to .csv files for documentation and further analysis.

Code

output_dir <- "Results/Inspect_nc"
dir.create(output_dir, recursive = TRUE, showWarnings = FALSE)

timestamp <- format(Sys.Date(), "%Y%m%d")

# 1. Save Inventory (High-Level Scan)
write.csv(inventory_results, file.path(output_dir, paste0("NetCDF_Inventory_", timestamp, ".csv")), row.names = FALSE)

# 2. Save Deep Metadata (Detailed Tables)
write.csv(nc_dimensions, file.path(output_dir, paste0("NetCDF_Dimensions_", timestamp, ".csv")), row.names = FALSE)
write.csv(nc_attributes_global, file.path(output_dir, paste0("NetCDF_Global_Attributes_", timestamp, ".csv")), row.names = FALSE)
write.csv(nc_variables_with_attributes, file.path(output_dir, paste0("NetCDF_Variables_Attributes_", timestamp, ".csv")), row.names = FALSE)

print(paste("All reports saved to:", output_dir))

[1] "All reports saved to: Results/Inspect_nc"
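
Because the file names carry only the date, a second run on the same day overwrites the earlier reports. The sketch below wraps the naming scheme in a small helper and reads one file back to confirm the round trip. `write_report` is a hypothetical helper, and the demo writes to a temporary directory rather than to `Results/Inspect_nc`.

```r
# Hypothetical helper: write one table to a date-stamped CSV and return its path.
write_report <- function(tbl, label, out_dir, stamp = format(Sys.Date(), "%Y%m%d")) {
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  path <- file.path(out_dir, paste0("NetCDF_", label, "_", stamp, ".csv"))
  write.csv(tbl, path, row.names = FALSE)
  path
}

# Made-up inventory table and a fixed stamp so the file name is predictable.
demo_tbl  <- data.frame(FileName = c("a.nc", "b.nc"), VarCount = c(2L, 1L))
demo_path <- write_report(demo_tbl, "Inventory",
                          file.path(tempdir(), "Inspect_nc_demo"),
                          stamp = "20250101")

# Read the file back to confirm the table survived the round trip.
round_trip <- read.csv(demo_path, stringsAsFactors = FALSE)
print(round_trip)
```

Passing a finer-grained stamp, for example `format(Sys.time(), "%Y%m%d_%H%M%S")`, would keep every run.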

17.9 Curation Insights

Use the generated reports to guide your preservation actions:

Spatial Awareness (HasCRS): Files marked “No Spatial Grid” have neither recognisable latitude/longitude dimensions nor a grid_mapping attribute. Such files may not load correctly in GIS software. Check whether they are genuinely non-spatial (e.g., a time series at a single station) or whether the spatial metadata is missing.
Data Health (DataHealth): Files marked “⚠️ All NaNs (Empty)” are likely empty shells: the model ran but produced no output. Verify these files manually before excluding them from the archive. Files marked “Unknown” could not be sampled because the read failed, so the check says nothing about their content and they also need manual inspection.
Metadata Compliance: In the NetCDF_Global_Attributes CSV, check that each file declares a Conventions attribute of the form “CF-1.x” and carries the other global attributes recommended by the CF conventions, such as title, institution, and source.
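
The Conventions attribute can list several conventions in one string, so a plain equality test is fragile. As a sketch, the helper below extracts the CF version with a regular expression. `cf_version` is a hypothetical name, and the example strings are illustrative rather than taken from the files above.

```r
# Hypothetical helper: pull the CF version out of a single Conventions string.
# Returns NA when no "CF-<major>.<minor>" token is present.
cf_version <- function(conventions) {
  if (length(conventions) != 1 || is.na(conventions)) return(NA_character_)
  m <- regmatches(conventions, regexpr("CF-[0-9]+\\.[0-9]+", conventions))
  if (length(m) == 0) return(NA_character_)
  sub("^CF-", "", m)
}

print(cf_version("CF-1.6"))
print(cf_version("CF-1.8, ACDD-1.3"))
print(cf_version("COARDS"))
```

A declared version only states intent. Whether a file actually complies still has to be checked against the attributes and coordinate variables it contains.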

17.10 Additional Tools & Resources

17.10.1 Verify consistent global attributes

Verify if all files in a dataset have the same title, institution, source, and CF Conventions.

if (nrow(nc_attributes_global) > 0) {
  # Select a few key attributes and count the unique combinations
  global_consistency_check <- nc_attributes_global %>%
    select(FileName, contains("title"), contains("institution"), contains("source"), contains("Conventions")) %>%
    # The line below groups by all columns except FileName
    group_by(across(-FileName)) %>%
    summarise(file_count = n(), .groups = "drop")
  
  print("Consistency Check of Key Global Attributes:")
  print(global_consistency_check)
}

[1] "Consistency Check of Key Global Attributes:"
# A tibble: 1 × 2
  Conventions file_count
  <chr>            <int>
1 CF-1.6               1
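
The consistency table has one row per distinct combination of attribute values, so more than one row signals that the files disagree. The same idea can be sketched in base R on a toy table; the attribute values are made up.

```r
# Toy global-attribute table; NA marks a file that lacks the attribute.
globals <- data.frame(
  FileName    = c("a.nc", "b.nc", "c.nc"),
  institution = c("Institute A", "Institute A", NA),
  Conventions = c("CF-1.6", "CF-1.6", "CF-1.6"),
  stringsAsFactors = FALSE
)

# Distinct combinations of every column except FileName.
key_cols <- setdiff(names(globals), "FileName")
combos   <- unique(globals[key_cols])

# The dataset is consistent only if all files share one combination.
is_consistent <- nrow(combos) == 1

print(combos)
print(is_consistent)
```

In this toy case the third file lacks `institution`, which yields two combinations and flags the dataset as inconsistent.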

17.10.2 Check for essential variable attributes

For data to be reusable, variables should carry descriptive attributes such as long_name and units. The check below counts, across all files, how many variables lack each of these attributes.

if (nrow(nc_variables_with_attributes) > 0) {
  missing_attribute_check <- nc_variables_with_attributes %>%
    # Summarise the number of variables missing these key attributes
    summarise(
      missing_long_name = sum(is.na(long_name)),
      missing_units = sum(is.na(units))
    )
  
  print("Check for Missing Essential Variable Attributes:")
  print(missing_attribute_check)
}

[1] "Check for Missing Essential Variable Attributes:"
# A tibble: 1 × 2
  missing_long_name missing_units
              <int>         <int>
1                46            46
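
A count tells you how much is missing but not where. As a follow-up sketch, the base R snippet below lists which variables lack which attributes. It runs on a toy table with made-up values; with the real data, the same logic would apply to `nc_variables_with_attributes`, assuming it has `long_name` and `units` columns.

```r
# Toy variable table; NA marks a missing attribute.
vars <- data.frame(
  FileName  = c("a.nc", "a.nc", "b.nc"),
  name      = c("longitude", "tp", "ws"),
  long_name = c("longitude", NA, "wind speed"),
  units     = c("degrees_east", NA, NA),
  stringsAsFactors = FALSE
)

required <- c("long_name", "units")

# For each row, join the names of the required attributes that are NA.
missing_list <- apply(is.na(vars[required]), 1,
                      function(row) paste(required[row], collapse = ", "))

needs_fix <- data.frame(vars[c("FileName", "name")],
                        missing = missing_list,
                        stringsAsFactors = FALSE)
needs_fix <- needs_fix[needs_fix$missing != "", ]

print(needs_fix)
```

The resulting table can be handed to the data provider as a concrete list of variables to document.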

The following external tools complement this notebook:

CDO (Climate Data Operators): A command-line suite for manipulating and analyzing NetCDF data. It is widely used for regridding and statistical aggregation.
Panoply: A cross-platform application from NASA that plots geo-referenced arrays from NetCDF files. Well suited to visual quality control.
NCO (NetCDF Operators): A command-line toolkit for arithmetic and attribute editing on NetCDF files (Zender 2008).

17.11 Using the Non-Interactive R Script

For users who want to run this analysis on a server, in a batch job, or from the command line, here is a pure R script that performs the same process.

17.11.1 The `Inspect_nc_Script.R` Script

Download the R Script: Inspect_nc_Script.R

17.11.2 Example HPC Submission Script (`Inspect_nc_submit.sh`)

#!/bin/bash
#SBATCH --job-name=nc_inspect
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem=8G
#SBATCH --time=00:30:00
#SBATCH --output=logs/nc_inspect_%j.log

# 1. Load R Module
module load R

# 2. Define Directory
TARGET_DIR="/scratch/user/project_data/climate_models"

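# 3. Prepare Environment
# Note: logs/ must already exist when the job is submitted, because SLURM opens
# the --output file before this script runs. Create it once with: mkdir -p logs
mkdir -p Results/Inspect_nc
mkdir -p logs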

# 4. Run Analysis
echo "Starting NetCDF Inspection on $TARGET_DIR"
Rscript Inspect_nc_Script.R "$TARGET_DIR"
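
The contents of Inspect_nc_Script.R are not reproduced here. As a sketch of how such a script could pick up the directory passed by the submission script, the snippet below assumes the directory arrives as the first trailing argument and otherwise falls back to a default. Both `resolve_target_dir` and the default path are illustrative choices, not taken from the actual script.

```r
# Hypothetical argument handling for a script invoked as:
#   Rscript Inspect_nc_Script.R /path/to/data
resolve_target_dir <- function(args, default = "data/Inspect_nc/") {
  # Use the first non-empty argument if present, else the default.
  if (length(args) >= 1 && nzchar(args[1])) args[1] else default
}

# commandArgs(trailingOnly = TRUE) returns only the arguments after the script name.
target_dir <- resolve_target_dir(commandArgs(trailingOnly = TRUE))

if (!dir.exists(target_dir)) {
  message("Directory not found: ", target_dir)
}
```

In a batch job it is usually preferable to stop with a non-zero exit status when the directory is missing, so that the scheduler records the failure.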

17.12 References
