Skip to content

Table 2 Additional Columns added

I have added 2 additional columns for Table 2 discussed in 7/17 meeting. 1. Number of accessions in long term storage (-18-20 C) 2. Number of accessions included in MLS (from GLIS)

I calculated these with a new script because I was not able to extract this information from previous metrics produced bc I had to group by both crop and INSTCODE (instead of just by crop). @petergpython Can you review Table 2 output in the Drive with these columns added to determine if these calculations should be implemented in the script? path = "../../GCCS metrics project shared folder/Data_processing/6_generate_tables/2025_07_17/Table2_all_crops.xlsx"

If we do add these calculations to the script, should I add script below to the institution_accessions_summary and update the Metrics and Data Descriptions table to include these new metrics?

#################################################
# SG: Calculate 2 temp metrics, if used, move to script 4 and add to metrics and data descriptions table
all_metrics <- "../../Data_processing/4_estimate_metrics/2025_07_15/all_metrics_summary.xlsx"
institution_accessions_summary <- all_metrics$institution_accessions_summary

# 1. Number of accessions in long term storage (-18-20 C)
combined_allcrops <- read_csv("../../Data_processing/3_post_taxa_standardization_processing/Resulting_datasets/2025_07_08/combined_df.csv") 
long_term_storage <- combined_allcrops %>%
  filter(str_detect(as.character(STORAGE), "\\b13\\b")) %>%
  group_by(Crop_strategy, INSTCODE) %>%           #by crop & institute
  summarise(long_term_storage_count = n(), .groups = "drop")

# 2. Number of accessions included in MLS (from GLIS)
GLIS_dataset  <- read_csv("../../Data_processing/3_post_taxa_standardization_processing/Resulting_datasets/2025_07_07/GLIS_processed.csv") # glis_data_processed is the data after adding the cropstrategy variable
GLIS_MLS_count_by_inst <- GLIS_dataset %>%
  group_by(Crop_strategy, INSTCODE) %>%          #by crop & institute
  summarise(MLS_notified = sum(MLSSTAT, na.rm = TRUE), .groups = "drop")

# merge into institution_accessions_summary
inst_summary2 <- institution_accessions_summary %>%
  left_join(long_term_storage,        by = c("Crop_strategy","INSTCODE")) %>%
  left_join(GLIS_MLS_count_by_inst,   by = c("Crop_strategy","INSTCODE")) %>%
  mutate(
    # fill NAs
    long_term_storage_count = replace_na(long_term_storage_count, 0),
    MLS_notified             = replace_na(MLS_notified, 0),
    # create the two new display columns
    `Number of accessions in long term storage (-18-20 C) and source` =
      if_else(
        long_term_storage_count > 0,
        paste0(long_term_storage_count, " (storage=13)"),
        ""
      ),
    `Number of accessions included in MLS (from GLIS)` =
      MLS_notified
  )