Table 2 Additional Columns added
I have added 2 additional columns for Table 2 discussed in 7/17 meeting. 1. Number of accessions in long term storage (-18-20 C) 2. Number of accessions included in MLS (from GLIS)
I calculated these with a new script because I was not able to extract this information from previous metrics produced bc I had to group by both crop and INSTCODE (instead of just by crop).
@petergpython Can you review Table 2 output in the Drive with these columns added to determine if these calculations should be implemented in the script?
path = "../../GCCS metrics project shared folder/Data_processing/6_generate_tables/2025_07_17/Table2_all_crops.xlsx"
If we do add these calculations to the script, should I add script below to the institution_accessions_summary and update the Metrics and Data Descriptions table to include these new metrics?
#################################################
# SG: Calculate 2 temp metrics, if used, move to script 4 and add to metrics and data descriptions table
all_metrics <- "../../Data_processing/4_estimate_metrics/2025_07_15/all_metrics_summary.xlsx"
institution_accessions_summary <- all_metrics$institution_accessions_summary
# 1. Number of accessions in long term storage (-18-20 C)
combined_allcrops <- read_csv("../../Data_processing/3_post_taxa_standardization_processing/Resulting_datasets/2025_07_08/combined_df.csv")
long_term_storage <- combined_allcrops %>%
filter(str_detect(as.character(STORAGE), "\\b13\\b")) %>%
group_by(Crop_strategy, INSTCODE) %>% #by crop & institute
summarise(long_term_storage_count = n(), .groups = "drop")
# 2. Number of accessions included in MLS (from GLIS)
GLIS_dataset <- read_csv("../../Data_processing/3_post_taxa_standardization_processing/Resulting_datasets/2025_07_07/GLIS_processed.csv") # glis_data_processed is the data after adding the cropstrategy variable
GLIS_MLS_count_by_inst <- GLIS_dataset %>%
group_by(Crop_strategy, INSTCODE) %>% #by crop & institute
summarise(MLS_notified = sum(MLSSTAT, na.rm = TRUE), .groups = "drop")
# merge into institution_accessions_summary
inst_summary2 <- institution_accessions_summary %>%
left_join(long_term_storage, by = c("Crop_strategy","INSTCODE")) %>%
left_join(GLIS_MLS_count_by_inst, by = c("Crop_strategy","INSTCODE")) %>%
mutate(
# fill NAs
long_term_storage_count = replace_na(long_term_storage_count, 0),
MLS_notified = replace_na(MLS_notified, 0),
# create the two new display columns
`Number of accessions in long term storage (-18-20 C) and source` =
if_else(
long_term_storage_count > 0,
paste0(long_term_storage_count, " (storage=13)"),
""
),
`Number of accessions included in MLS (from GLIS)` =
MLS_notified
)