Upload accession images to Genesys

Uploading images from GGCE to Genesys is a popular request from genebanks. Most currently use S/FTP and FileZilla to upload images to Genesys; GGCE can make this process easier.

Accession images in Genesys

Accession images are organized in folders by WIEWS institute code and then by accession number: /wiews/{instCode}/acn/{accessionNumber}/. Each accession folder contains images (and other documents) related to the accession.
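As a minimal sketch of that layout, the folder path can be built from the two identifiers. The function name and the sample values are hypothetical, and the sketch assumes the identifiers are used verbatim (no escaping of special characters), which may not match how Genesys actually treats them.

```python
def genesys_accession_folder(inst_code: str, accession_number: str) -> str:
    """Build the folder path /wiews/{instCode}/acn/{accessionNumber}/."""
    inst_code = inst_code.strip()
    accession_number = accession_number.strip()
    if not inst_code or not accession_number:
        raise ValueError("both instCode and accessionNumber are required")
    # Assumption: identifiers are inserted as-is, without any escaping.
    return f"/wiews/{inst_code}/acn/{accession_number}/"


# Illustrative values only; "XYZ001" is not a real WIEWS code.
print(genesys_accession_folder("XYZ001", "ACC-1"))
```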

Accession images in GGCE

We need to ignore how the GGCE file repository organizes accession attachments: they are stored in /AIA/{genus}/{accessionId:2}/{accessionId}/{inventoryId}/ folders. Instead, we need to generate a list of accession attachments that resembles the Genesys approach.
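A Genesys-like list entry could look roughly like the sketch below. It assumes the attachment's institute code, accession number, file name and bytes are already at hand; the function name and the way these values are obtained from GGCE are hypothetical. The GGCE storage path is deliberately not used.

```python
import hashlib
from typing import Dict


def describe_attachment(inst_code: str, accession_number: str,
                        filename: str, content: bytes) -> Dict[str, str]:
    """Return one Genesys-like entry for a GGCE accession attachment."""
    return {
        # Genesys-style folder, independent of where GGCE stores the file.
        "path": f"/wiews/{inst_code}/acn/{accession_number}/",
        "originalFilename": filename,
        # Digests of the file bytes, used later to detect changed images.
        "sha1": hashlib.sha1(content).hexdigest(),
        "md5": hashlib.md5(content).hexdigest(),
    }


# Illustrative values only.
entry = describe_attachment("XYZ001", "ACC-1", "leaf.jpg", b"image bytes")
print(entry["path"], entry["originalFilename"])
```

For large files, feeding the hash objects in chunks with `update()` would avoid loading the whole image into memory.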

What to upload?

We don't want to blindly re-upload files that are already in Genesys, because that takes too long. The first step is to download the repository metadata file for the institute from Genesys, compare it with the Genesys-like list of accession attachments generated from GGCE, and determine what is only in Genesys, only in GGCE, and in both. We can then offer the user the following options:

  1. to delete images that are not in GGCE, but are in Genesys
  2. to (maybe!) download images that are not in GGCE, but are in Genesys, and add them to GGCE
  3. to upload images that are not yet in Genesys
  4. to upload new versions of images that are already in Genesys, but have a different SHA/MD5 digest (i.e. are updated in GGCE)
  5. to update only the metadata of images that are already in Genesys with the same SHA/MD5 digests (i.e. only the metadata was updated in GGCE)
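The three-way split that these options build on can be sketched with plain set operations. This is only an illustration: it assumes each attachment is identified by a (path, originalFilename) pair, and the function and key names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, str]  # (path, originalFilename)


def partition_keys(genesys_keys: Iterable[Key],
                   ggce_keys: Iterable[Key]) -> Dict[str, List[Key]]:
    """Split attachment keys into only-in-Genesys, only-in-GGCE and both."""
    genesys_set = set(genesys_keys)
    ggce_set = set(ggce_keys)
    return {
        "only_genesys": sorted(genesys_set - ggce_set),  # options 1 and 2
        "only_ggce": sorted(ggce_set - genesys_set),     # option 3
        "both": sorted(genesys_set & ggce_set),          # options 4 and 5
    }


# Illustrative values only.
folder = "/wiews/XYZ001/acn/ACC-1/"
split = partition_keys(
    genesys_keys=[(folder, "old.jpg"), (folder, "leaf.jpg")],
    ggce_keys=[(folder, "leaf.jpg"), (folder, "seed.jpg")],
)
print(split)
```

Entries in "both" still need a digest and metadata comparison to decide between options 4 and 5.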

Implementation

  1. Develop a method that generates a Genesys-like list of accession attachments: sha1, md5, path, originalFilename (and other metadata)
  2. Download the repository metadata file for the institute(s) from Genesys and store it in temporary local storage (not in the file repository)
  3. Implement a method to compare the two lists and generate sublists:
    • imageUpload: not in Genesys (i.e. candidate for upload) -- no match in Genesys by path + originalFilename
    • imageDownload: not in GGCE (i.e. candidate to download) -- no match in GGCE by path + originalFilename
    • imageUpdate: new bytes in GGCE (i.e. SHA/MD5 digest different) -- matches by path + originalFilename but digest different
    • metadataUpdate: updated metadata in GGCE -- matches by path + originalFilename + digest, but other metadata not the same
    • these sublists can be stored similarly to Genesys metadata file
  4. Add an API endpoint to refresh the status, and one that returns a Page<?> of the selected sublist (this one should regenerate the sublists transparently if the temp files are not available)
  5. Add API endpoint to start the sync of the selected list and return progress status, or if sync of that list is already running, return the progress status.
    • Uploading/updating/downloading should use the relevant Genesys API endpoint (not S/FTP)
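The comparison in step 3 could be sketched as follows. This is an illustration under stated assumptions, not the GGCE implementation: the `Attachment` class, its field names and the `compare` function are hypothetical, both lists are assumed to be already loaded in memory, and an image counts as changed when either digest differs.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class Attachment:
    path: str
    original_filename: str
    sha1: str
    md5: str
    metadata: Dict[str, str] = field(default_factory=dict)

    @property
    def key(self) -> Tuple[str, str]:
        # Entries are matched by path + originalFilename.
        return (self.path, self.original_filename)


def compare(ggce: Iterable[Attachment],
            genesys: Iterable[Attachment]) -> Dict[str, List[Attachment]]:
    """Sort attachments into the four sublists described in step 3."""
    ggce_by_key = {a.key: a for a in ggce}
    genesys_by_key = {a.key: a for a in genesys}
    result: Dict[str, List[Attachment]] = {
        "imageUpload": [], "imageDownload": [],
        "imageUpdate": [], "metadataUpdate": [],
    }
    for key, local in ggce_by_key.items():
        remote = genesys_by_key.get(key)
        if remote is None:
            result["imageUpload"].append(local)      # not in Genesys
        elif (local.sha1, local.md5) != (remote.sha1, remote.md5):
            result["imageUpdate"].append(local)      # new bytes in GGCE
        elif local.metadata != remote.metadata:
            result["metadataUpdate"].append(local)   # same bytes, new metadata
        # Identical entries need no action and land in no sublist.
    for key, remote in genesys_by_key.items():
        if key not in ggce_by_key:
            result["imageDownload"].append(remote)   # not in GGCE
    return result
```

In this sketch the imageDownload sublist also serves as the candidate list for deleting images from Genesys, since both options apply to entries that exist only in Genesys.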