Skip to content

Amphibian Previews

A lot of data is available in spreadsheet format (CSV, TSV, Excel) that eventually needs to be ingested into GGCE. Amphibian provides the APIs to parse, store, fetch and query spreadsheet data (Preview), but it requires mongodb and additional resources -- we want to avoid that.

While Amphibian is available in Genesys, it would require sending data to an external service, also not a good option in poor-internet speeds. We can handle small Excel/CSV spreadsheets in the browser, but this will perform poorly on mobile and slower clients. It is better to use the Amphibian approach: send data to the server and have a light UI that allows me to explore/query/annotate/map the data, even if there are 100,000-s of rows in the file. Just like Genesys.

In this ticket we develop a new amphibian-preview library (in Amphibian) that provides the functionality to parse Excel and CSV files (the PreviewApi), and uses the json data type, JSON_VALUE and JSON_QUERY functions available in MariaDB and MSSQL to query data. It does not require mongodb and works with RBDMS tables util_amphibian_preview and util_amphibian_preview_data.

The amphibian-preview library will be a new dependency of GGCE and will allow for parsing and reading spreadsheets, all while using the already available RDMBS.

GGCE will expose API endpoints to load and read spreadsheet data to the UI.

Use case: import trait data

I have an Excel file with new trait data from the last regeneration of my accessions and want to import the data to GGCE.

  1. I upload the Excel file, it is parsed
  2. I can see a preview of the contents (paginated) and I'm able to filter the data (Genesys Previews do this)
  3. The UI allows me to map columns to CropTrait, validate the data and ingest it as CropTraitObservation (Genesys does this)
  4. Ta-daaa: my new trait data is in GGCE

Use case: bulk data upload

I downloaded data from GGCE, modified it in Excel and now would like to update my database:

  1. I upload the Excel file, it is parsed
  2. GGCE figures out what fields I'm trying to update and gives me a preview (paginated) of what the changes would look like
  3. I "commit" the upload and the data from the spreadsheet updates the corresponding data in the database
  4. Ta-daaa: I've updated 10,000 rows without crashing the browser or my computer
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information