Taxonomy checker
Add Taxonomy checker functionality to the tool. The project depends on `opencsv`, so the source file should be in CSV format.
CSV Template
The template uses MCPD taxonomy column names. The code inspects the header row for these columns and uses them as input for the analysis. All other columns in the input spreadsheet are ignored.
Accession Number | GENUS | SPECIES | SPAUTHOR | SUBTAXA | SUBTAUTHOR |
---|---|---|---|---|---|
TMe-419 | Manihot | esculenta | Crantz | subsp. flabellifolia | (Pohl) Cif. |
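The header inspection described above could be sketched as follows. This is a minimal illustration, not the tool's actual code: the class name `TaxonomyHeader` and the method `indexTaxonomyColumns` are hypothetical, and exact, case-sensitive matching of trimmed header names is an assumption. It works on a plain `String[]` header row, which is the shape a CSV reader such as opencsv typically returns for one line.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: locates the MCPD taxonomy columns in a CSV header row.
final class TaxonomyHeader {
    // The taxonomy columns named in the template above.
    static final List<String> TAXONOMY_COLUMNS =
        Arrays.asList("GENUS", "SPECIES", "SPAUTHOR", "SUBTAXA", "SUBTAUTHOR");

    // Maps each taxonomy column found in the header to its position.
    // Other columns are skipped; taxonomy columns missing from the header
    // are simply absent from the returned map.
    // Assumption: names match exactly (case-sensitive) after trimming.
    static Map<String, Integer> indexTaxonomyColumns(String[] headerRow) {
        Map<String, Integer> positions = new LinkedHashMap<>();
        for (int i = 0; i < headerRow.length; i++) {
            String name = headerRow[i].trim();
            if (TAXONOMY_COLUMNS.contains(name) && !positions.containsKey(name)) {
                positions.put(name, i);
            }
        }
        return positions;
    }
}
```

With the template header, `Accession Number` at position 0 is ignored and `GENUS` maps to position 1.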
Results
The tool produces a new CSV with extra columns containing suggested values:
|Accession Number|GENUS|GENUS_check|SPECIES|SPECIES_check|SPAUTHOR|SPAUTHOR_check|SUBTAXA|SUBTAXA_check|SUBTAUTHOR|SUBTAUTHOR_check|
|---|---|---|---|---|---|---|---|---|---|---|
|TMe-419|Manihot|--------|esculenta|--------|Crantz|--------|subsp. flabellifolia|--------|(Pohl) Cif.|--------|
The `_check` columns can be added as the last columns instead of being inserted next to the originals.
When the original value is valid, the corresponding `_check` column is left empty.
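The two column layouts can be illustrated with a short sketch. The class `CheckColumns`, the method names, and the `appendAtEnd` flag are hypothetical and do not reflect the tool's real configuration. Treating a suggestion that equals the original as "valid" is likewise an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds the output header with the extra _check columns.
final class CheckColumns {
    static final List<String> TAXONOMY_COLUMNS =
        Arrays.asList("GENUS", "SPECIES", "SPAUTHOR", "SUBTAXA", "SUBTAUTHOR");

    // appendAtEnd == false: each _check column follows its original column.
    // appendAtEnd == true:  all _check columns come after the last input column.
    static List<String> outputHeader(List<String> inputHeader, boolean appendAtEnd) {
        List<String> out = new ArrayList<>();
        List<String> deferred = new ArrayList<>();
        for (String column : inputHeader) {
            out.add(column);
            if (TAXONOMY_COLUMNS.contains(column)) {
                (appendAtEnd ? deferred : out).add(column + "_check");
            }
        }
        out.addAll(deferred);
        return out;
    }

    // Returns the cell for a _check column: empty when there is nothing to suggest.
    // Assumption: a suggestion equal to the original means the value is valid.
    static String checkValue(String original, String suggestion) {
        return (suggestion == null || suggestion.equals(original)) ? "" : suggestion;
    }
}
```

Non-taxonomy columns such as `Accession Number` pass through unchanged in both layouts.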
Configuration options
The tool can be configured to resolve to the current taxonomy instead of only checking for spelling mistakes.
Filling gaps
The values for `SPAUTHOR` and `SUBTAUTHOR` are commonly not provided by crop gene banks. When the genus and species provided are valid (i.e. there are no suggestions for change), the tool will suggest the species authority name from the GRIN Taxonomy database. Similarly, when all other data is valid, a value for `SUBTAUTHOR` will be suggested.
Even when these values exist in the source data, we only include suggestions as described above.
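The gap-filling rule for `SPAUTHOR` might look like the sketch below. The class `AuthorityGapFiller` and its in-memory map are hypothetical stand-ins for a lookup against GRIN Taxonomy data, not the tool's real data access. Keying by genus and species name alone is a simplification made for brevity.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: suggest a species authority only when genus and species are valid.
final class AuthorityGapFiller {
    // Stand-in for the taxonomy database: "genus species" -> species authority.
    private final Map<String, String> authorityBySpecies = new HashMap<>();

    void register(String genus, String species, String authority) {
        authorityBySpecies.put(genus + " " + species, authority);
    }

    // genusCheck and speciesCheck are the contents of the GENUS_check and
    // SPECIES_check columns; empty means the original value was valid.
    Optional<String> suggestSpAuthor(String genus, String genusCheck,
                                     String species, String speciesCheck) {
        if (!isEmpty(genusCheck) || !isEmpty(speciesCheck)) {
            return Optional.empty(); // a correction is pending, so no authority is suggested
        }
        return Optional.ofNullable(authorityBySpecies.get(genus + " " + species));
    }

    private static boolean isEmpty(String value) {
        return value == null || value.trim().isEmpty();
    }
}
```

The method never looks at the existing `SPAUTHOR` value, so the same suggestion is produced whether or not the source data already has one.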
Notes
`GENUS_NAME` is not unique in the database; entries that share a name differ in `GENUS_AUTHORITY` or `GENUS_SECTION`.
For each `SPECIES_NAME` there is one `SPECIES_AUTHORITY` within any one `TAXONOMY_GENUS_ID`.
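One way to model these two constraints in memory is sketched below. The class `TaxonomyIndex` and its methods are hypothetical, and the numeric genus IDs are an assumption about how `TAXONOMY_GENUS_ID` is represented. A genus name resolves to a list of IDs, while the pair of genus ID and species name resolves to at most one authority.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory index reflecting the uniqueness notes above.
final class TaxonomyIndex {
    // One genus name may map to several genus records.
    private final Map<String, List<Long>> genusIdsByName = new HashMap<>();
    // Exactly one authority per (genus ID, species name) pair.
    private final Map<String, String> authorityByGenusIdAndSpecies = new HashMap<>();

    void addGenus(long genusId, String genusName) {
        genusIdsByName.computeIfAbsent(genusName, k -> new ArrayList<>()).add(genusId);
    }

    void addSpecies(long genusId, String speciesName, String speciesAuthority) {
        authorityByGenusIdAndSpecies.put(key(genusId, speciesName), speciesAuthority);
    }

    // All genus records sharing this name; empty when the name is unknown.
    List<Long> genusIds(String genusName) {
        return genusIdsByName.getOrDefault(genusName, Collections.<Long>emptyList());
    }

    // The single authority for this species within the given genus record, or null.
    String speciesAuthority(long genusId, String speciesName) {
        return authorityByGenusIdAndSpecies.get(key(genusId, speciesName));
    }

    private static String key(long genusId, String speciesName) {
        return genusId + "|" + speciesName;
    }
}
```

Because a genus name can return several IDs, a caller would have to decide how to handle the ambiguous case, for example by checking the species under each candidate genus record.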