Taxonomy Tools issues
https://gitlab.croptrust.org/genesys-pgr/taxonomy-tools/-/issues
2023-11-03T06:45:49+01:00

https://gitlab.croptrust.org/genesys-pgr/taxonomy-tools/-/issues/38
JUnit upgrade
2023-11-03T06:45:49+01:00
Matija Obreza

Please upgrade JUnit.

Vladyslava Mokliak

https://gitlab.croptrust.org/genesys-pgr/taxonomy-tools/-/issues/37
lombok
2023-10-20T15:37:26+02:00
Matija Obreza

Use lombok.

Matija Obreza

https://gitlab.croptrust.org/genesys-pgr/taxonomy-tools/-/issues/36
Taxonomy Download URL
2023-06-30T13:50:26+02:00
Matija Obreza

Our current code fetches the cab file from
https://npgsweb.ars-grin.gov/gringlobal/uploads/installers/latest/taxonomy_data.cab
Please switch to the new URL
https://npgsweb.ars-grin.gov/gringlobal/uploads/documents/taxonomy_data.cab

Artem Hrybeniuk

https://gitlab.croptrust.org/genesys-pgr/taxonomy-tools/-/issues/35
Upgrade dependencies
2022-12-27T08:38:33+01:00
Matija Obreza

Upgrade at least the following:
- commons-beanutils/commons-beanutils@1.9.3
- com.google.guava/guava@31.1-jre
- commons-codec/commons-codec@1.11

Artem Hrybeniuk

https://gitlab.croptrust.org/genesys-pgr/taxonomy-tools/-/issues/34
Java 11
2021-02-16T07:24:14+01:00
Matija Obreza

Upgrade to J11.

3.0
Maxym Borodenko

https://gitlab.croptrust.org/genesys-pgr/taxonomy-tools/-/issues/12
Add support for hybrids
2020-06-13T20:10:46+02:00
Matija Obreza

Species check for `Medicago littoralis x truncatula` should result in validation of species *littoralis* and species *truncatula*.
Others:
* *Medicago truncatula x littoralis*
* *Citrus Unshiu X C. Nobilis* is probably *Citrus* *unshiu x nobilis* i.e. *Citrus unshiu × Citrus nobilis*
## More info ##
[Wikipedia](https://en.wikipedia.org/wiki/Hybrid_name_%28botany%29) says that
* A hybrid may get a name; this will usually be the option of choice for naturally occurring hybrids.
* A hybrid may also be indicated by a formula listing the parents. Such a formula uses the multiplication sign "×" to link the parents.
*Magnolia x thompsoniana* is a specific hybrid of *Magnolia virginiana × Magnolia tripetala* (see [Wiki](https://en.wikipedia.org/wiki/Magnolia_%C3%97_thompsoniana)). The × in *× thompsoniana* is optional and is used to stress that this species is a hybrid of some sort:
> ... a taxonomist could decide to use either form of this name:
> Drosera ×anglica to emphasize that it is a hybrid, or
> Drosera anglica to emphasize that it is a species.
## GENUS ##
We can safely remove `X ` from the start of `GENUS` and then compare the remainder with the database. Generic hybrids are fun:
* *Citrus X Fortunella* *aurantium x japonica* probably stands for *Citrus aurantium × Fortunella japonica*
## SPECIES ##
We will have to take apart the `SPECIES` and inspect it for `x`, `X` and `×` signs. Each side will then be compared individually against the database.
* *Citrus Clementine X Tangelo Orla* would be *Citrus x Tangelo* *clementine x orla* in database, but written out as *Citrus clementine × Tangelo orla*?
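
The `GENUS` and `SPECIES` handling described in this issue could be sketched roughly as follows. This is only an illustration under stated assumptions: the class `HybridNames` and its methods are hypothetical names and not part of taxonomy-tools, and a hybrid marker is assumed to be a standalone `x`, `X` or `×` (with `×` also allowed directly in front of an epithet, as in *Drosera ×anglica*).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not part of taxonomy-tools. */
final class HybridNames {
    // Leading hybrid marker: "X Citrus", "x thompsoniana", "\u00D7anglica".
    // A plain x/X needs trailing whitespace so that epithets that merely
    // start with the letter x are left alone.
    private static final Pattern LEADING = Pattern.compile("^(?:[xX]\\s+|\u00D7\\s*)");

    // Separator between the parents of a formula: x, X or \u00D7 with
    // whitespace on both sides.
    private static final Pattern SEPARATOR = Pattern.compile("\\s+[xX\u00D7]\\s+");

    /** True when the value starts with a hybrid marker. */
    static boolean hasLeadingMarker(String name) {
        return LEADING.matcher(name.trim()).find();
    }

    /** Removes a leading hybrid marker, e.g. from the start of GENUS. */
    static String stripLeadingMarker(String name) {
        return LEADING.matcher(name.trim()).replaceFirst("");
    }

    /** Splits a hybrid formula into the parts to be checked individually. */
    static List<String> parents(String value) {
        List<String> parts = new ArrayList<>();
        for (String part : SEPARATOR.split(stripLeadingMarker(value))) {
            if (!part.trim().isEmpty()) {
                parts.add(part.trim());
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(parents("littoralis x truncatula"));
        System.out.println(parents("Clementine X Tangelo Orla"));
        System.out.println(stripLeadingMarker("X Citrus"));
    }
}
```

Each element returned by `parents` would then be compared against the database on its own. Abbreviated genera inside a formula (such as `C. Nobilis`) are deliberately not resolved in this sketch.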
When the `x` appears at the start of the `SPECIES` (e.g. in *Sorghum x almum*), the check must result in `OK` when the actual GRIN species is marked as **hybrid**.

1.0
Matija Obreza

https://gitlab.croptrust.org/genesys-pgr/taxonomy-tools/-/issues/8
Find appropriate values for string comparison indexes
2020-06-13T20:10:11+02:00
Matija Obreza

The program now uses Dice's coefficient and not Levenshtein's distance. Should/Can we reintroduce Levenshtein's distance as a [0.0, 1.0] score?
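
To make the question concrete, here is a minimal sketch of both measures on a [0.0, 1.0] scale. It assumes Dice's coefficient is computed over sets of character bigrams and that Levenshtein's distance is normalized as `1 - distance / max(length)`; the project's actual implementation may differ, and the class name `StringScores` is hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper for illustration only; not part of taxonomy-tools. */
final class StringScores {

    /** Dice's coefficient over character bigram sets (assumed definition). */
    static double dice(String a, String b) {
        Set<String> x = bigrams(a);
        Set<String> y = bigrams(b);
        if (x.isEmpty() && y.isEmpty()) {
            // Strings shorter than two characters have no bigrams.
            return a.equals(b) ? 1.0 : 0.0;
        }
        Set<String> common = new HashSet<>(x);
        common.retainAll(y);
        return 2.0 * common.size() / (x.size() + y.size());
    }

    private static Set<String> bigrams(String s) {
        Set<String> result = new HashSet<>();
        for (int i = 0; i + 1 < s.length(); i++) {
            result.add(s.substring(i, i + 2));
        }
        return result;
    }

    /** Classic edit distance (insert, delete, substitute), two-row DP. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Levenshtein distance mapped to [0.0, 1.0], where 1.0 means identical. */
    static double levenshteinScore(String a, String b) {
        int max = Math.max(a.length(), b.length());
        return max == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / max;
    }

    public static void main(String[] args) {
        System.out.println(dice("Vigna", "Vinga"));
        System.out.println(levenshteinScore("Vigna", "Vinga"));
    }
}
```

With both scores on the same scale, tests on genus data could compare how each measure ranks known misspellings.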
Add tests on TAXONOMY_GENUS.txt data to validate the method results based on AVRDC and ICARDA data.

1.0
Matija Obreza

https://gitlab.croptrust.org/genesys-pgr/taxonomy-tools/-/issues/33
GRIN-Taxa accents
2020-06-13T20:06:44+02:00
Matija Obreza

Another issue with accent marks and the Validator. In the example below, the data provider used (Jacq.) Maréchal for a SPAUTHOR. The Validator gives back a corrected version, this time without the accent on the e:
```
SPAUTHOR SPAUTHOR_check
(Jacq.) Maréchal (Jacq.) Marechal
```
The issue here is that Maréchal is correctly spelled and there is no need to remove the accent as the validator suggests. Even GRIN Taxonomy uses Maréchal (not Marechal): https://npgsweb.ars-grin.gov/gringlobal/taxonomydetail.aspx?id=41595
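
The issue does not state why the accent is dropped. One plausible cause, assumed here, is that accent folding used for matching leaks into the suggested value. The sketch below keeps the two concerns apart: `foldAccents` is only meant for building comparison keys, while `suggest` returns the reference spelling unchanged and returns `null` when the input is already correct. The class and method names are hypothetical.

```java
import java.text.Normalizer;

/** Hypothetical helper for illustration only; not part of taxonomy-tools. */
final class AccentAwareCheck {

    /** Accent-free key for fuzzy matching; never meant to be shown to users. */
    static String foldAccents(String s) {
        // NFD splits an accented letter into base letter + combining mark,
        // then all combining marks (\p{M}) are removed.
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
    }

    /**
     * Returns null when the input equals the reference value (compared in
     * NFC so composed and decomposed accents count as equal), otherwise the
     * reference spelling with its accents intact.
     */
    static String suggest(String input, String reference) {
        String in = Normalizer.normalize(input, Normalizer.Form.NFC);
        String ref = Normalizer.normalize(reference, Normalizer.Form.NFC);
        return in.equals(ref) ? null : ref;
    }

    public static void main(String[] args) {
        String grin = "(Jacq.) Mar\u00E9chal"; // spelling with the accented e
        System.out.println(suggest("(Jacq.) Mar\u00E9chal", grin)); // no change needed
        System.out.println(suggest("(Jacq.) Marechal", grin));      // accented spelling suggested
    }
}
```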
```csv
GENUS,SPECIES,SPAUTHOR
Vigna,aconitifolia,(Jacq.) Maréchal
```

1.2
Matija Obreza

https://gitlab.croptrust.org/genesys-pgr/taxonomy-tools/-/issues/1
Taxonomy checker
2019-01-05T21:26:31+01:00
Matija Obreza

Add *Taxonomy checker* functionality to the tool. The project depends on `opencsv`, so the source file should be in CSV format.
## CSV Template
The template uses MCPD taxonomy column names. The code inspects the header row for these columns and uses them as input for the analysis. All other columns in the input spreadsheet are ignored.

| Accession Number | GENUS | SPECIES | SPAUTHOR | SUBTAXA | SUBTAUTHOR |
| -------- | -------- | -------- | -------- | -------- | -------- |
| TMe-419 | Manihot | esculenta | Crantz | subsp. flabellifolia | (Pohl) Cif. |
## Results
The tool produces a new CSV with extra columns containing suggested values:
| Accession Number | GENUS | GENUS_check | SPECIES | SPECIES_check | SPAUTHOR | SPAUTHOR_check | SUBTAXA | SUBTAXA_check | SUBTAUTHOR | SUBTAUTHOR_check |
| -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- |
| TMe-419 | Manihot | -------- | esculenta | -------- | Crantz | -------- | subsp. flabellifolia | -------- | (Pohl) Cif. | -------- |
The `_check` columns can be added as last columns instead of being inserted next to the originals.
When the original value is valid there will be no values in the corresponding `_check` column.
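
The two layouts for the `_check` columns could be produced along these lines. This is a sketch under assumptions: the class `CheckColumns`, the method `outputHeader` and the `appendAtEnd` flag are hypothetical names, and header matching is assumed to be case-insensitive. Reading and writing the CSV itself (e.g. with `opencsv`) is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration only; not part of taxonomy-tools. */
final class CheckColumns {
    // MCPD taxonomy columns that receive a matching _check column.
    static final List<String> MCPD_TAXONOMY =
            Arrays.asList("GENUS", "SPECIES", "SPAUTHOR", "SUBTAXA", "SUBTAUTHOR");

    private static boolean isTaxonomyColumn(String column) {
        for (String name : MCPD_TAXONOMY) {
            if (name.equalsIgnoreCase(column.trim())) {
                return true;
            }
        }
        return false;
    }

    /**
     * Builds the output header. Columns that are not taxonomy columns are
     * kept as they are. Each taxonomy column gets a "_check" column, either
     * right next to the original or, when appendAtEnd is true, after all
     * original columns.
     */
    static List<String> outputHeader(List<String> inputHeader, boolean appendAtEnd) {
        List<String> out = new ArrayList<>();
        List<String> trailing = new ArrayList<>();
        for (String column : inputHeader) {
            out.add(column);
            if (isTaxonomyColumn(column)) {
                String check = column.trim() + "_check";
                if (appendAtEnd) {
                    trailing.add(check);
                } else {
                    out.add(check);
                }
            }
        }
        out.addAll(trailing);
        return out;
    }

    public static void main(String[] args) {
        List<String> header = Arrays.asList("Accession Number", "GENUS", "SPECIES");
        System.out.println(outputHeader(header, false));
        System.out.println(outputHeader(header, true));
    }
}
```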
## Configuration options
The tool can be configured to resolve to the *current* taxonomy instead of only checking for spelling mistakes.
## Filling gaps
Crop gene banks commonly do not provide values for `SPAUTHOR` and `SUBTAUTHOR`. When the provided *genus* and *species* are valid (i.e. there are no suggestions for change), the tool will suggest the species authority name from the GRIN Taxonomy database. Similarly, when all other data is valid, a value for `SUBTAUTHOR` will be suggested.
Even when these values exist in the source data, we only include suggestions as described above.
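
The rule for `SPAUTHOR` could be sketched as follows. Everything here is hypothetical: the class `AuthorityGapFiller`, its in-memory map and the method names are stand-ins for the real GRIN Taxonomy lookup, and keying by genus name alone is a simplification.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of taxonomy-tools. */
final class AuthorityGapFiller {
    // Stand-in for the GRIN Taxonomy database: "genus species" -> authority.
    private final Map<String, String> authorityBySpecies = new HashMap<>();

    private static String key(String genus, String species) {
        return genus.trim() + " " + species.trim();
    }

    void register(String genus, String species, String authority) {
        authorityBySpecies.put(key(genus, species), authority);
    }

    /**
     * Returns the value for SPAUTHOR_check, or null for "no suggestion".
     * A suggestion is only made when GENUS and SPECIES are valid, i.e.
     * their own _check values are empty (null here).
     */
    String suggestSpAuthor(String genus, String genusCheck,
                           String species, String speciesCheck, String spAuthor) {
        if (genusCheck != null || speciesCheck != null) {
            return null; // genus or species still needs correcting
        }
        String known = authorityBySpecies.get(key(genus, species));
        if (known == null || known.equals(spAuthor)) {
            return null; // unknown species, or the authority is already right
        }
        return known;
    }

    public static void main(String[] args) {
        AuthorityGapFiller filler = new AuthorityGapFiller();
        filler.register("Manihot", "esculenta", "Crantz");
        System.out.println(filler.suggestSpAuthor("Manihot", null, "esculenta", null, null));
    }
}
```

The same pattern would apply to `SUBTAUTHOR` once genus, species and subtaxa are all valid.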
## Notes
`GENUS_NAME` is not unique in the database; records with the same name differ in `GENUS_AUTHORITY` or `GENUS_SECTION`.
For each `SPECIES_NAME` there is one `SPECIES_AUTHORITY` within any one `TAXONOMY_GENUS_ID`.
1.0
Matija Obreza

https://gitlab.croptrust.org/genesys-pgr/taxonomy-tools/-/issues/2
GenusRow field should be Enum
2019-01-05T21:26:31+01:00
Matija Obreza

`QUALIFYING_CODE` can contain `=`, `=~`, `~`, `null`. This should be converted to an Enum.

Gold
Matija Obreza

https://gitlab.croptrust.org/genesys-pgr/taxonomy-tools/-/issues/3
Improve returned suggestions
2019-01-05T21:26:31+01:00
Matija Obreza

Within #1 sort suggestions by best match first, limit number of results.

Matija Obreza

https://gitlab.croptrust.org/genesys-pgr/taxonomy-tools/-/issues/32
Release 1.1
2017-09-19T23:14:43+02:00
Matija Obreza

Release version 1.1

Matija Obreza

https://gitlab.croptrust.org/genesys-pgr/taxonomy-tools/-/issues/31
Remove taxonomychecker-web module
2017-05-25T18:06:07+02:00
Matija Obreza

The `taxonomychecker-web` module has been moved to https://gitlab.croptrust.org/genesys-pgr/validator
Remove the module from this project and remove docker + deployment elements from the `.gitlab-ci.yml`.

https://gitlab.croptrust.org/genesys-pgr/taxonomy-tools/-/issues/30
Link to video
2017-05-24T11:47:50+02:00
Matija Obreza

Please add the following paragraph to the MCPD taxonomy validation tool (https://sandbox.genesys-pgr.org/taxonomychecker) under the heading
## How to use the MCPD taxonomy validation tool?
See this short video to learn how to use the MCPD taxonomy validation tool: https://www.youtube.com/watch?v=LR9Fl1P84Gc&index=6&list=PLDlzgGuc_qUrhzC0o4Mo5Esvn8vQ0R-D5

https://gitlab.croptrust.org/genesys-pgr/taxonomy-tools/-/issues/4
Split the library from the CLI
2017-05-20T21:36:30+02:00
Matija Obreza

To minimize dependencies of the library itself, split out the CLI code into a separate Maven project.

1.0
Matija Obreza

https://gitlab.croptrust.org/genesys-pgr/taxonomy-tools/-/issues/5
Don't overwrite existing files
2017-05-20T21:36:30+02:00
Matija Obreza

The program should never overwrite existing files. It must fail with an error.

1.0
Matija Obreza

https://gitlab.croptrust.org/genesys-pgr/taxonomy-tools/-/issues/6
Build must pass Sonatype requirements
2017-05-20T21:36:30+02:00
Matija Obreza

Ensure that the pre-1.0 build already passes all requirements on Sonatype and validate that *maven-release-plugin* works.

1.0
Matija Obreza

https://gitlab.croptrust.org/genesys-pgr/taxonomy-tools/-/issues/7
SUBTAXA
2017-05-20T21:36:30+02:00
Matija Obreza

Genus, species and species authority are checked; a check for the `SUBTAXA` field still needs to be added. The subtaxa check will only be done when both genus and species are valid.
## SUBTAUTHOR
When there is one exact match for subtaxa, we can suggest the subtaxa authority of the match in the `SUBTAUTHOR_check` field.

1.0
Matija Obreza

https://gitlab.croptrust.org/genesys-pgr/taxonomy-tools/-/issues/9
Decrease score for deprecated taxa
2017-05-20T21:36:30+02:00
Matija Obreza

Taxonomy records in the GRIN-Global database point to the *current* accepted record. In this code, this is checked with `SpeciesRow#isCurrent()` or `GenusRow#isCurrent()`.
Change scoring so that not-current records get a lower score.
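
A minimal sketch of such a scoring change is shown below. The penalty factor `0.95` and the names `CurrentRecordScore` and `adjust` are assumptions made for illustration. In the project, the boolean would come from `SpeciesRow#isCurrent()` or `GenusRow#isCurrent()`.

```java
/** Hypothetical helper for illustration only; not part of taxonomy-tools. */
final class CurrentRecordScore {
    // Assumed factor: small enough to break ties, large enough to matter.
    static final double NOT_CURRENT_PENALTY = 0.95;

    /** Lowers the similarity score of a record that is not the current one. */
    static double adjust(double similarity, boolean isCurrent) {
        return isCurrent ? similarity : similarity * NOT_CURRENT_PENALTY;
    }

    public static void main(String[] args) {
        double similarity = 1.0; // both candidates match the input equally well
        System.out.println("current:     " + adjust(similarity, true));
        System.out.println("not current: " + adjust(similarity, false));
    }
}
```

A multiplicative penalty keeps the result inside [0.0, 1.0] and only reorders candidates whose raw scores are close.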
## Example
Between *Vigna unguiculata* (L.) Walp. **var. dekindtiana** and *Vigna unguiculata* (L.) Walp. **subsp. dekindtiana** the first record is "deprecated" and is a synonym for the latter. The scoring should reflect this preference and give the first match a slightly lower score.

1.0
Matija Obreza

https://gitlab.croptrust.org/genesys-pgr/taxonomy-tools/-/issues/10
GenusRow is not available in TaxonomyDatabase
2017-05-20T21:36:30+02:00
Matija Obreza

GenusRow contains data relevant to matching taxonomies (e.g. `isCurrent()`). It has now been excluded from TaxonomyDatabase by design.
Bring it back and use `GenusRow#isCurrent()` to drop the score when the record is not current (same as #9).

1.0
Matija Obreza