Commit afb329c6 authored by Matija Obreza

Documentation

parent 7705555e
......@@ -19,6 +19,10 @@ In this manual, all URLs are pointing to the Genesys sandbox environment at http
include::sections/security.adoc[]
include::sections/accession-api.adoc[]
include::sections/api-accession.adoc[]
include::sections/api-crop.adoc[]
== Acknowledgements
Special thanks go to Luca Matteis and Richard Bruskiewich from Bioversity International who have contributed
to the original documentation of the APIs.
Accession passport data basics
==============================
December 2015: Documentation commit {buildNumber}
:revnumber: {projectVersion}
:doctype: book
:toc: left
:toclevels: 5
:icons: font
:numbered:
:source-highlighter: pygments
:pygments-css: class
:pygments-linenums-mode: table
[[intro]]
Introduction
------------
This manual contains basic information on commonly used standards for accession
documentation and formats for data exchange. It documents Genesys extensions to
the standards.
https://www.genesys-pgr.org[Genesys PGR] (Plant Genetic Resources) is a free online global portal accessible at
link:$$https://www.genesys-pgr.org$$[www.genesys-pgr.org]
that allows the exploration of the world’s crop diversity through a single website. The
data published on Genesys follows the <<mcpd,Multi-crop Passport Descriptors>> standard.
The manual introduces
* <<wiews,FAO WIEWS>> database and <<wiews-instcode,WIEWS Institute codes>>
* FAO/Bioversity <<mcpd,Multi-crop Passport Descriptors>>
* <<mcpd-genesys,Genesys extensions>> to MCPD
* <<other-standards,Other standards>> relevant to accession documentation
include::sections/accedoc.adoc[]
include::sections/wiews.adoc[]
include::sections/mcpd.adoc[]
include::sections/iso.adoc[]
......@@ -67,3 +67,11 @@ include::sections/backup.adoc[]
include::sections/recovery.adoc[]
:leveloffset: 0
include::sections/wiews.adoc[]
include::sections/mcpd.adoc[]
include::sections/security.adoc[]
include::sections/api-accession.adoc[]
include::sections/api-crop.adoc[]
[[accedoc]]
== Accession documentation in genebanks
Collections of PGRFA material in genebanks document at least the following for each accession:
* <<accedoc-accenumb,Accession number>>
* Acquisition date `ACQDATE` when accession entered the collection
* <<accedoc-other,Other accession identifiers>>
* <<accedoc-tax,Taxonomy>>
* <<accedoc-storage,Storage and maintenance>>
A single accession is usually maintained as several individual *inventories* or lots, where each inventory
follows different management policies and is maintained in different conditions (e.g. cryo and in vitro,
or base and active collection).
Inventory management is a topic of genebank collection management and is not further described here.
[[accedoc-accenumb]]
=== Accession number
Accession number is the unique identifier assigned to the material as it enters
the collection. This identifier generally has three components:
Prefix + Sequence number + Suffix
The *prefix* is commonly used to differentiate between different crop collections
maintained by the genebank.
.Some prefixes used by http://www.iita.org[IITA] genebank
* `TMe` Cassava _Manihot esculenta_ collection
* `TVSu` Bambara groundnut _Vigna subterranea_ collection
* `TZm` Maize _Zea mays_ collection
The *sequence number* is assigned manually or by a computer system to ensure there are
no duplicates. Some institutes prefer to zero-pad the number (e.g. `00000102`).
The *suffix* allows differentiating samples of the same original material. The
exact meaning of the suffix is different for every institute.
[cols="1,1,1,2", options="header"]
.Example accession numbers
|===
|Prefix|Sequence number|Suffix|Accession number
|TMe|419||TMe-419
|TVSu|13||TVSu-13
|===
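
The composition rule above can be sketched in a few lines of Python. This is only an illustration: the function name, the `-` separator and the `pad_width` parameter are assumptions, and each institute defines its own formatting rules.

```python
def format_accession_number(prefix, sequence, suffix="", separator="-", pad_width=0):
    """Join prefix, sequence number and optional suffix into one identifier.

    Hypothetical helper: separator and padding are institute-specific choices.
    """
    number = str(sequence).zfill(pad_width)  # zero-pad only when pad_width is larger
    parts = [part for part in (prefix, number, suffix) if part]
    return separator.join(parts)


print(format_accession_number("TMe", 419))
print(format_accession_number("TVSu", 13))
print(format_accession_number("", 102, pad_width=8))
```

The first two calls reproduce the accession numbers from the table; the last one shows a zero-padded sequence number without a prefix.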
[[accedoc-other]]
=== Other accession identifiers
Material enters the collection through collecting missions, from breeding programs,
or by acquisition from other institutes. In each case, the material will already have some
identifier assigned by the collector, breeder or other institute.
*Accession name* is the vernacular name of the material and is commonly captured by
the collector or assigned by the breeder.
[[accedoc-coll]]
==== Collected material
Genebank accessions obtained through collecting missions should maintain data about the site and dates of the
collecting and collector information.
[[accedoc-bred]]
==== Breeders' material
Lines developed by breeding programs of the institute may be included in the collection. Information provided by the breeders
should include the pedigree, ancestral information of the material, along with names and identifiers used by the breeding
program and the codes and names of institutes that developed the material.
[[accedoc-acq]]
==== Acquisitions
Material coming from other institutes and genebanks must be accompanied by accession passport data
as documented in the source genebank.
NOTE: *Country of origin* is the country where the material was collected or bred, not the country of source genebank.
Accession documentation should capture any identifiers provided by the source institute. This data
allows for validation and curation of passport data between the genebanks and allows researchers
to obtain material from either collection.
[[accedoc-tax]]
=== Taxonomy
Accession genus, species, species author, subtaxon and subtaxon authority are usually
known, but are subject to change after expert identification or change in taxonomic system.
https://npgsweb.ars-grin.gov/gringlobal/taxon/abouttaxonomy.aspx[GRIN Taxonomy for Plants] and the Mansfeld database can serve for validating accession taxa.
[[accedoc-storage]]
=== Storage and maintenance
Ex situ genebanks maintain PGR material as seed, in the field, in vitro, cryo or in DNA collections.
Inventories (lots) of one accession may be managed by different methods (e.g. seed and cryo).
See <<mcpd-storage,Storage>> in the MCPD standard for how to capture multiple types of storage.
[[chApiAccession]]
== Managing Passport Data
Passport data is based on the FAO Multi-Crop Passport Descriptors <<mcpd2>> format.
Accession records are *upserted*, meaning that when the matching accession record
. exists, it will be updated
. does not exist, a new record will be created
Accession data in the database will be updated with whatever data is provided in the
request JSON.
=== Accession identity
Prior to full adoption of Permanent Unique Identifiers for Germplasm, accessions could be
identified by the holding institute code (INSTCODE) and the accession number (ACCENUMB).
Genebanks maintaining two or more collections of crops would sometimes use the same
accession number, unique within one collection.
Genesys uses the *instCode*, *acceNumb* and *genus* triplet to uniquely identify an
accession in an institute:
[source,json,linenums]
----
{
"instCode": "NGA039", <1>
"acceNumb": "TMp-123", <2>
"genus": "Musa" <3>
}
----
<1> Holding institute code (INSTCODE)
<2> Accession number (ACCENUMB)
<3> Genus (GENUS)
=== JSON data model
The JSON data model of accession passport data closely follows <<mcpd2, MCPD>> definitions.
By default, institutes in Genesys are configured to "Use unique accession numbers within the institute".
The accession JSON object must provide two identifying elements: `instCode` and `acceNumb`.
In cases where accession numbers are not unique within the institute, `genus` is used to identify
the unique accession within the institute. Then the Accession JSON object must always provide three
identifying elements: `instCode`, `acceNumb` and `genus`.
All other fields are optional.
[source,json,linenums]
----
{
"instCode": "XYZ111",
"acceNumb": "M12345",
"genus": "Musa",
"species": "acuminata",
"spauthor": "Colla",
"subtaxa": "var. sumatrana",
"subtauthor": "(Becc.) Nasution",
"orgCty": ...,
"acqDate": "20010301",
"mlsStat": true,
"inTrust": false,
"available": true,
"historic": false,
"storage": [10, 20],
"sampStat": 200,
"duplSite": "BEL084",
"bredCode": ...,
"ancest": ....,
"remarks": [ "remark1", "remark2" ],
"acceUrl": "https://my-genebank.org/accession/1",
"geo": {
... <1>
},
"coll": {
... <2>
}
}
----
<1> JSON object with geographic data
<2> JSON object with collecting data
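
As a minimal sketch of the identity rule described above, the following Python function picks the identifying elements of an accession object. The function name and the `unique_acce_numb` flag are hypothetical; the flag stands in for the institute setting "Use unique accession numbers within the institute".

```python
def identity_key(accession, unique_acce_numb=True):
    """Return the tuple of values that identifies an accession.

    unique_acce_numb mirrors the institute-level setting (assumed name):
    when False, genus is part of the identity as well.
    """
    required = ["instCode", "acceNumb"]
    if not unique_acce_numb:
        required.append("genus")
    missing = [field for field in required if not accession.get(field)]
    if missing:
        raise ValueError("missing identifying fields: " + ", ".join(missing))
    return tuple(accession[field] for field in required)


record = {"instCode": "XYZ111", "acceNumb": "M12345", "genus": "Musa"}
print(identity_key(record))
print(identity_key(record, unique_acce_numb=False))
```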
=== Clearing existing values
To reset or clear an existing value in the accession passport data, it should be provided
as `null`. Not providing a field means the field in the database should not be modified.
[source,json,linenums]
----
{
"instCode": "NGA039",
"acceNumb": "TMp-123",
"genus": "Musa",
"orgCty": null <1>
}
----
<1> Country of origin of accession is cleared by sending a `null` value.
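
The difference between omitting a field and sending `null` can be illustrated in Python, where `None` serializes to JSON `null`. The helper name `build_update` is an assumption made for this sketch.

```python
import json


def build_update(inst_code, acce_numb, changes=None, clear=()):
    """Build one accession object for an upsert (hypothetical helper).

    Fields in `changes` are updated, fields in `clear` are sent as null,
    and any field not mentioned is left out so it stays untouched.
    """
    payload = {"instCode": inst_code, "acceNumb": acce_numb}
    payload.update(changes or {})
    for field in clear:
        payload[field] = None  # json.dumps turns None into null
    return payload


update = build_update("NGA039", "TMp-123", {"genus": "Musa"}, clear=["orgCty"])
print(json.dumps(update))
```

Here `orgCty` is cleared, while a field such as `acqDate` is absent from the object and therefore not modified.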
=== Insert or update accessions
REST endpoint URL `/api/v0/acn/{instCode}/upsert` allows for inserting new accessions
or updating existing records in Genesys. It accepts a JSON array of Accession JSON objects.
The array provides for sending batches of 50 or 100 accessions in one call, reducing
the HTTP overhead and improving performance.
NOTE: Only `instCode` and `acceNumb` are required (and, in some cases, `genus`).
NOTE: If a property is set to `null`, the existing value will be removed from the database.
NOTE: The server will return an error when the `instCode` of any JSON object does not match the `instCode` in the URL!
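
A client might prepare batched upsert calls roughly as follows. This is a sketch under stated assumptions: `BASE_URL` is a placeholder host, the `POST` method and the `Content-Type` header are assumed rather than taken from this section, and authentication is omitted. The requests are only built, not sent.

```python
import json
import urllib.request

BASE_URL = "https://genesys.example.org"  # placeholder host, not a real server


def batches(items, size=50):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_upsert_requests(inst_code, accessions, batch_size=50):
    """Build one request per batch for /api/v0/acn/{instCode}/upsert."""
    for accession in accessions:
        # Catch locally what the server would reject anyway.
        if accession.get("instCode") != inst_code:
            raise ValueError("instCode does not match the URL: %r" % (accession,))
    url = "%s/api/v0/acn/%s/upsert" % (BASE_URL, inst_code)
    requests = []
    for batch in batches(accessions, batch_size):
        body = json.dumps(batch).encode("utf-8")
        requests.append(urllib.request.Request(
            url,
            data=body,
            method="POST",  # assumed HTTP method
            headers={"Content-Type": "application/json"},
        ))
    return requests


accessions = [{"instCode": "NGA039", "acceNumb": "TMe-%d" % i} for i in range(1, 121)]
for request in build_upsert_requests("NGA039", accessions):
    print(request.full_url, len(json.loads(request.data)))
```

Checking `instCode` before building the requests avoids a round trip that would end in a server error.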
=== Deleting accessions
With the introduction of permanent identifiers for accession records in Genesys we have
also introduced the *Accession Archive*. The Archive holds passport data for accession records
that have been deleted from the active database.
REST endpoint URL `/api/v0/acn/{instCode}/delete` accepts an array of `instCode`, `acceNumb`, `genus` triplets
and deletes the corresponding accession records from Genesys. The *DELETE* permission is required for this operation.
NOTE: The delete operation will fail if C&E data exists for any of the accessions listed.
.Delete 3 accessions from active database
[source,http,linenums]
----
POST /api/v0/acn/SYR002/delete
[{
"instCode": "SYR002",
"acceNumb": "12345",
"genus": "Vicia"
}, {
"instCode": "SYR002",
"acceNumb": "12345",
"genus": "Vicia"
}, {
"instCode": "SYR002",
"acceNumb": "IG 1",
"genus": "Vicia"
}]
----
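
The request body for the delete endpoint can be assembled with a small helper like the one below. The function name is hypothetical, and dropping repeated triplets is a client-side choice made for this sketch; how the server treats repeated entries is not documented here.

```python
def delete_payload(inst_code, accessions):
    """Build the JSON array of instCode/acceNumb/genus triplets.

    `accessions` is an iterable of (acceNumb, genus) pairs; repeated
    pairs are sent only once.
    """
    seen = set()
    payload = []
    for acce_numb, genus in accessions:
        key = (inst_code, acce_numb, genus)
        if key in seen:
            continue
        seen.add(key)
        payload.append({"instCode": inst_code, "acceNumb": acce_numb, "genus": genus})
    return payload


print(delete_payload("SYR002", [("12345", "Vicia"), ("12345", "Vicia"), ("IG 1", "Vicia")]))
```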
[bibliography]
- [[[mcpd2]]] Alercia, A; Diulgheroff, S; Mackay, M.
http://www.bioversityinternational.org/e-library/publications/detail/faobioversity-multi-crop-passport-descriptors-v2-mcpd-v2/[FAO/Bioversity Multi-Crop Passport Descriptors V.2]. 2012.
[[chApiAccession]]
[[chApiCrop]]
== Managing Crop data
......@@ -6,6 +6,11 @@ Genesys maintains a database of crops and crop groups (e.g. forages). In additio
description, each crop defines a list of taxonomic rules that determine which taxonomies are
included (or excluded) in the group.
Crops and crop groups are referred to and identified by the crop's *short name*. The short name
placeholder in documentation below is marked by `{shortName}`. The short name should have no spaces
and it should contain US-ASCII letters and digits only (a-z, A-Z, 0-9).
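
A short name can be checked on the client before it is sent. The following Python sketch encodes the rule above as a regular expression; the function name is an assumption, and the server may apply further rules not described here.

```python
import re

# US-ASCII letters and digits only, no spaces, at least one character.
SHORT_NAME = re.compile(r"[A-Za-z0-9]+")


def is_valid_short_name(name):
    """Return True when `name` satisfies the short name rule."""
    return SHORT_NAME.fullmatch(name) is not None


for candidate in ("musa", "sweet potato", "café"):
    print(candidate, is_valid_short_name(candidate))
```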
[NOTE]
.Crop Taxonomic rules
=====================================================================
......@@ -126,10 +131,36 @@ include::{snippets}/crop-create/request-fields.adoc[]
The response is a single crop record as stored on the server.
==== `curl` example
.Example request to register a new crop
include::{snippets}/crop-create/curl-request.adoc[]
=== Localization of crop title and description
The `i18n` field of the JSON crop object is a string that contains a JSON-encoded, two-level
dictionary. The first-level keys are `name` (for the name field)
and `description` (for the description field); the second-level keys are ISO 639
language codes (e.g. `en`, `es`).
For example:
[source,json,linenums]
----
{
"name": {
"en": "Musa",
"es": "Musa",
"ru": "Муса",
"zh": "穆萨"
},
"description": {
"en": "Bananas and plantains",
"es": "Los bananos y plátanos",
"ru": "Бананы и бананы",
"zh": "香蕉和大蕉"
}
}
----
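
Because `i18n` is a string holding JSON rather than a nested object, a client has to encode the dictionary before placing it in the crop object and decode it when reading. The Python sketch below shows both directions; the `shortName` key and the `localized` helper are assumptions made for illustration.

```python
import json

translations = {
    "name": {"en": "Musa", "es": "Musa"},
    "description": {"en": "Bananas and plantains", "es": "Los bananos y plátanos"},
}

crop = {
    "shortName": "musa",  # assumed field name for the crop short name
    "name": "Musa",
    # The dictionary is serialized to a string, not embedded as an object.
    "i18n": json.dumps(translations, ensure_ascii=False),
}


def localized(crop, field, lang, fallback="en"):
    """Look up a translation, falling back to another language if absent."""
    table = json.loads(crop["i18n"]).get(field, {})
    return table.get(lang) or table.get(fallback)


print(localized(crop, "description", "es"))
print(localized(crop, "name", "fr"))
```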
=== Updating taxonomic rules
......@@ -166,38 +197,3 @@ This will remove the crop and crop rules from the system.
.Deleting a crop
include::{snippets}/crop-delete/curl-request.adoc[]
== Managing Passport Data
Accession records are *upserted*, meaning that when the matching accession record
. exists, it will be updated
. does not exist, a new record will be created
Accession data in the database will be updated with whatever data is provided in the
request JSON.
TIP: If you want to clear or un-set a value, upsert it as *null*.
`curl` Call
And this thing HTTP request
include::{snippets}/crop-create/http-request.adoc[]
Request fields
include::{snippets}/crop-create/request-fields.adoc[Kaboom]
HTTP Response
include::{snippets}/crop-create/http-response.adoc[]
Response fields
include::{snippets}/crop-create/response-fields.adoc[]
\ No newline at end of file
[[other-standards]]
== Other relevant standards
[[iso-3166]]
=== ISO-3166 Country codes
The https://en.wikipedia.org/wiki/ISO_3166[ISO-3166] standard defines 'Codes for the representation of
names of countries and their subdivisions'.
https://en.wikipedia.org/wiki/ISO_3166-1_alpha-3[ISO-3166-1 alpha-3] codes are three-letter country codes. The
Wikipedia page contains the listing of valid country codes.
Genesys uses http://download.geonames.org/export/dump/countryInfo.txt as the source of ISO-3166 country codes.
[[un-m49]]
=== UN M.49
The UN defines standard country or area codes and geographical regions for statistical use:
* http://unstats.un.org/unsd/methods/m49/m49.htm
* http://unstats.un.org/unsd/methods/m49/m49alpha.htm
[[mcpd]]
== Multi-Crop Passport Descriptors
The http://www.bioversityinternational.org/e-library/publications/detail/faobioversity-multi-crop-passport-descriptors-v2-mcpd-v2/[Multi-crop Passport Descriptors (MCPD V.2)]
is a revision of the original FAO/IPGRI publication released in 2001,
expanded to accommodate emerging needs, such as the broader use of GPS tools, or the
implementation of the http://www.planttreaty.org[International Treaty on Plant Genetic Resources for
Food and Agriculture] Multilateral System for access and benefit sharing.
This MCPD V.2 list is an expansion of the first version of the MCPD; the descriptors and allowed values of the first
version form a subset of those in this revision. The 2001 list, developed jointly by
http://www.bioversityinternational.org[Bioversity International] (formerly IPGRI) and FAO,
has been widely used and is considered the international standard to facilitate germplasm passport
information exchange. These descriptors aim to be compatible with Bioversity’s crop descriptor
lists, with the descriptors used for the FAO World Information and Early Warning System (<<wiews,WIEWS>>)
on plant genetic resources (PGR), and with the https://www.genesys-pgr.org[Genesys PGR global portal].
For each multi-crop passport descriptor, a brief explanation of content, coding scheme and, in
parentheses, suggested fieldname are provided to assist in the computerized exchange of this type of data.
The authors of the <<mcpd,MCPD>> recognize that networks or groups of users may further expand the
MCPD list to meet their specific needs. As long as these additions allow for
an easy conversion to the format proposed in MCPD V.2, basic passport data can be exchanged worldwide in a
consistent manner.
=== MCPD Descriptors
[cols="1,3", options="header"]
.MCPD descriptors
|===
|Field name
|Description
|<<mcpd-wiews,INSTCODE>>|FAO WIEWS code of the institute where the accession is maintained.
|ACCENUMB|Unique identifier of the accession within a genebank.
|COLLNUMB|Original identifier assigned by the collector(s) of the sample, normally composed of the
name or initials of the collector(s) followed by a number (e.g. `FM9909`). This identifier is
essential for identifying duplicates held in different collections.
|<<mcpd-wiews,COLLCODE>>|FAO WIEWS code of the institute collecting the sample.
|COLLNAME|Name of the institute collecting the sample. This descriptor should only be used if
COLLCODE cannot be filled because the FAO WIEWS code for this institute is not available.
|COLLINSTADDRESS|Address of the institute collecting the sample. This descriptor should only
be used if COLLCODE cannot be filled because the FAO WIEWS code for this institute is not available.
|COLLMISSID|Identifier of the collecting mission used by the Collecting Institute (e.g. `CIATFOR-052`, `CN426`).
|GENUS|Genus name for taxon. Initial upper case letter required.
|SPECIES|Specific epithet portion of the scientific name in lower case letters.
The abbreviation `sp.` or `spp.` is allowed when exact species name is unknown.
|SPAUTHOR|Provide the authority for the species name.
|SUBTAXA|Subtaxon can be used to store any additional taxonomic identifier. The following abbreviations
are allowed: `subsp.` (for subspecies); `convar.` (for convariety); `var.` (for variety); `f.` (for form); `Group` (for 'cultivar group').
|SUBTAUTHOR|Provide the subtaxon authority at the most detailed taxonomic level.
|CROPNAME|Common name of the crop. Example: `malting barley`, `macadamia`, `maize`.
|ACCENAME|Either a registered or other designation given to the material received, other than the donor's
accession number (DONORNUMB) or collecting number (COLLNUMB). First letter upper case.
|ACQDATE|Date on which the accession entered the collection, in `YYYYMMDD` format, where YYYY is the year, MM is the month
and DD is the day. Missing data (MM or DD) should be indicated with hyphens or '00' [double zero].
|ORIGCTY|3-letter ISO 3166-1 code of the country in which the sample was originally collected
(e.g. landrace, crop wild relative, farmers' variety), bred or selected (breeding lines, GMOs,
segregating populations, hybrids, modern cultivars, etc.).
|COLLSITE|Location information below the country level that describes where the accession was collected,
preferably in English. This might include the distance in kilometers and direction from the nearest town,
village or map grid reference point, (e.g. `7km south of Curitiba in the state of Parana`).
|DECLATITUDE|Latitude expressed in decimal degrees. Positive values are North of the Equator; negative values are South of the Equator (e.g. `-44.6975`).
|DECLONGITUDE|Longitude expressed in decimal degrees. Positive values are East of the Greenwich Meridian; negative values are West of the Greenwich Meridian (e.g. `+120.9123`).
|COORDUNCERT|Uncertainty associated with the coordinates in meters. Leave the value empty if the uncertainty is unknown.
|COORDDATUM|The geodetic datum or spatial reference system upon which the coordinates given in decimal latitude
and longitude are based (e.g. `WGS84`, `ETRS89`, `NAD83`). The GPS uses the WGS84 datum.
|GEOREFMETH|The georeferencing method used (`GPS`, determined from `map`, `gazetteer`, or `estimated using software`).
Leave the value empty if georeferencing method is not known.
|ELEVATION|Elevation of collecting site expressed in meters above sea level. Negative values are not allowed.
|COLLDATE|Collecting date of the sample, in `YYYYMMDD` format, where YYYY is the year, MM is the month and DD is the day. Missing data (MM or DD) should be indicated with hyphens or '00' [double zero].
|<<mcpd-wiews,BREDCODE>>|FAO WIEWS code of the institute that has bred the material. If the holding institute
has bred the material, the breeding institute code (BREDCODE) should be the same as the holding institute code (INSTCODE).
Follows INSTCODE standard.
|BREDNAME|Name of the institute (or person) that bred the material. This descriptor should only be used if BREDCODE cannot
be filled because the FAO WIEWS code for this institute is not available.
|<<mcpd-sampstat,SAMPSTAT>>|Biological status of the accession.
|ANCEST|Information about either pedigree or other description of ancestral information (e.g. parent variety
in case of mutant or selection). For example a pedigree `Hanna/7*Atlas//Turk/8*Atlas` or a description `mutation found in Hanna`,
`selection from Irene` or `cross involving amongst others Hanna and Irene`.
|COLLSRC|Collecting/acquisition source
|<<mcpd-wiews,DONORCODE>>|FAO WIEWS code of the donor institute. Follows INSTCODE standard.
|DONORNAME|Name of the donor institute (or person). This descriptor should be used only if DONORCODE cannot be filled because FAO WIEWS code for this institute is not available.
|DONORNUMB|Identifier assigned to an accession by the donor. Follows ACCENUMB standard.
|OTHERNUMB|Any other identifiers known to exist in other collections for this accession. Use the following
format: `INSTCODE:ACCENUMB;INSTCODE:identifier;…` INSTCODE and identifier are separated by a colon `:` without space.
Pairs of INSTCODE and identifier are separated by a semicolon `;` without space.
When the institute is not known, the identifier should be preceded by a colon.
|<<mcpd-wiews,DUPLSITE>>|FAO WIEWS code of the institute(s) where a safety duplicate of the accession is maintained.
|DUPLINSTNAME|Name of the institute where a safety duplicate of the accession is maintained.
|<<mcpd-storage,STORAGE>>|Type of germplasm storage. If germplasm is maintained under different types of storage,
multiple choices are allowed, separated by a semicolon (e.g. `20;30`).
|MLSSTAT|The status of an accession with regard to the Multilateral System (MLS) of the International Treaty on
Plant Genetic Resources for Food and Agriculture. Leave the value empty if the status is not known.
|REMARKS|The remarks field is used to add notes or to elaborate on descriptors with value `99` or `999` (= Other). Prefix remarks with the field name they refer to and a colon (:) without space (e.g. `COLLSRC:riverside`). Distinct remarks referring to different fields are separated by semicolon without space.
|===
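
Two of the encodings in the table, partial dates (`ACQDATE`, `COLLDATE`) and `OTHERNUMB`, lend themselves to a short parsing sketch. The function names below are hypothetical and the sample values are made up for illustration; treating an item without a colon as an identifier of an unknown institute is a lenient choice of this sketch, not part of the standard.

```python
def parse_mcpd_date(value):
    """Split a YYYYMMDD value into (year, month, day).

    Month or day given as '00' or '--' is returned as None.
    """
    if len(value) != 8 or not value[:4].isdigit():
        raise ValueError("expected YYYYMMDD, got %r" % value)

    def part(text):
        if text in ("00", "--"):
            return None  # missing month or day
        if not text.isdigit():
            raise ValueError("invalid date part %r" % text)
        return int(text)

    return int(value[:4]), part(value[4:6]), part(value[6:8])


def parse_othernumb(value):
    """Split 'INSTCODE:identifier;...' into (instCode or None, identifier) pairs."""
    pairs = []
    for item in value.split(";"):
        if not item:
            continue
        if ":" in item:
            inst_code, identifier = item.split(":", 1)
        else:
            inst_code, identifier = "", item  # lenient: no colon at all
        pairs.append((inst_code or None, identifier))
    return pairs


print(parse_mcpd_date("20010301"))
print(parse_mcpd_date("1998----"))
print(parse_othernumb("NGA039:TMe-419;:FM9909"))
```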
[[mcpd-wiews]]
==== Institute codes in MCPD
Values for `INSTCODE`, `COLLCODE`, `BREDCODE`, `DONORCODE` and `DUPLSITE` must be provided as
<<wiews-instcode,FAO WIEWS codes>> of institutes.
[[mcpd-sampstat]]
==== Biological status of accession
The coding scheme proposed can be used at 2 different levels of detail: either by using the
general codes such as `100`, `200`, `300`, `400`, or by using the more specific codes
such as `110`, `120`, etc.
.Allowed values for `SAMPSTAT` field
* `100` Wild
** `110` Natural
** `120` Semi-natural/wild
** `130` Semi-natural/sown
* `200` Weedy
* `300` Traditional cultivar/landrace
* `400` Breeding/research material
** `410` Breeder's line
** `411` Synthetic population
** `412` Hybrid
** `413` Founder stock/base population
** `414` Inbred line (parent of hybrid cultivar)
** `415` Segregating population
** `416` Clonal selection
** `420` Genetic stock
** `421` Mutant (e.g. induced/insertion mutants, tilling populations)
** `422` Cytogenetic stocks (e.g. chromosome addition/substitution, aneuploids, amphiploids)
** `423` Other genetic stocks (e.g. mapping populations)
* `500` Advanced or improved cultivar (conventional breeding methods)
* `600` GMO (by genetic engineering)
* `999` Other (Elaborate in REMARKS field)
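
Since the specific codes share their leading digit with the general code, the general level can be derived arithmetically. The Python sketch below shows this under the assumption that `999` is kept as is; the function name and the label table are illustrative only.

```python
GENERAL_SAMPSTAT = {
    100: "Wild",
    200: "Weedy",
    300: "Traditional cultivar/landrace",
    400: "Breeding/research material",
    500: "Advanced or improved cultivar",
    600: "GMO",
    999: "Other",
}


def general_sampstat(code):
    """Map a specific SAMPSTAT code (e.g. 414) to its general code (400)."""
    if code == 999:
        return 999  # 'Other' has no general/specific split
    general = (code // 100) * 100
    if general not in GENERAL_SAMPSTAT:
        raise ValueError("unknown SAMPSTAT code: %r" % code)
    return general


for code in (110, 414, 300, 999):
    print(code, GENERAL_SAMPSTAT[general_sampstat(code)])
```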
[[mcpd-storage]]
==== Accession storage
If germplasm is maintained under different types of storage, multiple values are allowed. When
an accession is maintained in active and base collections, `STORAGE` corresponds to `11` and `13`
and can be encoded as `11;13`.
.Allowed values for `STORAGE` field
* `10` Seed collection
** `11` Short term
** `12` Medium term
** `13` Long term
* `20` Field collection
* `30` In vitro collection
* `40` Cryopreserved collection
* `50` DNA collection
* `99` Other (elaborate in REMARKS field)
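
A semicolon-separated `STORAGE` value can be turned into a list of integer codes, for example before placing it in a JSON array. The following Python sketch validates against the values listed above; the function name is an assumption.

```python
ALLOWED_STORAGE = {10, 11, 12, 13, 20, 30, 40, 50, 99}


def parse_storage(value):
    """Convert an MCPD STORAGE string such as '11;13' into a list of ints."""
    codes = [int(part) for part in value.split(";") if part.strip()]
    unknown = [code for code in codes if code not in ALLOWED_STORAGE]
    if unknown:
        raise ValueError("unknown STORAGE codes: %r" % unknown)
    return codes


print(parse_storage("11;13"))
print(parse_storage("20;30"))
```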
[[mcpd-genesys]]
=== Genesys extensions to MCPD
[cols="1,3", options="header"]
.MCPD extensions
|===
|Field name
|Description
|<<mcpd-acceurl,ACCEURL>>|Accession URL
|<<mcpd-available,AVAILABLE>>|Indicates current availability of accession for distribution
|<<mcpd-historic,HISTORIC>>|Indicates whether the record represents an accession no longer actively maintained by the genebank
|UUID|Universally unique identifier of the accession record
|===
[[mcpd-acceurl]]