Commit 7b3c4444 authored by Matija Obreza

Documentation

parent 5c614edc
# Genesys Anno - Standalone data uploader
Anno is a standalone Java application for managing accession-level data on
Genesys PGR, the global database on plant genetic resources (PGR) held in the world's genebanks. The Genesys database is
accessible at https://www.genesys-pgr.org
The application allows users to map data from Excel files, CSV files or database SQL queries to the Multi-Crop
Passport Descriptors (MCPD) format and push the mapped data to Genesys for publication.
......@@ -183,7 +183,59 @@
<encoding>UTF-8</encoding>
</configuration>
</plugin>
<plugin>
<groupId>org.asciidoctor</groupId>
<artifactId>asciidoctor-maven-plugin</artifactId>
<version>1.5.3</version>
<executions>
<execution>
<id>output-html</id>
<phase>generate-resources</phase>
<goals>
<goal>process-asciidoc</goal>
</goals>
<configuration>
<backend>html5</backend>
<doctype>book</doctype>
<sourceHighlighter>coderay</sourceHighlighter>
<attributes>
<copycss>true</copycss>
<!-- <linkcss /> -->
<toc>left</toc>
<icons>font</icons>
<sectanchors>true</sectanchors>
<idprefix />
<idseparator>-</idseparator>
<docinfo1>true</docinfo1>
</attributes>
</configuration>
</execution>
<!-- <execution>
<id>output-docbook</id>
<phase>generate-resources</phase>
<goals>
<goal>process-asciidoc</goal>
</goals>
<configuration>
<backend>docbook</backend>
<doctype>book</doctype>
</configuration>
</execution> -->
</executions>
<configuration>
<sourceDirectory>src/main/asciidoc</sourceDirectory>
<preserveDirectories>true</preserveDirectories>
<headerFooter>true</headerFooter>
<numbered>true</numbered>
<!-- <imagesDir>images</imagesDir> -->
<attributes>
<buildNumber>${buildNumber}</buildNumber>
<projectArtifact>${project.artifactId}</projectArtifact>
<projectVersion>${project.version}</projectVersion>
<snippets>${snippetsDirectory}</snippets>
</attributes>
</configuration>
</plugin>
</plugins>
</build>
</project>
\ No newline at end of file
Genesys Anno User Manual
========================
December 2015: Commit {buildNumber}
:revnumber: {projectVersion}
:doctype: book
:toc: left
:toclevels: 5
:icons: font
:numbered:
:source-highlighter: pygments
:pygments-css: class
:pygments-linenums-mode: table
[[intro]]
Introduction
------------
*Anno* is a standalone Java application for managing accession-level data on
Genesys PGR, the global database on plant genetic resources (PGR) held in the world's genebanks. The Genesys database is
accessible at https://www.genesys-pgr.org .
The application allows users to map data from Excel files, CSV files or database SQL queries to the Multi-Crop
Passport Descriptors (MCPD) format and push the mapped data to Genesys for publication.
*Genesys Sandbox*, a playground instance of the Genesys
database, is available for developers and integrators to validate and test their
configuration at https://sandbox.genesys-pgr.org
before pushing data to *live* Genesys servers.
include::sections/download.adoc[]
include::sections/workspace.adoc[]
include::sections/window.adoc[]
include::sections/settings.adoc[]
include::sections/datasources.adoc[]
include::sections/mapping.adoc[]
include::sections/push.adoc[]
include::sections/troubleshooting.adoc[]
include::sections/ack.adoc[]
[[version]]
== Version information
* Matija Obreza, Crop Trust: Initial document, January 2016.
=== Acknowledgements
*TBD*
[[datasources]]
== Data Sources
Anno is able to load data from Excel XLSX and CSV files and through database SQL queries.
Every data source *must* contain at least the following three "columns" that uniquely
identify an accession on Genesys:
. `INSTCODE`: FAO WIEWS Institute code of the holding genebank
. `ACCENUMB`: Full identifier of the accession in your genebank
. `GENUS`: Genus of the accession
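
The requirement above can be sketched as a small check over a header row. This is only an illustration: the class and method names are hypothetical, and it assumes the columns are literally named after the descriptors, which Anno itself does not require because columns are mapped to descriptors later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Anno.
class CoreColumns {
    // The three columns every data source must provide.
    static final List<String> REQUIRED = List.of("INSTCODE", "ACCENUMB", "GENUS");

    // Returns the required columns that are absent from the given header row.
    // Assumption of this sketch: names are compared ignoring case and
    // surrounding whitespace.
    static List<String> missing(List<String> headers) {
        List<String> normalized = new ArrayList<>();
        for (String header : headers) {
            normalized.add(header.trim().toUpperCase(Locale.ROOT));
        }
        List<String> result = new ArrayList<>();
        for (String required : REQUIRED) {
            if (!normalized.contains(required)) {
                result.add(required);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The header row below has no GENUS column.
        System.out.println(missing(List.of("instcode", "ACCENUMB", "SPECIES")));
    }
}
```

An empty result means all three core columns are present.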
=== Excel and CSV files
To add an Excel or CSV file to the Project, click the "Add file" button in the Toolbar.
You will be prompted with an "Open file" dialog to select the source file to add to the project.
NOTE: Files in the older Excel format (with the *xls* extension) are not supported.
Excel files may contain multiple sheets that will be listed as individual data source sheets.
CSV files contain only one sheet. The data sheets from source files are listed as sub-entries of
the source file. To open a data sheet and load the first 300 rows, double-click the sheet name.
.Project with XLSX and CSV data sources
image::source-opened.png[role="text-center"]
Loading data from Excel files is straightforward and requires no further configuration. This is
not the case for CSV files. The file format of CSV files is much more flexible and may require
additional configuration before it loads correctly.
=== CSV file configuration
CSV files are plain text files and do not provide any information about the character encoding,
separator or quote character used to separate text strings from numbers.
NOTE: Use Excel XLSX files instead of CSV files when possible. Open the CSV in Excel, make sure
data is well formatted and save it in XLSX format.
When you open a CSV data source, Anno may not be able to load the file until you describe
its formatting in the *CSV* tab of the CSV data sheet.
[cols="1,4", options="header"]
.Configuration for CSV files: Formatting
|===
|Label|Description
|Character set|CSV files generated from databases usually default to the character set of the
operating system. For example, try "windows-1250" for files generated on Windows and "x-MacCentralEurope" for files generated on Mac OS X.
You will have to experiment with different options, clicking "Reload" after each change.
|Separator|Pick comma (,) or tab (blank)
|Quote character|Pick single (') or double quote (")
|===
Even with the best settings for the CSV file, there is no guarantee that Anno will read
all data correctly.
NOTE: Convert your CSV file to Excel XLSX format!
Click "Reload" to load the CSV file with the new settings.
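
To show why the three settings matter, here is a minimal sketch of decoding bytes with a chosen character set and splitting one line with a chosen separator and quote character. It is not Anno's parser: the class name, method names and sample values are hypothetical, and the doubled-quote escape rule is an assumption of this sketch.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Anno.
class CsvSketch {
    // Decodes raw file bytes using the character set picked in the settings.
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    // Splits one CSV line into fields. A separator inside a quoted field is
    // kept as data, and a doubled quote inside a quoted field yields one
    // literal quote character.
    static List<String> splitLine(String line, char separator, char quote) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == quote) {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == quote) {
                    current.append(quote); // escaped quote
                    i++;
                } else {
                    inQuotes = !inQuotes;  // opening or closing quote
                }
            } else if (c == separator && !inQuotes) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // Made-up sample row: comma separator, double quote character.
        String line = "XXX001,\"ACC-1\",\"Manihot, cassava\"";
        for (String field : splitLine(line, ',', '"')) {
            System.out.println(field);
        }
    }
}
```

With the wrong separator or quote character the same line splits into different fields, which is why reloading with corrected settings changes what the data sheet shows.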
=== CSV and Excel header row
Occasionally the data files contain additional rows at the start of the document that should be ignored by Anno.
[cols="1,4", options="header"]
.Configuration for data files: Headers
|===
|Label|Description
|Contains headers|Does your data file contain a header row?
|Header row index|What is the index of the header row? At the start or further down in the file?
|===
Click "Reload" to read the data with new settings. Make sure that headers are read correctly.
=== Databases
Databases have many advantages over CSV and Excel files and you are likely using a relational database
to manage your accession data. The application allows you to directly query any database system (RDBMS)
for which a valid JDBC driver is available.
Click "Add database" in the Toolbar. This will add a JDBC connection to your database as a top-level
source element. You will be able to add individual, tailored SQL queries as actual Anno data sources.
.Adding a database as data source
image::source-database.png[role="text-center"]
[cols="1,4", options="header"]
.Add Database dialog
|===
|Label|Description
|Datasource type|Select the database type from the list of supported drivers: MySQL, MS SQL Server, PostgreSQL, ODBC
|Datasource name|Provide a name for the database connection to be used as the top-level label of the data source
in the Project.
|Connection URL|Edit the JDBC connection string template. You will have to provide the database host name,
port and the database instance name.
|User and Password|Valid username and password to access the database.
|Connect|Attempt to connect to the database with provided settings.
|Download driver|Attempt to download the JDBC driver for the selected database type.
|===
Anno comes with the MySQL driver embedded; all other drivers
need to be downloaded separately. The *database type* determines which JDBC driver is loaded.
NOTE: If your database type is not supported, contact helpdesk@genesys-pgr.org for assistance.
Click "Connect" to try to connect to the database. If the connection succeeds, you will be prompted
to add the database link to the Project. Otherwise, fix the username, password and the JDBC connect
string (search engines are a good resource to find a valid JDBC connect string for your database!).
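
As a starting point, the sketch below builds connect strings in the URL formats commonly used by the MySQL, PostgreSQL and Microsoft SQL Server JDBC drivers, and attempts a connection through the standard `java.sql.DriverManager`. The class, the host name and the database name are hypothetical, and the templates that Anno pre-fills may differ.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper, not part of Anno.
class JdbcUrls {
    static String mysql(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database;
    }

    static String postgresql(String host, int port, String database) {
        return "jdbc:postgresql://" + host + ":" + port + "/" + database;
    }

    static String sqlServer(String host, int port, String database) {
        return "jdbc:sqlserver://" + host + ":" + port + ";databaseName=" + database;
    }

    // Returns null when the connection succeeds, otherwise the error message.
    // Fails when no JDBC driver for the URL is on the classpath.
    static String tryConnect(String url, String user, String password) {
        try (Connection connection = DriverManager.getConnection(url, user, password)) {
            return null;
        } catch (SQLException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Default ports: MySQL 3306, PostgreSQL 5432, SQL Server 1433.
        System.out.println(mysql("db.example.org", 3306, "genebank"));
        System.out.println(postgresql("db.example.org", 5432, "genebank"));
        System.out.println(sqlServer("db.example.org", 1433, "genebank"));
    }
}
```

The error message returned by `tryConnect` usually indicates whether the driver is missing, the host is unreachable or the credentials were rejected.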
.Database successfully added as data source
image::database-connected.png[role="text-center"]
After the connection to the database is successfully established, the database connection is added as a
top-level data source. You are now able to add SQL queries as individual data sources to the project.
Right-click on the database data source and select "Add SQL query".
.Adding an SQL query to the database data source
image::database-addquery.png[role="text-center"]
This will create a data sheet entry under the database label, titled "Unnamed query". Double-click the entry in
the Project data source tree and update the query label and the SQL query itself.
.New database query screen
image::database-unnamed-query.png[role="text-center"]
Press the "Reload" button to load data from the database. This will refresh the contents of the data sheet.
NOTE: Save the Project file regularly.
You should start with a simple SQL query to the database and then create additional data sheets
in the Project as you query for additional accession data.
NOTE: All SQL queries need to include `INSTCODE`, `ACCENUMB` and `GENUS` columns! Use your SQL-JOIN-foo to write
an SQL query that includes this data.
.A dummy SQL query with core columns: `INSTCODE`, `ACCENUMB` and `GENUS`
image::database-basic.png[role="text-center"]
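
A query of this kind might look like the sketch below. The table names, column names and the institute code `XXX000` are hypothetical placeholders for your own schema. The helper uses standard JDBC metadata to list the column labels of a result so that you can confirm the three core columns are present.

```java
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example, not part of Anno.
class CoreQuery {
    // Column aliases give the result the three core column names.
    static final String SQL =
        "SELECT 'XXX000' AS INSTCODE, "          // constant institute code (placeholder)
        + "a.accession_number AS ACCENUMB, "     // hypothetical column
        + "t.genus AS GENUS "                    // hypothetical column
        + "FROM accession a "                    // hypothetical table
        + "JOIN taxonomy t ON t.id = a.taxonomy_id";

    // Lists the column labels of a query result (JDBC columns are 1-based).
    static List<String> columnLabels(ResultSet resultSet) throws SQLException {
        ResultSetMetaData metaData = resultSet.getMetaData();
        List<String> labels = new ArrayList<>();
        for (int i = 1; i <= metaData.getColumnCount(); i++) {
            labels.add(metaData.getColumnLabel(i));
        }
        return labels;
    }

    public static void main(String[] args) {
        System.out.println(SQL);
    }
}
```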
[[download]]
== Installing Anno
Anno is an open-source project, licensed under Apache License v2.
Anno requires the latest Java Runtime Environment (JRE) to run.
NOTE: Do not allow the JRE installer to install browser toolbars such as Ask.com
or to make any other changes to your default browser configuration.
WARNING: Disable Java in all Internet browsers on your computer. Nobody in their right mind would
still use Java applets in 2016.
If downloading pre-compiled binaries, make sure to download the latest version of Anno for your platform.
You probably have a 64-bit CPU and JRE and should use the package labeled `x86_64`. `x86` is for the 32-bit JRE.
Download the package from the https://bitbucket.org/genesys2/anno-swt/downloads[downloads section],
extract it if necessary and run the executable for your platform.
[cols="1,4"]
.Resources
|===
|Project page|https://bitbucket.org/genesys2/anno-swt
|Pre-compiled binaries|https://bitbucket.org/genesys2/anno-swt/downloads
|`git` repository URL|https://bitbucket.org/genesys2/anno-swt.git
|Issue tracker|https://bitbucket.org/genesys2/anno-swt/issues
|===
[[mapping]]
== Mapping to MCPD
Once you have successfully loaded a data sheet with data you wish to publish on Genesys,
you need to map the columns of your data sheet to Multi-crop Passport Descriptors listed
on the right side of the application window.
. Open the data sheet
. Click on the column heading label to load the current column configuration
. Drag the descriptor from MCPD listing to the column configuration pane
After you drag and drop the MCPD descriptor onto the column configuration pane, the *RDF term* field
will be populated with the descriptor URL (e.g. http://purl.org/germplasm/germplasmTerm#germplasmID).
NOTE: Your existing data must be compliant with MCPD for straightforward mapping!
=== Basic column mapping
Genesys requires the `INSTCODE`, `ACCENUMB` and `GENUS` for every accession in the data sheet.
Load your data sheet and make sure it contains these three columns.
. Click on the label of the column containing the *FAO WIEWS Institute Code of your genebank*
. Drag the `INSTCODE` descriptor from the MCPD list to the column definition pane
. Select the column containing your *genebank accession numbers*
. Drag the `ACCENUMB` descriptor from the MCPD list to the column definition pane
. Select the *genus* column of your accessions
. Drag the `GENUS` descriptor from the MCPD list to the column definition pane
.Mapping the `ACCENUMB` column
image::mapping-accenumb.png[role="text-center"]
Once these three columns are mapped, double-clicking on a row in the data sheet row listing will
display an alert dialog with the accession data in mapped Genesys JSON format.
.Preview of data in JSON format
image::mapping-json-basic.png[role="text-center"]
As you continue mapping other columns, more information will be included in the JSON preview dialog.
WARNING: An application error dialog will be displayed when mapping is incomplete!
=== Mapping your data to MCPD
*TBD*
=== Handling multiple values
The MCPD standard specifies that multiple values may be provided for specific descriptors. One example of such data is
the MCPD `REMARKS` field. You may manage this data in one Excel column or use a separate column
for each individual comment.
Anno lets you declare that a single column contains multiple values and specify how the
data should be split. Alternatively, you can map multiple columns of the data sheet to the same MCPD descriptor.
In both cases, the individual pieces of the data will be converted to an array of values.
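
Both approaches can be sketched as follows. This is an illustration rather than Anno's implementation: the class and method names are hypothetical, the semicolon separator in the example is an assumption, and trimming pieces and dropping empty ones are choices made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Anno.
class MultiValue {
    // Splits one cell on a literal separator into individual values.
    static List<String> split(String cell, String separator) {
        List<String> values = new ArrayList<>();
        if (cell == null) {
            return values;
        }
        // Pattern.quote treats the separator as plain text, not as a regex.
        for (String part : cell.split(Pattern.quote(separator))) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                values.add(trimmed);
            }
        }
        return values;
    }

    // Collects the non-blank cells of several columns mapped to one descriptor.
    static List<String> merge(String... cells) {
        List<String> values = new ArrayList<>();
        for (String cell : cells) {
            if (cell != null && !cell.trim().isEmpty()) {
                values.add(cell.trim());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // One column holding two remarks, then two columns holding one each.
        System.out.println(split("Collected in 1998; Regenerated in 2010", ";"));
        System.out.println(merge("Collected in 1998", "", "Regenerated in 2010"));
    }
}
```

Either call produces the same two-element list, matching the statement that both mappings end up as an array of values.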
=== Using regular expressions
*TBD*
[[push]]
== Pushing data to server
After you have mapped your data to MCPD and confirmed that the JSON looks "just fine",
you are ready to *push* the data to the Genesys server. See <<settings>> for
details on configuring the server connection.
The *Push dialog* offers the following functions:
. Parse all
. Upload
. Remove
. Changing the logging level between DEBUG, INFO and WARN
.The Push dialog
image::push-dialog.png[role="text-center"]
=== Parse all
The "Parse all" action triggers a read-convert-parse operation of all records in the selected data
sheet. This is a useful operation that will check whether all of your data will correctly load, parse
and convert using the mapping definitions you have provided. Keep an eye on the log report
and fix your data before you attempt a *push* operation.
=== Upload
"Upload" will send the data from the selected data sheet to Genesys server specified in the Settings
dialog. You will be prompted with the Genesys server URL before you begin pushing updates to
the server.
=== Remove
"Remove" is an operation of last resort. It will remove accession records from the active database
to the archive. Only the three core columns need to be mapped for the *remove* operation:
`INSTCODE`, `ACCENUMB` and `GENUS`.
WARNING: If applicable, use the `HISTORIC` flag instead of removing records from Genesys.
Note that Genesys never actually deletes the accession data; it merely moves it to an archive that
remains accessible if the record is referenced by its `PURL`, the Permanent URL.
=== Log levels
The dialog toolbar allows you to switch the log level between `DEBUG`, `INFO` and `WARN`. This determines the
level of detail rendered in the central log pane of the dialog. It defaults to `INFO`.
[[settings]]
== Connecting to Genesys server
The *Settings Dialog* allows you to configure the current Project and specify
which Genesys server will receive your data.
.Settings Dialog
image::settings-dialog.png[role="text-center"]
The settings are stored in the Project file and will be saved and loaded with the
rest of the project configuration.
[cols="1,4", options="header"]
.Settings Dialog
|===
|Label|Description
|Genesys Server URL|The base URL of the Genesys server instance to use for this project.
For testing purposes the Genesys Server URL must point to https://sandbox.genesys-pgr.org
For *production* use the live Genesys URL https://www.genesys-pgr.org
|Endpoints|Authorization and Token endpoints will automatically update when the server URL is
changed. Don't touch.
|Client API key and secret|Contact the helpdesk at helpdesk@genesys-pgr.org to obtain the valid
client key and secret. Different values are used for sandbox and production environments.
|Access and Refresh tokens|Authentication tokens used to identify you with Genesys server.
These are obtained by clicking "Authenticate" or loaded from the Project file.
|Clear tokens|Clears access and refresh token values. You will have to re-authenticate with the
server.
|Authenticate|Validates the current configuration against the server or asks you to authenticate
with the server.
|===
=== Authenticating with Genesys
A valid user account on Genesys is required. You may use your Google+ account or create the account
manually by providing a valid email address and an account password.
Make sure you have valid user accounts for the sandbox environment at https://sandbox.genesys-pgr.org/login
and the production servers at https://www.genesys-pgr.org/login
NOTE: You can use Google+ to create your user account on Genesys. You will not need to remember a separate password!
After you have obtained valid Client key and secret from helpdesk@genesys-pgr.org and created your
Genesys accounts, you can authenticate against the selected server (sandbox or production).
Click the "Authenticate" button. When the access and refresh tokens are missing or have expired,
the application will prompt you to authorize the
application's request to access Genesys on your behalf.
If the tokens are still valid, their values in the
dialog are refreshed and the message "Tokens are up to date." is displayed.
.Authentication Dialog
image::authentication-dialog.png[role="text-center"]
"Open link in browser" opens your default web browser application (Chrome, IE, Firefox, ...) and
Genesys prompts you to allow access to your resources. If you are not yet logged in
to Genesys, you will be prompted to log in before the confirmation dialog is displayed.
.Allow access
image::confirm-access.png[role="text-center"]
Select "Yes, allow access" and Genesys will generate a short-lived *verifier code* that you must copy and paste
to the *Verifier code* field in the Anno dialog. The verifier code is a 6-character string (e.g. `td1S83`).
Unless an error occurs or the verifier code times out, your access and refresh tokens will be updated.
After obtaining the tokens, *save the project* by clicking the "Save" button in the application Toolbar. Give
the project file a name that tells you which Genesys server (production or sandbox) you have selected.
[[trouble]]
== Troubleshooting
This tool is perfect! No way you have a problem! :-)
However, if you do run into trouble using this tool, contact helpdesk@genesys-pgr.org
for assistance and we will update the tool or this section of the documentation with
resolutions to commonly encountered problems.
[[window]]
== Window Layout
After workspace selection, the main application window is loaded. The window has four sections:
* Toolbar (top)
* List of Data sources (left)
* Data source view (center)
* MCPD descriptor list (right)
.Application Window
image::anno-blank.png[role="text-center"]
=== Toolbar
The toolbar provides access to top-level functions.
.Anno Toolbar
image::toolbar.png[role="text-center"]
[cols="1,4", options="header"]
.Toolbar buttons
|===
|Label|Description
|Load|Load an existing project file
|Save|Save the current project to a file
|Settings|Opens the Settings dialog
|Add file|Add a new data source file to the project
|Automap|Automatically map columns of the currently open dataset to MCPD descriptors
|Push|Opens a dialog to send data to Genesys
|Add database|Add a new database-backed data source to the project
|===
[[workspace]]
== Workspaces and Projects
Upon starting the application, you will be presented with a *Workspace Launcher* that
allows you to create a new Workspace or load an existing Workspace from disk.
A *workspace* allows you to save your Anno configuration files and any JDBC drivers needed
to access your data in one location on your computer.
=== Creating a new Workspace
When you first start the application, you will need to create a new workspace to store
your configuration files.
.Workspace Launcher
image::workspace-launcher.png[role="text-center"]
Click the "Browse" button and navigate to the directory where you wish to create a new
workspace. You may need to create a new folder for the workspace.
Confirm your selection by pressing "OK".
NOTE: The application will check if the selected folder is empty.
=== Loading an existing workspace
Click the "Browse" button and navigate to the directory with your existing workspace data.
Confirm your selection by pressing "OK".
NOTE: The application will check if the selected folder is a valid Anno workspace folder.
.Invalid workspace folder selection
image::workspace-launcher-fail.png[role="text-center"]
=== Using the workspace
The workspace acts as a base directory for your configuration and data files. It is good practice
to copy the source files (CSV, Excel) to the workspace folder. This will help you better maintain
your settings and data files you publish on Genesys.
=== Under the hood
Anno creates a sub-folder "jdbc" in your workspace. This folder is used as a source location for
any JDBC drivers you may need to access your databases.
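
To check which driver files are present, you could list the `.jar` files in that folder, as in the sketch below. The class and method names are hypothetical, and the sketch only lists files; how Anno itself loads drivers from the folder is not described here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Anno.
class JdbcFolder {
    // Returns the sorted names of the .jar files in <workspace>/jdbc,
    // or an empty list when the folder does not exist.
    static List<String> driverJars(Path workspace) {
        Path jdbcDir = workspace.resolve("jdbc");
        List<String> names = new ArrayList<>();
        if (!Files.isDirectory(jdbcDir)) {
            return names;
        }
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(jdbcDir, "*.jar")) {
            for (Path jar : jars) {
                names.add(jar.getFileName().toString());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        // Pass the workspace folder as the first argument; defaults to ".".
        Path workspace = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(driverJars(workspace));
    }
}
```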