Commit 4eee61de authored by Paul Farah Cox, committed by Matija Obreza

Anno manual updates by Paul Cox, Scriptoria

parent 3e872ce5
Genesys Anno User Manual
Introduction
------------
The Genesys upload tool *Anno* is a standalone Java application for managing accession-level data on
Genesys PGR - a global database on plant genetic resources in the world's genebanks. The Genesys database is
accessible at https://www.genesys-pgr.org.
The Anno application allows users to map their Excel XLSX files, CSV files or database SQL queries to Multi-Crop
Passport Descriptor (MCPD) format and push the mapped data to Genesys for publication.
*Genesys Sandbox*, a playground instance of the Genesys
database, is available for developers and integrators to validate and test their
configuration before pushing data to *live* Genesys servers. It can be found at https://sandbox.genesys-pgr.org.
* Matija Obreza, Crop Trust: Initial document, January 2016.
* Matija Obreza, Crop Trust: PDF version, October 2017.
* Paul Cox, Scriptoria: Copyedits, December 2017.
[[datasources]]
== Data sources
Anno is able to load data from Excel XLSX and CSV files and through database SQL queries.
Every data source *must* contain at least the following three columns that uniquely
identify an accession on Genesys:
. `INSTCODE`: FAO WIEWS Institute Code of the holding genebank.
. `ACCENUMB`: Full identifier of the accession in the genebank.
. `GENUS`: Genus of the accession.
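As a quick pre-flight check before mapping, the following Java sketch reports which of the three required columns are missing from a header row. It assumes your columns are already named after the MCPD descriptors. Anno itself does not require this, because you map columns explicitly later. The sketch compares names case-insensitively. The class `RequiredColumns` is hypothetical and not part of Anno.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical pre-flight check; assumes headers are named after the MCPD descriptors.
final class RequiredColumns {
    static final List<String> REQUIRED = Arrays.asList("INSTCODE", "ACCENUMB", "GENUS");

    private RequiredColumns() {}

    /** Returns the required column names absent from the header row (case-insensitive). */
    static List<String> missing(List<String> header) {
        List<String> present = new ArrayList<>();
        for (String name : header) {
            present.add(name.trim().toUpperCase(Locale.ROOT));
        }
        List<String> absent = new ArrayList<>();
        for (String required : REQUIRED) {
            if (!present.contains(required)) {
                absent.add(required);
            }
        }
        return absent;
    }

    public static void main(String[] args) {
        System.out.println(missing(Arrays.asList("instcode", "ACCENUMB", "SPECIES"))); // prints [GENUS]
    }
}
```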
=== Excel and CSV files
To add an Excel XLSX or CSV file to the project, click the "Add file" button in the toolbar.
You will be prompted with an "Open file" dialog to select the source file to add to the project.
NOTE: Older versions of Excel files (with the *.xls* extension) are not supported.
Excel files may contain multiple sheets, which will be listed as individual data source sheets.
CSV files contain only one sheet. The data sheets from source files are listed as sub-entries of
the source file. To open a data sheet and load the first 300 rows, double-click the sheet name.
CSV files are plain text files and do not provide any information about the character encoding,
separator or quote character used to separate text strings from numbers.
NOTE: Use Excel XLSX files instead of CSV files when possible. If you have a CSV file, we recommend opening it in Excel, making sure the
data is well formatted, and saving it in XLSX format.
When opening a CSV data source, Anno may not be able to load the file until you provide
information on the formatting of the file in the *CSV* tab of the data sheet.
[cols="1,4", options="header"]
.Configuration for CSV files: Formatting
|===
|Label|Description
|Character set|CSV files generated from databases usually default to the character set of the
operating system. Try "windows-1250" for files generated on Windows, or "x-MacCentralEurope" for Mac OSX.
|Separator|Pick comma (,) or tab (blank).
|Quote character|Pick single (') or double quote (").
|===
You may have to experiment with different options. After making any changes to formatting, click "Reload" to load the CSV file with the new settings.
Even with the best settings, however, there is no guarantee that Anno will read
all data correctly. For best results, convert your CSV file to Excel XLSX format.
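To illustrate what the three CSV settings control, here is a minimal Java sketch. It is not Anno's actual parser. It decodes raw bytes with a chosen character set and splits a line using a configurable separator and quote character. The class `CsvSketch` and its methods are hypothetical. The sketch treats a doubled quote character inside quoted text as a literal quote, which is a common CSV convention but may not match how your files were exported. It also does not handle line breaks inside quoted fields.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the Character set, Separator and Quote character settings.
final class CsvSketch {
    private CsvSketch() {}

    /** Decodes raw file bytes with the named character set; undecodable bytes become U+FFFD. */
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    /** Splits one line into fields; separators inside quoted text are kept as data. */
    static List<String> splitLine(String line, char separator, char quote) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == quote) {
                    if (i + 1 < line.length() && line.charAt(i + 1) == quote) {
                        current.append(quote); // doubled quote stands for a literal quote
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    current.append(c);
                }
            } else if (c == quote) {
                inQuotes = true; // opening quote
            } else if (c == separator) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }
}
```

For example, the single byte `0x8A` decodes to `Š` under windows-1250 but to the replacement character `\uFFFD` under UTF-8. This is why picking the wrong character set produces garbled text.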
=== CSV and Excel header row
Some data files contain additional rows at the top of the sheet, above the header row, that should be ignored by Anno.
[cols="1,4", options="header"]
.Configuration for data files: Headers
|===
|Label|Description
|Contains headers|Indicate whether your data file contains a header row.
|Header row index|Provide the index of the header row, whether it is at the top of a sheet or further down in the file.
|===
Click "Reload" to read the data with the new settings. Make sure that the headers are read correctly.
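The effect of the two header settings can be sketched in Java. Rows above the header row are skipped, the header row supplies the column names, and the remaining rows are data. This is an illustration under stated assumptions, not Anno's code. The class `HeaderSplit` is hypothetical, and the header row index is assumed to be zero-based, which may not match the numbering Anno uses.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of the "Contains headers" and "Header row index" settings.
final class HeaderSplit {
    final List<String> header;      // column names (empty when there is no header row)
    final List<List<String>> data;  // rows below the header row

    private HeaderSplit(List<String> header, List<List<String>> data) {
        this.header = header;
        this.data = data;
    }

    /** headerIndex is zero-based here, which is an assumption of this sketch. */
    static HeaderSplit of(List<List<String>> rows, boolean containsHeaders, int headerIndex) {
        if (!containsHeaders) {
            return new HeaderSplit(Collections.<String>emptyList(), rows);
        }
        if (headerIndex < 0 || headerIndex >= rows.size()) {
            throw new IllegalArgumentException("Header row index out of range: " + headerIndex);
        }
        // Rows above the header row are ignored; the header row names the columns.
        return new HeaderSplit(rows.get(headerIndex),
                new ArrayList<>(rows.subList(headerIndex + 1, rows.size())));
    }
}
```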
=== Databases
Databases have many advantages over CSV and Excel files, and relational databases are often used
to manage accession data. Anno allows you to directly query any database system using
a valid JDBC driver that permits the application to connect to the relational database management system.
Click "Add database" in the toolbar. This will add a JDBC connection to your database as a top-level
source element. You will be able to add individual, tailored SQL queries as Anno data sources.
.Adding a database as data source
image::source-database.png[role="text-center",pdfwidth=75%]
.Add Database dialog
|===
|Label|Description
|Datasource type|Select the database type from the list of supported drivers: MySQL, MS SQL Server, PostgreSQL or ODBC.
|Datasource name|Provide a name for the database connection to be used as the top-level label of the data source
in the project.
|Connection URL|Edit the JDBC connection string template. You will have to provide the database host name,
port and database instance name.
|User and password|Enter a valid username and password to access the database.
|Connect|Attempt to connect to the database with provided settings.
|Download driver|Attempt to download the JDBC driver for the selected database type.
|===
Anno comes with an embedded MySQL driver; all other drivers
need to be downloaded separately. The *database type* determines which JDBC driver should be loaded.
NOTE: If your database type is not supported, contact helpdesk@genesys-pgr.org for assistance.
Click "Connect" to try to connect to the database. If all went well, you will be presented with a
prompt to add the database link to the project. Otherwise, check the username, password and the JDBC connect
string (search engines are a good resource to find a valid JDBC connect string for your database).
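The connection URL templates differ for each database type. The Java sketch below builds URLs in the commonly documented formats for the MySQL, PostgreSQL and Microsoft SQL Server JDBC drivers and opens a connection with the standard `java.sql.DriverManager`. The example uses each server's default port. The class `JdbcUrls` is hypothetical, and Anno's own templates may differ. ODBC is left out because its URL format depends on the ODBC driver you install.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper; URL formats follow the drivers' commonly documented syntax.
final class JdbcUrls {
    enum DbType { MYSQL, POSTGRESQL, SQLSERVER }

    private JdbcUrls() {}

    /** Builds a JDBC connection URL from host, port and database instance name. */
    static String url(DbType type, String host, int port, String database) {
        switch (type) {
            case MYSQL:
                return "jdbc:mysql://" + host + ":" + port + "/" + database;
            case POSTGRESQL:
                return "jdbc:postgresql://" + host + ":" + port + "/" + database;
            case SQLSERVER:
                return "jdbc:sqlserver://" + host + ":" + port + ";databaseName=" + database;
            default:
                throw new IllegalArgumentException("Unsupported database type: " + type);
        }
    }

    /** Opens a connection; throws SQLException if no registered driver accepts the URL. */
    static Connection connect(String url, String user, String password) throws SQLException {
        return DriverManager.getConnection(url, user, password);
    }

    public static void main(String[] args) {
        // Default ports: MySQL 3306, PostgreSQL 5432, SQL Server 1433.
        System.out.println(url(DbType.POSTGRESQL, "db.example.org", 5432, "genebank"));
    }
}
```

`DriverManager.getConnection` throws an `SQLException` when no registered driver accepts the URL. This is the usual symptom of a missing or mismatched JDBC driver.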
.Database successfully added as a data source
image::database-connected.png[role="text-center",pdfwidth=50%]
After the connection to the database is successfully established, the database connection is added as a
Right-click on the database data source and select "Add SQL query".
image::database-addquery.png[role="text-center",pdfwidth=40%]
This will create a data sheet entry under the database label, titled "Unnamed query". Double-click the entry in
the project data source tree and update the query label and the SQL query itself.
.New database query screen
image::database-unnamed-query.png[role="text-center"]
Press the "Reload" button to load data from the database. This will refresh the contents of the data sheet.
NOTE: Save your project file regularly.
You should start with a simple SQL query to the database and then create additional data sheets
in the project as you query for additional accession data.
NOTE: All SQL queries must include `INSTCODE`, `ACCENUMB` and `GENUS` columns. Where necessary, try using SQL JOIN clauses to write
an SQL query that includes this data.
[[download]]
== Installing Anno
Anno is an open-source project, licensed under the Apache License v2.
Anno requires the Java 8 Runtime Environment (JRE) to run.
NOTE: Make sure you don't allow the Java JRE installer to install browser toolbars like Ask.com or make any other changes to your default browser configuration.
WARNING: Disable Java in all Internet browsers on your computer. Java applet technology has been marked for deprecation and is seldom used today.
If downloading pre-compiled binaries, make sure to download the latest version of Anno for your platform.
Most users will have a 64-bit CPU and JRE, and should use the package labeled `x86_64`. The `x86` package is for the 32-bit JRE.
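To see which package matches your installed JRE, you can check the `os.arch` system property. It reports the architecture of the running JVM rather than the operating system, so a 32-bit JRE on 64-bit Windows reports `x86`. The mapping below from `os.arch` values to package labels is a hypothetical sketch. Values such as `aarch64` fall outside the two packages described here.

```java
import java.util.Locale;

// Hypothetical helper: maps the JVM's os.arch value to an Anno package label.
final class PackageChooser {
    private PackageChooser() {}

    static String packageFor(String osArch) {
        String arch = osArch.trim().toLowerCase(Locale.ROOT);
        switch (arch) {
            case "amd64":
            case "x86_64":
                return "x86_64";
            case "x86":
            case "i386":
            case "i486":
            case "i586":
            case "i686":
                return "x86";
            default:
                return "unknown"; // e.g. aarch64: no matching package is described
        }
    }

    public static void main(String[] args) {
        System.out.println(packageFor(System.getProperty("os.arch")));
    }
}
```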
Download the package from the https://goo.gl/r5QFns[downloads section].
Extract, if necessary, and run the executable for your platform.
[cols="1,4"]
.Resources
|===
|Project page|https://gitlab.croptrust.org/genesys-pgr/anno
|Pre-compiled binaries|https://goo.gl/r5QFns
|`git` repository|https://gitlab.croptrust.org/genesys-pgr/anno.git
|Issue tracker|https://gitlab.croptrust.org/genesys-pgr/anno/issues
|===
== Mapping to MCPD
Once you have successfully loaded a data sheet with data you wish to publish on Genesys,
you need to map the columns of your data sheet to Multi-Crop Passport Descriptors (MCPDs) listed
on the right side of the application window.
. Open the data sheet.
. Click on the column heading label to load the current column configuration.
. Drag the descriptor from the MCPD listing to the column configuration pane.
After dragging and dropping the descriptor into the column configuration pane, the *RDF term* field
will be populated with the descriptor URL (e.g. http://purl.org/germplasm/germplasmTerm#germplasmID).
NOTE: Your existing data must be compliant with MCPD for straightforward mapping!
Genesys requires the `INSTCODE`, `ACCENUMB` and `GENUS` for every accession in the data sheet.
Load your data sheet and make sure it contains these three columns.
. Click on the label of the column containing the *FAO WIEWS Institute Code of your genebank*.
. Drag the `INSTCODE` descriptor from the MCPD list to the column configuration pane.
. Select the column containing your *genebank accession numbers*.
. Drag the `ACCENUMB` descriptor from the MCPD list to the column configuration pane.
. Select the *genus* column of your accessions.
. Drag the `GENUS` descriptor from the MCPD list to the column configuration pane.
.Mapping the `ACCENUMB` column
=== Handling multiple values
The MCPD standard specifies that multiple values may be provided for specific descriptors. One example of such data is
the MCPD `REMARKS` field. You may manage this data in one Excel column or use multiple columns
for each individual comment.
Anno allows you to specify whether a single column contains multiple values, as well as how the
data should be split. Alternatively, it allows you to map multiple columns of the data sheet to the same descriptor.
In both cases, the individual pieces of the data will be converted to an array of values.
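The two approaches can be sketched in Java. One method splits a single cell on a separator, and the other collects several columns mapped to the same descriptor. Both produce a list of values. The class `MultiValue` is hypothetical. Trimming whitespace and dropping empty pieces are assumptions about sensible cleanup, not documented Anno behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helpers showing how multiple values become an array (list) of values.
final class MultiValue {
    private MultiValue() {}

    /** Splits one cell on a literal separator, trimming pieces and dropping empty ones. */
    static List<String> splitCell(String cell, String separator) {
        List<String> values = new ArrayList<>();
        if (cell == null) {
            return values;
        }
        // Pattern.quote makes the separator literal, so "|" or "." are not treated as regex.
        for (String piece : cell.split(Pattern.quote(separator))) {
            String trimmed = piece.trim();
            if (!trimmed.isEmpty()) {
                values.add(trimmed);
            }
        }
        return values;
    }

    /** Collects the non-empty values of several columns mapped to the same descriptor. */
    static List<String> mergeColumns(List<String> cells) {
        List<String> values = new ArrayList<>();
        for (String cell : cells) {
            if (cell != null && !cell.trim().isEmpty()) {
                values.add(cell.trim());
            }
        }
        return values;
    }
}
```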
[[push]]
== Pushing data to the server
After you have mapped your data to MCPD and confirmed that the JSON looks correct,
you are ready to *push* the data to the Genesys server. See <<settings>> for
configuration details.
The *Push dialog* offers four functions:
. Parse all
. Upload
. Remove
. Change the log level
.The Push dialog
image::push-dialog.png[role="text-center"]
The "Parse all" action triggers a read-convert-parse operation of all records in the selected data
sheet. This is a useful operation that will check whether all of your data will correctly load, parse
and convert using the mapping definitions you have provided. Keep an eye on the log report
and fix your data as needed before you attempt a *push* operation.
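A "Parse all" pass can be pictured as a loop over every mapped record that collects problems instead of stopping at the first one. The Java sketch below checks only the three core descriptors. It also flags records that repeat an `INSTCODE`/`ACCENUMB`/`GENUS` combination already seen, because these three values must uniquely identify an accession. It is a hypothetical illustration; `ParseAllSketch` is not an Anno class. The manual describes Anno's pass as also checking that every mapped value loads, parses and converts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a validation pass over mapped records.
final class ParseAllSketch {
    static final List<String> CORE = Arrays.asList("INSTCODE", "ACCENUMB", "GENUS");

    private ParseAllSketch() {}

    /** Convenience factory for a record holding only the core descriptors. */
    static Map<String, String> record(String instcode, String accenumb, String genus) {
        Map<String, String> r = new HashMap<>();
        r.put("INSTCODE", instcode);
        r.put("ACCENUMB", accenumb);
        r.put("GENUS", genus);
        return r;
    }

    /** Returns one message per problem; rows are numbered from 1. */
    static List<String> validate(List<Map<String, String>> records) {
        List<String> problems = new ArrayList<>();
        Map<List<String>, Integer> firstSeen = new HashMap<>();
        for (int i = 0; i < records.size(); i++) {
            Map<String, String> record = records.get(i);
            List<String> key = new ArrayList<>();
            for (String descriptor : CORE) {
                String value = record.get(descriptor);
                if (value == null || value.trim().isEmpty()) {
                    problems.add("Row " + (i + 1) + ": missing " + descriptor);
                } else {
                    key.add(value.trim());
                }
            }
            if (key.size() < CORE.size()) {
                continue; // incomplete records are not checked for duplicates
            }
            Integer first = firstSeen.putIfAbsent(key, i + 1);
            if (first != null) {
                problems.add("Row " + (i + 1) + ": duplicate of row " + first);
            }
        }
        return problems;
    }
}
```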
=== Upload
"Upload" will send the data from the selected data sheet to the Genesys server specified in the Settings
dialog. You will be prompted with the Genesys server URL before you begin pushing updates to
the server.
=== Remove
"Remove" is an operation of last resort. It will move accession records from the active Genesys database
to the archive. Only the three core columns must be mapped to successfully perform the operation:
`INSTCODE`, `ACCENUMB` and `GENUS`.
WARNING: If applicable, use the `HISTORIC` flag instead of removing records from Genesys.
Note that Genesys never actually deletes accession data; it merely moves it to an archive that
remains accessible if the record is referenced by its `PURL`, the Permanent URL.
=== Log levels
The dialog toolbar allows you to toggle the log level between `DEBUG`, `INFO` and `WARN`. This determines the
level of detail rendered in the central log pane of the dialog. It defaults to `INFO`.
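Assuming the usual threshold semantics of logging frameworks, the log level works like this: a message is shown when its level is at or above the selected one. `DEBUG` therefore shows everything, and `WARN` shows only warnings. Below is a minimal Java sketch of that rule. The hypothetical `LogLevel` and `LogPane` types stand in for the dialog's log pane.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Levels in increasing order of severity; compareTo follows this declaration order.
enum LogLevel { DEBUG, INFO, WARN }

// Hypothetical stand-in for the Push dialog's central log pane.
final class LogPane {
    private LogLevel threshold = LogLevel.INFO; // the dialog defaults to INFO
    private final List<String> lines = new ArrayList<>();

    void setLevel(LogLevel level) {
        threshold = level;
    }

    /** Records the message only if its level is at or above the current threshold. */
    void log(LogLevel level, String message) {
        if (level.compareTo(threshold) >= 0) {
            lines.add(level + " " + message);
        }
    }

    List<String> lines() {
        return Collections.unmodifiableList(lines);
    }
}
```

In this sketch, filtered messages are dropped rather than stored, so changing the level only affects messages logged afterwards. Anno may behave differently.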
[[settings]]
== Connecting to a Genesys server
The *Settings dialog* allows you to configure the current project and specify
which Genesys server will receive your data.
.Settings dialog
image::settings-dialog.png[role="text-center",pdfwidth=75%]
The settings are stored in the project file and will be saved and loaded with the
rest of the project configuration.
[cols="1,4", options="header"]
.Settings dialog
|===
|Label|Description
|Genesys server URL|The base URL of the Genesys server instance to use for this project.
For testing purposes, the Genesys server URL should point to https://sandbox.genesys-pgr.org.
For *production* use, it should point to the live Genesys URL, https://www.genesys-pgr.org.
|Endpoints|Authorization and token endpoints will automatically update when the server URL is
changed. Do not modify these.
|Client API key and secret|Contact the helpdesk at helpdesk@genesys-pgr.org to obtain a valid
client key and secret. Different values are used for the sandbox and production environments.
|Access and refresh tokens|Authentication tokens are used to identify you to the Genesys server.
These are obtained by clicking "Authenticate" or are loaded from the project file.
|Scope|Scopes are granted to Anno to manage data on your behalf on Genesys.
The scope must be: `write`.
|Clear tokens|Clear access and refresh token values. You will have to re-authenticate with the
server.
|Authenticate|Validate the current configuration against the server, prompting you to
authenticate if necessary.
|===
A valid user account on Genesys is required. You may use your Google+ account or create an account
manually by providing a valid email address and a password.
Make sure you have valid user accounts for the sandbox environment at https://sandbox.genesys-pgr.org/login
and the production servers at https://www.genesys-pgr.org/login.
NOTE: You can use Google+ to create your user account on Genesys. You will not need to remember a separate password!
After you have obtained a valid client key and secret from helpdesk@genesys-pgr.org and created your
Genesys account, you can authenticate against the selected server (sandbox or production).
Click the "Authenticate" button. When the access and refresh tokens are missing or have expired,
the application will prompt you to authorize the
application's request to access Genesys on your behalf.
If tokens are still valid, their values in the
dialog will be updated with the message: "Tokens are up to date."
.Authentication dialog
image::authentication-dialog.png[role="text-center",pdfwidth=75%]
"Open link in browser" opens your default web browser (Chrome, IE, Firefox...), where Genesys prompts you to allow Anno access to your resources. If you are not yet logged in
to Genesys, you will be prompted to log in before the confirmation dialog is displayed.
image::confirm-access.png[role="text-center",pdfwidth=75%]
Select "Yes, allow access" and Genesys will generate a short-lived *verifier code* that you must copy and paste
to the *Verifier code* field in the Authentication dialog. The verifier code is a 6-character string (e.g. `td1S83`).
Unless an error occurs or the verifier code times out, your access and refresh tokens will be updated.
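Because the verifier code is short-lived, it helps to catch copy-and-paste mistakes before submitting it. The Java sketch below trims surrounding whitespace and checks for exactly six letters or digits. The class `VerifierCode` is hypothetical. The assumption that codes are always alphanumeric is based only on the example `td1S83`. Case is preserved because the example mixes upper-case and lower-case letters.

```java
import java.util.regex.Pattern;

// Hypothetical pre-check for a pasted verifier code; the format is an assumption.
final class VerifierCode {
    private static final Pattern FORMAT = Pattern.compile("[A-Za-z0-9]{6}");

    private VerifierCode() {}

    /** Removes whitespace or line breaks picked up when copying from the browser. */
    static String normalize(String pasted) {
        return pasted == null ? "" : pasted.trim();
    }

    /** True if the pasted text looks like a 6-character alphanumeric code. */
    static boolean looksValid(String pasted) {
        return FORMAT.matcher(normalize(pasted)).matches();
    }
}
```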
After obtaining the tokens, *save the project* by clicking the "Save" button in the application toolbar. Give
the project file a name that tells you which Genesys server (production or sandbox) you have selected.
However, if you do run into trouble using this tool, contact helpdesk@genesys-pgr.org
for assistance. Based on your feedback, we will update the tool or this section of the documentation with
resolutions to commonly encountered problems.
[[window]]
== Window layout
After workspace selection, the main application window is loaded. The window has four sections:
* Toolbar (top)
* List of data sources (left)
* Data source view (center)
* MCPD descriptor list (right)
.Application window
image::anno-blank.png[role="text-center"]
The toolbar provides access to top-level functions.
.Anno toolbar
image::toolbar.png[role="text-center",pdfwidth=50%]
[cols="1,4", options="header"]
.Toolbar buttons
|===
|Label|Description
|Load|Load an existing project file.
|Save|Save the current project to a file.
|Settings|Open the Settings dialog.
|Add file|Add a new data source file to the project.
|Automap|Automatically map columns of the currently open dataset to MCPD descriptors.
|Push|Open a dialog to send data to Genesys.
|Add database|Add a new database-backed data source to the project.
|===
[[workspace]]
== Workspaces and projects
Upon starting the application, you will be presented with a *Workspace Launcher* that
allows you to create a new workspace or load an existing workspace from disk.
A *workspace* allows you to save your Anno configuration files and any JDBC drivers needed
to access your data in one location on your computer.
=== Creating a new workspace
When you first start the application, you will need to create a new workspace to store
your configuration files.
.Workspace Launcher
=== Using the workspace
The workspace acts as a base directory for your configuration and data files. It is good practice
to copy the source files (CSV, XLSX) to the workspace folder. This will help you better maintain
your settings and the data files you publish on Genesys.
=== Under the hood
Anno creates a sub-folder named "jdbc" in your workspace. This folder is used as a source location for
any JDBC drivers you may need to access your databases.
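The sketch below shows one way an application could pick up driver JAR files from the workspace's `jdbc` sub-folder and load them with a `java.net.URLClassLoader`. It is not Anno's implementation. The class `JdbcFolder` is hypothetical, and the sketch assumes drivers are plain `.jar` files placed directly in that folder.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for the "jdbc" sub-folder of an Anno workspace.
final class JdbcFolder {
    private JdbcFolder() {}

    /** Lists the .jar files in the workspace's jdbc folder, sorted by path; empty if missing. */
    static List<File> driverJars(File workspace) {
        File jdbcDir = new File(workspace, "jdbc");
        File[] jars = jdbcDir.listFiles(
                (dir, name) -> name.toLowerCase(Locale.ROOT).endsWith(".jar"));
        if (jars == null) {
            return new ArrayList<>(); // folder does not exist or cannot be read
        }
        Arrays.sort(jars);
        return new ArrayList<>(Arrays.asList(jars));
    }

    /** Creates a class loader that can load classes from every driver JAR. */
    static URLClassLoader driverLoader(File workspace) throws MalformedURLException {
        List<File> jars = driverJars(workspace);
        URL[] urls = new URL[jars.size()];
        for (int i = 0; i < urls.length; i++) {
            urls[i] = jars.get(i).toURI().toURL();
        }
        return new URLClassLoader(urls, JdbcFolder.class.getClassLoader());
    }
}
```

`java.sql.DriverManager` only uses drivers visible to the caller's class loader. For this reason, applications that load drivers this way typically register a small wrapper `Driver` that delegates to the loaded one.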
[[project]]
=== Projects
Anno allows you to manage the settings, data sources and data mapping in *project* files. A project file
contains:
. Server settings, including the Genesys server URL, application keys and secrets
. Data sources: CSV, Excel and database queries
. Column configuration and mapping to MCPD
It is good practice to maintain one project file with the configuration used to test the data and push it to the Genesys Sandbox environment,
and a separate project file to publish data to the Genesys production servers.