datasources.adoc 6.06 KB
Newer Older
Matija Obreza's avatar
Matija Obreza committed
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141

[[datasources]]
== Data Sources

Anno is able to load data from Excel XLSX and CSV files and through database SQL queries. 
Every data source *must* contain at least the following three "columns" that uniquely
identify an accession on Genesys:

. `INSTCODE`: the FAO WIEWS Institute code of the holding genebank
. `ACCENUMB`: Full identifier of the accession in your genebank
. `GENUS`: Genus of the accession  

=== Excel and CSV files

To add an Excel or CSV file to the Project, click the "Add file" button in the Toolbar.
You will be prompted with an "Open file" dialog to select the source file to add to the project.

NOTE: Older versions of Excel files (with *xls* extension) are not supported. 

Excel files may contain multiple sheets that will be listed as individual data source sheets.
CSV files contain only one sheet. The data sheets from source files are listed as sub-entries of 
the source file. To open a data sheet and load the first 300 rows, double click the sheet name.

.Project with XLSX and CSV data sources
image::source-opened.png[role="text-center"]

Loading data from Excel files is straightforward and requires no further configuration. This is
not the case for CSV files. The file format of CSV files is much more flexible and may require
additional configuration before it loads correctly.

=== CSV file configuration

CSV files are plain text files and do not provide any information about the character encoding,
separator or quote character used to separate text strings from numbers. 

NOTE: Use Excel XLSX files instead of CSV files when possible. Open the CSV in Excel, make sure 
data is well formatted and save it in XLSX format.

When opening a CSV data source, Anno may not be able to load the file without you providing 
information on formatting of the file in the *CSV* tab of the CSV data sheet.

[cols="1,4", options="header"] 
.Configuration for CSV files: Formatting
|===
|Label|Description
|Character set|CSV files generated from databases usually default to the character set of the 
	operating system. Use "windows-1250" for files generated on Windows, "x-MacCentralEurope" for Mac OSX.
	You will have to experiment (Reload) with different options.
|Separator|Pick comma (,) or tab (blank)
|Quote character|Pick single (') or double quote (")
|===

Even by providing the best settings for the CSV file, you cannot ensure that Anno will be able to read 
all data correctly.

NOTE: Convert your CSV file to Excel XLSX format!

Click "Reload" to load the CSV file with the new settings.

=== CSV and Excel header row

Occasionally the data files contain additional rows at the start of the document that should be ignored by Anno.

[cols="1,4", options="header"] 
.Configuration for data files: Headers
|===
|Label|Description
|Contains headers|Does your CSV file contain a header row?
|Header row index|What is the index of the header row? At the start or further down in the file?
|===

Click "Reload" to read the data with new settings. Make sure that headers are read correctly.


=== Databases

Databases have many advantages over CSV and Excel files and you are likely using a relational database
to manage your accession data. The application allows you to directly query any database system using
a valid JDBC driver that allows the application to connect to the RDBMS.

Click "Add database" in the Toolbar. This will add a JDBC connection to your database as a top-level
source element. You will be able to add individual, tailored SQL queries as actual Anno data sources.

.Adding a database as data source
image::source-database.png[role="text-center"]


[cols="1,4", options="header"] 
.Add Database dialog
|===
|Label|Description
|Datasource type|Select the database type from the list of supported drivers: mysql, MS SQL Server, PostgreSQL, ODBC
|Datasource name|Provide a name for the database connection to be used as the top-level label of the data source
	in the Project.
|Connection URL|Edit the JDBC connection string template. You will have to provide the database host name, 
	port and the database instance name.
|User and Password|Valid username and password to access the database.
|Connect|Attempt to connect to the database with provided settings.
|Download driver|Attempt to download the JDBC driver for the selected database type.
|===

Anno comes with mysql driver embedded, all other drivers
need to be downloaded separately. The *database type* determines which JDBC driver should be loaded.

NOTE: If your database type is not supported, contact helpdesk@genesys-pgr.org for assistance.

Click "Connect" to try to connect to the database. If all went well, you will be presented with the
prompt to add the database link to the Project. Otherwise fix the username, password and the JDBC connect
string (search engines are a good resource to find a valid JDBC connect string for your database!).

.Database successfully added as data source
image::database-connected.png[role="text-center"]

After the connection to the database is successfully established, the database connection is added as a 
top-level data source. You are now able to add SQL queries as individual data sources to the project.
Right-click on the database data source and select "Add SQL query".

.Database successfully added as data source
image::database-addquery.png[role="text-center"]

This will create a data sheet entry under the database label, titled "Unnamed query". Double-click the entry in
the Project data source tree and update the query label and the SQL query itself.

.New database query screen
image::database-unnamed-query.png[role="text-center"]

Press the "Reload" button to load data from the database. This will refresh the contents of the data sheet.

NOTE: Save the Project file regularly. 

You should start with a simple SQL query to the database and then create additional data sheets 
in the Project as you query for additional accession data.

NOTE: All SQL queries need to include `INSTCODE`, `ACCENUMB` and `GENUS` columns! Use your SQL-JOIN-foo to write
	an SQL query that includes this data.


.A dummy SQL query with core columns: `INSTCODE`, `ACCENUMB` and `GENUS`
image::database-basic.png[role="text-center"]