Natural query syntax
Querying Genesys today requires constructing a Filter object, which is not something one could type from scratch without a deep understanding of the JSON model. Users must rely on the Genesys user interface to create queries.
@crabil demonstrated the use and utility of a more natural query syntax of the kind used by publication repositories. One of the main benefits of such an approach is that it decouples the query syntax from the underlying implementation. This provides forward compatibility with future implementations of Genesys and means that user queries can still be executed the same way (though perhaps not 100% identically) as with previous versions.
Examples of natural queries:
- cassava[crop] equals { "crop": [ "cassava" ] }
- (cassava[crop]) OR (banana[crop]): { "crop": [ "cassava", "banana" ] }
- ((cassava[crop]) OR (banana[crop])) AND (cimmyt[holder]): { "crop": [ "cassava", "banana" ], "institute": { "code": [ "MEX002" ] } }
- amarill*: { "_text": "amarill*" }
- cass*[cropName]: { "cropName": { "sw": [ "cass" ] } } (filter.crop would need StringFilter support)
- TMp[accessionNumber]: { "accessionNumber": { "sw": [ "TMp" ] } }
- 2020:2021[created] is an example of a date range: { "createdDate": { "ge": "2020-01-01T...", "lt": "2021-01-01T..." } }
- 13.4:44.8[Latitude]
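The mapping from a parsed query to a filter object could look roughly like the sketch below. It is only an illustration: the tuple-based tree shape and the to_filter function are hypothetical, label-to-field translation is left out, and OR is only handled when both sides target the same field, as in the crop example.

```python
# Hypothetical parsed-query tree (not part of Genesys):
#   ("term", value, field_or_None) | ("or", left, right) | ("and", left, right)

def to_filter(node):
    """Translate a parsed query tree into a filter-like dict (sketch only)."""
    kind = node[0]
    if kind == "term":
        _, value, field = node
        if field is None:
            # No field given: general full-text search, wildcard kept as typed
            return {"_text": value}
        if value.endswith("*"):
            # Trailing wildcard on a field: "starts with" filter
            return {field: {"sw": [value[:-1]]}}
        return {field: [value]}

    left, right = to_filter(node[1]), to_filter(node[2])
    if kind == "or":
        # Assumption: OR is only expressible over one and the same list-valued field
        if len(left) == 1 and left.keys() == right.keys():
            (field,) = left
            if isinstance(left[field], list) and isinstance(right[field], list):
                return {field: left[field] + right[field]}
        raise ValueError("OR across different fields is not covered by this sketch")
    if kind == "and":
        # Assumption: AND merges filters on distinct fields
        if left.keys() & right.keys():
            raise ValueError("AND on the same field is not covered by this sketch")
        return {**left, **right}
    raise ValueError(f"Unknown node type: {kind}")


crops = ("or", ("term", "cassava", "crop"), ("term", "banana", "crop"))
print(to_filter(crops))
print(to_filter(("term", "cass*", "cropName")))
```

A real translation would also need the field-specific defaults visible in the examples, such as accessionNumber being matched as a prefix without an explicit wildcard.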
Syntax
Query syntax supports parentheses and the OR, AND and NOT keywords. Each expression either targets a specific field (by label) or is a general full-text search.
Date and number expressions are range expressions and use the fromValue : toValue format.
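As a rough illustration of how a date range expression might expand into filter bounds, the sketch below turns each side of the range into the first instant of the named year, month or day. The function names are hypothetical, and midnight without a time zone is an assumption, since the time part is not spelled out in the example filter.

```python
from datetime import datetime

# Month abbreviations accepted in place of a month number
MONTHS = {name: number for number, name in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}


def period_start(value):
    """Return the first instant of a year, year-month or year-month-day value."""
    parts = value.strip().strip('"').split("-")
    year = int(parts[0])
    month = 1
    if len(parts) > 1:
        # Accept either a month abbreviation or a month number
        month = MONTHS.get(parts[1].upper()) or int(parts[1])
    day = int(parts[2]) if len(parts) > 2 else 1
    return datetime(year, month, day)


def date_range_filter(expression, field):
    """Expand 'from:to' into ge/lt bounds (assumption: upper bound is exclusive)."""
    parts = expression.split(":")
    if len(parts) != 2:
        raise ValueError(f"Not a range expression: {expression!r}")
    from_value, to_value = parts
    return {field: {"ge": period_start(from_value).isoformat(),
                    "lt": period_start(to_value).isoformat()}}


print(date_range_filter("2020:2021", "createdDate"))
```

Under this reading, which follows the 2020:2021[created] example, the upper bound is exclusive, so the range covers only dates in 2020.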
query := oneQuery | orQuery | andQuery | notQuery;
orQuery := "(" orQuery ")" | query "OR" query;
andQuery := "(" andQuery ")" | query "AND" query;
notQuery := "(" notQuery ")" | "NOT" query;
oneQuery := "(" oneQuery ")" | termQuery; -- unwrap ()
termQuery := fulltextQuery | (dateQuery | numberQuery | stringQuery) termField; -- single term query
stringQuery := "\"" stringQuery "\"" | string;
string := keyword (" " keyword)*; -- e.g. 123, abc, 1ab3, 123 abc defg
keyword := alphaNum; -- no whitespace
fulltextQuery := stringQuery; -- e.g. "This is a test", This is a test
dateQuery := dateStr ":" dateStr; -- e.g. "2021":"3000"
dateStr := ("\"" dateVal "\"") | dateVal; -- e.g. "2021-01-01" or 2021-01-01, "2021-JAN", "1991-07", ...
dateVal := year ("-" month ("-" day)?)?;
year := integer;
month := integer[1-12] | "JAN" | "FEB" ...;
day := integer[1-31];
numberQuery := number ":" number; -- e.g. 300:800
number := integer | float;
float := integer ("." integer);
termField := "[" fieldLabel "]"; -- e.g. [Holding Institute]
fieldLabel := /[A-Za-z]([A-Za-z ]*)/; -- letters and spaces
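To get a feel for the grammar, here is a minimal hand-written recursive-descent parser for its boolean structure. It is a sketch, not the proposed ANTLR implementation: all names are hypothetical, the precedence NOT > AND > OR is an assumption the grammar above does not state, and date and number ranges are kept as plain term text rather than parsed further.

```python
import re

# One token: "(", ")", a [field label], a "quoted string", or a bare word
TOKEN = re.compile(r'\s*(?:(\()|(\))|\[([^\]]*)\]|"([^"]*)"|([^\s()\[\]"]+))')


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError(f"Unexpected input at column {pos}")
        pos = match.end()
        lparen, rparen, field, quoted, word = match.groups()
        if lparen:
            tokens.append(("(", "("))
        elif rparen:
            tokens.append((")", ")"))
        elif field is not None:
            tokens.append(("FIELD", field.strip()))
        elif quoted is not None:
            tokens.append(("WORD", quoted))
        elif word in ("OR", "AND", "NOT"):
            tokens.append((word, word))
        else:
            tokens.append(("WORD", word))
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def take(self, kind):
        if self.peek() != kind:
            raise ValueError(f"Expected {kind}, found {self.peek()}")
        value = self.tokens[self.pos][1]
        self.pos += 1
        return value

    def parse(self):
        node = self.or_query()
        if self.peek() is not None:
            raise ValueError(f"Unexpected {self.peek()} after end of query")
        return node

    def or_query(self):  # lowest precedence
        node = self.and_query()
        while self.peek() == "OR":
            self.take("OR")
            node = ("or", node, self.and_query())
        return node

    def and_query(self):
        node = self.not_query()
        while self.peek() == "AND":
            self.take("AND")
            node = ("and", node, self.not_query())
        return node

    def not_query(self):
        if self.peek() == "NOT":
            self.take("NOT")
            return ("not", self.not_query())
        return self.primary()

    def primary(self):
        if self.peek() == "(":  # unwrap parentheses
            self.take("(")
            node = self.or_query()
            self.take(")")
            return node
        words = [self.take("WORD")]
        while self.peek() == "WORD":
            words.append(self.take("WORD"))
        field = self.take("FIELD") if self.peek() == "FIELD" else None
        return ("term", " ".join(words), field)


def parse(text):
    return Parser(tokenize(text)).parse()


print(parse("((cassava[crop]) OR (banana[crop])) AND (cimmyt[holder])"))
```

A term without a field label, such as amarill*, comes back with None as its field, which corresponds to the general full-text search case.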
Field labels and translation to filters
Field labels in queries correspond to Filter fields. This allows for declaring field aliases, e.g. Holding institute, Genebank code, Institute code and WIEWS code can all map to the same filter field institute.code, while Genebank can map to institute.name.
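The alias idea can be sketched as a simple lookup table. The table below contains only the aliases named above; the resolve_field helper and the case- and whitespace-insensitive matching are assumptions made for illustration.

```python
# Aliases from the text above, keyed by lower-cased label
FIELD_ALIASES = {
    "holding institute": "institute.code",
    "genebank code": "institute.code",
    "institute code": "institute.code",
    "wiews code": "institute.code",
    "genebank": "institute.name",
}


def resolve_field(label):
    """Map a user-facing field label to a filter field path (hypothetical helper)."""
    # Assumption: labels are matched ignoring case and repeated whitespace
    key = " ".join(label.lower().split())
    try:
        return FIELD_ALIASES[key]
    except KeyError:
        raise ValueError(f"Unknown field label: {label!r}") from None


print(resolve_field("Holding Institute"))
print(resolve_field("Genebank"))
```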
It is also possible to determine the target filter field from the query value itself. When the query value is in WIEWS code format, we can automatically switch from the default institute.name to the institute.code field. Similarly, a search for CIMMYT may be converted to MEX002 for more precise results when applicable, or CGIAR may expand into all institute codes of that network.
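Value-based field selection might be sketched as follows. Several things here are assumptions: the WIEWS pattern (three letters followed by three or four digits), the shape of the institute.name filter, and the lookup tables, which are illustrative and deliberately incomplete (the CGIAR entry lists only the one code mentioned in the text).

```python
import re

# Assumption: a WIEWS code is a 3-letter country code followed by 3 or 4 digits
WIEWS_CODE = re.compile(r"^[A-Z]{3}\d{3,4}$")

# Illustrative, incomplete lookup tables (hypothetical)
EXPANSIONS = {
    "CIMMYT": ["MEX002"],  # acronym to institute code
    "CGIAR": ["MEX002"],   # network to member codes; a real table would list all members
}


def institute_filter(value):
    """Choose institute.code or institute.name depending on the value itself."""
    text = value.strip()
    upper = text.upper()
    if WIEWS_CODE.match(upper):
        return {"institute": {"code": [upper]}}
    if upper in EXPANSIONS:
        return {"institute": {"code": list(EXPANSIONS[upper])}}
    # Default: treat the value as (part of) an institute name
    return {"institute": {"name": [text]}}


print(institute_filter("mex002"))
print(institute_filter("CIMMYT"))
print(institute_filter("Some Genebank"))
```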
More appropriate fields may also be targeted automatically when the user provides a query that we detect as a country name (or ISO 3166 code). If no field is provided, this may map to the accession's country of provenance instead of a general full-text search. Country names may also expand into their current and past ISO codes.
Providing "parsed" query to the user
Since the query is now a user-provided string, the server needs to parse the incoming query and provide sensible feedback to the user when the query syntax is misunderstood.
Luckily, ANTLR supports reporting parsing warnings and errors, which can be returned to the user while the query may still be "understood" by the system.
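The general idea of recovering from a syntax problem while still reporting it can be shown without ANTLR. The sketch below is a hypothetical pre-pass that only repairs unbalanced parentheses and returns the repaired query together with warnings for the user; it ignores quoting and is not how ANTLR's own error recovery works.

```python
def check_parentheses(query):
    """Return (repaired_query, warnings) for unbalanced parentheses (sketch only)."""
    warnings = []
    kept = []
    depth = 0
    for column, char in enumerate(query):
        if char == "(":
            depth += 1
        elif char == ")":
            if depth == 0:
                # Nothing to close: drop the character but tell the user
                warnings.append(f"column {column}: unmatched ')' ignored")
                continue
            depth -= 1
        kept.append(char)
    if depth:
        # Close whatever is still open at the end of the query
        warnings.append(f"end of query: added {depth} missing ')'")
        kept.append(")" * depth)
    return "".join(kept), warnings


repaired, problems = check_parentheses("(cassava[crop]) OR (banana[crop]")
print(repaired)
print(problems)
```

Returning both the repaired query and the warnings lets the server execute its best interpretation while showing the user exactly what was changed.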
UI changes
The implementation requires a change in the underlying filtering implementation (#615 (moved)) and will no longer neatly map to UI filtering elements.
"Filters" in a similar portal are implemented as very simple filtering by tags and are purely boolean. In Genesys, the comparable "filters" are the BooleanFilters (e.g. georeferenced, inMls, available, historical).