Improve AccessionRefs
Run the unit test AccessionControllerTest#getAccessionsTest and observe that AccessionRefAspect is extremely slow:
WARN AccessionRefAspect:112 - Re-referencing AccessionRefs for 2000 accessions took 25291ms
AccessionRef is now embedded, so the code must check every single Dataset and Subset individually: every time an accession record is persisted, it loads all the data, streams it and processes it for each individual accession.
AccessionRefAspect must execute in a few milliseconds. It is a very important aspect and it affects the speed of uploading data to Genesys.
Ideally, the aspect would run only this statement:
update accession_ref set accession_id = ? where instCode = ? and ((accenumb = ? and genus = ?) or (doi = ?))
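A minimal sketch of how that statement could be issued for one accession or for a whole list over plain JDBC. Only the SQL text comes from this issue; the class AccessionRefUpdater, the AccessionKey holder, its field names and the numeric id type are assumptions for illustration, not existing Genesys code.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;

public class AccessionRefUpdater {

    /** Hypothetical holder for the values the statement binds; not a Genesys class. */
    public static final class AccessionKey {
        final long accessionId;
        final String instCode;
        final String acceNumb;
        final String genus;
        final String doi; // may be null, in which case the doi branch never matches

        public AccessionKey(long accessionId, String instCode, String acceNumb, String genus, String doi) {
            this.accessionId = accessionId;
            this.instCode = instCode;
            this.acceNumb = acceNumb;
            this.genus = genus;
            this.doi = doi;
        }
    }

    /** The statement from the issue, unchanged. */
    public static final String UPDATE_SQL =
        "update accession_ref set accession_id = ? "
        + "where instCode = ? and ((accenumb = ? and genus = ?) or (doi = ?))";

    /** Re-reference a single accession by delegating to the batch variant. */
    public static int updateOne(Connection connection, AccessionKey key) throws SQLException {
        return updateAll(connection, Collections.singletonList(key))[0];
    }

    /** Re-reference many accessions in one JDBC batch; returns the update counts reported by the driver. */
    public static int[] updateAll(Connection connection, List<AccessionKey> keys) throws SQLException {
        try (PreparedStatement ps = connection.prepareStatement(UPDATE_SQL)) {
            for (AccessionKey key : keys) {
                ps.setLong(1, key.accessionId);
                ps.setString(2, key.instCode);
                ps.setString(3, key.acceNumb);
                ps.setString(4, key.genus);
                ps.setString(5, key.doi);
                ps.addBatch();
            }
            return ps.executeBatch();
        }
    }
}
```

This only works once the references live in a table that can be addressed directly, which is what the options below are about.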
@Entity
We need direct access to accession references so that we can do a direct query and update. The code should allow both single and batch (List) updating of references.
I see two options and am looking for ideas.
Option 1
Remove @Embeddable from AccessionRef, make it abstract, and convert subset_accessions and dataset_accessions to two separate @Entity classes, DatasetAccessionRef and SubsetAccessionRef, that extend AccessionRef. This way the primary key can remain on instcode, genus, species and dataset/subset ID.
No data migration is required, only code changes.
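A rough sketch of the class shape Option 1 implies. The JPA annotations appear only as comments so that the block compiles without a persistence provider; @MappedSuperclass on the base class, the field set (taken from the columns in the update statement above) and the wrapper class Option1Sketch are assumptions, and real entities would also need no-arg constructors and a key mapping.

```java
public class Option1Sketch {

    // @MappedSuperclass (assumed; replaces the current @Embeddable)
    public static abstract class AccessionRef {
        protected String instCode;
        protected String acceNumb;
        protected String genus;
        protected String doi;
        protected Long accessionId; // null until the reference is matched to an accession

        protected AccessionRef(String instCode, String acceNumb, String genus, String doi) {
            this.instCode = instCode;
            this.acceNumb = acceNumb;
            this.genus = genus;
            this.doi = doi;
        }

        public boolean isResolved() {
            return accessionId != null;
        }

        public void setAccessionId(Long accessionId) {
            this.accessionId = accessionId;
        }
    }

    // @Entity @Table(name = "dataset_accessions")
    public static class DatasetAccessionRef extends AccessionRef {
        private final long datasetId; // part of the primary key

        public DatasetAccessionRef(long datasetId, String instCode, String acceNumb, String genus, String doi) {
            super(instCode, acceNumb, genus, doi);
            this.datasetId = datasetId;
        }

        public long getDatasetId() {
            return datasetId;
        }
    }

    // @Entity @Table(name = "subset_accessions")
    public static class SubsetAccessionRef extends AccessionRef {
        private final long subsetId; // part of the primary key

        public SubsetAccessionRef(long subsetId, String instCode, String acceNumb, String genus, String doi) {
            super(instCode, acceNumb, genus, doi);
            this.subsetId = subsetId;
        }

        public long getSubsetId() {
            return subsetId;
        }
    }
}
```

Because each subclass keeps its existing table, the direct update would have to be run once per table.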
Option 2
Convert AccessionRef to @Entity and add private Subset subset and private Dataset dataset (only one can be set). Liquibase needs to move data to the new table.
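A small sketch of how the "only one can be set" rule of Option 2 might be enforced in the class itself. The annotations are again comments only, and the Dataset and Subset stand-ins, the table name and the wrapper class Option2Sketch are assumptions for illustration; a database check constraint would be a complementary safeguard.

```java
public class Option2Sketch {

    /** Stand-in for the real Dataset entity. */
    public static final class Dataset {
        final long id;
        public Dataset(long id) { this.id = id; }
    }

    /** Stand-in for the real Subset entity. */
    public static final class Subset {
        final long id;
        public Subset(long id) { this.id = id; }
    }

    // @Entity @Table(name = "accession_ref") (table name assumed from the update statement)
    public static class AccessionRef {
        // @ManyToOne
        private Dataset dataset;
        // @ManyToOne
        private Subset subset;

        public Dataset getDataset() { return dataset; }

        public Subset getSubset() { return subset; }

        /** Rejects a dataset owner when a subset owner is already present. */
        public void setDataset(Dataset dataset) {
            if (dataset != null && this.subset != null) {
                throw new IllegalStateException("AccessionRef already belongs to a Subset");
            }
            this.dataset = dataset;
        }

        /** Rejects a subset owner when a dataset owner is already present. */
        public void setSubset(Subset subset) {
            if (subset != null && this.dataset != null) {
                throw new IllegalStateException("AccessionRef already belongs to a Dataset");
            }
            this.subset = subset;
        }
    }
}
```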
Subset
SubsetController currently accepts a list of accession UUIDs, but it should accept a list of AccessionRefs (same as Dataset).