Descriptor definitions coming from different partners will differ in descriptions, spelling, and codes. Curating the descriptor catalog will therefore become harder as more descriptors are added to the database.
We should introduce a descriptor similarity matrix in which every descriptor is compared to all others in the system. Each pairwise comparison is a vector of multiple scores: text similarity (title and description), code similarity, etc.
Inserting or updating a descriptor should trigger an update of the similarity matrix.
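A minimal sketch of how such a matrix could be built and refreshed is shown here. Everything in it is an assumption made for illustration: the `Descriptor` fields, the `SimilarityMatrix` class, the use of `difflib` for text similarity, and Jaccard overlap for codes are not part of the proposal, and a real implementation would likely live in the database layer.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Descriptor:
    # Hypothetical shape of a descriptor record.
    id: str
    title: str
    description: str
    codes: set = field(default_factory=set)


def text_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1] (difflib, an assumed choice)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def code_similarity(a: set, b: set) -> float:
    """Jaccard overlap of two code sets; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def compare(a: Descriptor, b: Descriptor) -> dict:
    """One cell of the matrix: a vector of named scores."""
    return {
        "title": text_similarity(a.title, b.title),
        "description": text_similarity(a.description, b.description),
        "codes": code_similarity(a.codes, b.codes),
    }


class SimilarityMatrix:
    def __init__(self):
        self.descriptors = {}
        # Keyed by an unordered pair of ids, so each pair is stored once.
        self.scores = {}

    def upsert(self, d: Descriptor) -> None:
        """Insert or update a descriptor and recompute only the pairs it is in."""
        self.descriptors[d.id] = d
        for other in self.descriptors.values():
            if other.id != d.id:
                self.scores[frozenset((d.id, other.id))] = compare(d, other)

    def get(self, id_a: str, id_b: str) -> dict:
        return self.scores[frozenset((id_a, id_b))]
```

With this layout an insert or update touches only the pairs involving the changed descriptor, so the cost per change grows linearly with the number of descriptors rather than quadratically.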
Using the similarity matrix:
- We will be able to display similar descriptors and perhaps even merge descriptor definitions.
- This may prevent duplicate entries or lead to the development of common descriptors.
- It will allow for quick identification of candidates for mapping onto common super-descriptors.
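The uses listed above could be served by simple queries over the stored score vectors. The sketch below assumes the matrix is a mapping from an unordered pair of descriptor ids to a score vector with `title`, `description`, and `codes` entries; the weights, the threshold, and the function names are hypothetical and would need tuning against real data.

```python
# Assumed weights for collapsing a score vector into one number.
WEIGHTS = {"title": 0.4, "description": 0.4, "codes": 0.2}


def combined(vector: dict, weights: dict = WEIGHTS) -> float:
    """Weighted average of the scores in one matrix cell."""
    total = sum(weights.values())
    return sum(vector[name] * w for name, w in weights.items()) / total


def similar_to(scores: dict, descriptor_id: str, top_n: int = 5) -> list:
    """Descriptors most similar to the given one, best match first."""
    ranked = [
        (next(iter(pair - {descriptor_id})), combined(vector))
        for pair, vector in scores.items()
        if descriptor_id in pair
    ]
    return sorted(ranked, key=lambda item: item[1], reverse=True)[:top_n]


def merge_candidates(scores: dict, threshold: float = 0.8) -> list:
    """Pairs similar enough to review for merging or a common super-descriptor."""
    found = [
        (tuple(sorted(pair)), combined(vector))
        for pair, vector in scores.items()
        if combined(vector) >= threshold
    ]
    return sorted(found, key=lambda item: item[1], reverse=True)


# Example with made-up scores for three descriptors.
scores = {
    frozenset(("d1", "d2")): {"title": 1.0, "description": 0.9, "codes": 1.0},
    frozenset(("d1", "d3")): {"title": 0.2, "description": 0.1, "codes": 0.0},
    frozenset(("d2", "d3")): {"title": 0.3, "description": 0.1, "codes": 0.0},
}
print(similar_to(scores, "d1"))
print(merge_candidates(scores))
```

Keeping the individual scores and combining them only at query time leaves room to weight text and codes differently for display, deduplication, and mapping.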