Guide to Search Updates in v4.1.0

Keyword Search

When you search by keyword (a single word), the system returns results that either match the word exactly or are closely related to it (for example, "SPM.5" may also return "SPM.4", and "test" may return "tests"). This behavior does not rely on n-gram search. By default, the search query is structured as follows:

("%s")^2 OR summary.title:(%s)^1.2 OR summary.keywords:(%s)^1.2 OR (%s)^1


This query ranks results by boost: an exact match scores highest (^2), a match in the title or in the keywords comes next (both ^1.2), and a match in any other searchable field comes last (^1).
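As a minimal sketch of how the template above is filled in, the following Python snippet substitutes a search term into every %s slot. The function name build_default_query is hypothetical, and the sketch assumes plain substitution; the actual system may escape or pre-process the term.

```python
# Default query template, copied verbatim from this guide.
DEFAULT_TEMPLATE = (
    '("%s")^2 OR summary.title:(%s)^1.2 OR summary.keywords:(%s)^1.2 OR (%s)^1'
)


def build_default_query(term: str) -> str:
    """Substitute the term into every %s slot (hypothetical helper, no escaping)."""
    return DEFAULT_TEMPLATE.replace("%s", term)


print(build_default_query("test"))
# ("test")^2 OR summary.title:(test)^1.2 OR summary.keywords:(test)^1.2 OR (test)^1
```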

https://www.loom.com/share/f1dd496714a844bab0d08f88683dd57b?sid=0578d1ea-c8a8-448f-96e8-ee9ca07ecbfc


If only an exact match is wanted, enclose the search term in quotes. The system then returns only documents in which the exact term or phrase is found.

Enclosing a single word in quotes is usually unnecessary. However, because a standard analyzer is used, a term such as "SPM.5" is split into separate tokens ("SPM" and "5"). To return only exact matches for such terms, enclose the search term in quotes.
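The splitting described above can be illustrated with a rough approximation in Python. This is not the real analyzer: it simply lowercases the text and splits on anything that is not a letter or digit, which is enough to show why "SPM.5" and "SPM.4" share a token. The function name rough_tokens is hypothetical, and the real analyzer handles some cases differently (for example, decimal numbers).

```python
import re


def rough_tokens(text: str) -> list[str]:
    """Very rough stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


print(rough_tokens("SPM.5"))  # ['spm', '5']
print(rough_tokens("SPM.4"))  # ['spm', '4']

# The shared token explains why an unquoted search for one can return the other.
shared = set(rough_tokens("SPM.5")) & set(rough_tokens("SPM.4"))
print(sorted(shared))  # ['spm']
```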


Search by Phrase

When you search using a phrase (a group of words), the system ranks results that contain an exact match of the phrase first, meaning every word in the sequence you entered. Results further down the list may contain only parts of the phrase, or the same terms in a different order. The accompanying screenshots illustrate this: the first result displays an exact match of the search phrase, while subsequent results match it only partially.

 

To return only exact matches, enclose the search phrase in quotes. The search engine then looks for documents in which the phrase appears exactly as entered.
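The difference between a quoted and an unquoted phrase can be sketched in Python as follows. This is an illustration only, assuming simple whitespace tokenization; the helper names and sample texts are hypothetical and do not reflect how the search engine is implemented.

```python
def tokens(text: str) -> list[str]:
    """Lowercase and split on whitespace (simplified tokenization)."""
    return text.lower().split()


def contains_phrase(document: str, phrase: str) -> bool:
    """True only if the words of the phrase appear consecutively and in order."""
    doc, words = tokens(document), tokens(phrase)
    n = len(words)
    return n > 0 and any(doc[i:i + n] == words for i in range(len(doc) - n + 1))


def contains_any_word(document: str, phrase: str) -> bool:
    """True if at least one word of the phrase appears anywhere in the document."""
    return bool(set(tokens(document)) & set(tokens(phrase)))


phrase = "patient survey"
print(contains_phrase("annual patient survey results", phrase))  # True
print(contains_phrase("survey of patient outcomes", phrase))     # False
print(contains_any_word("survey of patient outcomes", phrase))   # True
```

A quoted search behaves like contains_phrase, while an unquoted search can also surface documents that only satisfy contains_any_word.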

 

Synonym Search

Synonym search is configured in the application.yml file, which defines groups of synonymous terms. Currently there is one synonym group, containing two terms: "MDW" and "MetadataWorks". A search for "MDW" therefore returns results containing either "MDW" or "MetadataWorks", because the two are treated as equivalent.
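Conceptually, a synonym group expands one search term into every term in its group. The Python sketch below illustrates that idea only; the data structure and the expand function are hypothetical and do not represent the format used in application.yml, which this guide does not show.

```python
# One synonym group, as described in this guide (the structure itself is hypothetical).
SYNONYM_GROUPS = [{"MDW", "MetadataWorks"}]


def expand(term: str) -> set[str]:
    """Return every term treated as equivalent to the given term."""
    for group in SYNONYM_GROUPS:
        if term in group:
            return set(group)
    return {term}


print(sorted(expand("MDW")))      # ['MDW', 'MetadataWorks']
print(sorted(expand("dataset")))  # ['dataset']
```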

 

 

 

Searching Structural Metadata: Has/Has Not

When searching on structural metadata (to find datasets that have it or that do not have it), use the correct syntax to get the intended results. To indicate that your query uses the search syntax, precede it with the rs: prefix.

 


From the data presented in the screenshots, we can derive the following insights:

  • Datasets Without Structural Metadata: There are a total of 2,673 datasets identified on the staging site that lack structural metadata.

  • Datasets With Structural Metadata: In contrast, 49 datasets have been enriched with structural metadata.

  • Total Datasets Overview: Cumulatively, the staging site hosts 2,722 datasets, encompassing both those with and without structural metadata.

 

Searching Within a Data Element or Data Class

To search only within a data element or data class, use the search syntax equivalent of the corresponding advanced search option.

Key Usage Notes:

  • Equivalence: This method is equivalent to selecting a data element or class in the advanced search.

  • Syntax: Start your search with "rs:" to ensure it's conducted within your chosen category.

 

Searching with Multiple-Word Terms

Search for multiple words, and the engine will show results matching any of those words.

Result Ranking: Results are ranked by relevance. Those with more matches to your terms appear first.
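The ranking idea can be sketched with a toy example in Python. This counts matching terms only and is a deliberate simplification; the real relevance score is not described in this guide and is likely more nuanced. The function name and sample titles are hypothetical.

```python
def rank_by_matches(query: str, documents: list[str]) -> list[str]:
    """Keep documents matching any query word; order by number of matching words."""
    terms = set(query.lower().split())
    scored = []
    for doc in documents:
        hits = len(terms & set(doc.lower().split()))
        if hits:  # a document needs at least one matching word to be returned
            scored.append((hits, doc))
    # sort() is stable, so documents with equal counts keep their original order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]


docs = [
    "cancer registry",
    "hospital cancer waiting times",
    "waiting list",
    "prescribing data",
]
print(rank_by_matches("cancer waiting times", docs))
# ['hospital cancer waiting times', 'cancer registry', 'waiting list']
```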

 

 

Using Wildcard in Searches

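To search for multiple similar keywords, such as "CPM1", "CPM2", and "CPM10", use a wildcard character. For example, searching for "CPM?" returns items that match this pattern, capturing variants such as CPM1 and CPM2 in one query. Note that in Lucene-style query syntax "?" stands for exactly one character, so matching a longer variant such as CPM10 would typically require "CPM*" instead.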
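Wildcard matching can be tried out locally with Python's fnmatch module, which uses the common convention that "?" matches exactly one character and "*" matches any number of characters. This sketch assumes the search engine follows the same convention, which this guide does not confirm; the keyword list is sample data.

```python
from fnmatch import fnmatchcase

keywords = ["CPM1", "CPM2", "CPM10", "CPM"]


def matching(pattern: str) -> list[str]:
    """Return the sample keywords that match a glob-style wildcard pattern."""
    return [k for k in keywords if fnmatchcase(k, pattern)]


print(matching("CPM?"))  # ['CPM1', 'CPM2']
print(matching("CPM*"))  # ['CPM1', 'CPM2', 'CPM10', 'CPM']
```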

 

Using Boolean Operators

Refine your searches with the Boolean operators AND, OR, and NOT. The operators are case-sensitive, so write them in uppercase, and these queries require the "rs:" prefix.

Key Points:

  • Use "AND" to find items containing all specified terms.

  • Use "OR" to find items containing any of the specified terms.

  • Use "NOT" to exclude items containing the specified term.
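The effect of the three operators can be illustrated with set operations over a toy index in Python. This is a conceptual sketch only; the sample documents and the having helper are hypothetical and unrelated to the actual search implementation.

```python
# Toy index: document id -> text (sample data, not from the real catalogue).
DOCS = {
    1: "cancer waiting times",
    2: "cancer registry",
    3: "hospital waiting list",
}


def having(term: str) -> set[int]:
    """Ids of documents whose text contains the term as a whole word."""
    return {doc_id for doc_id, text in DOCS.items() if term in text.split()}


# cancer AND waiting: documents containing both terms
print(sorted(having("cancer") & having("waiting")))  # [1]
# cancer OR waiting: documents containing either term
print(sorted(having("cancer") | having("waiting")))  # [1, 2, 3]
# cancer NOT waiting: documents containing the first term but not the second
print(sorted(having("cancer") - having("waiting")))  # [2]
```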


