Analyzers
In this article you will find:
- 1 Analyzers
- 1.1 Examples
- 1.2 Special Characters
- 1.3 Content Data
- 2 Types of Analyzers
- 2.1 Hawksearch Analyzer
- 2.2 Snowball Analyzer
- 2.3 Standard Analyzer
- 2.4 Simple Analyzer
- 2.5 Stop Analyzer
- 2.6 White Space Analyzer
- 2.7 Language Analyzer
Analyzers
The Analyzer setting becomes available for a field once Query this Field is enabled. Our recommended default analyzer is the Hawksearch Analyzer, but there are situations where the configuration needs to be changed.
Examples
This section gives examples of why a business user would override the default indexing configuration, the “Hawksearch Analyzer”.
Special Characters
One of the data fields contains special characters. Users may search with or without those characters, and the business team wants the products to be found in either case. Because the default configuration removes all special characters, the business user needs to index the data field in two different ways.
This requires the business user to add the field twice:
Once with the default configuration, the “Hawksearch Analyzer” (no analyzer is selected).
Once with the “White Space Analyzer”.
Note: the two fields must be created with different names. Choose a naming convention that helps identify how each field is configured, for example item_number and item_number_special_characters.
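The effect of indexing the same value twice can be illustrated with a minimal Python sketch. The function names are hypothetical stand-ins and not Hawksearch code, and default_like_tokens assumes a simplified reading of "removes all special characters" (lowercase the text and keep only runs of letters and digits).

```python
import re

def default_like_tokens(text):
    """Hypothetical stand-in for the default behavior (an assumption):
    lowercase the text and keep only runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

def whitespace_tokens(text):
    """Split at whitespace only; characters and case are left untouched."""
    return text.split()

# Field name -> analysis function, using the naming convention from the note.
FIELD_ANALYZERS = {
    "item_number": default_like_tokens,
    "item_number_special_characters": whitespace_tokens,
}

def index_value(value):
    """Index the same source value once per field."""
    return {field: analyze(value) for field, analyze in FIELD_ANALYZERS.items()}

indexed = index_value("AB#123/X")
for field, tokens in indexed.items():
    print(field, tokens)
```

In this sketch the first field stores the value without its special characters, while the second keeps the exact form the user may type.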
Content Data
The website may have a separate landing page for articles, and the article records may have a body data field. The data in this field may need less manipulation than the other data fields, so it may be preferable to make only limited changes to the existing data. If the only adjustments required are dividing text at non-letter characters and lowercasing the words, the Simple Analyzer would be used instead of the default.
Types of Analyzers
This section outlines the available analyzers. In addition to the default, the business user can select any of the analyzer options described below. Selecting an analyzer overrides the default indexing configuration, the “Hawksearch Analyzer”.
Hawksearch Analyzer
This is the default analyzer applied when the field is set to be queried. The Hawksearch Analyzer has the same properties as the Snowball Analyzer but also takes synonyms into account. It is the only analyzer that uses the synonyms configured in the Hawksearch workbench.
Snowball Analyzer
The Snowball Analyzer applies stemming, which converts a word into its stem. For example, if the word “climbing” is entered, the analyzer converts it to “climb” and uses that to search a field that has the Snowball Analyzer set on it. The search returns items that contain climber, climbing, and climb. Stemming can lose information about the original form of your text. For example, the terms universe, university, and universal all stem to the same root, “univers”, and would all return the same results. This is likely not the desired result.
Description:
Stemming is applied
Stop words are removed
Colons, #, %, $, parentheses, and slashes are removed
Removes underscores, hyphens, @, and & symbols unless they are part of words or numbers
Removes an apostrophe if it is (a) at the beginning of a word, (b) at the end of a word, or (c) followed by the letter s
Separates numbers from text when numbers are at the beginning of a word
Letter characters are converted to lowercase
Best Used For:
Fields that have content consisting of multiple versions of a word.
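To make the stemming trade-off concrete, here is a toy suffix stripper in Python. It is not the Snowball algorithm; the suffix list and the minimum stem length are assumptions chosen only to reproduce the examples in this section.

```python
# Toy suffix stripper for illustration only; NOT the real Snowball algorithm.
# The suffix list is hypothetical and is checked in this order.
SUFFIXES = ("ing", "ity", "er", "al", "e")

def toy_stem(word):
    """Lowercase the word and strip the first matching suffix, if any."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so very short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

for group in (["climb", "climbing", "climber"],
              ["universe", "university", "universal"]):
    print({word: toy_stem(word) for word in group})
```

Each group collapses to a single stem, which is helpful for climb, climbing, and climber but merges the unrelated terms universe, university, and universal.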
Standard Analyzer
Description:
Separates text “smartly”, accounting for the following lexical types:
alphanumerics
acronyms
company names
email addresses
computer hostnames
numbers
words with an interior apostrophe
serial numbers
IP addresses
Chinese and Japanese characters
Stop words are removed
Letter characters are converted to lowercase
No stemming applied
Best Used For:
Searching English words, such as product names or short descriptions, as well as fields that contain the value types listed above.
Simple Analyzer
Description:
Separates text at non-letter characters and removes all non-letter characters
Letter characters are converted to lowercase
No stop words are removed
No stemming applied
Best Used For:
Fields that only have alphabetical characters and don’t need the advanced interpretation of the Standard Analyzer. For example, consider a field that stores famous one-line quotes that will be queried. If a user searches “to be, or not to be”, removing the standard stop words would leave nothing to search on. Additionally, if stemming were applied to this field, the results would be less relevant than they would be without stemming. In a case like this, the Simple Analyzer is a good choice.
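The quote example can be sketched in a few lines of Python. This is a simplified model of the behavior described above, not Hawksearch code, and the stop word list is an assumption, since the list Hawksearch uses is not given in this article.

```python
import re

# Assumed stop word list for illustration only.
STOP_WORDS = {"a", "an", "and", "be", "not", "of", "or", "the", "to", "with"}

def simple_tokens(text):
    """Split at non-letter characters and lowercase; nothing else is removed."""
    return [token.lower() for token in re.findall(r"[A-Za-z]+", text)]

def without_stop_words(tokens):
    """Drop every token that appears in the assumed stop word list."""
    return [token for token in tokens if token not in STOP_WORDS]

quote = "To be, or not to be"
print(simple_tokens(quote))                      # ['to', 'be', 'or', 'not', 'to', 'be']
print(without_stop_words(simple_tokens(quote)))  # []
```

With stop word removal every term in the quote disappears, while the simple tokenization keeps all six words searchable.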
Stop Analyzer
Description:
Stop words are removed
Divides text at non-letter characters and removes all non-letter characters
Letter characters are converted to lowercase
No stemming applied
Best Used For:
When a simple, text-only analyzer is needed that also removes stop words. This should be used on fields that are intended to only have values made up of alphabetic characters.
Example:
Stop word: with
Original Product Name: Men's Long Mesh Short With Pockets
Product name with Stop Analyzer implemented: men s long mesh short pockets
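The product name example above can be reproduced with a short Python sketch. It models only the steps listed in the description, and the stop word list is an assumption made for illustration.

```python
import re

# Assumed stop word list for illustration; "with" is the stop word from the example.
STOP_WORDS = {"a", "an", "and", "of", "the", "with"}

def stop_analyze(text):
    """Divide text at non-letter characters, lowercase, and drop stop words."""
    tokens = re.findall(r"[A-Za-z]+", text)
    return [token.lower() for token in tokens if token.lower() not in STOP_WORDS]

name = "Men's Long Mesh Short With Pockets"
print(" ".join(stop_analyze(name)))  # men s long mesh short pockets
```

The apostrophe is a non-letter character, so "Men's" is divided into "men" and "s", matching the result shown above.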
White Space Analyzer
Description:
Search terms are divided at whitespace
No characters are removed
No characters are converted to lowercase
No stop words are removed
No stemming applied
Best Used For:
Searching by exactly what the user enters. This can be useful on a field that may be queried with terms that are both proper names and common nouns, such as polish vs. Polish, bill vs. Bill, or case vs. Case.
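A minimal Python sketch of whitespace-only tokenization shows why case and characters are preserved. The function name and the sample text are hypothetical.

```python
def whitespace_tokens(text):
    """Divide text at whitespace; no characters are removed or lowercased."""
    return text.split()

tokens = whitespace_tokens("Polish furniture polish #2")
print(tokens)              # ['Polish', 'furniture', 'polish', '#2']
print("Polish" in tokens)  # True: the capitalized form is its own term
print("POLISH" in tokens)  # False: case must match exactly
```

Because nothing is normalized, "Polish" and "polish" remain distinct terms, and a query only matches when it is entered exactly as the value was indexed.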
Language Analyzer
Description:
Handles stemming for various languages to return relevant and correct results out of the box
Handles searches with the special characters and accents found in various languages (e.g., German searches that include ä ö ü ß)
Supports 20 languages, for which the analyzer is available through the Hawksearch dashboard
Best Used For:
Displaying information on your website in languages other than English
Returning relevant results in various languages
If you currently do not use our Language Analyzer and your search relevancy is fine, no changes are needed. However, if you believe some cases could be improved or solved by applying the Language Analyzer, we recommend trying it out on Dev and testing the results. If you start using the Language Analyzer, we also strongly recommend performing a search tuning exercise, as this analyzer directly impacts relevancy. Please reach out to our Support or Client Success teams if you have any questions.