Analyzers


Analyzers

The Analyzer feature becomes available after “Query this Field” is enabled for a field. Our recommended default analyzer is the Hawksearch Analyzer, but there may be situations where the configuration needs to be changed.

Examples

This section outlines examples of why a business user would override the default indexing configuration, the “Hawksearch Analyzer”.

Special Characters

In this example, one of the data fields contains special characters. Users may search with or without the special characters, and the business team would like products to be found in either scenario. Because the default configuration removes all special characters, the business user will need to index the data field in two different ways to accommodate this requirement.

  • This requires the business user to add the field twice:

    • Once with the default configuration, the “Hawksearch Analyzer” (no analyzer is selected).

    • Once with the “White Space Analyzer”.

Note: the two fields must be created with different names. Use a naming convention that helps identify how each field's data is configured, for example: item_number and item_number_special_characters.
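The following Python sketch illustrates how one value ends up stored differently in the two fields. It is a simplified approximation, not Hawksearch's implementation: the default behavior is modeled only as “split on special characters and lowercase” (the real Hawksearch Analyzer also stems, removes stop words, and applies synonyms), and the helper names and FIELD_ANALYZERS mapping are hypothetical.

```python
import re

# Assumption: the default behavior is approximated as "split on any run of
# non-alphanumeric characters, then lowercase". The real Hawksearch Analyzer
# also stems, removes stop words, and applies synonyms.
def default_like_tokens(text: str) -> list[str]:
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


# White Space Analyzer: split on whitespace only; nothing is removed or lowercased.
def whitespace_tokens(text: str) -> list[str]:
    return text.split()


# Hypothetical mapping of field name -> analyzer, following the naming convention above.
FIELD_ANALYZERS = {
    "item_number": default_like_tokens,
    "item_number_special_characters": whitespace_tokens,
}


def index_twice(value: str) -> dict[str, list[str]]:
    # The same source value is stored under two differently named fields.
    return {field: analyze(value) for field, analyze in FIELD_ANALYZERS.items()}


def matching_fields(doc: dict[str, list[str]], query: str) -> list[str]:
    # Each field analyzes the query with its own analyzer before comparing terms.
    hits = []
    for field, analyze in FIELD_ANALYZERS.items():
        terms = analyze(query)
        if terms and set(terms) <= set(doc[field]):
            hits.append(field)
    return hits


doc = index_twice("AB-100/2")
print(doc)
print(matching_fields(doc, "AB-100/2"))
print(matching_fields(doc, "AB 100 2"))
```

Under these assumptions, the item_number_special_characters field keeps the original “AB-100/2” token intact, while item_number holds the stripped, lowercased parts, so a search typed without the special characters (“AB 100 2”) still finds the product through item_number.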

Content Data

The website may have a separate landing page for articles, and the article records may include a body data field. The data in this field may need less manipulation than other data fields, so it may be preferable to make only limited changes to it. If the only adjustments required are dividing text at non-letter characters and lowercasing the words, then the Simple Analyzer should be used instead of the default.


Types of Analyzers

This section outlines the available analyzers. Besides the default, the analyzer options below are available for the business user to select. Selecting one of them overrides the default indexing configuration, the “Hawksearch Analyzer”. Each analyzer is explained below.

Hawksearch Analyzer

This is the default analyzer, applied when a field is set to be queried. The Hawksearch Analyzer has the same properties as the Snowball Analyzer but also takes synonyms into account. It is the only analyzer that uses the synonyms configured in the Hawksearch workbench.
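As a rough illustration of synonym handling, the following Python sketch expands each query term into its synonym group before matching. The synonym groups and function names are hypothetical stand-ins for synonyms configured in the Hawksearch workbench, and stemming is omitted to keep the example short.

```python
import re

# Hypothetical synonym groups standing in for synonyms configured in the
# Hawksearch workbench; the real storage format is not shown here.
SYNONYM_GROUPS = [
    {"sofa", "couch", "settee"},
    {"tv", "television"},
]


def base_tokens(text: str) -> list[str]:
    # Simplified analysis: lowercase and keep alphanumeric runs (no stemming).
    return re.findall(r"[a-z0-9]+", text.lower())


def expand_synonyms(tokens: list[str]) -> list[list[str]]:
    # Each query position becomes the sorted list of acceptable alternatives.
    expanded = []
    for tok in tokens:
        group = next((g for g in SYNONYM_GROUPS if tok in g), {tok})
        expanded.append(sorted(group))
    return expanded


def matches(query: str, document: str) -> bool:
    # A document matches if every query position has at least one alternative in it.
    doc_terms = set(base_tokens(document))
    return all(
        any(alt in doc_terms for alt in alts)
        for alts in expand_synonyms(base_tokens(query))
    )


print(expand_synonyms(base_tokens("Leather Couch")))
print(matches("leather couch", "Brown Leather Sofa"))
print(matches("leather couch", "Leather Chair"))
```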

Snowball Analyzer  

The Snowball Analyzer applies stemming, which converts a word into its stem. For example, if the word “climbing” is entered, the analyzer converts it to “climb” and uses that stem to search any field with the Snowball Analyzer set on it, returning items that contain climber, climbing, and climb. However, stemming can lose information about the original form of your text. For example, the terms universe, university, and universal all stem to the same root, “univers”, and would all return the same results, which is likely not the desired result.
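The following Python sketch shows how stemming groups related words under one stem, including the over-stemming problem described above. It uses a deliberately crude, hypothetical suffix-stripping rule rather than the actual Snowball algorithm, so it reproduces the examples in this section but will not match Snowball's output for other words.

```python
from collections import defaultdict

# Hypothetical, deliberately tiny suffix list checked in order. The real
# Snowball (Porter2) English stemmer uses many more rules and conditions.
SUFFIXES = ("ity", "ing", "er", "al", "s", "e")
MIN_STEM = 3  # never shorten a word below this many characters


def crude_stem(word: str) -> str:
    word = word.lower()
    for suffix in SUFFIXES:
        # Strip the first matching suffix, keeping at least MIN_STEM characters.
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word


def group_by_stem(words: list[str]) -> dict[str, list[str]]:
    # Words sharing a stem would all match a query for any one of them.
    groups: dict[str, list[str]] = defaultdict(list)
    for w in words:
        groups[crude_stem(w)].append(w)
    return dict(groups)


print(group_by_stem(["climbing", "climber", "climb",
                     "universe", "university", "universal"]))
```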

Description:

  • Stemming is applied

  • Stop words are removed

  • Colons, #, %, $, parentheses, and slashes are removed

  • Underscores, hyphens, @, and & symbols are removed unless they are part of words or numbers

  • Apostrophes are removed if they are (a) at the beginning of a word, (b) at the end of a word, or (c) followed by the letter s

  • Numbers are separated from text when they appear at the beginning of a word

  • Letter characters are converted to lowercase

Best Used For:

Fields whose content contains multiple forms of the same word (for example, climb, climbing, and climbs).

Standard Analyzer 


Description:

  • Separates text “smartly”, accounting for the following lexical types:

    • alphanumerics

    • acronyms

    • company names

    • email addresses

    • computer hostnames

    • numbers

    • words with an interior apostrophe

    • serial numbers

    • IP addresses

    • Chinese and Japanese characters

  • Stop words are removed

  • Letter characters are converted to lowercase

  • No stemming applied

Best Used For:

Searching English words, such as product names or short descriptions, as well as fields containing the types of values listed above.
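A simplified Python approximation of this behavior is sketched below. The regular expression recognizes only some of the listed types (email addresses, IPv4 addresses, hostnames, and words with interior apostrophes), and the stop-word list is an assumption modeled on a common English default, not Hawksearch's configured list.

```python
import re

# Assumed stop-word list, similar to the classic Lucene English default;
# Hawksearch's actual list may differ.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
}

# Alternatives are tried in order, so more specific shapes win over plain words.
TOKEN_PATTERN = re.compile(
    r"""
      [\w.+-]+@[\w-]+(?:\.[\w-]+)+        # email address
    | \d{1,3}(?:\.\d{1,3}){3}             # IPv4 address
    | [A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+   # hostname or other dotted token
    | \w+(?:'\w+)*                        # word, optionally with interior apostrophes
    """,
    re.VERBOSE,
)


def standard_analyze(text: str) -> list[str]:
    # Tokenize, lowercase, and drop stop words; no stemming is applied.
    tokens = (m.group(0).lower() for m in TOKEN_PATTERN.finditer(text))
    return [t for t in tokens if t not in STOP_WORDS]


print(standard_analyze("Email sales@example.com about the O'Brien's server at 192.168.0.1"))
print(standard_analyze("Visit www.example.com"))
```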

Simple Analyzer


Description:

  • Separates text at non-letter characters and removes all non-letter characters

  • Letter characters are converted to lowercase

  • No stop words are removed

  • No stemming applied

Best Used For:

Fields that only have alphabetical characters and don’t need the advanced interpretation of the Standard Analyzer. For example, consider a field that stores famous one-line quotes that will be queried. If a user searches “to be, or not to be”, removing the standard stop words would leave nothing to search on. Additionally, if stemming were applied to this field, the results would be less relevant than they would be without stemming. In a case like this, the Simple Analyzer is a good choice.
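The quote example can be checked with a short Python sketch of the Simple Analyzer's behavior. The stop-word set is an illustrative assumption used only to show why stop-word removal would empty this query.

```python
import re


def simple_analyze(text: str) -> list[str]:
    # [^\W\d_] matches letters only (Unicode-aware): text is split at everything
    # else, non-letters are dropped, and tokens are lowercased. No stop words, no stemming.
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]


# Illustrative subset of a typical English stop-word list (assumption).
SAMPLE_STOP_WORDS = {"to", "be", "or", "not"}

quote = "To be, or not to be"
tokens = simple_analyze(quote)
after_stop_words = [t for t in tokens if t not in SAMPLE_STOP_WORDS]
print(tokens)
print(after_stop_words)
```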

Stop Analyzer


Description:

  • Stop words are removed

  • Divides text at non-letter characters and removes all non-letter characters

  • Letter characters are converted to lowercase

  • No stemming applied

Best Used For:

When a simple, text-only analyzer is needed that also removes stop words.  This should be used on fields that are intended to only have values made up of alphabetic characters.

Example:

Stop word: with

Original Product Name: Men's Long Mesh Short With Pockets

Product name with Stop Analyzer implemented: men s long mesh short pockets
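The example above can be reproduced with a minimal Python sketch, assuming “with” is the only stop word (as in the example). Note how the apostrophe in “Men's” splits the word, leaving the standalone token “s”.

```python
import re

# Assumption: the single stop word from the example; a real list is longer.
STOP_WORDS = {"with"}


def stop_analyze(text: str, stop_words: set[str] = STOP_WORDS) -> list[str]:
    # Split at non-letter characters, lowercase, then drop stop words. No stemming.
    tokens = [t.lower() for t in re.findall(r"[^\W\d_]+", text)]
    return [t for t in tokens if t not in stop_words]


print(" ".join(stop_analyze("Men's Long Mesh Short With Pockets")))
```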

White Space Analyzer


Description:

  • Text is divided at whitespace

  • No characters are removed

  • No characters are converted to lowercase

  • No stop words are removed

  • No stemming applied

Best Used For:

Searching by exactly what the user enters. This could be useful on a field that may be queried with terms that are both proper names and common nouns, such as polish vs. Polish, bill vs. Bill, and case vs. Case.
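A minimal Python sketch of whitespace-only matching illustrates the case sensitivity. The function names are hypothetical, and the matching rule (every query term must appear verbatim in the field) is a simplifying assumption.

```python
def whitespace_analyze(text: str) -> list[str]:
    # Split on runs of whitespace only; case and punctuation are preserved.
    return text.split()


def exact_term_match(query: str, field_value: str) -> bool:
    # Simplifying assumption: every query term must appear verbatim in the field.
    terms = whitespace_analyze(query)
    field_terms = set(whitespace_analyze(field_value))
    return bool(terms) and all(t in field_terms for t in terms)


print(exact_term_match("Polish", "Polish Sausage"))
print(exact_term_match("polish", "Polish Sausage"))
print(exact_term_match("polish", "Shoe polish"))
```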

Language Analyzer

Description:

  • Handles stemming for various languages to return relevant and correct results out of the box

  • Handles searches with the special characters and accents that various languages use (e.g., German searches that include ä, ö, ü, ß)

  • Supports 20 languages; the analyzer is available through the Hawksearch dashboard

Best Used For:

  • Websites that display information in languages other than English

  • Returning relevant results in various languages

  • If you do not currently use the Language Analyzer and your search relevancy is satisfactory, no changes are needed. However, if you believe some cases could be improved or solved by applying the Language Analyzer, we recommend trying it on Dev and testing the results. If you start using the Language Analyzer, we strongly recommend performing a search tuning exercise, as this analyzer directly impacts relevancy. Please reach out to our Support or Client Success teams if you have any questions.
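To illustrate the kind of character handling described above, the following Python sketch folds German umlauts and ß so that “Straße” and “STRASSE” produce the same token. This is an assumption about one common approach, not a description of how the Hawksearch Language Analyzer works internally, and it omits language-specific stemming.

```python
# Assumed folding table: umlauts map to their base vowel and ß to "ss".
GERMAN_FOLD = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})


def fold_german(token: str) -> str:
    # Lowercase first so "Ä", "Ö", "Ü" are folded as well.
    return token.lower().translate(GERMAN_FOLD)


print(fold_german("Straße"), fold_german("STRASSE"), fold_german("Müller"))
```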