1 Types of Analyzers
- 1.1 Hawk Analyzer
- 1.2 Snowball Analyzer
- 1.3 Standard Analyzer
- 1.4 Simple Analyzer
- 1.5 Stop Analyzer
- 1.6 White Space Analyzer

Facet configuration can impact index size and payload size, which in turn affects indexing time and engine performance. The following guidelines are standard best practices; your configuration may vary depending on your data, business requirements, and use cases.

Types of Analyzers

This section outlines the available analyzers. In addition to the default Hawk Analyzer, there are five analyzer options that a business user can select. Selecting one of them overrides the default indexing configuration, the Hawk Analyzer. Each analyzer is explained below.

Hawk Analyzer

The Hawk Analyzer has the same properties as the Snowball Analyzer (see below) but also takes into account the synonyms configured in the Hawksearch workbench. It is the only analyzer that takes synonyms into account. This is the default analyzer applied when a field is set to be queried.

Snowball Analyzer

The Snowball Analyzer applies stemming, a step that converts a word into its stem. For example, if the word “climbing” is entered, the analyzer converts it to “climb” and uses that stem to search any field with the Snowball Analyzer set on it. The search would return items that contain climber, climbing, and climb. Stemming can lose information about the original form of your text. For example, the terms universe, university, and universal all stem to the same root, “univers”, and would all return the same results. This is likely not the desired result.
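The over-stemming effect described above can be illustrated with a toy suffix stripper. This is a minimal sketch and not the Snowball algorithm itself: the `toy_stem` function and its suffix list are hypothetical, chosen only so that the example words from the text behave as described.

```python
# Toy suffix-stripping stemmer (illustrative only; NOT the real Snowball algorithm).
# The suffix list is an assumption chosen to reproduce the examples in the text.
SUFFIXES = ("ity", "ing", "er", "al", "e", "s")

def toy_stem(word: str, min_stem: int = 3) -> str:
    """Lowercase the word and strip the longest matching suffix, if any."""
    word = word.lower()
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        # Only strip when enough of the word remains to be a usable stem.
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

climb_forms = {toy_stem(w) for w in ("climb", "climber", "climbing")}
univ_forms = {toy_stem(w) for w in ("universe", "university", "universal")}
print(climb_forms)  # related forms collapse to one stem, which helps recall
print(univ_forms)   # unrelated meanings also collapse to one stem
```

Because all three "univers" terms share one stem, a stemmed field cannot tell them apart, which is the trade-off to weigh before choosing a stemming analyzer.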

Description:

Stemming is applied
Stop words are removed
Colons, #, %, $, parentheses, and slashes are removed
Underscores, hyphens, @, and & symbols are removed unless they are part of words or numbers
Apostrophes are removed when they are (a) at the beginning of a word, (b) at the end of a word, or (c) followed by the letter s
Numbers are separated from text when they are at the beginning of a word
Letter characters are converted to lowercase

Best Used For:

Fields that have content consisting of multiple versions of a word.

Standard Analyzer

The Standard Analyzer accounts for the following:

Description:

Separates text “smartly”, accounting for the following lexical types:
- Alphanumerics
- Acronyms
- Company names
- Email addresses
- Computer hostnames
- Numbers
- Words with an interior apostrophe
- Serial numbers
- IP addresses
- Chinese and Japanese characters
Stop words are removed
Letter characters are converted to lowercase
No stemming applied

Best Used For:

Fields that contain English words, such as units of measure, as well as fields that contain the value types listed above.

Simple Analyzer

The Simple Analyzer accounts for the following:

Description:

Separates text at non-letter characters and removes all non-letter characters
Letter characters are converted to lowercase
No stop words are removed
No stemming applied

Best Used For:

Fields that only have alphabetical characters and don’t need the advanced interpretation of the Standard Analyzer. For example, consider a field that stores famous one-line quotes that will be queried. If a user searches “to be, or not to be”, removing the standard stop words would leave nothing to search on. Additionally, if stemming were applied to this field, the results would be less relevant than they would be without stemming. In a case like this, the Simple Analyzer is a good choice.
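The quote example can be sketched in a few lines. This is a rough approximation of the behavior listed above, not the engine's implementation: `simple_analyze` is a hypothetical helper, it only treats ASCII letters as letters, and the stop-word set is an assumed subset used for illustration.

```python
import re

# Assumed stop-word subset for illustration; the engine's actual list is not given here.
STOP_WORDS = {"to", "be", "or", "not"}

def simple_analyze(text: str) -> list[str]:
    """Split at non-letter characters and lowercase; nothing else is removed."""
    return [t.lower() for t in re.split(r"[^A-Za-z]+", text) if t]

query = "To be, or not to be"
simple_tokens = simple_analyze(query)
# What would remain if stop words were also removed from the same tokens.
after_stop_removal = [t for t in simple_tokens if t not in STOP_WORDS]

print(simple_tokens)
print(after_stop_removal)
```

With the Simple Analyzer behavior every word of the quote stays searchable, while removing stop words from the same tokens leaves an empty list.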

Stop Analyzer

The Stop Analyzer accounts for the following:

Description:

Stop words are removed
Divides text at non-letter characters and removes all non-letter characters
Letter characters are converted to lowercase
No stemming applied

Best Used For:

When a simple, text-only analyzer is needed that also removes stop words. This should be used on fields that are intended to only have values made up of alphabetic characters.

Example:

Stop word: with

Original Product Name: Men's Long Mesh Short With Pockets

Product name with Stop Analyzer implemented: men s long mesh short pockets
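The product-name example above can be reproduced with a short sketch. It assumes, as the example does, that "with" is the only stop word in play; `stop_analyze` is a hypothetical helper that approximates the listed behavior with ASCII letters only and is not the engine's implementation.

```python
import re

# Assumption: only "with" is treated as a stop word, matching the example above.
STOP_WORDS = {"with"}

def stop_analyze(text: str) -> list[str]:
    """Split at non-letter characters, lowercase, then drop stop words."""
    tokens = [t.lower() for t in re.split(r"[^A-Za-z]+", text) if t]
    return [t for t in tokens if t not in STOP_WORDS]

product = stop_analyze("Men's Long Mesh Short With Pockets")
print(" ".join(product))
```

The apostrophe is a non-letter character, so "Men's" is divided into "men" and "s", which is why a lone "s" appears in the analyzed name.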

White Space Analyzer

The White Space Analyzer accounts for the following:

Description:

Search terms divided at whitespace
No characters are removed
No characters are converted to lowercase
No stop words are removed
No stemming applied

Best Used For:

Searching by exactly what the user enters. This could be useful on a field that may be queried with terms that are both proper names and common nouns, such as polish vs. Polish, bill vs. Bill, and case vs. Case.
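The case-sensitive behavior can be sketched as follows. This is a minimal approximation under the rules listed above; `whitespace_analyze` and the sample sentence are hypothetical and only illustrate the idea.

```python
def whitespace_analyze(text: str) -> list[str]:
    """Divide at whitespace only; case and punctuation are left untouched."""
    return text.split()

indexed = whitespace_analyze("Polish the silver, Bill!")
print(indexed)

# Matching is exact, so the lowercase common noun does not match the capitalized token.
matches_lower = "polish" in indexed
matches_proper = "Polish" in indexed
print(matches_lower, matches_proper)
```

Note that punctuation stays attached to its token ("silver," and "Bill!"), so this analyzer suits fields where users are expected to type values exactly as stored.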
