Standard Data Feed Requirements for Datafeed-based indexing

This section applies to Hawksearch v2.0L-v4.0L. If you are using Hawksearch v4.0 without data feeds, please reference the documentation under https://bridgeline.atlassian.net/wiki/spaces/HSKB/pages/3462489656

Overview

The Hawksearch service enables online retailers and publishers to deliver a rich, compelling user experience that drives visitors to the products and information they are seeking. One key step of integration is providing the standard data feed set that the Hawksearch solution expects.


Data Feed Requirements

The Hawksearch service powers the search, navigation, and merchandising aspects of a web-based application. The Hawksearch solution does not index your entire database, but it does require the data necessary to drive an exceptional search experience. For example, a typical online retailer would need their product catalog indexed. Some data values that would be included in a data feed are product title, description, price, and manufacturer name.

 Typically, the following files are included in a set of feeds sent to Hawksearch:

  • items.txt

  • attributes.txt

  • content.txt

  • hierarchy.txt

  • timestamp.txt

 Occasionally, additional feeds are required.  If you need an additional feed beyond those described in this document, please discuss it with your Hawksearch representative.  Some of the common additional feeds are:

  • 3rd party rating feed

  • Custom sorting feed (for facet values)

  • Custom pricing feed


Sample Data Feeds

Download this zip file of sample data files.


Hawksearch Standard Data Feed Format

Hawksearch has a Standard Data Feed Format. This format is meant to be comprehensive, flexible, and easy for all clients to create.  It is designed to let you add any number of attributes to your items without greatly increasing the file size. If you require additional columns or would like to remove a required column, please consult with your Hawksearch representative before making these changes.


File Format Requirements

Encoding: UTF-8

Column Delimiter: Tab, Semicolon, or Comma

Column Headers: Required; must be lowercase

Row Delimiter: Unix format (\n as the row delimiter)

File Name: Lowercase and named as items.txt, content.txt, attributes.txt, hierarchy.txt, or timestamp.txt

Data Quality

The data in the file should follow strict CSV standards. For the standard CSV format, please reference the link below:

http://www.ietf.org/rfc/rfc4180.txt

In any case where a field value contains a double quote, the entire field value must be enclosed in double quotes AND each double quote within the value must be escaped by a preceding double quote.  This indicates that the interior double quote is not the end of the data value.

 Example 1

The value for a field is: Special rate "1.79"

The value of the field would be: "Special rate ""1.79"""

 Example 2

The value for a field is: Blue Rug – 36" x 48"

The value of the field would be: "Blue Rug – 36"" x 48"""

In any case where a field value contains a line return or a carriage return, the entire field value needs to be enclosed in double quotes.  Without this, the import process will interpret the carriage return as the beginning of a new item.
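For reference, here is a minimal sketch of this quoting behavior when generating a feed with Python's standard csv module. The filename and columns are illustrative; the quoted values mirror the two examples above.

    import csv

    # Illustrative rows; the second fields contain interior double quotes.
    rows = [
        ["SKU1", 'Special rate "1.79"'],
        ["SKU2", 'Blue Rug - 36" x 48"'],
    ]

    with open("items.txt", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(
            f,
            delimiter="\t",             # tab column delimiter
            quotechar='"',
            quoting=csv.QUOTE_MINIMAL,  # quote only fields that need it
            lineterminator="\n",        # Unix row delimiter
        )
        writer.writerow(["unique_id", "name"])  # lowercase headers
        writer.writerows(rows)

    # The writer doubles interior quotes and wraps the whole field,
    # producing e.g.:  "Blue Rug - 36"" x 48"""

The csv module applies the RFC 4180 quoting rules automatically, which is generally safer than escaping quotes by hand.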


Item Data Feed

File Name: items.txt

The Items Feed is applicable to e-commerce sites.  The file consists of records that describe each product. Each record is represented by a unique ID and other values that support this unique record. The additional columns contained in this data feed file have a one-to-one relationship with the unique ID.

The unique IDs cannot be duplicated in the file.  A unique ID will never have two product titles or two retail prices.

If your site contains configurable products, with a single parent product and multiple child products associated with it, you can specify these as separate line items in the item data feed. Any information that is common to both the parent and child (for example, the description) can be repeated in the rows for both the parent and the child items. To specify the relationship between the parent and child items, specify the unique_id of the parent item in the group_id column of each child's line item.

Please reference the items.txt sample file that was provided with the data feed guidelines for an example. In that sample, item ABC123 is a parent, and the item with sku ABC12345 is a child that references the id of the parent in the group_id column to specify the relationship. The group_id will be used to roll up items.
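For illustration, a minimal sketch of such a parent/child pair (the file is tab-delimited; columns are abbreviated and aligned with spaces here for readability, and the values are hypothetical):

    unique_id   name                 url_detail                     price_retail   price_sale   group_id
    ABC123      Classic Tee          https://example.com/p/abc123   19.99                       ABC123
    ABC12345    Classic Tee - Blue   https://example.com/p/abc123   19.99          14.99        ABC123

Here the parent row carries its own id in group_id, since that column, when used, must be filled in for all items.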

The creation of this data feed file may involve table joins on the client's data layer, but Hawksearch expects one file. For each row, values that don't exist (e.g. sale price) can be left blank. If additional data values are required, they should be added to the attributes.txt file. Use items.txt as the filename for the Item Data Feed. Also, the column headers must match the column names listed below, and the column names must be lowercase.

  • unique_id (Required): Unique alphanumeric ID. Must not be duplicated.

  • name (Required): Title of item

  • url_detail (Required): URL of Record Detail Page

  • image (Required): URL of Thumbnail Image

  • price_retail (Required): Floating Point Value – Retail Price

  • price_sale (Required): Floating Point Value – Sale Price

  • price_sort (Optional): Floating Point Value – Price to be used for sorting purposes

  • group_id (Optional): Rollup key. If used, this field must be filled in for all items.

  • description_short (Optional): Searchable Short Description

  • description_long (Optional): Long Description

  • sku (Optional): Alphanumeric Manufacturer SKU / Model number. Only needed if different than the unique_id value.

  • sort_default (Optional): Default sort, if available on the site end, for items based on an integer rank calculated on the site. NOTE: For search requests this is used as a secondary sort for all items that have the same score. The score is calculated based on the keyword the user entered and the searchable data associated with the item.

  • item_operation (Optional): When using partial updates, this column will need to be filled out on partial files. "D" indicates that the item is to be deleted, "A" indicates an item to be added, and "U" indicates an update to an item. For full files, you can leave this column empty since it will be ignored.

Please note that:

  • The columns not marked as required do not need to exist in items.txt

  • Make sure all required column names are present

  • If you wish to add columns to the file, please discuss with the Hawksearch Professional Services Team

This is a screenshot example of items.txt.


Content Data Feed

File Name: content.txt

How-to articles and other non-product content can be indexed as well. Similar to the Item Data Feed, each record is represented by a unique ID and other values that support this unique record. The additional columns contained in this data feed file have a one-to-one relationship with the unique ID.  Across both the items.txt file and the content.txt file, the unique_id needs to be unique.  The unique_id should also not change, as this would affect tracking.

The creation of this data feed file may consist of table joins on the client’s data layer, but Hawksearch expects one file. For each row, values that don’t exist (e.g. image) can be left blank.

Also, the column headers must match the column names listed below. The column names are case-sensitive and must be lowercase.

  • unique_id (Required): Unique alphanumeric ID. Must not be duplicated.

  • name (Required): Title

  • url_detail (Required): URL of Record Detail Page

  • image (Optional): URL of Thumbnail Image

  • description_short (Optional): Description. Since this is non-product content, include the full text of the content. Strip out all tab, carriage return, and line feed characters.

  • content_operation (Required for partial updates): When using partial updates, this column will need to be filled out on partial files. "D" indicates that the content is to be deleted, "A" indicates content to be added, and "U" indicates an update to a piece of content. For full files, you can leave this column empty since it will be ignored.

Please note that:

  • Attributes can be associated with content if the unique_id values match.

  • Attributes can be associated using the Attributes File (as long as unique_id values aren’t duplicated), or by creating a new Content Attributes File.  Please consult with your Hawksearch representative to determine the best option for your implementation.

  • When adding a category hierarchy that is specific to content, it must be added to the Hierarchy feed with a new root.  (That root will have a parent_hierarchy_id of 0.)

This is a screenshot example of the content.txt file:


Attributes Data Feed

File Name: attributes.txt

The Attributes Data Feed file consists of records that relate to unique IDs. There may be multiple records related to a unique ID. Each record consists of a unique ID, an attribute key name, and an attribute value.

For example, ten rows can exist in the Attributes Data Feed that relate to one unique ID. These ten rows describe that the unique ID is in five different product categories, has three different colors, is for a woman, and is a clearance item.

The creation of this data feed file may involve table joins on the client's data layer. Hawksearch expects one file, attributes.txt, to include all attributes related to the unique ID. To add additional attributes in the future, additional records would be added to attributes.txt.

This is a screenshot example of attributes.txt.
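For illustration, a minimal sketch of attributes.txt (tab-delimited; columns aligned with spaces here for readability). The header names unique_id, key, and value are an assumption based on the record structure described above; confirm the exact headers against the provided sample files.

    unique_id   key        value
    ABC123      category   Shirts
    ABC123      category   Clearance
    ABC123      color      Blue
    ABC123      color      Green
    ABC123      gender     Women

Note how repeated rows with the same unique_id express multi-valued attributes such as multiple colors or categories.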


Hierarchy Data Feed

File Name: hierarchy.txt

Hawksearch supports multi-level hierarchies. For rapid deployment, we require the hierarchy.txt file to represent any hierarchical attributes that your data supports.  Usually this is a category hierarchy, but other hierarchical attributes can also be defined. It is a straightforward way to represent the hierarchy in a flat file format and can support multi-level hierarchies. Unique IDs map to these hierarchies through the Attributes Data Feed (attributes.txt).

As with all data feeds, any customization to this feed will involve the Hawksearch Professional Services Team.  Multi-dimensional hierarchies can be accommodated with customizations; an example of this is a category that has two parent_hierarchy_id values to map.  If your data requires this, please discuss it with your Hawksearch representative, as it may require additional scoping for your project.

This is a screenshot example of hierarchy.txt.  This example shows two properties that have a hierarchy structure; these would be used to create two nested facets.  If you have only one hierarchy property to define, the attribute at the top (i.e. Category) will always have a parent_hierarchy_id of 0.

Example: What is Parent Hierarchy Id?
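For illustration, a minimal sketch of a single category hierarchy (tab-delimited; columns aligned with spaces here for readability). The header names hierarchy_id and hierarchy_name are an assumption inferred from the parent_hierarchy_id references above; confirm them against the provided sample files. The root row has a parent_hierarchy_id of 0.

    hierarchy_id   hierarchy_name   parent_hierarchy_id
    1              Category         0
    2              Apparel          1
    3              Shirts           2

Items would then map to these nodes through attributes.txt, by referencing the hierarchy values there.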


Timestamp / Control File

File Name: timestamp.txt

To ensure your data feeds are ready for processing, we recommend adding a timestamp/control file to the set of files that is generated by your site process. This ensures that proper checks will be in place to validate the files before they are indexed.

The timestamp file contains the following details: 

  • The time that your process finished generating the data feeds in UTC

  • Whether the dataset is a full or partial feed

  • The name of the file followed by the count of records for each of the files

This is a screenshot example of timestamp.txt.
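As an illustration only, here is one plausible layout covering the three details above (the exact format should be confirmed with your Hawksearch representative or taken from the provided sample files):

    2024-05-01T06:30:00Z
    full
    items.txt        1250
    attributes.txt   8700
    content.txt      140
    hierarchy.txt    85

The first line is the UTC completion time, the second flags a full or partial feed, and each remaining line pairs a file name with its record count.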


Partial Updates

 


Rebuilding Index

Hawksearch's search index can be rebuilt manually or through the API. To rebuild from the dashboard, log in to your Hawksearch dashboard engine and click the Rebuild Index button under the Workbench header.

The REST API can be used to automate or trigger an on-demand index rebuild process in Hawksearch.  You will need an API key to authenticate with the API; please contact your Hawksearch representative to request one.

The API is the preferred method for triggering the indexing process. However, if you need this to be a scheduled process on the Hawksearch end, please discuss it with your Hawksearch representative.

URL for the REST API to rebuild index: https://bridgeline.atlassian.net/wiki/spaces/HSKB/pages/3462491116
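As a rough sketch of what an automated trigger might look like (the endpoint URL below is a placeholder, not the real address, and the X-HawkSearch-ApiKey header name is an assumption; take both from the API documentation linked above):

    import requests

    API_KEY = "your-api-key"  # issued by your Hawksearch representative

    # Hypothetical endpoint: substitute the real rebuild-index URL from
    # the API documentation linked above.
    ENDPOINT = "https://example-hawksearch-api/index/rebuild"

    response = requests.post(ENDPOINT, headers={"X-HawkSearch-ApiKey": API_KEY})
    response.raise_for_status()  # fail loudly if the trigger was rejected
    print("Index rebuild triggered:", response.status_code)

A scheduler (cron, CI job, or your site's feed-generation process) can run this after the feed files and timestamp.txt have been delivered.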


Feed Delivery Method


Custom Sort Feed (Optional)

This section supports the Custom sort option for facet values that can be selected in the Workbench.

Facet Value Sorting

When none of the built-in sorting options meets a client's requirement, a custom sort order can be defined. To implement a custom sort, a Custom Sort Feed must be used.

Some example facets that might use this:

  • Size

    • S, M, L, XL

    • 0.5, 1⁄2, 2/3, 0.75

    • 1”, 6”, 1’

    • 28 Regular, 28 Long, 30 Regular, 30 Long

    • 14.5x32/33, 14.5x34/35, 15x34/35

  • Days of the week

    • Sun, Mon, Tue, Wed, Thu, Fri, Sat

  • Seasons/Holidays

    • Spring, Summer, Fall, Winter

    • New Year, Mardi Gras, Easter, Mother’s Day, Graduation, Father’s Day

  • Anniversaries/Birthdays

    • First, Second, Third, Fourth

    • Tenth, Twentieth, Thirtieth

  • Color/Color Families

    • Red, Orange, Yellow, Green, Blue, Purple, Brown, Black

    • Light, Medium, Dark

    • Silver, Bronze, Black, Gold, Crystal

Hawksearch Custom Sort Feed Format

This format is designed to make it easy to add sort values. If certain columns do not apply to your data, or you need to add additional columns to existing files, those changes can be incorporated. However, please consult with your Hawksearch representative before making these changes.

File Format Requirements

To support a successful data import, we require the following format requirements:

Encoding: UTF-8

Column Delimiter: Tab

Column Headers: Required; case-sensitive

Row Delimiter: Unix format (\n as the row delimiter)

File Name: Lowercase (e.g. custom_facet_sort.txt)

Data Quality

The data in the file should follow strict CSV standards. For the standard CSV format, please reference the link below:

http://www.ietf.org/rfc/rfc4180.txt

Custom Facet Value Sort Feed

The Custom Sort Feed file consists of records that each contain a facet_name, a facet_value, and a sort order value. The facet_name is the field name, as defined in the Hawksearch field section of the Workbench, that is used to build a facet that needs its values sorted in a way that cannot be accomplished using the built-in Hawksearch functionality. There will be multiple records related to each facet_name.

For example, if a facet can have 15 possible values in the data, there will be 15 rows in the Custom Sort Feed that relate to the one field name that supplies the data to the facet. All 15 rows will have the same facet_name value, but different values in the facet_value and sort columns.

Custom Facet Sort Feed Columns

  • facet_name (Required): Name of the field that populates the facet (as defined in the Field listing page in the Hawksearch Workbench), all in lowercase with no spaces or symbols.

  • facet_value (Required): Facet Value

  • sort (Required): Sort order of the value (e.g. 5, 10, 15, 20, etc.)

Please note that:

  • Make sure all possible field values are included for any field that is in the file. If a field value is not included, it will appear at the top of the list of displayed values in the facet.

  • Make sure all sort values are unique within each field.

  • It is recommended that you space your sort values by 5 or more to allow for future additions without rework.

This is a screenshot example of custom_facet_sort.txt:
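As an illustrative sketch (tab-delimited; columns aligned with spaces here for readability), following the Size example above with sort values spaced by 5 to leave room for future additions:

    facet_name   facet_value   sort
    size         S             5
    size         M             10
    size         L             15
    size         XL            20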

Add Record to Timestamp/Control File

File Name: timestamp.txt

To ensure your data feeds are ready for processing, we recommend adding a timestamp/control file to the set of files generated by your site process. If the Custom Sort Feed is part of your feeds, a line count should be added for it. The system will confirm that the row counts provided in the timestamp file match the counts in the actual files. If the counts match, the system will proceed as normal; if they do not match, the system will throw an error. This prevents indexing files that were partially downloaded or corrupted during download.


Other Questions?

If you have any 3rd party feeds that you would like integrated into Hawksearch, please contact your Hawksearch Representative.

For questions about the data feeds that Hawksearch can accept, please contact your Hawksearch Representative.