Standard Data Feed Requirements for Datafeed-based indexing
This section applies to Hawksearch v2.0L-v4.0L. If you are using Hawksearch v4.0 without data feeds, please reference the documentation under https://bridgeline.atlassian.net/wiki/spaces/HSKB/pages/3462489656
Overview
The Hawksearch service gives online retailers and publishers the ability to drive a rich, compelling user experience that guides visitors to the products and information they are seeking. One key step of the integration is providing the standard data feed set that the Hawksearch solution expects.
Data Feed Requirements
The Hawksearch service powers the search, navigation, and merchandising aspects of a web-based application. The Hawksearch solution does not index your entire database; it requires only the data needed to drive an exceptional search experience. For example, a typical online retailer would need their product catalog indexed. Data values included in such a feed might be product title, description, price, and manufacturer name.
Typically, the following files are included in a set of feeds sent to Hawksearch:
items.txt
attributes.txt
content.txt
hierarchy.txt
timestamp.txt
Occasionally, additional feeds are required. If you need an additional feed beyond those described in this document, please discuss it with your Hawksearch representative. Some common additional feeds are:
3rd party rating feed
Custom sorting feed (for facet values)
Custom pricing feed
Sample Data Feeds
Download this zip file of sample data files.
Hawksearch Standard Data Feed Format
Hawksearch has a Standard Data Feed Format. This format is meant to be comprehensive, flexible, and easy for all clients to create. It is designed so that any number of attributes can be added for items without significantly increasing the file size. If you require additional columns or would like to remove a required column, please consult with your Hawksearch representative before making these changes.
File Format Requirements
Item Data Feed
File Name: items.txt
The Items Feed is applicable to e-commerce sites. The file consists of records that describe each product. Each record is represented by a unique ID and other values that support this unique record. The additional columns contained in this data feed file have a one-to-one relationship with the unique ID.
The unique IDs cannot be duplicated in the file. A unique ID will never have two product titles or two retail prices.
If your site contains configurable products, with a single parent product and multiple child products associated with it, you can specify these as separate line items in the item data feed. Any information that is common to both the parent and child (for example, the description) can be repeated in the rows for both the parent and child items. To specify the relationship between the parent and a child item, set the group_id column of the child's line item to the unique_id of the parent item.
Please reference the items.txt sample file provided with the data feed guidelines for an example. In that sample, item ABC123 is a parent, and the item with SKU ABC12345 is a child that references the parent's ID in its group_id column to specify the relationship. The group_id is used to roll up items.
The creation of this data feed file may involve table joins on the client's data layer, but Hawksearch expects one file. For each row, values that don't exist (e.g. sale price) can be left blank. If additional data values are required, they should be added to the attributes.txt file. items.txt is the filename to use for the Item Data Feed. The column headers must match the column names listed below, and the column names must be lowercase.
This is a screenshot example of items.txt.
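The items.txt layout described above can be sketched in Python. This is a minimal, hedged example: unique_id and group_id come from this guide, while the other column names (name, price_retail) are illustrative stand-ins for your actual required columns.

```python
import csv

# Minimal sketch of generating items.txt: a tab-delimited, UTF-8 file with
# lowercase column headers and Unix line endings. unique_id and group_id are
# from this guide; the remaining column names are illustrative placeholders.
rows = [
    # A parent item and one child that points back to it via group_id.
    {"unique_id": "ABC123",   "name": "Classic Tee",       "price_retail": "19.99", "group_id": ""},
    {"unique_id": "ABC12345", "name": "Classic Tee - Red", "price_retail": "19.99", "group_id": "ABC123"},
]

columns = ["unique_id", "name", "price_retail", "group_id"]

with open("items.txt", "w", encoding="utf-8", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=columns, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Values that don't apply to an item (e.g. group_id on a parent) stay blank.
        writer.writerow(row)
```

Note that values the item lacks are left blank rather than omitted, so every row has the same column count.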
Content Data Feed
File Name: content.txt
How-to articles and non-product content can be indexed as well. As with the Item Data Feed, each record is represented by a unique ID and other values that support this unique record. The additional columns contained in this data feed file have a one-to-one relationship with the unique ID. The unique_id must be unique across both the items.txt file and the content.txt file. The unique ID should also not change, as this would affect tracking.
The creation of this data feed file may consist of table joins on the client’s data layer, but Hawksearch expects one file. For each row, values that don’t exist (e.g. image) can be left blank.
Also, the column headers must match the column names listed below. The column names are case-sensitive.
This is a screenshot example of the content.txt file:
Attributes Data Feed
File Name: attributes.txt
The Attributes Data Feed file consists of records that relate to unique IDs. There may be multiple records related to a unique ID. Each record consists of a unique ID, an attribute key name, and an attribute value.
For example, ten rows can exist in the Attributes Data Feed that relate to one unique ID. These ten rows describe that the unique ID is in five different product categories, has three different colors, is for a woman, and is a clearance item.
The creation of this data feed file may consist of table joins on the client’s data layer. Hawksearch will be expecting one file, attributes.txt, to include all related attributes to the unique ID. To add additional attributes in the future, additional records would be added to attributes.txt.
This is a screenshot example of attributes.txt.
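The one-to-many shape described above (one unique ID, many attribute rows) can be sketched as a simple flattening step. The column headers in this example (unique_id, attribute, value) are assumed for illustration; match them to the headers your Hawksearch representative specifies.

```python
import csv

# Sketch of flattening multi-valued attributes into attributes.txt rows.
# Each row is (unique ID, attribute key, attribute value); one unique_id may
# appear many times. The column headers here are illustrative assumptions.
item_attributes = {
    "ABC123": {
        "category": ["Tops", "Sale"],
        "color": ["Red", "Blue", "Green"],
        "gender": ["Women"],
    },
}

with open("attributes.txt", "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f, delimiter="\t", lineterminator="\n")
    writer.writerow(["unique_id", "attribute", "value"])
    for unique_id, attrs in item_attributes.items():
        for key, values in attrs.items():
            for value in values:
                # One row per (id, key, value) pair.
                writer.writerow([unique_id, key, value])
```

Adding a new attribute later means appending more rows, not adding columns, which is why this feed scales without widening the file.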
Hierarchy Data Feed
File Name: hierarchy.txt
Hawksearch supports multi-level hierarchies. For rapid deployment, we require the hierarchy.txt file to represent any hierarchical attributes that your data supports. Usually this is a category hierarchy, but other hierarchical attributes can also be defined. It is a straightforward way to represent a hierarchy in a flat file format and can support multiple levels. Unique IDs map to these hierarchies through the Attributes Data Feed (attributes.txt).

As with all data feeds, any customization to this feed will involve the Hawksearch Professional Services Team. Multi-dimensional hierarchies can be accommodated with customizations; an example is a category that has two parent_hierarchy_id values to map. If your data requires this, please discuss it with your Hawksearch representative, as it may require additional scoping for your project.
This is a screenshot example of hierarchy.txt. The example shows two properties that have a hierarchy structure; these would be used to create two nested facets. If you have only one hierarchy property to define, the attribute at the top (e.g. Category) will always have a parent_hierarchy_id of 0.
Example: What is Parent Hierarchy Id?
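As an illustration of how parent_hierarchy_id links flat rows into a tree, here is a small sketch. The column meanings (hierarchy_id, hierarchy_name, parent_hierarchy_id, with 0 marking a top-level node) follow this guide; the sample category names are invented.

```python
# Sketch of how parent_hierarchy_id links rows in hierarchy.txt into a tree.
# A parent_hierarchy_id of 0 marks a top-level node. Sample data is invented.
rows = [
    # (hierarchy_id, hierarchy_name, parent_hierarchy_id)
    ("1", "Category", "0"),
    ("2", "Apparel",  "1"),
    ("3", "Shirts",   "2"),
]

names = {hid: name for hid, name, _ in rows}
parents = {hid: pid for hid, _, pid in rows}

def full_path(hierarchy_id):
    """Walk parent_hierarchy_id links up to the root (id 0)."""
    parts = []
    while hierarchy_id != "0":
        parts.append(names[hierarchy_id])
        hierarchy_id = parents[hierarchy_id]
    return " > ".join(reversed(parts))

print(full_path("3"))  # Category > Apparel > Shirts
```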
Timestamp / Control File
File Name: timestamp.txt
To ensure your data feeds are ready for processing, we recommend adding a timestamp/control file to the set of files that is generated by your site process. This ensures that proper checks will be in place to validate the files before they are indexed.
The timestamp file contains the following details:
The time that your process finished generating the data feeds in UTC
Whether the dataset is a full or partial feed
The name of the file followed by the count of records for each of the files
This is a screenshot example of timestamp.txt.
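The three details listed above can be sketched as a small generator. The exact line layout here (time line, full/partial line, then one name-and-count line per file) is an assumption for illustration; confirm the expected format with your Hawksearch representative.

```python
from datetime import datetime, timezone

def count_records(path):
    """Number of data rows in a feed file (total non-empty lines minus the header)."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip()) - 1

def write_timestamp(feed_files, out_path="timestamp.txt", full=True):
    # Assumed layout: UTC completion time, full/partial flag, then
    # one "<filename><TAB><record count>" line per feed file.
    with open(out_path, "w", encoding="utf-8", newline="") as f:
        f.write(datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S") + "\n")
        f.write(("full" if full else "partial") + "\n")
        for name in feed_files:
            f.write(f"{name}\t{count_records(name)}\n")

# Tiny demonstration feed so the sketch runs on its own.
with open("items.txt", "w", encoding="utf-8") as f:
    f.write("unique_id\tname\nABC123\tClassic Tee\n")

write_timestamp(["items.txt"])
```

Writing this file last, after all feeds are complete, is what gives the downstream process a reliable "ready" signal.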
Partial Updates
Rebuilding Index
Hawksearch's search index can be rebuilt manually or through the API. To rebuild from the dashboard, log in to your Hawksearch dashboard engine and click the Rebuild Index button under the Workbench header.
The REST API can be used to automate or trigger an on-demand index rebuild process in Hawksearch. You will need an API key to use for authenticating with the API. Please contact your Hawksearch Representative to request an API Key.
The API is the preferred method for triggering the indexing process; however, if you need this to be a scheduled process on the Hawksearch end, please discuss it with your Hawksearch Representative.
URL for the REST API to rebuild index: https://bridgeline.atlassian.net/wiki/spaces/HSKB/pages/3462491116
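A request to trigger the rebuild might be shaped as below. This is a hedged sketch only: the endpoint path, engine URL, header name, and payload are placeholders, not the documented API; take the real values and your API key from the Hawksearch documentation linked above.

```python
import json
import urllib.request

# Hedged sketch of triggering an index rebuild over REST. The engine URL,
# endpoint path, and API-key header name below are placeholders; substitute
# the real values from the Hawksearch REST API documentation.
ENGINE_URL = "https://example-engine.hawksearch.net"  # placeholder
API_KEY = "YOUR-API-KEY"  # provided by your Hawksearch representative

req = urllib.request.Request(
    ENGINE_URL + "/api/index",  # placeholder path
    data=json.dumps({}).encode("utf-8"),
    headers={"X-HawkSearch-ApiKey": API_KEY, "Content-Type": "application/json"},
    method="POST",
)

# Uncomment to actually send the request:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode("utf-8"))
```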
Feed Delivery Method
Custom Sort Feed (Optional)
This section is to support the Custom sort option for facet values that can be selected in the Workbench.
Facet Value Sorting
When the built-in sorting options do not meet a client's requirements, a custom sort order can be defined. To implement a custom sort, a Custom Sort Feed must be used.
Some example facets that might use this:
Size
S, M, L, XL
0.5, 1⁄2, 2/3, 0.75
1”, 6”, 1’
28 Regular, 28 Long, 30 Regular, 30 Long
14.5x32/33, 14.5x34/35, 15x34/35
Days of the week
Sun, Mon, Tue, Wed, Thu, Fri, Sat
Seasons/Holidays
Spring, Summer, Fall, Winter
New Year, Mardi Gras, Easter, Mother’s Day, Graduation, Father’s Day
Anniversaries/Birthdays
First, Second, Third, Fourth
Tenth, Twentieth, Thirtieth
Color/Color Families
Red, Orange, Yellow, Green, Blue, Purple, Brown, Black
Light, Medium, Dark
Silver, Bronze, Black, Gold, Crystal
Hawksearch Custom Sort Feed Format
This format is designed to make it easy to add sort values. If certain columns do not apply to your data, or if you need to add additional columns to existing files, those changes can be incorporated; however, please consult with your Hawksearch representative before making them.
File Format Requirements
To support a successful data import, we require the following format requirements:
Requirement | Specification |
---|---|
Encoding | UTF-8 |
Column Delimiter | Tab |
Column Headers | Required; Case Sensitive |
Row Delimiter | Unix Format (\n as a row delimiter) |
File Name | Lowercase (e.g. custom_facet_sort.txt) |
Data Quality | The data in the file should follow strict CSV standards. For the standard CSV format, please reference the link below: |
Custom Facet Value Sort Feed
The Custom Sort Feed file consists of records that contain a facet_name, a facet_value, and a sort order value. The facet_name is the field name, as defined in the Hawksearch field section of the Workbench, that is used to build a facet whose values need to be sorted in a way that cannot be accomplished with the built-in Hawksearch functionality. There will be multiple records related to a facet_name.
For example, if a facet can have 15 possible values in the data, there will be 15 rows in the Custom Sort Feed that relate to the one field name that supplies the data to the facet. All 15 rows will have the same facet_name value, but different values in the facet_value and sort columns.
Custom Facet Sort Feed Columns
Column Name | Data Description | Required |
---|---|---|
facet_name | Name of the field that populates the facet (as defined in the Field listing page in the Hawksearch Workbench), all in lowercase and no spaces or symbols. | Yes |
facet_value | Facet Value | Yes |
sort | Sort order of value (e.g. 5, 10, 15, 20 etc) | Yes |
Please note that:
Make sure all possible field values are included for any field that is in the file. If a field value is not included, it will appear at the top of the list of displayed values in the facet.
Make sure all sort values are unique within each field.
It is recommended that you space your sort values by 5 or more to allow for future additions without rework.
This is a screenshot example of custom_facet_sort.txt:
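A generator for this feed can be sketched as follows, spacing sort values by 5 as recommended above so future values can be slotted in without renumbering. The facet_name, facet_value, and sort columns follow the table above; the sample facet data is invented.

```python
import csv

# Sketch of generating custom_facet_sort.txt. Column names follow the table
# in this guide; the "size" facet values are invented sample data. Sort
# values are spaced by 5 to leave room for future additions.
facet_values = {"size": ["S", "M", "L", "XL"]}

with open("custom_facet_sort.txt", "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f, delimiter="\t", lineterminator="\n")
    writer.writerow(["facet_name", "facet_value", "sort"])
    for facet_name, values in facet_values.items():
        for i, value in enumerate(values):
            writer.writerow([facet_name, value, (i + 1) * 5])
```

Every possible value of the field is listed, since any value missing from the file would surface at the top of the facet.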
Add Record to Timestamp/Control File
FILE NAME: timestamp.txt
To ensure your data feeds are ready for processing, we recommend adding a timestamp/control file to the set of files generated by your site process. If the Custom Sort Feed is part of your feeds, a line count should be added for it. The system will confirm that the row counts provided in the timestamp file match the counts in the actual files. If the counts match, the system proceeds as normal; if they do not match, the system will throw an error. This eliminates indexing files that were partially downloaded or corrupted during download.
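The count check described above can be sketched on the producer side as a self-test before upload. The timestamp layout assumed here (time line, full/partial line, then tab-separated name-and-count lines) is an illustrative assumption based on this guide.

```python
# Sketch of the count check described above: compare each record count
# listed in timestamp.txt against the actual feed file and fail fast on
# any mismatch. The assumed layout is: time line, full/partial line, then
# "<filename><TAB><count>" lines.

# Tiny sample data so the sketch runs on its own.
with open("items.txt", "w", encoding="utf-8") as f:
    f.write("unique_id\tname\nABC123\tClassic Tee\n")
with open("timestamp.txt", "w", encoding="utf-8") as f:
    f.write("2024-01-01 00:00:00\nfull\nitems.txt\t1\n")

def validate_counts(timestamp_path="timestamp.txt"):
    with open(timestamp_path, encoding="utf-8") as f:
        lines = [line.rstrip("\n") for line in f if line.strip()]
    for entry in lines[2:]:  # skip the time and full/partial lines
        name, expected = entry.split("\t")
        with open(name, encoding="utf-8") as feed:
            actual = sum(1 for line in feed if line.strip()) - 1  # minus header
        if actual != int(expected):
            raise ValueError(f"{name}: expected {expected} records, found {actual}")

validate_counts()  # raises on mismatch; silent on success
```

Running the same check locally before delivery catches truncated or corrupted files before Hawksearch rejects the set.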
Other Questions?
If you have any 3rd party feeds that you would like integrated into Hawksearch, please contact your Hawksearch Representative.
For questions about the data feeds that Hawksearch can accept, please contact your Hawksearch Representative.