...

Overview

The Hawksearch service gives online retailers and publishers the ability to drive a rich, compelling user experience. This experience guides visitors to the products and information that they are seeking. One main step of integration is providing the standard set of data feeds that the Hawksearch solution expects.

Data Feed Requirements

The Hawksearch service powers the search, navigation, and merchandising aspects of a web-based application. The Hawksearch solution does not index your entire database; it requires only the data needed to drive an exceptional shopping experience. For example, a typical online retailer would need their product catalog indexed. Some data values that would be included in a data feed are product title, description, price, and manufacturer name.

Formatting Options

Hawksearch understands that clients already have existing data feeds that are being sent to third-party systems such as marketplaces. Hawksearch is designed to be agnostic: we can handle all types of data feed formats.
Here are a few data feed formats that we have already integrated into Hawksearch:

[Images: logos of supported data feed formats]

… and more!


...

Hawksearch Standard Data Feed Format

Along with handling third-party data feed formats, Hawksearch has a Standard Data Feed Format. This format is meant to be comprehensive, flexible, and easy for all clients to create.
The standard data format is designed so that any number of attributes can be added for the items without increasing the file size too much. If certain columns do not apply to your data, or you need to add columns to existing files, those changes can be incorporated. However, please consult your Hawksearch representative before making these changes.

...

Flat-File Format Properties

Encoding: UTF-8
Column Delimiter: Tab, Comma
Column Headers: Required; Case Sensitive
Row Delimiter: Unix Format (\n as a row delimiter)
File Name: Lowercase (e.g. items.txt, attributes.txt, hierarchy.txt)
Data Quality: The data in the file should follow strict CSV standards. For the standard CSV format, please reference the link below:

http://www.ietf.org/rfc/rfc4180.txt
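
The properties above can be sketched with Python's standard csv module. This is only an illustration, not an official Hawksearch tool: the function names write_feed and save_feed are hypothetical, and a tab delimiter is assumed (comma is also listed as accepted).

```python
import csv
import io

def write_feed(stream, header, rows):
    """Write a header and rows as tab-delimited text with Unix LF row endings."""
    writer = csv.writer(
        stream,
        delimiter="\t",            # assumed choice; the spec also allows comma
        lineterminator="\n",       # Unix row delimiter
        quoting=csv.QUOTE_MINIMAL, # RFC 4180 style quoting only where needed
    )
    writer.writerow(header)
    writer.writerows(rows)

def save_feed(path, header, rows):
    """Save a feed file as UTF-8 without platform newline translation."""
    # newline="" stops Python from turning "\n" into "\r\n" on Windows.
    with open(path, "w", encoding="utf-8", newline="") as f:
        write_feed(f, header, rows)
```

Calling save_feed("items.txt", header, rows) would then produce a lowercase-named, UTF-8 file whose first line is the column header row.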

Main Data Feed

FILE NAME: items.txt
The Main Data Feed file consists of records that describe each product or content. Each record is represented by a unique ID and other values that support this unique record. The additional columns contained in this data feed file have a one-to-one relationship with the unique ID. 
For example, a data feed file contains product catalog records. The unique ID is the product SKU and supporting values with one-to-one relationships are product title, product description, retail price, and sale price. A unique ID will never have two product titles or two retail prices. 
If your site contains configurable products, where a single parent product has multiple child products associated with it, you can specify these as separate line items in the main data feed. For any information that is common to both the parent and child (for example, the description), repeat that information on both the parent and child line items. To specify the relationship between the parent and child items, put the ID of the parent item in the group_id column of the child's line item.
Please reference the items.txt sample file that was provided with the data feed guidelines for an example. In that sample, item ABC123 is a parent, and the item with SKU ABC12345 is a child that references the ID of the parent in its group_id column.
The creation of this data feed file may involve table joins on the client's data layer, but Hawksearch expects one file. For each row, values that don't exist (e.g. sale price) can be left blank. If additional data values are required, columns can be added. Contact Hawksearch Professional Services for any custom modifications. Use items.txt as the filename for the Main Data Feed. The column headers must match the column names listed below; the column names are case-sensitive.
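
To illustrate the parent/child relationship, here is a minimal sketch of two line items and a consistency check. The IDs come from the sample described above; the description text, the helper name check_group_ids, and the choice to set the parent's group_id to its own ID are assumptions, so confirm the expected parent value against the provided sample file.

```python
def check_group_ids(rows):
    """Return unique_ids whose group_id does not match any unique_id in the feed."""
    known = {row["unique_id"] for row in rows}
    return [
        row["unique_id"]
        for row in rows
        if row.get("group_id") and row["group_id"] not in known
    ]

rows = [
    # Parent item; group_id pointing at itself is an assumption.
    {"unique_id": "ABC123", "group_id": "ABC123", "description_short": "Cotton tee"},
    # Child item; group_id holds the parent's ID, and the shared description is repeated.
    {"unique_id": "ABC12345", "group_id": "ABC123", "description_short": "Cotton tee"},
]

orphans = check_group_ids(rows)  # empty when every group_id resolves
```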

Pre-Configured Data Columns

| Column Name | Data Description | Required |
|---|---|---|
| unique_id | Unique Alphanumeric ID. Must not be duplicated. | Yes |
| Name | Item Title | Yes |
| url_detail | URL of Record Detail Page | Yes |
| Image | URL of thumbnail image | Yes |
| price_retail | Floating Point Value - Retail Price | Yes |
| price_sale | Floating Point Value - Sale Price | Yes |
| price_special | Floating Point Value - Special Price if offered | |
| group_id | Rollup Key. If used, this field must be filled in for all items. | |
| description_short | Searchable Short Description | |
| description_long | Long Description | |
| Sku | Alphanumeric Manufacturer SKU / Model# | |
| sort_default | Integer rank calculated on the site, used as the default sort for items if available. NOTE: For search requests, this is used as a secondary sort for all items that have the same score. The score is calculated from the keyword the user entered and the searchable data associated with the item. | |
| sort_rating | Floating Point Value of Avg Rating | |
| is_free_shipping | Binary Value if Free Shipping (0 or 1) | |
| is_new | Binary Value if New (0 or 1) | |
| is_on_sale | Binary Value if On Sale (0 or 1) | |
| keyword_tags | List of Searchable Keywords Separated by Commas (e.g. horror,suspense,drama) | |
| metric_days_added | Integer Value of Days Record Added | |
| metric_inventory | Integer Value of Total Inventory | |
| metric_pct_bounceback | Floating Point Value of Bounce Back % | |
| metric_pct_conversion | Floating Point Value of Conversion Rate | |
| metric_pct_on_sale | Floating Point Value of On Sale Percentage | |
| metric_profit_dollars | Floating Point Value of Profit Dollars | |
| metric_profit_margin | Floating Point Value of Profit Margin | |
| metric_sales_velocity | Floating Point Value of Sales Velocity Score | |
| metric_total_details_views | Integer Value of Total Detail Page Views | |
| metric_total_units_sold | Integer Value of Total Units Sold | |
| item_operation | Must be filled in on partial files when using partial updates. "D" indicates that the item should be deleted. "A" indicates that the item was added or updated. For full files, you can leave this column empty since it will be ignored. | |



Please note that:

  • Not all columns need to exist in items.txt
  • Make sure all required column names are present
  • If you wish to add columns to the file, please discuss this with the Hawksearch Professional Services Team.
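
Before delivering items.txt, a quick pre-flight check along these lines can catch missing required columns and duplicate IDs. This is a sketch, not part of Hawksearch: the function name validate_items is hypothetical, a tab delimiter is assumed, and the required column names are copied with the casing shown in the table above, which should be confirmed against the provided sample file.

```python
import csv
import io
from collections import Counter

# Columns marked "Yes" in the table above, with the casing given there.
REQUIRED = ["unique_id", "Name", "url_detail", "Image", "price_retail", "price_sale"]

def validate_items(stream, delimiter="\t"):
    """Return a list of problems found in an items feed (empty list if none)."""
    reader = csv.DictReader(stream, delimiter=delimiter)
    header = reader.fieldnames or []
    # Header matching is case-sensitive, so "name" does not satisfy "Name".
    problems = [f"missing column: {col}" for col in REQUIRED if col not in header]
    if "unique_id" in header:
        counts = Counter(row["unique_id"] for row in reader)
        problems += [
            f"duplicate unique_id: {uid}" for uid, n in counts.items() if n > 1
        ]
    return problems
```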

...

FILE NAME: attributes.txt 
The Attributes Data Feed file consists of records that relate to unique IDs. There may be multiple records related to a unique ID. Each record consists of a unique ID, an attribute key name, and an attribute value.
For example, ten rows can exist in the Attributes Data Feed that relate to one unique ID. These ten rows describe that the unique ID is in five different product categories, has three different colors, is for a woman, and is a clearance item. 
The creation of this data feed file may involve table joins on the client's data layer. Hawksearch expects one file, attributes.txt, that includes all attributes related to each unique ID. To add attributes in the future, add further records to attributes.txt.

Pre-Configured Data Columns

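| Column Name | Data Description | Required |
|---|---|---|
| unique_id | Unique Alphanumeric ID of the item the attribute belongs to. The same ID may appear on multiple rows, one per attribute value. | Yes |
| Key | The name of the attribute | Yes |
| Value | The value of the attribute | Yes |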


Please Note:

  • Attribute Keys should be consistent and properly cased for display (e.g. Color)
  • Attribute Values should be standardized across each Attribute Key (e.g. Blue vs Navy or Indigo) and cased as they should appear in the filters. For example, if the color value is Blue, use that exact casing for every item with that color, matching what you want to appear in the facet values.
  • Category ID values should consist of the lowest level categories. Hawksearch will understand the higher level categories associated with an item.
  • If both parents and children are specified in the items file, please link all child attributes to the child unique_id and the parent attributes to the parent unique_id
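
As a rough sketch of how multi-valued attributes flatten into attributes.txt rows, the helper below emits one row per attribute value. The function name attribute_rows and the sample values are illustrative assumptions; the column order follows the table above.

```python
def attribute_rows(unique_id, attributes):
    """Yield one (unique_id, key, value) row per attribute value."""
    for key, values in attributes.items():
        for value in values:
            yield (unique_id, key, value)

# One item with two colors and one gender becomes three rows.
rows = list(
    attribute_rows("ABC123", {"Color": ["Blue", "Red"], "Gender": ["Women"]})
)
```

Keeping keys and values in a single lookup like this makes it easier to hold casing consistent (for example, always "Blue", never "blue") across all items.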

...

FILE NAME: hierarchy.txt 
Hawksearch's Agile Navigation supports multi-level hierarchies. For rapid deployment, we require the hierarchy.txt file to represent the category hierarchy. It is a straightforward way to represent a multi-level hierarchy in a flat file format. Unique IDs map to these hierarchies through the Attributes Data Feed (attributes.txt). As with all data feeds, any customization to this feed will involve the Hawksearch Professional Services Team.

Pre-Configured Data Columns

| Column Name | Data Description | Required |
|---|---|---|
| category_id | Unique Alphanumeric ID. No duplicate values are accepted. | Yes |
| category_name | The name of the Category | Yes |
| parent_category_id | The Category ID of the Parent Category. For the top level Categories, use 0 (zero) as the parent_category_id | Yes |
| is_active | Use if you wish to send all categories to Hawksearch, including disabled categories | |
| sort_order | The sort order value that should be used when displaying this category in the filter, if it is available | No |
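
To show how the flat parent_category_id column encodes a multi-level hierarchy, the sketch below walks from a category up to the top level, where the parent is 0. The function name category_path and the sample categories are hypothetical.

```python
def category_path(category_id, categories):
    """Return category names from the top level down to category_id.

    categories maps category_id -> (category_name, parent_category_id).
    """
    path = []
    seen = set()
    current = category_id
    while current != "0":  # "0" marks the parent of a top-level category
        if current in seen or current not in categories:
            raise ValueError(f"broken hierarchy at {current!r}")
        seen.add(current)
        name, parent = categories[current]
        path.append(name)
        current = parent
    return list(reversed(path))

categories = {
    "10": ("Clothing", "0"),
    "11": ("Shirts", "10"),
    "12": ("T-Shirts", "11"),
}
path = category_path("12", categories)
```

A walk like this also doubles as a sanity check: it raises if a parent ID is missing from the file or if the parents form a loop.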


This is a screenshot example of hierarchy.txt

...

FILE NAME: content.txt 
How-to articles and non-product content can be indexed as well. Similar to the Main Data Feed, each record is represented by a unique ID and other values that support this unique record. The additional columns contained in this data feed file have a one-to-one relationship with the unique ID. 
The creation of this data feed file may involve table joins on the client's data layer, but Hawksearch expects one file. For each row, values that don't exist can be left blank. If additional data values are required based on Business Requirements, columns can be added. Contact Hawksearch Professional Services for any custom modifications.
Also, the column headers must match the column names listed below. The column names are case-sensitive.

| Column Name | Data Description | Required |
|---|---|---|
| unique_id | Unique Alphanumeric ID. Must not be duplicated. | Yes |
| Name | Item Title | Yes |
| url_detail | URL of Record Detail Page | Yes |
| Image | URL of thumbnail image | |
| description_short | Item Description - Since this is non-product content, include the full text of the item. Strip out all tab, carriage return, and line feed characters. | |
| content_operation | Must be filled in on partial files when using partial updates. "D" indicates that the item should be deleted. "A" indicates that the item was added or updated. For full files, you can leave this column empty since it will be ignored. | Yes for partial updates |
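
The description_short cleanup can be sketched as below. The function name clean_description is hypothetical, and replacing each run of tabs, carriage returns, and line feeds with a single space (rather than deleting the characters outright) is an assumption made so that adjacent words do not run together.

```python
import re

def clean_description(text):
    """Replace runs of tab, CR, and LF characters with a single space."""
    return re.sub(r"[\t\r\n]+", " ", text).strip()

cleaned = clean_description("Step 1\r\n\tMix the paint\n")
```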


Please note that:
Attributes can be associated with these items if the unique_id values match. 
This is a screenshot example of the content.txt file:

If content is not in a structured format, contact the Hawksearch Professional Services Team to discuss other approaches for crawling your content sources. One approach is to use our Content Aggregation tool, Blosm, to transform unstructured data into structured content.

Timestamp/Control File
FILE NAME: timestamp.txt 
To ensure your data feeds are ready for processing, we recommend adding a timestamp/control file to the set of files that is generated by your site process. This ensures that proper checks will be in place to validate the files before they are indexed.
The timestamp file contains the following details:

  • The time that your process finished generating the data feeds in UTC
  • Whether the dataset is a full or partial feed
  • The name of the file followed by the count of lines for each of the files

Every time the Hawksearch process runs to index the new data, we use the data provided in the timestamp file to run a series of checks.
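
The three details listed above could be assembled as follows. The exact layout of timestamp.txt is not specified in this section, so the layout here (an ISO 8601 UTC time on the first line, a tab-separated dataset line, then one tab-separated name and count per file) is purely an assumption for illustration, as is the function name build_timestamp_file. Please confirm the real layout with your Hawksearch representative.

```python
from datetime import datetime, timezone

def build_timestamp_file(finished_at, dataset, line_counts):
    """Return control-file text: UTC time, feed type, then per-file line counts.

    The line layout is an assumed format, not a documented one.
    """
    if dataset not in ("full", "partial"):
        raise ValueError("dataset must be 'full' or 'partial'")
    lines = [
        finished_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        f"dataset\t{dataset}",
    ]
    lines += [f"{name}\t{count}" for name, count in line_counts.items()]
    return "\n".join(lines) + "\n"

text = build_timestamp_file(
    datetime(2015, 1, 5, 12, 0, 0, tzinfo=timezone.utc),
    "full",
    {"items.txt": 3, "attributes.txt": 10},
)
```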

...

The timestamp file is downloaded from the web-accessible URL or the FTP server. This enables the Hawksearch process to verify that the site is generating files at the agreed-upon frequency.

...

The first check consists of checking the date and time of the files provided. For example, if the Hawksearch process expects the site to generate files every 3 hours and the timestamp file is from 6 hours prior, Hawksearch will not download or reprocess the files. Instead, the process raises an error to notify the team that the files are out of date.
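
The date and time check can be pictured with a small sketch. This is not Hawksearch's actual implementation: the function name feed_is_stale is hypothetical, and comparing the feed's age directly against the agreed interval, with no extra tolerance, is an assumption.

```python
from datetime import datetime, timedelta, timezone

def feed_is_stale(generated_at, now, expected_interval):
    """Return True when the feed is older than the agreed generation interval."""
    return now - generated_at > expected_interval

now = datetime(2015, 1, 5, 12, 0, tzinfo=timezone.utc)
every_three_hours = timedelta(hours=3)

# A timestamp from 6 hours ago is out of date for a 3-hour schedule.
stale = feed_is_stale(now - timedelta(hours=6), now, every_three_hours)
# A timestamp from 2 hours ago is still within the schedule.
fresh = not feed_is_stale(now - timedelta(hours=2), now, every_three_hours)
```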

...

The indexing process on the Hawksearch end will be scheduled to run in accordance with the feed generation schedule on the site end.
There is also a REST API available at the URL below. It can be used to rebuild the index for the site once feed generation has completed. You will need an API key to authenticate with the API. Please contact your Hawksearch Representative to request an API key.

...

http://api.hawksearch.info/api/v3 
Please reference the Index method for additional details.

Feed Delivery Method
Hawksearch can accept the feeds using the delivery methods below.

...

Files can be hosted at an FTP location. You can host the files on your own FTP server and provide credentials to the Hawksearch team to access the account. Hawksearch supports both the FTP and SFTP protocols and can download files from either type of server. If you do not have your own FTP server and need an account, ask your Hawksearch Representative to provide one.

...

Files can also be hosted in a directory on your server. Please provide the Hawksearch Team with the path to the files so that they can be downloaded from the server. If the path is protected, please also provide the Hawksearch Team with credentials for it.

...

For questions about the data feeds that Hawksearch can accept, please contact: sales@hawksearch.com