Data Feed Requirements
Table of Contents
Overview
Data Feed Requirements
Formatting Options
Hawk Search Standard Data Feed Format
File Format Requirements
Flat-File Format Properties
Main Data Feed
Pre-Configured Data Columns
Attributes Data Feed
Pre-Configured Data Columns
Category Hierarchy
Pre-Configured Data Columns
Content Data Feed
Timestamp/Control File
Download Timestamp File
Timestamp check
Partial or Full Feed
File Name Check
Line Count
Rebuilding Index
Staging Site:
Production Site:
Feed Delivery Method
FTP
Web Accessible URL:
Other Questions?

Overview

The Hawk Search service enables online retailers and publishers to deliver a rich, compelling user experience that guides visitors to the products and information they are seeking. One main step of the integration is providing the standard set of data feeds that the Hawk Search solution expects.

Data Feed Requirements

The Hawk Search service powers the search, navigation, and merchandising aspects of a web-based application. The Hawk Search solution does not index your entire database, but it does require the data needed to drive an exceptional shopping experience. For example, a typical online retailer would need its product catalog indexed. Data values that would be included in such a data feed are product title, description, price, and manufacturer name.

Formatting Options

Hawk Search understands that clients often have existing data feeds that are already being sent to third-party systems such as marketplaces. Hawk Search is designed to be format-agnostic: we can handle all types of data feed formats.
Here are a few data feed formats that we have already integrated into Hawk Search:

...

… and more!

Hawk Search Standard Data Feed Format

Along with handling third-party data feed formats, Hawk Search has a Standard Data Feed Format. This format is meant to be comprehensive, flexible, and easy for all clients to create.
The standard format is designed to make it easy to add any number of attributes to the items without increasing the file size too much. If certain columns do not apply to your data, or you need to add columns to the existing files, those changes can be incorporated. However, please consult with your Hawk Search representative before making these changes.

File Format Requirements

To support a successful data import, the files must meet the following format requirements:

Flat-File Format Properties

Property	Value
Encoding	UTF-8
Column Delimiter	Tab, Comma
Column Headers	Required; Case Sensitive
Row Delimiter	Unix Format (\n as a row delimiter)
File Name	Lowercase (e.g. items.txt, attributes.txt, hierarchy.txt)
Data Quality	The data in the file should follow strict CSV standards. For the standard CSV format, please reference http://www.ietf.org/rfc/rfc4180.txt
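
As an illustration of the properties above, the following is a minimal sketch of writing a feed with Python's standard csv module. The function name write_feed and the sample row are hypothetical and are not part of the Hawk Search specification; the sketch uses a tab delimiter, \n row endings, and RFC 4180 style quoting.

```python
import csv
import io


def write_feed(stream, header, rows):
    """Write a header and rows as tab-delimited text with \\n row endings.

    Fields containing the delimiter, a quote, or a line break are quoted,
    and embedded quotes are doubled, as described in RFC 4180.
    """
    writer = csv.writer(
        stream,
        delimiter="\t",
        lineterminator="\n",
        quoting=csv.QUOTE_MINIMAL,
    )
    writer.writerow(header)
    writer.writerows(rows)


# Example: build the feed in memory, then encode it as UTF-8.
# To write straight to disk, open the file with
# open("items.txt", "w", encoding="utf-8", newline="") and pass it as stream.
buffer = io.StringIO()
write_feed(buffer, ["unique_id", "Name"], [["ABC123", 'The 12" Ruler']])
feed_bytes = buffer.getvalue().encode("utf-8")
```

Passing newline="" when opening a file keeps Python from translating \n into the platform's line ending, so the rows stay in Unix format on any operating system.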

Main Data Feed

FILE NAME: items.txt
The Main Data Feed file consists of records that describe each product or content item. Each record is identified by a unique ID and carries other values that support that record. The additional columns in this data feed file have a one-to-one relationship with the unique ID.
For example, suppose a data feed file contains product catalog records. The unique ID is the product SKU, and the supporting values with one-to-one relationships are product title, product description, retail price, and sale price. A unique ID will never have two product titles or two retail prices.
If your site contains configurable products, where a single parent product has multiple child products associated with it, you can specify these as separate line items in the main data feed. For any information that is common to both the parent and child (for example, the description), you can repeat that information on both the parent and child line items. To specify the relationship between the parent and child items, put the ID of the parent item in the group_id column of the child's line item.
Please reference the items.txt sample file that was provided with the data feed guidelines for an example. In that sample, item ABC123 is a parent, and the item with SKU ABC12345 is a child that references the ID of the parent in its group_id column to specify the relationship.
The creation of this data feed file may require table joins in the client's data layer, but Hawk Search expects one file. For each row, values that don't exist (e.g. sale price) can be left blank. If additional data values are required, columns can be added. Contact Hawk Search Professional Services for any custom modifications. Use items.txt as the file name for the Main Data Feed. The column headers must match the column names listed below, and the column names are case-sensitive.
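
The parent and child relationship described above could be prepared along these lines. This is only a sketch: the function build_item_rows and the sample values are hypothetical, and setting the parent's own group_id to its own ID is an assumption based on the note that group_id, if used, must be filled in for all items.

```python
def build_item_rows(parent, children):
    """Return one row dict for the parent and one per child, linked by group_id."""
    parent_id = parent["unique_id"]
    # Assumption: the parent row carries its own ID as group_id so that the
    # column is filled in on every row.
    rows = [{**parent, "group_id": parent_id}]
    for child in children:
        # Each child points at the parent through group_id.
        row = {**child, "group_id": parent_id}
        # Repeat information shared with the parent (here, the long
        # description) on the child row when the child has none of its own.
        row.setdefault("description_long", parent.get("description_long", ""))
        rows.append(row)
    return rows


rows = build_item_rows(
    {"unique_id": "ABC123", "description_long": "Cotton tee"},
    [{"unique_id": "ABC12345"}],
)
```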

Pre-Configured Data Columns

Column Name	Data Description	Required
unique_id	Unique Alphanumeric ID. Must not be duplicated.	Yes
Name	Item Title	Yes
url_detail	URL of Record Detail Page	Yes
Image	URL of thumbnail image	Yes
price_retail	Floating Point Value - Retail Price	Yes
price_sale	Floating Point Value – Sale Price	Yes
price_special	Floating Point Value – Special Price if offered
group_id	Rollup Key. If used, this field must be filled in for all items.
description_short	Searchable Short Description
description_long	Long Description
Sku	Alphanumeric Manufacturer SKU / Model#
sort_default	Integer rank calculated on the site, used as the default sort for items if one is available. NOTE: For search requests, this is used as a secondary sort for all items that have the same score. The score is calculated from the keyword that the user entered and the searchable data associated with the item.
sort_rating	Floating Point Value of Avg Rating
is_free_shipping	Binary Value if Free Shipping (0 or 1)
is_new	Binary Value if New (0 or 1)
is_on_sale	Binary Value if On Sale (0 or 1)
keyword_tags	List of Searchable Keywords Separated by Commas (e.g. horror,suspense,drama)
metric_days_added	Integer Value of Days Record Added
metric_inventory	Integer Value of Total Inventory
metric_pct_bounceback	Floating Point Value of Bounce Back %
metric_pct_conversion	Floating Point Value of Conversion Rate
metric_pct_on_sale	Floating Point Value of On Sale Percentage
metric_profit_dollars	Floating Point Value of Profit Dollars
metric_profit_margin	Floating Point Value of Profit Margin
metric_sales_velocity	Floating Point Value of Sales Velocity Score
metric_total_details_views	Integer Value of Total Detail Page Views
metric_total_units_sold	Integer Value of Total Units Sold
item_operation	When using partial updates this column will need to be filled out on partial files. "D" indicates that the item will need to be deleted. "A" indicates item was added/updated. For full files you can leave this column empty since it will be ignored.
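
Because the column headers are case-sensitive, it can help to check a generated file before delivering it. The sketch below is one possible pre-delivery check, not part of the Hawk Search process; the function name is hypothetical, and the required column names are copied exactly as they appear in the table above.

```python
import csv
import io

# Columns marked "Yes" in the Required column of the table above.
REQUIRED_ITEM_COLUMNS = [
    "unique_id",
    "Name",
    "url_detail",
    "Image",
    "price_retail",
    "price_sale",
]


def missing_required_columns(feed_text, delimiter="\t"):
    """Return the required column names absent from the header row.

    The comparison is case-sensitive, so "Unique_ID" does not satisfy
    "unique_id".
    """
    reader = csv.reader(io.StringIO(feed_text), delimiter=delimiter)
    header = next(reader, [])
    return [name for name in REQUIRED_ITEM_COLUMNS if name not in header]


header_line = "Unique_ID\tName\turl_detail\tImage\tprice_retail\n"
problems = missing_required_columns(header_line)
```

In this example the header misspells the case of unique_id and omits price_sale, so both names are reported as missing.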

...

This is a screenshot example of items.txt.

Attributes Data Feed

FILE NAME: attributes.txt
The Attributes Data Feed file consists of records that relate to unique IDs. There may be multiple records related to a unique ID. Each record consists of a unique ID, an attribute key name, and an attribute value.
For example, ten rows can exist in the Attributes Data Feed that relate to one unique ID. These ten rows describe that the unique ID is in five different product categories, has three different colors, is for a woman, and is a clearance item.
The creation of this data feed file may consist of table joins on the client's data layer. Hawk Search will be expecting one file, attributes.txt, to include all related attributes to the unique ID. To add additional attributes in the future, additional records would be added to attributes.txt.

Pre-Configured Data Columns

Column Name	Data Description	Required
unique_id	Unique Alphanumeric ID of the item the attribute belongs to. The same ID may appear on multiple rows, one per attribute value.	Yes
Key	The name of the attribute	Yes
Value	The value of the attribute	Yes
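
To make the one-row-per-value layout concrete, here is a small sketch that expands one item's attributes into attributes.txt rows. The function name and the attribute key names (category, color, gender, clearance) are hypothetical; only the three-column layout comes from the table above.

```python
def attribute_rows(unique_id, attributes):
    """Expand an item's attributes into (unique_id, key, value) rows.

    A multi-valued attribute (list or tuple) produces one row per value.
    """
    rows = []
    for key, value in attributes.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        for single_value in values:
            rows.append((unique_id, key, str(single_value)))
    return rows


# Five categories, three colors, one gender, and a clearance flag
# produce ten rows for the same unique ID.
rows = attribute_rows(
    "ABC123",
    {
        "category": ["10", "11", "12", "13", "14"],
        "color": ["Red", "Blue", "Green"],
        "gender": "Women",
        "clearance": "Yes",
    },
)
```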

...

This is a screenshot example of attributes.txt.

Category Hierarchy

FILE NAME: hierarchy.txt
Hawk Search's Agile Navigation supports multi-level hierarchies. For rapid deployment, we require the hierarchy.txt file to represent the category hierarchy. It is a straightforward way to represent the hierarchy in a flat file format and can support multi-level hierarchies. Unique IDs would map to these hierarchies through the Attributes Data Feed (attributes.txt). As with all data feeds, any customization to this feed will involve the Hawk Search Professional Services Team.

Pre-Configured Data Columns

Column Name	Data Description	Required
category_id	Unique Alphanumeric ID. No duplicate values are accepted.	Yes
category_name	The name of the Category	Yes
parent_category_id	The Category ID of the Parent Category. For the top level Categories, use 0 (zero) as the parent_category_id	Yes
is_active	Indicates whether the category is active. Include this column if you wish to send all categories to Hawk Search, including disabled categories
sort_order	The sort order value that should be used while displaying this in the filter if it is available	No
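
A nested category tree can be flattened into these rows with a short recursive walk. The sketch below assumes a hypothetical in-memory tree of dicts with id, name, and children keys; only the use of 0 as the parent of top-level categories comes from the table above.

```python
def flatten_categories(tree, parent_id="0"):
    """Walk a nested category tree depth-first and return hierarchy rows.

    Each row is (category_id, category_name, parent_category_id).
    Top-level categories get "0" as their parent_category_id.
    """
    rows = []
    for node in tree:
        rows.append((node["id"], node["name"], parent_id))
        # Children are emitted right after their parent, pointing back at it.
        rows.extend(flatten_categories(node.get("children", []), node["id"]))
    return rows


category_tree = [
    {
        "id": "1",
        "name": "Women",
        "children": [
            {"id": "2", "name": "Shoes", "children": [{"id": "3", "name": "Boots"}]},
        ],
    },
    {"id": "4", "name": "Men"},
]
hierarchy_rows = flatten_categories(category_tree)
```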

This is a screenshot example of hierarchy.txt.

Content Data Feed

FILE NAME: content.txt
How-to articles and non-product content can be indexed as well. Similar to the Main Data Feed, each record is represented by a unique ID and other values that support this unique record. The additional columns contained in this data feed file have a one-to-one relationship with the unique ID.
The creation of this data feed file may consist of table joins on the client's data layer, but Hawk Search expects one file. For each row, values that don't exist (e.g. sale price) can be left blank. If additional data values are required based on Business Requirements, column names can be added. Contact the Hawk Search Professional Services for any custom modifications.
Also, the column headers must match the column names listed below. The column names are case-sensitive.

...

Every time the Hawk Search process runs to index the new data, we use the data provided in the timestamp file to run a series of checks.

Download Timestamp File

The timestamp file is downloaded from the web-accessible URL or the FTP server. This enables the Hawk Search process to verify that the site is generating files at the agreed-upon frequency.

Timestamp check

The first check verifies the date and time of the files provided. For example, if the Hawk Search process expects the site to generate a file every 3 hours and the timestamp file is from 6 hours prior, Hawk Search will not download or reprocess the files. The process will throw an error to notify the team that the files are out of date.
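
The logic of this check can be sketched as follows. This is not Hawk Search's implementation: the function and exception names are hypothetical, and treating any file older than the expected interval as stale is an assumption, since the exact tolerance is not stated here.

```python
from datetime import datetime, timedelta, timezone


class StaleFeedError(Exception):
    """Raised when the timestamp file is older than the expected interval."""


def check_feed_freshness(generated_at, expected_interval, now=None):
    """Return the age of the feed, or raise StaleFeedError if it is too old.

    Assumption: a feed counts as stale as soon as its age exceeds the
    agreed generation interval.
    """
    now = now or datetime.now(timezone.utc)
    age = now - generated_at
    if age > expected_interval:
        raise StaleFeedError(
            f"feed is {age} old; expected a new one every {expected_interval}"
        )
    return age


# A feed generated 2 hours ago passes a 3 hour schedule;
# one generated 6 hours ago would raise StaleFeedError.
reference_time = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
age = check_feed_freshness(
    reference_time - timedelta(hours=2),
    timedelta(hours=3),
    now=reference_time,
)
```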

Partial or Full Feed

The next check establishes whether the data files are full or partial feeds. Based on this information, we run the appropriate process.

File Name Check

A check will be done on the names of the files. The system will react differently based on the scenarios outlined below:

If one of the expected files is not provided, the system will throw an error.
If the name of a file is different from the name originally provided, the system will throw an error.
If an additional file is provided with a name that was not previously established, the file will be ignored.
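
The three scenarios above amount to a set comparison between the agreed file names and the delivered ones. The following sketch shows one way to express that; the function and exception names are hypothetical, and the comparison is case-sensitive on the assumption that file names are expected in lowercase exactly as agreed.

```python
class MissingFeedFileError(Exception):
    """Raised when an expected feed file was not delivered under its agreed name."""


def check_file_names(expected, delivered):
    """Return (files_to_process, ignored_files) or raise MissingFeedFileError.

    A renamed file shows up as one missing expected name plus one unknown
    extra name, so it triggers the same error as a missing file.
    """
    expected_set = set(expected)
    delivered_set = set(delivered)
    missing = sorted(expected_set - delivered_set)
    if missing:
        raise MissingFeedFileError(f"expected files not provided: {missing}")
    # Extra files that were never agreed on are skipped, not treated as errors.
    ignored = sorted(delivered_set - expected_set)
    return sorted(expected_set), ignored


to_process, ignored = check_file_names(
    ["items.txt", "attributes.txt"],
    ["items.txt", "attributes.txt", "notes.txt"],
)
```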

Line Count

The last check is on the line count of each file. The system confirms that the row counts provided in the timestamp file match the counts in the actual files. If the counts match, the system will proceed as normal. If the counts do not match, the system will throw an error. This prevents indexing files that were partially downloaded or were corrupted during download.
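
A comparable check can be run on the sending side before the files are published. The sketch below is an illustration under assumptions: the names are hypothetical, the expected counts are passed in as a plain dict rather than parsed from the timestamp file, and lines are counted physically, header included.

```python
class LineCountMismatchError(Exception):
    """Raised when a file's line count differs from the declared count."""


def count_lines(data):
    """Count rows delimited by \\n in a bytes object.

    A final row without a trailing newline still counts as a row.
    """
    count = data.count(b"\n")
    if data and not data.endswith(b"\n"):
        count += 1
    return count


def check_line_counts(expected_counts, files):
    """Compare declared line counts with actual ones; raise on any mismatch."""
    mismatches = {}
    for name, expected in expected_counts.items():
        actual = count_lines(files[name])
        if actual != expected:
            mismatches[name] = (expected, actual)
    if mismatches:
        raise LineCountMismatchError(f"(expected, actual) by file: {mismatches}")


feed_files = {"items.txt": b"unique_id\tName\nABC123\tShirt\n"}
check_line_counts({"items.txt": 2}, feed_files)  # passes silently
```

Note that a quoted field containing a line break spans more than one physical line, so physical line counts and record counts can differ for such files.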

Rebuilding Index

The indexing process on the Hawk Search end will be scheduled to run in accordance with the feed generation schedule on the site end.
A REST API is also available at the URLs below. It can be used to rebuild the index for the site once feed generation has completed. You will need an API key to authenticate with the API. Please contact your Hawk Search Representative to request an API key.

Staging Site:

http://staging.hawksearch.com/api/

Production Site:

http://api.hawksearch.info/api/v3
Please reference the Index method for additional details.

Feed Delivery Method
Hawk Search can accept the feeds using the delivery methods below.

FTP

Files can be hosted at an FTP location. You can host the files on your own FTP server and provide the Hawk Search team with credentials to access the account. Hawk Search supports both the FTP and SFTP protocols and can download files from either type of server. If you do not have your own FTP server and need an account, ask your Hawk Search Representative to provide one.

Web Accessible URL:

Files can also be hosted in a directory on your server. Please provide the Hawk Search Team with the path to the files so that they can be downloaded from the server. If the path is protected, please also provide the Hawk Search Team with the credentials needed to access it.

Other Questions?

For questions about the data feeds that Hawk Search can accept, please contact: sales@hawksearch.com

...
