
Goal

This article explains how to index documents that exceed the document size limit.

Prerequisite

Configure Connector - Configure Hawksearch

Set up the document size limit

  1. Open the backend of your Sitefinity instance.

  2. Navigate to Administration → Settings and click Advanced (your-site-domain/Sitefinity/Administration/Settings/Advanced).

  3. Open the Hawksearch configuration.

  4. In the document size limit field, enter 4000. The limit is expressed in KB, so this sets it to 4000KB.

  5. Save the changes.

Upload content above the limit

For this example, we upload .docx, .txt and .pdf files.

  1. Open the backend of your Sitefinity instance.

  2. Navigate to Content → Documents & Files.

  3. Upload files larger than the document size limit, e.g. 10MB.

During indexing, only the text content of each file is extracted; everything else is discarded. Some files contain a lot of non-text data such as metadata, so a 10MB .pdf may yield only about 1MB of actual text.
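The sample code in this article estimates a document's size by summing the UTF-16 (Encoding.Unicode) byte count of every field value and dividing by 1024, so the result is in KB and can be compared directly with the configured limit. Because UTF-16 uses two bytes for most characters, about one million characters of extracted text counts as roughly 2048KB. The standalone sketch below reproduces that arithmetic outside Sitefinity so you can check a limit against sample text; `EstimateSizeKb` and the field values are hypothetical, and in the real service the values come from the `IDocument` fields.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper: sums the UTF-16 (Encoding.Unicode) byte count of every
// non-null field value and converts it to KB, mirroring the AdaptDocuments samples.
static double EstimateSizeKb(IEnumerable<object> fieldValues) =>
    fieldValues
        .Where(value => value != null)
        .Sum(value => Encoding.Unicode.GetByteCount(value.ToString()) / 1024.0);

// Illustrative field values: a short title, ~1M characters of extracted text, and an empty (null) field.
var fieldValues = new List<object> { "Quarterly report", new string('a', 1024 * 1024), null };
var documentSizeKb = EstimateSizeKb(fieldValues);
var documentSizeLimitKb = 4000.0; // assumed to match the limit configured above

Console.WriteLine($"Estimated size: {documentSizeKb} KB");
Console.WriteLine(documentSizeKb > documentSizeLimitKb ? "Above the limit" : "Within the limit");
```

Here the large field alone counts as 2048KB, so a document with about two million characters of extracted text would already exceed a 4000KB limit.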

Set up the search service

To index documents above the document size limit, create a class that inherits from HawksearchService and override its AdaptDocuments method. The examples below show two approaches: emptying the Content field, or keeping only its first 500 words.

The following code snippet clears the document's Content field so that the document passes the document size limit check.

Empty content field

using System.Collections.Generic;
using System.Linq;
using Hawksearch.Search;
using Telerik.Sitefinity.Services.Search.Data;
using Telerik.Sitefinity.Configuration;
using Telerik.Sitefinity.Services.Search.Model;
using Hawksearch.Configuration;
using Hawksearch.SDK.Indexing;

namespace Hawksearch122.Custom
{
    public class CustomSearchService : HawksearchService
    {
        protected override List<SubmitDocument> AdaptDocuments(IEnumerable<IDocument> documents)
        {
            var documentList = new List<IDocument>();
            var configManager = ConfigManager.GetManager();
            var hawkConfig = configManager.GetSection<HawkSearchConfig>();

            foreach (var document in documents)
            {
                var documentSize = 0.0;
                var modifiedDocument = document;

                // Estimate the document size in KB from the UTF-16 byte count of every field value
                foreach (var field in document.Fields)
                {
                    if (field.Value != null)
                    {
                        documentSize += System.Text.Encoding.Unicode.GetByteCount(field.Value.ToString()) / 1024.0;
                    }
                }

                // The estimate is in KB, the same unit as the configured DocumentSizeLimit
                if (documentSize > hawkConfig.DocumentSizeLimit)
                {
                    modifiedDocument = this.ModifyDocument(document);
                }

                documentList.Add(modifiedDocument);
            }

            return base.AdaptDocuments(documentList);
        }

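        private IDocument ModifyDocument(IDocument document)
        {
            // Build the field list for the new document and look up the extracted text field
            var fields = new List<IField>(document.Fields);
            var contentField = fields.FirstOrDefault(f => f.Name == "Content");

            // Clear the extracted text; all other fields are indexed unchanged
            if (contentField != null)
            {
                contentField.Value = string.Empty;
            }

            // Rebuild the document with the same identity field
            var modifiedDocument = new Document(fields, document.IdentityField.Name);

            return modifiedDocument;
        }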
    }
}

Take first 500 words

using System.Collections.Generic;
using System.Linq;
using Hawksearch.Search;
using Telerik.Sitefinity.Services.Search.Data;
using Telerik.Sitefinity.Configuration;
using Telerik.Sitefinity.Services.Search.Model;
using Hawksearch.Configuration;
using Hawksearch.SDK.Indexing;
using Field = Telerik.Sitefinity.Services.Search.Publishing.Field;

namespace Hawksearch122.Custom
{
    public class CustomSearchService : HawksearchService
    {
        protected override List<SubmitDocument> AdaptDocuments(IEnumerable<IDocument> documents)
        {
            var documentList = new List<IDocument>();
            var configManager = ConfigManager.GetManager();
            var hawkConfig = configManager.GetSection<HawkSearchConfig>();

            foreach (var document in documents)
            {
                var documentSize = 0.0;
                var modifiedDocument = document;

                foreach (var field in document.Fields)
                {
                    if (field.Value != null)
                    {
                        documentSize += System.Text.Encoding.Unicode.GetByteCount(field.Value.ToString()) / 1024.0;
                    }
                }

                if (documentSize > hawkConfig.DocumentSizeLimit)
                {
                    modifiedDocument = this.ModifyDocument(document);
                }

                documentList.Add(modifiedDocument);
            }

            return base.AdaptDocuments(documentList);
        }

        private IDocument ModifyDocument(IDocument document)
        {
            var wordLimit = 500;
            var fields = new List<IField>(document.Fields);
            var contentField = fields.FirstOrDefault(f => f.Name == "Content");

            // Only replace the Content field if the document actually has one
            if (contentField != null)
            {
                fields.Remove(contentField);
                fields.Add(this.ExtractFieldContent(contentField, wordLimit));
            }

            // Rebuild the document with the same identity field
            var modifiedDocument = new Document(fields, document.IdentityField.Name);

            return modifiedDocument;
        }

        private IField ExtractFieldContent(IField contentField, int wordLimit)
        {
            // Guard against a Content field with no value
            var fieldValue = contentField.Value?.ToString();

            if (!string.IsNullOrWhiteSpace(fieldValue))
            {
                // Split on any whitespace (spaces, tabs, line breaks) and keep the first wordLimit words
                var words = fieldValue.Split((char[])null, System.StringSplitOptions.RemoveEmptyEntries);
                var modifiedContent = string.Join(" ", words.Take(wordLimit).ToArray());
                contentField = new Field
                {
                    Name = "Content",
                    Value = modifiedContent
                };
            }

            return contentField;
        }
    }
}
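To try a word limit before deploying the custom service, the truncation step can be run on its own in a console app. The sketch below is a hypothetical, Sitefinity-free version of that step: `TakeFirstWords` and the sample text are illustrative only. It treats any run of whitespace (spaces, tabs, line breaks) as a single separator, since text extracted from .pdf or .docx files usually contains line breaks as well as spaces.

```csharp
using System;
using System.Linq;

// Hypothetical helper: keeps only the first wordLimit words of the text.
// Split with a null separator array treats all whitespace characters as delimiters,
// and RemoveEmptyEntries collapses repeated whitespace.
static string TakeFirstWords(string text, int wordLimit)
{
    if (string.IsNullOrWhiteSpace(text))
    {
        return text;
    }

    var words = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    return string.Join(" ", words.Take(wordLimit));
}

// Illustrative extracted text with line breaks, tabs and repeated spaces.
var extracted = "Annual report\n\nRevenue grew   in every region.\tDetails follow.";
var truncated = TakeFirstWords(extracted, 5);
Console.WriteLine(truncated);
```

Text shorter than the limit is returned with its whitespace normalized to single spaces, which does not affect indexing of the words themselves.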

Register Custom Search Service

To use your custom search service instead of the built-in one, register it in the backend.

  1. Open the backend of your Sitefinity instance.

  2. Navigate to Administration → Settings (your-site-domain/Sitefinity/Administration/Settings)

  3. Go to Advanced (your-site-domain/Sitefinity/Administration/Settings/Advanced)

  4. Under Search → Search services → Hawksearch, enter the TypeName of your custom search service (e.g. SitefinityWebApp.CustomSearchService; for the samples above this is Hawksearch122.Custom.CustomSearchService)

  5. Save the changes
