Goal

This article explains how to index documents that exceed the document size limit.

...

  1. Open the backend of your Sitefinity instance.

  2. Navigate to Administration → Settings and click Advanced (your-site-domain/Sitefinity/Administration/Settings/Advanced)

  3. Open the Hawksearch configuration

  4. Under document size limit enter 100KB

  5. Save the changes

Upload content above the limit

...

  1. Open the backend of your Sitefinity instance

  2. Navigate to Content → Documents & Files

  3. Upload files above the document size limit, e.g. 20MB

...

  2. Navigate to Administration → Search indexes (your-site-domain/Sitefinity/Administration/Search indexes)

  3. Open your currently used and active Index

  4. Select Documents from the scope. This will add the documents you have uploaded to the index

Note

During indexing the files are stripped and only the text content is extracted. Some files contain a lot of metadata or embedded resources (e.g. photos), so a 20MB .pdf may only contain 2MB of actual data.
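
To get a feel for how extracted text compares with the limit, here is a minimal standalone sketch that measures text size in kilobytes from UTF-16 byte counts, which is the same measurement the service code in this article performs per field. The class and method names (DocumentSizeEstimator, SizeInKilobytes, ExceedsLimit) are hypothetical and are not part of Sitefinity or Hawksearch. It also assumes the limit is expressed in KB.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration only; not a Sitefinity or Hawksearch type.
public static class DocumentSizeEstimator
{
    // Returns the combined size of all non-null values in kilobytes,
    // measured as UTF-16 bytes (Encoding.Unicode), divided by 1024.
    public static double SizeInKilobytes(IEnumerable<string> fieldValues)
    {
        long totalBytes = 0;

        foreach (var value in fieldValues)
        {
            if (value != null)
            {
                totalBytes += Encoding.Unicode.GetByteCount(value);
            }
        }

        return totalBytes / 1024.0;
    }

    // True when the combined size is strictly greater than the limit (assumed to be in KB).
    public static bool ExceedsLimit(IEnumerable<string> fieldValues, double limitInKilobytes)
    {
        return SizeInKilobytes(fieldValues) > limitInKilobytes;
    }
}
```

For example, a single value of 51,200 ASCII characters occupies 102,400 bytes, which is exactly 100 KB, so it does not exceed a 100 KB limit. Adding one more character pushes it over.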

Info

Setup search service

In order to index documents above the document size limit, you need to inherit the HawksearchService class and override the AdaptDocuments method.

Here we will demonstrate how to either empty the content field or take the first 500 words in it.

Info

The following code snippet demonstrates how to empty the document's Content field so that the document passes the document size limit check.

...

Empty content field

Code Block
languagec#
using System;
using System.Collections.Generic;
using System.Linq;
using Hawksearch.Search;
using Telerik.Sitefinity.Services.Search.Data;
using Telerik.Sitefinity.Configuration;
using Telerik.Sitefinity.Services.Search.Model;
using Hawksearch.Configuration;
using Hawksearch.SDK.Indexing;
using Field = Telerik.Sitefinity.Services.Search.Publishing.Field;

namespace SitefinityWebApp.Search
{
    public class CustomSearchService : HawksearchService
    {
        private const string DocumentContentType = "Telerik.Sitefinity.Libraries.Model.Document";

        protected override List<SubmitDocument> AdaptDocuments(IEnumerable<IDocument> documents)
        {
            var doc = documents.ToList().FirstOrDefault();
            var documentList = new List<IDocument>(documents);

            if (doc != null)
            {
                var contentTypeField = doc.Fields.FirstOrDefault(f => f.Name == "ContentType");

                if (contentTypeField != null)
                {
                    // Only process batches whose content type is Document.
                    if (string.Equals(contentTypeField.Value.ToString(), DocumentContentType, StringComparison.InvariantCultureIgnoreCase))
                    {
                        var configManager = ConfigManager.GetManager();
                        var hawkConfig = configManager.GetSection<HawkSearchConfig>();
                        documentList.Clear();

                        foreach (var document in documents)
                        {
                            var modifiedDocument = document;
                            var documentSize = this.CalculateDocumentSize(document);

                            // Modify only the documents that exceed the configured limit.
                            if (documentSize > hawkConfig.DocumentSizeLimit)
                            {
                                modifiedDocument = this.ModifyDocument(document);
                            }

                            documentList.Add(modifiedDocument);
                        }
                    }
                }
            }

            return base.AdaptDocuments(documentList);
        }

        // Sums the size of all field values in KB, measured as UTF-16 bytes.
        private double CalculateDocumentSize(IDocument document)
        {
            var documentSize = 0.0;

            foreach (var field in document.Fields)
            {
                if (field.Value != null)
                {
                    documentSize += System.Text.Encoding.Unicode.GetByteCount(field.Value.ToString()) / 1024.0;
                }
            }

            return documentSize;
        }

        // Empties the Content field so that the document passes the size check.
        private IDocument ModifyDocument(IDocument document)
        {
            var fields = new List<IField>(document.Fields);
            var contentField = document.Fields.FirstOrDefault(f => f.Name == "Content");

            if (contentField != null)
            {
                contentField.Value = string.Empty;
            }

            var modifiedDocument = new Document(fields, document.IdentityField.Name);

            return modifiedDocument;
        }
    }
}
Note

Once you implement the code in Visual Studio, build your solution. You will also have to reindex the index you are using from Administration → Search indexes → Action → Reindex.

Info

Expected results

Now, if you inspect your frontend page, you should see the title of your large document but not its content. The XHR search → results → document fields will not contain a content field either.

Take the first 500 words

Code Block
languagec#
using System;
using System.Collections.Generic;
using System.Linq;
using Hawksearch.Search;
using Telerik.Sitefinity.Services.Search.Data;
using Telerik.Sitefinity.Configuration;
using Telerik.Sitefinity.Services.Search.Model;
using Hawksearch.Configuration;
using Hawksearch.SDK.Indexing;
using Field = Telerik.Sitefinity.Services.Search.Publishing.Field;

namespace SitefinityWebApp.Search
{
    public class CustomSearchService : HawksearchService
    {
        private const string DocumentContentType = "Telerik.Sitefinity.Libraries.Model.Document";

        protected override List<SubmitDocument> AdaptDocuments(IEnumerable<IDocument> documents)
        {
            var doc = documents.ToList().FirstOrDefault();
            var documentList = new List<IDocument>(documents);

            if (doc != null)
            {
                var contentTypeField = doc.Fields.FirstOrDefault(f => f.Name == "ContentType");

                if (contentTypeField != null)
                {
                    if (string.Equals(contentTypeField.Value.ToString(), DocumentContentType, StringComparison.InvariantCultureIgnoreCase))
                    {
                        var configManager = ConfigManager.GetManager();
                        var hawkConfig = configManager.GetSection<HawkSearchConfig>();
                        documentList.Clear();

                        foreach (var document in documents)
                        {
                            var modifiedDocument = document;
                            var documentSize = this.CalculateDocumentSize(document);

                            if (documentSize > hawkConfig.DocumentSizeLimit)
                            {
                                modifiedDocument = this.ModifyDocument(document);
                            }

                            documentList.Add(modifiedDocument);
                        }
                    }
                }
            }

            return base.AdaptDocuments(documentList);
        }

        private double CalculateDocumentSize(IDocument document)
        {
            var documentSize = 0.0;

            foreach (var field in document.Fields)
            {
                if (field.Value != null)
                {
                    documentSize += System.Text.Encoding.Unicode.GetByteCount(field.Value.ToString()) / 1024.0;
                }
            }

            return documentSize;
        }

        private IDocument ModifyDocument(IDocument document)
        {
            var wordLimit = 500;
            var fields = new List<IField>(document.Fields);
            var contentField = document.Fields.FirstOrDefault(f => f.Name == "Content");

            // Nothing to shorten when the document has no Content field.
            if (contentField != null)
            {
                // Swap the original Content field for a shortened copy.
                fields.Remove(contentField);
                fields.Add(this.ExtractFieldContent(contentField, wordLimit));
            }

            var modifiedDocument = new Document(fields, document.IdentityField.Name);

            return modifiedDocument;
        }

        private IField ExtractFieldContent(IField contentField, int wordLimit)
        {
            // Value may be null, so avoid calling ToString() on it directly.
            var fieldValue = contentField.Value?.ToString();

            if (!string.IsNullOrWhiteSpace(fieldValue))
            {
                // Keep only the first wordLimit space-separated words.
                var modifiedContent = string.Join(" ", fieldValue.Split(' ').Take(wordLimit).ToArray());

                contentField = new Field
                {
                    Name = "Content",
                    Value = modifiedContent
                };
            }

            return contentField;
        }
    }
}
Note

Once you implement the code in Visual Studio, build your solution. You will also have to reindex the index you are using from Administration → Search indexes → Action → Reindex.

Info

Expected results

Now, if you inspect your frontend page, you should see the first 500 words of your large document's content. The XHR search → results → document fields will also contain a content field with those 500 words.
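
To try the truncation step outside of Sitefinity, the following standalone sketch keeps the first N words of a string. ContentTruncator and TakeFirstWords are hypothetical names used only for illustration. This variant splits on any whitespace (spaces, tabs, line breaks) and joins the kept words with single spaces, which is an assumption about what counts as a word rather than a Hawksearch requirement.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only; not a Sitefinity or Hawksearch type.
public static class ContentTruncator
{
    // Returns the first wordLimit whitespace-separated words of text,
    // joined by single spaces. Null or blank input yields an empty string.
    public static string TakeFirstWords(string text, int wordLimit)
    {
        if (string.IsNullOrWhiteSpace(text) || wordLimit <= 0)
        {
            return string.Empty;
        }

        // A null separator array makes Split treat all white-space characters as delimiters.
        var words = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

        return string.Join(" ", words.Take(wordLimit));
    }
}
```

Calling ContentTruncator.TakeFirstWords("one  two\nthree four", 3) returns "one two three". Input with fewer words than the limit comes back with its words intact.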

Register Custom Search Service

In order to use your custom search service instead of the built-in one you need to register it in the backend.

...

Open the backend of your Sitefinity instance.

...

Navigate to Administration → Settings (your-site-domain/Sitefinity/Administration/Settings)

...

Go to Advanced (your-site-domain/Sitefinity/Administration/Settings/Advanced)

...

Under Search → Search services → Hawksearch enter the TypeName of your custom search service (e.g. SitefinityWebApp.Search.CustomSearchService)

Info

Please refer to this documentation - Register custom search service