Examine Facets proposal #310

@nikcio

Examine Facets proposal

Linked PR #311
Linked PR #312
Linked PR #313

What is faceted search?

Faceted search is a technique that involves augmenting traditional search techniques with a faceted navigation system, allowing users to narrow down search results by applying multiple filters based on faceted classification of the items. It is sometimes referred to as a parametric search technique. A faceted classification system classifies each information element along multiple explicit dimensions, called facets, enabling the classifications to be accessed and ordered in multiple ways rather than in a single, pre-determined, taxonomic order. (Source)

Description

This proposal covers the implementation of faceted search in Examine. It focuses mostly on finding reasonable interface abstractions and the best approach for building the feature, and less on specific implementation details.

Previous implementation

Facets are available for Examine when targeting .NET Framework via the Examine.Facets package by Callum Whyte. This proposal is based on the implementation of that package.

Motivation

I'm currently working on a project that would benefit greatly from Examine facets.

Approach

1. Externalize the Faceting features

The first approach is to externalize the faceting features to a separate NuGet package, in the same way Examine.Facets works. A POC of this approach can be seen here: POC: Examine.Facets

2. Internalize the Faceting features (I think this is the best approach)

The second approach is to make faceting available directly in the existing Examine package and its existing classes. This would make it possible to avoid creating the following implementations and instead add the features to the existing classes:

This approach therefore allows the default searcher to perform faceted searches. This lowers the barrier to entry, because a developer wouldn't have to register a separate searcher and explicitly use it for faceted searching, as is required in these examples:

Example 1 (From the existing Examine.Facets package by Callum)

// Setup
if (_examineManager.TryGetIndex("CustomIndex", out IIndex index))
{
    if (index is LuceneIndex luceneIndex)
    {
        var searcher = new FacetSearcher(
            "FacetSearcher",
            luceneIndex.GetIndexWriter(),
            luceneIndex.DefaultAnalyzer,
            luceneIndex.FieldValueTypeCollection
        );

        _examineManager.AddSearcher(searcher);
    }
}

// Fetching a searcher
_examineManager.TryGetSearcher("FacetSearcher", out ISearcher searcher);

Example 2 (From a test in the POC)

TrackingIndexWriter writer = indexer.IndexWriter;
var searcherManager = new SearcherManager(writer.IndexWriter, true, new SearcherFactory());
var searcher = new FacetSearcher(nameof(FacetSearcher), searcherManager, analyzer, indexer.FieldValueTypeCollection);

Example source

Structure

Bases

Searching

public interface IFacetField
{
    /// <summary>
    /// The field name
    /// </summary>
    string Field { get; }

    /// <summary>
    /// The field to get the facet field from
    /// </summary>
    string FacetField { get; set; }
}

Searching results

public interface IFacetValue
{
    /// <summary>
    /// The label of the facet value
    /// </summary>
    string Label { get; }

    /// <summary>
    /// The occurrence of a facet field
    /// </summary>
    float Value { get; }
}

public interface IFacetResult : IEnumerable<IFacetValue>
{
    /// <summary>
    /// Gets the facet for a label
    /// </summary>
    /// <param name="label"></param>
    /// <returns></returns>
    IFacetValue Facet(string label);
}

Example facet result

Tags:

Software (121)
People (20)
Packages (2)

public interface IFacetResults
{
    /// <summary>
    /// Facets from the search
    /// </summary>
    IDictionary<string, IFacetResult> Facets { get; }
}
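
To make the result interfaces more concrete, here is a minimal sketch that rebuilds the "Tags" example in memory and reads it back. The three interfaces are repeated from the proposal above. The `InMemoryFacetValue`, `InMemoryFacetResult`, and `InMemoryFacetResults` types are hypothetical and exist only for this illustration. Returning null for a missing label is an assumption, since the proposal does not specify that behaviour.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Usage: build the "Tags" result from the example above and print each value.
IFacetResults results = new InMemoryFacetResults(new Dictionary<string, IFacetResult>
{
    ["Tags"] = new InMemoryFacetResult(new[]
    {
        new InMemoryFacetValue("Software", 121),
        new InMemoryFacetValue("People", 20),
        new InMemoryFacetValue("Packages", 2),
    }),
});

foreach (var value in results.Facets["Tags"])
{
    Console.WriteLine($"{value.Label} ({value.Value})");
}

// Interfaces as proposed above (doc comments omitted for brevity).
public interface IFacetValue { string Label { get; } float Value { get; } }
public interface IFacetResult : IEnumerable<IFacetValue> { IFacetValue Facet(string label); }
public interface IFacetResults { IDictionary<string, IFacetResult> Facets { get; } }

// Hypothetical in-memory implementations, for illustration only.
public record InMemoryFacetValue(string Label, float Value) : IFacetValue;

public class InMemoryFacetResult : IFacetResult
{
    private readonly IReadOnlyList<IFacetValue> _values;

    public InMemoryFacetResult(IEnumerable<IFacetValue> values) => _values = values.ToList();

    // Assumption: returns null when the label is absent.
    public IFacetValue Facet(string label) => _values.FirstOrDefault(v => v.Label == label);

    public IEnumerator<IFacetValue> GetEnumerator() => _values.GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public record InMemoryFacetResults(IDictionary<string, IFacetResult> Facets) : IFacetResults;
```

A real implementation would populate these values from the Lucene facet counts at search time rather than from hard-coded data.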

Extensions

/// <summary>
/// Get the values for a particular facet in the results
/// </summary>
public static IFacetResult GetFacet(this ISearchResults searchResults, string field)
{
    // Implementation
}

/// <summary>
/// Get all of the facets in the results
/// </summary>
public static IEnumerable<IFacetResult> GetFacets(this ISearchResults searchResults)
{
    // Implementation
}

Types of facets

Sources of information about Lucene's facet search are:

String Facet

Allows for counting the documents that share the same string value.
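
As a rough illustration of what a string facet computes, independent of Lucene and Examine, the sketch below groups a few in-memory documents by a tag and counts them. The `Doc` record, the `StringFacetSketch.CountFacet` helper, and the sample data are hypothetical and exist only for this example. The `maxCount` parameter loosely mirrors the "maximum number of terms" idea in this proposal.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Usage: count how many documents share each tag and keep the two largest groups.
var docs = new List<Doc>
{
    new("1", "Software"),
    new("2", "Software"),
    new("3", "People"),
    new("4", "Packages"),
    new("5", "Software"),
    new("6", "People"),
};

foreach (var (label, count) in StringFacetSketch.CountFacet(docs, maxCount: 2))
{
    Console.WriteLine($"{label} ({count})");
}

// Hypothetical document shape used only for this illustration.
public record Doc(string Id, string Tag);

public static class StringFacetSketch
{
    // Groups documents by tag and returns the most frequent labels first,
    // breaking ties by label so the order is deterministic.
    public static IReadOnlyList<(string Label, int Count)> CountFacet(
        IEnumerable<Doc> docs, int maxCount)
    {
        return docs
            .GroupBy(d => d.Tag)
            .Select(g => (Label: g.Key, Count: g.Count()))
            .OrderByDescending(f => f.Count)
            .ThenBy(f => f.Label, StringComparer.Ordinal)
            .Take(maxCount)
            .ToList();
    }
}
```

In the proposed feature this counting would be done by Lucene over the indexed facet field rather than by LINQ over objects in memory.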

New FieldDefinitionTypes:

  • FacetFullText
  • FacetFullTextSortable

These extend the existing FullText and FullTextSortable types and add the required SortedSetDocValuesFacetField to the indexed document. Without this field, SortedSetDocValuesFacetCounts will not work.

New query methods

On IQuery

/// <summary>
/// Add a facet string to the current query
/// </summary>
IFacetQueryField Facet(string field);

/// <summary>
/// Add a facet string to the current query, filtered by value
/// </summary>
IFacetQueryField Facet(string field, string value);

/// <summary>
/// Add a facet string to the current query, filtered by multiple values
/// </summary>
IFacetQueryField Facet(string field, string[] values);

public interface IFacetQueryField : IBooleanOperation
{
    /// <summary>
    /// Maximum number of terms to return
    /// </summary>
    IFacetQueryField MaxCount(int count);

    /// <summary>
    /// Sets the field where the facet information will be read from
    /// </summary>
    IFacetQueryField FacetField(string fieldName);
}

New IFacetField

public interface IFacetFullTextField : IFacetField
{
    /// <summary>
    /// Maximum number of terms to return
    /// </summary>
    int MaxCount { get; set; }

    /// <summary>
    /// Filter values
    /// </summary>
    string[] Values { get; set; }
}

Facets config / New index methods - Optional addition. Probably not the most used feature

FacetsConfig allows for setting some values in the index that are useful for faceting (API docs).

On LuceneIndexOptions

public FacetsConfig FacetConfig { get; set; }

This will make it possible to set the facet configuration on the specific index and reuse it when searching.

Methods to change the field used when reading facets (the default is $facets, where all facet values are indexed unless FacetsConfig.SetIndexFieldName(dimName, indexFieldName) is called):

See IFacetQueryField. (It's not possible to specify the reading field in range facets.)

This will make it possible to set the faceting field per facet field, giving the most flexibility when composing a query.

Note: The FacetConfig will also need to be available at search time in the searchExecutor, so that it can be passed to the constructor when using Taxonomy.

Numeric Range Facet

Used with numbers to build range facets. For example, it would group documents that fall in the same price range.
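
The price-range idea can be sketched without Lucene as plain bucket counting. The `PriceRange` record and the `RangeFacetSketch.CountRanges` helper below are hypothetical stand-ins written for this example. They are not Lucene's range types, and the labels and bounds are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Usage: count how many prices fall into each hypothetical range.
var ranges = new[]
{
    new PriceRange("0-10", 0, true, 10, false),
    new PriceRange("10-100", 10, true, 100, false),
    new PriceRange("100+", 100, true, double.MaxValue, true),
};
var prices = new[] { 4.5, 9.99, 10.0, 55.0, 250.0 };

foreach (var (label, count) in RangeFacetSketch.CountRanges(prices, ranges))
{
    Console.WriteLine($"{label} ({count})");
}

// Hypothetical range type for this illustration only.
public record PriceRange(string Label, double Min, bool MinInclusive, double Max, bool MaxInclusive)
{
    // True when the value lies inside the range, honouring the inclusive flags.
    public bool Accepts(double value) =>
        (MinInclusive ? value >= Min : value > Min) &&
        (MaxInclusive ? value <= Max : value < Max);
}

public static class RangeFacetSketch
{
    // For each range, counts the values it accepts. Ranges may overlap,
    // in which case a value is counted once per matching range.
    public static IReadOnlyList<(string Label, int Count)> CountRanges(
        IEnumerable<double> values, IReadOnlyList<PriceRange> ranges)
    {
        var list = values.ToList();
        return ranges
            .Select(r => (Label: r.Label, Count: list.Count(r.Accepts)))
            .ToList();
    }
}
```

The inclusive flags matter at the boundaries: with the ranges above, a price of exactly 10 is counted in "10-100" and not in "0-10".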

Double Range

New FieldDefinitionTypes:

  • FacetDouble
  • FacetFloat

These extend the existing Double and Float types and add the required DoubleDocValuesField and SingleDocValuesField, respectively, to the indexed document, as well as a SortedSetDocValuesFacetField to enable string-like faceting. Without these fields, faceting on DoubleDocValuesField and SingleDocValuesField will not work.

New query methods

On IQuery

/// <summary>
/// Add a range facet to the current query
/// </summary>
IFacetRangeQueryField Facet(string field, DoubleRange[] doubleRanges);

public interface IFacetDoubleRangeQueryField : IBooleanOperation
{
    /// <summary>
    /// Sets if the range query is on <see cref="float"/> values
    /// </summary>
    /// <param name="isFloat"></param>
    /// <returns></returns>
    IFacetDoubleRangeQueryField IsFloat(bool isFloat);
}

New IFacetField

public interface IFacetDoubleField : IFacetField
{
    DoubleRange[] DoubleRanges { get; set; }
}

Long Range / Numeric range

New FieldDefinitionTypes:

  • FacetInt
  • FacetLong
  • FacetDateTime
  • FacetDateYear
  • FacetDateMonth
  • FacetDateDay
  • FacetDateHour
  • FacetDateMinute

These extend the existing types and add the required NumericDocValuesField to the indexed document, as well as a SortedSetDocValuesFacetField to enable string-like faceting. Without these fields, faceting on NumericDocValuesField will not work.

New query methods

On IQuery

/// <summary>
/// Add a range facet to the current query
/// </summary>
IFacetRangeQueryField Facet(string field, Int64Range[] longRanges);

public interface IFacetLongRangeQueryField : IBooleanOperation
{
}

New IFacetField

public interface IFacetLongField : IFacetField
{
    Int64Range[] LongRanges { get; set; }
}

Taxonomy Facet

Taxonomy faceting requires a specific writer (DirectoryTaxonomyWriter) and is therefore out of the scope of this proposal.

See more at: https://norconex.com/facets-with-lucene/


What now

  • Decide on the best approach for the feature.
  • Implement the API proposal.
  • Add documentation on the new facet API.
  • Refine the API proposal to be acceptable if it's not already.
  • Release version with the proposal.
