
Detailed Description

dbraga edited this page Nov 11, 2014 · 2 revisions


Thoth Core (Indexing)

Thoth Core is the module responsible for creating and indexing Thoth documents.
Before this module comes into play, Solr requests are collected and injected as messages into a queue system (e.g., ActiveMQ).
For more information, see Thoth data Collection.
The Thoth Core module is responsible for:

  • Dequeueing those messages
  • Parsing messages
  • Creating Thoth Documents
  • Indexing Thoth Documents into the near-real-time core of the Thoth index
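As a rough illustration of the four steps above, here is a minimal Java sketch that uses only the JDK. It assumes a hypothetical `hostname|port|coreName|query` message format, uses a `LinkedBlockingQueue` in place of the real queue system, and uses a plain list in place of the near-real-time core; none of these names come from Thoth itself.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch only: the in-memory queue stands in for the real queue system (e.g. ActiveMQ),
// the "hostname|port|coreName|query" message format is hypothetical, and the list
// stands in for the near-real-time core of the Thoth index.
class IndexingPipelineSketch {

    // Step 2: parse a raw message into its parts (hypothetical format).
    static String[] parse(String message) {
        String[] parts = message.split("\\|", 4);
        if (parts.length != 4) {
            throw new IllegalArgumentException("Malformed message: " + message);
        }
        return parts;
    }

    // Step 3: build a document (field name -> value) from the parsed parts.
    static Map<String, Object> createDocument(String[] parts) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("hostname", parts[0]);
        doc.put("port", Integer.parseInt(parts[1]));
        doc.put("coreName", parts[2]);
        doc.put("query", parts[3]);
        return doc;
    }

    // Steps 1 and 4: dequeue every pending message and index the resulting document.
    static int drain(BlockingQueue<String> queue, List<Map<String, Object>> nearRealTimeCore) {
        int indexed = 0;
        String message;
        while ((message = queue.poll()) != null) { // non-blocking dequeue
            nearRealTimeCore.add(createDocument(parse(message)));
            indexed++;
        }
        return indexed;
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.add("search1|8983|listings|q=*:*");
        queue.add("search2|8983|listings|q=city:sf");
        List<Map<String, Object>> nearRealTimeCore = new ArrayList<>();
        System.out.println("Indexed " + drain(queue, nearRealTimeCore) + " documents");
    }
}
```

Running `main` prints `Indexed 2 documents`. A long-running consumer would block on the queue or register a message listener instead of polling until the queue is empty.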

Thoth Document

A Thoth document is a representation of a Solr request. A Solr request can be a normal Solr query, a sharded Solr query, an exception, or a different request generated by a custom handler or component installed on a Solr instance. The mapping between Solr requests and Thoth documents is 1:1: each request produces exactly one document.

A Thoth document is logically divided into two sections:

  • Server information: describes the instance that received or generated the request, such as the hostname, port number, core name, and pool name (if the instance is part of a pool).
  • Request information: describes the request itself, such as the timestamp, the actual request (the query for a Solr query, the stack trace for an exception, etc.), the query time if applicable, and the number of hits if applicable.
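The two sections can be pictured as a pair of value types. The Java sketch below is hypothetical: the record and field names are illustrative and are not Thoth's actual field names. Request values that do not apply (for example, query time for an exception) are left out of the flattened field map.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the two logical sections of a Thoth document.
class ThothDocumentSketch {

    // Server section: the instance that received or generated the request.
    // pool is null when the instance is not part of a pool.
    record ServerInfo(String hostname, int port, String coreName, String pool) {}

    // Request section: qtimeMillis and hits are null when they do not apply (e.g. exceptions).
    record RequestInfo(Instant timestamp, String request, Long qtimeMillis, Long hits) {}

    // One Thoth document per Solr request (1:1 mapping).
    record ThothDocument(ServerInfo server, RequestInfo request) {

        // Flattens both sections into field/value pairs, skipping absent values.
        Map<String, Object> toFields() {
            Map<String, Object> fields = new LinkedHashMap<>();
            fields.put("hostname", server.hostname());
            fields.put("port", server.port());
            fields.put("coreName", server.coreName());
            putIfPresent(fields, "pool", server.pool());
            fields.put("timestamp", request.timestamp().toString());
            fields.put("request", request.request());
            putIfPresent(fields, "qtime", request.qtimeMillis());
            putIfPresent(fields, "hits", request.hits());
            return fields;
        }

        private static void putIfPresent(Map<String, Object> fields, String name, Object value) {
            if (value != null) {
                fields.put(name, value);
            }
        }
    }

    public static void main(String[] args) {
        ThothDocument doc = new ThothDocument(
                new ServerInfo("search1", 8983, "listings", "pool-a"),
                new RequestInfo(Instant.now(), "q=city:sf&rows=10", 42L, 1280L));
        System.out.println(doc.toFields());
    }
}
```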

Existing Types

  • com.trulia.thoth.requestdocuments.SolrQueryRequestDocument : Document that represents a normal Solr query request
  • com.trulia.thoth.requestdocuments.SolrShardedQueryRequestDocument : Document that represents a sharded Solr query request
  • com.trulia.thoth.requestdocuments.SolrExceptionRequestDocument : Document that represents a Solr request that generated an exception

The complete list of available Thoth request document types can be found here.
Load Thoth request documents

Thoth Core tries to load all the request documents listed in the request documents configuration file.

The location of the configuration file is set in the application.properties file:

# snippet from application.properties
thoth.requestDocument.configuration.file=/tmp/requestDocuments.xml
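A minimal sketch of reading that property with `java.util.Properties`. Only the property key comes from this page; the class name is illustrative, and Thoth itself may read its configuration differently.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Sketch: read the request documents configuration location from application.properties.
class RequestDocumentConfigLocation {

    static final String KEY = "thoth.requestDocument.configuration.file";

    static String configFile(Reader applicationProperties) throws IOException {
        Properties props = new Properties();
        props.load(applicationProperties); // lines starting with '#' are comments
        String path = props.getProperty(KEY);
        if (path == null || path.isBlank()) {
            throw new IllegalStateException(KEY + " is not set");
        }
        return path.trim(); // Properties keeps trailing spaces in values
    }

    public static void main(String[] args) throws IOException {
        // Usage: java RequestDocumentConfigLocation path/to/application.properties
        try (Reader in = Files.newBufferedReader(Path.of(args[0]))) {
            Path config = Path.of(configFile(in));
            System.out.println(config + (Files.isRegularFile(config) ? " exists" : " is missing"));
        }
    }
}
```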

The file content should look something like this:

<?xml version="1.0" encoding="UTF-8"?>
<RequestDocuments>
    <RequestDocument>
        <name>ExceptionSolrQuery</name>
        <className>com.trulia.thoth.requestdocuments.SolrExceptionRequestDocument</className>
    </RequestDocument>
    <RequestDocument>
        <name>SolrQuery</name>
        <className>com.trulia.thoth.requestdocuments.SolrQueryRequestDocument</className>
    </RequestDocument>
    <RequestDocument>
        <name>SolrShardedQuery</name>
        <className>com.trulia.thoth.requestdocuments.SolrShardedQueryRequestDocument</className>
    </RequestDocument>
</RequestDocuments>
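Thoth Core's actual loading code is not shown on this page. The sketch below is one way to do it with the JDK alone: it reads the `<RequestDocument>` entries with the built-in DOM parser and uses `Class.forName` to check that each class is on the classpath. The class and method names are illustrative.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Sketch: read <RequestDocument> entries and resolve each class name.
class RequestDocumentConfigLoader {

    // Returns name -> className for every <RequestDocument>, in file order.
    static Map<String, String> parse(InputStream xml) throws Exception {
        Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
        NodeList entries = dom.getElementsByTagName("RequestDocument");
        Map<String, String> byName = new LinkedHashMap<>();
        for (int i = 0; i < entries.getLength(); i++) {
            Element entry = (Element) entries.item(i);
            byName.put(childText(entry, "name"), childText(entry, "className"));
        }
        return byName;
    }

    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        if (nodes.getLength() == 0) {
            throw new IllegalArgumentException("<RequestDocument> is missing <" + tag + ">");
        }
        return nodes.item(0).getTextContent().trim();
    }

    // Fails with ClassNotFoundException if the jar holding the class is not on the classpath.
    static Class<?> resolve(String className) throws ClassNotFoundException {
        return Class.forName(className);
    }

    public static void main(String[] args) throws Exception {
        // Usage: java RequestDocumentConfigLoader /tmp/requestDocuments.xml
        try (InputStream in = Files.newInputStream(Path.of(args[0]))) {
            for (Map.Entry<String, String> e : parse(in).entrySet()) {
                System.out.println(e.getKey() + " -> " + resolve(e.getValue()).getName());
            }
        }
    }
}
```

In this sketch, resolving every class eagerly means a missing jar surfaces as a `ClassNotFoundException` at startup rather than when the first matching request arrives.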

For a complete example, look here.

Create custom Thoth request document

To create a new type of Thoth document, be sure to:

  • Extend com.trulia.thoth.requestdocuments.AbstractBaseRequestDocument
  • Implement the public void populateSolrInputDocument(SolrInputDocument solrInputDocument) method

Add the jar containing the new class to the classpath, and register the new request document in the request documents configuration file.
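A compilable sketch of such a subclass follows. To stay self-contained, it declares minimal stand-ins for `AbstractBaseRequestDocument` and `SolrInputDocument` that mimic only the method named on this page and SolrJ's `addField`. `CustomHandlerRequestDocument`, its constructor, and its field names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Self-contained sketch. The two nested stand-in classes only mimic what this page
// mentions, so the example compiles without Thoth or SolrJ on the classpath; in a real
// project, import com.trulia.thoth.requestdocuments.AbstractBaseRequestDocument and
// org.apache.solr.common.SolrInputDocument instead.
class CustomRequestDocumentSketch {

    // Stand-in for SolrJ's SolrInputDocument (simplified: repeated fields overwrite).
    static class SolrInputDocument {
        final Map<String, Object> fields = new LinkedHashMap<>();

        void addField(String name, Object value) {
            fields.put(name, value);
        }
    }

    // Stand-in for Thoth's base class; the real class may have more members.
    abstract static class AbstractBaseRequestDocument {
        public abstract void populateSolrInputDocument(SolrInputDocument solrInputDocument);
    }

    // Hypothetical request document for requests logged by a custom handler.
    static class CustomHandlerRequestDocument extends AbstractBaseRequestDocument {
        private final String handlerName;
        private final long elapsedMillis;

        CustomHandlerRequestDocument(String handlerName, long elapsedMillis) {
            this.handlerName = handlerName;
            this.elapsedMillis = elapsedMillis;
        }

        @Override
        public void populateSolrInputDocument(SolrInputDocument solrInputDocument) {
            // Hypothetical field names; use the fields defined in the Thoth index schema.
            solrInputDocument.addField("handler", handlerName);
            solrInputDocument.addField("elapsedMillis", elapsedMillis);
        }
    }

    public static void main(String[] args) {
        SolrInputDocument doc = new SolrInputDocument();
        new CustomHandlerRequestDocument("/suggest", 12).populateSolrInputDocument(doc);
        System.out.println(doc.fields);
    }
}
```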

Shrinking Feature

For detailed information about the Thoth Index, click here.
Depending on the size of the collected data, the near-real-time core of the Thoth index could become too big to support fast and reliable search.
Thoth Core overcomes this problem with the shrinking feature. Every x minutes, a shrinking event is triggered that:

  • Generates a shrank document: for each server, the n Thoth documents collected for that server are summarized into a single document
  • Indexes the shrank document in the shrank core
  • Cleans the near-real-time core
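The three steps of a shrinking event can be sketched in plain Java. The summary statistics (request count, exception count, average query time) and the record names are assumptions chosen for illustration; this page does not say which statistics the real shrank document stores, and lists stand in for the two cores.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of one shrinking event. The document shapes and summary fields are
// hypothetical; the real shrank document may carry different statistics.
class ShrinkingSketch {

    // A near-real-time Thoth document, reduced to the fields this sketch needs.
    record RequestDoc(String hostname, long qtimeMillis, boolean exception) {}

    // Summary ("shrank") document: one per server per shrinking event.
    record ShrankDoc(String hostname, long requests, long exceptions, double avgQtimeMillis) {}

    // Collapses n documents per server into a single summary document per server.
    static List<ShrankDoc> shrink(List<RequestDoc> docs) {
        // hostname -> {requests, exceptions, qtime sum over non-exception requests}
        Map<String, long[]> totals = new TreeMap<>();
        for (RequestDoc d : docs) {
            long[] t = totals.computeIfAbsent(d.hostname(), h -> new long[3]);
            t[0]++;
            if (d.exception()) {
                t[1]++;
            } else {
                t[2] += d.qtimeMillis();
            }
        }
        List<ShrankDoc> summaries = new ArrayList<>();
        for (Map.Entry<String, long[]> e : totals.entrySet()) {
            long[] t = e.getValue();
            long answered = t[0] - t[1];
            double avg = answered == 0 ? 0.0 : (double) t[2] / answered;
            summaries.add(new ShrankDoc(e.getKey(), t[0], t[1], avg));
        }
        return summaries;
    }

    // One event: summarize, index the summaries in the shrank core, clean the NRT core.
    static void shrinkingEvent(List<RequestDoc> nearRealTimeCore, List<ShrankDoc> shrankCore) {
        shrankCore.addAll(shrink(nearRealTimeCore));
        nearRealTimeCore.clear();
    }

    public static void main(String[] args) {
        List<RequestDoc> nrt = new ArrayList<>(List.of(
                new RequestDoc("search1", 10, false),
                new RequestDoc("search1", 30, false),
                new RequestDoc("search2", 0, true)));
        List<ShrankDoc> shrank = new ArrayList<>();
        shrinkingEvent(nrt, shrank);
        System.out.println(shrank + " / NRT size: " + nrt.size());
    }
}
```

One caveat of this sketch: clearing the whole near-real-time core after summarizing would drop any document indexed in between, so a real implementation needs to restrict both steps to the same time window.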

The shrinking period x is configurable in application.properties; see Configuration and Setup.
The shrinking period should be tuned to the volume of data your search infrastructure handles, since higher traffic fills the near-real-time core faster.
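Triggering the event every x minutes can be sketched with the JDK's `ScheduledExecutorService`. The property name `thoth.shrinking.period.minutes` is hypothetical (the real key is documented in Configuration and Setup), and Thoth may use a different scheduling mechanism altogether.

```java
import java.util.Properties;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Sketch of triggering a shrinking event every x minutes with the JDK scheduler.
// PERIOD_KEY is a hypothetical property name: check Configuration and Setup for the real one.
class ShrinkingScheduleSketch {

    static final String PERIOD_KEY = "thoth.shrinking.period.minutes"; // hypothetical

    // Reads x (in minutes) and rejects missing or non-positive values.
    static long periodMinutes(Properties props) {
        String raw = props.getProperty(PERIOD_KEY);
        if (raw == null) {
            throw new IllegalStateException(PERIOD_KEY + " is not set");
        }
        long minutes = Long.parseLong(raw.trim());
        if (minutes <= 0) {
            throw new IllegalArgumentException("Shrinking period must be positive: " + minutes);
        }
        return minutes;
    }

    // Runs the event every `period` units, starting one period from now. Exceptions are
    // caught because an uncaught exception would suppress all future runs of a periodic task.
    static ScheduledFuture<?> schedule(ScheduledExecutorService scheduler, Runnable shrinkingEvent,
                                       long period, TimeUnit unit) {
        Runnable guarded = () -> {
            try {
                shrinkingEvent.run();
            } catch (RuntimeException e) {
                System.err.println("Shrinking event failed: " + e);
            }
        };
        return scheduler.scheduleAtFixedRate(guarded, period, period, unit);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(PERIOD_KEY, "10");
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        schedule(scheduler, () -> System.out.println("shrinking..."), periodMinutes(props), TimeUnit.MINUTES);
        // The scheduler thread is non-daemon, so the JVM keeps running until it is shut down.
    }
}
```

`scheduleAtFixedRate` never runs two executions of the same task concurrently, so a shrinking event that takes longer than x minutes delays the next event instead of overlapping it.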
