Detailed Description
Thoth Core is the module responsible for creating and indexing Thoth documents.
Before this module runs, Solr requests are collected and injected as messages into a queue system (e.g. ActiveMQ).
For more information, see Thoth Data Collection.
The Thoth Core module is responsible for:
- Dequeueing those messages
- Parsing the messages
- Creating Thoth documents
- Indexing the Thoth documents in the near-real-time core of the Thoth index
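The four responsibilities above can be pictured as a small loop. The following is only a minimal sketch under stated assumptions: the class name, the `key=value;key=value` message format, and the in-memory list standing in for the near-real-time core are all hypothetical and do not come from the Thoth code base.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch of the dequeue -> parse -> create -> index flow.
// None of these names are taken from Thoth itself.
class ThothCorePipelineSketch {

    // Parses one queued message into a flat field map (stand-in for a Thoth document).
    // Assumed message format: "key=value;key=value".
    static Map<String, String> parse(String message) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String pair : message.split(";")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                fields.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        return fields;
    }

    // Drains the queue and "indexes" exactly one document per message (1:1 mapping).
    static List<Map<String, String>> drain(Queue<String> queue) {
        // The list is a stand-in for the near-real-time core.
        List<Map<String, String>> index = new ArrayList<>();
        String message;
        while ((message = queue.poll()) != null) {
            index.add(parse(message));
        }
        return index;
    }

    public static void main(String[] args) {
        Queue<String> queue = new ArrayDeque<>();
        queue.add("hostname=search01;port=8983;core=active;qtime=12;hits=340");
        queue.add("hostname=search02;port=8983;core=active;qtime=47;hits=0");
        List<Map<String, String>> index = drain(queue);
        System.out.println(index.size() + " documents indexed");
        System.out.println(index.get(0));
    }
}
```

A real deployment would read from the message broker and write to Solr instead of using in-memory collections; the sketch only illustrates the order of the steps.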
A Thoth document is a representation of a Solr request. A Solr request can be a normal Solr query, a sharded Solr query, an exception, or a different request generated by a custom handler or component installed on a Solr instance. The mapping between a Solr request and a Thoth document is 1:1.
A Thoth document is logically divided into two sections:
- Server information: describes the instance that received or generated the request. It contains the hostname, port number, core name, and pool name (if the instance is part of a pool).
- Request information: describes the request itself: timestamp, actual request (the query for a Solr query, the stack trace for an exception, etc.), query time if applicable, number of hits if applicable, etc.
- com.trulia.thoth.requestdocuments.SolrQueryRequestDocument: document that represents a Solr request
- com.trulia.thoth.requestdocuments.SolrShardedQueryRequestDocument: document that represents a Solr sharded request
- com.trulia.thoth.requestdocuments.SolrExceptionRequestDocument: document that represents a Solr request that generated an exception
The complete list of available Thoth request document types is here
Load Thoth request documents
Thoth Core will try to load every request document listed in the request documents configuration file.
The location of the configuration file can be specified in the application.properties file:
# snippet from application.properties
thoth.requestDocument.configuration.file=/tmp/requestDocuments.xml
The content should look something like this:
<?xml version="1.0" encoding="UTF-8"?>
<RequestDocuments>
  <RequestDocument>
    <name>ExceptionSolrQuery</name>
    <className>com.trulia.thoth.requestdocuments.SolrExceptionRequestDocument</className>
  </RequestDocument>
  <RequestDocument>
    <name>SolrQuery</name>
    <className>com.trulia.thoth.requestdocuments.SolrQueryRequestDocument</className>
  </RequestDocument>
  <RequestDocument>
    <name>SolrShardedQuery</name>
    <className>com.trulia.thoth.requestdocuments.SolrShardedQueryRequestDocument</className>
  </RequestDocument>
</RequestDocuments>
For a complete example look here
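To make the configuration format concrete, here is a hedged sketch of how a file with this layout could be read using only the JDK's DOM parser. It is not Thoth's actual loader: the class `RequestDocumentConfigSketch` and the method `readMappings` are hypothetical names, and the sketch stops at collecting name/class pairs rather than instantiating anything.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical reader for the request documents configuration format.
class RequestDocumentConfigSketch {

    // Returns name -> className for every <RequestDocument> entry, in file order.
    static Map<String, String> readMappings(String xml) throws Exception {
        Document dom = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> mappings = new LinkedHashMap<>();
        NodeList entries = dom.getElementsByTagName("RequestDocument");
        for (int i = 0; i < entries.getLength(); i++) {
            Element entry = (Element) entries.item(i);
            String name = entry.getElementsByTagName("name").item(0).getTextContent().trim();
            String className = entry.getElementsByTagName("className").item(0).getTextContent().trim();
            mappings.put(name, className);
        }
        return mappings;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<RequestDocuments>"
                + "<RequestDocument><name>SolrQuery</name>"
                + "<className>com.trulia.thoth.requestdocuments.SolrQueryRequestDocument</className>"
                + "</RequestDocument>"
                + "</RequestDocuments>";
        for (Map.Entry<String, String> e : readMappings(xml).entrySet()) {
            // A loader would presumably resolve e.getValue() with Class.forName(...),
            // which only works when the Thoth jar is on the classpath.
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Keeping the parsing step separate from class loading means a malformed file can be reported before any class lookup is attempted.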
To create a new type of Thoth document, be sure to:
- Extend com.trulia.thoth.requestdocuments.AbstractBaseRequestDocument
- Implement the public void populateSolrInputDocument(SolrInputDocument solrInputDocument) method
Add the newly created jar to the classpath, and add the new request document to the request documents configuration file.
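As an illustration of these two steps, the sketch below defines a made-up document type. To keep it compilable without the Thoth and SolrJ jars, it declares minimal stand-ins for `AbstractBaseRequestDocument` and `SolrInputDocument`; the real classes almost certainly have more members, and whether the real method is abstract is an assumption. The type `SlowQueryRequestDocument` and the field names `query_s` and `qtime_i` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a custom request document. In a real project, delete the two
// stand-in classes and import the Thoth and SolrJ classes instead.
class CustomRequestDocumentSketch {

    // Stand-in for org.apache.solr.common.SolrInputDocument;
    // only addField and getFieldValue are mimicked.
    static class SolrInputDocument {
        private final Map<String, Object> fields = new LinkedHashMap<>();
        public void addField(String name, Object value) { fields.put(name, value); }
        public Object getFieldValue(String name) { return fields.get(name); }
    }

    // Stand-in for com.trulia.thoth.requestdocuments.AbstractBaseRequestDocument.
    // Only the method named in the text is reproduced.
    abstract static class AbstractBaseRequestDocument {
        public abstract void populateSolrInputDocument(SolrInputDocument solrInputDocument);
    }

    // Hypothetical new document type that records a query and its query time.
    static class SlowQueryRequestDocument extends AbstractBaseRequestDocument {
        private final String query;
        private final int qtimeMillis;

        SlowQueryRequestDocument(String query, int qtimeMillis) {
            this.query = query;
            this.qtimeMillis = qtimeMillis;
        }

        @Override
        public void populateSolrInputDocument(SolrInputDocument solrInputDocument) {
            // Field names are illustrative, not part of the Thoth schema.
            solrInputDocument.addField("query_s", query);
            solrInputDocument.addField("qtime_i", qtimeMillis);
        }
    }

    public static void main(String[] args) {
        SolrInputDocument doc = new SolrInputDocument();
        new SlowQueryRequestDocument("q=*:*", 1500).populateSolrInputDocument(doc);
        System.out.println(doc.getFieldValue("query_s") + " took " + doc.getFieldValue("qtime_i") + " ms");
    }
}
```

Once packaged, such a class would be registered with its own RequestDocument entry (name and className) in the request documents configuration file.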
For detailed information about the Thoth Index, click here.
Depending on the size of the collected data, the near-real-time core of the Thoth index could become too big to support fast and reliable search.
Thoth Core overcomes this problem with its shrinking feature.
Every x minutes, a shrinking event is triggered. It will:
- Generate a shrank document: the n Thoth documents collected for a single server are condensed into a single summary document
- Index the shrank document in the shrank core
- Clean the near-real-time core
The shrinking time period x is configurable in application.properties; see Configuration and Setup.
The shrinking period should be chosen according to the volume of data flowing through your search infrastructure.
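The three steps of a shrinking event can be sketched as follows. This is an assumption-laden illustration, not Thoth's implementation: the text does not say what a summary document contains, so the sketch arbitrarily keeps a request count and an average query time per server, and in-memory collections stand in for the near-real-time and shrank cores. All class and field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of one shrinking event.
class ShrinkSketch {

    // Stand-in for a Thoth document: just the server and the query time.
    static final class ThothDoc {
        final String server;
        final int qtime;
        ThothDoc(String server, int qtime) { this.server = server; this.qtime = qtime; }
    }

    // Stand-in for a shrank (summary) document; the chosen metrics are assumptions.
    static final class Summary {
        final String server;
        final int count;
        final double avgQtime;
        Summary(String server, int count, double avgQtime) {
            this.server = server;
            this.count = count;
            this.avgQtime = avgQtime;
        }
    }

    // Condenses all documents of each server into one summary, then empties the source list.
    static Map<String, Summary> shrink(List<ThothDoc> realtimeCore) {
        Map<String, int[]> totals = new LinkedHashMap<>(); // server -> [count, total qtime]
        for (ThothDoc doc : realtimeCore) {
            int[] t = totals.computeIfAbsent(doc.server, k -> new int[2]);
            t[0]++;
            t[1] += doc.qtime;
        }
        Map<String, Summary> shrankCore = new LinkedHashMap<>();
        for (Map.Entry<String, int[]> e : totals.entrySet()) {
            int[] t = e.getValue();
            shrankCore.put(e.getKey(), new Summary(e.getKey(), t[0], (double) t[1] / t[0]));
        }
        realtimeCore.clear(); // corresponds to cleaning the near-real-time core
        return shrankCore;
    }

    public static void main(String[] args) {
        List<ThothDoc> realtimeCore = new ArrayList<>();
        realtimeCore.add(new ThothDoc("search01", 10));
        realtimeCore.add(new ThothDoc("search01", 30));
        realtimeCore.add(new ThothDoc("search02", 20));
        for (Summary s : shrink(realtimeCore).values()) {
            System.out.println(s.server + ": " + s.count + " requests, avg qtime " + s.avgQtime);
        }
    }
}
```

In a running service, a method like `shrink` could be invoked every x minutes, for example through `ScheduledExecutorService.scheduleAtFixedRate` from the JDK.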