Faster searches with property indexes

Seq 2024.3, now in preview, introduces a new type of index on event properties, known as expression indexes. Under the right circumstances, expression indexes reduce search time from minutes to milliseconds.

Expression indexes are created using the context menu on an event property. A property like OrderId is a strong candidate for an index because different events are likely to have different OrderIds.

An expanded event with 'Index this' highlighted for the 'OrderId' property.

Once the index has been built, searches of the form OrderId = <value> will use the index. If the indexed property is well suited to indexing, it can be possible to search terabytes in less than a second (often much less).

When an index is not available, Seq scans through the event stream in order, checking each event sequentially. Seq is optimized for this scenario, with tricks like fragment searches and sparse deserialization, so that it can process millions of events per second. Scanning still requires a lot of system resources that are then not available for other tasks, and a well-designed index makes it possible to answer queries while doing much less work.

Database indexes organize data in a way that reduces the space that must be searched to find a result. This is analogous to the organization of books in a bookstore. The books may be organized by the author's name, or by genre. Either way, if I am looking for horror stories by Lovecraft I can skip to the correct section, reducing my search to a much smaller set of books. Seq events have always been organized by @Timestamp, making searching by @Timestamp effectively instant. The introduction of expression indexes allows events to be organized, and therefore efficiently searchable, by new dimensions. Most expression indexes will improve the efficiency of searches; however, some properties make much better indexes than others.
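The difference between scanning and using an index can be sketched in a few lines of Python. This is an illustration of the general idea only, not Seq's storage format or implementation; the event list and the scan, build_index and lookup helpers are hypothetical names made up for this example.

```python
from collections import defaultdict

# Hypothetical in-memory events; Seq's real event storage is not modelled here.
events = [
    {"@Timestamp": 1, "OrderId": "order-a"},
    {"@Timestamp": 2, "OrderId": "order-b"},
    {"@Timestamp": 3, "OrderId": "order-a"},
    {"@Timestamp": 4, "OrderId": "order-c"},
]


def scan(events, prop, value):
    """Check every event in order; the work grows with the size of the log."""
    return [e for e in events if e.get(prop) == value]


def build_index(events, prop):
    """Map each value of prop to the positions of the events that carry it."""
    index = defaultdict(list)
    for position, event in enumerate(events):
        if prop in event:
            index[event[prop]].append(position)
    return index


def lookup(events, index, value):
    """Visit only the positions recorded for value."""
    return [events[p] for p in index.get(value, [])]


order_index = build_index(events, "OrderId")
matches = lookup(events, order_index, "order-a")
```

Both approaches return the same events; the lookup simply touches two positions instead of all four.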

Choosing index properties

The optimal expression index has values that tend to appear only in small portions of the log. This may be a property with many different values (high-cardinality) but does not have to be. What matters is that checking the index for a particular value should be able to restrict the search to a small fraction of the log. If the bookstore above organized its stock into hardcover and paperback sections it would not be ideal. The customer would still have to search approximately half of the books in the store. If instead the store chose to organize books by author the situation would be much improved, and the customer would only need to search a tiny portion of the store's books.
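One rough way to reason about a candidate property is to ask what fraction of events would still have to be searched after picking a single value. The sketch below applies that idea to the bookstore analogy; the search_fractions helper and the data are invented for illustration, and Seq does not necessarily measure index quality this way.

```python
from collections import Counter


def search_fractions(values):
    """Return, for each distinct value, the fraction of items that carry it."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


# Made-up stock of 100 books: two bindings versus 25 authors.
bindings = ["hardcover", "paperback"] * 50
authors = [f"author-{i % 25}" for i in range(100)]

# Largest share of the store a customer might still have to search.
worst_binding = max(search_fractions(bindings).values())
worst_author = max(search_fractions(authors).values())
```

Organizing by binding leaves half of the store to search, while organizing by author leaves a few percent, which is why the author-like property is the better index candidate.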

When choosing index properties for Seq, anything with hundreds or thousands of discrete values is likely to be a good index.

The cost of indexes

Each additional index slightly increases the background work that Seq must do to keep indexes up to date, and each must be stored on disk. The Data > Indexing page supports monitoring and maintaining Seq indexes, including expression indexes, signal indexes and alert indexes.

The total indexing time for the last 24 hours provides an indication of the compute cost of maintaining all of the indexes, while the storage requirements are shown for each index.

Indexing page showing a list of expression indexes, signal indexes and alert indexes.

Signal indexes

Seq already has another type of index, signal indexes, that index predicates (expressions that are either true or false for a given event).

For example, a signal index for the predicate OrderId = 'order-da48278b2a20310221d5d2' is great for finding events where the OrderId is exactly order-da48278b2a20310221d5d2, but cannot help find events where the OrderId has any other value. In comparison, an expression index on the OrderId property indexes all values of the OrderId property. Any search for OrderId = <value> will try to use the index.

In general, if a property has a small number of known values then it is best to create a signal index for each value. If a property has a large number of values, or values that are not known in advance, then an expression index is probably better. Signal indexes are compact and efficient to combine. Expression indexes have the flexibility to search for any value. Judicious use of both index types will help your Seq instance to serve more queries with fewer resources.
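The trade-off between the two index types can be pictured with a small Python model. This is a conceptual sketch, not how Seq stores either kind of index; the Level property, the sample events, and the predicate_index and value_index helpers are assumptions made for this example.

```python
# Hypothetical events; the Level property is assumed for illustration.
events = [
    {"OrderId": "order-1", "Level": "Error"},
    {"OrderId": "order-2", "Level": "Information"},
    {"OrderId": "order-1", "Level": "Information"},
    {"OrderId": "order-3", "Level": "Error"},
]


def predicate_index(events, predicate):
    """Signal-style index: positions where one fixed predicate is true."""
    return {i for i, e in enumerate(events) if predicate(e)}


def value_index(events, prop):
    """Expression-style index: positions for every value of one property."""
    index = {}
    for i, e in enumerate(events):
        if prop in e:
            index.setdefault(e[prop], set()).add(i)
    return index


errors = predicate_index(events, lambda e: e.get("Level") == "Error")
order_1 = predicate_index(events, lambda e: e.get("OrderId") == "order-1")
by_order = value_index(events, "OrderId")

# Predicate indexes combine with a simple set intersection.
errors_for_order_1 = errors & order_1

# The value index can answer a lookup for any OrderId, not just order-1.
order_3 = by_order.get("order-3", set())
```

In this model each predicate index answers exactly one question but combines cheaply with others, while the value index answers a lookup for any value of the property it covers.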

Learning more

Liam McLennan
