Announcing Seq 2024.3

TL;DR: With Seq 2024.3, Seq gains user-defined high-cardinality indexes, its first new index type since the introduction of signal indexes back in 2018! If you're eager to skip to the binaries and dig around for yourself, you can get MSIs at https://datalust.co/download, or pull datalust/seq:latest from Docker Hub.

Debugging is unpredictable. Because of this, log analysis is unpredictable too. Unlike a typical database system, executing the very same query millions of times in a day, Seq might be called on to run dozens of new, novel queries in the course of a single debugging session.

Debugging is also real-time. Who wants to wait five minutes for an ingestion pipeline to complete, while probing endpoints during an outage? Seq needs to include new data in query results immediately, if it's going to be useful for fixing the most critical bugs.

Both of these requirements - for ad-hoc queries, and for immediate access to ingested data - resulted in Seq being designed to run well without traditional indexes.

But, a system that functions without indexes can still benefit from them. Indexes are phenomenally powerful! There's no optimization quite like simply doing less work, and that's what indexes are all about.

Signal indexes today

Seq v5 introduced signal indexes: lazily-computed, page-granularity bitmap indexes that track which parts of the event stream contain events matching a Boolean predicate. For example, the built-in "Errors" signal uses a predicate resembling @Level in ['error', 'fatal'] ci. Seq builds a signal index by evaluating the predicate against each event in the stream, and setting the bit for the corresponding page to 1 if the result is true. When you activate a signal like "Errors" in the Seq signal bar, your searches only touch the disk pages that actually contain error events. If these are rare, then search times are cut dramatically.

Signal indexes are wonderful because they're very, very space-efficient. Just one bit per 4 KB page of indexed data! Because signal indexes are cheap to store, Seq allows any user to create a signal that speeds up their future searches within it.

This strategy added at least one zero to the event stream size that a given server can reasonably manage. And because Seq falls back to optimized, parallel, brute-force search over recently-ingested data and gaps in the index, search results are still served in real-time whether signal indexes are used or not.

Introducing expression indexes

In Seq 2024.3, we're adding expression indexes, a new index type that complements signal indexes perfectly.

Signal indexes rely on a predicate being fully-specified up-front. This could be a scarily-complex condition over multiple event properties. Or, it might just be a simple comparison, Environment = 'Staging', or Environment = 'Production'.

The Environment case works because there are usually only a handful of possible values for Environment, so we can create signal indexes for all of them, and click to choose between "Staging", "Production", and any others we're interested in.

Signal indexes don't work for comparisons against an unknown, or very large set of possible values. It's not practical to speed up searching for any EmailAddress, OrderId, or BrowserSession using signal indexes: the values aren't known up-front, or there are too many to create signals for all of them.

These kinds of high-cardinality properties are the target of expression indexes. An expression index is defined by choosing a property to index:

And it's activated automatically whenever a search contains an exact equality comparison between that property and a constant value:

RequestId = 'e04e319b144f869c9bc229'

This data set is 23 GB, built just for this post using:

seqcli sample ingest

On an 18 GB, 11-core (5-performance-core) M3 MacBook Pro, running:

select count(*)
from stream
where RequestId = 'e04e319b144f869c9bc229'

(using count aggregation so there's no result paging, and best-of-three) takes 7,728 ms without the index.

Once the index has been applied, the same query completes in 🔥 8 ms.

Expression indexes are implemented using page-granularity, disk-backed hashtables. Expression indexes are lazily computed, and although more storage is required than for signal indexes, they're also extremely space-efficient. On the 23 GB data set, the RequestId index needs 166 MB on disk.

Why are they "expression" indexes and not "property" indexes, though? The UI and the vast majority of use cases revolve around simple property comparisons, but expression indexes work for complex property paths - Cart.Items[0].Name - and in fact, for any expression that Seq can compute over the events being indexed.

What makes a good expression index?

When choosing properties to index,

  1. They should be properties you're likely to search on. There's no value in creating indexes that aren't used.
  2. They should make sense to use in equality comparisons. Expression indexes only apply for = comparisons between the property and a constant value. Expression indexes don't speed up numeric comparisons, substring searches, or anything like that.
  3. They should have a wide spread of possible values. As a rule of thumb, any property with fewer than a dozen different values will make a poor expression index.

Note that there's nothing here about the types or sizes of the values. Expression indexes work for text and numeric data, and the size of the value doesn't have any effect on the space required to store the index.

Should I index messages? Levels? Trace or span ids? Seq already indexes @TraceId and @EventType by default. While you may choose to create expression indexes using built-in properties (through the API or seqcli expressionindex create), it's probably best if you trust us to make indexing choices for built-in properties.

Other changes in Seq 2024.3

A full list of changes is on GitHub. Some highlights:

There's also plenty more polish, bug fixes, and dependency upgrades 🙂.

Upgrading

We hope you enjoy the new features and updates in Seq 2024.3!

Upgrades from earlier Seq 202x releases are in-place and highly-compatible: just run the Windows MSI, or pull the new datalust/seq:latest image and re-start your Seq container.

See docs.datalust.co/docs/upgrading for detailed information, or reach out to the Seq team for help, via support@datalust.co. 👋