Suppressing noise to maximize Seq performance

Log data is noisy! Sometimes a rock band playing at the foot of your bed kind of noisy. This isn't always a bad thing - a lot of good log data is speculative, and when it turns out to be important you're glad it's there.

Log noise creates significant challenges for efficiently storing and searching logs. There are two basic approaches to meeting this challenge - either:

  1. keep scaling up your infrastructure to cope, or
  2. continuously identify noise and keep it out of the hot paths for search and analysis.

Sometimes, you'll need a little of both, but this post is about (2).

Behind the scenes, your log data (whatever the system) is organized on persistent storage in pages.

Each page, whether it's 4 or 16 KB, or some other size, is handled by the operating system and storage stack as a single unit - when fetching it into RAM through read operations, caching it to boost performance, or flushing it to disk as part of a write.

A diagram showing a file broken up into 4K pages. Read operations work at page-level granularity.
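To make page-level granularity concrete, here's a minimal sketch of which pages a read has to load. The 4096-byte page size and the `pages_touched` helper are assumptions for illustration only; they aren't part of Seq or any particular storage engine.

```python
PAGE_SIZE = 4096  # assumed page size in bytes; real systems vary


def pages_touched(offset: int, length: int, page_size: int = PAGE_SIZE) -> range:
    """Return the page numbers that a read of `length` bytes at `offset` must load."""
    if length <= 0:
        return range(0)
    first = offset // page_size
    last = (offset + length - 1) // page_size
    return range(first, last + 1)


# A 100-byte read that straddles a page boundary pulls in two whole pages.
print(list(pages_touched(4090, 100)))
```

Even a tiny read costs at least one full page, which is why the number of pages touched matters more than the number of bytes actually needed.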

When your log search needs to read data, each page of potential matches has to be pulled into RAM, decompressed, and inspected in some way. This can be rather expensive: I/O may be slow, IOPS scarce, or CPUs saturated.

The fewer pages a log search has to touch, the faster it will complete. This is where signals come into the picture.

A Seq signal is a predicate that matches a subset of events in the event store. It might be as simple as a single test:

Region = 'us-west-2'

Or something much more complex.

Behind the scenes, Seq uses signals to build bitmap indexes: for each page in the event store, if any of the events in it match the given signal, then the bit corresponding to that page will be "on" in the bitmap index.

A signal index over the file from the previous image. The index shows a list of "bits" corresponding to the pages in the target file. The pages containing hits are "on" in the index, and others are "off".
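The idea can be sketched in a few lines of Python. This is only a toy model under stated assumptions: pages are lists of dictionaries, the bitmap is a plain integer, and `build_signal_index` and `pages_to_read` are hypothetical names, not Seq's actual implementation.

```python
from typing import Callable, Dict, List

Event = Dict[str, str]


def build_signal_index(pages: List[List[Event]], signal: Callable[[Event], bool]) -> int:
    """Return an int used as a bitmap: bit i is set if any event on page i matches."""
    bitmap = 0
    for page_number, events in enumerate(pages):
        if any(signal(e) for e in events):
            bitmap |= 1 << page_number
    return bitmap


def pages_to_read(bitmap: int, page_count: int) -> List[int]:
    """List the page numbers whose bit is "on" in the bitmap."""
    return [i for i in range(page_count) if (bitmap >> i) & 1]


# Four toy pages of events (illustrative data only).
pages = [
    [{"Region": "us-west-2"}, {"Region": "eu-west-1"}],
    [{"Region": "eu-west-1"}],
    [{"Region": "ap-south-1"}],
    [{"Region": "us-west-2"}],
]


def in_us_west_2(event: Event) -> bool:
    return event.get("Region") == "us-west-2"


index = build_signal_index(pages, in_us_west_2)
print(pages_to_read(index, len(pages)))
```

In this model, a search with the signal active only needs to visit pages 0 and 3; pages 1 and 2 are never read.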

Once a signal index has been populated, searches with that signal activated will only need to touch pages that are marked in the index as having matches, so the total amount of I/O and CPU work may be significantly reduced.

Signal indexes are actually even better than that - because they're bitmaps, they can be combined using typical bitmap & (and) and | (or) operations to further narrow the search space when multiple signals are active at the same time.
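Here's a small sketch of that combination step, again modelling each index as a Python integer. The two bitmaps and the `set_bits` helper are made up for illustration.

```python
from typing import List


def set_bits(bitmap: int) -> List[int]:
    """Return the positions of the "on" bits, lowest page number first."""
    positions = []
    page = 0
    while bitmap:
        if bitmap & 1:
            positions.append(page)
        bitmap >>= 1
        page += 1
    return positions


# Hypothetical indexes for two signals over the same eight pages.
region_index = 0b01001011  # pages 0, 1, 3, 6 contain matches
errors_index = 0b00101001  # pages 0, 3, 5 contain matches

both = region_index & errors_index    # pages that might hold events matching both
either = region_index | errors_index  # pages that might hold events matching either

print(set_bits(both))
print(set_bits(either))
```

Note that the combined bitmap is still only a page-level filter: a page that's "on" in both indexes may not contain a single event matching both signals, so the events in it still have to be inspected. The win is in all of the pages that drop out before any reading happens.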

You can use this property of signal indexes to speed up your everyday workflow with Seq.

Seq 2023.1 includes big improvements in signal index usage. If you haven't upgraded already, now's the time.

Identifying noise

Structured logs turn out to be fantastic for pinpointing individual event types. While traditional plain text logs often share common patterns between related messages, structured logs carry far more precise information for targeting a particular kind of event.

You can target events using properties on the log events, or using message templates that identify individual event types, if your source logging library supports them.

When you open your log stream, chances are you're greeted by mostly noise in the very first page of events:

The Seq Events screen, with unfiltered log stream including fine-grained database operation timings.

The first step to get all of this out of the hot path is to pick a noisy event, and use a property or the event type to exclude it:

Events screen with a single database operation timing event selected. A drop-down is shown beneath the "Type" menu item, with submenu item "Exclude" highlighted.

Over in the right-hand signal bar, this will create a new signal; we'll optimistically call this one "Quiet":

The Events screen after selecting "Exclude" on an event type. The screen shows the signal editor open in the right pane.

To get things underway, now is the time to scour your log for noise and exclude as much uninteresting, repetitive detail as you can. If you can construct a filter that pinpoints a noisy event precisely, wrap it in not (...) before adding it to the signal, so that the signal matches everything except that event.
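One way to picture the resulting signal is as a list of negated noise filters that must all hold. The sketch below models that in Python; it assumes the filters in a signal are combined with "and", and the `EventType` and `RequestPath` property names and values are hypothetical examples, not anything your logs will necessarily contain.

```python
from typing import Callable, Dict, List

Event = Dict[str, str]


# Hypothetical filters that each pinpoint one kind of noisy event.
def is_db_timing(event: Event) -> bool:
    return event.get("EventType") == "db-timing"


def is_health_check(event: Event) -> bool:
    return event.get("RequestPath") == "/health"


NOISE_FILTERS: List[Callable[[Event], bool]] = [is_db_timing, is_health_check]


def quiet(event: Event) -> bool:
    """Model of the Quiet signal: not (A) and not (B) and so on."""
    return all(not is_noise(event) for is_noise in NOISE_FILTERS)


print(quiet({"EventType": "db-timing"}))       # noisy, so excluded
print(quiet({"EventType": "order-failed"}))    # interesting, so kept
```

Each new exclusion you add narrows the signal a little further, without affecting the exclusions already in place.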

A few minutes later you should be looking at a much cleaner, clearer log stream. Searching within this stream, with the Quiet signal selected, will ignore all of those noisy events.

Indexing time

Now's the time to check out Data > Storage. If you have quite a bit of data in your event store, chances are it's mostly pink: events that haven't yet been indexed for your new signal definition.

Storage screen showing unindexed data highlighted in pink.

Time to make a coffee! In ten minutes to an hour, returning to this screen should show mostly green blocks of data with indexes defined.

Making your default workspace quiet

Up on the left of the Seq navigation bar sits a mysterious little drop-down that probably says "Personal". This is your selected workspace and it determines which signals show in the signal bar, among other things.

Clicking the drop-down and editing the workspace will take you to this screen:

Workspace edit screen. The title of the workspace is Personal. In "Default signals" the Quiet signal from the previous step is selected.

Now, add Quiet to the list of default signals for the workspace and save.

Next time you return to Events, the Quiet signal will be activated automatically.

How does this make Seq faster?

With Quiet activated, and kept up to date, searches and queries will completely ignore pages filled only with noisy events - and that's often the vast majority of them.
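The "filled only with noisy events" part matters: in the page-level model described earlier, a single interesting event keeps its whole page in play. A toy illustration, with events reduced to the strings "noise" and "signal" and a hypothetical `skippable_pages` helper:

```python
from typing import Callable, List


def is_quiet(event: str) -> bool:
    """Toy stand-in for the Quiet signal: anything that isn't noise matches."""
    return event != "noise"


def skippable_pages(pages: List[List[str]], signal: Callable[[str], bool]) -> List[int]:
    """Pages where no event matches the signal never need to be read."""
    return [i for i, events in enumerate(pages) if not any(signal(e) for e in events)]


pages = [
    ["noise", "noise"],   # all noise: skipped
    ["noise", "signal"],  # one interesting event: must be read
    ["noise", "noise"],   # all noise: skipped
    ["signal"],           # must be read
]

print(skippable_pages(pages, is_quiet))
```

The more thoroughly the signal excludes noise, the more pages end up entirely quiet-free and drop out of the search.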

Activating additional signals alongside Quiet will narrow this down even further.

When Seq can avoid reading a page, that page also stays out of the operating system's page cache, leaving more room for interesting data and accelerating all other searches.

If you're on a large team, creating a shared workspace and making the Quiet signal "shared" will maximize this effect. The Seq documentation has more on organizing workspaces.

How did you go? If you've had great results, or if you're wondering why you haven't, please drop us a line here or via support@datalust.co. Have fun!

Nicholas Blumhardt
