Log data is noisy! Sometimes a rock band playing at the foot of your bed kind of noisy. This isn't always a bad thing - a lot of good log data is speculative, and when it turns out to be important you're glad it's there.
Log noise creates significant challenges for efficiently storing and searching logs. There are two basic approaches to meeting this challenge - either:
- keep scaling up your infrastructure to cope, or
- continuously identify noise and keep it out of the hot paths for search and analysis.
Sometimes you'll need a little of both, but this post is about the second approach.
The role of signal indexes in efficient search
Behind the scenes, your log data (whatever the system) is organized on persistent storage in pages.
Each page, whether it's 4 or 16 KB, or some other size, is handled by the operating system and storage stack as a single unit - when fetching it into RAM through read operations, caching it to boost performance, or flushing it to disk as part of a write.
When your log search needs to read data, each page of potential matches has to be pulled into RAM, decompressed, and inspected in some way. This can be rather expensive: I/O may be slow, IOPS scarce, or CPUs saturated.
The fewer pages a log search has to touch, the faster it will complete. This is where signals come into the picture.
A Seq signal is a predicate that matches a subset of events in the event store. It might be as simple as a single test:
Region = 'us-west-2'
Or something much more complex.
Behind the scenes, Seq uses signals to build bitmap indexes: for each page in the event store, if any of the events in it match the given signal, then the bit corresponding to that page will be "on" in the bitmap index.
Once a signal index has been populated, searches with that signal activated only need to touch pages that are marked in the index as having matches, so the total amount of I/O and CPU work may be significantly reduced.
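To make the idea concrete, here's a minimal Python sketch of a per-page bitmap index. It isn't how Seq implements signal indexes internally; the page layout, the `build_signal_index` and `search` helpers, and the sample events are all assumptions made up for illustration.

```python
from typing import Callable, Dict, List, Tuple

Event = Dict[str, str]
Page = List[Event]
Signal = Callable[[Event], bool]


def build_signal_index(pages: List[Page], signal: Signal) -> int:
    """Return a bitmap (stored in an int) with bit i set when page i
    contains at least one event matching the signal."""
    bitmap = 0
    for i, page in enumerate(pages):
        if any(signal(event) for event in page):
            bitmap |= 1 << i
    return bitmap


def search(pages: List[Page], bitmap: int, signal: Signal) -> Tuple[List[Event], int]:
    """Scan only the pages flagged in the bitmap.
    Returns the matching events and the number of pages actually read."""
    matches: List[Event] = []
    pages_read = 0
    for i, page in enumerate(pages):
        if not (bitmap >> i) & 1:
            continue  # no match is possible here, so skip the read entirely
        pages_read += 1
        matches.extend(event for event in page if signal(event))
    return matches, pages_read


# A toy event store of four pages (hypothetical data).
pages = [
    [{"Region": "us-west-2"}, {"Region": "eu-west-1"}],
    [{"Region": "eu-west-1"}, {"Region": "eu-west-1"}],
    [{"Region": "ap-south-1"}],
    [{"Region": "us-west-2"}],
]


def in_us_west(event: Event) -> bool:
    return event.get("Region") == "us-west-2"


index = build_signal_index(pages, in_us_west)
matches, pages_read = search(pages, index, in_us_west)
print(f"index={index:04b} pages_read={pages_read} matches={len(matches)}")
```

In this toy store only two of the four pages are flagged, so the search reads half as many pages as a full scan would.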
Signal indexes are actually even better than that - because they're bitmaps, they can be combined using typical bitmap & (and) and | (or) operations to further narrow the search space when multiple signals are active at the same time.
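As a rough illustration of combining indexes, the sketch below treats three hypothetical signal indexes over an eight-page store as plain Python integers. The bit patterns and the `pages_to_read` helper are invented for the example and say nothing about Seq's actual on-disk format.

```python
from typing import List

# Hypothetical per-page bitmaps; bit i corresponds to page i.
errors = 0b00101101   # pages 0, 2, 3, 5 contain at least one error
us_west = 0b00001011  # pages 0, 1, 3 contain a us-west-2 event
eu_west = 0b01100000  # pages 5, 6 contain an eu-west-1 event


def pages_to_read(bitmap: int) -> List[int]:
    """List the page numbers whose bit is set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Both signals active: only pages flagged by both indexes can hold a match.
errors_in_us_west = errors & us_west

# Either region: a page flagged by either index is a candidate.
either_region = us_west | eu_west

# The two operations compose: errors in either region.
errors_in_either_region = errors & either_region

print(pages_to_read(errors_in_us_west))        # [0, 3]
print(pages_to_read(either_region))            # [0, 1, 3, 5, 6]
print(pages_to_read(errors_in_either_region))  # [0, 3, 5]
```

One thing to keep in mind: the combined bitmap identifies candidate pages, not matching events. A page flagged by both indexes might hold an error from one region and an unrelated event from the other, so the events on each candidate page still need to be checked.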
You can use this property of signal indexes to speed up your everyday workflow with Seq.
Seq 2023.1 includes big improvements in signal index usage. If you haven't upgraded already, now's the time.
Identifying noise
Structured logs turn out to be fantastic for pinpointing individual event types. While traditional plain text logs often share common patterns between related messages, structured logs have access to far more precise information when targeting a particular kind of event.
That information might be properties on the log events, or message templates that identify individual event types, if your logging library supports them.
When you open your log stream, chances are you're greeted by mostly noise in the very first page of events.
The first step to get all of this out of the hot path is to pick a noisy event, and use a property or the event type to exclude it.
Over in the right-hand signal bar, this will create a new signal; we'll optimistically call this one "Quiet".
To get things underway, now is the time to scour your log for noise and exclude as much uninteresting, repetitive detail as you can. If you can construct a filter that pinpoints a noisy event precisely, surrounding it with not (...) before adding it to the signal will have the effect you want.
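The shape of the resulting signal can be sketched in Python as a set of negated filters that must all hold. This models the logic only, not Seq's filter syntax; the `negate` and `all_of` helpers and the `RequestPath` and `SourceContext` values are hypothetical examples of noisy events.

```python
from typing import Callable, Dict

Event = Dict[str, object]
Predicate = Callable[[Event], bool]


def negate(predicate: Predicate) -> Predicate:
    """Equivalent of wrapping a filter in not (...)."""
    return lambda event: not predicate(event)


def all_of(*predicates: Predicate) -> Predicate:
    """An event passes only if every predicate accepts it."""
    return lambda event: all(p(event) for p in predicates)


# Filters that each pinpoint one kind of noisy event (hypothetical examples).
def is_health_check(event: Event) -> bool:
    return event.get("RequestPath") == "/health"


def is_cache_heartbeat(event: Event) -> bool:
    return event.get("SourceContext") == "CacheHeartbeat"


# "Quiet" keeps an event only when none of the noise filters match it.
quiet = all_of(negate(is_health_check), negate(is_cache_heartbeat))

print(quiet({"RequestPath": "/orders"}))           # True: not noise
print(quiet({"RequestPath": "/health"}))           # False: excluded
print(quiet({"SourceContext": "CacheHeartbeat"}))  # False: excluded
```

The more precisely each noise filter targets a single kind of event, the less chance Quiet has of hiding something you later want to see.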
A few minutes later you should be looking at a much cleaner, clearer log stream. Searching within this stream, with the Quiet signal selected, will ignore all of those noisy events.
Indexing time
Now's the time to check out Data > Storage. If you have quite a bit of data in your event store, chances are it's mostly pink: events that haven't yet been indexed for your new signal definition.
Time to make a coffee! In ten minutes to an hour, returning to this screen should show mostly green blocks of data with indexes defined.
Making your default workspace quiet
Up on the left of the Seq navigation bar sits a mysterious little drop-down that probably says "Personal". This is your selected workspace and it determines which signals show in the signal bar, among other things.
Clicking the drop-down and editing the workspace will take you to the workspace settings screen.
Now, add Quiet to the list of default signals for the workspace and save.
Next time you return to Events, the Quiet signal will be activated automatically.
How does this make Seq faster?
With Quiet activated, and kept up to date, searches and queries will completely ignore pages filled only with noisy events - and that's often the vast majority of them.
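A small sketch shows why "filled only with noisy events" matters. Assuming a hypothetical noise rule and a four-page store, a page can be skipped only when every event on it is noise; a single interesting event keeps the whole page in play.

```python
from typing import Dict, List

Event = Dict[str, str]


def is_noise(event: Event) -> bool:
    # Hypothetical noise rule: health-check requests.
    return event.get("RequestPath") == "/health"


pages: List[List[Event]] = [
    [{"RequestPath": "/health"}, {"RequestPath": "/health"}],  # all noise
    [{"RequestPath": "/health"}, {"RequestPath": "/orders"}],  # mixed
    [{"RequestPath": "/health"}],                              # all noise
    [{"RequestPath": "/orders"}, {"RequestPath": "/cart"}],    # no noise
]

# A page's Quiet bit is on when it holds at least one non-noisy event.
quiet_bits = [any(not is_noise(event) for event in page) for page in pages]
skipped = quiet_bits.count(False)

print(quiet_bits)                                 # [False, True, False, True]
print(f"{skipped} of {len(pages)} pages skipped")  # 2 of 4 pages skipped
```

The mixed page still has to be read, which is why the savings are biggest when noisy events arrive in dense runs that fill whole pages.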
Activating additional signals alongside Quiet will narrow this down even further.
When Seq can avoid reading a page, that page also stays out of the operating system's page cache, leaving more room for interesting data and speeding up other searches.
If you're on a large team, creating a shared workspace and making the Quiet signal "shared" will maximize this effect. The Seq documentation has more on organizing workspaces.
How did you go? If you've had great results, or if you're wondering why you haven't, please drop us a line here or via support@datalust.co. Have fun!