A Seq query language primer
Seq 2020.1 includes some interesting query language improvements, including object literals, a universal
ci
case-insensitive text comparison modifier, conditional expressions, and a bunch of new built-ins. Now seems like a good time to reintroduce our much-loved query language from the ground, up!
Seq collects logs from applications and services, to drive dashboarding and alerts, and to make debugging easier when things go wrong in production.
All of these tasks revolve around searching and analyzing large numbers of events, trying to spot patterns and anomalies, or finding specific events based on their properties. This is what the Seq query language is all about!
Searching and analyzing structured logs
The Seq query language serves two closely related, yet distinct, purposes. First, it's the foundation of quick and easy log searching: finding structured log events that match some criteria. Second, it's a language for log analysis: computing statistics and revealing patterns in aggregated log data.
As a language for log search, the Seq query language needs to be ergonmic, forgiving, and terse. Log search is usually ad-hoc and exploratory - and often done in a hurry!
As a language for log analysis, the language must be clear and precise. Especially when results are numeric or aggregated from many records, the underlying computation needs to behave exactly as the user expects. Log analysis is much more like programming in this context.
Seq's query language meets these goals through two modes, based around a common, core syntax. As searching for log events is the first thing most developers do with Seq, it's conventional to introduce the simpler search expressions first, before examining queries for log analysis.
Search expressions
Many diagnostic sessions start with an error. Something's gone wrong, an error message or error id has been reported, and application logs are often where we turn to track down the root cause.
If you type an error message directly into Seq's search box, you'll often find some events that contain it:
What we've typed isn't a valid search expression, though: rather, Seq has tried to parse our input and failed; the results we see are events that contained the literal text "operation timed out".
Search results in Seq are always in time-descending order: the events at the top are the newest, getting older as we read down the page.
Application logs can be a very big haystack indeed. To pick out the needles, simple text search isn't usually enough. If we want to filter out warnings and see only timouts that resulted in an error, we need to write a valid search expression to exclude them:
Let's break down this expression and look at each part. Along the way we can introduce a number of core features of the Seq query language.
Logical and comparison operators
If you've worked with SQL you'll immediately notice the familiar operators and
and <>
. Seq uses SQL-style operators in expressions.
The comparisons are =
(equality), <>
(inequality, "not equal to"), <
, <=
, >
, and >=
in their usual roles. In search expressions, Seq will also accept C-style ==
and !=
, but these are convenience aliases that are not supported in query mode, so you should prefer the SQL-style versions.
The logical operators are and
, or
, and prefix not
. Just as with the comparisons, C-style &&
, ||
and prefix !
are accepted for convenience.
Properties
@Level
is a built-in property. This means it isn't part of the log event's "payload" but a special, well-known piece of data that is tracked separately for every event.
If we had written Level <> 'Warning'
instead, Level
would refer to a regular property of the event (which doesn't exist, in this case).
There are several built-in properties like @Level
. The most important are:
@Timestamp
— the time an event occurred, stored as an opaque numeric value; every event in Seq must be timestamped@Message
— a message associated with the event; event's don't have to have human-readable messages associated with them, but it's much friendlier to read logs with meaningful messages, so Seq encourages it by making@Message
first-class@MessageTemplate
— structured log sources that support message templates can send the template that produced a log event and Seq will make it available in this built-in property; the message template is like a "type" for a structured event, and Seq computes a numeric@EventType
property based on it to simplify querying events by type@Exception
— just like messages, log events don't necessarily all have associated exception information or stack traces, but these details are important enough to get their own field, and@Properties
— the generic structured data associated with the event, in a key-value map.
Strings and text fragments
Remaining in our example expression are two bits of text, the original error message we were searching for, and 'Warning'
:
"operation timed out" and @Level <> 'Warning'
These are slightly different constructs in Seq's query language.
"operation timed out"
, double-quoted, is a text fragment. It's not compared with anything: it represents some fragment of text that might appear in the event's message (or exception/stack trace). Text fragments are handy for quick searches that need a little more detail: "operation" and not "timed out"
, for example. Text fragments are always matched in a case-insensitive manner.
'Warning'
, with single quotes, is a string. It acts and is used just like any normal programming language string.
String operations and the ci
modifier
We've already seen equality and inequality, both of which can be applied to strings. Seq supports more sophisticated comparisons using like
and not like
:
@Level not like 'Warn%'
The like
operator, borrowed also from SQL, supports %
(zero-or-more characters) and _
(one character) wildcards, enabling prefix, suffix, substring, and more complex comparisons.
New in Seq 2020.1, the univeral case-insensitivity modifier ci
turns whatever operation it is applied to into a case-insensitive one. The ci
modifier is a postfix operator that works with any string comparison, =
, <>
, like
, in
, and more:
Subsystem = 'smtp' ci
Applied to an equality operation like this we'll match values of the Subsystem
property in any character case: smtp
, SMTP
, Smtp
, etc.
If we go back to our original example, "operation timed out"
, we can express this using string comparisons as:
@Message like '%operation timed out%' ci or @Exception like '%operation timed out%' ci
Regular expressions
To round out our discussion of strings and text, regular expression literals are worth a quick mention. These use /
-delimited syntax, familiar from JavaScript:
/o.*n/
The example above matches events that contain any text delimited in by the characters o
and n
: 'only'
, 'operation'
, and so-on.
Regular expressions can be used just like text fragments, as above, and they're also supported by overloads of =
(full string match), <>
(not a full string match), plus some of the built-in functions we'll look at later:
Source = /(Microsoft|System).*Internal/
Data types
The core data types in the Seq query language include:
'single quoted'
- strings (which we've already seen); the quote character'
can be escaped by doubling:'Can''t'
123
,0.45
,0xc0ffee
- numbers, internally represented as 128-bit decimal values30d
,100ms
- durations, specified as whole numbers ofd
ays,h
ours,m
inutes,s
econds, orms
millisecondstrue
andfalse
- Booleans[1, 'test', true]
- arrays with[0]
zero-based numeric indexing{ace: 1, 'beta gamma': 23}
- objects and dictionaries with string keysnull
The last one, null
, is a value in the Seq query language: Result = null
will return true
if the result property exists and has the value null
.
It's worth mentioning that behind the scenes, durations are just numbers. Durations are handy, though, because they're on the right scale to use in comparisons with @Timestamp
and now()
: for example @Timestamp > now() - 30d
.
Object literals are new to the Seq query language in Seq 2020.1. Like arrays, objects can be compared using regular =
and <>
syntax:
EventId = {Id: 1, Name: 'UnhandledException'}
Subproperties of an object are accessed using .MemberAccess
or ['indexer']
syntax.
Functions
Functions fill their normal role and use Name(Arg0, Arg1, ...)
call syntax. There are many built-in functions; some of the more frequently-used:
coalesce(a0, a1, ...)
- return the first non-null argument (variable-length argument lists are new in Seq 2020.1; accepts two arguments in Seq 5)DateTime(s)
,TimeSpan(s)
- parse date and time strings into their internal numeric representationsLength(o)
- the length of a string or arraynow()
- the current time, as an internal numeric timestampSubstring(s, start, count)
,StartsWith(s, substring)
,EndsWith(s, substring)
,IndexOf(s, substring)
,LastIndexOf(s, substring)
- extract and look for substrings within stringsToIsoString(t)
- convert an internal numeric timestamp into an ISO 8601 string
Seq 2020.1 adds some new built-in functions:
Keys(o)
,Values(o)
- return an array of all keys or values in an objectToJson(o)
,FromJson(s)
- convert to and from literal JSONToUpper(s)
,ToLower(s)
- change the case of a string
Collection operations
Structured log data often includes array- or object-like values that are interesting for search and analysis.
Seq has a few convenient shortcuts that make it pleasant to deal with these. First up, a SQL-style in
operator makes it easy to match values in a literal array:
@Level in ['Warning', 'Error']
Or in an array attached to a log event:
'seq' in Post.Tags
The in
operator supports the ci
modifier, but is otherwise limited in the kinds of comparisons that can be performed. To check whether any tag in Post.Tags
starts with the literal 'seq'
would be a job for lambda expressions or loops in most languages. Remembering lambda or loop syntax when you're rushing to figure out why production is showing the fail-whale isn't fun, so instead Seq's query language implements wildcards in []
indexer operations on arrays and objects:
Post.Tags[?] like 'seq%'
The ?
wildcard is read as "any", while *
is "all". Wildcards come into their own when working with nested data.
Order.Shipments[*].Items[*].TaxAmount > 0
The expression above finds events with an Order
property where all items in all shipments were taxable.
Conditionals (new in Seq 2020.1)
Seq supports conditional expressions using if
/then
/else
syntax:
if Quantity = 0 then 'None' else 'Plenty'
Conditionals can be chained:
if Quantity = 0 then 'None' else if Quantity < 20 'Some' else 'Plenty'
Conditionals more commonly appear in queries than in search expressions.
Queries
We've seen how search expressions can be used to match events. The more sophisticated mode of interaction with Seq is by using queries to analyze structured logs.
All queries in Seq begin with the keyword select
, and all of them produce a result with rows and columns.
Seq is more than calcluator, though! 😄
Projections
The simplest useful queries project a list of columns out from event properties. Let's look the the fifty slowest API responses from a web app:
If you're following along and typing this into the Seq query editor,
Ctrl + Enter
will insert newlines.
The column list following select
supports aliases, for example RequestPath as endpoint
, and complex expressions, for example Substring(RequestPath, 0, IndexOf(RequestPath, '/')) as area
.
The from stream
directive specifies that the query reads from the stream of log events in the Seq event store.
The optional where
clause predicate supports complex expressions - everything we saw in the search expressions section above - with the exception that only SQL-style operators are allowed (and
vs. &&
), and text fragments are not allowed (so "operation timed out"
needs to be written as @Message like '%operation timed out%' ci
).
The order by
clause is optional, and supports both asc
ending and desc
ending orderings over the columns in the column list. Orderings can include expressions - coalesce(StatusCode, 500)
— but columns that are themselves complex expressions need to be aliased in order to refer to them in orderings.
The final clause, limit
, is also optional - but Seq will reject the query if its internal default row limit is hit, so it's necessary when queries produce large result sets, and to conserve server resources and network bandwith. The limit
clause works just like top
, if you're familiar with the Microsoft SQL Server way of doing things.
We're running this example over the "HTTP requests" signal so the result set is limited to consider only events that are matched by the signal.
Aggregations and grouping
Digging deeper into web application response times, our "top fifty" list is useful for finding outliers, but it's hard to tell whether the slow responses we're looking at are representative of real user experience on our site, or just freak occurrences caused by transient issues.
Finding trends in large numbers of events is a task for aggregate functions like count()
, min()
, max()
, mean()
and percentile()
. You can compute aggregates over the whole stream of events being examined, for example:
select count(*)
from stream
where Elapsed > 1000
In this case, it's more useful to see results grouped by the API endpoint that's being hit, so we'll look at 99th percentile response times and use group by
to slice the event stream up:
The result set has three columns, despite there being only one in the select
column list: groupings are also columns in the Seq query language, and event support labels (group by RawUrl as url
, new in 6.0), making exploratory log analysis quicker and more fluid.
If the result set is large, a limit
or having
clause — for example having p99th > 1000
— can trim it down and save some bandwidth.
Aggregate functions
The built-in fuctions we saw earlier operate on single values. Aggregate functions operate on sets of values; Seq provides:
any(p)
andall(p)
- evaluate totrue
if the argument predicate evaluates totrue
for any/all events in the setcount(p)
andcount(*)
- thecount
aggregation can be used to count all events (*
), or just those for which the argument predicate evaluates totrue
, for example `count(@Level = 'Error')distinct(e)
- collect unique values of the expressione
; a special formcount(distinct(e))
returns the count of distinct values rather than the values themselvesmean(e)
- the arithmetic mean,sum(e) / count(*)
; in aggregations likemean()
that operate on numbers, only numeric values fore
contribute to the result (ife
, for example, evaluates to a string, then it contributes neither to the sum nor the count of values)min(e)
,max(e)
- minimum and maximum values of an expressionpercentile(e, p)
- a nearest-rank percentile, which computes the value of expressione
under whichp
percent of value fallsum(e)
- the sum of values ofe
first(e)
,last(e)
- the first and last values ofe
within the set
A little aside that shows how the Seq query language is evolving: in Seq 5, an additional aggregate function names()
was needed to list all of the property names present on events in the stream.
In Seq 2020.1 the Keys()
function means this can be expressed succinctly using more fundamental language features:
Timeseries
Our query above shows the slowest responses over a time range. If we want to see how responses evolve over time, we can generate a timeseries by further grouping over time intervals with a time()
grouping.
The time()
grouping accepts a duration and groups events into to buckets of that size:
You can chart timeseries results in the Seq user interface by pressing the tiny chart button above the result set. If you group by more than one dimension, just remember to make the time()
grouping last:
Learning more
Well, that's certainly a lot to digest! You might have a few questions..........
While there's detailed documentation over on docs.datalust.co, helping Seq users to write clearer and more efficient queries is one of our favorite things to do! 😊 We lurk on the datalust/seq
Gitter channel, watch the seq-logging
tag on Stack Overflow, and also field questions here on the blog, on the Seq issue tracker, the Seq community forum and through our support address support@datalust.co
.
If you need a hand coming to grips with the Seq query language, get in touch, it would be great to hear from you.
We've adapted this article for the Seq documentation, and we'll keep that version up-to-date through future Seq releases.
Happy logging!