Exploding Collections 💥

TL;DR: Seq 2023.3 introduces the unnest set function and lateral cross joins, enabling natural queries over nested collections in log events.

Here's the essence of the problem: you log some basic information about order placement in your online coffee roasting service; the natural way to express that is for the "order placed" event to describe a collection of order items:

var orderId = "order-1234";
var orderItems = new[]
{
    // Name, Quantity
    new OrderItem("Dark Roast", 2),
    new OrderItem("Medium Roast", 1)
};

// Using Serilog: serialize the order items along with the order-placed event
Log.ForContext("OrderItems", orderItems, true)
    .Information("Order {OrderId} placed", orderId);

The resulting events carry OrderId and OrderItems properties like:

{
    "OrderId": "order-1234",
    "OrderItems": [
        { "Name": "Dark Roast", "Quantity": 2 },
        { "Name": "Medium Roast", "Quantity": 1 },
    ]
}

There are all sorts of questions we might want to ask about this data, along the lines of:

  1. Which orders included dark roast coffee?
  2. How much dark roast coffee was sold?
  3. Which orders included more than one of the same product?

Seq's query language has some limited support for lambda functions and collection wildcard comprehensions that go some way towards answering these, but as of Seq 2023.2 none could be expressed elegantly, and the latter two queries are not expressible at all.

As part of our committment to fully structured log data, we've added a new language feature in 2023.3 that makes this kind of query a breeze to write.

The unnest() set function

The first ingredient in expressing these queries is the unnest function, lifted from standard SQL.

In SQL, the unnest(arr) function accepts an array, arr, and returns a table with one row for every element in the array. Imagining unnest applied to the OrderItems array in an "order placed" event, the result will be something like:

unnest(OrderItems)
{ "Name": "Dark Roast", "Quantity": 2 }
{ "Name": "Medium Roast", "Quantity": 1 }

We've turned the elements of the array into rows. So far, so good?

Lateral cross joins

The next ingredient takes each log event in the stream, and joins it with the result of unnesting its OrderItems property.

This technique is called a "lateral cross join", and it's a little different from other join types you might be familiar with: instead of joining one whole rowset with another, a subquery is executed for each row in the left table, and the left row is repeated for every row in the right (subquery) table generated for the original row.

Lateral joins are a mouthful to explain, but the simple implementation behind the lateral keyword in Seq 2023.3 should be a bit more obvious in practice. Let's take a look at the three example queries we considered earlier.

1. Which orders included dark roast coffee?

This is the simplest of the three queries to write. Here it is, putting together lateral and unnest:

select OrderId
from stream
  lateral unnest(OrderItems) as Item
where Item.Name = 'Dark Roast'

And some results:

OrderId
"order-1234"
"order-7654"
"order-5678"

The first thing to notice is that lateral appears in the from clause, where you'd expect to find table-like and join-like things.

The second thing is that we give the result of the unnest(OrderItems) expression an alias, Item. You can think of this as the name of the column generated from the unnested array. Each value in that column will be a single element from OrderItems.

Third, the the rest of the properties on the underlying event are available in the select clause as before.

And finally, fourth, Item, the name of the column generated with unnest, can be used elsewhere in the query - the where clause in this example - just like any other property from the underlying event.

2. How much dark roast coffee was sold?

Impossible previously, this one is now fairly trivial, using the same type of query as above:

select sum(Item.Quantity)
from stream
  lateral unnest(OrderItems) as Item
where Item.Name = 'Dark Roast'

Results look like:

sum(Item.Quantity)
220

3. Which orders included more than one of the same product?

There are a few more moving parts to this one, because we need to group by both OrderId and Item.Name:

select sum(Item.Quantity) as Quantity
from stream
  lateral unnest(OrderItems) as Item
group by OrderId, Item.Name
having Quantity > 1

because Seq implicitly includes group key columns in results, this query produces a result like:

OrderId Item.Name sum(Item.Quantity)
"order-1234" "Dark Roast" 2
"order-5678" "Summer Special" 7

A short aside on query plans...

unnest() is the first significant language feature we've added since completely rewriting Seq's query engine for Seq 2023.

Being observability fanatics ourselves, one of the core goals of Seq 2023 was to make its internal workings easier for us to externalize, understand, and debug. By far the biggest win in this area is Seq's new explain keyword, which can be tacked on to the front of any query in order to print the computed query plan:

I've been eager to write about explain and the new query engine ever since we shipped it, but until that happens, I thought I'd include this brief pointer in case you're inclined to explore Seq's internals for yourself.

Query plans, as printed, are executed bottom-to-top: most often, a parallelized "layer" (pact is short for parallelization contract) will read from the persisted log records, perform some shaping and aggregation, and send results to a serial reducer layer that computes the final result - though this varies a lot depending on the query involved.

Query plans are rowset-oriented: data flowing between blocks and layers is represented as rows, and accessed using column accessors such as col(0), col(1), and so on.

The addition of unnest() really put this feature though its paces: explain queries uncovered a few bugs in early versions, and helped pinpoint several more as the feature came together.

Getting Seq 2023.3

Seq 2023.3 is ready to download at datalust.co/download and to pull from Docker Hub as datalust/seq.

If you have a perplexing collection-oriented query to write, or found a novel way to use unnest(), we'd love to hear from you!