Storage changes in Seq 6.0.3403-pre

One of the first major internal changes planned for Seq 6 just landed in the latest preview, 6.0.3403-pre.

Seq has historically written events into seven-day "extents": completely independent on-disk databases that cover a discrete range of the event store.

Extents played a crucial role in Seq's evolution through multiple different storage formats by allowing old and new formats to coexist side-by-side. This enabled the introduction of a new native storage engine in Seq 5, and we've had such great feedback on its stability and performance that we're continuing to invest deeply in it for Seq 6 and beyond.

Two of the major improvements we want to make in Seq 6 - built-in support for high availability, and deeper query processing support in the new storage engine - are pushing us away from splitting the event store into extents:

  • For HA, it's much cleaner to manage replica state for a single physical store than for a large number of historical seven-day extents.
  • For our query engine, using a single store makes it possible to cache query information on the Rust side of the codebase, and more efficiently execute queries over long time ranges.

The latest Seq 6 preview automatically migrates all existing data from Extents/* under the Seq storage root to a single store in Stream/stream.flare. The design of the native storage engine means that this (potentially) enormous merge operation can be completed efficiently, since the format breaks stream data up into discrete files that can be linked into the single, larger store.

For example, a Seq 5 event store using the native storage engine will be laid out in a scheme resembling:

Extents/
    2020-01-01_2020-01-08/
        extent.1.metadata
        extent.08d704c98d31...39441bf.index
        extent.08d704c98d31...5953aa2a.span
        extent.flare
        ...
    2020-01-08_2020-01-15/
        extent.f27980337898...7e890bc2.tick
        extent.flare
        ...

Note the timestamped folders per seven-day extent. After the merge, a single stream will be created, resembling:

Stream/
    stream.1.metadata
    stream.08d704c98d31...39441bf.index
    stream.08d704c98d31...5953aa2a.span
    stream.f27980337898...7e890bc2.tick
    stream.flare
    ...

Most of the merge operation is accomplished efficiently using file moves and renames, although it's still necessary to update the store metadata. The migration process is fast enough to execute during server start-up and shouldn't cause noticeable downtime.
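To make the "moves and renames" idea concrete, here is a minimal Python sketch of relinking per-extent data files into a single Stream folder. It is not Seq's actual migration code: the function name, the set of movable suffixes, and the extent.* to stream.* renaming rule are assumptions inferred from the folder listings above. The sketch deliberately skips the metadata and .flare files, since those need a real metadata update rather than a rename.

```python
from pathlib import Path

# Assumption: files with these suffixes are self-contained data files that can
# simply be relinked. Metadata and .flare files need a real merge instead.
MOVABLE_SUFFIXES = {".index", ".span", ".tick"}


def merge_extents(storage_root: Path) -> list[Path]:
    """Move data files from Extents/*/ into Stream/, renaming extent.* to stream.*."""
    extents_dir = storage_root / "Extents"
    stream_dir = storage_root / "Stream"
    stream_dir.mkdir(exist_ok=True)

    moved = []
    # Folder names start with an ISO date, so sorting visits them oldest first.
    for extent in sorted(p for p in extents_dir.iterdir() if p.is_dir()):
        for src in sorted(extent.iterdir()):
            if not src.name.startswith("extent.") or src.suffix not in MOVABLE_SUFFIXES:
                continue  # not handled by this sketch (e.g. extent.flare)
            dest = stream_dir / ("stream." + src.name[len("extent."):])
            if dest.exists():
                raise FileExistsError(dest)
            # On the same volume a rename only updates directory entries,
            # so no event data is copied.
            src.rename(dest)
            moved.append(dest)
    return moved
```

Because each file is renamed rather than copied, the cost of this step depends on the number of files rather than the volume of event data, which is consistent with the migration being quick enough to run at start-up.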

Events recorded by Seq 4 and earlier versions will still use the older ESENT storage format. When data resides in ESENT-backed extents the upgrade process is still automatic, but executes in the background while the server accepts new events and serves queries over the newer data.

The complete process looks like:

  1. At start-up, efficiently migrate extents using the native storage engine.
  2. If the most recent extent is ESENT-based, migrate it eagerly during start-up so that ingestion can begin as soon as possible; this may take several minutes if there's a lot of data to move, or if disk I/O is slow.
  3. Accept events/serve queries over new data, while remaining ESENT data is migrated in the background. A message will show in the Seq navigation bar while this process executes, and a service restart will be requested when it completes.
  4. On restart, merge in the migrated data, then start up normally.
  5. During the next indexing cycle, optimize storage and index the newly-migrated data. Queries over the migrated data will be slower until this process completes.
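
The first three steps amount to sorting extents into three groups. The Python sketch below illustrates that decision only; the Extent type, the "native" and "esent" labels, and the group names are hypothetical and do not reflect Seq's internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    name: str    # e.g. "2020-01-08_2020-01-15"; ISO dates sort chronologically
    format: str  # "native" or "esent" (labels are assumptions of this sketch)


def plan_migration(extents: list[Extent]) -> dict[str, list[str]]:
    """Group extents by when they would be migrated, following steps 1 to 3."""
    ordered = sorted(extents, key=lambda e: e.name)
    plan = {"startup_fast": [], "startup_eager": [], "background": []}
    for i, extent in enumerate(ordered):
        if extent.format == "native":
            # Step 1: native extents are merged cheaply at start-up.
            plan["startup_fast"].append(extent.name)
        elif i == len(ordered) - 1:
            # Step 2: an ESENT-based most recent extent is migrated eagerly.
            plan["startup_eager"].append(extent.name)
        else:
            # Step 3: older ESENT extents are converted in the background.
            plan["background"].append(extent.name)
    return plan
```

For an instance whose most recent extent is already native, the "startup_eager" group is empty, so nothing slow has to run before ingestion begins.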

Start-up delays can be avoided by ensuring that the Seq instance has run version 5.x for at least a week before upgrading to Seq 6, so that the most recent seven-day extent has already been written by the native storage engine. If you're still running Seq version 4 (or earlier!), now is a great time to upgrade.

We've just migrated our own long-running Seq instance, with data stretching back to March 2014. After a couple of hours of background processing, everything came over perfectly, and queries over signal indexes have made a huge difference to the performance of the machine. The final merge step took about an hour because of the large number of intermediate files generated in the conversion from ESENT; we'll have this optimized out in the next preview.

As you can tell, there are a few moving parts involved in the migration process. We're releasing this preview so that we can collect feedback on the experience and make sure the process is as smooth as possible for the final release version.

If you have an opportunity to try Seq 6, we'd love to hear how you go. You can grab the new version from datalust.co/download. Happy logging!

Nicholas Blumhardt
