The concept of log event "levels" is ubiquitous, appearing in practically every application logging library and language. We have an intuitive sense of what familiar levels like `error` mean, but there's surprisingly little useful advice out there on how to use levels effectively. Here's one tip based on what I've learned building systems and helping people to "log better" these last eight years:
Enable the same logging levels in development as you do in production.
The usual advice
Logging levels are usually considered to be in order of importance: turn on "unimportant" levels in development (`debug` and the like), but enable only the "most important" levels (`error`, etc.) in production, where resources like CPU time and disk space are precious.
We're led in this direction all the time; I started penning this post when I created a new .NET application (`dotnet new web`) and realized that even the default `appsettings.Development.json` implies that you'll want to set different levels for development and production:
Development-time overrides in the .NET 6 web application template.
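From memory, the development-time override file looks something like this — treat it as illustrative, since the exact categories and defaults vary between SDK versions:

```json
{
  "Logging": {
    "LogLevel": {
      "Default": "Information",
      "Microsoft.AspNetCore": "Warning"
    }
  }
}
```

The mere existence of a separate `Development` settings file nudges you toward maintaining two different sets of levels.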
There are several variations on this theme — turning off events in some environments, on in others — but for most situations, I think they're all similarly wrong.
The vital role of `info`
To get a clear picture of why I steer away from this approach to logging levels, let's focus on the `info` level for a moment. At first it seems a bit of an anomaly, since it's neutral: it doesn't say anything particular about the events it's attached to, the way levels like `error` and `debug` do.
As a developer, what does `info` mean to me? This is the minimal information I need to understand the state of this system. What's the system actually doing right now? Information-level logs tell me that.
Fair? So then…
In development, should I run with more detailed logs so that I can watch how the app is executing, as I build it?
❌ Probably not: if you can't follow the internal state of the system using the logs you'll get in production, add more or better information-level logging. You'll be glad for having invested the extra effort the very next time you have a production issue to resolve.
The inverse: in production, should I turn off the `info` level to save some (network | CPU | storage) capacity?
❌ Probably not: those events exist to make the system's state observable. When things go wrong you'll want to have captured those logs to minimize time-to-recovery.
So what are the roles of the levels above `info`?
The levels generally considered "above" `info` are flags that make consuming a log stream easier. They're not something to turn on or off: for someone tasked with monitoring an app or debugging a problem, events tagged with these levels have special meaning and stand out as likely starting points for investigation.
And what are the roles of `debug` and below?
These are the levels of last resort. Highly-detailed logging can be useful at development time when all other avenues (information-level logs, assertions, debugging, unit testing) have failed. Levels "below" `debug` can be handy in rare circumstances, but for me this is far from the norm.
In production, it's unlikely that logs can be collected at this level without annihilating performance or eating up storage, so they're not as useful as they first appear. Don't get too attached to having debug logs available: in production, you probably won't. And the easiest way to avoid overattachment? Avoid using them at development time, too.
So then I need to very carefully consider what I log as `info`?
You got it 🙂.
There are lots of ways to improve the quality of informational log events without a big increase in storage or processing requirements.
- Structured log events in particular can carry additional properties very efficiently, leading to patterns like wide events.
- Bookended "starting x"/"finishing x" events can generally be replaced with a single "finishing x" that relies on exception handling to always run.
- Lots of events can be dropped out of information-level logs altogether, if the conditions they describe have other observable effects.
- Counters and summaries can help lift repetitive events out of tight loops.
Once you start really trying to write great information-level logs you'll find all kinds of low-hanging fruit.
But we need tera|peta|exa-scale!
If you manage a system at significant scale, you'll already have grown observability practices and strategies that work for you, no doubt more sophisticated than what we're talking about here. Levels are by no means the end of the story, and in busy systems you'll likely employ techniques like ingestion filters and source-specific level overrides to cut down noise. The important thing is to begin with well-designed, production-worthy logs that carry all the diagnostic information you need from your own applications and services.
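To make the overrides idea concrete: in .NET's `appsettings`-style configuration, a per-source override might look something like this — the noisy category name is a made-up example, and the point is that a file like this can stay identical across environments:

```json
{
  "Logging": {
    "LogLevel": {
      "Default": "Information",
      "Noisy.Component.HealthChecks": "Warning"
    }
  }
}
```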
If you find time to write about how you determine which events are worth recording and which are not, I'd love to read about it.
So should I use the same logging levels in development as in production?
✅ Yes: that's the tip in a nutshell.
Hope you find this info useful! (Pun only half-intended 😉.)