Rust web frameworks have subpar error reporting
None of the major Rust web frameworks have a great error reporting story, according to
my personal definition of great.
I've been building production APIs in Rust for almost 6 years now, and I've been teaching people about
backend development in Rust for almost as long: I've always had to tweak, work around or
actively fight the framework to get reliable and exhaustive error reporting.
Last year I bit the bullet and started building my own web framework, Pavex. I channeled my frustration into a different error reporting design. This post sums up the journey and the rationale behind it.
You can discuss this post on r/rust.
What are errors for?
So many different things can go wrong in a networked application: the database is down (or slow), the caller
sent invalid data, you ran out of file descriptors, etc.
Every time something goes wrong, two different concerns must be addressed: reacting and reporting.
Reacting
Whoever called your API is waiting for a response!
Your application needs to convert the error into a response, using a representation that the caller can understand.
For an HTTP API, this involves selecting the most appropriate status code (e.g. 500 Internal Server Error or 400 Bad Request) and, if required, a more detailed error message in the body (e.g. an explanation of which field was invalid and why).
Reporting
At the same time, as an operator (i.e. the person responsible for keeping the application up and running), you need to have a mechanism to know that an error occurred. For example, you might track the percentage of 5xx errors to page an on-call engineer if it goes above a pre-defined threshold.
Knowing that an error occurred is not enough though: you need to know what went wrong.
When that engineer gets paged, or when you get to work in the morning, there has to be enough information
to troubleshoot the issue.
Modelling errors in Rust
Rust has two ways to model failures: panics and Result.
Panics are primarily used for unrecoverable errors, so I won't discuss them much here: you need to recover and send a response! Let's focus on Result instead.
Result is a type, an enum.
It has two variants: success (Ok) or failure (Err).
When a function can fail, it shows in its signature: it uses a Result as its return type.
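As a small illustration, here is a fallible function whose signature advertises that it can fail, together with a caller that handles both variants. The names parse_port and describe are made up for this example.

```rust
use std::num::ParseIntError;

/// Hypothetical helper: parse a TCP port from user-provided text.
/// The `Result` in the signature tells callers that parsing can fail.
fn parse_port(raw: &str) -> Result<u16, ParseIntError> {
    raw.trim().parse::<u16>()
}

/// The caller has to decide what to do with each variant.
fn describe(raw: &str) -> String {
    match parse_port(raw) {
        Ok(port) => format!("listening on port {}", port),
        Err(e) => format!("invalid port: {}", e),
    }
}
```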
There's a lot to be said about good error design as a prerequisite to good error reporting, but that'd be too much of a detour. If you want to learn more about error design, check out this previous post of mine—it builds on the same principles.
The Error trait
There are no constraints on the type of the Err variant, but it's a good practice to use a type that implements the std::error::Error trait.
std::error::Error is the cornerstone of Rust's error handling story.
It requires error types to:
- Implement the Display trait, as its user-facing representation
- Implement the Debug trait, as its operator-facing representation
- Provide a way to access the source of the error, if any
The last point is particularly important: error types are often wrappers around lower-level errors.
For example, a database connection error might be caused by a network error, which is in turn caused by a DNS resolution
issue. When troubleshooting, you want to be able to drill down into the chain of causes.
You can't fix that database connection error if your logs don't show that it was caused by a DNS resolution
issue in the first place!
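Here is a minimal sketch of what drilling down looks like with nothing but the standard library. The two error types and the source_chain helper are hypothetical, and the chain is shortened to two levels to keep the example small.

```rust
use std::error::Error;
use std::fmt;

/// Lowest-level failure: a (hypothetical) DNS resolution error.
#[derive(Debug)]
struct DnsError;

impl fmt::Display for DnsError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "failed to resolve db.internal")
    }
}

impl Error for DnsError {}

/// Higher-level failure that wraps the DNS error as its source.
#[derive(Debug)]
struct DbConnectionError {
    source: DnsError,
}

impl fmt::Display for DbConnectionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "failed to connect to the database")
    }
}

impl Error for DbConnectionError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        let cause: &(dyn Error + 'static) = &self.source;
        Some(cause)
    }
}

/// Collect the `Display` representation of every cause, outermost first.
/// The error passed in is not included, only what sits beneath it.
fn source_chain(e: &(dyn Error + 'static)) -> Vec<String> {
    let mut chain = Vec::new();
    let mut current = e.source();
    while let Some(cause) = current {
        chain.push(cause.to_string());
        current = cause.source();
    }
    chain
}
```

If your logs only record the Display output of the top-level error, you get "failed to connect to the database" and nothing else. Walking source() is what surfaces the DNS failure.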
Our benchmark
High-level requirements
Let's set some expectations to properly "benchmark" the error reporting story of different web frameworks.
At a high level, we want the following:
- All errors are logged, exactly once, with enough information to troubleshoot
- With a single log line, we can tell:
- If the request failed
- What error occurred
- What caused the error
It should be possible to ensure that these requirements are met with minimum room for error—it shouldn't be possible to forget to log an error, or to log it in a way that's inconsistent with the rest of the application.
I consider this the bare minimum telemetry setup for a production-grade application. I don't expect a web framework to provide this experience out of the box (although it'd be nice!), but I do expect it to provide the necessary hooks to build it myself.
Low-level requirements
We can convert this high-level specification into a set of concrete requirements:
- For every incoming request, there is an over-arching tracing::Span that captures the entire request lifecycle. I'll refer to this as the root span.
- Every time an error occurs, the application emits a tracing event:
- For the error that was converted into the HTTP response returned to the caller, we capture:
I've been using tracing as the structured logging library of choice here, but the same requirements can be expressed in terms of other logging libraries (and the framework should be able to integrate with them!).
Frameworks
I'll start by reviewing how Actix Web and axum, the two most popular web frameworks in the Rust ecosystem, fare against these requirements[1].
I'll then discuss Pavex's approach.
If you don't care about the details, you can skip to the conclusion to see how the frameworks compare.
axum
In axum, the following components can fail:
- Request handlers
- Extractors
- Middlewares/arbitrary tower services
axum's overall error handling approach is detailed in their documentation.
I'll focus on request handlers and extractors, as they're the most common error sources in applications.
Request handlers
In axum, request handlers are asynchronous functions that return a type that implements the IntoResponse trait.
IntoResponse
IntoResponse is a conversion trait: it specifies how to convert a type into an HTTP response.
    pub trait IntoResponse {
        fn into_response(self) -> Response<Body>;
    }
Result implements IntoResponse, as long as both the Ok and Err variants do.
Once IntoResponse::into_response has been called (by the framework), the type is gone: self is consumed.
From an error reporting perspective, this means that you can't manipulate the error anymore.
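To make the consequence concrete, here is a standard-library-only model of that conversion. The Response struct, LoginError and dispatch are simplified stand-ins invented for this sketch, not axum's actual types; only the shape of the trait mirrors the one above.

```rust
/// Simplified model of an HTTP response (not axum's type).
#[derive(Debug)]
struct Response {
    status: u16,
    body: String,
}

/// Same shape as the conversion trait above: `self` is taken by value.
trait IntoResponse {
    fn into_response(self) -> Response;
}

impl IntoResponse for String {
    fn into_response(self) -> Response {
        Response { status: 200, body: self }
    }
}

/// Hypothetical error type carrying details meant for operators.
#[derive(Debug)]
struct LoginError {
    reason: String,
}

impl IntoResponse for LoginError {
    fn into_response(self) -> Response {
        // `self.reason` is dropped here: the caller only gets a generic message.
        Response { status: 500, body: "Something went wrong".to_string() }
    }
}

impl<T: IntoResponse, E: IntoResponse> IntoResponse for Result<T, E> {
    fn into_response(self) -> Response {
        match self {
            Ok(value) => value.into_response(),
            Err(error) => error.into_response(),
        }
    }
}

/// Models what the framework does with a handler's output.
fn dispatch(output: impl IntoResponse) -> Response {
    // `output` is moved into the conversion: once this call returns,
    // only the response is left and the error details are gone.
    output.into_response()
}
```

Anything that runs after dispatch can see the status code and the body, but the reason field no longer exists anywhere.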
Extractors
Extractors are axum's dependency injection mechanism.
They're used to extract data from the request (e.g. the query string, the request body, etc.) or to reject the request if it doesn't meet certain criteria (e.g. it's missing an Authorization header).
You define an extractor by implementing either the FromRequest or the FromRequestParts traits.
    // Slightly simplified for exposition purposes
    pub trait FromRequest<S>: Sized {
        /// If the extractor fails it'll use this "rejection" type. A rejection is
        /// a kind of error that can be converted into a response.
        type Rejection: IntoResponse;

        /// Perform the extraction.
        async fn from_request(req: Request, state: &S) -> Result<Self, Self::Rejection>;
    }
Error handling works similarly to request handlers: if the extractor fails, it must return an error type that implements IntoResponse. Therefore, it suffers from the same limitations.
Can axum meet our requirements?
axum provides no mechanism to execute logic between the request handler returning an error, and that very same error being converted into an HTTP response via IntoResponse::into_response.
The same is true for extractors.
If you want to log errors, you must do it:
- In your request handler/extractor
- Inside the IntoResponse implementation
Neither is ideal.
You don't have a single place where the logging logic lives[2].
You end up with log statements spread out across the entire codebase.
It's easy for an error to slip through the cracks, unlogged, or for logging logic to evolve inconsistently over time.
Things get worse if you use error types defined in other crates: you can't add logging to their IntoResponse implementation, nor customize it if it's there. Perhaps they are emitting a tracing error event, but they aren't using the same field names or they aren't recording the source chain.
Out of the box, axum falls well short of meeting the telemetry requirements I laid down.
You can try to implement some mitigation strategies, described below, but neither is bulletproof.
Workaround #1
You can try to wrap[3] all your errors with a single custom error type (e.g. ErrorLogger<E>).
You then implement IntoResponse for the wrapper and add the logging logic there.
This still isn't a bulletproof solution:
- You may forget to wrap one of your errors with the custom error wrapper.
- You can no longer use extractors defined in other crates (including axum itself!). You need to wrap all third-party extractors to ensure they return a wrapped error.
This workaround, even if applied correctly, would still fail to meet all our requirements: from inside IntoResponse you can't access extractors, therefore you have no way to reliably access the root span for the current request and attach error details to it.
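A rough sketch of the wrapper idea, again modelled with the standard library only. Response, IntoResponse and Timeout are simplified stand-ins rather than axum's types, and the global LOG vector is an assumption standing in for a tracing event.

```rust
use std::error::Error;
use std::fmt;
use std::sync::Mutex;

/// Stand-in for a log sink; a real application would emit a `tracing` event.
static LOG: Mutex<Vec<String>> = Mutex::new(Vec::new());

/// Simplified model of an HTTP response (not axum's type).
struct Response {
    status: u16,
}

/// Same shape as the conversion trait discussed above.
trait IntoResponse {
    fn into_response(self) -> Response;
}

/// Hypothetical application error.
#[derive(Debug)]
struct Timeout;

impl fmt::Display for Timeout {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "upstream timed out")
    }
}

impl Error for Timeout {}

impl IntoResponse for Timeout {
    fn into_response(self) -> Response {
        Response { status: 504 }
    }
}

/// Wrapper that logs the error right before it is converted.
struct ErrorLogger<E>(E);

impl<E: Error + IntoResponse> IntoResponse for ErrorLogger<E> {
    fn into_response(self) -> Response {
        // The only input is `self`: no request, no extractors, no root span.
        LOG.lock().unwrap().push(format!("error.msg={}", self.0));
        self.0.into_response()
    }
}
```

Both ErrorLogger(Timeout).into_response() and Timeout.into_response() compile and produce the same response, but only the first one leaves a trace in the log. Nothing in the type system nudges you towards the wrapped version.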
Workaround #2
Later edit: this approach was suggested in the r/rust comment section.
The approach above can be refined using Response's extensions.
You still need to wrap all errors with a custom wrapper, but you don't eagerly log the error inside IntoResponse.
You instead store[4] the error in the extensions attached to the Response.
A logging middleware then tries to extract the error type from the extensions to log it.
The middleware can access the root span, coming closer to meeting our requirements.
The underlying challenges remain unresolved: there is no reliable way to ensure you wrapped all errors and you need to wrap all third-party extractors, including those defined in axum itself.
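The refinement can be modelled along these lines. This is a sketch under heavy simplification: the error field stands in for the response extensions (which are a type map in the real http crate), RootSpan is a plain list of recorded fields, and ApiError, Timeout and logging_middleware are hypothetical names.

```rust
use std::error::Error;
use std::fmt;
use std::sync::Arc;

/// Simplified model of an HTTP response (not axum's type).
struct Response {
    status: u16,
    /// Stand-in for the response extensions: at most one shared error.
    error: Option<Arc<dyn Error + Send + Sync>>,
}

trait IntoResponse {
    fn into_response(self) -> Response;
}

/// Hypothetical application error.
#[derive(Debug)]
struct Timeout;

impl fmt::Display for Timeout {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "upstream timed out")
    }
}

impl Error for Timeout {}

/// Wrapper that stashes the error in the response instead of logging it.
struct ApiError<E>(E);

impl<E: Error + Send + Sync + 'static> IntoResponse for ApiError<E> {
    fn into_response(self) -> Response {
        // An `Arc` keeps the stored value cheap to clone.
        let shared: Arc<dyn Error + Send + Sync> = Arc::new(self.0);
        Response { status: 500, error: Some(shared) }
    }
}

/// Stand-in for the root span: a list of recorded fields.
#[derive(Default)]
struct RootSpan {
    fields: Vec<(&'static str, String)>,
}

/// Runs after the handler, with access to both the response and the span.
fn logging_middleware(response: Response, span: &mut RootSpan) -> Response {
    if let Some(e) = &response.error {
        span.fields.push(("error.msg", e.to_string()));
        span.fields.push(("error.details", format!("{:?}", e)));
    }
    response
}
```

The logging logic now lives in one place and can reach the span, but it only ever sees errors that went through ApiError.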
Actix Web
In Actix Web, the following components can fail:
- Request handlers
- Extractors
- Middlewares/arbitrary Service implementations
Actix Web's overall error handling approach is detailed on their website.
Just like with axum, I'll focus on request handlers and extractors, as they're the most common error sources in applications.
Request handlers
In Actix Web, request handlers are asynchronous functions that return a type that implements the Responder trait.
Responder
    pub trait Responder {
        type Body: MessageBody + 'static;

        // Required method
        fn respond_to(self, req: &HttpRequest) -> HttpResponse<Self::Body>;
    }
Responder is a conversion trait: it specifies how to convert a type into an HTTP response.
Just like axum's IntoResponse, once Responder::respond_to has been called (by the framework), the type is gone: self is consumed.
Result implements Responder, as long as:
- the Ok variant implements Responder
- the Err variant implements the ResponseError trait
ResponseError
ResponseError is another conversion trait, specialised for errors: it provides a cheap way to check the status code of the resulting response without having to build it wholesale.
    pub trait ResponseError: Debug + Display {
        fn status_code(&self) -> StatusCode;
        fn error_response(&self) -> HttpResponse<BoxBody>;
    }
Notice one key detail: neither status_code nor error_response consumes self. They both take a reference to the error type as input. You might be thinking: "It doesn't matter, Responder::respond_to consumes self anyway, so we can't log the error anymore!"
But here comes the twist: HttpResponse::error.
HttpResponse::error
In Actix Web, when an HttpResponse is built from an error (via HttpResponse::from_error), the error is stored as part of the response. You can still access the error after the response has been built!
Extractors
In Actix Web, extractors are types that implement the FromRequest trait.
In terms of error handling, they work similarly to request handlers: if the extractor fails, it must return an error type that can be converted into actix_web::Error, which is in turn converted into HttpResponse via its ResponseError implementation.
Can Actix Web meet our requirements?
Almost.
You can write an Actix Web middleware that checks if the current response bundles an error and, if so, logs it.
That's exactly what I did in tracing-actix-web.
tracing-actix-web was indeed built to meet the requirements I set at the beginning of this post, but it falls short: only the last error is going to be logged.
You can see why that's the case by following this scenario:
- A request handler returns an error
- The error is converted into an HTTP response and stored in the response
- The response passes through an unrelated middleware, which fails[5] and builds a new response from the new error
- The logging middleware sees the final response and logs the last error
The logging middleware never gets a chance to see the first error since the corresponding response has been thrown away. This is unfortunately a fundamental limitation of Actix Web's current error handling design.
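The scenario can be replayed with a tiny standard-library model. None of these types are Actix Web's: Response only imitates the idea of a response that carries the error it was built from, and the function names are invented for the sketch.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical application error.
#[derive(Debug)]
struct AppError(&'static str);

impl fmt::Display for AppError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl Error for AppError {}

/// Model of a response that remembers the error it was built from.
struct Response {
    status: u16,
    error: Option<Box<dyn Error>>,
}

fn response_from_error(status: u16, e: AppError) -> Response {
    let boxed: Box<dyn Error> = Box::new(e);
    Response { status, error: Some(boxed) }
}

/// An unrelated middleware that fails after the handler has run.
fn failing_middleware(downstream: Response) -> Response {
    // The handler's response, and the error stored inside it, are discarded.
    drop(downstream);
    response_from_error(500, AppError("session store unavailable"))
}

/// The outermost middleware: it can only inspect the final response.
fn logging_middleware(response: &Response) -> Vec<String> {
    response.error.iter().map(|e| e.to_string()).collect()
}
```

Feeding a handler response built from AppError("invalid credentials") through failing_middleware and then into logging_middleware yields a single entry, "session store unavailable". The first error is never observed.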
Pavex
Pavex is a new web framework I'm building. It's currently going through a private beta, but you can find the documentation here.
In Pavex, the following components can fail:
- Request handlers
- Constructors (i.e. our equivalent of extractors)
- Middlewares
You can find a detailed overview of the error handling story in the documentation.
Error requirements
There is only one requirement for errors in Pavex:
it must be possible to convert them into a pavex::Error via pavex::Error::new.
All errors that implement the std::error::Error trait can be converted into a pavex::Error, as well as some other types that can't implement it directly, e.g. anyhow::Error or eyre::Report.
IntoResponse
Pavex, just like Actix Web and axum, has a conversion trait that specifies how to convert a type into an HTTP response: IntoResponse.
    pub trait IntoResponse {
        // Required method
        fn into_response(self) -> Response;
    }
There's a key difference though: IntoResponse is not implemented for Result.
Error handlers
To convert an error into an HTTP response, you must register an error handler.
    use pavex::blueprint::router::POST;
    use pavex::blueprint::Blueprint;
    use pavex::f;

    pub fn blueprint() -> Blueprint {
        let mut bp = Blueprint::new();
        // The `handler` for the `/login` route returns a `Result`
        bp.route(POST, "/login", f!(crate::core::handler))
            // We specify which function should be called to
            // convert the error into an HTTP response
            .error_handler(f!(crate::core::login_error2response));
        // [...]
    }
An error handler is a function or method that takes a reference to the error type and returns a type that implements IntoResponse.
    use pavex::http::StatusCode;

    pub async fn login_error2response(e: &LoginError) -> StatusCode {
        match e {
            LoginError::InvalidCredentials => StatusCode::UNAUTHORIZED,
            LoginError::DatabaseError => StatusCode::INTERNAL_SERVER_ERROR,
        }
    }
Error observers
After Pavex has generated an HTTP response from the error, using the error handler you registered, it converts your concrete error type into a pavex::Error and invokes your error observers.
    pub async fn log_error(e: &pavex::Error) {
        tracing::error!("An error occurred: {}", e);
    }
An error observer is a function or method that takes a reference to pavex::Error as input and returns nothing.
They are designed for error reporting—e.g. you can use them to log errors, increment a metric counter, etc.
You can register as many error observers as you want, and they will all be invoked in the order they were registered:
    use pavex::blueprint::router::POST;
    use pavex::blueprint::Blueprint;
    use pavex::f;

    pub fn blueprint() -> Blueprint {
        let mut bp = Blueprint::new();
        bp.error_observer(f!(crate::core::log_error));
        // [...]
    }
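To show why borrowing the error in the handler matters, here is a conceptual model of the react-then-report flow described above, written against the standard library only. It is not the code Pavex generates: AnyError stands in for pavex::Error, status codes are plain integers, and dispatch, Observer and the sink vector are assumptions made for the sketch.

```rust
use std::error::Error;
use std::fmt;

#[derive(Debug)]
enum LoginError {
    InvalidCredentials,
    DatabaseError,
}

impl fmt::Display for LoginError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LoginError::InvalidCredentials => write!(f, "invalid credentials"),
            LoginError::DatabaseError => write!(f, "database error"),
        }
    }
}

impl Error for LoginError {}

/// Type-erased error, standing in for the framework-level error type.
type AnyError = Box<dyn Error>;

/// An observer borrows the erased error and writes to a shared sink.
type Observer = fn(&AnyError, &mut Vec<String>);

/// Reacting: the error handler only borrows the error.
fn error_handler(e: &LoginError) -> u16 {
    match e {
        LoginError::InvalidCredentials => 401,
        LoginError::DatabaseError => 500,
    }
}

fn log_error(e: &AnyError, sink: &mut Vec<String>) {
    sink.push(format!("error.msg={}", e));
}

fn count_error(_e: &AnyError, sink: &mut Vec<String>) {
    sink.push("errors_total+=1".to_string());
}

/// Models the dispatch: build the response first, then report.
fn dispatch(outcome: Result<u16, LoginError>, observers: &[Observer], sink: &mut Vec<String>) -> u16 {
    match outcome {
        Ok(status) => status,
        Err(e) => {
            let status = error_handler(&e);
            // The error is still alive here, so it can be erased and
            // handed to every observer, in registration order.
            let erased: AnyError = Box::new(e);
            for observer in observers {
                observer(&erased, sink);
            }
            status
        }
    }
}
```

Because the handler never takes ownership, the reporting step does not depend on individual error types or call sites remembering to log.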
Can Pavex meet our requirements?
Yes!
Pavex invokes error observers for every error that occurs: by construction, you simply can't forget an error along the way.
Error observers can take advantage of dependency injection, therefore they can access the root span for the current request and attach error details to it.
That's exactly what happens in the starter project generated by pavex new, using the following error observer:
    pub async fn log_error(e: &pavex::Error, root_span: &RootSpan) {
        let source_chain = error_source_chain(e);
        // Emit an error event
        tracing::error!(
            error.msg = %e,
            error.details = ?e,
            error.source_chain = %source_chain,
            "An error occurred during request handling",
        );
        // Attach the error details to the root span
        // If multiple errors occur, the details of the last one will "prevail"
        root_span.record("error.msg", tracing::field::display(e));
        root_span.record("error.details", tracing::field::debug(e));
        root_span.record("error.source_chain", error_source_chain(e));
    }
That's all you need to meet the requirements I set at the beginning of this post.
No workarounds, no sharp edges, no corner cases.
Conclusion
It is not possible to fully and reliably satisfy our telemetry requirements with either axum or Actix Web.
Actix Web comes much closer though: that's why I still recommend Actix Web over axum when people ask me for advice on which Rust web framework to use for their next project.
Solid error reporting is that important to me.
Pavex, on the other hand, easily meets all the requirements.
It's not a coincidence: I've been building it with these requirements in mind from day one, making error reporting a first-class concern. I'm confident in saying that, right now, Pavex has the best error reporting story in the Rust web ecosystem.
Nonetheless, there is no intrinsic limitation preventing Actix Web or axum from converging to a similar design (or perhaps a new one!) to resolve the issues I've highlighted in this post.
I sincerely hope that happens—the main advantage of having different frameworks is the constant cross-pollination
of ideas and the pressure to improve.
You can also follow the development of Pavex on GitHub.
Footnotes
[1] I originally wanted to include Rocket in this comparison, but I quickly realised that it doesn't provide enough hooks to even wrap a tracing::Span around the request-handling Future. That's a prerequisite to a correct implementation of structured logging, so there's no point in going further without it.
[2] If you're using TraceLayer, from tower_http, you might be wondering: isn't that enough? Isn't that the single place? Unfortunately, TraceLayer::on_failure doesn't get to see the error, it only looks at the response generated by the error!
[3] There's another variation of this approach: you return the same error type (e.g. ApiError) from all your extractors, request handlers and middlewares. The two approaches are fundamentally equivalent.
[4] What's stored inside Extensions has to be clonable. This can be solved by wrapping the original error inside an Arc.
[5] There's another issue with failures in Actix Web middlewares, but it'd take forever to get into the details and explain it. The TL;DR is that the invocation of the downstream portion of the middleware stack should return a Response but it now returns a Result, creating a weird separate track for errors that's hard to integrate with the overall error handling story.