Rust web frameworks have subpar error reporting
None of the major Rust web frameworks have a great error reporting story, according to
my personal definition of great.
I've been building production APIs in Rust for almost 6 years now, and I've been teaching people about 
backend development in Rust for almost as long: I've always had to tweak, work around, or 
actively fight the framework to get reliable and exhaustive error reporting.
Last year I bit the bullet and started building my own web framework, Pavex. I channeled my frustration into a different error reporting design. This post sums up the journey and the rationale behind it.
You can discuss this post on r/rust.
What are errors for?
So many different things can go wrong in a networked application: the database is down (or slow), the caller
sent invalid data, you ran out of file descriptors, etc.
Every time something goes wrong, two different concerns must be addressed: reacting and reporting.
Reacting
Whoever called your API is waiting for a response!
Your application needs to convert the error into a response, using a representation that the caller can understand.
For an HTTP API, this involves selecting the most appropriate status code (e.g. 500 Internal Server Error or 
400 Bad Request) and, if required, a more detailed error message in the body (e.g. an explanation of which
field was invalid and why).
Reporting
At the same time, as an operator (i.e. the person responsible for keeping the application up and running), you need to have a mechanism to know that an error occurred. For example, you might track the percentage of 5xx errors to page an on-call engineer if it goes above a pre-defined threshold.
Knowing that an error occurred is not enough though: you need to know what went wrong.
When that engineer gets paged, or when you get to work in the morning, there has to be enough information
to troubleshoot the issue.
Modelling errors in Rust
Rust has two ways to model failures: panics and Result.
Panics are primarily used for unrecoverable errors, so I won't discuss them much here—you need to recover and
send a response! Let's focus on Result instead.
Result is a type, an enum.
It has two variants: success (Ok) or failure (Err).
When a function can fail, it shows in its signature: it uses a Result as its return type.
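Here's a minimal sketch of what that looks like in practice. The function names (`parse_port`, `describe`) are made up for illustration; only the standard library is used.

```rust
use std::num::ParseIntError;

/// Parse a TCP port from user input.
/// The `Result` in the signature tells every caller that this can fail.
pub fn parse_port(raw: &str) -> Result<u16, ParseIntError> {
    raw.trim().parse::<u16>()
}

/// The caller has to deal with both variants before it can use the value.
pub fn describe(raw: &str) -> String {
    match parse_port(raw) {
        Ok(port) => format!("listening on port {port}"),
        Err(e) => format!("invalid port: {e}"),
    }
}
```

The compiler won't let `describe` use the port without going through `Ok` first, which is what makes the failure mode hard to ignore.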
There's a lot to be said about good error design as a prerequisite to good error reporting, but that'd be too much of a detour. If you want to learn more about error design, check out this previous post of mine—it builds on the same principles.
The Error trait
There are no constraints on the type of the Err variant, but it's a good practice to use a type that implements
the std::error::Error trait.
std::error::Error is the cornerstone of Rust's error handling story.
It requires error types to:
- Implement the Display trait, as its user-facing representation
- Implement the Debug trait, as its operator-facing representation
- Provide a way to access the source of the error, if any
The last point is particularly important: error types are often wrappers around lower-level errors.
For example, a database connection error might be caused by a network error, which is in turn caused by a DNS resolution
issue. When troubleshooting, you want to be able to drill down into the chain of causes.
You can't fix that database connection error if your logs don't show that it was caused by a DNS resolution
issue in the first place!
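To make the "chain of causes" concrete, here's a small sketch that walks it using `Error::source`. The `Layered` type, the `database_error` constructor and the `source_chain` helper are hypothetical names chosen for this example; only `std::error::Error` and its `source` method come from the standard library.

```rust
use std::error::Error;
use std::fmt;

/// A hypothetical error type: a message plus an optional lower-level cause.
#[derive(Debug)]
pub struct Layered {
    msg: &'static str,
    cause: Option<Box<Layered>>,
}

impl fmt::Display for Layered {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.msg)
    }
}

impl Error for Layered {
    // Expose the wrapped error so that callers can drill down.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        self.cause.as_deref().map(|c| c as &(dyn Error + 'static))
    }
}

/// Build the scenario from the text: database <- network <- DNS.
pub fn database_error() -> Layered {
    let dns = Layered { msg: "DNS resolution failed", cause: None };
    let network = Layered { msg: "network unreachable", cause: Some(Box::new(dns)) };
    Layered { msg: "failed to connect to the database", cause: Some(Box::new(network)) }
}

/// Collect the `Display` representation of every error in the chain,
/// starting from the outermost one.
pub fn source_chain(e: &(dyn Error + 'static)) -> Vec<String> {
    let mut chain = Vec::new();
    let mut current = Some(e);
    while let Some(err) = current {
        chain.push(err.to_string());
        current = err.source();
    }
    chain
}
```

If you only log the `Display` output of the outermost error, you get the first entry of that vector and nothing else. The DNS failure never makes it to your logs.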
Our benchmark
High-level requirements
Let's set some expectations to properly "benchmark" the error reporting story of different web frameworks.
At a high level, we want the following:
- All errors are logged, exactly once, with enough information to troubleshoot
- With a single log line, we can tell:
- If the request failed
- What error occurred
- What caused the error
 
It should be possible to ensure that these requirements are met with minimum room for error—it shouldn't be possible to forget to log an error, or to log it in a way that's inconsistent with the rest of the application.
I consider this the bare minimum telemetry setup for a production-grade application. I don't expect a web framework to provide this experience out of the box (although it'd be nice!), but I do expect it to provide the necessary hooks to build it myself.
Low-level requirements
We can convert this high-level specification into a set of concrete requirements:
- For every incoming request, there is an over-arching tracing::Span that captures the entire request lifecycle. I'll refer to this as the root span.
- Every time an error occurs, the application emits a tracing event:
- For the error that was converted into the HTTP response returned to the caller, we capture:
I've been using tracing as the structured logging library of choice here,
but the same requirements can be expressed in terms of other logging libraries (and the framework should 
be able to integrate with them!).
Frameworks
I'll start by reviewing how Actix Web and axum, the two most popular web frameworks in the Rust ecosystem, fare against these
requirements[1].
I'll then discuss Pavex's approach.
If you don't care about the details, you can skip to the conclusion to see how the frameworks compare.
axum
In axum, the following components can fail:
- Request handlers
- Extractors
- Middlewares/arbitrary tower services
axum's overall error handling approach is detailed in their documentation.
I'll focus on request handlers and extractors, as they're the most common error sources in applications.
Request handlers
In axum, request handlers are asynchronous functions that return a type that implements 
the IntoResponse trait.
IntoResponse
IntoResponse is a conversion trait: it specifies how to convert a type into an HTTP response.
pub trait IntoResponse {
    fn into_response(self) -> Response<Body>;
}
Result implements IntoResponse, as long as
both the Ok and Err variants do.
Once IntoResponse::into_response has been called (by the framework), the type is gone—self is consumed.
From an error reporting perspective, this means that you can't manipulate the error anymore.
Extractors
Extractors are axum's dependency injection mechanism.
They're used to extract data from the request (e.g. the query string, the request body, etc.) or to reject
the request if it doesn't meet certain criteria (e.g. it's missing an Authorization header).
You define an extractor by implementing either the FromRequest
or the FromRequestParts traits.
// Slightly simplified for exposition purposes
pub trait FromRequest<S>: Sized {
    /// If the extractor fails it'll use this "rejection" type. A rejection is
    /// a kind of error that can be converted into a response.
    type Rejection: IntoResponse;
    /// Perform the extraction.
    async fn from_request(req: Request, state: &S) -> Result<Self, Self::Rejection>;
}
Error handling works similarly to request handlers: if the extractor fails, it must return an error type that implements
IntoResponse. Therefore, it suffers from the same limitations.
Can axum meet our requirements?
axum provides no mechanism to execute logic between the request handler returning an error, and that very same
error being converted into an HTTP response via IntoResponse::into_response.
The same is true for extractors.
If you want to log errors, you must do it:
- In your request handler/extractor
- Inside the IntoResponse implementation
Neither is ideal.
You don't have a single place where the logging logic lives[2].
You end up with log statements spread out across the entire codebase.
It's easy for an error to slip through the cracks, unlogged, or for logging logic to evolve inconsistently over time.
Things get worse if you use error types defined in other crates—you can't add logging to their IntoResponse implementation,
nor customize it if it's there. Perhaps they are emitting a tracing error event, but they aren't using the same field names or they aren't 
recording the source chain.
Out of the box, axum falls well short of meeting the telemetry requirements I laid out.
You can try the two mitigation strategies described below, but neither is bulletproof.
Workaround #1
You can try to wrap[3] all your errors with a single custom error type (e.g. ErrorLogger<E>).
You then implement IntoResponse for the wrapper and add the logging logic there.
This still isn't a bulletproof solution:
- You may forget to wrap one of your errors with the custom error wrapper.
- You can no longer use extractors defined in other crates (including axum itself!). You need to wrap all third-party extractors to ensure they return a wrapped error.
This workaround, even if applied correctly, would still fail to meet all our requirements: 
from inside IntoResponse you can't access extractors, therefore
you have no way to reliably access the root span for the current request and attach error details to it.
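A rough sketch of the wrapper idea follows. To keep it self-contained, `Response` and `IntoResponse` are simplified stand-ins defined locally, not axum's real types, and `eprintln!` stands in for a `tracing::error!` call. `InvalidCredentials`, `login_handler` and `log_line` are hypothetical names.

```rust
use std::error::Error;
use std::fmt;

// Simplified stand-ins for the framework's types (NOT axum's definitions).
pub struct Response {
    pub status: u16,
}

pub trait IntoResponse {
    fn into_response(self) -> Response;
}

/// The wrapper: it logs the inner error right before converting it.
pub struct ErrorLogger<E>(pub E);

/// The one place where the log format is decided.
pub fn log_line(e: &dyn Error) -> String {
    format!("error.msg={e} error.details={e:?}")
}

impl<E> IntoResponse for ErrorLogger<E>
where
    E: Error + IntoResponse,
{
    fn into_response(self) -> Response {
        // Assumption: stderr stands in for the real logging backend.
        eprintln!("{}", log_line(&self.0));
        // `self` is consumed here: this is the last chance to look at the error.
        self.0.into_response()
    }
}

/// A hypothetical application error.
#[derive(Debug)]
pub struct InvalidCredentials;

impl fmt::Display for InvalidCredentials {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("invalid credentials")
    }
}

impl Error for InvalidCredentials {}

impl IntoResponse for InvalidCredentials {
    fn into_response(self) -> Response {
        Response { status: 401 }
    }
}

/// Every handler has to remember to return the wrapped error.
pub fn login_handler() -> Result<Response, ErrorLogger<InvalidCredentials>> {
    Err(ErrorLogger(InvalidCredentials))
}
```

Nothing in the type system stops a handler from returning a bare `InvalidCredentials` instead of the wrapped version. And `into_response` receives only `self`, so there's nothing request-scoped (like the root span) to attach the error details to.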
Workaround #2
Later edit: this approach was suggested in the r/rust comment section.
The approach above can be refined using Response's extensions.
You still need to wrap all errors with a custom wrapper,
but you don't eagerly log the error inside IntoResponse. 
You instead store[4] the error in the extensions attached to the Response.
A logging middleware then tries to extract the error type from the extensions to log it.
The middleware can access the root span, coming closer to meeting our requirements.
The underlying challenges remain unresolved: there is no reliable way to ensure you wrapped 
all errors and you need to wrap all third-party extractors, including those defined in axum itself. 
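The shape of this second workaround can be sketched as follows. Again, everything here is a simplified local stand-in: the real response extensions are a type map, while this toy `Response` has a single `error` slot, and a `Vec<String>` plays the role of the root span. `Stashed`, `DatabaseDown` and `logging_middleware` are hypothetical names.

```rust
use std::error::Error;
use std::fmt;
use std::sync::Arc;

/// Errors are shared via `Arc` so that the stored value can be cloned.
pub type SharedError = Arc<dyn Error + Send + Sync>;

// Stand-in for a response with extensions: one slot for the error.
pub struct Response {
    pub status: u16,
    pub error: Option<SharedError>,
}

pub trait IntoResponse {
    fn into_response(self) -> Response;
}

/// The wrapper no longer logs: it stashes the error in the response.
pub struct Stashed<E>(pub E);

impl<E: Error + Send + Sync + 'static> IntoResponse for Stashed<E> {
    fn into_response(self) -> Response {
        let shared: SharedError = Arc::new(self.0);
        Response { status: 500, error: Some(shared) }
    }
}

/// Runs after the handler. `log` stands in for the root span.
pub fn logging_middleware(response: Response, log: &mut Vec<String>) -> Response {
    if let Some(e) = &response.error {
        log.push(format!("status={} error.msg={}", response.status, e));
    }
    response
}

/// A hypothetical application error.
#[derive(Debug)]
pub struct DatabaseDown;

impl fmt::Display for DatabaseDown {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("the database is down")
    }
}

impl Error for DatabaseDown {}
```

The middleware only sees errors that went through `Stashed`. A response built from an unwrapped error arrives with an empty slot and passes through silently.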
Actix Web
In Actix Web, the following components can fail:
- Request handlers
- Extractors
- Middlewares/arbitrary Service implementations
Actix Web's overall error handling approach is detailed on their website.
Just like with axum, I'll focus on request handlers and extractors,
as they're the most common error sources in applications.
Request handlers
In Actix Web, request handlers are asynchronous functions that return a type that implements
the Responder trait.
Responder
pub trait Responder {
    type Body: MessageBody + 'static;
    // Required method
    fn respond_to(self, req: &HttpRequest) -> HttpResponse<Self::Body>;
}
Responder is a conversion trait: it specifies how to convert a type into an HTTP response.
Just like axum's IntoResponse, once Responder::respond_to has been called (by the framework), 
the type is gone—self is consumed.
Result implements Responder, as long as:
- the Ok variant implements Responder
- the Err variant implements the ResponseError trait
ResponseError
ResponseError is another conversion trait, specialised for errors—it provides a cheap way
to check the status code of the resulting response without having to build it wholesale.
pub trait ResponseError: Debug + Display {
    fn status_code(&self) -> StatusCode;
    fn error_response(&self) -> HttpResponse<BoxBody>;
}
Notice one key detail: neither status_code nor error_response consume self. They both take a reference to the
error type as input. You might be thinking: "It doesn't matter, Responder::respond_to consumes self anyway, so we can't
log the error anymore!"
But here comes the twist: HttpResponse::error
HttpResponse::error
In Actix Web, when an HttpResponse is built from an error 
(via HttpResponse::from_error),
the error is stored as part of the response. You can still access the error after the response has been built!
Extractors
In Actix Web, extractors are types that implement the FromRequest trait.
In terms of error handling, they work similarly to request handlers:
if the extractor fails, it must return an error type that can be converted into actix_web::Error
which is in turn converted into HttpResponse via its ResponseError implementation.
Can Actix Web meet our requirements?
Almost.
You can write an Actix Web middleware that checks if the current response bundles an error and, if so, logs it.
That's exactly what I did in tracing-actix-web.
tracing-actix-web was indeed built to meet the requirements I set at the beginning of this post, but
it falls short: only the last error is going to be logged.
You can see why that's the case by following this scenario:
- A request handler returns an error
- The error is converted into an HTTP response and stored in the response
- The response passes through an unrelated middleware, which fails[5] and builds a new response from the new error
- The logging middleware sees the final response and logs the last error
The logging middleware never gets a chance to see the first error since the corresponding response has been thrown away. This is unfortunately a fundamental limitation of Actix Web's current error handling design.
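The scenario above can be reproduced with a toy model. This is not Actix Web code: `Response` is a local stand-in that remembers the error it was built from (as a plain `String`), and the handler and middleware names are invented for the example.

```rust
// Stand-in for a response that carries the error it was built from.
pub struct Response {
    pub status: u16,
    pub error: Option<String>,
}

impl Response {
    pub fn from_error(status: u16, error: &str) -> Self {
        Response { status, error: Some(error.to_string()) }
    }
}

/// Step 1 and 2: the handler fails, the error is stored in the response.
pub fn handler() -> Response {
    Response::from_error(500, "database query failed")
}

/// Step 3: an unrelated middleware fails on the way out and builds
/// a brand new response from its own error.
pub fn failing_middleware(_inner: Response) -> Response {
    // `_inner`, and the error it carried, is dropped here.
    Response::from_error(500, "failed to compress the response body")
}

/// Step 4: the logging middleware only sees whatever reaches it.
pub fn logging_middleware(response: Response, log: &mut Vec<String>) -> Response {
    if let Some(e) = &response.error {
        log.push(e.clone());
    }
    response
}
```

Running `handler`, then `failing_middleware`, then `logging_middleware` leaves a single entry in the log, and it's the compression failure. The database error is gone by the time anyone looks.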
Pavex
Pavex is a new web framework I'm building. It's currently going through a private beta, but you can find the documentation here.
In Pavex, the following components can fail:
- Request handlers
- Constructors (i.e. our equivalent of extractors)
- Middlewares
You can find a detailed overview of the error handling story in the documentation.
Error requirements
There is only one requirement for errors in Pavex:
it must be possible to convert them into a pavex::Error via pavex::Error::new.
All errors that implement the std::error::Error trait can be converted into a pavex::Error,
as well as some other types
that can't implement it directly—e.g. anyhow::Error or
eyre::Report.
IntoResponse
Pavex, just like Actix Web and axum, has a conversion trait that specifies how to convert a type into an HTTP response:
IntoResponse.
pub trait IntoResponse {
    // Required method
    fn into_response(self) -> Response;
}
There's a key difference though: IntoResponse is not implemented for Result.
Error handlers
To convert an error into an HTTP response, you must register an error handler.
use pavex::blueprint::router::POST;
use pavex::blueprint::Blueprint;
use pavex::f;
pub fn blueprint() -> Blueprint {
    let mut bp = Blueprint::new();
    // The `handler` for the `/login` route returns a `Result`
    bp.route(POST, "/login", f!(crate::core::handler))
        // We specify which function should be called to
        // convert the error into an HTTP response
        .error_handler(f!(crate::core::login_error2response));
    // [...]
}
An error handler is a function or method that takes a reference to the error type and returns a type that implements
IntoResponse.
use pavex::http::StatusCode;
pub async fn login_error2response(e: &LoginError) -> StatusCode  {
    match e {
        LoginError::InvalidCredentials => StatusCode::UNAUTHORIZED,
        LoginError::DatabaseError => StatusCode::INTERNAL_SERVER_ERROR,
    }
}
Error observers
After Pavex has generated an HTTP response from the error, using the error handler you registered, it converts 
your concrete error type into a pavex::Error and
invokes your error observers.
pub async fn log_error(e: &pavex::Error) {
    tracing::error!("An error occurred: {}", e);
}
An error observer is a function or method that takes a reference to pavex::Error as input 
and returns nothing.
They are designed for error reporting—e.g. you can use them to log errors, increment a metric counter, etc.
You can register as many error observers as you want, and they will all be invoked in the order they were registered:
use pavex::blueprint::router::POST;
use pavex::blueprint::Blueprint;
use pavex::f;
pub fn blueprint() -> Blueprint {
    let mut bp = Blueprint::new();
    bp.error_observer(f!(crate::core::log_error));
    // [...]
}
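To see why taking the error by reference matters, here's a toy model of the sequence described above: build the response first, then hand the very same error to every observer. This is not Pavex's generated code. `Response`, `Observer` and `handle_failure` are local stand-ins, a `&dyn Error` plays the role of `pavex::Error`, and a `Vec<String>` replaces the logging backend.

```rust
use std::error::Error;
use std::fmt;

// Local stand-in for an HTTP response.
pub struct Response {
    pub status: u16,
}

#[derive(Debug)]
pub enum LoginError {
    InvalidCredentials,
    DatabaseError,
}

impl fmt::Display for LoginError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LoginError::InvalidCredentials => f.write_str("invalid credentials"),
            LoginError::DatabaseError => f.write_str("database error"),
        }
    }
}

impl Error for LoginError {}

/// Error handler: it borrows the error, so the error outlives the call.
pub fn login_error2response(e: &LoginError) -> Response {
    match e {
        LoginError::InvalidCredentials => Response { status: 401 },
        LoginError::DatabaseError => Response { status: 500 },
    }
}

/// Observers see a type-erased error and return nothing.
pub type Observer = fn(&(dyn Error + 'static), &mut Vec<String>);

pub fn log_error(e: &(dyn Error + 'static), log: &mut Vec<String>) {
    log.push(format!("An error occurred: {e}"));
}

/// Toy version of the glue between handler and observers.
pub fn handle_failure(e: LoginError, observers: &[Observer], log: &mut Vec<String>) -> Response {
    // 1. Build the response from a reference to the concrete error.
    let response = login_error2response(&e);
    // 2. The error is still alive: pass it to each observer, in order.
    for observer in observers {
        observer(&e, log);
    }
    response
}
```

Because step 1 only borrows, there's always an error left to report in step 2. Compare that with a conversion that takes `self`, where the error is gone as soon as the response exists.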
Can Pavex meet our requirements?
Yes!
Pavex invokes error observers for every error that occurs—by construction, you simply can't forget an error 
along the way.
Error observers can take advantage of dependency injection, so they can access the root span for the current request
and attach error details to it.
That's exactly what happens in the starter project generated by pavex new, using the following error observer:
pub async fn log_error(e: &pavex::Error, root_span: &RootSpan) {
    let source_chain = error_source_chain(e);
    // Emit an error event
    tracing::error!(
        error.msg = %e,
        error.details = ?e,
        error.source_chain = %source_chain,
        "An error occurred during request handling",
    );
    // Attach the error details to the root span
    // If multiple errors occur, the details of the last one will "prevail"
    root_span.record("error.msg", tracing::field::display(e));
    root_span.record("error.details", tracing::field::debug(e));
    root_span.record("error.source_chain", error_source_chain(e));
}
That's all you need to meet the requirements I set at the beginning of this post.
No workarounds, no sharp edges, no corner cases.
Conclusion
It is not possible to fully and reliably satisfy our telemetry requirements with 
either axum or Actix Web.
Actix Web comes much closer though: that's why I still recommend Actix Web over axum when people ask me for advice
on which Rust web framework to use for their next project.
Solid error reporting is that important to me.
Pavex, on the other hand, easily meets all the requirements.
It's not a coincidence: I've been building it with these requirements in mind from day one, making error reporting 
a first-class concern. I'm confident in saying that, right now, Pavex has the best error
reporting story in the Rust web ecosystem.
Nonetheless, there is no intrinsic limitation preventing Actix Web or axum from converging to a similar design (or 
perhaps a new one!) to resolve the issues I've highlighted in this post.
I sincerely hope that happens—the main advantage of having different frameworks is the constant cross-pollination
of ideas and the pressure to improve.
Subscribe to the newsletter if you want to be notified when a post is published!
You can also follow the development of Pavex on GitHub.
Footnotes
[1] I originally wanted to include Rocket in this comparison, but I quickly realised that it doesn't 
provide enough hooks to even wrap a tracing::Span around the request-handling Future.
That's a prerequisite to a correct implementation of structured logging; there's no point in going further 
without it.
[2] If you're using TraceLayer, from tower_http, you might be wondering: isn't that enough? Isn't that
the single place? Unfortunately, TraceLayer::on_failure doesn't get to see the error; it only looks at the response
generated by the error!
[3] There's another variation of this approach: you return the same error type (e.g. ApiError) from
all your extractors, request handlers and middlewares. The two approaches are fundamentally equivalent.
[4] What's stored inside Extensions has to
be cloneable. This can be solved by wrapping the original error inside an Arc.
[5] There's another issue with failures in Actix Web middlewares, but it'd take forever to get
into the details and explain it.
The TL;DR is that the invocation of the downstream portion of the middleware stack should return a Response
but it now returns a Result, creating a weird separate track for errors that's hard to integrate with the 
overall error handling story.