Pavex, progress report #3: nested routes and borrow checking
👋 Hi!
It's Luca here, the author of "Zero to production in Rust".
This is a progress report about pavex, a new Rust web framework that I have been working on. It is currently in the early stages of development, working towards its first alpha release. Check out the announcement post to learn more about the vision!
Personal updates
April was stressful.
I resigned from AWS, started planning my relocation (UK->Italy) while finalising the details of the renovations for the apartment we bought in Italy. Pretty intense!
Nonetheless, I managed to squeeze in some time for pavex: let's talk about the progress!
You can comment on this update on r/rust.
What's new
Compile-time validation for route parameters
In March I added support for route parameters:
pub fn blueprint() -> Blueprint {
    let mut bp = Blueprint::new();
    // `home_id` is a route parameter!
    // It will extract the corresponding segment out of incoming requests at runtime.
    // E.g. `1` for `GET /home/1`
    bp.route(GET, "/home/:home_id", f!(crate::get_home));
    // [...]
}

pub fn get_home(
    // You can then retrieve (and automatically deserialize!) the extracted route parameters
    // in your request handler using the `RouteParams` extractor.
    params: &RouteParams<HomeRouteParams>
) -> String {
    format!("Welcome to {}", params.0.home_id)
}

#[derive(serde::Deserialize)]
pub struct HomeRouteParams {
    pub home_id: u32,
}
What happens if you change the route template from /home/:home_id to /home/:id?
From a routing perspective, they're absolutely equivalent: a GET request to /home/1 will match both!
But the request handler will fail to extract route parameters from /home/:id if you forget to change the field name in HomeRouteParams from home_id to id. Even worse, the failure will happen at runtime: if there are no tests for this endpoint, you might end up shipping broken code to production. I don't like that.
A new procedural macro comes to the rescue!
Instead of annotating HomeRouteParams with #[derive(serde::Deserialize)], you can use #[RouteParams]:
#[RouteParams]
pub struct HomeRouteParams {
    pub home_id: u32,
}
If you now change /home/:home_id to /home/:id, you'll be greeted by this error when you try to re-generate your application code:
ERROR:
  × `app::get_home` is trying to extract route parameters using `RouteParams<HomeRouteParams>`.
  │ Every struct field in `app::HomeRouteParams` must be named after one of the route parameters
  │ that appear in `/home/:id`:
  │ - `id`
  │
  │ There is no route parameter named `home_id`, but there is a struct field named
  │ `home_id` in `app::HomeRouteParams`. This is going to cause a runtime error!
  │
  │     ╭─[src/lib.rs:43:1]
  │  43 │     ));
  │  44 │     bp.route(GET, "/home/:id", f!(crate::get_home));
  │     ·                                ─────────┬─────────
  │     ·            The request handler asking for `RouteParams<app::HomeRouteParams>`
  │  45 │
  │     ╰────
  │   help: Remove or rename the fields that do not map to a valid route parameter.
Quite cool, isn't it?
Let's unpack how it works under the hood!
serde::Deserialize is what ties together the route template (/home/:home_id) with the binding struct (HomeRouteParams). Generally speaking, we can't make any assumptions about the deserialization logic: a developer is free to provide their own exotic implementation of serde::Deserialize for HomeRouteParams. For example, it might indeed be looking for a route segment named id, which is then bound to the home_id field.
If serde::Deserialize is derived though, we can make assumptions: each field in the struct must be named after one of the route parameters defined in the route template. If that's not the case, deserialization is going to fail at runtime.
This is where #[RouteParams] comes into the picture. It does two things:
- Derive serde::Deserialize for your type;
- Implement pavex_runtime::serialization::StructuralDeserialize for your type.
StructuralDeserialize is a marker trait:
pub trait StructuralDeserialize {}
It provides no functionality on its own. It's a way for us to tag a type and say "their implementation of serde::Deserialize is derived" [1]. The pavex compiler can then look it up!
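To make the marker-trait idea concrete, here is a minimal sketch in plain Rust. It is not how pavex performs the lookup: pavex inspects your types while generating code, whereas this sketch relies on an ordinary trait bound. `HandRolledParams` and `assert_structural` are hypothetical names introduced only for illustration, and the `impl` is written by hand instead of being emitted by `#[RouteParams]`.

```rust
/// A marker trait: it has no methods, it only tags a type.
pub trait StructuralDeserialize {}

pub struct HomeRouteParams {
    pub home_id: u32,
}

// Assumption: in a real project this impl would come from `#[RouteParams]`.
impl StructuralDeserialize for HomeRouteParams {}

/// Hypothetical type with a hand-written deserializer: deliberately NOT tagged.
pub struct HandRolledParams {
    pub home_id: u32,
}

/// Hypothetical helper: it only compiles for types that carry the tag.
pub fn assert_structural<T: StructuralDeserialize>() -> &'static str {
    std::any::type_name::<T>()
}

fn main() {
    println!("{} is tagged", assert_structural::<HomeRouteParams>());
    // The next line would be rejected by the compiler, because
    // `HandRolledParams` does not implement the marker trait:
    // assert_structural::<HandRolledParams>();
}
```

The point of the sketch is only that the tag is visible to tooling at compile time, without adding any runtime behaviour to the tagged type.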
When it processes the request handlers you registered, it looks at their input parameters: is there any RouteParams<T> in there?
If there is one, pavex checks if T implements StructuralDeserialize:
- if it does, pavex kicks off additional checks with respect to field naming;
- if it doesn't, pavex assumes that you rolled your own implementation of serde::Deserialize and trusts that you know what you are doing.
The technique is inspired by Rust's standard library: StructuralEq and StructuralPartialEq play the same role for identifying derived implementations of Eq and PartialEq.
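The field-naming check described above can be sketched with the standard library alone. This is a simplified illustration, not pavex's implementation: `route_param_names` and `unknown_fields` are hypothetical helpers, and the sketch assumes that a route parameter is always a whole path segment starting with `:`.

```rust
/// Extract parameter names from a route template,
/// e.g. "/home/:home_id" yields ["home_id"].
/// Assumption: a parameter is a whole path segment prefixed with `:`.
fn route_param_names(template: &str) -> Vec<&str> {
    template
        .split('/')
        .filter_map(|segment| segment.strip_prefix(':'))
        .collect()
}

/// Return the struct fields that are not named after any route parameter.
/// A non-empty result means derived deserialization would fail at runtime.
fn unknown_fields<'a>(template: &str, fields: &[&'a str]) -> Vec<&'a str> {
    let params = route_param_names(template);
    fields
        .iter()
        .copied()
        .filter(|field| !params.iter().any(|param| param == field))
        .collect()
}

fn main() {
    // The struct still has a `home_id` field, but the template now uses `:id`.
    let offending = unknown_fields("/home/:id", &["home_id"]);
    println!("fields without a matching route parameter: {:?}", offending);
}
```

With the original template, `/home/:home_id`, the same call returns an empty list and no error would be reported.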
Nesting and encapsulation
Everything starts simple, including APIs.
You can easily keep your entire router and state in a single function when you are exposing 4 or 5 endpoints. Things get really messy when, over time, the API surface grows to tens (if not hundreds!) of routes with an intricate network of dependencies and middlewares.
Our brains are limited: it's hard to keep too many different things in mind when working on a codebase [2]. That's what modules are for!
Modules empower us to segment our domain in units that are small enough to be reasoned about, encapsulating complexity behind an interface that abstracts away the nitty-gritty details.
Last month, pavex had no mechanism for encapsulation. All routes, constructors and error handlers lived in a flat "namespace". That's optimal for a small microservice: you don't want to pay the cognitive price of abstractions you don't need.
But I want pavex to be able to support your project as it grows in complexity. It should be the ideal foundation for building large monoliths in Rust [3].
That's why I've added support for nesting:
pub fn blueprint() -> Blueprint {
    let mut bp = Blueprint::new();
    bp.constructor(f!(crate::db_connection_pool), Lifecycle::Singleton);
    bp.nest_at("/admin", admin_blueprint());
    bp.nest_at("/api", api_blueprint());
    bp
}

pub fn admin_blueprint() -> Blueprint {
    let mut bp = Blueprint::new();
    bp.constructor(f!(crate::session_token), Lifecycle::RequestScoped);
    bp.route(GET, "/", f!(crate::admin_dashboard));
    // [...]
}

pub fn api_blueprint() -> Blueprint {
    // [...]
}
You can decompose your application into smaller Blueprints, each focused on a subset of routes and constructors.
A nested Blueprint inherits all the constructors registered against its parents: in our example, both /admin/* and /api/* request handlers can access the database connection pool returned by the top-level constructor.
The opposite, instead, is forbidden: constructors registered against a nested blueprint are not visible to its parent(s) nor to its siblings. Going back to the example above, /api/* request handlers cannot access the session token returned by the constructor registered in admin_blueprint.
This kind of encapsulation allows you to keep a close eye on the set of dependencies available to each part of your application.
nest_at has another side-effect: it adds a prefix to all the routes registered by the nested blueprint. crate::admin_dashboard will be invoked on GET /admin/ requests instead of GET /.
Decomposition, though, does not always map cleanly to path prefixes. That's why pavex provides another method, nest, which has identical behaviour with respect to state encapsulation but does not add any route prefix.
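The visibility and prefixing rules above can be modelled with a small tree of scopes. This is a toy model under stated assumptions, not pavex's internals: `Scope`, `can_see`, `full_path` and `example_scopes` are hypothetical names, and constructors are identified by plain strings.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified stand-in for a blueprint.
struct Scope {
    parent: Option<usize>,
    path_prefix: &'static str,
    constructors: HashSet<&'static str>,
}

/// A constructor is visible if it is registered in this scope or in any ancestor.
/// Siblings and children are never consulted.
fn can_see(scopes: &[Scope], mut current: usize, type_name: &str) -> bool {
    loop {
        if scopes[current].constructors.contains(type_name) {
            return true;
        }
        match scopes[current].parent {
            Some(parent) => current = parent,
            None => return false,
        }
    }
}

/// The final route path is the concatenation of every ancestor's prefix
/// (outermost first) followed by the route registered in the scope.
fn full_path(scopes: &[Scope], scope: usize, route: &str) -> String {
    let mut prefixes = Vec::new();
    let mut current = Some(scope);
    while let Some(id) = current {
        prefixes.push(scopes[id].path_prefix);
        current = scopes[id].parent;
    }
    prefixes.reverse();
    format!("{}{}", prefixes.concat(), route)
}

/// Mirrors the example: a root scope with two nested scopes.
fn example_scopes() -> Vec<Scope> {
    vec![
        Scope { parent: None, path_prefix: "", constructors: HashSet::from(["DbPool"]) },
        Scope { parent: Some(0), path_prefix: "/admin", constructors: HashSet::from(["SessionToken"]) },
        Scope { parent: Some(0), path_prefix: "/api", constructors: HashSet::new() },
    ]
}

fn main() {
    let scopes = example_scopes();
    println!("admin sees DbPool: {}", can_see(&scopes, 1, "DbPool"));
    println!("api sees SessionToken: {}", can_see(&scopes, 2, "SessionToken"));
    println!("admin dashboard path: {}", full_path(&scopes, 1, "/"));
}
```

In this model, nest would correspond to a child scope with an empty `path_prefix`: the lookup rules stay the same, only the route path is left untouched.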
Dealing with ambiguity
Nesting and encapsulation are cool on paper, but the devil is in the details.
What happens if api_blueprint and admin_blueprint try to register different constructors for the same singleton type, a u64?
Singletons should be... well, singletons: built once and used for the entirety of the application lifetime. Which constructor should pavex use? The one provided by api_blueprint? Or the one provided by admin_blueprint?
The answer is neither! This edge case is accounted for and we return a dedicated error:
ERROR:
  × The constructor for a singleton must be registered once.
  │ You registered the same constructor for `u64` against 2 different nested
  │ blueprints.
  │ I don't know how to proceed: do you want to share the same singleton
  │ instance across all those nested blueprints, or do you want to create a
  │ new instance for each nested blueprint?
  │
  │     ╭─[src/lib.rs:10:1]
  │  10 │     let mut bp = Blueprint::new();
  │  11 │     bp.constructor(f!(crate::admin::singleton), Lifecycle::Singleton);
  │     ·                    ─────────────┬─────────────
  │     ·                                 ╰── A constructor was registered here
  │     ╰────
  │     ╭─[src/lib.rs:22:1]
  │  22 │     let mut bp = Blueprint::new();
  │  23 │     bp.constructor(f!(crate::api::singleton), Lifecycle::Singleton);
  │     ·                    ────────────┬────────────
  │     ·                                ╰── A constructor was registered here
  │     ╰────
  │   help: If you want to share a single instance of `u64`, remove constructors
  │         for `u64` until there is only one left. It should be attached to a
  │         blueprint that is a parent of all the nested ones that need to use it.
  │         ☞
  │           ╭─[src/lib.rs:5:1]
  │         5 │ pub fn blueprint() -> Blueprint {
  │         6 │     let mut bp = Blueprint::new();
  │           ·                  ────────┬───────
  │           ·                          ╰── Register your constructor against this blueprint
  │           ╰────
  │   help: If you want different instances, consider creating separate newtypes
  │         that wrap a `u64`.
A similar reasoning applies if a nested blueprint tries to override the constructor registered by its parent for a singleton type.
The approach is different, instead, for request-scoped and transient types: nested blueprints can override the behaviour of their parent, e.g. register a different error handler for the same extractor.
Striking a balance between expressiveness and the principle of least surprise is tricky. I expect that I'll have to iterate further on this part of the API going forward, but I'm happy enough with this first version!
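The newtype remediation suggested by the error message can be sketched as follows. The type and function names (`AdminCounter`, `ApiCounter`, `api_status` and the constructor bodies) are hypothetical and chosen only for illustration. The idea is that each nested blueprint registers a constructor for its own wrapper type, so the two singletons no longer collide on `u64`.

```rust
/// Hypothetical newtypes: each one wraps a `u64`, but they are distinct types,
/// so a constructor for one can never be confused with a constructor for the other.
pub struct AdminCounter(pub u64);
pub struct ApiCounter(pub u64);

/// Would be registered as a singleton constructor in the admin blueprint.
pub fn admin_singleton() -> AdminCounter {
    AdminCounter(1)
}

/// Would be registered as a singleton constructor in the API blueprint.
pub fn api_singleton() -> ApiCounter {
    ApiCounter(100)
}

/// Each handler asks for the wrapper it needs, which makes the choice explicit.
pub fn admin_dashboard(counter: &AdminCounter) -> String {
    format!("admin counter: {}", counter.0)
}

pub fn api_status(counter: &ApiCounter) -> String {
    format!("api counter: {}", counter.0)
}

fn main() {
    println!("{}", admin_dashboard(&admin_singleton()));
    println!("{}", api_status(&api_singleton()));
}
```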
Borrow checking
pavex is a code generator: it takes as input a Blueprint that describes your application and spits out Rust code that can serve incoming requests.
There is a key detail here: the Rust code that we generate must compile successfully, which in turn implies that it must satisfy the Rust borrow checker!
That's trickier than it sounds. It might or might not be possible to generate code that makes the borrow checker happy, depending on the shape of your dependency graph. Let's look at an example:
[Figure: a call graph in which request_handler takes B and C as inputs, and the constructors of B and C both take A as input.]
To invoke request_handler, we need to build an instance of B and an instance of C. But their respective constructors want to take A as input by value.
That can't be: the borrow checker would reject the resulting code.
Last month, that's exactly what used to happen: pavex would happily accept your Blueprint and then emit code that didn't compile. Understanding why it didn't compile (and mapping it back to your registered constructors) was left as an exercise for the user.
That sucks, and I spent the better part of April fixing it.
If you try to pass a similar call graph to pavex today, it gets rejected with an error:
ERROR:
  × I can't generate code that will pass the borrow checker *and* match the
  │ instructions in your blueprint.
  │ There are 2 components that take `app::A` as an input parameter, consuming
  │ it by value. Since I'm not allowed to clone `app::A`, I can't resolve
  │ this conflict.
  │
  │   help: Allow me to clone `app::A` in order to satisfy the borrow checker.
  │         You can do so by invoking `.cloning(CloningStrategy::CloneIfNecessary)`
  │         on the type returned by `.constructor`.
  │         ☞
  │            ╭─[src/lib.rs:40:1]
  │         40 │     let mut bp = Blueprint::new();
  │         41 │     bp.constructor(f!(crate::build_a), Lifecycle::RequestScoped);
  │            ·                    ─────────┬────────
  │            ·                             ╰── The constructor was registered here
  │            ╰────
  │   help: Considering changing the signature of the components that consume
  │         `app::A` by value.
  │         Would a shared reference, `&app::A`, be enough?
  │         ☞
  │            ╭─[src/lib.rs:42:1]
  │         42 │     bp.constructor(f!(crate::build_b), Lifecycle::RequestScoped);
  │         43 │     bp.constructor(f!(crate::build_c), Lifecycle::RequestScoped);
  │            ·                    ─────────┬────────
  │            ·                             ╰── One of the consuming constructors
  │            ╰────
  │         ☞
  │            ╭─[src/lib.rs:41:1]
  │         41 │     bp.constructor(f!(crate::build_a), Lifecycle::RequestScoped);
  │         42 │     bp.constructor(f!(crate::build_b), Lifecycle::RequestScoped);
  │            ·                    ─────────┬────────
  │            ·                             ╰── One of the consuming constructors
  │            ╰────
  │   help: If `app::A` itself cannot implement `Clone`, consider wrapping it in
  │         an `std::sync::Rc` or `std::sync::Arc`.
The borrow checker is a tricky beast on its own, so I put a lot of effort into suggesting possible remediations.
The first is what I'd generally recommend: just Clone it!
By default, pavex doesn't inject .clone() invocations. You need to explicitly tell the framework that it's OK to clone a type if needed:
pub fn blueprint() -> Blueprint {
    let mut bp = Blueprint::new();
    bp.constructor(f!(crate::build_a), Lifecycle::RequestScoped)
        // 👇 This allows `pavex` to sprinkle in `.clone()` calls where helpful
        .cloning(CloningStrategy::CloneIfNecessary);
    // [...]
}
That change is enough to fix the previous error. The call graph becomes:
[Figure: the updated call graph, which now includes a Clone::clone() node.]
pavex's code generation is then smart enough to process the Clone::clone() node before invoking build_b, therefore producing code that passes the borrow checker.
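To see why the ordering matters, here is a hand-written approximation of what the code for this call graph could look like. It is a sketch, not pavex's actual output: the bodies of the constructors, the `handle` function and the string payloads are assumptions made up for illustration.

```rust
#[derive(Clone)]
pub struct A(String);
pub struct B(String);
pub struct C(String);

pub fn build_a() -> A {
    A("a".to_string())
}

// Both constructors consume `A` by value.
pub fn build_b(a: A) -> B {
    B(format!("b({})", a.0))
}

pub fn build_c(a: A) -> C {
    C(format!("c({})", a.0))
}

pub fn request_handler(b: B, c: C) -> String {
    format!("{} {}", b.0, c.0)
}

/// Hypothetical stand-in for the generated request-processing function.
pub fn handle() -> String {
    let a = build_a();
    // The clone must happen before `a` is moved into a consumer.
    // Without it, calling `build_b(a)` and then `build_c(a)` fails with
    // error[E0382]: use of moved value: `a`.
    let a_clone = a.clone();
    let b = build_b(a_clone);
    let c = build_c(a);
    request_handler(b, c)
}

fn main() {
    println!("{}", handle());
}
```

Only one clone is needed here: the last consumer can take ownership of the original value.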
Let's be clear: pavex does not yet catch all possible borrow-checking issues ahead of code generation, but it does a fairly good job at catching the most common violations (e.g. borrow after move) as well as some of the trickier ones (e.g. when control flow statements like match are involved).
Its main blind spots are "hidden" borrows. For example, C depends on B<'a>, which stores a reference to A as one of its fields, therefore implying that C borrows from A. It can be solved, there is no hard blocker there. It's just a matter of putting in the work, something I plan to tackle in the mid-future.
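A "hidden" borrow of this kind can be reproduced in a few lines. The sketch below is illustrative only: the field names, `name_len`, `consume_a` and `handle` are hypothetical. The signature of `build_c` never mentions `A`, yet the value it returns keeps `A` borrowed.

```rust
pub struct A {
    name: String,
}

/// `B` stores a reference to `A` in one of its fields.
pub struct B<'a> {
    a: &'a A,
}

/// `C` owns a `B`, so it transitively borrows from `A`,
/// even though `A` does not appear in `build_c`'s signature.
pub struct C<'a> {
    b: B<'a>,
}

pub fn build_b(a: &A) -> B<'_> {
    B { a }
}

pub fn build_c(b: B<'_>) -> C<'_> {
    C { b }
}

pub fn name_len(c: &C<'_>) -> usize {
    c.b.a.name.len()
}

/// A component that takes `A` by value.
pub fn consume_a(a: A) -> String {
    a.name
}

pub fn handle() -> (usize, String) {
    let a = A { name: "pavex".to_string() };
    let b = build_b(&a);
    let c = build_c(b);
    // Moving `a` at this point would fail with
    // error[E0505]: cannot move out of `a` because it is borrowed,
    // since `c` is still used on the next line.
    let len = name_len(&c);
    // `c` is no longer used, so the borrow has ended and `a` can be moved.
    let name = consume_a(a);
    (len, name)
}

fn main() {
    println!("{:?}", handle());
}
```

Catching this ahead of code generation requires following the lifetime parameter from `C` through `B` back to `A`, which is the part that a check based only on direct inputs would miss.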
Circular dependencies
Last but not least, I've done some bug squashing!
pavex doesn't like circular dependencies, like in this call graph:
[Figure: a call graph in which the constructors of A, B and C depend on each other in a cycle.]
It used to handle circular dependencies very poorly: it would hang indefinitely, stuck in an infinite loop.
I have introduced an intermediate analysis step (called DependencyGraph) to detect circular dependencies before they become an existential problem, removing the infinite loop and emitting a nice error as a result:
ERROR:
  × The dependency graph cannot contain cycles, but I just found one!
  │ If I tried to build your dependencies, I would end up in an infinite loop.
  │
  │ The cycle looks like this:
  │
  │ - `build_b` depends on `app::C`, which is built by `build_c`
  │ - `build_c` depends on `app::A`, which is built by `build_a`
  │ - `build_a` depends on `app::B`, which is built by `build_b`
  │
  │   help: Break the cycle! Remove one of the 'depends-on' relationship by
  │         changing the signature of one of the components in the cycle.
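Cycle detection of this kind is commonly done with a depth-first search that tracks which nodes are on the current path. The sketch below shows that general technique. It is not the code behind pavex's DependencyGraph, and `find_cycle`, `visit` and the string-keyed graph are assumptions made for illustration.

```rust
use std::collections::HashMap;

enum State {
    /// The node is on the path currently being explored.
    Visiting,
    /// The node and everything it depends on have been fully explored.
    Done,
}

/// Depth-first visit. Returns the nodes that form a cycle, if one is reachable from `node`.
fn visit<'a>(
    node: &'a str,
    graph: &HashMap<&'a str, Vec<&'a str>>,
    state: &mut HashMap<&'a str, State>,
    path: &mut Vec<&'a str>,
) -> Option<Vec<&'a str>> {
    match state.get(node) {
        Some(State::Done) => return None,
        Some(State::Visiting) => {
            // We reached a node that is already on the current path:
            // the cycle is the portion of the path that starts at that node.
            let start = path.iter().position(|n| *n == node).unwrap();
            return Some(path[start..].to_vec());
        }
        None => {}
    }
    state.insert(node, State::Visiting);
    path.push(node);
    for &dependency in graph.get(node).into_iter().flatten() {
        if let Some(cycle) = visit(dependency, graph, state, path) {
            return Some(cycle);
        }
    }
    path.pop();
    state.insert(node, State::Done);
    None
}

/// `graph` maps each component to the components it depends on.
fn find_cycle<'a>(graph: &HashMap<&'a str, Vec<&'a str>>) -> Option<Vec<&'a str>> {
    let mut state = HashMap::new();
    let mut nodes: Vec<&str> = graph.keys().copied().collect();
    nodes.sort(); // visit in a deterministic order
    for node in nodes {
        if let Some(cycle) = visit(node, graph, &mut state, &mut Vec::new()) {
            return Some(cycle);
        }
    }
    None
}

fn main() {
    // Same shape as the error above: build_b -> build_c -> build_a -> build_b.
    let graph = HashMap::from([
        ("build_a", vec!["build_b"]),
        ("build_b", vec!["build_c"]),
        ("build_c", vec!["build_a"]),
    ]);
    println!("{:?}", find_cycle(&graph));
}
```

Because every node is marked `Done` once explored, the search terminates even on cyclic input, which is exactly the property the old infinite loop lacked.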
What's next?
First and foremost, some rest! I'll be off the grid for a few days, taking a little break.
Speaking of pavex, there is one key feature that I've yet to implement: middlewares.
But they'll have to wait a bit longer. I am eager to kick the tires on pavex, i.e. try to build a small project to see how it feels to develop with it.
I'll probably be implementing the Realworld API spec. I've done it in the past using actix-web and it should give me a pretty solid measure of what needs to be done next for pavex.
As a bonus, it'll help me to validate the design sketches for the middleware API. I have plenty of crazy-man notes spread around the house, full of boxes and arrows.
See you next month!
You can comment on this update on r/rust.
Subscribe to the newsletter if you don't want to miss the next update!
You can also follow the development of pavex on GitHub.
[1] As it happens, I found out a couple of days ago that there might be a way to determine if you derived serde::Deserialize without having to introduce a marker trait. I'll investigate it further in the near future.
[2] If the intersection of neuroscience and developer experience fascinates you, I strongly recommend checking out The Programmer's Brain by Felienne Hermans.
[3] Monoliths have a bad reputation, but they can be surprisingly effective in the right circumstances. As an industry, we often think in absolutes ("Monolith? A gigantic spaghetti mess deployed on one big box"), but reality is more nuanced. Powered by the right framework, it should be easy enough to deploy a monolithic application as a set of serverless functions, one for each endpoint. As long as they don't call into each other, you retain most of the benefits of a "traditional" monolith without many of its scalability/billing downsides. Food for thought: hybrid deployment strategies are definitely top of mind for me when thinking about pavex's future directions.