aksh1618/zero2prod

Zero2Prod Learnings

General

Start with CI

  • For any project, set up the CI before anything else
  • A good CI setup includes:
    • Tests: The code must pass all tests. This also ensures that it compiles and builds properly.
    • Formatting: To prevent bikeshedding. Instead of auto-formatting, we can reject pushes of unformatted code, allowing for manual intervention
    • Linting: We can reject pushes if any warnings are present
    • Code Coverage: As a tool for finding untested code

On code coverage

  • Code coverage is a great tool as long as it's not used for measuring quality, but only as a tool for finding untested parts of the codebase
  • More on this by Martin Fowler

Framework-Agnostic Integration tests

  • Many frameworks provide utilities to make writing integration tests simpler, but this approach has the disadvantage that moving to another framework then requires a rewrite of the test suite as well!
  • We should thus aim for black box testing as much as possible, to have portable integration tests
  • This can be a life-saver when going through a large rewrite or refactoring involving the frameworks used in the application

Logs are the wrong abstraction

  • As soon as we want to correlate our logs, we run into challenges:
    • Do we pass the correlation id to all the functions we call?
    • Do we rewrite middleware etc. on our own just to include the correlation id?
    • How do we include it in the logs of other crates we're using? Do we rewrite them as well?
  • Processes like backend application requests are best observed as a tree-like processing pipeline, with the processing broken down into steps, in turn broken down into smaller substeps. For each node, we require things like duration and context association & propagation (request params, correlation id etc.)
  • Logs, however, are isolated timestamped events: they don't lend themselves well to this tree-like structure

Continuous Deployment

"We are particularly interested, in fact, on how the engineering practice of continuous deployment influences our design choices and development habits." Let's see!

  • Using profiles so that local & production can behave differently

Optimizing Docker image

  • Use .dockerignore: As of now I think it's better to do this than to COPY specific things, as it guards against someone forgetting to copy some additional directory down the line. That would cause more problems than the image size getting increased due to forgetting to add things to .dockerignore.
  • Use multi-stage builds: We can separate build and run stages in the dockerfile, so that the build stage doesn't contribute to the final image size. This does require us to be careful to copy everything that is required from the build stage to the run stage, but that's usually just one built binary/jar etc., so it's not much of an issue.
  • Leverage docker layer caching for project dependencies: The dependencies of a project change much less often than the source code itself. If we can move installing/building dependencies into a separate layer before bringing in our actual source code, we can speed up the build process by a lot.
  • Mixing multi-stage builds & layer caching: Docker's layer cache is only invalidated for subsequent commands within the stage boundary. E.g. COPY . . in one stage will only invalidate the subsequent commands in that stage. We can use this to our advantage, much like cargo-chef does. By having a separate prepare command that depends on the working directory, producing a recipe.json, and a separate cook command that depends only on the recipe.json, we can use them in separate stages, ensuring cache invalidation happens only for prepare, not for cook, which is the time/resource-intensive step.
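The stage layout described above can be sketched as a Dockerfile (the base image tags, paths, and binary name are illustrative assumptions, not the book's exact setup):

```dockerfile
# Sketch of the cargo-chef pattern; stage names are illustrative
FROM rust:1.75 AS chef
RUN cargo install cargo-chef
WORKDIR /app

# `prepare` depends on the full working directory...
FROM chef AS planner
COPY . .
RUN cargo chef prepare --recipe-path recipe.json

# ...but `cook` depends only on recipe.json, so this expensive layer
# stays cached as long as the dependency set is unchanged
FROM chef AS builder
COPY --from=planner /app/recipe.json recipe.json
RUN cargo chef cook --release --recipe-path recipe.json
COPY . .
RUN cargo build --release

# Minimal runtime stage: only the final binary is copied over
FROM debian:bookworm-slim AS runtime
COPY --from=builder /app/target/release/zero2prod /usr/local/bin/zero2prod
ENTRYPOINT ["/usr/local/bin/zero2prod"]
```

A code change invalidates the `COPY . .` in the builder stage, but not the earlier `cook` layer, so only the application itself is rebuilt.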

What should I validate?

  • Domain constraints: These will be constraints related to the domain. E.g. a field must not be empty.
  • Security constraints: Forms etc. are primary targets for malicious visitors - their inputs need to be sanitized against things like SQL injection, code injection etc. However, not all security constraints can be handled via validation alone, so we need a layered approach, with checks at multiple layers in our stack - input validation, parameterized queries in SQL, etc.

Where should I validate?

We should always keep in mind that software is a living artifact: holistic understanding of a system is the first victim of the passage of time.

You have the whole system in your head when writing it down for the first time, but the next developer touching it will not - at least not from the get-go. It is therefore possible for a load-bearing check in an obscure corner of the application to disappear (e.g. HTML escaping) leaving you exposed to a class of attacks (e.g. phishing).

Redundancy reduces risk.

Validation is a leaky cauldron, or, Parse, don't validate

  • Validating correctly requires a lot of redundancy - we can't take validation as a given in any part of the code, as the part currently doing the validation might change at any point in time. So we must validate in all our functions.
  • Validations are a short-lived, point-in-time operation - the result of the validation is not stored anywhere
  • Instead, we need a parsing function - a function that takes unstructured input and, on success, returns a more structured type whose very existence proves the validation happened, so downstream code can rely on it without re-checking
  • The talk to watch that covers and solidifies all these ideas: Domain Modeling Made Functional - Scott Wlaschin
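A minimal sketch of the validate-vs-parse distinction, using a hypothetical Email type (the @-check is just a stand-in for real validation logic):

```rust
// Validation: a point-in-time check whose outcome is not stored anywhere
fn validate_email(s: &str) -> bool {
    s.contains('@')
}

// Parsing: the outcome is encoded in the returned type, so every
// downstream function can rely on it without re-checking
#[derive(Debug)]
struct Email(String);

fn parse_email(s: &str) -> Result<Email, String> {
    if s.contains('@') {
        Ok(Email(s.to_string()))
    } else {
        Err(format!("'{s}' is not a valid email"))
    }
}

// Accepting &Email instead of &str means no re-validation is needed here
fn accept(_subscriber: &Email) {}

fn main() {
    assert!(validate_email("a@b.com")); // proof of validity is lost immediately
    let email = parse_email("a@b.com").unwrap(); // proof lives on in the type
    accept(&email);
    assert!(parse_email("nope").is_err());
}
```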

Rust

Where should I put my tests?

  • Doc tests are a good place for ensuring that the docs remain up-to-date
  • Embedded test modules
    • Good place for "iceberg" projects: projects with a very small public API footprint but a lot of internal complexity, as it might not be straightforward to exercise all possible edge cases via the limited public API
    • However, for projects with a lot of public API, it can be risky to rely on these as due to visibility rules they can access non-public API as well, leading to tests depending on implementation details
  • External tests folder
    • Can be used for library or web service projects which have a larger public API, as they have the same level of access as any external project, which is ideal for integration testing.
    • This prevents coupling with any internal implementation details and allows for more of a black-box testing
  • A post by matklad on internal details & performance implications of the three approaches:
    • For large projects it can be beneficial to use tests/it/*.rs (a single integration-test crate with modules) instead of tests/*.rs, as each file directly under tests/ is compiled and linked as its own separate crate, which slows down builds

When should my code panic?

  • Panics in Rust are used to deal with unrecoverable errors: failure modes which are not expected or that we have no meaningful way to recover from. E.g.: Host machine running out of memory or disk.
  • Rust's panics are not equivalent to exceptions in languages such as Python/C#/Java - they are not meant to be caught. While Rust does provide a few utilities to do so, they are not recommended and should be used sparingly.
  • From a related reddit thread:

    […] If your Rust application panics in response to any user input, then the following should be true: your application has a bug, whether it be in a library or in the primary application code. - burntsushi
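A small illustration of the split (names are illustrative): an expected failure mode returns a Result, while a violated internal invariant panics.

```rust
use std::num::ParseIntError;

// Bad user input is an expected failure mode: return a Result
fn parse_port(input: &str) -> Result<u16, ParseIntError> {
    input.trim().parse::<u16>()
}

fn main() {
    // Recoverable: handle the error instead of panicking
    assert!(parse_port("not-a-port").is_err());

    // Unrecoverable: a hard-coded default failing to parse is a bug,
    // so panicking (here via expect) is appropriate
    let port = parse_port("8080").expect("default port must always parse");
    assert_eq!(port, 8080);
}
```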

Why should I implement TryFrom?

  • There's no particular reason per se - we are "just" making our intent clearer, by spelling out "This is a type conversion!".
  • Following conventions like this helps future readers of our code to immediately spot them, as they will already be familiar with them.
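A sketch of spelling the conversion out via TryFrom, with a hypothetical SubscriberName type:

```rust
use std::convert::{TryFrom, TryInto};

// Hypothetical domain type for illustration
#[derive(Debug)]
struct SubscriberName(String);

impl TryFrom<String> for SubscriberName {
    type Error = String;

    // "This is a type conversion!" - and a fallible one
    fn try_from(value: String) -> Result<Self, Self::Error> {
        if value.trim().is_empty() {
            Err("name cannot be empty".into())
        } else {
            Ok(SubscriberName(value))
        }
    }
}

fn main() {
    assert!(SubscriberName::try_from("Ursula".to_string()).is_ok());

    // Implementing TryFrom gives us TryInto on the source type for free
    let err: Result<SubscriberName, _> = String::new().try_into();
    assert!(err.is_err());
}
```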

Type Driven Development (or DDD) with Rust

  • Rust's rich type system can help us make incorrect usage patterns unrepresentable.
  • The newtype pattern can be used to create domain types from existing types.
  • Rust's visibility rules can help make it impossible to create domain types without satisfying any required constraints - private fields can be combined with a parsing constructor that validates all constraints.
  • Rust's ownership system allows providing safe access to inner types, either by consuming the domain type or taking a shared reference. The standard library provides the AsRef trait to improve ergonomics for the latter.
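These three points can be sketched together: a private field forces construction through a parsing constructor, while AsRef and a consuming method give safe access to the inner value (the type and its constraint are illustrative):

```rust
mod domain {
    // The field is private: outside this module the only way to obtain
    // a SubscriberName is through `parse`, so the invariant always holds
    pub struct SubscriberName(String);

    impl SubscriberName {
        pub fn parse(s: String) -> Result<SubscriberName, String> {
            if s.trim().is_empty() {
                Err("name cannot be empty".into())
            } else {
                Ok(SubscriberName(s))
            }
        }

        // Consuming accessor: returns the inner value, destroying the proof
        pub fn into_inner(self) -> String {
            self.0
        }
    }

    // Shared access without giving up ownership
    impl AsRef<str> for SubscriberName {
        fn as_ref(&self) -> &str {
            &self.0
        }
    }
}

fn main() {
    let name = domain::SubscriberName::parse("Ursula".into()).unwrap();
    let s: &str = name.as_ref();
    assert_eq!(s, "Ursula");
    assert_eq!(name.into_inner(), "Ursula");
    assert!(domain::SubscriberName::parse("   ".into()).is_err());
}
```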

The newtype pattern

  • The pattern of wrapping existing types with new types, with Rust's tuple structs, is known as the newtype pattern:
    struct Milliseconds(u64);
  • Newtypes are a zero-cost abstraction! #[repr(transparent)] can be used to make the newtype have the same layout and ABI as the internal type.
  • Newtypes lend themselves very well to domain driven design - they can be used to define domain specific types wrapping existing types.
  • Newtypes can also be used to evolve an API in a backwards compatible manner - internal implementation details are hidden from consumers, and a limited subset of the API of the internal type can be exposed to allow for swapping out with another type in the future without a breaking change.
  • Newtypes are also the de-facto way to bypass the orphan rule - the foreign type can be wrapped in a newtype, then we can implement foreign traits for the new type, which itself is a local type.
  • The derive_more crate helps remove any annoyances & boilerplate code for the newtype pattern, providing derive macros for a wide range of traits, including conversion traits like AsRef.
  • Resources:

Getting along with the Orphan Rule

  • The newtype pattern can be used to bypass the orphan rule, through making foreign types local by wrapping them in local types.
  • Serde offers ways to derive De/Serialize for foreign types: https://serde.rs/remote-derive.html.
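A minimal sketch of the newtype workaround for the orphan rule: neither fmt::Display nor Vec<String> is local, so a direct impl is rejected, but the wrapper type is local:

```rust
use std::fmt;

// We own neither `Vec<String>` nor `fmt::Display`, so
// `impl fmt::Display for Vec<String>` violates the orphan rule.
// Wrapping the foreign type in a local newtype makes the impl legal.
struct Wrapper(Vec<String>);

impl fmt::Display for Wrapper {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "[{}]", self.0.join(", "))
    }
}

fn main() {
    let w = Wrapper(vec!["hello".to_string(), "world".to_string()]);
    assert_eq!(w.to_string(), "[hello, world]");
}
```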

Rust Ecosystem

Actix

Spring Boot -> Actix Web Rosetta Stone

|              | Spring Boot          | Actix Web                  |
| ------------ | -------------------- | -------------------------- |
| DI           | @Bean + @Autowired   | App::app_data + web::Data  |

#[tokio::main] vs #[actix_web::main]

  • tokio::main is also supported by actix, but actix_web::main is the recommended one, as it sets up some other things in addition to the tokio runtime.
  • If we need tokio's work-stealing capabilities for something else, we can use tokio::main instead, but we'll need to give up functionality such as websockets.
  • More at actix_web::rt

route() vs service() + [get()]

Runtime

  • actix-web spins up a worker thread for each available core on the machine. Each worker runs its own copy of the application built by HttpServer calling the very same closure that HttpServer::new takes as an argument.

Application State (& Dependency Injection)

  • actix-web uses a type-map to represent its application state: a HashMap that stores arbitrary data (using Any type) against their unique type identifier (obtained via TypeId::of)
  • When a new request comes in, it computes the type id of the type specified in the signature and checks if there is a record corresponding to it in the type-map. If there is one, it casts the retrieved Any value to the type specified and passes it to the handler
  • An object in the shared application state is actually shared only for that particular worker thread because of the nature of the actix runtime. If we want to have an actual "shared" state, we must create a shareable object, such as a web::Data or Arc<T>, outside the App factory. For example:
    struct AppStateWithCounter {
      counter: Mutex<i32>, // <- Mutex is necessary to mutate safely across threads
    }
    
    #[actix_web::main]
    async fn main() -> std::io::Result<()> {
        // Note: web::Data created _outside_ HttpServer::new closure
        let counter = web::Data::new(AppStateWithCounter {
            counter: Mutex::new(0),
        });
    
        HttpServer::new(move || {
            // move counter into the closure
            App::new()
                .app_data(counter.clone()) // <- register the created data
                .route("/", web::get().to(index))
        })
        .bind(("127.0.0.1", 8080))?
        .run()
        .await
    }
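The type-map mechanism described above can be sketched with std types alone (this illustrates the idea; it is not actix-web's actual internals):

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

// A minimal type-map: arbitrary values keyed by their unique TypeId
#[derive(Default)]
struct TypeMap {
    map: HashMap<TypeId, Box<dyn Any>>,
}

impl TypeMap {
    fn insert<T: 'static>(&mut self, value: T) {
        self.map.insert(TypeId::of::<T>(), Box::new(value));
    }

    // Look up by type, then downcast the stored `Any` back to `T`
    fn get<T: 'static>(&self) -> Option<&T> {
        self.map
            .get(&TypeId::of::<T>())
            .and_then(|boxed| boxed.downcast_ref::<T>())
    }
}

struct DbPool(&'static str); // stand-in for a real connection pool

fn main() {
    let mut state = TypeMap::default();
    state.insert(DbPool("postgres://localhost"));
    state.insert(42u32);

    // Retrieval by type, much like extracting web::Data<DbPool> in a handler
    assert_eq!(state.get::<DbPool>().unwrap().0, "postgres://localhost");
    assert_eq!(*state.get::<u32>().unwrap(), 42);
    assert!(state.get::<String>().is_none());
}
```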

Serde

  • Serde is data format agnostic: it only provides the data model, traits etc. for serialization (Serialize, Serializer) & deserialization (Deserialize, Deserializer, Visitor). These are then implemented in data format crates such as serde_json
  • Serde has an internal data model consisting of 29 types, which serves as the intermediate representation (IR). This includes deserializer lifetimes, which enable goodies like zero-copy deserialization
  • The mapping of a data format into the data model need not be straightforward: e.g. a platform specific construct such as OsString could be mapped into the data model by treating it as a Serde enum:
    enum OsString {
      Unix(Vec<u8>),
      Windows(Vec<u16>),
      // and other platforms
    }

    "The flexibility around mapping into the Serde data model is profound and powerful. When implementing Serialize and Deserialize, be aware of the broader context of your type that may make the most instinctive mapping not the best choice."

    Mapping into the data model - Serde

  • Rust's monomorphization means that each Serializer function is optimized by the compiler for each concrete Serializable type, making serde very efficient. However, this doesn't mean more performant implementations aren't possible -- specialized algorithmic choices specific to a data format could lead to more performance, e.g. simd-json.
  • Serde's macros enable all information required to (de)serialize a specific type for a specific data format to be available at compile-time. This is a necessity, as Rust doesn't provide runtime reflection, but has the nice benefit of incurring zero runtime overhead
  • An interesting implementation detail about serde is that the data model has no concrete struct/enum backing it -- it just manifests in the trait methods, which means a complete allocation of an intermediate structure is entirely avoided. More in a concise yet great dive into serde internals by Josh Mcguigan: Understanding Serde
  • serde-aux crate provides custom serializers/deserializers, such as deserialize_number_from_string. These can be used in field level attributes, like #[serde(deserialize_with = "deserialize_number_from_string")]
  • And, of course, the best resource for a deeper understanding of serde is the decrusting stream by Jon: Decrusting the serde crate

sqlx

  • sqlx supports connecting to a database in order to validate SQL queries at compile time! Of course this is a bit controversial, and it's opt-in: only the query! family of macros performs these checks, and offline mode (using cached query metadata) removes the need for a live database during builds
  • sqlx provides sqlx-cli for stuff like preparing query metadata for offline compile-time verification, creating databases, adding & running migrations etc. More in sqlx-cli docs
  • sqlx's Executor trait, used to actually execute the queries, requires a mutable reference, allowing it to enforce the guarantee that only a single query at a time runs over the same DB connection -- the Rust compiler guarantees exclusive access through a mutable reference. This is why &PgConnection doesn't implement Executor, but &mut PgConnection does.
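The exclusive-access argument can be illustrated with a toy connection type (not sqlx's real API): because execute takes &mut self, the borrow checker statically rules out two simultaneous queries on the same connection.

```rust
// Toy stand-in for a DB connection; counts queries executed over it
struct Connection {
    queries_run: u32,
}

impl Connection {
    // &mut self: a second call can only happen once the previous
    // mutable borrow has ended, so queries are serialized by the compiler
    fn execute(&mut self, _sql: &str) -> u32 {
        self.queries_run += 1;
        self.queries_run
    }
}

fn main() {
    let mut conn = Connection { queries_run: 0 };
    let first = conn.execute("SELECT 1");
    let second = conn.execute("SELECT 2");
    assert_eq!((first, second), (1, 2));

    // This would NOT compile: two live mutable borrows of `conn`
    // let a = &mut conn;
    // let b = &mut conn; // error[E0499]: cannot borrow `conn` as mutable twice
}
```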

Logging & Tracing

  • log is the de-facto, lowest-common-denominator crate for logging in Rust. It provides a facade in the form of macros for different logging levels (debug!, warn! etc.) along with a Log trait
  • env-logger is a simple Log implementation that takes config from env variables (and also in other ways) and logs to stdout/stderr
  • tracing is a production-grade tracing crate by tokio, providing span and event facades via its macro-based API along with a Subscriber trait.
  • tracing offers niceties such as KV support with easy Display/Debug usage, RAII for span closure via Drop, support for attaching spans to futures with Instrument trait etc.
  • tracing interops with the log crate via its log feature -- all log macros emit an event and all span/event traces are emitted as log records when no Subscriber implementation is provided
  • tracing-log crate provides interop in the opposite direction -- allowing a tracing subscriber to consume log records as though they were tracing events
  • tracing-subscriber crate provides utilities for implementing & composing tracing subscribers. In particular, it provides the Layer trait, a composable abstraction for building Subscribers, and the Registry struct which actually implements Subscriber and takes care of storing span data, recording span relationships & lifecycle, and exposing it to all layers. More in tracing-subscriber docs
  • tracing::instrument, an attribute macro part of tracing, allows for seamlessly attaching spans to functions while keeping the implementation clean.
    • It offers almost all features offered by the span macro, making it a drop-in replacement.
    • It includes the function arguments as fields by default, but offers skip_all argument to opt out of this: #[tracing::instrument(skip_all)]
  • Some other crates in the tracing ecosystem: tracing-actix-web, tracing-opentelemetry, tracing-error
  • Some resources on tracing:

secrecy

  • secrecy provides a wrapper type Secret<T> for secrets that prevents accidental printing or serialization
  • It makes the exposure of the wrapped value explicit by requiring a call to secrecy::expose_secret
  • It also offers niceties like clearing from memory on drop using the zeroize crate
  • In general having a wrapper type for secrets also has the benefit of clearly marking what parts of the domain are treated as sensitive information according to relevant regulations etc.

cargo-chef

  • cargo-chef is an ecosystem crate that fills the gap of building just a project's dependencies in Rust -- something like pip install -r requirements.txt
  • The main use case this solves is allowing for faster Docker builds -- the dependencies build can be cached as a separate layer from the application build, so changes in the application don't require rebuilding the dependencies as well. In the zero2prod project with a single actix-web endpoint interacting with the DB via sqlx, using cargo-chef brought down the build time for code changes by 10x -- from ~73s to ~7s!
  • It provides two commands: prepare and cook. prepare creates a recipe.json, and cook uses it to build the dependencies. recipe.json currently looks something like this:
    {
      "skeleton": {
        "manifests": [
          {
            "relative_path": "Cargo.toml",
            "contents": "<contents of `Cargo.toml` except lints, with default properties (such as `[lib]`) filled in>",
            "targets": [
              {
                "path": "src/lib.rs",
                "kind": {
                  "Lib": {
                    "is_proc_macro": false
                  }
                },
                "name": "zero2prod"
              },
              {
                "path": "src/main.rs",
                "kind": "Bin",
                "name": "zero2prod"
              }
            ]
          }
        ],
        "config_file": "<contents of `.cargo/config.toml`>",
        "lock_file": "<contents of `Cargo.lock`>",
        "rust_toolchain_file": null
      }
    }
  • There is a long-standing issue in the cargo repo for native support for building just the dependencies, but until then cargo-chef does a great job at solving the problem.

Property Testing

  • quickcheck is the veteran property testing crate in Rust, built by none other than burntsushi. It's lightweight and ideal for small applications.
  • proptest is a spiritual successor to quickcheck, with better (and built-in) shrinking and much more. More on quickcheck vs proptest.
  • There's a proptest book as well, which is quite a handy guide to proptest and property testing in general.

Postgres

MySQL -> Postgres Rosetta Stone

|                       | MySQL                                       | PostgreSQL                                                                 |
| --------------------- | ------------------------------------------- | -------------------------------------------------------------------------- |
| Default port          | 3306                                        | 5432                                                                        |
| CLI binary            | mysql                                       | psql                                                                        |
| Exit CLI binary       | exit; or quit;                              | \q                                                                          |
| List databases        | show databases;                             | \l                                                                          |
| Choose database       | use {db_name};                              | \c {db_name}                                                                |
| List tables           | show tables;                                | \d or \d+ or \dt or \dt+                                                    |
| Describe table        | describe {table_name};                      | \d {table_name} or \d+ {table_name}                                         |
| Show create table     | show create table {table_name};             | (no command in psql) pg_dump -st {table_name} {dbname}                      |
| List indexes          | show index from {table_name};               | \d {table_name} or select * from pg_indexes where tablename = '{table_name}'; |
| List users or roles   | SELECT User, Host from mysql.user;          | \du                                                                         |
| Show process list     | show processlist;                           | select * from pg_stat_activity;                                             |
| Pretty print          | Suffix command with \G instead of ;         | \x (expanded display) followed by the command                               |
| Comment (single line) | # (hash)                                    | -- (double dash)                                                            |
| Quotes                | Both ' (single) and " (double) for literals | ' (single) for string literals, " (double) for identifiers                  |
| Case sensitivity      | Case-insensitive string comparisons         | Case-sensitive string comparisons                                           |

Sources: tipseason.com

Maintenance DB (postgres)

  • Postgres has a maintenance database called postgres: say we want to create a new database -- to do that, we need to connect to the Postgres instance, which requires connecting to a database that already exists. postgres is the database that already exists!
  • By default, every user can connect to the Postgres instance using this database. However, what they can do after connecting depends entirely on the access they have been provided.
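For example, bootstrapping a fresh database goes through the maintenance database (host, user, and database name here are illustrative):

```shell
# Connect via the `postgres` maintenance database to create a new one
psql --host localhost --username postgres --dbname postgres \
     --command 'CREATE DATABASE newsletter;'
```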
