Programming Rust

Fast, Safe Systems Development


  • On Amazon
  • ISBN: 978-1492052593
  • My Rating: 7/10

What is it about?

Programming Rust is a book about the Rust programming language.

My impression

I found Programming Rust an informative book that complements the official book The Rust Programming Language well. Sometimes I wished there was a bit more meat on the bone, especially toward the end of the book with its advanced topics. Other times a simple link to the API documentation would have been enough.

My notes


In short, systems programming is resource-constrained programming. It is programming when every byte and every CPU cycle counts.

Systems Programmers Can Have Nice Things

The Rust language makes you a simple promise: if your program passes the compiler's checks, it is free of undefined behavior. Dangling pointers, double-frees, and null pointer dereferences are all caught at compile time. Array references are secured with a mix of compile-time and run-time checks, so there are no buffer overruns [...]. Further, Rust aims to be both safe and pleasant to use. In order to make stronger guarantees about your program's behavior, Rust imposes more restrictions on your code than C and C++ do, and these restrictions take practice and experience to get used to. But the language overall is flexible and expressive.

Rust's package manager and build tool, Cargo, makes it easy to use libraries published by others on Rust's public package repository, the crates.io website. You simply add the library's name and required version number to a file, and Cargo takes care of downloading the library, together with whatever other libraries it uses in turn, and linking the whole lot together.

A Tour of Rust

Rust's machine integer type names reflect their size and signedness: i32 is a signed 32-bit integer; u8 is an unsigned 8-bit integer (used for "byte" values), and so on. The isize and usize types hold pointer-sized signed and unsigned integers, 32 bits long on 32-bit platforms, and 64 bits long on 64-bit platforms. Rust also has two floating-point types, f32 and f64, which are the IEEE single- and double-precision floating-point types, like float and double in C and C++.

If a function body ends with an expression that is not followed by a semicolon, that's the function's return value. [...] It's typical in Rust to use this form to establish the function's value when control "falls off the end" of the function, and use return statements only for explicit early returns from the midst of a function.
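A minimal sketch of this convention, using a `gcd` function (the book's tour chapter uses a similar example); the final expression, with no semicolon, is the return value:

```rust
// A function whose value is its final expression; `return` is reserved
// for explicit early exits.
fn gcd(mut a: u64, mut b: u64) -> u64 {
    while b != 0 {
        let t = a % b;
        a = b;
        b = t; // the loop body is made of statements; nothing returned here
    }
    a // no semicolon: this expression's value "falls off the end"
}

fn main() {
    assert_eq!(gcd(14, 21), 7);
    assert_eq!(gcd(8, 12), 4);
}
```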

The #[test] marker is an example of an attribute. Attributes are an open-ended system for marking functions and other declarations with extra information [...]. They're used to control compiler warnings and code style checks, include code conditionally (like #ifdef in C and C++), tell Rust how to interact with code written in other languages, and so on.

Functions that do anything that might fail, such as doing input or output or otherwise interacting with the operating system, can return Result types whose Ok variants carry successful results – the count of bytes transferred, the file opened, and so on – and whose Err variants carry an error code indicating what went wrong. Unlike most modern languages, Rust does not have exceptions: all errors are handled using either Result or panic [...].

Option is an enumerated type, often called an enum, because its definition enumerates several variants that a value of this type could be: for any type T, a value of type Option<T> is either Some(v), where v is a value of type T, or None, indicating no T value is available. [...] Option is a generic type: you can use Option<T> to represent an optional value of any type T you like.
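A small illustration of the two variants (my own example, not from the book); a match must account for both Some and None:

```rust
fn main() {
    // Option<T> is an enum with exactly two variants: Some(v) and None.
    let some_number: Option<i32> = Some(5);
    let no_number: Option<i32> = None;

    // match forces you to handle every variant.
    let doubled = match some_number {
        Some(n) => n * 2,
        None => 0,
    };
    assert_eq!(doubled, 10);

    // unwrap_or supplies a fallback for the None case.
    assert_eq!(no_number.unwrap_or(0), 0);
}
```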

[///] are documentation comments; the rustdoc utility knows how to parse them, together with the code they describe, and produce online documentation.

Fundamental Types

Compared to a dynamically typed language like JavaScript or Python, Rust requires more planning from you up front. You must spell out the types of function arguments and return values, struct fields, and a few other constructs. However, two features of Rust make this less trouble than you might expect:

  • Given the types that you do spell out, Rust's type inference will figure out most of the rest for you. In practice, there's often only one type that will work for a given variable or expression; when this is the case, Rust lets you leave out, or elide, the type.
  • Functions can be generic: a single function can work on values of many different types.
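Both points can be sketched together; `largest` below is a hypothetical helper of my own, not from the book:

```rust
// A single generic function works on any ordered, copyable element type,
// and type inference fills in the unannotated locals.
fn largest<T: PartialOrd + Copy>(items: &[T]) -> T {
    let mut best = items[0]; // inferred as T
    for &item in &items[1..] {
        if item > best {
            best = item;
        }
    }
    best
}

fn main() {
    let ints = vec![3, 7, 2];      // inferred as Vec<i32>
    let floats = [1.5, 0.25, 2.0]; // inferred as [f64; 3]
    assert_eq!(largest(&ints), 7);
    assert_eq!(largest(&floats), 2.0);
}
```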

The footing of Rust's type system is a collection of fixed-width numeric types, chosen to match the types that almost all modern processors implement directly in hardware. Fixed-width numeric types can overflow or lose precision, but they are adequate for most applications and can be thousands of times faster than representations like arbitrary-precision integers and exact rationals.

Rust uses the u8 type for byte values. For example, reading data from a binary file or socket yields a stream of u8 values.

[The] precision [of the usize and isize types] matches the size of the address space on the target machine: they are 32 bits long on 32-bit architectures, and 64 bits long on 64-bit architectures. Rust requires array indices to be usize values. Values representing the sizes of arrays or vectors or counts of the number of elements in some data structure also generally have the usize type.

Integer literals in Rust can take a suffix indicating their type: 42u8 is a u8 value, and 1729isize is an isize.

The prefixes 0x, 0o, and 0b designate hexadecimal, octal, and binary literals.

To make long numbers more legible, you can insert underscores among the digits. For example, you can write the largest u32 value as 4_294_967_295. The exact placement of the underscores is not significant, so you can break hexadecimal or binary numbers into groups of four digits rather than three, as in 0xffff_ffff, or set off the type suffix from the digits, as in 127_u8.

Although numeric types and the char type are distinct, Rust does provide byte literals, character-like literals for u8 values: b'X' represents the ASCII code for the character X, as a u8 value. For example, since the ASCII code for A is 65, the literals b'A' and 65u8 are exactly equivalent.

You can convert from one integer type to another using the as operator.
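The literal forms and the as operator, collected into one small example of my own:

```rust
fn main() {
    let a = 42u8;                       // type suffix picks the type
    let big = 0xffff_ffffu32;           // hex literal with underscores
    assert_eq!(big, 4_294_967_295);
    assert_eq!(0b1010, 10);             // binary literal
    assert_eq!(0o17, 15);               // octal literal
    assert_eq!(b'A', 65u8);             // byte literal == ASCII code
    let widened = a as i32 + 1;         // `as` converts between integer types
    assert_eq!(widened, 43);
}
```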

When an integer arithmetic operation overflows, Rust panics in a debug build. In a release build, the operation wraps around: it produces the value equivalent to the mathematically correct result modulo the range of the value.

When this default behavior isn't what you need, the integer types provide methods that let you spell out exactly what you want. [...] These integer arithmetic methods fall in four general categories:

  • Checked operations return an Option of the result: Some(v) if the mathematically correct result can be represented as a value of that type, or None if it cannot.
  • Wrapping operations return the value equivalent to the mathematically correct result modulo the range of the value. [...] this is how the ordinary arithmetic operators behave in release builds. The advantage of these methods is that they behave the same way in all builds.
  • Saturating operations return the representable value that is closest to the mathematically correct result. In other words, the result is "clamped" to the maximum and minimum values the type can represent.
  • Overflowing operations return a tuple (result, overflowed), where result is what the wrapping version of the function would return, and overflowed is a bool indicating whether an overflow occurred.
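The four categories side by side, using u8 addition that overflows (250 + 10 = 260, which wraps to 4 modulo 256):

```rust
fn main() {
    // Checked: an Option of the result.
    assert_eq!(250u8.checked_add(5), Some(255));
    assert_eq!(250u8.checked_add(10), None);

    // Wrapping: modular arithmetic, the same in every build.
    assert_eq!(250u8.wrapping_add(10), 4);

    // Saturating: clamped to the type's range.
    assert_eq!(250u8.saturating_add(10), 255);

    // Overflowing: the wrapped result plus an overflow flag.
    assert_eq!(250u8.overflowing_add(10), (4, true));
    assert_eq!(250u8.overflowing_add(5), (255, false));
}
```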

Every part of a floating-point number after the integer part is optional, but at least one of the fractional part, exponent, or type suffix must be present, to distinguish it from an integer literal. The fractional part may consist of a lone decimal point, so 5. is a valid floating-point constant.

The types f32 and f64 have associated constants for the IEEE-required special values like INFINITY, NEG_INFINITY (negative infinity), NAN (the not-a-number value), and MIN and MAX (the largest and smallest finite values).

Although a bool needs only a single bit to represent it, Rust uses an entire byte for a bool value in memory, so you can create a pointer to it.

Rust's character type char represents a single Unicode character, as a 32-bit value. Rust uses the char type for single characters in isolation, but uses the UTF-8 encoding for strings and streams of text. So, a String represents its text as a sequence of UTF-8 bytes, not as an array of characters.

A tuple is a pair, or triple, quadruple, quintuple, etc. (hence, n-tuple, or tuple), of values of assorted types. You can write a tuple as a sequence of elements, separated by commas and surrounded by parentheses. [...] Given a tuple value t, you can access its elements as t.0, t.1, and so on.

[A] commonly used tuple type is the zero-tuple (). This is traditionally called the unit type because it has only one value, also written (). Rust uses the unit type where there's no meaningful value to carry, but context requires some sort of type nonetheless.
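A quick example of my own covering tuple indexing, tuple patterns, and the unit type:

```rust
fn main() {
    // A tuple mixes types; elements are accessed by index.
    let t = ("Brazil", 1985);
    assert_eq!(t.0, "Brazil");
    assert_eq!(t.1, 1985);

    // A pattern can unpack a tuple into separate variables.
    let (country, year) = t;
    assert_eq!(country, "Brazil");
    assert_eq!(year, 1985);

    // An expression with no meaningful value has the unit type ().
    let nothing: () = println!("side effect only");
    assert_eq!(nothing, ());
}
```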

It's easiest to get started by thinking of references as Rust's basic pointer type. At run time, a reference to an i32 is a single machine word holding the address of the i32, which may be on the stack or in the heap. The expression &x produces a reference to x; in Rust terminology, we say that it borrows a reference to x. Given a reference r, the expression *r refers to the value r points to.

Rust references come in two flavors:

  • &T: An immutable, shared reference. You can have many shared references to a given value at a time, but they are read-only [...].
  • &mut T: A mutable, exclusive reference. You can read and modify the value it points to [...]. But for as long as the reference exists, you may not have any other references of any kind to that value.

Rust uses this dichotomy between shared and mutable references to enforce a "single writer or multiple readers" rule: either you can read and write the value, or it can be shared by any number of readers, but never both at the same time. This separation, enforced by compile-time checks, is central to Rust's safety guarantees.
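The "single writer or multiple readers" rule in miniature (my own sketch): shared borrows may coexist, but a mutable borrow must be exclusive.

```rust
fn main() {
    let mut x = 10;
    {
        let r1 = &x;              // many shared references are fine...
        let r2 = &x;
        assert_eq!(*r1 + *r2, 20);
    }                             // ...but they end before the mutable borrow

    let m = &mut x;               // exclusive: no other reference may coexist
    *m += 1;                      // reading *and* writing through &mut
    assert_eq!(x, 11);
}
```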

The simplest way to allocate a value in the heap is to use Box::new.

Rust also has the raw pointer types *mut T and *const T. [...] Using a raw pointer is unsafe, because Rust makes no effort to track what it points to. For example, raw pointers may be null, or they may point to memory that has been freed or that now contains a value of a different type. [...] However, you may only dereference raw pointers within an unsafe block. An unsafe block is Rust's opt-in mechanism for advanced language features whose safety is up to you.

Rust has three types for representing a sequence of values in memory:

  • The type [T; N] represents an array of N values, each of type T. An array's size is a constant determined at compile time and is part of the type; you can't append new elements or shrink an array.
  • The type Vec<T>, called a vector of Ts, is a dynamically allocated, growable sequence of values of type T. A vector's elements live on the heap, so you can resize vectors at will: push new elements onto them, append other vectors to them, delete elements, and so on.
  • The types &[T] and &mut [T], called a shared slice of Ts and mutable slice of Ts, are references to a series of elements that are a part of some other value, like an array or vector. You can think of a slice as a pointer to its first element, together with a count of the number of elements you can access starting at that point.

A Vec<T> consists of three values: a pointer to the heap-allocated buffer for the elements, which is created and owned by the Vec<T>; the number of elements that buffer has the capacity to store; and the number it actually contains now (in other words, its length). When the buffer has reached its capacity, adding another element to the vector entails allocating a larger buffer, copying the present contents into it, updating the vector's pointer and capacity to describe the new buffer, and finally freeing the old one.

If you know the number of elements a vector will need in advance, instead of Vec::new you can call Vec::with_capacity to create a vector with a buffer large enough to hold them all, right from the start; then, you can add the elements to the vector one at a time without causing any reallocation.
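For instance (note that `with_capacity` guarantees at least the requested capacity):

```rust
fn main() {
    let mut v = Vec::with_capacity(3); // buffer allocated up front
    assert!(v.capacity() >= 3);
    assert_eq!(v.len(), 0);            // but no elements yet

    let initial_capacity = v.capacity();
    v.push(1);
    v.push(2);
    v.push(3);                         // fits in the original buffer
    assert_eq!(v.capacity(), initial_capacity); // no reallocation occurred
    assert_eq!(v.len(), 3);
}
```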

You can insert and remove elements wherever you like in a vector, although these operations shift all the elements after the affected position forward or backward, so they may be slow if the vector is long.

Whereas an ordinary reference is a non-owning pointer to a single value, a reference to a slice is a non-owning pointer to a range of consecutive values in memory. This makes slice references a good choice when you want to write a function that operates on either an array or a vector.
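A sketch of that pattern; `sum` is a hypothetical helper that accepts arrays, vectors, or sub-slices alike because all of them coerce to `&[i64]`:

```rust
// Taking &[T] lets one function serve arrays, vectors, and slices of either.
fn sum(numbers: &[i64]) -> i64 {
    let mut total = 0;
    for n in numbers {
        total += n;
    }
    total
}

fn main() {
    let array = [1, 2, 3];
    let vector = vec![4, 5, 6];
    assert_eq!(sum(&array), 6);        // &[i64; 3] coerces to &[i64]
    assert_eq!(sum(&vector), 15);      // &Vec<i64> coerces to &[i64]
    assert_eq!(sum(&vector[1..]), 11); // a slice of part of the vector
}
```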

If one line of a string ends with a backslash, then the newline character and the leading whitespace on the next line are dropped.

A raw string is tagged with the lowercase letter r. All backslashes and whitespace characters inside a raw string are included verbatim in the string. No escape sequences are recognized. [...] You can't include a double-quote character in a raw string simply by putting a backslash in front of it [...]. However, there is a cure for that too. The start and end of a raw string can be marked with pound signs.

A string literal with the b prefix is a byte string. Such a string is a slice of u8 values – that is, bytes – rather than Unicode text.

Rust strings are sequences of Unicode characters, but they are not stored in memory as arrays of chars. Instead, they are stored using UTF-8, a variable-width encoding. Each ASCII character in a string is stored in one byte. Other characters take up multiple bytes.

A &str (pronounced "stir" or "string slice") is a reference to a run of UTF-8 text owned by someone else: it "borrows" the text. [...] Like other slice references, a &str is a fat pointer, containing both the address of the actual data and its length. You can think of a &str as being nothing more than a &[u8] that is guaranteed to hold well-formed UTF-8.

A String or &str's .len() method returns its length. The length is measured in bytes, not characters.
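The byte-versus-character distinction, plus raw and byte strings, in one small example of my own:

```rust
fn main() {
    let s = "été";                    // three characters, but not three bytes
    assert_eq!(s.len(), 5);           // 'é' takes 2 bytes in UTF-8
    assert_eq!(s.chars().count(), 3); // counting characters instead

    // A raw string marked with pound signs can contain quotes and backslashes.
    let raw = r#"a "quoted" \path"#;
    assert_eq!(raw, "a \"quoted\" \\path");

    // A byte string is a slice of u8 values, not Unicode text.
    let bytes = b"GET";
    assert_eq!(bytes, &[71u8, 69, 84]);
}
```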

Ownership and Moves

In Rust [...] the concept of ownership is built into the language itself and enforced by compile-time checks. Every value has a single owner that determines its lifetime. When the owner is freed – dropped, in Rust terminology – the owned value is dropped too.

A variable owns its value. When control leaves the block in which the variable is declared, the variable is dropped, so its value is dropped along with it.

Rust's Box type serves as another example of ownership. A Box<T> is a pointer to a value of type T stored on the heap. Calling Box::new(v) allocates some heap space, moves the value v into it, and returns a Box pointing to the heap space. Since a Box owns the space it points to, when the Box is dropped, it frees the space too.

Just as variables own their values, structs own their fields, and tuples, arrays, and vectors own their elements.

In Rust, for most types, operations like assigning a value to a variable, passing it to a function, or returning it from a function don't copy the value: they move it. The source relinquishes ownership of the value to the destination and becomes uninitialized; the destination now controls the value's lifetime.

Assigning a value of a Copy type copies the value, rather than moving it. The source of the assignment remains initialized and usable, with the same value it had before. Passing Copy types to functions and constructors behaves similarly. The standard Copy types include all the machine integer and floating-point numeric types, the char and bool types, and a few others. A tuple or fixed-size array of Copy types is itself a Copy type. Only types for which a simple bit-for-bit copy suffices can be Copy.

[...] user-defined types being non-Copy is only the default. If all the fields of your struct are themselves Copy, then you can make the type Copy as well by placing the attribute #[derive(Copy, Clone)] above the definition [...].
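Moves versus Copy side by side; `Point` is a hypothetical struct of my own whose fields are all Copy, so it may derive Copy:

```rust
#[derive(Copy, Clone, Debug, PartialEq)]
struct Point {
    x: i32, // every field is Copy, so deriving Copy is allowed
    y: i32,
}

fn main() {
    // String is not Copy: assignment moves it.
    let s = String::from("hello");
    let t = s;        // s is now uninitialized; using s would not compile
    assert_eq!(t, "hello");

    // Point is Copy: assignment copies, and the source stays usable.
    let p = Point { x: 1, y: 2 };
    let q = p;
    assert_eq!(p, q); // p is still valid here
}
```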

Although most values have unique owners in typical Rust code, in some cases it's difficult to find every value a single owner that has the lifetime you need; you'd like the value to simply live until everyone's done using it. For these cases, Rust provides the reference-counted pointer types Rc and Arc.

The Rc and Arc types are very similar; the only difference between them is that an Arc is safe to share between threads directly – the name Arc is short for atomic reference count – whereas a plain Rc uses faster non-thread-safe code to update its reference count. If you don't need to share the pointers between threads, there's no reason to pay the performance penalty of an Arc, so you should use Rc; Rust will prevent you from accidentally passing one across a thread boundary.

For any type T, an Rc<T> value is a pointer to a heap-allocated T that has had a reference count affixed to it. Cloning an Rc<T> value does not copy the T; instead, it simply creates another pointer to it and increments the reference count.
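A minimal demonstration: cloning bumps the count, dropping decrements it, and no String is ever copied.

```rust
use std::rc::Rc;

fn main() {
    let s = Rc::new(String::from("shared"));
    let t = Rc::clone(&s); // no String copy: a second pointer, count += 1
    assert_eq!(Rc::strong_count(&s), 2);
    assert_eq!(*t, *s);    // both point at the same heap-allocated String

    drop(t);               // dropping a clone decrements the count
    assert_eq!(Rc::strong_count(&s), 1);
}
```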


Rust also has non-owning pointer types called references, which have no effect on their referents' lifetimes. In fact, it's rather the opposite: references must never outlive their referents. You must make it apparent in your code that no reference can possibly outlive the value it points to. To emphasize this, Rust refers to creating a reference to some value as borrowing the value: what you have borrowed, you must eventually return to its owner.

A reference lets you access a value without affecting its ownership. References come in two kinds:

  • A shared reference lets you read but not modify its referent. However, you can have as many shared references to a particular value at a time as you like. The expression &e yields a shared reference to e's value; if e has the type T, then &e has the type &T, pronounced "ref T". Shared references are Copy.
  • If you have a mutable reference to a value, you may both read and modify the value. However, you may not have any other references of any sort to that value active at the same time. The expression &mut e yields a mutable reference to e's value; you write its type as &mut T, which is pronounced "ref mute T". Mutable references are not Copy.

When we pass a value to a function in a way that moves ownership of the value to the function, we say that we have passed it by value. If we instead pass the function a reference to the value, we say that we have passed the value by reference.

In Rust, references are created explicitly with the & operator, and dereferenced explicitly with the * operator.

Since references are so widely used in Rust, the . operator implicitly dereferences its left operand, if needed. [...] The . operator can also implicitly borrow a reference to its left operand, if needed for a method call.

Rust permits references to references. [...] The . operator follows as many references as it takes to find its target.

Like the . operator, Rust's comparison operators "see through" any number of references.

If you actually want to know whether two references point to the same memory, you can use std::ptr::eq, which compares them as addresses.

Rust references are never null. [...] There is no default initial value for a reference (you can't use any variable until it's been initialized, regardless of its type) and Rust won't convert integers to references (outside of unsafe code), so you can't convert zero into a reference.

[...] Rust lets you borrow a reference to the value of any sort of expression at all. [...] In situations like this, Rust simply creates an anonymous variable to hold the expression's value and makes the reference point to that.

[...] Rust also includes two kinds of fat pointers, two-word values carrying the address of some value, along with some further information necessary to put the value to use. A reference to a slice is a fat pointer, carrying the starting address of the slice and its length. [...] Rust's other kind of fat pointer is a trait object, a reference to a value that implements a certain trait. A trait object carries a value's address and a pointer to the trait's implementation appropriate to that value, for invoking the trait's methods. [...] Aside from carrying this extra data, slice and trait object references behave just like the other sorts of references [...].

You can't borrow a reference to a local variable and take it out of the variable's scope.

Rust tries to assign each reference type in your program a lifetime that meets the constraints imposed by how it is used. A lifetime is some stretch of your program for which a reference could be safe to use: a statement, an expression, the scope of some variable, or the like. Lifetimes are entirely figments of Rust's compile-time imagination. At run time, a reference is nothing but an address; its lifetime is part of its type and has no run-time representation.

[...] if you have a variable x, then a reference to x must not outlive x itself [...]. Beyond the point where x goes out of scope, the reference would be a dangling pointer. We say that the variable's lifetime must contain or enclose that of the reference borrowed from it.

[...] if you store a reference in a variable r, the reference's type must be good for the entire lifetime of the variable, from its initialization until its last use [...]. If the reference can't live at least as long as the variable does, then at some point r will be a dangling pointer. We say that the reference's lifetime must contain or enclose the variable's.


Blocks are the most general kind of expression. A block produces a value and can be used anywhere a value is needed.

[...] a block may contain any number of declarations. The most common are let declarations, which declare local variables. [...] A let declaration can declare a variable without initializing it. The variable can then be initialized with a later assignment. This is occasionally useful, because sometimes a variable should be initialized from the middle of some sort of control flow construct.

You may occasionally see code that seems to redeclare an existing variable, like this:

    for line in file.lines() {
        let line = line?;
        ...
    }

The let declaration creates a new, second variable, of a different type. [...] Its definition supersedes the first's for the rest of the block. This is called shadowing and is very common in Rust programs.

A block can also contain item declarations. An item is simply any declaration that could appear globally in a program or module, such as a fn, struct, or use.

When an fn is declared inside a block, its scope is the entire block – that is, it can be used throughout the enclosing block. But a nested fn cannot access local variables or arguments that happen to be in scope.

It's never strictly necessary to use if let, because match can do everything if let can do. An if let expression is shorthand for a match with just one pattern.

Loops are expressions in Rust, but the value of a while or for loop is always (), so their value isn't very useful. A loop expression can produce a value if you specify one.

Within the body of a loop, you can give break an expression, whose value becomes that of the loop.
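For example (my own sketch): while and for always evaluate to (), but a loop expression can yield a value through break.

```rust
fn main() {
    let mut n = 1;
    // The value handed to `break` becomes the value of the loop expression.
    let first_power_over_1000 = loop {
        n *= 2;
        if n > 1000 {
            break n;
        }
    };
    assert_eq!(first_power_over_1000, 1024);
}
```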

Functions don't have to have an explicit return expression. The body of a function works like a block expression: if the last expression isn't followed by a semicolon, its value is the function's return value. In fact, this is the preferred way to supply a function's return value in Rust.

Expressions that don't finish normally are assigned the special type !, and they're exempt from the rules about types having to match. You can see ! in the function signature of std::process::exit(). [...] The ! means that exit() never returns. It's a divergent function.

Rust has closures, lightweight function-like values. A closure usually consists of an argument list, given between vertical bars, followed by an expression: let is_even = |x| x % 2 == 0;. Rust infers the argument types and return type.

Error Handling

Ordinary errors are handled using the Result type. Results typically represent problems caused by things outside the program, like erroneous input, a network outage, or a permission problem. That such situations occur is not up to us; even a bug-free program will encounter them from time to time. [...] Panic is for the other kind of error, the kind that should never happen.

In most places where we try something that could fail, we don't want to catch and handle the error immediately. [...] Instead, if an error occurs, we usually want to let our caller deal with it. We want errors to propagate up the call stack. Rust has a ? operator that does this. You can add a ? to any expression that produces a Result [...]. The behavior of ? depends on whether this function returns a success result or an error result:

  • On success, it unwraps the Result to get the success value inside.
  • On error, it immediately returns from the enclosing function, passing the error result up the call chain. To ensure that this works, ? can only be used on a Result in functions that have a Result return type.

? also works similarly with the Option type. In a function that returns Option, you can use ? to unwrap a value and return early in the case of None.
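Both uses of ? in one sketch; `read_config`, `first_char_of`, and the file path are hypothetical examples of my own:

```rust
use std::fs;
use std::io;

// In a Result-returning function, ? forwards any Err to the caller.
fn read_config(path: &str) -> Result<String, io::Error> {
    let text = fs::read_to_string(path)?; // on Err, return it immediately
    Ok(text.trim().to_string())
}

// In an Option-returning function, ? returns early on None.
fn first_char_of(s: &str) -> Option<char> {
    let c = s.chars().next()?; // on None, return None immediately
    Some(c.to_ascii_uppercase())
}

fn main() {
    assert!(read_config("/no/such/file").is_err());
    assert_eq!(first_char_of("rust"), Some('R'));
    assert_eq!(first_char_of(""), None);
}
```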

All of the standard library error types can be converted to the type Box<dyn std::error::Error + Send + Sync + 'static>. This is a bit of a mouthful, but dyn std::error::Error represents "any error", and Send + Sync + 'static makes it safe to pass between threads, which you'll often want.

Crates and Modules

Rust programs are made of crates. Each crate is a complete, cohesive unit: all the source code for a single library or executable, plus any associated tests, examples, tools, configuration, and other junk.

When we run cargo build, Cargo starts by downloading source code for the specified versions of these crates from crates.io. Then, it reads those crates' Cargo.toml files, downloads their dependencies, and so on recursively.

The collection of all these dependency relationships, which tells Cargo everything it needs to know about what crates to build and in what order, is known as the dependency graph of the crate.

Once it has the source code, Cargo compiles all the crates. It runs rustc, the Rust compiler, once for each crate in the project's dependency graph. When compiling libraries, Cargo uses the --crate-type lib option. This tells rustc not to look for a main() function but instead to produce an .rlib file containing compiled code that can be used to create binaries and other .rlib files.

When compiling a program, Cargo uses --crate-type bin, and the result is a binary executable for the target platform [...].

[...] cargo build --release produces an optimized build. Release builds run faster, but they take longer to compile, they don't check for integer overflow, they skip debug_assert!() assertions, and the stack traces they generate on panic are generally less reliable.

To evolve without breaking existing code, Rust uses editions. The 2015 edition of Rust is compatible with Rust 1.0. The 2018 edition changed async and await into keywords, streamlined the module system, and introduced various other language changes that are incompatible with the 2015 edition. Each crate indicates which edition of Rust it is written in with a line like this in the [package] section atop its Cargo.toml file: edition = "2018". If that keyword is absent, the 2015 edition is assumed [...].

Rust promises that the compiler will always accept all extant editions of the language, and programs can freely mix crates written in different editions.

Whereas crates are about code sharing between projects, modules are about code organization within a project. They act as Rust's namespaces, containers for the functions, types, constants, and so on that make up your Rust program or library.

[A] function [...] marked pub(crate) [...] is available anywhere inside this crate, but isn't exposed as part of the external interface. It can't be used by other crates [...].

Anything that isn't marked pub is private and can only be used in the same module in which it is defined, or any child modules. [...] Marking an item as pub is often known as "exporting" that item.

Modules can nest, and it's fairly common to see a module that's just a collection of submodules.

These three options – modules in their own file, modules in their own directory with a mod.rs file, and modules in their own file with a supplementary directory containing submodules – give the module system enough flexibility to support almost any project structure you might desire.

The keywords super and crate have a special meaning in paths: super refers to the parent module, and crate refers to the crate containing the current module.

Submodules can access private items in their parent modules with use super::*.

[...] the standard library std is automatically linked with every project. This means you can always go with use std::whatever or refer to std items by name, like std::mem::swap() inline in your code. Furthermore, a few particularly handy names, like Vec and Result, are included in the standard prelude and automatically imported. Rust behaves as though every module, including the root module, started with the following import: use std::prelude::v1::*;.

Any item in a Rust program can be decorated with attributes. Attributes are Rust's catchall syntax for writing miscellaneous instructions and advice to the compiler.

Conditional compilation is [a] feature that's written using an attribute, namely, #[cfg].

To attach an attribute to a whole crate, add it at the top of the main.rs or lib.rs file, before any items, and write #! instead of # [...].

[...] the #![feature] attribute is used to turn on unstable features of the Rust language and libraries, features that are experimental, and therefore might have bugs or might be changed or removed in the future.

[...] a simple unit testing framework is built into Rust. Tests are ordinary functions marked with the #[test] attribute.

Integration tests are .rs files that live in a tests directory alongside your project's src directory. When you run cargo test, Cargo compiles each integration test as a separate, standalone crate, linked with your library and the Rust test harness.

The command cargo doc creates HTML documentation for your library.

[...] when Rust sees comments that start with three slashes, it treats them as a #[doc] attribute instead. [...] When you compile a library or binary, these attributes don't change anything, but when you generate documentation, doc comments on public features are included in the output. Likewise, comments starting with //! are treated as #![doc] attributes and are attached to the enclosing feature, typically a module or crate.

The content of a doc comment is treated as Markdown [...]. You can also include HTML tags, which are copied verbatim into the formatted documentation. One special feature of doc comments in Rust is that Markdown links can use Rust item paths, like leaves::Leaf, instead of relative URLs, to indicate what they refer to. Cargo will look up what the path refers to and substitute a link to the right place in the right documentation page.

[...] an interesting thing happens when you include a block of code in a doc comment. Rust automatically turns it into a test. [...] When you run tests in a Rust library crate, Rust checks that all the code that appears in your documentation actually runs and works. It does this by taking each block of code that appears in a doc comment, compiling it as a separate executable crate, linking it with your library, and running it.

The idea behind doc-tests is not to put all your tests into comments. Rather, you write the best possible documentation, and Rust makes sure the code samples in your documentation actually compile and run.
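A sketch of the idea (square is an illustrative function; in a real library crate, the doc-test block would also need to use the item from the crate so that it compiles as a standalone test):

```rust
/// Return the square of `x`.
///
/// ```
/// // This block becomes a doc-test under `cargo test`.
/// assert_eq!(square(3), 9);
/// ```
pub fn square(x: i32) -> i32 {
    x * x
}

fn main() {
    assert_eq!(square(3), 9);
}
```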

There are several ways to specify dependencies [...] First of all, you may want to use dependencies that aren't published on crates.io at all. One way to do this is by specifying a Git repository URL [...]. Another alternative is to specify a directory that contains the crate's source code: image = { path = "vendor/image" }.

When you write something like image = "0.13.0" in your Cargo.toml file, Cargo interprets this rather loosely. It uses the most recent version of image that is considered compatible with version 0.13.0. The compatibility rules are adapted from Semantic Versioning.

  • A version number that starts with 0.0 is so raw that Cargo never assumes it's compatible with any other version.
  • A version number that starts with 0.x, where x is nonzero, is considered compatible with other point releases in the 0.x series.
  • Once a project reaches 1.0, only new major versions break compatibility. So if you ask for version 2.0.1, Cargo might use 2.17.99 instead, but not 3.0.
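A hypothetical [dependencies] table showing the three styles side by side (crate names and the Git URL are made up for illustration):

```toml
[dependencies]
# From crates.io: any version semver-compatible with 0.13.0
# (e.g. 0.13.2, but not 0.14.0).
image = "0.13.0"
# Straight from a Git repository.
noise = { git = "https://github.com/example/noise.git" }
# From a local directory.
fern_sim = { path = "vendor/fern_sim" }
```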

The first time you build a project, Cargo outputs a Cargo.lock file that records the exact version of every crate it used. Later builds will consult this file and continue to use the same versions. Cargo upgrades to newer versions only when you tell it to, either by manually bumping up the version number in your Cargo.toml file or by running cargo update. [...] cargo update only upgrades to the latest versions that are compatible with what you've specified in Cargo.toml.

[...] if your project is an executable, you should commit Cargo.lock to version control. That way, everyone who builds your project will consistently get the same versions. The history of your Cargo.lock file will record your dependency updates.

Structs

A struct assembles several values of assorted types together into a single value so you can deal with them as a unit. Given a struct, you can read and modify its individual components. And a struct can have methods associated with it that operate on its components.

Rust has three kinds of struct types, named-field, tuple-like, and unit-like, which differ in how you refer to their components: a named-field struct gives a name to each component, whereas a tuple-like struct identifies them by the order in which they appear. Unit-like structs have no components at all [...].

When creating a named-field struct value, you can use another struct of the same type to supply values for fields you omit. In a struct expression, if the named fields are followed by .. EXPR, then any fields not mentioned take their values from EXPR [...].

An impl block is simply a collection of fn definitions, each of which becomes a method on the struct type named at the top of the block.

Functions defined in an impl block are called associated functions, since they're associated with a specific type.

Rust passes a method the value it's being called on as its first argument, which must have the special name self. Since self's type is obviously the one named at the top of the impl block, or a reference to that, Rust lets you omit the type, and write self, &self, or &mut self [...].

An impl block for a given type can also define functions that don't take self as an argument at all. These are still associated functions, since they're in an impl block, but they're not methods, since they don't take a self argument. To distinguish them from methods, we call them type-associated functions.

[...] Rust structs can be generic, meaning that their definition is a template into which you can plug whatever types you like.

You can read the <T> in Queue<T> as "for any element type T...".

In generic struct definitions, the type names used in <angle brackets> are called type parameters.

You can read the line impl<T> Queue<T> as something like, "for any type T, here are some associated functions available on Queue<T>". Then, you can use the type parameter T as a type in the associated function definitions. The syntax may look a bit redundant, but the impl<T> makes it clear that the impl block covers any type T, which distinguishes it from an impl block written for one specific kind of Queue [...].

A Cell<T> is a struct that contains a single private value of type T. The only special thing about a Cell is that you can get and set the field even if you don't have mut access to the Cell itself.

Like Cell<T>, RefCell<T> is a generic type that contains a single value of type T. Unlike Cell, RefCell supports borrowing references to its T value.

[...] normally, when you borrow a reference to a variable, Rust checks at compile time to ensure that you're using the reference safely. If the checks fail, you get a compiler error. RefCell enforces the same rule using run-time checks. So if you're breaking the rules, you get a panic (or an Err, for try_borrow and try_borrow_mut).

Enums and Patterns

In memory, values of C-style enums are stored as integers. [...] By default, Rust stores C-style enums using the smallest built-in integer type that can accommodate them.

In all, Rust has three kinds of enum variant, echoing the three kinds of struct [...]. Variants with no data correspond to unit-like structs. Tuple variants look and function just like tuple structs. Struct variants have curly braces and named fields. A single enum can have variants of all three kinds.
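As a sketch, one illustrative enum with all three kinds of variant, consumed by a match:

```rust
enum Shape {
    Point,                             // unit-like variant
    Circle(f64),                       // tuple variant (radius)
    Rect { width: f64, height: f64 },  // struct variant
}

fn area(s: &Shape) -> f64 {
    match s {
        Shape::Point => 0.0,
        Shape::Circle(r) => std::f64::consts::PI * r * r,
        Shape::Rect { width, height } => width * height,
    }
}

fn main() {
    assert_eq!(area(&Shape::Point), 0.0);
    assert_eq!(area(&Shape::Rect { width: 3.0, height: 4.0 }), 12.0);
}
```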

In memory, enums with data are stored as a small integer tag, plus enough memory to hold all the fields of the largest variant. The tag field is for Rust's internal use. It tells which constructor created the value and therefore which fields it has.

The thing to remember is that patterns and expressions are natural opposites. The expression (x, y) makes two values into a new tuple, but the pattern (x, y) does the opposite: it matches a tuple and breaks out the two values. It's the same with &. In an expression, & creates a reference. In a pattern, & matches a reference.

Traits and Generics

Rust supports polymorphism with two related features: traits and generics.

Traits are Rust's take on interfaces or abstract base classes.

Generics are the other flavor of polymorphism in Rust. Like a C++ template, a generic function or type can be used with values of many different types.

The <T: Ord> in [fn min<T: Ord>(value1: T, value2: T) -> T] means that min can be used with arguments of any type T that implements the Ord trait – that is, any ordered type. A requirement like this is called a bound, because it sets limits on which types T could possibly be. The compiler generates custom machine code for each type T that you actually use.

Generics and traits are closely related: generic functions use traits in bounds to spell out what types of arguments they can be applied to.

A trait is a feature that any given type may or may not support. Most often, a trait represents a capability: something a type can do.

There is one unusual rule about trait methods: the trait itself must be in scope. Otherwise, all its methods are hidden. [...] Rust has this rule because [...] you can use traits to add new methods to any type – even standard library types like u32 and str. Third-party crates can do the same thing. Clearly, this could lead to naming conflicts! But since Rust makes you import the traits you plan to use, crates are free to take advantage of this superpower.
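A sketch of this "superpower" with a made-up extension trait on u32; the is_even method is only callable where IsEven is in scope:

```rust
// A trait adding a new method to an existing standard type.
trait IsEven {
    fn is_even(&self) -> bool;
}

impl IsEven for u32 {
    fn is_even(&self) -> bool {
        self % 2 == 0
    }
}

fn main() {
    // Works here because IsEven is in scope in this module.
    assert!(4_u32.is_even());
    assert!(!7_u32.is_even());
}
```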

The reason Clone and Iterator methods work without any special imports is that they're always in scope by default: they're part of the standard prelude, names that Rust automatically imports into every module.

A reference to a trait type, like writer [in let writer: &mut dyn Write = &mut buf;], is called a trait object. Like any other reference, a trait object points to some value, it has a lifetime, and it can be either mut or shared. What makes a trait object different is that Rust usually doesn't know the type of the referent at compile time. So a trait object includes a little extra information about the referent's type. This is strictly for Rust's own use behind the scenes: when you call writer.write(data), Rust needs the type information to dynamically call the right write method depending on the type of *writer.

In memory, a trait object is a fat pointer consisting of a pointer to the value, plus a pointer to a table representing that value's type. Each trait object therefore takes up two machine words [...]. C++ has this kind of run-time type information as well. It's called a virtual table, or vtable. In Rust, as in C++, the vtable is generated once, at compile time, and shared by all objects of the same type.
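The two-word claim is easy to check directly:

```rust
use std::fmt::Debug;
use std::mem::size_of;

fn main() {
    // A plain reference is one machine word; a trait object is two:
    // a pointer to the value plus a pointer to the vtable.
    assert_eq!(size_of::<&u32>(), size_of::<usize>());
    assert_eq!(size_of::<&dyn Debug>(), 2 * size_of::<usize>());
}
```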

[<W: Write>] is a type parameter. It means that throughout the body of this function, W stands for some type that implements the Write trait. Type parameters are usually single uppercase letters, by convention.

[...] Rust infers the type W from the type of the argument. This process is known as monomorphization, and the compiler handles it all automatically.

The choice of whether to use trait objects or generic code is subtle. Since both features are based on traits, they have a lot in common. Trait objects are the right choice whenever you need a collection of values of mixed types, all together. [...] Another possible reason to use trait objects is to reduce the total amount of compiled code. Rust may have to compile a generic function many times, once for each type it's used with. This could make the binary large [...].

[...] generics have three important advantages over trait objects, with the result that in Rust, generics are the more common choice. The first advantage is speed. Note the absence of the dyn keyword in generic function signatures. Because you specify the types at compile time, either explicitly or through type inference, the compiler knows exactly which [...] method to call. The dyn keyword isn't used because there are no trait objects – and thus no dynamic dispatch – involved. [...] The second advantage of generics is that not every trait can support trait objects. Traits support several features, such as associated functions, that work only with generics: they rule out trait objects entirely. [...] The third advantage of generics is that it's easy to bound a generic type parameter with several traits at once [...]. Trait objects can't do this: types like &mut (dyn Debug + Hash + Eq) aren't supported in Rust.

Defining a trait is simple. Give it a name and list the type signatures of the trait methods. [...] To implement a trait, use the syntax impl TraitName for Type. [...] Everything defined in a trait impl must actually be a feature of the trait; if we wanted to add a helper method [...], we would have to define it in a separate impl block.

Rust lets you implement any trait on any type, as long as either the trait or the type is introduced in the current crate. [...] This is called the orphan rule. It helps Rust ensure that trait implementations are unique. Your code can't impl Write for u8, because both Write and u8 are defined in the standard library. If Rust let crates do that, there could be multiple implementations of Write for u8, in different crates, and Rust would have no reasonable way to decide which implementation to use for a given method call.

A trait can use the keyword Self as a type. [...] Using Self as the return type [...] means that the type of x.clone() is the same as the type of x, whatever that might be.

A trait that uses the Self type is incompatible with trait objects.

We can declare that a trait is an extension of another trait. [...] The phrase trait Creature: Visible means that all creatures are visible. Every type that implements Creature must also implement the Visible trait.

[...] a subtrait does not inherit the associated items of its supertrait; each trait still needs to be in scope if you want to call its methods. In fact, Rust's subtraits are really just a shorthand for a bound on Self.

In most object-oriented languages, interfaces can't include static methods or constructors, but traits can include type-associated functions, Rust's analog to static methods.

Rust has a standard Iterator trait, defined like this:

pub trait Iterator {
    type Item;

    fn next(&mut self) -> Option<Self::Item>;
}

The first feature of this trait, type Item;, is an associated type. Each type that implements Iterator must specify what type of item it produces. The second feature, the next() method, uses the associated type in its return value. next() returns an Option<Self::Item>: either Some(item), the next value in the sequence, or None when there are no more values to visit. The type is written as Self::Item, not just plain Item, because Item is a feature of each type of iterator, not a standalone type.

Associated types are perfect for cases where each implementation has one specific related type [...].

Generic traits get a special dispensation when it comes to the orphan rule: you can implement a foreign trait for a foreign type, so long as one of the trait's type parameters is a type defined in the current crate. So, if you've defined WindowSize yourself, you can implement Mul<WindowSize> for f64, even though you didn't define either Mul or f64.

impl Trait allows us to "erase" the type of a return value, specifying only the trait or traits it implements, without dynamic dispatch or a heap allocation.

Using impl Trait means that you can change the actual type being returned in the future as long as it still implements [the trait], and any code calling the function will continue to compile without an issue. This provides a lot of flexibility for library authors, because only the relevant functionality is encoded in the type signature.
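A sketch: callers of this hypothetical function see only "some iterator of u32", so the adapter chain behind impl Trait can change later without breaking them:

```rust
// The concrete return type (a Filter over a range) never appears
// in the signature; only the Iterator capability is promised.
fn evens_up_to(n: u32) -> impl Iterator<Item = u32> {
    (0..=n).filter(|x| x % 2 == 0)
}

fn main() {
    assert_eq!(evens_up_to(6).collect::<Vec<_>>(), vec![0, 2, 4, 6]);
}
```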

Like structs and enums, traits can have associated constants. You can declare a trait with an associated constant using the same syntax as for a struct or enum. [...] Associated consts in traits have a special power, though. Like associated types and functions, you can declare them but not give them a value. [...] Then, implementors of the trait can define these values.

Operator Overloading

You can make your own types support arithmetic and other operators, too, just by implementing a few built-in traits. This is called operator overloading [...].

In Rust, the expression a + b is actually shorthand for a.add(b), a call to the add method of the standard library's std::ops::Add trait.

[...] Rust has two unary operators that you can customize [...]. All of Rust's signed numeric types implement std::ops::Neg, for the unary negation operator -; the integer types and bool implement std::ops::Not, for the unary complement operator !.

A compound assignment expression is one like x += y or x &= y: it takes two operands, performs some operation on them like addition or a bitwise AND, and stores the result back in the left operand. In Rust, the value of a compound assignment expression is always (), never the value stored. Many languages have operators like these and usually define them as shorthand for expressions like x = x + y or x = x & y. However, Rust doesn't take that approach. Instead, x += y is shorthand for the method call x.add_assign(y), where add_assign is the sole method of the std::ops::AddAssign trait.

The built-in trait for a compound assignment operator is completely independent of the built-in trait for the corresponding binary operator. Implementing std::ops::Add does not automatically implement std::ops::AddAssign; if you want Rust to permit your type as the lefthand operand of a += operator, you must implement AddAssign yourself.

Rust's equality operators, == and !=, are shorthand for calls to the std::cmp::PartialEq trait's eq and ne methods.

Since the ne method has a default definition, you only need to define eq to implement the PartialEq trait [...].

Implementations of PartialEq are almost always of the form shown here: they compare each field of the left operand to the corresponding field of the right. These get tedious to write, and equality is a common operation to support, so if you ask, Rust will generate an implementation of PartialEq for you automatically. Simply add PartialEq to the type definition's derive attribute [...].

Rust specifies the behavior of the ordered comparison operators <, >, <=, and >= all in terms of a single trait, std::cmp::PartialOrd. [...] The only method of PartialOrd you must implement yourself is partial_cmp.

If you know that values of two types are always ordered with respect to each other, then you can implement the stricter std::cmp::Ord trait. [...] The cmp method here simply returns an Ordering, instead of an Option<Ordering> like partial_cmp: cmp always declares its arguments equal or indicates their relative order. Almost all types that implement PartialOrd should also implement Ord. In the standard library, f32 and f64 are the only exceptions to this rule.

You can specify how an indexing expression like a[i] works on your type by implementing the std::ops::Index and std::ops::IndexMut traits. Arrays support the [] operator directly, but on any other type, the expression a[i] is normally shorthand for *a.index(i), where index is a method of the std::ops::Index trait. However, if the expression is being assigned to or borrowed mutably, it's instead shorthand for *a.index_mut(i), a call to the method of the std::ops::IndexMut trait.

Utility Traits

When a value's owner goes away, we say that Rust drops the value. Dropping a value entails freeing whatever other values, heap storage, and system resources the value owns. Drops occur under a variety of circumstances: when a variable goes out of scope; at the end of an expression statement, when you truncate a vector, removing elements from its end; and so on. For the most part, Rust handles dropping values for you automatically.

[...] if you want, you can customize how Rust drops values of your type by implementing the std::ops::Drop trait. [...] When a value is dropped, if it implements std::ops::Drop, Rust calls its drop method, before proceeding to drop whatever values its fields or elements own, as it normally would. This implicit invocation of drop is the only way to call that method; if you try to invoke it explicitly yourself, Rust flags that as an error.

A sized type is one whose values all have the same size in memory. Almost all types in Rust are sized: every u64 takes eight bytes, every (f32, f32, f32) tuple twelve. Even enums are sized: no matter which variant is actually present, an enum always occupies enough space to hold its largest variant.

All sized types implement the std::marker::Sized trait, which has no methods or associated types. Rust implements it automatically for all types to which it applies; you can't implement it yourself. The only use for Sized is as a bound for type variables: a bound like T: Sized requires T to be a type whose size is known at compile time. Traits of this sort are called marker traits, because the Rust language itself uses them to mark certain types as having characteristics of interest.

Rust can't store unsized values in variables or pass them as arguments. You can only deal with them through pointers like &str or Box<dyn Write>, which themselves are sized. [...] a pointer to an unsized value is always a fat pointer, two words wide [...].

The std::clone::Clone trait is for types that can make copies of themselves. [...] The clone method should construct an independent copy of self and return it.

Cloning a value usually entails allocating copies of anything it owns, as well, so a clone can be expensive, in both time and memory. [...] This is why Rust doesn't just clone values automatically, but instead requires you to make an explicit method call.

[...] a type is Copy if it implements the std::marker::Copy marker trait [...].

[...] because Copy is a marker trait with special meaning to the language, Rust permits a type to implement Copy only if a shallow byte-for-byte copy is all it needs. Types that own any other resources, like heap buffers or operating system handles, cannot implement Copy. Any type that implements the Drop trait cannot be Copy. Rust presumes that if a type needs special cleanup code, it must also require special copying code and thus can't be Copy.

You can specify how dereferencing operators like * and . behave on your types by implementing the std::ops::Deref and std::ops::DerefMut traits. Pointer types like Box<T> and Rc<T> implement these traits so that they can behave as Rust's built-in pointer types do.

The deref and deref_mut methods take a &Self reference and return a &Self::Target reference. Target should be something that Self contains, owns, or refers to: for Box<Complex> the Target type is Complex.

Since deref takes a &Self reference and returns a &Self::Target reference, Rust uses this to automatically convert references of the former type into the latter. In other words, if inserting a deref call would prevent a type mismatch, Rust inserts one for you. Implementing DerefMut enables the corresponding conversion for mutable references. These are called the deref coercions: one type is being "coerced" into behaving as another.

Some types have a reasonably obvious default value: the default vector or string is empty, the default number is zero, the default Option is None, and so on. Types like this can implement the std::default::Default trait. [...] The default method simply returns a fresh value of type Self.

When a type implements AsRef<T>, that means you can borrow a &T from it efficiently. AsMut is the analogue for mutable references.

The std::borrow::Borrow trait is similar to AsRef: if a type implements Borrow<T>, then its borrow method efficiently borrows a &T from it. But Borrow imposes more restrictions: a type should implement Borrow<T> only when a &T hashes and compares the same way as the value it's borrowed from.

The std::convert::From and std::convert::Into traits represent conversions that consume a value of one type and return a value of another. Whereas the AsRef and AsMut traits borrow a reference of one type from another, From and Into take ownership of their argument, transform it, and then return ownership of the result back to the caller.

Although the traits simply provide two ways to do the same thing, they lend themselves to different uses. You generally use Into to make your functions more flexible in the arguments they accept. [...] The From trait, however, plays a different role. The from method serves as a generic constructor for producing an instance of a type from some other single value.

Given an appropriate From implementation, the standard library automatically implements the corresponding Into trait. When you define your own type, if it has single-argument constructors, you should write them as implementations of From<T> for the appropriate types; you'll get the corresponding Into implementations for free.
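A sketch with a hypothetical Meters newtype: the single From impl doubles as a constructor and, via the blanket Into, makes the function argument flexible:

```rust
struct Meters(f64);

// A single-argument "constructor" written as a From impl...
impl From<f64> for Meters {
    fn from(v: f64) -> Meters {
        Meters(v)
    }
}

// ...which gives us Into for free, so callers can pass a plain f64.
fn fence_length<T: Into<Meters>>(side: T) -> f64 {
    side.into().0 * 4.0
}

fn main() {
    assert_eq!(fence_length(2.5), 10.0);
    let m = Meters::from(3.0);
    assert_eq!(m.0, 3.0);
}
```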

From and Into are infallible traits – their API requires that conversions will not fail.

TryFrom and TryInto are the fallible cousins of From and Into and are similarly reciprocal; implementing TryFrom means that TryInto is implemented as well.

Given a reference, the usual way to produce an owned copy of its referent is to call clone, assuming the type implements std::clone::Clone. But what if you want to clone a &str or a &[i32]? What you probably want is a String or a Vec<i32>, but Clone's definition doesn't permit that: by definition, cloning a &T must always return a value of type T, and str and [u8] are unsized; they aren't even types that a function could return. The std::borrow::ToOwned trait provides a slightly looser way to convert a reference to an owned value. [...] Unlike clone, which must return exactly Self, to_owned can return anything you could borrow a &Self from [...].

[...] in some cases you cannot decide whether to borrow or own until the program is running; the std::borrow::Cow type (for "clone on write") provides one way to do this. [...] A Cow<B> either borrows a shared reference to a B or owns a value from which we could borrow such a reference.
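A sketch of the pattern: borrow when no change is needed, and allocate an owned String only when the input actually has to be modified:

```rust
use std::borrow::Cow;

fn sanitize(input: &str) -> Cow<str> {
    if input.contains(' ') {
        Cow::Owned(input.replace(' ', "_")) // had to allocate
    } else {
        Cow::Borrowed(input) // no allocation, just a borrow
    }
}

fn main() {
    assert!(matches!(sanitize("ok"), Cow::Borrowed(_)));
    assert_eq!(sanitize("a b"), "a_b");
}
```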

Closures

A closure can use data that belongs to an enclosing function.

The move keyword tells Rust that a closure doesn't borrow the variables it uses: it steals them.

In fact, every closure you write has its own type, because a closure may contain data: values either borrowed or stolen from enclosing scopes. This could be any number of variables, in any combination of types. So every closure has an ad hoc type created by the compiler, large enough to hold that data. No two closures have exactly the same type. But every closure implements an Fn trait [...].

Since every closure has its own type, code that works with closures usually needs to be generic [...].

Closures that drop values [...] are not allowed to have Fn. They are, quite literally, no Fn at all. They implement a less powerful trait, FnOnce, the trait of closures that can be called once. The first time you call a FnOnce closure, the closure itself is used up.

Rust considers non-mut values safe to share across threads. But it wouldn't be safe to share non-mut closures that contain mut data: calling such a closure from multiple threads could lead to all sorts of race conditions as multiple threads try to read and write the same data at the same time. Therefore, Rust has one more category of closure, FnMut, the category of closures that write.

Every Fn meets the requirements for FnMut, and every FnMut meets the requirements for FnOnce. [...] they're not three separate categories. Instead, Fn() is a subtrait of FnMut(), which is a subtrait of FnOnce(). This makes Fn the most exclusive and most powerful category. FnMut and FnOnce are broader categories that include closures with usage restrictions.

[...] closures are represented as structs that contain either the values (for move closures) or references to the values (for non-move closures) of the variables they capture. The rules for Copy and Clone on closures are just like the Copy and Clone rules for regular structs. A non-move closure that doesn't mutate variables holds only shared references, which are both Clone and Copy, so that closure is both Clone and Copy as well. [...] On the other hand, a non-move closure that does mutate values has mutable references within its internal representation. Mutable references are neither Clone nor Copy, so neither is a closure that uses them. [...] For a move closure, the rules are even simpler. If everything a move closure captures is Copy, it's Copy. If everything it captures is Clone, it's Clone.

Iterators

An iterator is a value that produces a sequence of values, typically for a loop to operate on.

An iterator is any value that implements the std::iter::Iterator trait. [...] The next method either returns Some(v), where v is the iterator's next value, or returns None to indicate the end of the sequence.

If there's a natural way to iterate over some type, that type can implement std::iter::IntoIterator, whose into_iter method takes a value and returns an iterator over it. [...] We call any type that implements IntoIterator an iterable, because it's something you could iterate over if you asked.

Under the hood, every for loop is just shorthand for calls to IntoIterator and Iterator methods. [...] The for loop uses IntoIterator::into_iter to convert its operand [...] into an iterator and then calls Iterator::next repeatedly. Each time that returns Some(element), the for loop executes its body; and if it returns None, the loop finishes.

Most collection types provide iter and iter_mut methods that return the natural iterators over the type, producing a shared or mutable reference to each item. Array slices like &[T] and &mut [T] have iter and iter_mut methods too. These methods are the most common way to get an iterator, if you're not going to let a for loop take care of it for you.

One simple and general way to produce a sequence of values is to provide a closure that returns them. Given a function returning Option<T>, std::iter::from_fn returns an iterator that simply calls the function to produce its items.

If each item depends on the one before, the std::iter::successors function works nicely. You provide an initial item and a function that takes one item and returns an Option of the next. If it returns None, the iteration ends.

Many collection types provide a drain method that takes a mutable reference to the collection and returns an iterator that passes ownership of each element to the consumer. However, unlike the into_iter() method, which takes the collection by value and consumes it, drain merely borrows a mutable reference to the collection, and when the iterator is dropped, it removes any remaining elements from the collection and leaves it empty.

The Iterator trait's map adapter lets you transform an iterator by applying a closure to its items. The filter adapter lets you filter out items from an iterator, using a closure to decide which to keep and which to drop.

[...] simply calling an adapter on an iterator doesn't consume any items; it just returns a new iterator, ready to produce its own items by drawing from the first iterator as needed. In a chain of adapters, the only way to make any work actually get done is to call next on the final iterator.

The filter_map adapter is similar to map except that it lets its closure either transform the item into a new item (as map does) or drop the item from the iteration. [...] When the closure returns None, the item is dropped from the iteration; when it returns Some(b), then b is the next item the filter_map iterator produces.

You can think of the flat_map adapter as continuing in the same vein as map and filter_map, except that now the closure can return not just one item (as with map) or zero or one items (as with filter_map), but a sequence of any number of items. The flat_map iterator produces the concatenation of the sequences the closure returns.

The flatten adapter concatenates an iterator's items, assuming each item is itself an iterable.

The Iterator trait's take and take_while adapters let you end an iteration after a certain number of items or when a closure decides to cut things off.

The Iterator trait's skip and skip_while methods are the complement of take and take_while: they drop a certain number of items from the beginning of an iteration, or drop items until a closure finds one acceptable, and then pass the remaining items through unchanged.

A peekable iterator lets you peek at the next item that will be produced without actually consuming it. You can turn any iterator into a peekable iterator by calling the Iterator trait's peekable method.

Once an Iterator has returned None, the trait doesn't specify how it ought to behave if you call its next method again. Most iterators just return None again, but not all. [...] The fuse adapter takes any iterator and produces one that will definitely continue to return None once it has done so the first time.

Some iterators are able to draw items from both ends of the sequence. You can reverse such iterators by using the rev adapter. [...] Such iterators can implement the std::iter::DoubleEndedIterator trait, which extends Iterator.

The inspect adapter is handy for debugging pipelines of iterator adapters, but it isn't used much in production code. It simply applies a closure to a shared reference to each item and then passes the item through. The closure can't affect the items, but it can do things like print them or make assertions about them.

The chain adapter appends one iterator to another. More precisely, i1.chain(i2) returns an iterator that draws items from i1 until it's exhausted and then draws items from i2.

The Iterator trait's enumerate adapter attaches a running index to the sequence, taking an iterator that produces items A, B, C, ... and returning an iterator that produces pairs (0, A), (1, B), (2, C), ....

The zip adapter combines two iterators into a single iterator that produces pairs holding one value from each iterator [...]. The zipped iterator ends when either of the two underlying iterators ends.
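A small sketch of both adapters together:

```rust
fn main() {
    let letters = ["a", "b", "c"];

    // enumerate attaches a running index.
    let indexed: Vec<(usize, &&str)> = letters.iter().enumerate().collect();
    assert_eq!(indexed[1], (1, &"b"));

    // zip ends when the shorter iterator ends.
    let nums = [10, 20];
    let pairs: Vec<(&&str, &i32)> = letters.iter().zip(nums.iter()).collect();
    assert_eq!(pairs.len(), 2); // "c" is never paired
}
```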

An iterator's by_ref method borrows a mutable reference to the iterator so that you can apply adapters to the reference. When you're done consuming items from these adapters, you drop them, the borrow ends, and you regain access to your original iterator.

The cloned adapter takes an iterator that produces references and returns an iterator that produces values cloned from those references, much like mapping with the closure |item| item.clone(). Naturally, the referent type must implement Clone. [...] The copied adapter is the same idea, but more restrictive: the referent type must implement Copy.

The cycle adapter returns an iterator that endlessly repeats the sequence produced by the underlying iterator. The underlying iterator must implement std::clone::Clone so that cycle can save its initial state and reuse it each time the cycle starts again.

The count method draws items from an iterator until it returns None and tells you how many it got.

The sum and product methods compute the sum or product of the iterator's items, which must be integers or floating-point numbers.

The min and max methods on Iterator return the least or greatest item the iterator produces. The iterator's item type must implement std::cmp::Ord so that items can be compared with one another.

The max_by and min_by methods return the maximum or minimum item the iterator produces, as determined by a comparison function you provide.

The max_by_key and min_by_key methods on Iterator let you select the maximum or minimum item as determined by a closure applied to each item.
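For instance, selecting by the length of each string rather than by the strings themselves:

```rust
fn main() {
    let words = ["persimmon", "fig", "cherry"];
    // The closure's return value, not the item itself, is compared.
    let longest = words.iter().max_by_key(|w| w.len());
    assert_eq!(longest, Some(&"persimmon"));
    let shortest = words.iter().min_by_key(|w| w.len());
    assert_eq!(shortest, Some(&"fig"));
}
```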

Although iterators do not support Rust's comparison operators, they do provide methods like eq and lt that do the same job, drawing pairs of items from the iterators and comparing them until a decision can be reached.

The any and all methods apply a closure to each item the iterator produces and return true if the closure returns true for any item, or for all the items.

The position method applies a closure to each item from the iterator and returns the index of the first item for which the closure returns true. More precisely, it returns an Option of the index: if the closure returns true for no item, position returns None. [...] The rposition method is the same, except that it searches from the right.

The fold method is a very general tool for accumulating some sort of result over the entire sequence of items an iterator produces. Given an initial value, which we'll call the accumulator, and a closure, fold repeatedly applies the closure to the current accumulator and the next item from the iterator. The value the closure returns is taken as the new accumulator, to be passed to the closure with the next item. The final accumulator value is what fold itself returns. [...] The rfold method is the same as fold, except that it requires a double-ended iterator, and processes its items from last to first.
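A minimal sketch of both directions:

```rust
fn main() {
    let nums = [1, 2, 3, 4];

    // fold: accumulator starts at 0; the closure returns the new accumulator.
    let sum = nums.iter().fold(0, |acc, n| acc + n);
    assert_eq!(sum, 10);

    // rfold: same idea, but processes items from last to first.
    let reversed = nums.iter().rfold(String::new(), |mut s, n| {
        s.push_str(&n.to_string());
        s
    });
    assert_eq!(reversed, "4321");
}
```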

The try_fold method is the same as fold, except that the process of iteration can exit early, without consuming all the values from the iterator. The closure you pass to try_fold must return a Result: if it returns Err(e), try_fold returns immediately with Err(e) as its value. Otherwise, it continues folding with the success value. The closure can also return an Option: returning None exits early, and the result is an Option of the folded value. [...] The try_rfold method, as its name suggests, is the same as try_fold, except that it draws values from the back, instead of the front, and requires a double-ended iterator.
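A sketch of the early-exit behavior, using a running sum with a limit:

```rust
fn main() {
    let nums = [1, 2, 3, 4];
    // Folding stops at the first Err; the remaining items are never drawn.
    let result: Result<i32, &str> = nums.iter().try_fold(0, |acc, &n| {
        if acc + n > 6 { Err("limit exceeded") } else { Ok(acc + n) }
    });
    assert_eq!(result, Err("limit exceeded")); // 1+2+3 = 6 is fine, +4 is not
}
```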

The nth method takes an index n, skips that many items from the iterator, and returns the next item, or None if the sequence ends before that point. [...] The nth_back method is much the same, except that it draws from the back of a double-ended iterator.

The last method returns the last item the iterator produces, or None if it's empty.

The find method draws items from an iterator, returning the first item for which the given closure returns true, or None if the sequence ends before a suitable item is found. [...] The rfind method is similar, but it requires a double-ended iterator and searches values from back to front [...].

[find_map] is just like find, except that instead of returning bool, the closure should return an Option of some value. find_map returns the first Option that is Some.

[collect] can build any kind of collection from Rust's standard library, as long as the iterator produces a suitable item type.

If a type implements the std::iter::Extend trait, then its extend method adds an iterable's items to the collection.

The partition method divides an iterator's items among two collections, using a closure to decide where each item belongs.
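For example, splitting a range into even and odd numbers:

```rust
fn main() {
    // The closure decides which of the two collections each item lands in.
    let (even, odd): (Vec<i32>, Vec<i32>) =
        (1..=6).partition(|n| n % 2 == 0);
    assert_eq!(even, vec![2, 4, 6]);
    assert_eq!(odd, vec![1, 3, 5]);
}
```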

The for_each method simply applies a closure to each item. [...] If your closure needs to be fallible or exit early, you can use try_for_each.


The easiest way to create a vector is to use the vec! macro.

[...] a vector has three fields: the length, the capacity, and a pointer to a heap allocation where the elements are stored.

All of a vector's elements are stored in a contiguous, heap-allocated chunk of memory. The capacity of a vector is the maximum number of elements that would fit in this chunk. Vec normally manages the capacity for you, automatically allocating a larger buffer and moving the elements into it when more space is needed.

Vec supports efficiently adding and removing elements only at the end. When a program needs a place to store values that are "waiting in line", Vec can be slow. Rust's std::collections::VecDeque<T> is a deque (pronounced "deck"), a double-ended queue. It supports efficient add and remove operations at both the front and the back.

Like a Vec, [VecDeque] has a single heap allocation where elements are stored. Unlike Vec, the data does not always start at the beginning of this region, and it can "wrap around" the end [...]. VecDeque has private fields [...] that it uses to remember where in the buffer the data begins and ends.
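A minimal queue sketch showing operations at both ends:

```rust
use std::collections::VecDeque;

fn main() {
    let mut queue = VecDeque::new();
    queue.push_back("first");
    queue.push_back("second");
    queue.push_front("urgent"); // efficient at the front too, unlike Vec
    assert_eq!(queue.pop_front(), Some("urgent"));
    assert_eq!(queue.pop_front(), Some("first"));
}
```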

A BinaryHeap is a collection whose elements are kept loosely organized so that the greatest value always bubbles up to the front of the queue.

[...] BinaryHeap is not limited to numbers. It can hold any type of value that implements the Ord built-in trait. This makes BinaryHeap useful as a work queue. You can define a task struct that implements Ord on the basis of priority so that higher-priority tasks are Greater than lower-priority tasks. Then, create a BinaryHeap to hold all pending tasks. Its .pop() method will always return the most important item [...].

[...] BinaryHeap is iterable, and it has an .iter() method, but the iterators produce the heap's elements in an arbitrary order, not from greatest to least. To consume values from a BinaryHeap in order of priority, use a while loop.
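A small sketch of the pop-in-priority-order pattern described above:

```rust
use std::collections::BinaryHeap;

fn main() {
    let mut heap = BinaryHeap::from(vec![2, 9, 1, 7]);
    heap.push(5);
    // pop always yields the greatest remaining value.
    let mut in_order = Vec::new();
    while let Some(top) = heap.pop() {
        in_order.push(top);
    }
    assert_eq!(in_order, vec![9, 7, 5, 2, 1]);
}
```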

A map is a collection of key-value pairs (called entries). No two entries have the same key, and the entries are kept organized so that if you have a key, you can efficiently look up the corresponding value in a map.

Rust offers two map types: HashMap<K, V> and BTreeMap<K, V>. The two share many of the same methods; the difference is in how the two keep entries arranged for fast lookup.

A HashMap stores the keys and values in a hash table, so it requires a key type K that implements Hash and Eq [...]. All keys, values, and cached hash codes are stored in a single heap-allocated table. Adding entries eventually forces the HashMap to allocate a larger table and move all the data into it.

A BTreeMap stores the entries in order by key, in a tree structure, so it requires a key type K that implements Ord.

A map can also be queried using square brackets: map[&key]. That is, maps implement the Index built-in trait. However, this panics if there is not already an entry for the given key, like an out-of-bounds array access, so use this syntax only if the entry you're looking up is sure to be populated.
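A sketch contrasting the panicking index syntax with the Option-returning get method:

```rust
use std::collections::HashMap;

fn main() {
    let mut map = HashMap::new();
    map.insert("apples", 3);
    map.insert("pears", 5);
    // .get returns an Option; [] panics if the key is missing.
    assert_eq!(map.get("apples"), Some(&3));
    assert_eq!(map["pears"], 5);
    assert_eq!(map.get("plums"), None); // map["plums"] would panic here
}
```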

Sets are collections of values arranged for fast membership testing. [...] A set never contains multiple copies of the same value.

[...] behind the scenes, a set is like a map with only keys, rather than key-value pairs. In fact, Rust's two set types, HashSet<T> and BTreeSet<T>, are implemented as thin wrappers around HashMap<T, ()> and BTreeMap<T, ()>.

&set1 & &set2 returns a new set that's the intersection of set1 and set2.

&set1 | &set2 returns a new set containing [...] values that are in either set1 or set2.

&set1 - &set2 returns a new set containing [values that are in set1 but not in set2].

&set1 ^ &set2 returns a new set containing [values that are in either set1 or set2, but not both].
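The four operators above, in one small sketch:

```rust
use std::collections::HashSet;

fn main() {
    let a: HashSet<i32> = HashSet::from([1, 2, 3]);
    let b: HashSet<i32> = HashSet::from([2, 3, 4]);
    assert_eq!(&a & &b, HashSet::from([2, 3]));       // intersection
    assert_eq!(&a | &b, HashSet::from([1, 2, 3, 4])); // union
    assert_eq!(&a - &b, HashSet::from([1]));          // difference
    assert_eq!(&a ^ &b, HashSet::from([1, 4]));       // symmetric difference
}
```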

std::hash::Hash is the standard library trait for hashable types. HashMap keys and HashSet elements must implement both Hash and Eq.

One principle of the standard library is that a value should have the same hash code regardless of where you store it or how you point to it. Therefore, a reference has the same hash code as the value it refers to, and a Box has the same hash code as the boxed value. A vector vec has the same hash code as the slice containing all its data, &vec[..]. A String has the same hash code as a &str with the same characters.

Rust's default hashing algorithm is a well-known algorithm called SipHash-1-3. SipHash is fast, and it's very good at minimizing hash collisions. In fact, it's a cryptographic algorithm: there's no known efficient way to generate SipHash-1-3 collisions.

Strings and Text

The Rust String and str types represent text using the UTF-8 encoding form. UTF-8 encodes a character as a sequence of one to four bytes.

Since UTF-8 encodes code points 0 through 0x7f as nothing more than the bytes 0 through 0x7f, a range of bytes holding ASCII text is valid UTF-8. And if a string of UTF-8 includes only characters from ASCII, the reverse is also true: the UTF-8 encoding is valid ASCII.

Unicode stores characters in the order in which they would normally be written or read, so the initial bytes of a string holding, say, Hebrew text encode the character that would be written at the right.

A Rust char is a 32-bit value holding a Unicode code point. A char is guaranteed to fall in the range from 0 to 0xd7ff or in the range 0xe000 to 0x10ffff; all the methods for creating and manipulating char values ensure that this is true.

Rust's String and str types are guaranteed to hold only well-formed UTF-8. The library ensures this by restricting the ways you can create String and str values and the operations you can perform on them, such that the values are well-formed when introduced and remain so as you work with them. All their methods protect this guarantee: no safe operation on them can introduce ill-formed UTF-8.

Rust places text-handling methods on either str or String depending on whether the method needs a resizable buffer or is content just to use the text in place. Since String dereferences to &str, every method defined on str is directly available on String as well.

A String is implemented as a wrapper around a Vec<u8> that ensures the vector's contents are always well-formed UTF-8.

When a standard library function needs to search, match, split, or trim text, it accepts several different types to represent what to look for. [...] These types are called patterns, and most operations support them. [...] The standard library supports four main kinds of patterns:

  • A char as a pattern matches that character.
  • A String or &str or &&str as a pattern matches a substring equal to the pattern.
  • A FnMut(char) -> bool closure as a pattern matches a single character for which the closure returns true.
  • A &[char] as a pattern (not a &str, but a slice of char values) matches any single character that appears in the list.
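One pattern kind per line, sketched with str methods:

```rust
fn main() {
    let s = "fn main() { }";
    assert!(s.contains('{'));                 // char pattern
    assert!(s.contains("main"));              // &str pattern
    assert!(s.contains(char::is_whitespace)); // fn/closure pattern
    // &[char] pattern: trim any of these characters from both ends.
    let t = "1 + 2, 3".trim_matches(&['1', '3', ' '][..]);
    assert_eq!(t, "+ 2,");
}
```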

The external regex crate is Rust's official regular expression library.

Input and Output

Rust's standard library features for input and output are organized around three traits, Read, BufRead, and Write:

  • Values that implement Read have methods for byte-oriented input. They're called readers.
  • Values that implement BufRead are buffered readers. They support all the methods of Read, plus methods for reading lines of text and so forth.
  • Values that implement Write support both byte-oriented and UTF-8 text output. They're called writers.

Rust strings are always valid Unicode. Filenames are almost always Unicode in practice, but Rust has to cope somehow with the rare case where they aren't. This is why Rust has std::ffi::OsStr and OsString.

OsStr is a string type that's a superset of UTF-8. Its job is to be able to represent all filenames, command-line arguments, and environment variables on the current system, whether they're valid Unicode or not.

Path is exactly like OsStr, but it adds many handy filename-related methods [...]. Use Path for both absolute and relative paths. For an individual component of a path, use OsStr.

[...] for each string type, there's a corresponding owning type: a String owns a heap-allocated str, a std::ffi::OsString owns a heap-allocated OsStr, and a std::path::PathBuf owns a heap-allocated Path.


The simplest use cases for threads arise when we have several completely independent tasks that we'd like to do at once. For example, suppose we're doing natural language processing on a large corpus of documents. [...] Since each document is processed separately, it's relatively easy to speed this task up by splitting the corpus into chunks and processing each chunk on a separate thread [...]. This pattern is called fork-join parallelism. To fork is to start a new thread, and to join a thread is to wait for it to finish.

The function std::thread::spawn starts a new thread. [...] It takes one argument, an FnOnce closure or function. Rust starts a new thread to run the code of that closure or function. The new thread is a real operating system thread with its own stack [...].

Joining threads is often necessary for correctness, because a Rust program exits as soon as main returns, even if other threads are still running. Destructors are not called; the extra threads are just killed. If this isn't what you want, be sure to join any threads you care about before returning from main.
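A minimal fork-join sketch: spawn workers, then join each handle to collect its result:

```rust
use std::thread;

fn main() {
    // Fork: start one thread per chunk of work.
    let handles: Vec<_> = (0..4)
        .map(|i| thread::spawn(move || i * i))
        .collect();
    // Join: wait for each thread and take its return value.
    let squares: Vec<i32> = handles.into_iter()
        .map(|h| h.join().unwrap())
        .collect();
    assert_eq!(squares, vec![0, 1, 4, 9]);
}
```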

A channel is a one-way conduit for sending values from one thread to another. In other words, it's a thread-safe queue.

[Channels are] something like Unix pipes: one end is for sending data, and the other is for receiving. The two ends are typically owned by two different threads. But whereas Unix pipes are for sending bytes, channels are for sending Rust values. sender.send(item) puts a single value into the channel; receiver.recv() removes one. Ownership is transferred from the sending thread to the receiving thread. If the channel is empty, receiver.recv() blocks until a value is sent.

The mpsc part of std::sync::mpsc stands for multiproducer, single-consumer, a terse description of the kind of communication Rust's channels provide.
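A small sender/receiver sketch; the receiving iteration ends once the sender is dropped:

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    let (sender, receiver) = mpsc::channel();
    thread::spawn(move || {
        for n in 1..=3 {
            sender.send(n * 10).unwrap(); // ownership moves into the channel
        }
        // sender is dropped here, closing the channel
    });
    // recv/iter block until values arrive; iteration ends when the channel closes.
    let received: Vec<i32> = receiver.iter().collect();
    assert_eq!(received, vec![10, 20, 30]);
}
```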

[...] Rust's full thread safety story hinges on two built-in traits, std::marker::Send and std::marker::Sync.

  • Types that implement Send are safe to pass by value to another thread. They can be moved across threads.
  • Types that implement Sync are safe to pass by non-mut reference to another thread. They can be shared across threads.
By safe here, we mean [...]: free from data races and other undefined behavior.

A mutex (or lock) is used to force multiple threads to take turns when accessing certain data.

Unlike C++, in Rust the protected data is stored inside the Mutex.
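A minimal sketch of that design: the counter lives inside the Mutex, and lock() hands back a guard that both unlocks on drop and dereferences to the data:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let counter = Arc::new(Mutex::new(0));
    let handles: Vec<_> = (0..8).map(|_| {
        let counter = Arc::clone(&counter);
        thread::spawn(move || {
            // The only way to touch the data is through the lock.
            *counter.lock().unwrap() += 1;
        })
    }).collect();
    for h in handles { h.join().unwrap(); }
    assert_eq!(*counter.lock().unwrap(), 8);
}
```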

Safe Rust code cannot trigger a data race, a specific kind of bug where multiple threads read and write the same memory concurrently, producing meaningless results.

Valid Rust programs can't have data races, but they can still have other race conditions – situations where a program's behavior depends on timing among threads and may therefore vary from run to run. Some race conditions are benign. Some manifest as general flakiness and incredibly hard-to-fix bugs. Using mutexes in an unstructured way invites race conditions. It's up to you to make sure they're benign.

Whereas a mutex has a single lock method, a read/write lock has two locking methods, read and write. The RwLock::write method is like Mutex::lock. It waits for exclusive, mut access to the protected data. The RwLock::read method provides non-mut access, with the advantage that it is less likely to have to wait, because many threads can safely read at once. With a mutex, at any given moment, the protected data has only one reader or writer (or none). With a read/write lock, it can have either one writer or many readers, much like Rust references generally.

Programs can use condition variables to build their own [blocking API]. In Rust, the std::sync::Condvar type implements condition variables. A Condvar has methods .wait() and .notify_all(); .wait() blocks until some other thread calls .notify_all().

The std::sync::atomic module contains atomic types for lock-free concurrent programming.

Instead of the usual arithmetic and logical operators, atomic types expose methods that perform atomic operations: individual loads, stores, exchanges, and arithmetic operations that happen safely, as a unit, even if other threads are also performing atomic operations that touch the same memory location.
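For example, a shared counter incremented with fetch_add instead of a mutex:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

fn main() {
    static HITS: AtomicUsize = AtomicUsize::new(0);
    let handles: Vec<_> = (0..4).map(|_| {
        thread::spawn(|| {
            for _ in 0..1000 {
                HITS.fetch_add(1, Ordering::SeqCst); // atomic increment
            }
        })
    }).collect();
    for h in handles { h.join().unwrap(); }
    // No increments are lost, even without a lock.
    assert_eq!(HITS.load(Ordering::SeqCst), 4000);
}
```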

Asynchronous Programming

Rust's approach to supporting asynchronous operations is to introduce a trait, std::future::Future. [...] A Future represents an operation that you can test for completion. A future's poll method never waits for the operation to finish: it always returns immediately. If the operation is complete, poll returns Poll::Ready(output), where output is its final result. Otherwise, it returns Poll::Pending. If and when the future is worth polling again, it promises to let us know by invoking a waker, a callback function supplied in the Context.

This is the general pattern: the asynchronous version of any function takes the same arguments as the synchronous version, but the return type has a Future wrapped around it.

Unlike an ordinary function, when you call an asynchronous function, it returns immediately, before the body begins execution at all. Obviously, the call's final return value hasn't been computed yet; what you get is a future of its final value.

You don't need to adjust an asynchronous function's return type; Rust automatically treats async fn f(...) -> T as a function that returns a future of a T, not a T directly.

The future's specific type is generated automatically by the compiler, based on the function's body and arguments. This type doesn't have a name; all you know about it is that it implements Future<Output=R>, where R is the async function's return type.

An await expression takes ownership of the future and then polls it. If it's ready, then the future's final value is the value of the await expression, and execution continues. Otherwise, the enclosing future returns Poll::Pending to its own caller.

The ability to suspend execution mid-function and then resume later is unique to async functions. When an ordinary function returns, its stack frame is gone for good. Since await expressions depend on the ability to resume, you can only use them inside async functions.

In addition to asynchronous functions, Rust also supports asynchronous blocks. Whereas an ordinary block statement returns the value of its last expression, an async block returns a future of the value of its last expression. [...] An async block looks like an ordinary block statement, preceded by the async keyword.

In Rust [...] an async call does nothing until you pass it to a function like block_on, spawn, or spawn_local that will poll it and drive the work to completion. These functions, called executors, play the role that other languages cover with a global event loop.


Macros are a kind of shorthand. During compilation, before types are checked and long before any machine code is generated, each macro call is expanded – that is, it's replaced with some Rust code.

Macro calls are always marked with an exclamation point, so they stand out when you're reading code, and they can't be called accidentally when you meant to call a function.

macro_rules! is the main way to define macros in Rust.

A macro defined with macro_rules! works entirely by pattern matching. The body of a macro is just a series of rules:

( pattern1 ) => ( template1 );
( pattern2 ) => ( template2 );

Macro patterns are a mini-language within Rust. They're essentially regular expressions for matching code. But where regular expressions operate on characters, patterns operate on tokens – the numbers, names, punctuation marks, and so forth that are the building blocks of Rust programs.

Macro templates aren't much different from any of a dozen template languages commonly used in web programming. The only difference [...] is that the output is Rust code.

[...] the syntax $( PATTERN ),* is used to match any comma-separated list, where each item in the list matches PATTERN. The * here has the same meaning as in regular expressions ("0 or more") [...]. You can also use + to require at least one match, or ? for zero or one match.
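A minimal sketch of that repetition syntax, building a Vec in the same spirit as vec! (the macro name my_vec is made up for illustration):

```rust
// Matches a comma-separated list of expressions; each match of $x:expr
// expands to one push in the template.
macro_rules! my_vec {
    ( $( $x:expr ),* ) => {
        {
            let mut v = Vec::new();
            $( v.push($x); )*
            v
        }
    };
}

fn main() {
    let v = my_vec![1, 2, 3];
    assert_eq!(v, vec![1, 2, 3]);
    let empty: Vec<i32> = my_vec![]; // * also matches zero repetitions
    assert!(empty.is_empty());
}
```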

The first job in writing any complex macro is figuring out how to match, or parse, the desired input.

Procedural macros support extending the #[derive] attribute to handle custom derivations [...] as well as creating custom attributes and new macros that are invoked just like the macro_rules! macros [...].

What makes a procedural macro "procedural" is that it's implemented as a Rust function, not a declarative rule set. This function interacts with the compiler through a thin layer of abstraction and can be arbitrarily complex.

Unsafe Code

Unsafe code lets you tell Rust, "I am opting to use features whose safety you cannot guarantee." By marking off a block or function as unsafe, you acquire the ability to call unsafe functions in the standard library, dereference unsafe pointers, and call functions written in other languages like C and C++, among other powers. Rust's other safety checks still apply: type checks, lifetime checks, and bounds checks on indices all occur normally. Unsafe code just enables a small set of additional features.

An unsafe feature is one that imposes a contract: rules that Rust cannot enforce automatically, but which you must nonetheless follow to avoid undefined behavior. A contract goes beyond the usual type checks and lifetime checks, imposing further rules specific to that unsafe feature. Typically, Rust itself doesn't know about the contract at all; it's just explained in the feature's documentation.

When you use unsafe features, you, as the programmer, bear the responsibility for checking that your code adheres to their contracts.

[...] by forcing you to write an unsafe block or function, Rust makes sure you have acknowledged that your code may have additional rules to follow.

An unsafe block looks just like an ordinary Rust block preceded by the unsafe keyword, with the difference that you can use unsafe features in the block.

An unsafe function definition looks like an ordinary function definition preceded by the unsafe keyword. The body of an unsafe function is automatically considered an unsafe block. You may call unsafe functions only within unsafe blocks. This means that marking a function unsafe warns its callers that the function has a contract they must satisfy to avoid undefined behavior.

Essentially, Rust's type checker, borrow checker, and other static checks are inspecting your program and trying to construct proof that it cannot exhibit undefined behavior. When Rust compiles your program successfully, that means it succeeded in proving your code sound. An unsafe block is a gap in this proof: "This code," you are saying to Rust, "is fine, trust me." Whether your claim is true could depend on any part of the program that influences what happens in the unsafe block, and the consequences of being wrong could appear anywhere influenced by the unsafe block. Writing the unsafe keyword amounts to a reminder that you are not getting the full benefit of the language's safety checks.

An unsafe trait is a trait that has a contract Rust cannot check or enforce that implementers must satisfy to avoid undefined behavior. To implement an unsafe trait, you must mark the implementation as unsafe. It is up to you to understand the trait's contract and make sure your type satisfies it.

A raw pointer in Rust is an unconstrained pointer. You can use raw pointers to form all sorts of structures that Rust's checked pointer types cannot, like doubly linked lists or arbitrary graphs of objects. But because raw pointers are so flexible, Rust cannot tell whether you are using them safely or not, so you can dereference them only in an unsafe block.

There are two kinds of raw pointers:

  • A *mut T is a raw pointer to a T that permits modifying its referent.
  • A *const T is a raw pointer to a T that only permits reading its referent.
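A small sketch: creating raw pointers is safe, but dereferencing them requires an unsafe block:

```rust
fn main() {
    let mut x = 10;
    let p = &mut x as *mut i32;  // *mut T permits modifying the referent
    let q = p as *const i32;     // *const T permits only reading it
    unsafe {
        *p += 1;                 // dereferencing needs unsafe
        assert_eq!(*q, 11);
    }
    assert_eq!(x, 11);
}
```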

Rust lays out the elements of an array, slice, or vector as a single contiguous block of memory [...]. Elements are regularly spaced, so that if each element occupies size bytes, then the ith element starts at the i * sizeth byte. One nice consequence of this is that if you have two raw pointers to elements of an array, comparing the pointers gives the same results as comparing the elements' indices: if i < j, then a raw pointer to the ith element is less than a raw pointer to the jth element.

Rust provides many useful abstractions, but ultimately, the software you write is just pushing bytes around. Unions are one of Rust's most powerful features for manipulating those bytes and choosing how they are interpreted. For instance, any collection of 32 bits – 4 bytes – can be interpreted as an integer or as a floating-point number. Either interpretation is valid, though interpreting data meant for one as the other will likely result in nonsense.

Where the fields of a struct refer to different positions in memory, the fields of a union refer to different interpretations of the same sequence of bits. Assigning to a different field simply means overwriting some or all of those bits, in accordance with an appropriate type.

While constructing a union or assigning to its fields is completely safe, reading from any field of a union is always unsafe. [...] This is because, unlike enums, unions don't have a tag. The compiler adds no additional bits to tell variants apart.
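A sketch of the 32-bit example above: the same four bytes read either as a float or as its raw bit pattern:

```rust
// Both fields occupy the same four bytes.
union Bits {
    f: f32,
    u: u32,
}

fn main() {
    let b = Bits { f: 1.0 };      // constructing is safe
    let bits = unsafe { b.u };    // reading any field is unsafe
    assert_eq!(bits, 0x3f80_0000); // the IEEE 754 encoding of 1.0f32
}
```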

Foreign Functions

Rust's foreign function interface (FFI) lets Rust code call functions written in C, and in some cases C++. Since most operating systems offer C interfaces, Rust's foreign function interface allows immediate access to all sorts of low-level facilities.

The common denominator of Rust and C is machine language, so in order to anticipate what Rust values look like to C code, or vice versa, you need to consider their machine-level representations.

[...] Rust's std::os::raw module defines a set of Rust types that are guaranteed to have the same representation as certain C types.

For defining Rust struct types compatible with C structs, you can use the #[repr(C)] attribute. Placing #[repr(C)] above a struct definition asks Rust to lay out the struct's fields in memory the same way a C compiler would lay out the analogous C struct type.
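A minimal sketch; Point is a made-up example of a struct whose layout is meant to match a C struct with two doubles:

```rust
// #[repr(C)] lays the fields out as a C compiler would: x first, then y.
#[repr(C)]
struct Point {
    x: f64,
    y: f64,
}

fn main() {
    // Two f64 fields, no padding: 16 bytes, matching the C struct.
    assert_eq!(std::mem::size_of::<Point>(), 16);
}
```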

An extern block declares functions or variables defined in some other library that the final Rust executable will be linked with.

To use functions provided by a particular library, you can place a #[link] attribute atop the extern block that names the library Rust should link the executable with.

You can tell Rust where to search for libraries by writing a build script, Rust code that Cargo compiles and runs at build time. [...] To create your build script, add a file named build.rs in the same directory as the Cargo.toml file [...].