Programming Rust
Fast, Safe Systems Development
by Jim Blandy, Jason Orendorff & Leonora F.S. Tindall
- On Amazon
- ISBN: 978-1492052593
What is it about?
Programming Rust is a book about the Rust programming language.
My impression
I found Programming Rust an informative book that nicely complements the official book, The Rust Programming Language. Sometimes I wished there was a bit more meat on the bone, especially toward the end of the book, with the advanced topics. Other times a simple link to the API documentation would have been enough.
My notes
Preface
In short, systems programming is resource-constrained programming. It is programming when every byte and every CPU cycle counts.
Systems Programmers Can Have Nice Things
The Rust language makes you a simple promise: if your program passes the compiler's checks, it is free of undefined behavior. Dangling pointers, double-frees, and null pointer dereferences are all caught at compile time. Array references are secured with a mix of compile-time and run-time checks, so there are no buffer overruns [...]. Further, Rust aims to be both safe and pleasant to use. In order to make stronger guarantees about your program's behavior, Rust imposes more restrictions on your code than C and C++ do, and these restrictions take practice and experience to get used to. But the language overall is flexible and expressive.
Rust's package manager and build tool, Cargo, makes it easy to use libraries published by others on Rust's public package repository, the crates.io website. You simply add the library's name and required version number to a file, and Cargo takes care of downloading the library, together with whatever other libraries it uses in turn, and linking the whole lot together.
A Tour of Rust
Rust's machine integer type names reflect their size and signedness: `i32` is a signed 32-bit integer; `u8` is an unsigned 8-bit integer (used for "byte" values), and so on. The `isize` and `usize` types hold pointer-sized signed and unsigned integers, 32 bits long on 32-bit platforms, and 64 bits long on 64-bit platforms. Rust also has two floating-point types, `f32` and `f64`, which are the IEEE single- and double-precision floating-point types, like `float` and `double` in C and C++.
If a function body ends with an expression that is not followed by a semicolon, that's the function's return value. [...] It's typical in Rust to use this form to establish the function's value when control "falls off the end" of the function, and use `return` statements only for explicit early returns from the midst of a function.
The `#[test]` marker is an example of an attribute. Attributes are an open-ended system for marking functions and other declarations with extra information [...]. They're used to control compiler warnings and code style checks, include code conditionally (like `#ifdef` in C and C++), tell Rust how to interact with code written in other languages, and so on.
Functions that do anything that might fail, such as doing input or output or otherwise interacting with the operating system, can return `Result` types whose `Ok` variants carry successful results – the count of bytes transferred, the file opened, and so on – and whose `Err` variants carry an error code indicating what went wrong. Unlike most modern languages, Rust does not have exceptions: all errors are handled using either `Result` or panic [...].
`Option` is an enumerated type, often called an enum, because its definition enumerates several variants that a value of this type could be: for any type `T`, a value of type `Option<T>` is either `Some(v)`, where `v` is a value of type `T`, or `None`, indicating no `T` value is available. [...] `Option` is a generic type: you can use `Option<T>` to represent an optional value of any type `T` you like.
[`///`] are documentation comments; the `rustdoc` utility knows how to parse them, together with the code they describe, and produce online documentation.
Fundamental Types
Compared to a dynamically typed language like JavaScript or Python, Rust requires more planning from you up front. You must spell out the types of function arguments and return values, struct fields, and a few other constructs. However, two features of Rust make this less trouble than you might expect:
- Given the types that you do spell out, Rust's type inference will figure out most of the rest for you. In practice, there's often only one type that will work for a given variable or expression; when this is the case, Rust lets you leave out, or elide, the type.
- Functions can be generic: a single function can work on values of many different types.
The footing of Rust's type system is a collection of fixed-width numeric types, chosen to match the types that almost all modern processors implement directly in hardware. Fixed-width numeric types can overflow or lose precision, but they are adequate for most applications and can be thousands of times faster than representations like arbitrary-precision integers and exact rationals.
Rust uses the `u8` type for byte values. For example, reading data from a binary file or socket yields a stream of `u8` values.
[The] precision [of the `usize` and `isize` types] matches the size of the address space on the target machine: they are 32 bits long on 32-bit architectures, and 64 bits long on 64-bit architectures. Rust requires array indices to be `usize` values. Values representing the sizes of arrays or vectors or counts of the number of elements in some data structure also generally have the `usize` type.
Integer literals in Rust can take a suffix indicating their type: `42u8` is a `u8` value, and `1729isize` is an `isize`.
The prefixes `0x`, `0o`, and `0b` designate hexadecimal, octal, and binary literals.
To make long numbers more legible, you can insert underscores among the digits. For example, you can write the largest `u32` value as `4_294_967_295`. The exact placement of the underscores is not significant, so you can break hexadecimal or binary numbers into groups of four digits rather than three, as in `0xffff_ffff`, or set off the type suffix from the digits, as in `127_u8`.
Although numeric types and the `char` type are distinct, Rust does provide byte literals, character-like literals for `u8` values: `b'X'` represents the ASCII code for the character `X`, as a `u8` value. For example, since the ASCII code for `A` is 65, the literals `b'A'` and `65u8` are exactly equivalent.
You can convert from one integer type to another using the `as` operator.
When an integer arithmetic operation overflows, Rust panics, in a debug build. In a release build, the operation wraps around: it produces the value equivalent to the mathematically correct result modulo the range of the value.
When this default behavior isn't what you need, the integer types provide methods that let you spell out exactly what you want. [...] These integer arithmetic methods fall in four general categories:
- Checked operations return an `Option` of the result: `Some(v)` if the mathematically correct result can be represented as a value of that type, or `None` if it cannot.
- Wrapping operations return the value equivalent to the mathematically correct result modulo the range of the value. [...] this is how the ordinary arithmetic operators behave in release builds. The advantage of these methods is that they behave the same way in all builds.
- Saturating operations return the representable value that is closest to the mathematically correct result. In other words, the result is "clamped" to the maximum and minimum values the type can represent.
- Overflowing operations return a tuple `(result, overflowed)`, where `result` is what the wrapping version of the function would return, and `overflowed` is a `bool` indicating whether an overflow occurred.
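As a quick illustration of these four method families on `i32` (my own sketch, not from the book; the methods themselves are standard library methods):

```rust
fn main() {
    let x: i32 = i32::MAX;

    // Checked: None, because MAX + 1 is not representable as an i32.
    assert_eq!(x.checked_add(1), None);

    // Wrapping: wraps around modulo the range of i32.
    assert_eq!(x.wrapping_add(1), i32::MIN);

    // Saturating: clamps to the closest representable value.
    assert_eq!(x.saturating_add(1), i32::MAX);

    // Overflowing: the wrapped result plus a flag saying overflow happened.
    assert_eq!(x.overflowing_add(1), (i32::MIN, true));
}
```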
Every part of a floating-point number after the integer part is optional, but at least one of the fractional part, exponent, or type suffix must be present, to distinguish it from an integer literal. The fractional part may consist of a lone decimal point, so `5.` is a valid floating-point constant.
The types `f32` and `f64` have associated constants for the IEEE-required special values like `INFINITY`, `NEG_INFINITY` (negative infinity), `NAN` (the not-a-number value), and `MIN` and `MAX` (the smallest and largest finite values).
Although a `bool` needs only a single bit to represent it, Rust uses an entire byte for a `bool` value in memory, so you can create a pointer to it.
Rust's character type `char` represents a single Unicode character, as a 32-bit value. Rust uses the `char` type for single characters in isolation, but uses the UTF-8 encoding for strings and streams of text. So, a `String` represents its text as a sequence of UTF-8 bytes, not as an array of characters.
A tuple is a pair, or triple, quadruple, quintuple, etc. (hence, n-tuple, or tuple), of values of assorted types. You can write a tuple as a sequence of elements, separated by commas and surrounded by parentheses. [...] Given a tuple value `t`, you can access its elements as `t.0`, `t.1`, and so on.
[A] commonly used tuple type is the zero-tuple `()`. This is traditionally called the unit type because it has only one value, also written `()`. Rust uses the unit type where there's no meaningful value to carry, but context requires some sort of type nonetheless.
It's easiest to get started by thinking of references as Rust's basic pointer type. At run time, a reference to an `i32` is a single machine word holding the address of the `i32`, which may be on the stack or in the heap. The expression `&x` produces a reference to `x`; in Rust terminology, we say that it borrows a reference to `x`. Given a reference `r`, the expression `*r` refers to the value `r` points to.
Rust references come in two flavors:
- `&T`: An immutable, shared reference. You can have many shared references to a given value at a time, but they are read-only [...].
- `&mut T`: A mutable, exclusive reference. You can read and modify the value it points to [...]. But for as long as the reference exists, you may not have any other references of any kind to that value.
Rust uses this dichotomy between shared and mutable references to enforce a "single writer or multiple readers" rule: either you can read and write the value, or it can be shared by any number of readers, but never both at the same time. This separation, enforced by compile-time checks, is central to Rust's safety guarantees.
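A small sketch (my own) of the "single writer or multiple readers" rule in action:

```rust
fn main() {
    let mut v = vec![1, 2, 3];

    // Any number of shared references may coexist, but they are read-only.
    let r1 = &v;
    let r2 = &v;
    println!("{} {}", r1[0], r2[0]);

    // A mutable reference is exclusive: while `m` is in use, no other
    // reference to `v`, shared or mutable, may be used.
    let m = &mut v;
    m.push(4);
    println!("{:?}", m);

    // Using `r1` here, after `m` was created, would fail to compile:
    // println!("{}", r1[0]);
}
```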
The simplest way to allocate a value in the heap is to use `Box::new`.
Rust also has the raw pointer types `*mut T` and `*const T`. [...] Using a raw pointer is unsafe, because Rust makes no effort to track what it points to. For example, raw pointers may be null, or they may point to memory that has been freed or that now contains a value of a different type. [...] However, you may only dereference raw pointers within an `unsafe` block. An `unsafe` block is Rust's opt-in mechanism for advanced language features whose safety is up to you.
Rust has three types for representing a sequence of values in memory:
- The type `[T; N]` represents an array of `N` values, each of type `T`. An array's size is a constant determined at compile time and is part of the type; you can't append new elements or shrink an array.
- The type `Vec<T>`, called a vector of `T`s, is a dynamically allocated, growable sequence of values of type `T`. A vector's elements live on the heap, so you can resize vectors at will: push new elements onto them, append other vectors to them, delete elements, and so on.
- The types `&[T]` and `&mut [T]`, called a shared slice of `T`s and mutable slice of `T`s, are references to a series of elements that are a part of some other value, like an array or vector. You can think of a slice as a pointer to its first element, together with a count of the number of elements you can access starting at that point.
A `Vec<T>` consists of three values: a pointer to the heap-allocated buffer for the elements, which is created and owned by the `Vec<T>`; the number of elements that buffer has the capacity to store; and the number it actually contains now (in other words, its length). When the buffer has reached its capacity, adding another element to the vector entails allocating a larger buffer, copying the present contents into it, updating the vector's pointer and capacity to describe the new buffer, and finally freeing the old one.
If you know the number of elements a vector will need in advance, instead of `Vec::new` you can call `Vec::with_capacity` to create a vector with a buffer large enough to hold them all, right from the start; then, you can add the elements to the vector one at a time without causing any reallocation.
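For example (my own sketch), preallocating the buffer means the pushes below never trigger a reallocation:

```rust
fn main() {
    let mut v = Vec::with_capacity(100);
    assert_eq!(v.len(), 0);
    assert!(v.capacity() >= 100);

    for i in 0..100 {
        v.push(i); // no reallocation while len() <= capacity()
    }
    assert_eq!(v.len(), 100);
}
```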
You can insert and remove elements wherever you like in a vector, although these operations shift all the elements after the affected position forward or backward, so they may be slow if the vector is long.
Whereas an ordinary reference is a non-owning pointer to a single value, a reference to a slice is a non-owning pointer to a range of consecutive values in memory. This makes slice references a good choice when you want to write a function that operates on either an array or a vector.
If one line of a string ends with a backslash, then the newline character and the leading whitespace on the next line are dropped.
A raw string is tagged with the lowercase letter `r`. All backslashes and whitespace characters inside a raw string are included verbatim in the string. No escape sequences are recognized. [...] You can't include a double-quote character in a raw string simply by putting a backslash in front of it [...]. However, there is a cure for that too. The start and end of a raw string can be marked with pound signs.
A string literal with the `b` prefix is a byte string. Such a string is a slice of `u8` values – that is, bytes – rather than Unicode text.
Rust strings are sequences of Unicode characters, but they are not stored in memory as arrays of `char`s. Instead, they are stored using UTF-8, a variable-width encoding. Each ASCII character in a string is stored in one byte. Other characters take up multiple bytes.
A `&str` (pronounced "stir" or "string slice") is a reference to a run of UTF-8 text owned by someone else: it "borrows" the text. [...] Like other slice references, a `&str` is a fat pointer, containing both the address of the actual data and its length. You can think of a `&str` as being nothing more than a `&[u8]` that is guaranteed to hold well-formed UTF-8.
A `String` or `&str`'s `.len()` method returns its length. The length is measured in bytes, not characters.
Ownership and Moves
In Rust [...] the concept of ownership is built into the language itself and enforced by compile-time checks. Every value has a single owner that determines its lifetime. When the owner is freed – dropped, in Rust terminology – the owned value is dropped too.
A variable owns its value. When control leaves the block in which the variable is declared, the variable is dropped, so its value is dropped along with it.
Rust's `Box` type serves as another example of ownership. A `Box<T>` is a pointer to a value of type `T` stored on the heap. Calling `Box::new(v)` allocates some heap space, moves the value `v` into it, and returns a `Box` pointing to the heap space. Since a `Box` owns the space it points to, when the `Box` is dropped, it frees the space too.
Just as variables own their values, structs own their fields, and tuples, arrays, and vectors own their elements.
In Rust, for most types, operations like assigning a value to a variable, passing it to a function, or returning it from a function don't copy the value: they move it. The source relinquishes ownership of the value to the destination and becomes uninitialized; the destination now controls the value's lifetime.
Assigning a value of a `Copy` type copies the value, rather than moving it. The source of the assignment remains initialized and usable, with the same value it had before. Passing `Copy` types to functions and constructors behaves similarly. The standard `Copy` types include all the machine integer and floating-point numeric types, the `char` and `bool` types, and a few others. A tuple or fixed-size array of `Copy` types is itself a `Copy` type. Only types for which a simple bit-for-bit copy suffices can be `Copy`.
[...] user-defined types being non-`Copy` is only the default. If all the fields of your struct are themselves `Copy`, then you can make the type `Copy` as well by placing the attribute `#[derive(Copy, Clone)]` above the definition [...].
Although most values have unique owners in typical Rust code, in some cases it's difficult to find every value a single owner that has the lifetime you need; you'd like the value to simply live until everyone's done using it. For these cases, Rust provides the reference-counted pointer types `Rc` and `Arc`.
The `Rc` and `Arc` types are very similar; the only difference between them is that an `Arc` is safe to share between threads directly – the name `Arc` is short for atomic reference count – whereas a plain `Rc` uses faster non-thread-safe code to update its reference count. If you don't need to share the pointers between threads, there's no reason to pay the performance penalty of an `Arc`, so you should use `Rc`; Rust will prevent you from accidentally passing one across a thread boundary.
For any type `T`, an `Rc<T>` value is a pointer to a heap-allocated `T` that has had a reference count affixed to it. Cloning an `Rc<T>` value does not copy the `T`; instead, it simply creates another pointer to it and increments the reference count.
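A minimal sketch (my own) of shared ownership with `Rc`:

```rust
use std::rc::Rc;

fn main() {
    let s: Rc<String> = Rc::new("shirataki".to_string());

    // Cloning bumps the reference count; the String itself is not copied.
    let t = Rc::clone(&s);
    let u = Rc::clone(&s);

    assert_eq!(Rc::strong_count(&s), 3);
    println!("{} {} {}", s, t, u);
    // The String is freed only when the last of s, t, u is dropped.
}
```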
References
Rust also has non-owning pointer types called references, which have no effect on their referents' lifetimes. In fact, it's rather the opposite: references must never outlive their referents. You must make it apparent in your code that no reference can possibly outlive the value it points to. To emphasize this, Rust refers to creating a reference to some value as borrowing the value: what you have borrowed, you must eventually return to its owner.
A reference lets you access a value without affecting its ownership. References come in two kinds:
- A shared reference lets you read but not modify its referent. However, you can have as many shared references to a particular value at a time as you like. The expression `&e` yields a shared reference to `e`'s value; if `e` has the type `T`, then `&e` has the type `&T`, pronounced "ref `T`". Shared references are `Copy`.
- If you have a mutable reference to a value, you may both read and modify the value. However, you may not have any other references of any sort to that value active at the same time. The expression `&mut e` yields a mutable reference to `e`'s value; you write its type as `&mut T`, which is pronounced "ref mute `T`". Mutable references are not `Copy`.
When we pass a value to a function in a way that moves ownership of the value to the function, we say that we have passed it by value. If we instead pass the function a reference to the value, we say that we have passed the value by reference.
In Rust, references are created explicitly with the `&` operator, and dereferenced explicitly with the `*` operator.
Since references are so widely used in Rust, the `.` operator implicitly dereferences its left operand, if needed. [...] The `.` operator can also implicitly borrow a reference to its left operand, if needed for a method call.
Rust permits references to references. [...] The `.` operator follows as many references as it takes to find its target.
Like the `.` operator, Rust's comparison operators "see through" any number of references.
If you actually want to know whether two references point to the same memory, you can use `std::ptr::eq`, which compares them as addresses.
Rust references are never null. [...] There is no default initial value for a reference (you can't use any variable until it's been initialized, regardless of its type) and Rust won't convert integers to references (outside of `unsafe` code), so you can't convert zero into a reference.
[...] Rust lets you borrow a reference to the value of any sort of expression at all. [...] In situations like this, Rust simply creates an anonymous variable to hold the expression's value and makes the reference point to that.
[...] Rust also includes two kinds of fat pointers, two-word values carrying the address of some value, along with some further information necessary to put the value to use. A reference to a slice is a fat pointer, carrying the starting address of the slice and its length. [...] Rust's other kind of fat pointer is a trait object, a reference to a value that implements a certain trait. A trait object carries a value's address and a pointer to the trait's implementation appropriate to that value, for invoking the trait's methods. [...] Aside from carrying this extra data, slice and trait object references behave just like the other sorts of references [...].
You can't borrow a reference to a local variable and take it out of the variable's scope.
Rust tries to assign each reference type in your program a lifetime that meets the constraints imposed by how it is used. A lifetime is some stretch of your program for which a reference could be safe to use: a statement, an expression, the scope of some variable, or the like. Lifetimes are entirely figments of Rust's compile-time imagination. At run time, a reference is nothing but an address; its lifetime is part of its type and has no run-time representation.
[...] if you have a variable `x`, then a reference to `x` must not outlive `x` itself [...]. Beyond the point where `x` goes out of scope, the reference would be a dangling pointer. We say that the variable's lifetime must contain or enclose that of the reference borrowed from it.
[...] if you store a reference in a variable `r`, the reference's type must be good for the entire lifetime of the variable, from its initialization until its last use [...]. If the reference can't live at least as long as the variable does, then at some point `r` will be a dangling pointer. We say that the reference's lifetime must contain or enclose the variable's.
Expressions
Blocks are the most general kind of expression. A block produces a value and can be used anywhere a value is needed.
[...] a block may contain any number of declarations. The most common are `let` declarations, which declare local variables. [...] A `let` declaration can declare a variable without initializing it. The variable can then be initialized with a later assignment. This is occasionally useful, because sometimes a variable should be initialized from the middle of some sort of control flow construct.
You may occasionally see code that seems to redeclare an existing variable, like this:
```rust
for line in file.lines() {
    let line = line?;
    ...
}
```
The `let` declaration creates a new, second variable, of a different type. [...] Its definition supersedes the first's for the rest of the block. This is called shadowing and is very common in Rust programs.
A block can also contain item declarations. An item is simply any declaration that could appear globally in a program or module, such as a `fn`, `struct`, or `use`.
When an `fn` is declared inside a block, its scope is the entire block – that is, it can be used throughout the enclosing block. But a nested `fn` cannot access local variables or arguments that happen to be in scope.
It's never strictly necessary to use `if let`, because `match` can do everything `if let` can do. An `if let` expression is shorthand for a `match` with just one pattern.
Loops are expressions in Rust, but the value of a `while` or `for` loop is always `()`, so their value isn't very useful. A `loop` expression can produce a value if you specify one.
Within the body of a `loop`, you can give `break` an expression, whose value becomes that of the loop.
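A short sketch (my own) of a `loop` whose value comes from `break`:

```rust
fn main() {
    let mut n = 1;
    // Find the first power of two greater than 1000.
    let answer = loop {
        n *= 2;
        if n > 1000 {
            break n; // the value given to `break` becomes the loop's value
        }
    };
    assert_eq!(answer, 1024);
}
```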
Functions don't have to have an explicit `return` expression. The body of a function works like a block expression: if the last expression isn't followed by a semicolon, its value is the function's return value. In fact, this is the preferred way to supply a function's return value in Rust.
Expressions that don't finish normally are assigned the special type `!`, and they're exempt from the rules about types having to match. You can see `!` in the function signature of `std::process::exit()`. [...] The `!` means that `exit()` never returns. It's a divergent function.
Rust has closures, lightweight function-like values. A closure usually consists of an argument list, given between vertical bars, followed by an expression: `let is_even = |x| x % 2 == 0;`. Rust infers the argument types and return type.
Error Handling
Ordinary errors are handled using the `Result` type. `Result`s typically represent problems caused by things outside the program, like erroneous input, a network outage, or a permission problem. That such situations occur is not up to us; even a bug-free program will encounter them from time to time. [...] Panic is for the other kind of error, the kind that should never happen.
In most places where we try something that could fail, we don't want to catch and handle the error immediately. [...] Instead, if an error occurs, we usually want to let our caller deal with it. We want errors to propagate up the call stack. Rust has a `?` operator that does this. You can add a `?` to any expression that produces a `Result` [...]. The behavior of `?` depends on whether this function returns a success result or an error result:
- On success, it unwraps the `Result` to get the success value inside.
- On error, it immediately returns from the enclosing function, passing the error result up the call chain. To ensure that this works, `?` can only be used on a `Result` in functions that have a `Result` return type.
`?` also works similarly with the `Option` type. In a function that returns `Option`, you can use `?` to unwrap a value and return early in the case of `None`.
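A minimal sketch (my own) of `?` on both a `Result` and an `Option`; the config file name is made up:

```rust
use std::fs;
use std::io;

// On success, `?` yields the String; on error, it returns the io::Error
// to this function's caller immediately.
fn read_config() -> Result<String, io::Error> {
    let text = fs::read_to_string("config.toml")?;
    Ok(text)
}

// With an Option return type, `?` returns None early when a value is absent.
fn first_char(s: &str) -> Option<char> {
    let c = s.chars().next()?;
    Some(c)
}

fn main() {
    match read_config() {
        Ok(text) => println!("read {} bytes", text.len()),
        Err(err) => eprintln!("error: {}", err),
    }
    assert_eq!(first_char("rust"), Some('r'));
    assert_eq!(first_char(""), None);
}
```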
All of the standard library error types can be converted to the type `Box<dyn std::error::Error + Send + Sync + 'static>`. This is a bit of a mouthful, but `dyn std::error::Error` represents "any error", and `Send + Sync + 'static` makes it safe to pass between threads, which you'll often want.
Crates and Modules
Rust programs are made of crates. Each crate is a complete, cohesive unit: all the source code for a single library or executable, plus any associated tests, examples, tools, configuration, and other junk.
When we run cargo build
, Cargo starts by downloading source code for the specified versions of these crates from crates.io. Then, it reads those crates' Cargo.toml files, downloads their dependencies, and so on recursively.
The collection of all these dependency relationships, which tells Cargo everything it needs to know about what crates to build and in what order, is known as the dependency graph of the crate.
Once it has the source code, Cargo compiles all the crates. It runs `rustc`, the Rust compiler, once for each crate in the project's dependency graph. When compiling libraries, Cargo uses the `--crate-type lib` option. This tells `rustc` not to look for a `main()` function but instead to produce an .rlib file containing compiled code that can be used to create binaries and other .rlib files.
When compiling a program, Cargo uses `--crate-type bin`, and the result is a binary executable for the target platform [...].
[...] `cargo build --release` produces an optimized build. Release builds run faster, but they take longer to compile, they don't check for integer overflow, they skip `debug_assert!()` assertions, and the stack traces they generate on panic are generally less reliable.
To evolve without breaking existing code, Rust uses editions. The 2015 edition of Rust is compatible with Rust 1.0. The 2018 edition changed `async` and `await` into keywords, streamlined the module system, and introduced various other language changes that are incompatible with the 2015 edition. Each crate indicates which edition of Rust it is written in with a line like this in the `[package]` section atop its Cargo.toml file: `edition = "2018"`. If that keyword is absent, the 2015 edition is assumed [...].
Rust promises that the compiler will always accept all extant editions of the language, and programs can freely mix crates written in different editions.
Whereas crates are about code sharing between projects, modules are about code organization within a project. They act as Rust's namespaces, containers for the functions, types, constants, and so on that make up your Rust program or library.
[A] function [...] marked `pub(crate)` [...] is available anywhere inside this crate, but isn't exposed as part of the external interface. It can't be used by other crates [...].
Anything that isn't marked `pub` is private and can only be used in the same module in which it is defined, or any child modules. [...] Marking an item as `pub` is often known as "exporting" that item.
Modules can nest, and it's fairly common to see a module that's just a collection of submodules.
These three options – modules in their own file, modules in their own directory with a mod.rs, and modules in their own file with a supplementary directory containing submodules – give the module system enough flexibility to support almost any project structure you might desire.
The keywords `super` and `crate` have a special meaning in paths: `super` refers to the parent module, and `crate` refers to the crate containing the current module.
Submodules can access private items in their parent modules with `use super::*`.
[...] the standard library `std` is automatically linked with every project. This means you can always go with `use std::whatever` or refer to `std` items by name, like `std::mem::swap()` inline in your code. Furthermore, a few particularly handy names, like `Vec` and `Result`, are included in the standard prelude and automatically imported. Rust behaves as though every module, including the root module, started with the following import: `use std::prelude::v1::*;`.
Any item in a Rust program can be decorated with attributes. Attributes are Rust's catchall syntax for writing miscellaneous instructions and advice to the compiler.
Conditional compilation is [a] feature that's written using an attribute, namely, `#[cfg]`.
To attach an attribute to a whole crate, add it at the top of the main.rs or lib.rs file, before any items, and write `#!` instead of `#` [...].
[...] the `#![feature]` attribute is used to turn on unstable features of the Rust language and libraries, features that are experimental, and therefore might have bugs or might be changed or removed in the future.
[...] a simple unit testing framework is built into Rust. Tests are ordinary functions marked with the `#[test]` attribute.
Integration tests are .rs files that live in a tests directory alongside your project's src directory. When you run `cargo test`, Cargo compiles each integration test as a separate, standalone crate, linked with your library and the Rust test harness.
The command `cargo doc` creates HTML documentation for your library.
[...] when Rust sees comments that start with three slashes, it treats them as a `#[doc]` attribute instead. [...] When you compile a library or binary, these attributes don't change anything, but when you generate documentation, doc comments on public features are included in the output. Likewise, comments starting with `//!` are treated as `#![doc]` attributes and are attached to the enclosing feature, typically a module or crate.
The content of a doc comment is treated as Markdown [...]. You can also include HTML tags, which are copied verbatim into the formatted documentation. One special feature of doc comments in Rust is that Markdown links can use Rust item paths, like `leaves::Leaf`, instead of relative URLs, to indicate what they refer to. Cargo will look up what the path refers to and substitute a link to the right place in the right documentation page.
[...] an interesting thing happens when you include a block of code in a doc comment. Rust automatically turns it into a test. [...] When you run tests in a Rust library crate, Rust checks that all the code that appears in your documentation actually runs and works. It does this by taking each block of code that appears in a doc comment, compiling it as a separate executable crate, linking it with your library, and running it.
The idea behind doc-tests is not to put all your tests into comments. Rather, you write the best possible documentation, and Rust makes sure the code samples in your documentation actually compile and run.
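A sketch (my own; the crate name `my_crate` is hypothetical) of a doc comment whose embedded code block becomes a doc-test:

````rust
/// Return the larger of two integers.
///
/// The fenced block below is compiled and run by `cargo test` as a doc-test:
///
/// ```
/// assert_eq!(my_crate::max_i32(2, 7), 7);
/// ```
pub fn max_i32(a: i32, b: i32) -> i32 {
    if a >= b { a } else { b }
}
````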
There are several ways to specify dependencies [...] First of all, you may want to use dependencies that aren't published on crates.io at all. One way to do this is by specifying a Git repository URL [...]. Another alternative is to specify a directory that contains the crate's source code: `image = { path = "vendor/image" }`.
When you write something like `image = "0.13.0"` in your Cargo.toml file, Cargo interprets this rather loosely. It uses the most recent version of `image` that is considered compatible with version 0.13.0. The compatibility rules are adapted from Semantic Versioning.
- A version number that starts with 0.0 is so raw that Cargo never assumes it's compatible with any other version.
- A version number that starts with 0.x, where x is nonzero, is considered compatible with other point releases in the 0.x series.
- Once a project reaches 1.0, only new major versions break compatibility. So if you ask for version 2.0.1, Cargo might use 2.17.99 instead, but not 3.0.
The first time you build a project, Cargo outputs a Cargo.lock file that records the exact version of every crate it used. Later builds will consult this file and continue to use the same versions. Cargo upgrades to newer versions only when you tell it to, either by manually bumping up the version number in your Cargo.toml file or by running `cargo update`. [...] `cargo update` only upgrades to the latest versions that are compatible with what you've specified in Cargo.toml.
[...] if your project is an executable, you should commit Cargo.lock to version control. That way, everyone who builds your project will consistently get the same versions. The history of your Cargo.lock file will record your dependency updates.
Structs
A struct assembles several values of assorted types together into a single value so you can deal with them as a unit. Given a struct, you can read and modify its individual components. And a struct can have methods associated with it that operate on its components.
Rust has three kinds of struct types, named-field, tuple-like, and unit-like, which differ in how you refer to their components: a named-field struct gives a name to each component, whereas a tuple-like struct identifies them by the order in which they appear. Unit-like structs have no components at all [...].
When creating a named-field struct value, you can use another struct of the same type to supply values for fields you omit. In a struct expression, if the named fields are followed by `.. EXPR`, then any fields not mentioned take their values from `EXPR` [...].
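A tiny sketch (my own) of this struct update syntax:

```rust
#[derive(Debug)]
struct Broom {
    name: String,
    height: u32,
    health: u32,
}

fn main() {
    let hokey = Broom {
        name: "Hokey".to_string(),
        height: 60,
        health: 100,
    };

    // Only `name` is spelled out; `height` and `health` come from `hokey`.
    let hokey_copy = Broom {
        name: "Hokey II".to_string(),
        ..hokey
    };
    println!("{:?}", hokey_copy);
}
```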
An `impl` block is simply a collection of `fn` definitions, each of which becomes a method on the struct type named at the top of the block.
Functions defined in an `impl` block are called associated functions, since they're associated with a specific type.
Rust passes a method the value it's being called on as its first argument, which must have the special name `self`. Since `self`'s type is obviously the one named at the top of the `impl` block, or a reference to that, Rust lets you omit the type, and write `self`, `&self`, or `&mut self` [...].
An `impl` block for a given type can also define functions that don't take `self` as an argument at all. These are still associated functions, since they're in an `impl` block, but they're not methods, since they don't take a `self` argument. To distinguish them from methods, we call them type-associated functions.
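A compact sketch (my own) of an `impl` block containing both methods and a type-associated function:

```rust
struct Queue {
    items: Vec<char>,
}

impl Queue {
    // A type-associated function: no `self`; commonly used as a constructor.
    fn new() -> Queue {
        Queue { items: Vec::new() }
    }

    // A method that modifies the queue, so it takes `&mut self`.
    fn push(&mut self, c: char) {
        self.items.push(c);
    }

    // A read-only method: `&self` is enough.
    fn len(&self) -> usize {
        self.items.len()
    }
}

fn main() {
    let mut q = Queue::new();
    q.push('a');
    q.push('b');
    assert_eq!(q.len(), 2);
}
```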
[...] Rust structs can be generic, meaning that their definition is a template into which you can plug whatever types you like.
You can read the `<T>` in `Queue<T>` as "for any element type `T`...".
In generic struct definitions, the type names used in <angle brackets> are called type parameters.
You can read the line `impl<T> Queue<T>` as something like, "for any type `T`, here are some associated functions available on `Queue<T>`". Then, you can use the type parameter `T` as a type in the associated function definitions. The syntax may look a bit redundant, but the `impl<T>` makes it clear that the `impl` block covers any type `T`, which distinguishes it from an `impl` block written for one specific kind of `Queue` [...].
A `Cell<T>` is a struct that contains a single private value of type `T`. The only special thing about a `Cell` is that you can get and set the field even if you don't have `mut` access to the `Cell` itself.
Like `Cell<T>`, `RefCell<T>` is a generic type that contains a single value of type `T`. Unlike `Cell`, `RefCell` supports borrowing references to its `T` value.
[...] normally, when you borrow a reference to a variable, Rust checks at compile time to ensure that you're using the reference safely. If the checks fail, you get a compiler error. `RefCell` enforces the same rule using run-time checks. So if you're breaking the rules, you get a panic (or an `Err`, for `try_borrow` and `try_borrow_mut`).
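A short sketch (my own) of `RefCell`'s run-time borrow checking:

```rust
use std::cell::RefCell;

fn main() {
    let cell = RefCell::new("hello".to_string());

    {
        // A mutable borrow, checked at run time instead of compile time.
        let mut guard = cell.borrow_mut();
        guard.push_str(" world");
    } // the mutable borrow ends here

    assert_eq!(*cell.borrow(), "hello world");

    // Holding a shared borrow while asking for a mutable one breaks the
    // rules; try_borrow_mut reports that as an Err instead of panicking.
    let _shared = cell.borrow();
    assert!(cell.try_borrow_mut().is_err());
}
```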
Enums and Patterns
In memory, values of C-style enums are stored as integers. [...] By default, Rust stores C-style enums using the smallest built-in integer type that can accommodate them.
In all, Rust has three kinds of enum variant, echoing the three kinds of struct [...]. Variants with no data correspond to unit-like structs. Tuple variants look and function just like tuple structs. Struct variants have curly braces and named fields. A single enum can have variants of all three kinds.
In memory, enums with data are stored as a small integer tag, plus enough memory to hold all the fields of the largest variant. The tag field is for Rust's internal use. It tells which constructor created the value and therefore which fields it has.
The thing to remember is that patterns and expressions are natural opposites. The expression `(x, y)` makes two values into a new tuple, but the pattern `(x, y)` does the opposite: it matches a tuple and breaks out the two values. It's the same with `&`. In an expression, `&` creates a reference. In a pattern, `&` matches a reference.
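A sketch (my own) tying the three variant kinds and pattern matching together:

```rust
// One enum with all three kinds of variant.
enum Shape {
    Point,                                  // unit-like variant
    Circle(f64),                            // tuple variant: the radius
    Rectangle { width: f64, height: f64 },  // struct variant
}

fn area(shape: &Shape) -> f64 {
    // Each pattern takes its variant apart and binds the fields.
    match shape {
        Shape::Point => 0.0,
        Shape::Circle(radius) => std::f64::consts::PI * radius * radius,
        Shape::Rectangle { width, height } => width * height,
    }
}

fn main() {
    let shapes = [
        Shape::Point,
        Shape::Circle(1.0),
        Shape::Rectangle { width: 2.0, height: 3.0 },
    ];
    for s in &shapes {
        println!("area = {}", area(s));
    }
}
```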
Traits and Generics
Rust supports polymorphism with two related features: traits and generics.
Traits are Rust's take on interfaces or abstract base classes.
Generics are the other flavor of polymorphism in Rust. Like a C++ template, a generic function or type can be used with values of many different types.
The `<T: Ord>` in [`fn min<T: Ord>(value1: T, value2: T) -> T`] means that `min` can be used with arguments of any type `T` that implements the `Ord` trait – that is, any ordered type. A requirement like this is called a bound, because it sets limits on which types `T` could possibly be. The compiler generates custom machine code for each type `T` that you actually use.
Generics and traits are closely related: generic functions use traits in bounds to spell out what types of arguments they can be applied to.
A trait is a feature that any given type may or may not support. Most often, a trait represents a capability: something a type can do.
There is one unusual rule about trait methods: the trait itself must be in scope. Otherwise, all its methods are hidden. [...] Rust has this rule because [...] you can use traits to add new methods to any type – even standard library types like `u32` and `str`. Third-party crates can do the same thing. Clearly, this could lead to naming conflicts! But since Rust makes you import the traits you plan to use, crates are free to take advantage of this superpower.
The reason `Clone` and `Iterator` methods work without any special imports is that they're always in scope by default: they're part of the standard prelude, names that Rust automatically imports into every module.
A reference to a trait type, like `writer` [in `let writer: &mut dyn Write = &mut buf;`], is called a trait object. Like any other reference, a trait object points to some value, it has a lifetime, and it can be either `mut` or shared. What makes a trait object different is that Rust usually doesn't know the type of the referent at compile time. So a trait object includes a little extra information about the referent's type. This is strictly for Rust's own use behind the scenes: when you call `writer.write(data)`, Rust needs the type information to dynamically call the right `write` method depending on the type of `*writer`.
In memory, a trait object is a fat pointer consisting of a pointer to the value, plus a pointer to a table representing that value's type. Each trait object therefore takes up two machine words [...]. C++ has this kind of run-time type information as well. It's called a virtual table, or vtable. In Rust, as in C++, the vtable is generated once, at compile time, and shared by all objects of the same type.
[`<W: Write>`] is a type parameter. It means that throughout the body of this function, `W` stands for some type that implements the `Write` trait. Type parameters are usually single uppercase letters, by convention.
[...] Rust infers the type `W` from the type of the argument. This process is known as monomorphization, and the compiler handles it all automatically.
The choice of whether to use trait objects or generic code is subtle. Since both features are based on traits, they have a lot in common. Trait objects are the right choice whenever you need a collection of values of mixed types, all together. [...] Another possible reason to use trait objects is to reduce the total amount of compiled code. Rust may have to compile a generic function many times, once for each type it's used with. This could make the binary large [...].
[...] generics have three important advantages over trait objects, with the result that in Rust, generics are the more common choice. The first advantage is speed. Note the absence of the `dyn` keyword in generic function signatures. Because you specify the types at compile time, either explicitly or through type inference, the compiler knows exactly which [...] method to call. The `dyn` keyword isn't used because there are no trait objects – and thus no dynamic dispatch – involved. [...] The second advantage of generics is that not every trait can support trait objects. Traits support several features, such as associated functions, that work only with generics: they rule out trait objects entirely. [...] The third advantage of generics is that it's easy to bound a generic type parameter with several traits at once [...]. Trait objects can't do this: types like `&mut (dyn Debug + Hash + Eq)` aren't supported in Rust.
Defining a trait is simple. Give it a name and list the type signatures of the trait methods. [...] To implement a trait, use the syntax `impl TraitName for Type`. [...] Everything defined in a trait `impl` must actually be a feature of the trait; if we wanted to add a helper method [...], we would have to define it in a separate `impl` block.
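A minimal sketch (my own) of defining a trait, implementing it, and then using it both through a generic bound and through a trait object:

```rust
// The trait: just the method signatures.
trait Greet {
    fn greet(&self) -> String;
}

struct English;
struct French;

// `impl TraitName for Type` supplies the implementation.
impl Greet for English {
    fn greet(&self) -> String {
        "Hello".to_string()
    }
}

impl Greet for French {
    fn greet(&self) -> String {
        "Bonjour".to_string()
    }
}

// Generic function: statically dispatched, monomorphized per type.
fn shout<G: Greet>(g: &G) -> String {
    format!("{}!", g.greet())
}

fn main() {
    println!("{}", shout(&English));

    // Trait objects allow a collection of mixed types, dynamically dispatched.
    let greeters: Vec<Box<dyn Greet>> = vec![Box::new(English), Box::new(French)];
    for g in &greeters {
        println!("{}", g.greet());
    }
}
```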
Rust lets you implement any trait on any type, as long as either the trait or the type is introduced in the current crate. [...] This is called the orphan rule. It helps Rust ensure that trait implementations are unique. Your code can't `impl Write for u8`, because both `Write` and `u8` are defined in the standard library. If Rust let crates do that, there could be multiple implementations of `Write` for `u8`, in different crates, and Rust would have no reasonable way to decide which implementation to use for a given method call.
A trait can use the keyword `Self` as a type. [...] Using `Self` as the return type [...] means that the type of `x.clone()` is the same as the type of `x`, whatever that might be.
A trait that uses the `Self` type is incompatible with trait objects.
We can declare that a trait is an extension of another trait. [...] The phrase `trait Creature: Visible` means that all creatures are visible. Every type that implements `Creature` must also implement the `Visible` trait.
[...] a subtrait does not inherit the associated items of its supertrait; each trait still needs to be in scope if you want to call its methods. In fact, Rust's subtraits are really just a shorthand for a bound on `Self`.
In most object-oriented languages, interfaces can't include static methods or constructors, but traits can include type-associated functions, Rust's analog to static methods.
Rust has a standard `Iterator` trait, defined like this:
```rust
pub trait Iterator {
    type Item;
    fn next(&mut self) -> Option<Self::Item>;
    ...
}
```
The first feature of this trait, `type Item;`, is an associated type. Each type that implements `Iterator` must specify what type of item it produces. The second feature, the `next()` method, uses the associated type in its return value. `next()` returns an `Option<Self::Item>`: either `Some(item)`, the next value in the sequence, or `None` when there are no more values to visit. The type is written as `Self::Item`, not just plain `Item`, because `Item` is a feature of each type of iterator, not a standalone type.
Associated types are perfect for cases where each implementation has one specific related type [...].
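A minimal sketch (my own) of implementing `Iterator`, with the associated type filled in:

```rust
// Counts down from a starting value to 1.
struct Countdown {
    remaining: u32,
}

impl Iterator for Countdown {
    type Item = u32; // the associated type: what this iterator produces

    fn next(&mut self) -> Option<Self::Item> {
        if self.remaining == 0 {
            None
        } else {
            let current = self.remaining;
            self.remaining -= 1;
            Some(current)
        }
    }
}

fn main() {
    let values: Vec<u32> = Countdown { remaining: 3 }.collect();
    assert_eq!(values, vec![3, 2, 1]);
}
```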
Generic traits get a special dispensation when it comes to the orphan rule: you can implement a foreign trait for a foreign type, so long as one of the trait's type parameters is a type defined in the current crate. So, if you've defined `WindowSize` yourself, you can implement `Mul<WindowSize>` for `f64`, even though you didn't define either `Mul` or `f64`.
`impl Trait` allows us to "erase" the type of a return value, specifying only the trait or traits it implements, without dynamic dispatch or a heap allocation.
Using `impl Trait` means that you can change the actual type being returned in the future as long as it still implements [the trait], and any code calling the function will continue to compile without an issue. This provides a lot of flexibility for library authors, because only the relevant functionality is encoded in the type signature.
Like structs and enums, traits can have associated constants. You can declare a trait with an associated constant using the same syntax as for a struct or enum. [...] Associated consts in traits have a special power, though. Like associated types and functions, you can declare them but not give them a value. [...] Then, implementors of the trait can define these values.
Operator Overloading
You can make your own types support arithmetic and other operators, too, just by implementing a few built-in traits. This is called operator overloading [...].
In Rust, the expression `a + b` is actually shorthand for `a.add(b)`, a call to the `add` method of the standard library's `std::ops::Add` trait.
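A small sketch (my own) of overloading `+` for a custom type:

```rust
use std::ops::Add;

#[derive(Clone, Copy, Debug, PartialEq)]
struct Vector2 {
    x: f64,
    y: f64,
}

impl Add for Vector2 {
    type Output = Vector2;

    // `a + b` on Vector2 values is shorthand for this call.
    fn add(self, other: Vector2) -> Vector2 {
        Vector2 { x: self.x + other.x, y: self.y + other.y }
    }
}

fn main() {
    let a = Vector2 { x: 1.0, y: 2.0 };
    let b = Vector2 { x: 3.0, y: 4.0 };
    assert_eq!(a + b, Vector2 { x: 4.0, y: 6.0 });
}
```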
[...] Rust has two unary operators that you can customize [...]. All of Rust's signed numeric types implement `std::ops::Neg`, for the unary negation operator `-`; the integer types and `bool` implement `std::ops::Not`, for the unary complement operator `!`.
A compound assignment expression is one like `x += y` or `x &= y`: it takes two operands, performs some operation on them like addition or a bitwise AND, and stores the result back in the left operand. In Rust, the value of a compound assignment expression is always `()`, never the value stored. Many languages have operators like these and usually define them as shorthand for expressions like `x = x + y` or `x = x & y`. However, Rust doesn't take that approach. Instead, `x += y` is shorthand for the method call `x.add_assign(y)`, where `add_assign` is the sole method of the `std::ops::AddAssign` trait.
The built-in trait for a compound assignment operator is completely independent of the built-in trait for the corresponding binary operator. Implementing `std::ops::Add` does not automatically implement `std::ops::AddAssign`; if you want Rust to permit your type as the lefthand operand of a `+=` operator, you must implement `AddAssign` yourself.
Rust's equality operators, `==` and `!=`, are shorthand for calls to the `std::cmp::PartialEq` trait's `eq` and `ne` methods.
Since the `ne` method has a default definition, you only need to define `eq` to implement the `PartialEq` trait [...].
Implementations of `PartialEq` are almost always of the form shown here: they compare each field of the left operand to the corresponding field of the right. These get tedious to write, and equality is a common operation to support, so if you ask, Rust will generate an implementation of `PartialEq` for you automatically. Simply add `PartialEq` to the type definition's `derive` attribute [...].
Rust specifies the behavior of the ordered comparison operators `<`, `>`, `<=`, and `>=` all in terms of a single trait, `std::cmp::PartialOrd`. [...] The only method of `PartialOrd` you must implement yourself is `partial_cmp`.
If you know that values of two types are always ordered with respect to each other, then you can implement the stricter `std::cmp::Ord` trait. [...] The `cmp` method here simply returns an `Ordering`, instead of an `Option<Ordering>` like `partial_cmp`: `cmp` always declares its arguments equal or indicates their relative order. Almost all types that implement `PartialOrd` should also implement `Ord`. In the standard library, `f32` and `f64` are the only exceptions to this rule.
You can specify how an indexing expression like `a[i]` works on your type by implementing the `std::ops::Index` and `std::ops::IndexMut` traits. Arrays support the `[]` operator directly, but on any other type, the expression `a[i]` is normally shorthand for `*a.index(i)`, where `index` is a method of the `std::ops::Index` trait. However, if the expression is being assigned to or borrowed mutably, it's instead shorthand for `*a.index_mut(i)`, a call to the method of the `std::ops::IndexMut` trait.
Utility Traits
When a value's owner goes away, we say that Rust drops the value. Dropping a value entails freeing whatever other values, heap storage, and system resources the value owns. Drops occur under a variety of circumstances: when a variable goes out of scope; at the end of an expression statement, when you truncate a vector, removing elements from its end; and so on. For the most part, Rust handles dropping values for you automatically.
[...] if you want, you can customize how Rust drops values of your type by implementing the `std::ops::Drop` trait. [...] When a value is dropped, if it implements `std::ops::Drop`, Rust calls its `drop` method, before proceeding to drop whatever values its fields or elements own, as it normally would. This implicit invocation of `drop` is the only way to call that method; if you try to invoke it explicitly yourself, Rust flags that as an error.
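A brief sketch (my own) of a custom `Drop` implementation:

```rust
struct Noisy {
    name: String,
}

impl Drop for Noisy {
    // Called automatically when the value is dropped, before its
    // `name` field is dropped in turn.
    fn drop(&mut self) {
        println!("dropping {}", self.name);
    }
}

fn main() {
    let _guard = Noisy { name: "guard".to_string() };
    println!("end of main");
    // "dropping guard" prints here, as `_guard` goes out of scope.
}
```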
A sized type is one whose values all have the same size in memory. Almost all types in Rust are sized: every `u64` takes eight bytes, every `(f32, f32, f32)` tuple twelve. Even enums are sized: no matter which variant is actually present, an enum always occupies enough space to hold its largest variant.
All sized types implement the `std::marker::Sized` trait, which has no methods or associated types. Rust implements it automatically for all types to which it applies; you can't implement it yourself. The only use for `Sized` is as a bound for type variables: a bound like `T: Sized` requires `T` to be a type whose size is known at compile time. Traits of this sort are called marker traits, because the Rust language itself uses them to mark certain types as having characteristics of interest.
Rust can't store unsized values in variables or pass them as arguments. You can only deal with them through pointers like `&str` or `Box<dyn Write>`, which themselves are sized. [...] a pointer to an unsized value is always a fat pointer, two words wide [...].
The `std::clone::Clone` trait is for types that can make copies of themselves. [...] The `clone` method should construct an independent copy of `self` and return it.
Cloning a value usually entails allocating copies of anything it owns, as well, so a `clone` can be expensive, in both time and memory. [...] This is why Rust doesn't just clone values automatically, but instead requires you to make an explicit method call.
[...] a type is `Copy` if it implements the `std::marker::Copy` marker trait [...].
[...] because `Copy` is a marker trait with special meaning to the language, Rust permits a type to implement `Copy` only if a shallow byte-for-byte copy is all it needs. Types that own any other resources, like heap buffers or operating system handles, cannot implement `Copy`. Any type that implements the `Drop` trait cannot be `Copy`. Rust presumes that if a type needs special cleanup code, it must also require special copying code and thus can't be `Copy`.
You can specify how dereferencing operators like `*` and `.` behave on your types by implementing the `std::ops::Deref` and `std::ops::DerefMut` traits. Pointer types like `Box<T>` and `Rc<T>` implement these traits so that they can behave as Rust's built-in pointer types do.
The `deref` and `deref_mut` methods take a `&Self` reference and return a `&Self::Target` reference. `Target` should be something that `Self` contains, owns, or refers to: for `Box<Complex>` the `Target` type is `Complex`.
Since `deref` takes a `&Self` reference and returns a `&Self::Target` reference, Rust uses this to automatically convert references of the former type into the latter. In other words, if inserting a `deref` call would prevent a type mismatch, Rust inserts one for you. Implementing `DerefMut` enables the corresponding conversion for mutable references. These are called the deref coercions: one type is being "coerced" into behaving as another.
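A compact sketch (my own) of `Deref` and a deref coercion:

```rust
use std::ops::Deref;

// A wrapper that owns a String and hands out references to it.
struct Wrapper {
    value: String,
}

impl Deref for Wrapper {
    type Target = String;

    fn deref(&self) -> &String {
        &self.value
    }
}

fn print_len(s: &str) {
    println!("{} bytes", s.len());
}

fn main() {
    let w = Wrapper { value: "hello".to_string() };

    // The `.` operator auto-derefs: String::len is found through Deref.
    assert_eq!(w.len(), 5);

    // Deref coercion: &Wrapper -> &String -> &str.
    print_len(&w);
}
```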
Some types have a reasonably obvious default value: the default vector or string is empty, the default number is zero, the default `Option` is `None`, and so on. Types like this can implement the `std::default::Default` trait. [...] The `default` method simply returns a fresh value of type `Self`.
When a type implements `AsRef<T>`, that means you can borrow a `&T` from it efficiently. `AsMut` is the analogue for mutable references.
The `std::borrow::Borrow` trait is similar to `AsRef`: if a type implements `Borrow<T>`, then its `borrow` method efficiently borrows a `&T` from it. But `Borrow` imposes more restrictions: a type should implement `Borrow<T>` only when a `&T` hashes and compares the same way as the value it's borrowed from.
The `std::convert::From` and `std::convert::Into` traits represent conversions that consume a value of one type and return a value of another. Whereas the `AsRef` and `AsMut` traits borrow a reference of one type from another, `From` and `Into` take ownership of their argument, transform it, and then return ownership of the result back to the caller.
Although the traits simply provide two ways to do the same thing, they lend themselves to different uses. You generally use `Into` to make your functions more flexible in the arguments they accept. [...] The `From` trait, however, plays a different role. The `from` method serves as a generic constructor for producing an instance of a type from some other single value.
Given an appropriate `From` implementation, the standard library automatically implements the corresponding `Into` trait. When you define your own type, if it has single-argument constructors, you should write them as implementations of `From<T>` for the appropriate types; you'll get the corresponding `Into` implementations for free.
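A small sketch (my own) of writing a single-argument constructor as a `From` implementation and getting `Into` for free:

```rust
struct Meters(f64);

// A single-argument constructor, written as a From implementation.
impl From<f64> for Meters {
    fn from(value: f64) -> Meters {
        Meters(value)
    }
}

// Accepting `Into<Meters>` makes the function flexible in what it takes.
fn describe<M: Into<Meters>>(m: M) {
    let meters = m.into();
    println!("{} m", meters.0);
}

fn main() {
    let m = Meters::from(3.5);
    println!("{} m", m.0);

    // The blanket impl gives Into<Meters> for f64 automatically.
    describe(1.5_f64);
}
```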
`From` and `Into` are infallible traits – their API requires that conversions will not fail.
`TryFrom` and `TryInto` are the fallible cousins of `From` and `Into` and are similarly reciprocal; implementing `TryFrom` means that `TryInto` is implemented as well.
Given a reference, the usual way to produce an owned copy of its referent is to call `clone`, assuming the type implements `std::clone::Clone`. But what if you want to clone a `&str` or a `&[i32]`? What you probably want is a `String` or a `Vec<i32>`, but `Clone`'s definition doesn't permit that: by definition, cloning a `&T` must always return a value of type `T`, and `str` and `[u8]` are unsized; they aren't even types that a function could return. The `std::borrow::ToOwned` trait provides a slightly looser way to convert a reference to an owned value. [...] Unlike `clone`, which must return exactly `Self`, `to_owned` can return anything you could borrow a `&Self` from [...].
[...] in some cases you cannot decide whether to borrow or own until the program is running; the `std::borrow::Cow` type (for "clone on write") provides one way to do this. [...] A `Cow<B>` either borrows a shared reference to a `B` or owns a value from which we could borrow such a reference.
Closures
A closure can use data that belongs to an enclosing function.
The `move` keyword tells Rust that a closure doesn't borrow the variables it uses: it steals them.
In fact, every closure you write has its own type, because a closure may contain data: values either borrowed or stolen from enclosing scopes. This could be any number of variables, in any combination of types. So every closure has an ad hoc type created by the compiler, large enough to hold that data. No two closures have exactly the same type. But every closure implements an `Fn` trait [...].
Since every closure has its own type, code that works with closures usually needs to be generic [...].
Closures that drop values [...] are not allowed to have Fn
. They are, quite literally, no Fn
at all. They implement a less powerful trait, FnOnce
, the trait of closures that can be called once. The first time you call a FnOnce
closure, the closure itself is used up.
Rust considers non-mut
values safe to share across threads. But it wouldn't be safe to share non-mut
closures that contain mut
data: calling such a closure from multiple threads could lead to all sorts of race conditions as multiple threads try to read and write the same data at the same time. Therefore, Rust has one more category of closure, FnMut
, the category of closures that write.
Every Fn
meets the requirements for FnMut
, and every FnMut
meets the requirements for FnOnce
. [...] they're not three separate categories. Instead, Fn()
is a subtrait of FnMut()
, which is a subtrait of FnOnce()
. This makes Fn
the most exclusive and most powerful category. FnMut
and FnOnce
are broader categories that include closures with usage restrictions.
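A sketch of the three call traits as generic bounds, with hypothetical helper functions:

```rust
fn call_twice<F: Fn() -> i32>(f: F) -> i32 { f() + f() }     // only reads captured data
fn call_and_mutate<F: FnMut()>(mut f: F) { f(); f(); }       // may write captured data
fn call_once<F: FnOnce() -> String>(f: F) -> String { f() }  // may consume captured data

fn main() {
    let x = 10;
    assert_eq!(call_twice(|| x + 1), 22);

    let mut count = 0;
    call_and_mutate(|| count += 1);
    assert_eq!(count, 2);

    let greeting = String::from("hi");
    // Moving `greeting` out of the closure makes it FnOnce only.
    assert_eq!(call_once(move || greeting), "hi");
}
```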
[...] closures are represented as structs that contain either the values (for move
closures) or references to the values (for non-move
closures) of the variables they capture. The rules for Copy
and Clone
on closures are just like the Copy
and Clone
rules for regular structs. A non-move
closure that doesn't mutate variables holds only shared references, which are both Clone
and Copy
, so that closure is both Clone
and Copy
as well. [...] On the other hand, a non-move
closure that does mutate values has mutable references within its internal representation. Mutable references are neither Clone
nor Copy
, so neither is a closure that uses them. [...] For a move
closure, the rules are even simpler. If everything a move
closure captures is Copy
, it's Copy
. If everything it captures is Clone
, it's Clone
.
Iterators
An iterator is a value that produces a sequence of values, typically for a loop to operate on.
An iterator is any value that implements the std::iter::Iterator
trait. [...] The next
method either returns Some(v)
, where v
is the iterator's next value, or returns None
to indicate the end of the sequence.
If there's a natural way to iterate over some type, that type can implement std::iter::IntoIterator
, whose into_iter
method takes a value and returns an iterator over it. [...] We call any type that implements IntoIterator
an iterable, because it's something you could iterate over if you asked.
Under the hood, every for
loop is just shorthand for calls to IntoIterator
and Iterator
methods. [...] The for
loop uses IntoIterator::into_iter
to convert its operand [...] into an iterator and then calls Iterator::next
repeatedly. Each time that returns Some(element)
, the for
loop executes its body; and if it returns None
, the loop finishes.
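Roughly, the desugaring looks like this (a sketch, not the compiler's exact output):

```rust
fn main() {
    let v = vec!["a", "b", "c"];

    // `for element in &v { ... }` is approximately shorthand for:
    let mut iter = (&v).into_iter();
    while let Some(element) = iter.next() {
        println!("{}", element);
    }
}
```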
Most collection types provide iter
and iter_mut
methods that return the natural iterators over the type, producing a shared or mutable reference to each item. Array slices like &[T]
and &mut [T]
have iter
and iter_mut
methods too. These methods are the most common way to get an iterator, if you're not going to let a for
loop take care of it for you.
One simple and general way to produce a sequence of values is to provide a closure that returns them. Given a function returning Option<T>
, std::iter::from_fn
returns an iterator that simply calls the function to produce its items.
If each item depends on the one before, the std::iter::successors
function works nicely. You provide an initial item and a function that takes one item and returns an Option
of the next. If it returns None
, the iteration ends.
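A short example of both helpers: from_fn with a counter closure, and successors deriving each item from the previous one.

```rust
fn main() {
    // std::iter::from_fn: every item comes from calling the closure.
    let mut count = 0;
    let firsts: Vec<i32> = std::iter::from_fn(|| {
        count += 1;
        if count <= 3 { Some(count) } else { None }
    })
    .collect();
    assert_eq!(firsts, vec![1, 2, 3]);

    // std::iter::successors: each item is computed from the one before.
    let powers_of_two: Vec<u32> =
        std::iter::successors(Some(1u32), |&n| n.checked_mul(2))
            .take(5)
            .collect();
    assert_eq!(powers_of_two, vec![1, 2, 4, 8, 16]);
}
```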
Many collection types provide a drain
method that takes a mutable reference to the collection and returns an iterator that passes ownership of each element to the consumer. However, unlike the into_iter()
method, which takes the collection by value and consumes it, drain
merely borrows a mutable reference to the collection, and when the iterator is dropped, it removes any remaining elements from the collection and leaves it empty.
The Iterator
trait's map
adapter lets you transform an iterator by applying a closure to its items. The filter
adapter lets you filter out items from an iterator, using a closure to decide which to keep and which to drop.
[...] simply calling an adapter on an iterator doesn't consume any items; it just returns a new iterator, ready to produce its own items by drawing from the first iterator as needed. In a chain of adapters, the only way to make any work actually get done is to call next
on the final iterator.
The filter_map
adapter is similar to map
except that it lets its closure either transform the item into a new item (as map
does) or drop the item from the iteration. [...] When the closure returns None
, the item is dropped from the iteration; when it returns Some(b)
, then b
is the next item the filter_map
iterator produces.
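A small illustration: keep only the words that parse as integers, dropping everything else in one pass.

```rust
fn main() {
    let text = "10 apples 3 pears oops 7";
    let numbers: Vec<i32> = text
        .split_whitespace()
        .filter_map(|w| w.parse::<i32>().ok()) // None values are skipped
        .collect();
    assert_eq!(numbers, vec![10, 3, 7]);
}
```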
You can think of the flat_map
adapter as continuing in the same vein as map
and filter_map
, except that now the closure can return not just one item (as with map
) or zero or one items (as with filter_map
), but a sequence of any number of items. The flat_map
iterator produces the concatenation of the sequences the closure returns.
The flatten
adapter concatenates an iterator's items, assuming each item is itself an iterable.
The Iterator
trait's take
and take_while
adapters let you end an iteration after a certain number of items or when a closure decides to cut things off.
The Iterator
trait's skip
and skip_while
methods are the complement of take
and take_while
: they drop a certain number of items from the beginning of an iteration, or drop items until a closure finds one acceptable, and then pass the remaining items through unchanged.
A peekable iterator lets you peek at the next item that will be produced without actually consuming it. You can turn any iterator into a peekable iterator by calling the Iterator
trait's peekable
method.
Once an Iterator
has returned None
, the trait doesn't specify how it ought to behave if you call its next
method again. Most iterators just return None
again, but not all. [...] The fuse
adapter takes any iterator and produces one that will definitely continue to return None
once it has done so the first time.
Some iterators are able to draw items from both ends of the sequence. You can reverse such iterators by using the rev
adapter. [...] Such iterators can implement the std::iter::DoubleEndedIterator
trait, which extends Iterator
.
The inspect
adapter is handy for debugging pipelines of iterator adapters, but it isn't used much in production code. It simply applies a closure to a shared reference to each item and then passes the item through. The closure can't affect the items, but it can do things like print them or make assertions about them.
The chain
adapter appends one iterator to another. More precisely, i1.chain(i2)
returns an iterator that draws items from i1
until it's exhausted and then draws items from i2
.
The Iterator
trait's enumerate
adapter attaches a running index to the sequence, taking an iterator that produces items A
, B
, C
, ... and returning an iterator that produces pairs (0, A)
, (1, B)
, (2, C)
, ....
The zip
adapter combines two iterators into a single iterator that produces pairs holding one value from each iterator [...]. The zipped iterator ends when either of the two underlying iterators ends.
An iterator's by_ref
method borrows a mutable reference to the iterator so that you can apply adapters to the reference. When you're done consuming items from these adapters, you drop them, the borrow ends, and you regain access to your original iterator.
The cloned
adapter takes an iterator that produces references and returns an iterator that produces values cloned from those references, much like iter.map(|item| item.clone())
. Naturally, the referent type must implement Clone
. [...] The copied
adapter is the same idea, but more restrictive: the referent type must implement Copy
.
The cycle
adapter returns an iterator that endlessly repeats the sequence produced by the underlying iterator. The underlying iterator must implement std::clone::Clone
so that cycle
can save its initial state and reuse it each time the cycle starts again.
The count
method draws items from an iterator until it returns None
and tells you how many it got.
The sum
and product
methods compute the sum or product of the iterator's items, which must be integers or floating-point numbers.
The min
and max
methods on Iterator
return the least or greatest item the iterator produces. The iterator's item type must implement std::cmp::Ord
so that items can be compared with one another.
The max_by
and min_by
methods return the maximum or minimum item the iterator produces, as determined by a comparison function you provide.
The max_by_key
and min_by_key
methods on Iterator
let you select the maximum or minimum item as determined by a closure applied to each item.
Although iterators do not support Rust's comparison operators, they do provide methods like eq
and lt
that do the same job, drawing pairs of items from the iterators and comparing them until a decision can be reached.
The any
and all
methods apply a closure to each item the iterator produces and return true
if the closure returns true
for any item, or for all the items.
The position
method applies a closure to each item from the iterator and returns the index of the first item for which the closure returns true
. More precisely, it returns an Option
of the index: if the closure returns true
for no item, position
returns None
. [...] The rposition
method is the same, except that it searches from the right.
The fold
method is a very general tool for accumulating some sort of result over the entire sequence of items an iterator produces. Given an initial value, which we'll call the accumulator, and a closure, fold
repeatedly applies the closure to the current accumulator and the next item from the iterator. The value the closure returns is taken as the new accumulator, to be passed to the closure with the next item. The final accumulator value is what fold
itself returns. [...] The rfold
method is the same as fold
, except that it requires a double-ended iterator, and processes its items from last to first.
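A quick sketch of fold in action: building a string from words, and showing that a plain sum is just a special case.

```rust
fn main() {
    let words = ["rust", "is", "fast"];
    // fold(initial_accumulator, closure): the closure's return value
    // becomes the accumulator for the next item.
    let sentence = words.iter().fold(String::new(), |mut acc, w| {
        if !acc.is_empty() {
            acc.push(' ');
        }
        acc.push_str(w);
        acc
    });
    assert_eq!(sentence, "rust is fast");

    let total: i32 = (1..=5).fold(0, |sum, n| sum + n);
    assert_eq!(total, 15);
}
```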
The try_fold
method is the same as fold
, except that the process of iteration can exit early, without consuming all the values from the iterator. The closure you pass to try_fold
must return a Result
: if it returns Err(e)
, try_fold
returns immediately with Err(e)
as its value. Otherwise, it continues folding with the success value. The closure can also return an Option
: returning None
exits early, and the result is an Option
of the folded value. [...] The try_rfold
method, as its name suggests, is the same as try_fold
, except that it draws values from the back, instead of the front, and requires a double-ended iterator.
The nth
method takes an index n
, skips that many items from the iterator, and returns the next item, or None
if the sequence ends before that point. [...] The nth_back
method is much the same, except that it draws from the back of a double-ended iterator.
The last
method returns the last item the iterator produces, or None
if it's empty.
The find
method draws items from an iterator, returning the first item for which the given closure returns true
, or None
if the sequence ends before a suitable item is found. [...] The rfind
method is similar, but it requires a double-ended iterator and searches values from back to front [...].
[find_map
] is just like find
, except that instead of returning bool
, the closure should return an Option
of some value. find_map
returns the first Option
that is Some
.
[collect
] can build any kind of collection from Rust's standard library, as long as the iterator produces a suitable item type.
If a type implements the std::iter::Extend
trait, then its extend
method adds an iterable's items to the collection.
The partition
method divides an iterator's items among two collections, using a closure to decide where each item belongs.
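A short example of collect and partition side by side; the type annotations tell each method which collections to build.

```rust
use std::collections::HashSet;

fn main() {
    // collect can build many collection types; the annotation picks which.
    let unique: HashSet<char> = "banana".chars().collect();
    assert_eq!(unique.len(), 3); // 'b', 'a', 'n'

    // partition splits items into two collections using a predicate.
    let (even, odd): (Vec<i32>, Vec<i32>) = (1..=10).partition(|n| n % 2 == 0);
    assert_eq!(even, vec![2, 4, 6, 8, 10]);
    assert_eq!(odd, vec![1, 3, 5, 7, 9]);
}
```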
The for_each
method simply applies a closure to each item. [...] If your closure needs to be fallible or exit early, you can use try_for_each
.
Collections
The easiest way to create a vector is to use the vec!
macro.
[...] a vector has three fields: the length, the capacity, and a pointer to a heap allocation where the elements are stored.
All of a vector's elements are stored in a contiguous, heap-allocated chunk of memory. The capacity of a vector is the maximum number of elements that would fit in this chunk. Vec
normally manages the capacity for you, automatically allocating a larger buffer and moving the elements into it when more space is needed.
Vec
supports efficiently adding and removing elements only at the end. When a program needs a place to store values that are "waiting in line", Vec
can be slow. Rust's std::collections::VecDeque<T>
is a deque (pronounced "deck"), a double-ended queue. It supports efficient add and remove operations at both the front and the back.
Like a Vec
, [VecDeque
] has a single heap allocation where elements are stored. Unlike Vec
, the data does not always start at the beginning of this region, and it can "wrap around" the end [...]. VecDeque
has private fields [...] that it uses to remember where in the buffer the data begins and ends.
A BinaryHeap
is a collection whose elements are kept loosely organized so that the greatest value always bubbles up to the front of the queue.
[...] BinaryHeap
is not limited to numbers. It can hold any type of value that implements the Ord
built-in trait. This makes BinaryHeap
useful as a work queue. You can define a task struct that implements Ord
on the basis of priority so that higher-priority tasks are Greater
than lower-priority tasks. Then, create a BinaryHeap
to hold all pending tasks. Its .pop()
method will always return the most important item [...].
[...] BinaryHeap
is iterable, and it has an .iter()
method, but the iterators produce the heap's elements in an arbitrary order, not from greatest to least. To consume values from a BinaryHeap
in order of priority, use a while
loop.
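A minimal work-queue sketch: pushing plain integers as "priorities" and draining the heap with while let.

```rust
use std::collections::BinaryHeap;

fn main() {
    let mut heap = BinaryHeap::new();
    heap.push(3);
    heap.push(10);
    heap.push(5);

    // .pop() always returns the greatest remaining value, so a while let
    // loop consumes the heap in priority order.
    let mut drained = Vec::new();
    while let Some(top) = heap.pop() {
        drained.push(top);
    }
    assert_eq!(drained, vec![10, 5, 3]);
}
```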
A map is a collection of key-value pairs (called entries). No two entries have the same key, and the entries are kept organized so that if you have a key, you can efficiently look up the corresponding value in a map.
Rust offers two map types: HashMap<K, V>
and BTreeMap<K, V>
. The two share many of the same methods; the difference is in how the two keep entries arranged for fast lookup.
A HashMap
stores the keys and values in a hash table, so it requires a key type K
that implements Hash
and Eq
[...]. All keys, values, and cached hash codes are stored in a single heap-allocated table. Adding entries eventually forces the HashMap
to allocate a larger table and move all the data into it.
A BTreeMap
stores the entries in order by key, in a tree structure, so it requires a key type K
that implements Ord
.
A map can also be queried using square brackets: map[&key]
. That is, maps implement the Index
built-in trait. However, this panics if there is not already an entry for the given key
, like an out-of-bounds array access, so use this syntax only if the entry you're looking up is sure to be populated.
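A quick contrast between the forgiving get method and the panicking index syntax:

```rust
use std::collections::HashMap;

fn main() {
    let mut ballots: HashMap<String, u32> = HashMap::new();
    ballots.insert("alice".to_string(), 2);
    ballots.insert("bob".to_string(), 1);

    // get returns an Option, so a missing key is not an error.
    assert_eq!(ballots.get("carol"), None);

    // Indexing panics on a missing key; use it only when the entry must exist.
    assert_eq!(ballots["alice"], 2);
}
```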
Sets are collections of values arranged for fast membership testing. [...] A set never contains multiple copies of the same value.
[...] behind the scenes, a set is like a map with only keys, rather than key-value pairs. In fact, Rust's two set types, HashSet<T>
and BTreeSet<T>
, are implemented as thin wrappers around HashMap<T, ()>
and BTreeMap<T, ()>
.
&set1 & &set2
returns a new set that's the intersection of set1
and set2
.
&set1 | &set2
returns a new set containing [...] values that are in either set1
or set2
.
&set1 - &set2
returns a new set containing [values that are in set1
but not in set2
].
&set1 ^ &set2
returns a new set containing [values that are in either set1
or set2
, but not both].
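All four operators in one small example:

```rust
use std::collections::HashSet;

fn main() {
    let set1 = HashSet::from([1, 2, 3, 4]);
    let set2 = HashSet::from([3, 4, 5]);

    let intersection = &set1 & &set2; // {3, 4}
    let union = &set1 | &set2;        // {1, 2, 3, 4, 5}
    let difference = &set1 - &set2;   // {1, 2}
    let symmetric = &set1 ^ &set2;    // {1, 2, 5}

    assert_eq!(intersection.len(), 2);
    assert_eq!(union.len(), 5);
    assert_eq!(difference.len(), 2);
    assert_eq!(symmetric.len(), 3);
}
```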
std::hash::Hash
is the standard library trait for hashable types. HashMap
keys and HashSet
elements must implement both Hash
and Eq
.
One principle of the standard library is that a value should have the same hash code regardless of where you store it or how you point to it. Therefore, a reference has the same hash code as the value it refers to, and a Box
has the same hash code as the boxed value. A vector vec
has the same hash code as the slice containing all its data, &vec[..]
. A String
has the same hash code as a &str
with the same characters.
Rust's default hashing algorithm is a well-known algorithm called SipHash-1-3. SipHash is fast, and it's very good at minimizing hash collisions. In fact, it's a cryptographic algorithm: there's no known efficient way to generate SipHash-1-3 collisions.
Strings and Text
The Rust String
and str
types represent text using the UTF-8 encoding form. UTF-8 encodes a character as a sequence of one to four bytes.
Since UTF-8 encodes code points 0
through 0x7f
as nothing more than the bytes 0
through 0x7f
, a range of bytes holding ASCII text is valid UTF-8. And if a string of UTF-8 includes only characters from ASCII, the reverse is also true: the UTF-8 encoding is valid ASCII.
Unicode stores characters in the order in which they would normally be written or read, so the initial bytes of a string holding, say, Hebrew text encode the character that would be written at the right.
A Rust char
is a 32-bit value holding a Unicode code point. A char is guaranteed to fall in the range from 0
to 0xd7ff
or in the range 0xe000
to 0x10ffff
; all the methods for creating and manipulating char
values ensure that this is true.
Rust's String
and str
types are guaranteed to hold only well-formed UTF-8. The library ensures this by restricting the ways you can create String
and str
values and the operations you can perform on them, such that the values are well-formed when introduced and remain so as you work with them. All their methods protect this guarantee: no safe operation on them can introduce ill-formed UTF-8.
Rust places text-handling methods on either str
or String
depending on whether the method needs a resizable buffer or is content just to use the text in place. Since String
dereferences to &str
, every method defined on str
is directly available on String
as well.
A String
is implemented as a wrapper around a Vec<u8>
that ensures the vector's contents are always well-formed UTF-8.
When a standard library function needs to search, match, split, or trim text, it accepts several different types to represent what to look for. [...] These types are called patterns, and most operations support them. [...] The standard library supports four main kinds of patterns:
- A char as a pattern matches that character.
- A String or &str or &&str as a pattern matches a substring equal to the pattern.
- A FnMut(char) -> bool closure as a pattern matches a single character for which the closure returns true.
- A &[char] as a pattern (not a &str, but a slice of char values) matches any single character that appears in the list.
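A small sketch of my own applying each kind of pattern to the same line of text (find and trim_start_matches are standard str methods; the text itself is made up):

```rust
fn main() {
    let line = "##  fn main() -- entry point";

    // A char pattern:
    assert!(line.contains('#'));
    // A &str pattern:
    assert_eq!(line.find("fn"), Some(4));
    // A closure pattern (FnMut(char) -> bool):
    assert_eq!(line.find(char::is_alphabetic), Some(4));
    // A &[char] pattern: any character in the slice matches.
    let trimmed = line.trim_start_matches(['#', ' '].as_slice());
    assert!(trimmed.starts_with("fn"));
}
```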
The external regex
crate is Rust's official regular expression library.
Input and Output
Rust's standard library features for input and output are organized around three traits, Read
, BufRead
, and Write
:
- Values that implement Read have methods for byte-oriented input. They're called readers.
- Values that implement BufRead are buffered readers. They support all the methods of Read, plus methods for reading lines of text and so forth.
- Values that implement Write support both byte-oriented and UTF-8 text output. They're called writers.
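A minimal sketch of the BufRead line-oriented methods, using an in-memory byte slice to stand in for a file or socket:

```rust
use std::io::{BufRead, BufReader};

fn main() -> std::io::Result<()> {
    // Any reader gains line-oriented methods when wrapped in a BufReader.
    let data: &[u8] = b"first line\nsecond line\n";
    let reader = BufReader::new(data);
    for line in reader.lines() {
        println!("{}", line?); // each item is an io::Result<String>
    }
    Ok(())
}
```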
Rust strings are always valid Unicode. Filenames are almost always Unicode in practice, but Rust has to cope somehow with the rare case where they aren't. This is why Rust has std::ffi::OsStr
and OsString
.
OsStr
is a string type that's a superset of UTF-8. Its job is to be able to represent all filenames, command-line arguments, and environment variables on the current system, whether they're valid Unicode or not.
Path
is exactly like OsStr
, but it adds many handy filename-related methods [...]. Use Path
for both absolute and relative paths. For an individual component of a path, use OsStr
.
[...] for each string type, there's a corresponding owning type: a String
owns a heap-allocated str
, a std::ffi::OsString
owns a heap-allocated OsStr
, and a std::path::PathBuf
owns a heap-allocated Path
.
Concurrency
The simplest use cases for threads arise when we have several completely independent tasks that we'd like to do at once. For example, suppose we're doing natural language processing on a large corpus of documents. [...] Since each document is processed separately, it's relatively easy to speed this task up by splitting the corpus into chunks and processing each chunk on a separate thread [...]. This pattern is called fork-join parallelism. To fork is to start a new thread, and to join a thread is to wait for it to finish.
The function std::thread::spawn
starts a new thread. [...] It takes one argument, an FnOnce
closure or function. Rust starts a new thread to run the code of that closure or function. The new thread is a real operating system thread with its own stack [...].
Joining threads is often necessary for correctness, because a Rust program exits as soon as main
returns, even if other threads are still running. Destructors are not called; the extra threads are just killed. If this isn't what you want, be sure to join any threads you care about before returning from main
.
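A tiny fork-join sketch: spawn a thread per chunk of (made-up) work, then join each handle to collect the results.

```rust
use std::thread;

fn main() {
    // Fork: start one worker thread per chunk of work.
    let handles: Vec<_> = (0..4)
        .map(|chunk| thread::spawn(move || chunk * 10))
        .collect();

    // Join: wait for each thread and gather its result.
    let results: Vec<i32> = handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .collect();
    assert_eq!(results, vec![0, 10, 20, 30]);
}
```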
A channel is a one-way conduit for sending values from one thread to another. In other words, it's a thread-safe queue.
[Channels are] something like Unix pipes: one end is for sending data, and the other is for receiving. The two ends are typically owned by two different threads. But whereas Unix pipes are for sending bytes, channels are for sending Rust values. sender.send(item)
puts a single value into the channel; receiver.recv()
removes one. Ownership is transferred from the sending thread to the receiving thread. If the channel is empty, receiver.recv()
blocks until a value is sent.
The mpsc
part of std::sync::mpsc
stands for multiproducer, single-consumer, a terse description of the kind of communication Rust's channels provide.
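A small example of one sending thread and one receiving thread:

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    let (sender, receiver) = mpsc::channel();

    thread::spawn(move || {
        for i in 0..3 {
            sender.send(i).expect("receiver hung up");
        }
        // Dropping the sender closes the channel.
    });

    // recv (and the iterator built on it) blocks until a value arrives,
    // and iteration stops once every sender has been dropped.
    let received: Vec<i32> = receiver.iter().collect();
    assert_eq!(received, vec![0, 1, 2]);
}
```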
[...] Rust's full thread safety story hinges on two built-in traits, std::marker::Send
and std::marker::Sync
.
- Types that implement Send are safe to pass by value to another thread. They can be moved across threads.
- Types that implement Sync are safe to pass by non-mut reference to another thread. They can be shared across threads.
A mutex (or lock) is used to force multiple threads to take turns when accessing certain data.
Unlike C++, in Rust the protected data is stored inside the Mutex
.
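A minimal sketch of the usual pattern: the counter lives inside the Mutex, and an Arc lets several threads share it.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let counter = Arc::new(Mutex::new(0));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                // lock() returns a guard; the data is only reachable through it.
                let mut guard = counter.lock().unwrap();
                *guard += 1;
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    assert_eq!(*counter.lock().unwrap(), 4);
}
```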
Safe Rust code cannot trigger a data race, a specific kind of bug where multiple threads read and write the same memory concurrently, producing meaningless results.
Valid Rust programs can't have data races, but they can still have other race conditions – situations where a program's behavior depends on timing among threads and may therefore vary from run to run. Some race conditions are benign. Some manifest as general flakiness and incredibly hard-to-fix bugs. Using mutexes in an unstructured way invites race conditions. It's up to you to make sure they're benign.
Whereas a mutex has a single lock
method, a read/write lock has two locking methods, read
and write
. The RwLock::write
method is like Mutex::lock
. It waits for exclusive, mut
access to the protected data. The RwLock::read
method provides non-mut
access, with the advantage that it is less likely to have to wait, because many threads can safely read at once. With a mutex, at any given moment, the protected data has only one reader or writer (or none). With a read/write lock, it can have either one writer or many readers, much like Rust references generally.
Programs can use condition variables to build their own [blocking API]. In Rust, the std::sync::Condvar
type implements condition variables. A Condvar
has methods .wait()
and .notify_all()
; .wait()
blocks until some other thread calls .notify_all()
.
The std::sync::atomic
module contains atomic types for lock-free concurrent programming.
Instead of the usual arithmetic and logical operators, atomic types expose methods that perform atomic operations: individual loads, stores, exchanges, and arithmetic operations that happen safely, as a unit, even if other threads are also performing atomic operations that touch the same memory location.
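A short example using AtomicUsize: fetch_add is a single atomic read-modify-write, so no mutex is needed around the counter.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

fn main() {
    static HITS: AtomicUsize = AtomicUsize::new(0);

    let handles: Vec<_> = (0..8)
        .map(|_| {
            thread::spawn(|| {
                // One atomic increment; no lock required.
                HITS.fetch_add(1, Ordering::SeqCst);
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    assert_eq!(HITS.load(Ordering::SeqCst), 8);
}
```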
Asynchronous Programming
Rust's approach to supporting asynchronous operations is to introduce a trait, std::future::Future
. [...] A Future
represents an operation that you can test for completion. A future's poll
method never waits for the operation to finish: it always returns immediately. If the operation is complete, poll
returns Poll::Ready(output)
, where output
is its final result. Otherwise, it returns Pending
. If and when the future is worth polling again, it promises to let us know by invoking a waker, a callback function supplied in the Context
.
This is the general pattern: the asynchronous version of any function takes the same arguments as the synchronous version, but the return type has a Future
wrapped around it.
Unlike an ordinary function, when you call an asynchronous function, it returns immediately, before the body begins execution at all. Obviously, the call's final return value hasn't been computed yet; what you get is a future of its final value.
You don't need to adjust an asynchronous function's return type; Rust automatically treats async fn f(...) -> T
as a function that returns a future of a T
, not a T
directly.
The future's specific type is generated automatically by the compiler, based on the function's body and arguments. This type doesn't have a name; all you know about it is that it implements Future<Output=R>
, where R
is the async function's return type.
An await
expression takes ownership of the future and then polls it. If it's ready, then the future's final value is the value of the await
expression, and execution continues. Otherwise, it returns Poll::Pending
to its own caller.
The ability to suspend execution mid-function and then resume later is unique to async functions. When an ordinary function returns, its stack frame is gone for good. Since await
expressions depend on the ability to resume, you can only use them inside async functions.
In addition to asynchronous functions, Rust also supports asynchronous blocks. Whereas an ordinary block statement returns the value of its last expression, an async block returns a future of the value of its last expression. [...] An async block looks like an ordinary block statement, preceded by the async
keyword.
In Rust [...] an async call does nothing until you pass it to a function like block_on
, spawn
, or spawn_local
that will poll it and drive the work to completion. These functions, called executors, play the role that other languages cover with a global event loop.
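A minimal sketch, assuming the external futures crate is available for its simple block_on executor (tokio or async-std would serve the same role); the async functions themselves are made up.

```rust
use futures::executor::block_on;

async fn double(n: u32) -> u32 {
    n * 2 // calling double(21) only builds a future; nothing runs yet
}

async fn compute() -> u32 {
    // .await polls the inner future; here it completes immediately.
    double(21).await
}

fn main() {
    // The executor polls the future to completion.
    let answer = block_on(compute());
    assert_eq!(answer, 42);
}
```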
Macros
Macros are a kind of shorthand. During compilation, before types are checked and long before any machine code is generated, each macro call is expanded – that is, it's replaced with some Rust code.
Macro calls are always marked with an exclamation point, so they stand out when you're reading code, and they can't be called accidentally when you meant to call a function.
macro_rules!
is the main way to define macros in Rust.
A macro defined with macro_rules!
works entirely by pattern matching. The body of a macro is just a series of rules:
( pattern1 ) => ( template1 );
( pattern2 ) => ( template2 );
...
Macro patterns are a mini-language within Rust. They're essentially regular expressions for matching code. But where regular expressions operate on characters, patterns operate on tokens – the numbers, names, punctuation marks, and so forth that are the building blocks of Rust programs.
Macro templates aren't much different from any of a dozen template languages commonly used in web programming. The only difference [...] is that the output is Rust code.
[...] the syntax $( PATTERN ),*
is used to match any comma-separated list, where each item in the list matches PATTERN
. The *
here has the same meaning as in regular expressions ("0 or more") [...]. You can also use +
to require at least one match, or ?
for zero or one match.
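A small sketch of a declarative macro (a made-up string_vec!) using the $( ... ),* repetition in both its pattern and its template:

```rust
macro_rules! string_vec {
    // Match any comma-separated list of expressions...
    ( $( $s:expr ),* ) => {
        {
            let mut v = Vec::new();
            // ...and repeat this line once per matched expression.
            $( v.push(String::from($s)); )*
            v
        }
    };
}

fn main() {
    let v = string_vec!["a", "b", "c"];
    assert_eq!(v, vec!["a".to_string(), "b".to_string(), "c".to_string()]);
}
```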
The first job in writing any complex macro is figuring out how to match, or parse, the desired input.
Procedural macros support extending the #[derive]
attribute to handle custom derivations [...] as well as creating custom attributes and new macros that are invoked just like the macro_rules!
macros [...].
What makes a procedural macro "procedural" is that it's implemented as a Rust function, not a declarative rule set. This function interacts with the compiler through a thin layer of abstraction and can be arbitrarily complex.
Unsafe Code
Unsafe code lets you tell Rust, "I am opting to use features whose safety you cannot guarantee." By marking off a block or function as unsafe, you acquire the ability to call unsafe
functions in the standard library, dereference unsafe pointers, and call functions written in other languages like C and C++, among other powers. Rust's other safety checks still apply: type checks, lifetime checks, and bounds checks on indices all occur normally. Unsafe code just enables a small set of additional features.
An unsafe feature is one that imposes a contract: rules that Rust cannot enforce automatically, but which you must nonetheless follow to avoid undefined behavior. A contract goes beyond the usual type checks and lifetime checks, imposing further rules specific to that unsafe feature. Typically, Rust itself doesn't know about the contract at all; it's just explained in the feature's documentation.
When you use unsafe features, you, as the programmer, bear the responsibility for checking that your code adheres to their contracts.
[...] by forcing you to write an unsafe
block or function, Rust makes sure you have acknowledged that your code may have additional rules to follow.
An unsafe
block looks just like an ordinary Rust block preceded by the unsafe
keyword, with the difference that you can use unsafe features in the block.
An unsafe
function definition looks like an ordinary function definition preceded by the unsafe
keyword. The body of an unsafe
function is automatically considered an unsafe
block. You may call unsafe
functions only within unsafe
blocks. This means that marking a function unsafe
warns its callers that the function has a contract they must satisfy to avoid undefined behavior.
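A minimal sketch of such a contract, using a made-up read_at function: the contract lives in the documentation, and the caller acknowledges it with an unsafe block.

```rust
/// Returns the value `index` elements past `start`.
///
/// Contract: `start` must point into an allocation containing at least
/// `index + 1` valid, initialized i32 elements.
unsafe fn read_at(start: *const i32, index: usize) -> i32 {
    *start.add(index)
}

fn main() {
    let values = [10, 20, 30];
    // The caller takes responsibility for upholding the contract.
    let third = unsafe { read_at(values.as_ptr(), 2) };
    assert_eq!(third, 30);
}
```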
Essentially, Rust's type checker, borrow checker, and other static checks are inspecting your program and trying to construct proof that it cannot exhibit undefined behavior. When Rust compiles your program successfully, that means it succeeded in proving your code sound. An unsafe
block is a gap in this proof: "This code," you are saying to Rust, "is fine, trust me." Whether your claim is true could depend on any part of the program that influences what happens in the unsafe
block, and the consequences of being wrong could appear anywhere influenced by the unsafe
block. Writing the unsafe
keyword amounts to a reminder that you are not getting the full benefit of the language's safety checks.
An unsafe trait
is a trait that has a contract Rust cannot check or enforce that implementers must satisfy to avoid undefined behavior. To implement an unsafe trait, you must mark the implementation as unsafe. It is up to you to understand the trait's contract and make sure your type satisfies it.
A raw pointer in Rust is an unconstrained pointer. You can use raw pointers to form all sorts of structures that Rust's checked pointer types cannot, like doubly linked lists or arbitrary graphs of objects. But because raw pointers are so flexible, Rust cannot tell whether you are using them safely or not, so you can dereference them only in an unsafe
block.
There are two kinds of raw pointers:
- A *mut T is a raw pointer to a T that permits modifying its referent.
- A *const T is a raw pointer to a T that only permits reading its referent.
Rust lays out the elements of an array, slice, or vector as a single contiguous block of memory [...]. Elements are regularly spaced, so that if each element occupies size
bytes, then the i
th element starts at the i * size
th byte. One nice consequence of this is that if you have two raw pointers to elements of an array, comparing the pointers gives the same results as comparing the elements' indices: if i < j
, then a raw pointer to the i
th element is less than a raw pointer to the j
th element.
Rust provides many useful abstractions, but ultimately, the software you write is just pushing bytes around. Unions are one of Rust's most powerful features for manipulating those bytes and choosing how they are interpreted. For instance, any collection of 32 bits – 4 bytes – can be interpreted as an integer or as a floating-point number. Either interpretation is valid, though interpreting data meant for one as the other will likely result in nonsense.
Where the fields of a struct refer to different positions in memory, the fields of a union refer to different interpretations of the same sequence of bits. Assigning to a different field simply means overwriting some or all of those bits, in accordance with an appropriate type.
While constructing a union or assigning to its fields is completely safe, reading from any field of a union is always unsafe. [...] This is because, unlike enums, unions don't have a tag. The compiler adds no additional bits to tell variants apart.
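A tiny sketch with a made-up Bits32 union: writing the float field is safe, while reading the same bits back as an integer requires unsafe.

```rust
// The same four bytes, viewed either as an integer or as a float.
union Bits32 {
    int: u32,
    float: f32,
}

fn main() {
    let value = Bits32 { float: 1.0 };
    // Reading any field is unsafe: there is no tag saying which
    // interpretation is currently valid.
    let raw = unsafe { value.int };
    assert_eq!(raw, 0x3f80_0000); // IEEE 754 bit pattern of 1.0f32
}
```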
Foreign Functions
Rust's foreign function interface (FFI) lets Rust code call functions written in C, and in some cases C++. Since most operating systems offer C interfaces, Rust's foreign function interface allows immediate access to all sorts of low-level facilities.
The common denominator of Rust and C is machine language, so in order to anticipate what Rust values look like to C code, or vice versa, you need to consider their machine-level representations.
[...] Rust's std::os::raw
module defines a set of Rust types that are guaranteed to have the same representation as certain C types.
For defining Rust struct types compatible with C structs, you can use the #[repr(C)]
attribute. Placing #[repr(C)]
above a struct definition asks Rust to lay out the struct's fields in memory the same way a C compiler would lay out the analogous C struct type.
An extern
block declares functions or variables defined in some other library that the final Rust executable will be linked with.
To use functions provided by a particular library, you can place a #[link]
attribute atop the extern
block that names the library Rust should link the executable with.
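A minimal FFI sketch calling the C standard library's strlen; no #[link] attribute is needed here because the C runtime is already linked, but the extern block shape is the same for other libraries.

```rust
use std::os::raw::c_char;

extern "C" {
    // Declared in C as: size_t strlen(const char *s);
    fn strlen(s: *const c_char) -> usize;
}

fn main() {
    let msg = std::ffi::CString::new("hello").unwrap();
    // Calling a foreign function is unsafe: Rust cannot check C's contract.
    let len = unsafe { strlen(msg.as_ptr()) };
    assert_eq!(len, 5);
}
```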
You can tell Rust where to search for libraries by writing a build script, Rust code that Cargo compiles and runs at build time. [...] To create your build script, add a file named build.rs in the same directory as the Cargo.toml file [...].