2022-11-14
An enum is made up of zero or more variants and a discriminator.
enum Choice {
A, // <-|
B, // <-|- variants
C // <-|
}
The discriminator is an integer indicating which variant is held.
Given the following code: let a = Choice::A;
the discriminant would be zero.
Given the following code: let c = Choice::C;
the discriminant would be two.
An enum without associated data is a "fieldless" enum.
It is possible to assign a custom discriminant value to a fieldless enum:
enum Choice {
A = 10,
B = 20,
C,
}
assert_eq!(Choice::A as usize, 10);
assert_eq!(Choice::B as usize, 20);
assert_eq!(Choice::C as usize, 21);
If no specific value is given as a discriminant the value is that of the previous variant plus one.
Note that Choice::C
is the same as Choice::B + 1
.
An enum with no variants can never be created. There are few use cases for such an enum but they do exist.
One such use case is the error type on a Result<T, E>
that can never fail, but
the return type still has to be a Result<T, E>
.
There is currently an enum in the standard library that exists for this reason: Infallible.
The size of an enum is the size of its largest variant + the discriminant (unless the discriminant can be optimised away, more on this below).
Given the following enum, we can look at the size of each variant:
use std::mem::size_of; // returns the size of a type
enum Data {
SingleByte(u8), // size of u8
Bytes(Vec<u8>), // size of Vec<u8>
NoData // zero size
}
The size of Data::SingleByte(u8)
is one (as one byte is needed to store a u8
).
A Vec<T>
contains more
information:
usize
)usize
) of the vectorusize
)On a 64 bit architecture a usize
is a 64 bit integer, making it eight bytes in
size.
use std::mem::size_of;
let size = size_of::<usize>() // first the pointer
+ size_of::<usize>() // then the capacity
+ size_of::<usize>(); // then the length
// usize = 8 bytes on a 64 bit architecture
assert_eq(size, 24);
This makes Data::Bytes(byte_vec)
the largest variant, taking up 24 bytes (assuming 64 bit arch).
The values inside the vector is not taken into consideration when talking about the size of the vector here.
It might seem wasteful that [Data::NoData, Data::NoData]
takes up the same
amount of space as [Data::Bytes(bytes), Data::Bytes(more_bytes)]
, but it makes
sense from the perspective of a stack frame, or storing
multiple enums in a collection (like an array or a vector).
Given the following:
let data: Vec<Data> = get_data();
let some_payload = &data[4];
Without a fixed size it would be rather difficult to jump straight to the fifth entry in the vector, nor would it be possible to determine the size of a stack frame, something Rust needs to know at compile time.
The compiler can perform optimisations on an enum depending on the variants.
If the size of an enum is the size of the largest variant plus the discriminant,
and the discriminant is eight bytes by default, the following enum should have a
size of 32 bytes (the Vec<u8>
is 24, and the discriminant is eight).
enum Data {
Bytes(Vec<u8>),
NoData
}
However this enum has a size of 24 bytes.
To observe this it's possible to transmute the enum to bytes:
let data = Data::Bytes(vec![1, 2, 3]);
let bytes: [u8; mem::size_of::<Data>()] = unsafe { mem::transmute(data) };
println!("{:?}", bytes);
// Bytes:
[
// Pointer to the first value on the heap
160, 10, 43, 168, 200, 85, 0, 0,
// Capacity
1, 0, 0, 0, 0, 0, 0, 0,
// Size
1, 0, 0, 0, 0, 0, 0, 0
]
There is no discriminant in this data.
If the second variant was transmuted to bytes the output would look something like this:
[
0, 0, 0, 0, 0, 0, 0, 0,
47, 236, 38, 139, 176, 127, 0, 0,
16, 170, 58, 244, 208, 85, 0, 0
]
Note that the first eight bytes here are zero, a value that would be invalid for the vector. Thus the discriminant can be determined by the first bytes, rather than adding an additional eight bytes specifically for it.
A vector is composed of three values.
The first value is a pointer to the heap. This value can never be zero, as that is reserved for the null pointer, and since Rust doesn't allow null values it would be an invalid vector if the pointer component was 0x0.
The compiler can perform something called "niche filling".
A niche in this context means a bit pattern that isn't valid.
Take the following enum:
enum State {
Processing(u8, bool),
Done,
}
The first variant has two types: u8
and bool
.
This means the first byte can not be used to represent the discriminant as any
byte would be valid here.
However the second type is a boolean, and there are only two valid values for a boolean:
zero or one.
Therefore we can use two to represent the second variant: State::Done
.
// State::Processing(42, false)
[42, 0]
// State::Processing(42, true)
[42, 1]
// State::Done
[16, 2] // here 16 is an arbitrary value, only 2 is relevant
The layout of types in Rust is not considered stable as Rust has an unstable ABI (Application Binary Interface).
This means that the output of std::mem::transmute
might not be the same between
different versions of the compiler.
The advantage of this is that new optimisations can be added to the compiler without breaking existing code.
The disadvantage is that it's not safe to transmute a series of bytes into a type as the byte order, or even the size of the type might change from one compiler to another.
Thanks to twitch.tv/museun for reading and correcting my mistakes