1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
//! An explanation of the crate's naming conventions.
//!
//! This crate attempts to follow the general naming scheme of `verb_type` when
//! the operation is "simple", and `verb_description_words_type` when the
//! operation (op) needs to be more specific than normal. Like this:
//! * `add_m128`
//! * `add_saturating_i8_m128i`
//!
//! ## Types
//! Currently, only `x86` and `x86_64` types are supported. Among those types:
//! * `m128` and `m256` are always considered to hold `f32` lanes.
//! * `m128d` and `m256d` are always considered to hold `f64` lanes.
//! * `m128i` and `m256i` hold integer data, but each op specifies what lane
//!   width of integers the operation uses.
//! * If the type has `_s` on the end then it's a "scalar" operation that
//!   affects just the lowest lane. The other lanes are generally copied forward
//!   from one of the inputs, though the details there vary from op to op.
//! * The SIMD types are often referred to as "registers" because each SIMD
//!   typed value represents exactly one CPU register when you're doing work.
//!
//! ## Operations
//! There's many operations that can be performed. When possible, `safe_arch`
//! tries to follow normal Rust naming (eg: adding is still `add` and left
//! shifting is still `shl`), but if an operation doesn't normally exist at all
//! in Rust then we basically have to make something up.
//!
//! Many operations have more than one variant, such as `add` and also
//! `add_saturating`. In this case, `safe_arch` puts the "core operation" first
//! and then any "modifiers" go after, which isn't how you might normally say it
//! in English, but it makes the list of functions sort better.
//!
//! As a general note on SIMD terminology: When an operation uses the same
//! indexed lane in two _different_ registers to determine the output, that is a
//! "vertical" operation. When an operation uses more than one lane in the
//! _same_ register to determine the output, that is a "horizontal" operation.
//! * Vertical: `out[0] = a[0] + b[0]`, `out[1] = a[1] + b[1]`
//! * Horizontal: `out[0] = a[0] + a[1]`, `out[1] = b[0] + b[1]`
//!
//! ## Operation Glossary
//! Here follows the list of all the main operations and their explanations.
//!
//! * `abs`: Absolute value (wrapping).
//! * `add`: Addition. This is "wrapping" by default, though some other types of
//!   addition are available. Remember that wrapping signed addition is the same
//!   as wrapping unsigned addition.
//! * `average`: Averages the two inputs.
//! * `bitand`: Bitwise And, `a & b`, like [the trait](core::ops::BitAnd).
//! * `bitandnot`: Bitwise `(!a) & b`. This seems a little funny at first but
//!   it's useful for clearing bits. The output will be based on the `b` side's
//!   bit pattern, but with all active bits in `a` cleared:
//!   * `bitandnot(0b0010, 0b1011) == 0b1001`
//! * `bitor`: Bitwise Or, `a | b`, like [the trait](core::ops::BitOr).
//! * `bitxor`: Bitwise eXclusive Or, `a ^ b`, like [the
//!   trait](core::ops::BitXor).
//! * `blend`: Merge the data lanes of two SIMD values by taking either the `b`
//!   value or `a` value for each lane. Depending on the instruction, the blend
//!   mask can be either an immediate or a runtime value.
//! * `cast`: Convert between data types while preserving the exact bit
//!   patterns, like how [`transmute`](core::mem::transmute) works.
//! * `ceil`: "Ceiling", rounds towards positive infinity.
//! * `cmp`: Numeric comparisons of various kinds. This generally gives "mask"
//!   output where the output value is of the same data type as the inputs, but
//!   with all the bits in a "true" lane as 1 and all the bits in a "false" lane
//!   as 0. Remember that with floating point values all 1s bits is a NaN, and
//!   with signed integers all 1s bits is -1.
//!   * An "Ordered comparison" checks if _neither_ floating point value is NaN.
//!   * An "Unordered comparison" checks if _either_ floating point value is
//!     NaN.
//! * `convert`: This does some sort of numeric type change. The details can
//!   vary wildly. Generally, if the number of lanes goes down then the lowest
//!   lanes will be kept. If the number of lanes goes up then the new high lanes
//!   will be zero.
//! * `div`: Division.
//! * `dot_product`: This works like the matrix math operation. The lanes are
//!   multiplied and then the results are summed up into a single value.
//! * `duplicate`: Copy the even or odd indexed lanes to the other set of lanes.
//!   Eg, `[1, 2, 3, 4]` becomes `[1, 1, 3, 3]` or `[2, 2, 4, 4]`.
//! * `extract`: Get a value from the lane of a SIMD type into a scalar type.
//! * `floor`: Rounds towards negative infinity.
//! * `fused`: All the fused operations are a multiply as well as some sort of
//!   adding or subtracting. The details depend on which fused operation you
//!   select. The benefit of this operation over a non-fused operation are that
//!   it can compute slightly faster than doing the mul and add separately, and
//!   also the output can have higher accuracy in the result.
//! * `insert`: The opposite of `extract`, this puts a new value into a
//!   particular lane of a SIMD type.
//! * `load`: Reads an address and makes a SIMD register value. The details can
//!   vary because there's more than one type of `load`, but generally this is a
//!   `&T -> U` style operation.
//! * `max`: Picks the larger value from each of the two inputs.
//! * `min`: Picks the smaller value from each of the two inputs.
//! * `mul`: Multiplication. For floating point this is just "normal"
//!   multiplication, but for integer types you tend to have some options. An
//!   integer multiplication of X bits will produce a 2X bit output, so
//!   generally you'll get to pick if you want to keep the high half of that,
//!   the low half of that (a normal "wrapping" mul), or "widen" the outputs to
//!   be all the bits at the expense of not multiplying half the lanes the
//!   lanes.
//! * `pack`: Take the integers in the `a` and `b` inputs, reduce them to fit
//!   within the half-sized integer type (eg: `i16` to `i8`), and pack them all
//!   together into the output.
//! * `population`: The "population" operations refer to the bits within an
//!   integer. Either counting them or adjusting them in various ways.
//! * `rdrand`: Use the hardware RNG to make a random value of the given length.
//! * `rdseed`: Use the hardware RNG to make a random seed of the given length.
//!   This is less commonly available, but theoretically an improvement over
//!   `rdrand` in that if you have to combine more than one usage of this
//!   operation to make your full seed size then the guess difficulty rises at a
//!   multiplicative rate instead of just an additive rate. For example, two
//!   `u64` outputs concatenated to a single `u128` have a guess difficulty of
//!   2^(64*64) with `rdseed` but only 2^(64+64) with `rdrand`.
//! * `read_timestamp_counter`: Lets you read the CPU's cycle counter, which
//!   doesn't strictly mean anything in particular since even the CPU's clock
//!   rate isn't even stable over time, but you might find it interesting as an
//!   approximation during benchmarks, or something like that.
//! * `reciprocal`: Turns `x` into `1/x`. Can also be combined with a `sqrt`
//!   operation.
//! * `round`: Convert floating point values to whole numbers, according to one
//!   of several available methods.
//! * `set`: Places a list of scalar values into a SIMD lane. Conceptually
//!   similar to how building an array works in Rust.
//! * `splat`: Not generally an operation of its own, but a modifier to other
//!   operations such as `load` and `set`. This will copy a given value across a
//!   SIMD type as many times as it can be copied. For example, a 32-bit value
//!   splatted into a 128-bit register will be copied four times.
//! * `shl`: Bit shift left. New bits shifted in are always 0. Because the shift
//!   is the same for both signed and unsigned values, this crate simply marks
//!   left shift as always being an unsigned operation.
//!   * You can shift by an immediate value ("imm"), all lanes by the same value
//!     ("all"), or each lane by its own value ("each").
//! * `shr`: Bit shift right. This comes in two forms: "Arithmetic" shifts shift
//!   in the starting sign bit (which preserves the sign of the value), and
//!   "Logical" shifts shift in 0 regardless of the starting sign bit (so the
//!   result ends up being positive). With normal Rust types, signed integers
//!   use arithmetic shifts and unsigned integers use logical shifts, so these
//!   functions are marked as being for signed or unsigned integers
//!   appropriately.
//!   * As with `shl`, you can shift by an immediate value ("imm"), all lanes by
//!     the same value ("all"), or each lane by its own value ("each").
//! * `sign_apply`: Multiplies one set of values by the signum (1, 0, or -1) of
//!   another set of values.
//! * `sqrt`: Square Root.
//! * `store`: Writes a SIMD value to a memory location.
//! * `string_search`: A rather specialized instruction that lets you do byte
//!   based searching within a register. This lets you do some very high speed
//!   searching through ASCII strings when the stars align.
//! * `sub`: Subtract.
//! * `shuffle`: This lets you re-order the data lanes. Sometimes x86/x64 calls
//!   this is called "shuffle", and sometimes it's called "permute", and there's
//!   no particular reasoning behind the different names, so we just call them
//!   all shuffle.
//!   * `shuffle_{args}_{lane-type}_{lane-sources}_{simd-type}`.
//!   * "args" is the input arguments: `a` (one arg) or `ab` (two args), then
//!     either `v` (runtime-varying) or `i` (immediate). All the immediate
//!     shuffles are macros, of course.
//!   * "lane type" is `f32`, `f64`, `i8`, etc. If there's a `z` after the type
//!     then you'll also be able to zero an output position instead of making it
//!     come from a particular source lane.
//!   * "lane sources" is generally either "all" which means that all lanes can
//!     go to all other lanes, or "half" which means that each half of the lanes
//!     is isolated from the other half, and you can't cross data between the
//!     two halves, only within a half (this is how most of the 256-bit x86/x64
//!     shuffles work).
//! * `unpack`: Takes a SIMD value and gets out some of the lanes while widening
//!   them, such as converting `i16` to `i32`.