1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
#![no_std]
#![warn(missing_docs)]
#![allow(unused_imports)]
#![allow(clippy::too_many_arguments)]
#![allow(clippy::transmute_ptr_to_ptr)]
#![cfg_attr(docs_rs, feature(doc_cfg))]

//! A crate that safely exposes arch intrinsics via `#[cfg()]`.
//!
//! `safe_arch` lets you safely use CPU intrinsics. Those things in the
//! [`core::arch`](core::arch) modules. It works purely via `#[cfg()]` and
//! compile time CPU feature declaration. If you want to check for a feature at
//! runtime and then call an intrinsic or use a fallback path based on that then
//! this crate is sadly not for you.
//!
//! SIMD register types are "newtype'd" so that better trait impls can be given
//! to them, but the inner value is a `pub` field so feel free to just grab it
//! out if you need to. Trait impls of the newtypes include: `Default` (zeroed),
//! `From`/`Into` of appropriate data types, and appropriate operator
//! overloading.
//!
//! * Most intrinsics (like addition and multiplication) are totally safe to use
//!   as long as the CPU feature is available. In this case, what you get is 1:1
//!   with the actual intrinsic.
//! * Some intrinsics take a pointer of an assumed minimum alignment and
//!   validity span. For these, the `safe_arch` function takes a reference of an
//!   appropriate type to uphold safety.
//!   * Try the [bytemuck](https://docs.rs/bytemuck) crate (and turn on the
//!     `bytemuck` feature of this crate) if you want help safely casting
//!     between reference types.
//! * Some intrinsics are not safe unless you're _very_ careful about how you
//!   use them, such as the streaming operations requiring you to use them in
//!   combination with an appropriate memory fence. Those operations aren't
//!   exposed here.
//! * Some intrinsics mess with the processor state, such as changing the
//!   floating point flags, saving and loading special register state, and so
//!   on. LLVM doesn't really support you messing with that within a high level
//!   language, so those operations aren't exposed here. Use assembly or
//!   something if you want to do that.
//!
//! ## Naming Conventions
//! The `safe_arch` crate does not simply use the "official" names for each
//! intrinsic, because the official names are generally poor. Instead, the
//! operations have been given better names that makes things hopefully easier
//! to understand then you're reading the code.
//!
//! For a full explanation of the naming used, see the [Naming
//! Conventions](crate::naming_conventions) page.
//!
//! ## Current Support
//! * `x86` / `x86_64` (Intel, AMD, etc)
//!   * 128-bit: `sse`, `sse2`, `sse3`, `ssse3`, `sse4.1`, `sse4.2`
//!   * 256-bit: `avx`, `avx2`
//!   * Other: `adx`, `aes`, `bmi1`, `bmi2`, `fma`, `lzcnt`, `pclmulqdq`,
//!     `popcnt`, `rdrand`, `rdseed`
//!
//! ## Compile Time CPU Target Features
//!
//! At the time of me writing this, Rust enables the `sse` and `sse2` CPU
//! features by default for all `i686` (x86) and `x86_64` builds. Those CPU
//! features are built into the design of `x86_64`, and you'd need a _super_ old
//! `x86` CPU for it to not support at least `sse` and `sse2`, so they're a safe
//! bet for the language to enable all the time. In fact, because the standard
//! library is compiled with them enabled, simply trying to _disable_ those
//! features would actually cause ABI issues and fill your program with UB
//! ([link][rustc_docs]).
//!
//! If you want additional CPU features available at compile time you'll have to
//! enable them with an additional arg to `rustc`. For a feature named `name`
//! you pass `-C target-feature=+name`, such as `-C target-feature=+sse3` for
//! `sse3`.
//!
//! You can alternately enable _all_ target features of the current CPU with `-C
//! target-cpu=native`. This is primarily of use if you're building a program
//! you'll only run on your own system.
//!
//! It's sometimes hard to know if your target platform will support a given
//! feature set, but the [Steam Hardware Survey][steam-survey] is generally
//! taken as a guide to what you can expect people to have available. If you
//! click "Other Settings" it'll expand into a list of CPU target features and
//! how common they are. These days, it seems that `sse3` can be safely assumed,
//! and `ssse3`, `sse4.1`, and `sse4.2` are pretty safe bets as well. The stuff
//! above 128-bit isn't as common yet, give it another few years.
//!
//! **Please note that executing a program on a CPU that doesn't support the
//! target features it was compiles for is Undefined Behavior.**
//!
//! Currently, Rust doesn't actually support an easy way for you to check that a
//! feature enabled at compile time is _actually_ available at runtime. There is
//! the "[feature_detected][feature_detected]" family of macros, but if you
//! enable a feature they will evaluate to a constant `true` instead of actually
//! deferring the check for the feature to runtime. This means that, if you
//! _did_ want a check at the start of your program, to confirm that all the
//! assumed features are present and error out when the assumptions don't hold,
//! you can't use that macro. You gotta use CPUID and check manually. rip.
//! Hopefully we can make that process easier in a future version of this crate.
//!
//! [steam-survey]:
//! https://store.steampowered.com/hwsurvey/Steam-Hardware-Software-Survey-Welcome-to-Steam
//! [feature_detected]:
//! https://doc.rust-lang.org/std/index.html?search=feature_detected
//! [rustc_docs]: https://doc.rust-lang.org/rustc/targets/known-issues.html
//!
//! ### A Note On Working With Cfg
//!
//! There's two main ways to use `cfg`:
//! * Via an attribute placed on an item, block, or expression:
//!   * `#[cfg(debug_assertions)] println!("hello");`
//! * Via a macro used within an expression position:
//!   * `if cfg!(debug_assertions) { println!("hello"); }`
//!
//! The difference might seem small but it's actually very important:
//! * The attribute form will include code or not _before_ deciding if all the
//!   items named and so forth really exist or not. This means that code that is
//!   configured via attribute can safely name things that don't always exist as
//!   long as the things they name do exist whenever that code is configured
//!   into the build.
//! * The macro form will include the configured code _no matter what_, and then
//!   the macro resolves to a constant `true` or `false` and the compiler uses
//!   dead code elimination to cut out the path not taken.
//!
//! This crate uses `cfg` via the attribute, so the functions it exposes don't
//! exist at all when the appropriate CPU target features aren't enabled.
//! Accordingly, if you plan to call this crate or not depending on what
//! features are enabled in the build you'll also need to control your use of
//! this crate via cfg attribute, not cfg macro.

use core::{
  convert::AsRef,
  fmt::{Binary, Debug, Display, LowerExp, LowerHex, Octal, UpperExp, UpperHex},
  ops::{Add, AddAssign, BitAnd, BitAndAssign, BitOr, BitOrAssign, BitXor, BitXorAssign, Div, DivAssign, Mul, MulAssign, Neg, Not, Sub, SubAssign},
};

pub mod naming_conventions;

/// Declares a private mod and then a glob `use` with the visibility specified.
macro_rules! submodule {
  ($v:vis $name:ident) => {
    mod $name;
    $v use $name::*;
  };
  ($v:vis $name:ident { $($content:tt)* }) => {
    mod $name { $($content)* }
    $v use $name::*;
  };
}

// Note(Lokathor): Stupid as it sounds, we need to put the imports here at the
// crate root because the arch-specific macros that we define in our inner
// modules are actually "scoped" to also be at the crate root. We want the
// rustdoc generation of the macros to "see" these imports so that the docs link
// over to the `core::arch` module correctly.
// https://github.com/rust-lang/rust/issues/72243

#[cfg(target_arch = "x86")]
use core::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use core::arch::x86_64::*;

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
submodule!(pub x86_x64 {
  //! Types and functions for safe `x86` / `x86_64` intrinsic usage.
  //!
  //! `x86_64` is essentially a superset of `x86`, so we just lump it all into
  //! one module. Anything not available on `x86` simply won't be in the build
  //! on that arch.
  use super::*;

  submodule!(pub m128_);
  submodule!(pub m128d_);
  submodule!(pub m128i_);

  submodule!(pub m256_);
  submodule!(pub m256d_);
  submodule!(pub m256i_);

  // Note(Lokathor): We only include these sub-modules with the actual functions
  // if the feature is enabled. Ae *also* have a cfg attribute on the inside of
  // the modules as a "double-verification" of sorts. Technically either way on
  // its own would also be fine.

  // These CPU features follow a fairly clear and strict progression that's easy
  // to remember. Most of them offer a fair pile of new functions.
  #[cfg(target_feature = "sse")]
  submodule!(pub sse);
  #[cfg(target_feature = "sse2")]
  submodule!(pub sse2);
  #[cfg(target_feature = "sse3")]
  submodule!(pub sse3);
  #[cfg(target_feature = "ssse3")]
  submodule!(pub ssse3);
  #[cfg(target_feature = "sse4.1")]
  submodule!(pub sse4_1);
  #[cfg(target_feature = "sse4.2")]
  submodule!(pub sse4_2);
  #[cfg(target_feature = "avx")]
  submodule!(pub avx);
  #[cfg(target_feature = "avx2")]
  submodule!(pub avx2);

  // These features aren't as easy to remember the progression of and they each
  // only add a small handful of functions.
  #[cfg(target_feature = "adx")]
  submodule!(pub adx);
  #[cfg(target_feature = "aes")]
  submodule!(pub aes);
  #[cfg(target_feature = "bmi1")]
  submodule!(pub bmi1);
  #[cfg(target_feature = "bmi2")]
  submodule!(pub bmi2);
  #[cfg(target_feature = "fma")]
  submodule!(pub fma);
  #[cfg(target_feature = "lzcnt")]
  submodule!(pub lzcnt);
  #[cfg(target_feature = "pclmulqdq")]
  submodule!(pub pclmulqdq);
  #[cfg(target_feature = "popcnt")]
  submodule!(pub popcnt);
  #[cfg(target_feature = "rdrand")]
  submodule!(pub rdrand);
  #[cfg(target_feature = "rdseed")]
  submodule!(pub rdseed);

  /// Reads the CPU's timestamp counter value.
  ///
  /// This is a monotonically increasing time-stamp that goes up every clock
  /// cycle of the CPU. However, since modern CPUs are variable clock rate
  /// depending on demand this can't actually be used for telling the time. It
  /// also does _not_ fully serialize all operations, so previous instructions
  /// might still be in progress when this reads the timestamp.
  ///
  /// * **Intrinsic:** `_rdtsc`
  /// * **Assembly:** `rdtsc`
  pub fn read_timestamp_counter() -> u64 {
    // Note(Lokathor): This was changed from i64 to u64 at some point, but
    // everyone ever was already casting this value to `u64` so crater didn't
    // even consider it a problem. We will follow suit.
    #[allow(clippy::unnecessary_cast)]
    unsafe { _rdtsc() as u64 }
  }

  /// Reads the CPU's timestamp counter value and store the processor signature.
  ///
  /// This works similar to [`read_timestamp_counter`] with two main
  /// differences:
  /// * It and also stores the `IA32_TSC_AUX MSR` value to the reference given.
  /// * It waits on all previous instructions to finish before reading the
  ///   timestamp (though it doesn't prevent other instructions from starting).
  ///
  /// As with `read_timestamp_counter`, you can't actually use this to tell the
  /// time.
  ///
  /// * **Intrinsic:** `__rdtscp`
  /// * **Assembly:** `rdtscp`
  pub fn read_timestamp_counter_p(aux: &mut u32) -> u64 {
    unsafe { __rdtscp(aux) }
  }

  /// Swap the bytes of the given 32-bit value.
  ///
  /// ```
  /// # use safe_arch::*;
  /// assert_eq!(byte_swap_i32(0x0A123456), 0x5634120A);
  /// ```
  /// * **Intrinsic:** `_bswap`
  /// * **Assembly:** `bswap r32`
  pub fn byte_swap_i32(i: i32) -> i32 {
    unsafe { _bswap(i) }
  }

  /// Swap the bytes of the given 64-bit value.
  ///
  /// ```
  /// # use safe_arch::*;
  /// assert_eq!(byte_swap_i64(0x0A123456_789ABC01), 0x01BC9A78_5634120A);
  /// ```
  /// * **Intrinsic:** `_bswap64`
  /// * **Assembly:** `bswap r64`
  #[cfg(target_arch="x86_64")]
  pub fn byte_swap_i64(i: i64) -> i64 {
    unsafe { _bswap64(i) }
  }

  /// Turns a round operator token to the correct constant value.
  #[macro_export]
  #[cfg_attr(docs_rs, doc(cfg(target_feature = "avx")))]
  macro_rules! round_op {
    (Nearest) => {{
      #[cfg(target_arch = "x86")]
      use ::core::arch::x86::{
        _MM_FROUND_NO_EXC, _MM_FROUND_TO_NEAREST_INT,
      };
      #[cfg(target_arch = "x86_64")]
      use ::core::arch::x86_64::{
        _MM_FROUND_NO_EXC, _MM_FROUND_TO_NEAREST_INT,
      };
      _MM_FROUND_NO_EXC | _MM_FROUND_TO_NEAREST_INT
    }};
    (NegInf) => {{
      #[cfg(target_arch = "x86")]
      use ::core::arch::x86::{
        _MM_FROUND_NO_EXC, _MM_FROUND_TO_NEG_INF,
      };
      #[cfg(target_arch = "x86_64")]
      use ::core::arch::x86_64::{
        _MM_FROUND_NO_EXC, _MM_FROUND_TO_NEG_INF,
      };
      _MM_FROUND_NO_EXC | _MM_FROUND_TO_NEG_INF
    }};
    (PosInf) => {{
      #[cfg(target_arch = "x86")]
      use ::core::arch::x86::{
        _MM_FROUND_NO_EXC, _MM_FROUND_TO_POS_INF,
      };
      #[cfg(target_arch = "x86_64")]
      use ::core::arch::x86_64::{
        _MM_FROUND_NO_EXC, _MM_FROUND_TO_POS_INF,
      };
      _MM_FROUND_NO_EXC | _MM_FROUND_TO_POS_INF
    }};
    (Zero) => {{
      #[cfg(target_arch = "x86")]
      use ::core::arch::x86::{
        _mm256_round_pd, _MM_FROUND_NO_EXC, _MM_FROUND_TO_ZERO,
      };
      #[cfg(target_arch = "x86_64")]
      use ::core::arch::x86_64::{
        _mm256_round_pd, _MM_FROUND_NO_EXC, _MM_FROUND_TO_ZERO,
      };
      _MM_FROUND_NO_EXC | _MM_FROUND_TO_ZERO
    }};
  }
});