Table of Contents
Numeric / Math / Linear Algebra
Numerically Robust Variance and Weighted Sum
- John D. Cook's Accurately computing running variance
C/C++ Libraries
- C: Intel(R) Math Kernel Library (MKL)
- C: Intel(R) Integrated Performance Primitives (IPP)
- C: Vector Optimized Library of Kernels (libVOLK)
- currently GPLv3 license
- future version 3 will be LGPL, see https://www.libvolk.org/help-us-relicense-volk-under-lgpl.html
- C++ Eigen
- pre-allocated arrays can be mapped to Eigen's Matrix/Vector types
- see https://eigen.tuxfamily.org/dox/group__matrixtypedefs.html for the native types
- matrix self transpose/adjoint multiply optimization, e.g. for least squares
- Boost Interval Arithmetic
- C++ Armadillo
- Fastor
- libxsmm
- NumPy
- NumPy is Python, but there are promising C++ libraries:
- xtensor: https://github.com/xtensor-stack/xtensor
- optionally utilizes https://github.com/xtensor-stack/xsimd / https://xsimd.readthedocs.io/en/latest/
IEEE-754 Float Numbers
- existence of denormalized numbers (aka subnormal numbers) should be known
- existence of signed zeros might be interesting
- existence of the following non-numbers (Not-a-Number: NaN) should be known
- quiet NaN (qNaN)
- signaling NaN (sNaN)
- positive and negative infinity (+/- inf)
- rounding modes can be controlled
- nearest, towards 0, towards + or -inf
- be aware that repeatedly rotating a 2D point or complex coordinate in a loop lets its magnitude drift (typically towards zero) due to accumulated rounding error; note that the default rounding mode is round-to-nearest, not round-towards-zero
- exceptions can be handled by setting up floating-point traps / a signal handler
Performance Issues with Denormals and NANs
Calculations with denormals or non-numbers slow down performance - even when no exception is signalled.
It might be interesting to abort a calculation, e.g. a matrix/vector multiplication, at the first occurrence of a NaN - or of one of the other conditions. Unfortunately, SIMD instruction sets have issues with exception trapping and NaN propagation. See Agner Fog's article at https://www.agner.org/optimize/#nan_propagation
Special options like DAZ (Denormals-Are-Zero) and FTZ (Flush-To-Zero) can be used if the application does not care about very small denormalized numbers, see https://en.wikipedia.org/wiki/Subnormal_number
The IPP library also provides helper functions for this.
Fast cache-efficient matrix transposition
The topic is explained at Wikipedia - with a specialized article for the in-place operation. Matrix transposition can also be utilized in the field of image processing. (De-)Interleaving is a different wording for the same operation, e.g. for multi-channel audio data. See https://stackoverflow.com/questions/7780279/de-interleave-an-array-in-place
Transposing a matrix the simple way produces many cache misses. That is why special algorithms, such as cache-oblivious ones, are beneficial. There are numerous scientific papers on this topic, e.g. Cache-efficient matrix transposition. The problem is also discussed on stackoverflow - fortunately with some code snippets.
In general, one should consider the following aspects:
- rectangular or square matrix
- transpose only from/to rectangular regions of bigger matrices or images
- in-place or out-of-place operation
- combination with conjugation
Here are some libraries providing transpose functions that should be quite efficient:
-
- unfortunately no smaller data types than float
- Eigen: transpose()
- returns a newly created/transposed matrix
- pre-allocated arrays can be mapped to Eigen's Matrix/Vector types
- see https://eigen.tuxfamily.org/dox/group__matrixtypedefs.html for the native types
- Armadillo: trans() - also with slower in-place variant
- returns a newly created/transposed matrix
- advanced constructor with copy_aux_mem = false for using external data directly
github.com also produces many results when searching for “transpose”.
Pavel Zemtsov wrote a bunch of related articles at Experiments in program optimisation, backed with sources at https://github.com/pzemtsov/article-e1-cache and https://github.com/pzemtsov/article-E1-demux-C.
There is also a newer library: https://github.com/hayguen/libtranspose
Links
- Numerical Computation Guide
- Several Blog Entries of Bruce Dawson
- C99 and C++11 Floating-point environment reference