Table of Contents

Numeric / Math / Linear Algebra

Numerically Robust Variance and weighted sum

C/C++ Libraries

IEEE-754 Float Numbers

Performance Issues with Denormals and NANs

Calculations involving denormals or non-numbers (NaNs) can slow down performance, even when no exception is signalled.
It can be worthwhile to abort a calculation, e.g. a matrix/vector multiplication, at the first occurrence of a NaN or of one of the other exceptional conditions. Unfortunately, SIMD instruction sets have issues with exception trapping and NaN propagation. See Agner Fog's article at https://www.agner.org/optimize/#nan_propagation
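
A minimal sketch of such an early abort, in plain scalar C. The function name `matvec_abort_on_nan` and its return convention are hypothetical and chosen only for illustration; the check runs once per output row, so the inner loop stays free of branches. Note that compiling with `-ffast-math` may let the compiler remove the `isnan` test.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: y = A * x for a row-major rows x cols matrix.
 * Returns -1 on success, or the index of the first row whose result
 * is NaN. Rows after that index are left uncomputed. */
long matvec_abort_on_nan(const float *a, const float *x, float *y,
                         size_t rows, size_t cols)
{
    for (size_t r = 0; r < rows; ++r) {
        float acc = 0.0f;
        for (size_t c = 0; c < cols; ++c)
            acc += a[r * cols + c] * x[c];
        y[r] = acc;
        if (isnan(acc))   /* one check per row, not per element */
            return (long)r;
    }
    return -1;
}
```

Checking per row relies on NaN propagating through the accumulation, which holds for scalar IEEE-754 addition and multiplication; some SIMD min/max instructions do not propagate NaN, which is the issue the linked article discusses.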

Special options such as DAZ (Denormals-Are-Zero) and FTZ (Flush-To-Zero) can be used if the application does not care about very small denormalized numbers; see https://en.wikipedia.org/wiki/Subnormal_number
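
On x86 with SSE, both modes are bits in the MXCSR register and can be set with the macros from `xmmintrin.h` and `pmmintrin.h`. The following is a sketch under assumptions: the wrapper name `enable_ftz_daz` and the `HAVE_FTZ_DAZ` macro are hypothetical, the platform check is simplified, and very old CPUs without DAZ support are not handled. On other architectures the function does nothing and returns 0.

```c
#include <float.h>

#if defined(__SSE3__) || defined(__x86_64__) || defined(_M_X64)
#include <xmmintrin.h>   /* _MM_SET_FLUSH_ZERO_MODE */
#include <pmmintrin.h>   /* _MM_SET_DENORMALS_ZERO_MODE */
#define HAVE_FTZ_DAZ 1   /* hypothetical feature macro for this sketch */
#else
#define HAVE_FTZ_DAZ 0
#endif

/* Hypothetical wrapper: enables FTZ and DAZ for the calling thread.
 * Returns 1 if the modes were set, 0 if this build has no support. */
int enable_ftz_daz(void)
{
#if HAVE_FTZ_DAZ
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);         /* tiny results become 0 */
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON); /* tiny inputs read as 0 */
    return 1;
#else
    return 0;
#endif
}
```

MXCSR is per-thread state, so the call has to be repeated in every worker thread that should run with these modes.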

The IPP library also provides helper functions:

Fast cache-efficient matrix transposition

The topic is explained on Wikipedia, with a dedicated article for the in-place operation. Matrix transposition is also used in image processing. (De-)interleaving, e.g. of multi-channel audio data, is a different name for the same operation. See https://stackoverflow.com/questions/7780279/de-interleave-an-array-in-place
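
To illustrate why de-interleaving is a transposition, here is a small out-of-place sketch. The function name `deinterleave` and the sample layout are assumptions for illustration; the linked question deals with the harder in-place variant, which this sketch does not attempt.

```c
#include <stddef.h>

/* Out-of-place de-interleave. `in` holds frames x channels samples
 * (L R L R ...), `out` receives channels x frames (L L ... R R ...).
 * This is exactly a transpose of a row-major frames x channels matrix. */
void deinterleave(const float *in, float *out, size_t frames, size_t channels)
{
    for (size_t f = 0; f < frames; ++f)
        for (size_t ch = 0; ch < channels; ++ch)
            out[ch * frames + f] = in[f * channels + ch];
}
```

With three stereo frames `{1, 10, 2, 20, 3, 30}` the output is `{1, 2, 3, 10, 20, 30}`: the left channel followed by the right channel.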

Transposing a matrix the simple way produces many cache misses. That is why special algorithms, such as cache-oblivious ones, are beneficial. There are numerous scientific papers on this topic, e.g. Cache-efficient matrix transposition. The problem is also discussed on Stack Overflow, fortunately with some code snippets.
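
As a rough illustration of the idea, the sketch below transposes tile by tile, so that both the source rows and the destination rows touched by one tile stay in cache. This is a plain cache-aware (blocked) variant, not a cache-oblivious algorithm and not taken from any of the referenced sources; the name `transpose_blocked` and the tile size of 16 are assumptions that would need tuning for a given machine.

```c
#include <stddef.h>

#define TILE 16  /* assumed tile edge; tune so two tiles fit into L1 cache */

/* Out-of-place transpose of a row-major rows x cols matrix `src`
 * into the row-major cols x rows matrix `dst`, one tile at a time. */
void transpose_blocked(const float *src, float *dst, size_t rows, size_t cols)
{
    for (size_t r0 = 0; r0 < rows; r0 += TILE) {
        size_t r1 = (r0 + TILE < rows) ? r0 + TILE : rows;
        for (size_t c0 = 0; c0 < cols; c0 += TILE) {
            size_t c1 = (c0 + TILE < cols) ? c0 + TILE : cols;
            /* Copy one tile; edge tiles are clipped to the matrix bounds. */
            for (size_t r = r0; r < r1; ++r)
                for (size_t c = c0; c < c1; ++c)
                    dst[c * rows + r] = src[r * cols + c];
        }
    }
}
```

The clipping of `r1` and `c1` lets the function handle sizes that are not multiples of the tile edge, at the cost of slightly less regular inner loops on the border tiles.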

In general, one should consider the following aspects:

Here are some libraries that provide transpose functions and should perform well:

Searching github.com for “transpose” also produces many results.

Pavel Zemtsov wrote a bunch of related articles at Experiments in program optimisation, backed with sources at https://github.com/pzemtsov/article-e1-cache and https://github.com/pzemtsov/article-E1-demux-C:

Other links:

There is also a new library: https://github.com/hayguen/libtranspose