development:numeric_math - [2025/10/30] created by ayguen
~~NOTRANS~~
{{indexmenu_n>300}}

====== Numeric / Math / Linear Algebra ======

===== Numerically Robust Variance and Weighted Sum =====

  * John D. Cook's [[https://www.johndcook.com/blog/standard_deviation/|Accurately computing running variance]]
  * https://kunalbandekar.blogspot.com/2011/03/algorithms-for-calculating-variance.html
  * https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
    * [[https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Welford's_online_algorithm|Welford algorithm]]
    * [[https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Weighted_incremental_algorithm|West algorithm]]
  * https://en.wikipedia.org/wiki/Squared_deviations_from_the_mean
  * https://en.wikipedia.org/wiki/Kahan_summation_algorithm
    * https://www.ddj.com/floating-point-summation/184403224
    * https://rosettacode.org/wiki/Kahan_summation
    * https://stackoverflow.com/questions/57301596/optimal-implementation-of-iterative-kahan-summation
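Welford's update, as described in the links above, is compact enough to sketch directly. A minimal C++ sketch (struct and member names are my own choices):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Running mean/variance via Welford's online algorithm:
// single pass, O(1) state, numerically stable (avoids the catastrophic
// cancellation of the naive sum / sum-of-squares formula).
struct RunningStats {
    std::size_t n = 0;
    double mean = 0.0;
    double m2 = 0.0;   // sum of squared deviations from the current mean

    void push(double x) {
        ++n;
        const double delta = x - mean;
        mean += delta / static_cast<double>(n);
        m2 += delta * (x - mean);   // note: uses the *updated* mean
    }
    double variance() const {        // population variance
        return n > 0 ? m2 / static_cast<double>(n) : 0.0;
    }
    double sample_variance() const { // unbiased (n-1) variance
        return n > 1 ? m2 / static_cast<double>(n - 1) : 0.0;
    }
};
```

Pushing 1..5 yields mean 3, population variance 2 and sample variance 2.5, matching the textbook formulas.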


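The Kahan summation linked above carries a compensation term for the low-order bits lost in each addition. A minimal sketch (function name is my own):

```cpp
#include <cassert>
#include <vector>

// Kahan (compensated) summation: small addends are not silently lost
// when added to a much larger partial sum.
double kahan_sum(const std::vector<double>& xs) {
    double sum = 0.0;
    double c = 0.0;                  // compensates for lost low-order bits
    for (double x : xs) {
        const double y = x - c;      // subtract the previous error
        const double t = sum + y;    // big + small: low bits of y may be lost
        c = (t - sum) - y;           // recover exactly what was lost
        sum = t;
    }
    return sum;
}
```

For example, summing {1e16, 1.0, 1.0} naively gives 1e16 (both 1.0s vanish below the ulp), while the compensated sum returns 1e16 + 2 exactly. Note that plain Kahan can still fail when the total itself cancels to near zero; the Neumaier variant (see the Wikipedia article above) handles that case.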
===== C/C++ Libraries =====

  * C: Intel(R) Math Kernel Library (MKL)
    * https://en.wikipedia.org/wiki/Math_Kernel_Library
    * https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl.html
    * https://www.intel.com/content/www/us/en/develop/documentation/onemkl-linux-developer-guide/top.html
  * C: Intel(R) Integrated Performance Primitives (IPP)
    * https://en.wikipedia.org/wiki/Integrated_Performance_Primitives
    * https://www.intel.com/content/www/us/en/developer/tools/oneapi/ipp.html
    * https://www.intel.com/content/www/us/en/develop/documentation/ipp-dev-reference/top.html
  * C: Vector Optimized Library of Kernels (libVOLK)
    * https://www.libvolk.org/
    * currently licensed under GPLv3
    * the future version 3 will be LGPL, see https://www.libvolk.org/help-us-relicense-volk-under-lgpl.html
  * C++: Eigen
    * https://eigen.tuxfamily.org/
    * https://eigen.tuxfamily.org/dox/index.html
    * pre-allocated arrays can be mapped to Eigen's Matrix/Vector types
      * https://eigen.tuxfamily.org/dox/classEigen_1_1Map.html
      * see https://eigen.tuxfamily.org/dox/group__matrixtypedefs.html for the native types
    * matrix self-transpose/adjoint multiply optimization, e.g. for least squares
      * https://stackoverflow.com/questions/39606224/does-eigen-have-self-transpose-multiply-optimization-like-h-transposeh
      * https://stackoverflow.com/questions/42712307/efficient-matrix-transpose-matrix-multiplication-in-eigen
  * Boost Interval Arithmetic
    * https://www.boost.org/doc/libs/1_75_0/libs/numeric/interval/doc/interval.htm
  * C++: Armadillo
    * https://arma.sourceforge.net/
  * Fastor
    * https://github.com/romeric/Fastor
  * libxsmm
    * https://github.com/libxsmm/libxsmm
  * NumPy
    * [[https://numpy.org/|NumPy]] is Python, but there are promising C++ libraries:
      * https://github.com/dpilger26/NumCpp - see the [[https://dpilger26.github.io/NumCpp/doxygen/html/index.html|documentation]]
      * https://github.com/xtensor-stack/xtensor - see the [[https://xtensor.readthedocs.io/en/latest/|documentation]]
        * optionally utilizes https://github.com/xtensor-stack/xsimd / https://xsimd.readthedocs.io/en/latest/
  * other libraries:
    * https://en.wikipedia.org/wiki/Comparison_of_linear_algebra_libraries
    * https://stackoverflow.com/questions/1380371/what-are-the-most-widely-used-c-vector-matrix-math-linear-algebra-libraries-a

===== IEEE-754 Floating-Point Numbers =====

  * https://en.wikipedia.org/wiki/IEEE_754
  * the existence of denormalized numbers (aka subnormal numbers) should be known
  * the existence of signed zeros might be interesting
  * the existence of the following non-numbers (Not-a-Number: NaN) should be known:
    * quiet NaN (qNaN)
    * signaling NaN (sNaN)
  * positive and negative infinity (+/- inf) exist as well
  * rounding modes can be controlled:
    * to nearest, towards 0, towards +inf or towards -inf
    * be aware that repeatedly rotating a 2D point or complex coordinate in a loop lets rounding errors accumulate, so its magnitude slowly drifts - note that the default rounding mode is //round to nearest, ties to even//, not //round towards zero//; renormalize periodically if needed
  * exceptions can be handled by setting up float traps / signal handlers
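The rounding modes above can be switched at runtime through the standard ''<cfenv>'' interface; ''nearbyint()'' honours the currently active mode. A small sketch (function name is my own):

```cpp
#include <cfenv>
#include <cmath>
// The C standard requires '#pragma STDC FENV_ACCESS ON' for this;
// most compilers accept runtime fesetround() calls regardless.

// Round x under the given mode (FE_TONEAREST, FE_UPWARD, FE_DOWNWARD,
// FE_TOWARDZERO), then restore the previous mode.
double round_with_mode(double x, int mode) {
    const int old_mode = std::fegetround();
    std::fesetround(mode);
    const double r = std::nearbyint(x);  // uses the active rounding mode
    std::fesetround(old_mode);
    return r;
}
```

For example, 2.5 rounds to 2 under the default //ties to even// mode, but to 3 under FE_UPWARD.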

==== Performance Issues with Denormals and NaNs ====

Calculations involving denormals or non-numbers can slow down performance considerably - even when no exception is signalled.\\
It might be desirable to abort a calculation, e.g. a matrix/vector multiplication, at the first occurrence of a NaN - or on one of the other conditions. Unfortunately, SIMD instruction sets have issues with exception trapping and NaN propagation. See Agner Fog's article at https://www.agner.org/optimize/#nan_propagation

Special modes like DAZ (Denormals-Are-Zero) and FTZ (Flush-To-Zero) can be used if the application does not care about very small denormalized numbers, see https://en.wikipedia.org/wiki/Subnormal_number

The IPP library also provides helper functions:

  * https://www.intel.com/content/www/us/en/docs/ipp/developer-reference/2021-9/setdenormarezeros.html
  * https://www.intel.com/content/www/us/en/docs/ipp/developer-reference/2021-9/setflushtozero.html

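A brief sketch of how subnormals can be detected portably, and how FTZ can be enabled on x86 via the SSE control register - the same mechanism the IPP helpers above wrap (the guard and function names here are my own):

```cpp
#include <cfloat>
#include <cmath>

#if defined(__SSE__)
#include <xmmintrin.h>   // _MM_SET_FLUSH_ZERO_MODE, _MM_FLUSH_ZERO_ON
// FTZ replaces subnormal *results* with zero. DAZ, which treats subnormal
// *inputs* as zero, is _MM_SET_DENORMALS_ZERO_MODE in <pmmintrin.h>.
void enable_ftz() { _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON); }
#endif

// Portable check whether a value is subnormal (denormalized).
bool is_subnormal(double x) { return std::fpclassify(x) == FP_SUBNORMAL; }
```

For example, ''DBL_MIN / 2'' lies below the smallest normal double and is classified as subnormal; with FTZ/DAZ active, such values are treated as zero instead.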

===== Fast Cache-Efficient Matrix Transposition =====

The topic is explained at [[https://en.wikipedia.org/wiki/Transpose|Wikipedia]] - with a specialized [[https://en.wikipedia.org/wiki/In-place_matrix_transposition|article]] on the //in-place// operation. Matrix transposition is also useful in image processing. (De-)interleaving - e.g. of multi-channel audio data - is a different name for the same operation. See https://stackoverflow.com/questions/7780279/de-interleave-an-array-in-place

Transposing a matrix the //naive// way produces many cache misses. That is why specialized algorithms, such as [[https://en.wikipedia.org/wiki/Cache-oblivious_algorithm|cache-oblivious]] ones, are beneficial. There are numerous scientific papers on the topic, e.g. [[https://www.researchgate.net/publication/3838543_Cache-efficient_matrix_transposition|Cache-efficient matrix transposition]]. The problem is also discussed on [[https://stackoverflow.com/questions/5200338/a-cache-efficient-matrix-transpose-program|Stack Overflow]] - helpfully with some code snippets.

In general, one should consider the following aspects:
  * rectangular or square matrix
  * transposing only from/to rectangular regions of bigger matrices or images
  * in-place or out-of-place operation
  * combination with conjugation
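To illustrate the cache-blocking idea, here is a simple out-of-place tiled transpose sketch (tile size and names chosen arbitrarily; real libraries add SIMD kernels and cache-oblivious recursion on top):

```cpp
#include <algorithm>
#include <cstddef>

// Out-of-place transpose of a row-major rows x cols matrix, processed in
// block x block tiles so that both the read side and the write side stay
// within a small cache footprint.
template <typename T>
void transpose_blocked(const T* src, T* dst,
                       std::size_t rows, std::size_t cols,
                       std::size_t block = 32) {
    for (std::size_t i0 = 0; i0 < rows; i0 += block)
        for (std::size_t j0 = 0; j0 < cols; j0 += block)
            // transpose one tile: dst is cols x rows, row-major
            for (std::size_t i = i0; i < std::min(i0 + block, rows); ++i)
                for (std::size_t j = j0; j < std::min(j0 + block, cols); ++j)
                    dst[j * rows + i] = src[i * cols + j];
}
```

The naive version is the same code without the two outer tile loops; the blocked variant touches the destination in short column bursts per tile instead of striding through the whole destination matrix for every source row.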

Here are some libraries, which should be quite efficient, that provide transpose functions:
  * [[https://www.intel.com/content/www/us/en/develop/documentation/ipp-dev-reference/top/volume-2-image-processing/image-data-exchange-and-initialization-functions/transpose.html|Intel IPP: ippiTranspose_*()]]
  * [[https://www.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-c/top/blas-and-sparse-blas-routines/blas-like-extensions/mkl-omatcopy.html|Intel MKL: out-of-place mkl_*omatcopy()]]
    * unfortunately supports no data types smaller than float
  * [[https://www.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-c/top/blas-and-sparse-blas-routines/blas-like-extensions/mkl-imatcopy-batch.html|Intel MKL: in-place mkl_*imatcopy()]]
  * [[https://eigen.tuxfamily.org/dox/group__TutorialMatrixArithmetic.html|Eigen: .transpose()]]
    * returns a lazily evaluated transpose expression - beware of aliasing; ''transposeInPlace()'' exists for the in-place case
    * pre-allocated arrays can be mapped to Eigen's Matrix/Vector types
      * https://eigen.tuxfamily.org/dox/classEigen_1_1Map.html
      * see https://eigen.tuxfamily.org/dox/group__matrixtypedefs.html for the native types
  * [[https://arma.sourceforge.net/docs.html#trans|Armadillo: trans()]] - also with a slower [[https://arma.sourceforge.net/docs.html#inplace_trans|in-place variant]]
    * returns a newly created, transposed matrix
    * an [[https://arma.sourceforge.net/docs.html#Mat|advanced constructor]] with ''copy_aux_mem = false'' allows using external data directly

Searching [[https://github.com/|github.com]] for "//transpose//" also yields many results.


Pavel Zemtsov wrote a bunch of related articles at [[https://pzemtsov.github.io/|Experiments in program optimisation]], backed by sources at https://github.com/pzemtsov/article-e1-cache and https://github.com/pzemtsov/article-E1-demux-C:
  * [[https://pzemtsov.github.io/2014/05/02/demultiplexing-of-e1-converting-to-C.html|De-multiplexing of E1 stream: converting to C]]
  * [[https://pzemtsov.github.io/2014/06/04/fast-e1-de-multiplexing-in-C-using-sse-avx.html|Fast E1 de-multiplexing in C using SSE/AVX]]
  * [[https://pzemtsov.github.io/2014/10/01/how-to-transpose-a-16x16-matrix.html|How to transpose a 16×16 byte matrix using SSE]]
  * [[https://pzemtsov.github.io/2014/10/21/bug-story-2-unrolling-16x16-transposition.html|Bug story 2: Unrolling the 16×16 matrix transposition, or be careful with macros]]
  * [[https://pzemtsov.github.io/2014/11/24/how-cache-affects-demultiplexing-of-e1.html|How cache affects demultiplexing of E1]]

Other links:
  * https://github.com/FFmpeg/FFmpeg/blob/master/libavfilter/opencl/transpose.cl
  * https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/arm/neon.S
  * https://www.cs.technion.ac.il/~itai/Courses/Cache/matrix-transposition.pdf
  * https://dl.acm.org/doi/10.1145/3555353
  * https://github.com/romeric/Fastor/blob/master/Fastor/backend/transpose/transpose_kernels.h
  * https://groups.google.com/g/llvm-dev/c/P_CXf5hPro8/m/RI6Fm3bzAQAJ

There is also a newer library: https://github.com/hayguen/libtranspose


===== Links =====

  * Numerical Computation Guide
    * https://docs.oracle.com/cd/E19957-01/806-3568/ncgTOC.html
  * Several blog entries by Bruce Dawson
    * https://randomascii.wordpress.com/category/floating-point/
  * C99 and C++11 floating-point environment reference
    * https://en.cppreference.com/w/c/numeric/fenv
    * https://en.cppreference.com/w/cpp/numeric/fenv