~~NOTRANS~~
{{indexmenu_n>300}}

====== Numeric / Math / Linear Algebra ======

===== Numerically Robust Variance and Weighted Sum =====

  * John D. Cook's [[https://www.johndcook.com/blog/standard_deviation/|Accurately computing running variance]]
  * https://kunalbandekar.blogspot.com/2011/03/algorithms-for-calculating-variance.html
  * https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
    * [[https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Welford's_online_algorithm|Welford algorithm]]
    * [[https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Weighted_incremental_algorithm|West algorithm]]
  * https://en.wikipedia.org/wiki/Squared_deviations_from_the_mean
  * https://en.wikipedia.org/wiki/Kahan_summation_algorithm
    * https://www.ddj.com/floating-point-summation/184403224
    * https://rosettacode.org/wiki/Kahan_summation
    * https://stackoverflow.com/questions/57301596/optimal-implementation-of-iterative-kahan-summation

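A minimal C++ sketch of Welford's online algorithm as described in the links above (the struct name is illustrative, not from any particular library):

<code cpp>
#include <cstdint>

// Welford's online algorithm: numerically robust running mean/variance.
struct RunningStats {
    int64_t n = 0;
    double mean = 0.0;
    double m2 = 0.0;  // sum of squared deviations from the running mean

    void push(double x) {
        ++n;
        const double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);  // note: uses the *updated* mean
    }
    // sample variance; returns 0 for fewer than two samples
    double variance() const { return (n > 1) ? m2 / double(n - 1) : 0.0; }
};
</code>

And a sketch of Kahan (compensated) summation, which carries the low-order rounding error into the next addition. Be aware that aggressive compiler flags like ''-ffast-math'' may optimize the compensation away:

<code cpp>
#include <cstddef>

double kahan_sum(const double* data, size_t len) {
    double sum = 0.0;
    double c = 0.0;  // running compensation for lost low-order bits
    for (size_t i = 0; i < len; ++i) {
        const double y = data[i] - c;
        const double t = sum + y;
        c = (t - sum) - y;  // recovers what was lost when adding y to sum
        sum = t;
    }
    return sum;
}
</code>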

===== C/C++ Libraries =====

  * C: Intel(R) Math Kernel Library (MKL)
    * https://en.wikipedia.org/wiki/Math_Kernel_Library
    * https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl.html
    * https://www.intel.com/content/www/us/en/develop/documentation/onemkl-linux-developer-guide/top.html
  * C: Intel(R) Integrated Performance Primitives (IPP)
    * https://en.wikipedia.org/wiki/Integrated_Performance_Primitives
    * https://www.intel.com/content/www/us/en/developer/tools/oneapi/ipp.html
    * https://www.intel.com/content/www/us/en/develop/documentation/ipp-dev-reference/top.html
  * C: Vector Optimized Library of Kernels (libVOLK)
    * https://www.libvolk.org/
    * GPLv3 license up to version 2.x
    * relicensed under LGPL as of version 3.0, see https://www.libvolk.org/help-us-relicense-volk-under-lgpl.html
  * C++ Eigen
    * https://eigen.tuxfamily.org/
    * https://eigen.tuxfamily.org/dox/index.html
    * pre-allocated arrays can be mapped to Eigen's Matrix/Vector types (see the sketch after this list)
      * https://eigen.tuxfamily.org/dox/classEigen_1_1Map.html
      * see https://eigen.tuxfamily.org/dox/group__matrixtypedefs.html for the native types
    * matrix self-transpose/adjoint multiply optimization, e.g. for least squares
      * https://stackoverflow.com/questions/39606224/does-eigen-have-self-transpose-multiply-optimization-like-h-transposeh
      * https://stackoverflow.com/questions/42712307/efficient-matrix-transpose-matrix-multiplication-in-eigen
  * Boost Interval Arithmetic
    * https://www.boost.org/doc/libs/1_75_0/libs/numeric/interval/doc/interval.htm
  * C++ Armadillo
    * https://arma.sourceforge.net/
  * Fastor
    * https://github.com/romeric/Fastor
  * libxsmm
    * https://github.com/libxsmm/libxsmm
  * NumPy
    * [[https://numpy.org/|NumPy]] itself is Python, but there are promising C++ libraries with a similar interface:
    * https://github.com/dpilger26/NumCpp see [[https://dpilger26.github.io/NumCpp/doxygen/html/index.html|documentation]]
    * https://github.com/xtensor-stack/xtensor see [[https://xtensor.readthedocs.io/en/latest/|documentation]]
      * optionally utilizes https://github.com/xtensor-stack/xsimd / https://xsimd.readthedocs.io/en/latest/
  * other libraries:
    * https://en.wikipedia.org/wiki/Comparison_of_linear_algebra_libraries
    * https://stackoverflow.com/questions/1380371/what-are-the-most-widely-used-c-vector-matrix-math-linear-algebra-libraries-a
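
A minimal sketch of mapping a pre-allocated array to an Eigen matrix via ''Eigen::Map'' - no copy is involved; buffer and dimensions are illustrative:

<code cpp>
#include <Eigen/Dense>
#include <iostream>

int main() {
    float raw[6] = {1, 2, 3, 4, 5, 6};  // pre-allocated, externally owned buffer

    // View the buffer as a 2x3 row-major matrix; no data is copied.
    Eigen::Map<Eigen::Matrix<float, 2, 3, Eigen::RowMajor>> m(raw);

    std::cout << m.transpose() << std::endl;  // works like a regular matrix
    m(0, 0) = 42.0f;                          // writes through to raw[0]
    return 0;
}
</code>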

===== IEEE-754 Float Numbers =====

  * https://en.wikipedia.org/wiki/IEEE_754
    * the existence of denormalized numbers (a.k.a. subnormal numbers) should be known
    * the existence of signed zeros might be interesting
    * the existence of the following special //non-finite// values should be known
      * quiet NaN (qNaN)
      * signaling NaN (sNaN)
      * positive and negative infinity (+/- inf) - infinities are distinct from NaNs
    * rounding modes can be controlled
      * to nearest, towards zero, towards +inf or -inf
      * be aware that repeatedly rotating a 2D point or complex coordinate in a loop drifts in magnitude due to error propagation - under the //round towards zero// mode it decays towards zero!
    * exceptions can be handled by setting up floating-point traps / a signal handler
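
A minimal sketch using the C99/C++11 floating-point environment (''<cfenv>'', see the reference links in the //Links// section below). Enabling actual traps (e.g. glibc's non-standard ''feenableexcept()'') is platform specific and not shown:

<code cpp>
#include <cfenv>
#include <cmath>
#include <cstdio>

// #pragma STDC FENV_ACCESS ON  // required by the standard; compiler support varies

int main() {
    std::fesetround(FE_TOWARDZERO);              // change the rounding mode
    std::printf("%.0f\n", std::nearbyint(1.7));  // prints 1 under FE_TOWARDZERO
    std::fesetround(FE_TONEAREST);               // restore the default mode

    std::feclearexcept(FE_ALL_EXCEPT);
    volatile double zero = 0.0;
    volatile double r = 1.0 / zero;              // raises FE_DIVBYZERO, r = +inf
    if (std::fetestexcept(FE_DIVBYZERO))
        std::printf("division by zero flagged, r = %g\n", (double)r);
    return 0;
}
</code>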

==== Performance Issues with Denormals and NaNs ====

Calculations with denormals or non-numbers slow down performance - even when no exception is signalled.\\
It might be interesting to abort a calculation, e.g. a matrix/vector multiplication, at the first occurrence of a NaN - or on one of the other conditions. Unfortunately, SIMD instruction sets have issues with exception trapping and NaN propagation. See Agner Fog's article at https://www.agner.org/optimize/#nan_propagation

Special options like DAZ (Denormals-Are-Zero) and FTZ (Flush-To-Zero) can be used if the application does not care about very small denormalized numbers, see https://en.wikipedia.org/wiki/Subnormal_number

The IPP library also provides helper functions:

  * https://www.intel.com/content/www/us/en/docs/ipp/developer-reference/2021-9/setdenormarezeros.html
  * https://www.intel.com/content/www/us/en/docs/ipp/developer-reference/2021-9/setflushtozero.html
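
On x86 with SSE, DAZ/FTZ can also be set directly via the MXCSR control register; a minimal sketch (the subnormal test value is illustrative):

<code cpp>
#include <xmmintrin.h>  // SSE:  _MM_SET_FLUSH_ZERO_MODE (FTZ)
#include <pmmintrin.h>  // SSE3: _MM_SET_DENORMALS_ZERO_MODE (DAZ)
#include <cstdio>

int main() {
    // FTZ: subnormal *results* are replaced by zero.
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    // DAZ: subnormal *inputs* are treated as zero.
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);

    volatile float tiny = 1e-40f;    // subnormal in single precision
    volatile float r = tiny * 1.0f;  // reads as 0.0f with DAZ enabled
    std::printf("%g\n", (double)r);  // prints 0
    return 0;
}
</code>

Note that these flags affect only the current thread and deviate from strict IEEE-754 semantics.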


===== Fast Cache-Efficient Matrix Transposition =====

The topic is explained at [[https://en.wikipedia.org/wiki/Transpose|wikipedia]] - with a specialized [[https://en.wikipedia.org/wiki/In-place_matrix_transposition|article]] for the //in-place// operation. Matrix transposition can also be utilized in the field of image processing. (De-)interleaving, e.g. of multi-channel audio data, is another name for the same operation. See https://stackoverflow.com/questions/7780279/de-interleave-an-array-in-place

Transposing a matrix the //simple// way produces many cache misses. That is why special algorithms, such as [[https://en.wikipedia.org/wiki/Cache-oblivious_algorithm|cache-oblivious]] ones, are beneficial. There are numerous scientific papers on this topic, e.g. [[https://www.researchgate.net/publication/3838543_Cache-efficient_matrix_transposition|Cache-efficient matrix transposition]]. But the problem is also discussed on [[https://stackoverflow.com/questions/5200338/a-cache-efficient-matrix-transpose-program|stackoverflow]] - helpfully with some code snippets.

In general, one should consider the following aspects (a minimal blocked transpose is sketched after this list):
  * rectangular or square matrix
  * transpose only from/to rectangular regions of bigger matrices or images
  * in-place or out-of-place operation
  * combination with conjugation
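
A minimal cache-aware (blocked/tiled) out-of-place sketch for row-major data - simpler than a true cache-oblivious recursion; the tile size ''B'' is a tunable assumption:

<code cpp>
#include <cstddef>

// Out-of-place transpose in BxB tiles, so that source rows and
// destination rows both stay cache-resident while a tile is processed.
template <typename T, std::size_t B = 32>
void transpose_blocked(const T* src, T* dst,
                       std::size_t rows, std::size_t cols) {
    for (std::size_t i0 = 0; i0 < rows; i0 += B)
        for (std::size_t j0 = 0; j0 < cols; j0 += B)
            for (std::size_t i = i0; i < i0 + B && i < rows; ++i)
                for (std::size_t j = j0; j < j0 + B && j < cols; ++j)
                    dst[j * rows + i] = src[i * cols + j];  // dst is cols x rows
}
</code>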

Here are some libraries providing transpose functions that should be quite efficient:
  * [[https://www.intel.com/content/www/us/en/develop/documentation/ipp-dev-reference/top/volume-2-image-processing/image-data-exchange-and-initialization-functions/transpose.html|Intel IPP: ippiTranspose_*()]]
  * [[https://www.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-c/top/blas-and-sparse-blas-routines/blas-like-extensions/mkl-omatcopy.html|Intel MKL: out-of-place mkl_*omatcopy()]]
    * unfortunately supports no data types smaller than float
    * [[https://www.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-c/top/blas-and-sparse-blas-routines/blas-like-extensions/mkl-imatcopy-batch.html|Intel MKL: in-place mkl_*imatcopy()]]
  * [[https://eigen.tuxfamily.org/dox/group__TutorialMatrixArithmetic.html|Eigen: .transpose()]]
    * returns a newly created, transposed matrix
    * pre-allocated arrays can be mapped to Eigen's Matrix/Vector types
      * https://eigen.tuxfamily.org/dox/classEigen_1_1Map.html
      * see https://eigen.tuxfamily.org/dox/group__matrixtypedefs.html for the native types
  * [[https://arma.sourceforge.net/docs.html#trans|Armadillo: trans()]] - also with a slower [[https://arma.sourceforge.net/docs.html#inplace_trans|in-place variant]]
    * returns a newly created, transposed matrix
    * [[https://arma.sourceforge.net/docs.html#Mat|advanced constructor]] with ''copy_aux_mem = false'' for using external data directly (see the sketch after this list)
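
A minimal sketch of wrapping external memory with Armadillo's advanced constructor (buffer and sizes are illustrative):

<code cpp>
#include <armadillo>

int main() {
    float buf[6] = {1, 2, 3, 4, 5, 6};  // externally owned buffer

    // copy_aux_mem = false: use buf directly; strict = true: keep size/memory fixed
    arma::fmat m(buf, 2, 3, /*copy_aux_mem=*/false, /*strict=*/true);

    arma::fmat t = arma::trans(m);  // returns a newly created, transposed matrix
    t.print("t:");
    return 0;
}
</code>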

Searching for "//transpose//" on [[https://github.com/|github.com]] also yields many results.


Pavel Zemtsov wrote a bunch of related articles at [[https://pzemtsov.github.io/|Experiments in program optimisation]], backed by sources at https://github.com/pzemtsov/article-e1-cache and https://github.com/pzemtsov/article-E1-demux-C:
  * [[https://pzemtsov.github.io/2014/05/02/demultiplexing-of-e1-converting-to-C.html|De-multiplexing of E1 stream: converting to C]]
  * [[https://pzemtsov.github.io/2014/06/04/fast-e1-de-multiplexing-in-C-using-sse-avx.html|Fast E1 de-multiplexing in C using SSE/AVX]]
  * [[https://pzemtsov.github.io/2014/10/01/how-to-transpose-a-16x16-matrix.html|How to transpose a 16×16 byte matrix using SSE]]
  * [[https://pzemtsov.github.io/2014/10/21/bug-story-2-unrolling-16x16-transposition.html|Bug story 2: Unrolling the 16×16 matrix transposition, or be careful with macros]]
  * [[https://pzemtsov.github.io/2014/11/24/how-cache-affects-demultiplexing-of-e1.html|How cache affects demultiplexing of E1]]

Other links:
  * https://github.com/FFmpeg/FFmpeg/blob/master/libavfilter/opencl/transpose.cl
  * https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/arm/neon.S
  * https://www.cs.technion.ac.il/~itai/Courses/Cache/matrix-transposition.pdf
  * https://dl.acm.org/doi/10.1145/3555353
  * https://github.com/romeric/Fastor/blob/master/Fastor/backend/transpose/transpose_kernels.h
  * https://groups.google.com/g/llvm-dev/c/P_CXf5hPro8/m/RI6Fm3bzAQAJ

There is also a new library: https://github.com/hayguen/libtranspose


===== Links =====

  * Numerical Computation Guide
    * https://docs.oracle.com/cd/E19957-01/806-3568/ncgTOC.html
  * Several blog entries by Bruce Dawson
    * https://randomascii.wordpress.com/category/floating-point/
  * C99 and C++11 floating-point environment reference
    * https://en.cppreference.com/w/c/numeric/fenv
    * https://en.cppreference.com/w/cpp/numeric/fenv