Presentation is loading. Please wait.

Presentation is loading. Please wait.

Nequalities Takagi Factorization on a GPU using CUDA Gagandeep S. Sachdev, Vishay Vanjani & Mary W. Hall School of Computing, University of Utah What is.

Similar presentations


Presentation on theme: "Nequalities Takagi Factorization on a GPU using CUDA Gagandeep S. Sachdev, Vishay Vanjani & Mary W. Hall School of Computing, University of Utah What is."— Presentation transcript:

1 nequalities Takagi Factorization on a GPU using CUDA Gagandeep S. Sachdev, Vishay Vanjani & Mary W. Hall School of Computing, University of Utah What is Takagi Factorization? Special form of singular value decomposition (SVD) Applicable to complex symmetric matrices. Mathematically: A = UsU T U = left singular matrix s = Non-Negative Diagonal matrix (Eigen values). U T = Transpose of V matrix Applications: Grunsky inequalities Computation of the near-best uniform polynomial Rational approximation of a high degree polynomial on a disk Complex independent component analysis problems Complex coordinate rotation method Optical potential method Nuclear magnetic resonance Diagonalization of mass matrices of Majorana fermions SVD is defined as A = UsV*. Left and right singular vectors are different matrices Computing Takagi vectors is not trivial Takagi factorization takes advantage of symmetry Less computational complexity Less storage requirements How is it different from SVD? Jacobi Method For a given pair of p and q, form a 2x2 symmetric matrix as shown: Diagonalize this matrix such that: Apply transformation to p,q rows and columns. Apply such unitary transform to all possible pairs N C 2 = N(N-1)/2 combinations. This finishes one sweep. Keep performing sweeps until the matrix converges. Convergence: sum of absolute values of off-diagonal elements. Quadratic in nature Complexity: N^3 p th colq th col p th row q th row ac bc J JTJT 0 0 0 0 & cos sin cos - sin J Other Algorithms Divide and Conquer Algorithm Twisted Factorization method Both involve tridiagonalization followed by diagonalization Complexity: N^2 Problems : Not as parallelizable as Jacobi Jacobi method is more stable Example: ill-conditioned matrices Related work Fast SVD on 7900 GTK A generation before CUDA enabled GPUs (2005) Recently, SVD implemented on GPU using CUDA Implemented only for real matrices Bidiagonalization by Householder transforms Diagonalization by implicitly shifted QR algorithm Complexity: m*n^2 High storage space Matlab and serial C implementations available Parallel Jacobi Algorithm N/2 parallel unitary transforms possible. Order does not alter the final result. Parallel threads cannot share a common index This is done (N-1) times for one sweep All row updates need to finish before column updates To prevent race conditions Column entries will be part of rows of other thread’s row Computation now done on full symmetric matrix Need a mechanism to generate unique N/2 pairs for (N-1) times Cyclic chess tournament scheduling GPU Execution Copy complex Matrix A from host to GPU Kernel : Initialize auxiliary data structures – real diagonal matrix D and complex Takagi vector matrix U Kernel : Calculate threshold Convergence reached Kernel : Diagonalize and row updates --------------Host synchronization-------- Kernel : Column and Unitary matrix updates --------------Host synchronization-------- Kernel : make diagonal elements non-negative and sort Yes No N-1 times ? No Yes Copy U and D from GPU Results Conclusion 0909 1818 2727 3636 4545 Tid=0Tid=1Tid=2 Tid=3 Tid=4 Iteration: 0 1919 2020 3838 4747 5656 Tid=0Tid=1Tid=2 Tid=3 Tid=4 Iteration: 1 Pthreads implementation reach speedup of 1.64x over serial CPU version. NVIDIA GTX 260 reaches a speedup of 11.69x. NVIDIA GTX 480 (Fermi) reaches a speedup of 30.80x. Speedup of GPUs and Pthreads for different matrix sizes Jacobi method is the most stable and accurate parallel algorithm. Good speedups achieved over serial and Pthreads implementation. Nvidia Fermi architecture gave much better speedups. Non-coalesced reads are cached. Much higher FLOP performance.


Download ppt "Nequalities Takagi Factorization on a GPU using CUDA Gagandeep S. Sachdev, Vishay Vanjani & Mary W. Hall School of Computing, University of Utah What is."

Similar presentations


Ads by Google