SPARSE TENSORS DECOMPOSITION SOFTWARE


1 SPARSE TENSORS DECOMPOSITION SOFTWARE
Papa S. Diaw, Master’s Candidate
Dr. Michael W. Berry, Major Professor
12/2/2018

2 Introduction
Large data sets (e.g. telecom records, Facebook, MySpace)
Nonnegative Matrix Factorization (NMF)
Gives insights into the hidden relationships
Requires arranging multi-way data into a matrix
Higher memory and CPU cost
Only linear relationships in the matrix representation
Fails to capture important structural information
Slower or less accurate calculations
Note: NMF did a good job at first and was the main tool.

3 Introduction (cont'd)
Nonnegative Tensor Factorizations (NTF)
A natural fit for high-dimensional data
Keeps the original multi-way structure of the data
Applications: image processing, text mining

4 Tensor Toolbox For MATLAB
Sandia National Laboratories
Licenses
Proprietary software

5 Motivation of the PILOT
Python software for NTF
Alternative to Tensor Toolbox for MATLAB
Incorporation into FutureLens
Exposure to NTF
Interest in the open source community

6 Tensors
Multi-way array
Order/Mode/Ways
High-order
Fiber
Slice
Unfolding
Also called matricization or flattening
Reorders the elements of an N-th order tensor into a matrix
Not unique
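Unfolding can be illustrated with a short NumPy sketch. The function name `unfold` and the column ordering (earliest remaining mode varies fastest) are assumptions chosen for illustration; since the unfolding is not unique, other orderings are equally valid as long as they are used consistently.

```python
import numpy as np

def unfold(X, mode):
    """Mode-`mode` unfolding of a dense tensor (hypothetical helper).

    Rows are indexed by the chosen mode; the remaining modes are flattened
    into columns with the earliest remaining mode varying fastest.
    """
    return np.reshape(np.moveaxis(X, mode, 0), (X.shape[mode], -1), order="F")

X = np.arange(24).reshape(2, 3, 4)   # a small third-order tensor
X0 = unfold(X, 0)                    # shape (2, 12)
X1 = unfold(X, 1)                    # shape (3, 8)
X2 = unfold(X, 2)                    # shape (4, 6)
```

With this ordering, element `X[i, j, k]` ends up at row `j`, column `i + 2 * k` of `X1`.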

7 Tensors (cont’d)
Kronecker product
Khatri-Rao product
A ⊙ B = [a_1 ⊗ b_1   a_2 ⊗ b_2   …   a_J ⊗ b_J]
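The Khatri-Rao product is the column-wise Kronecker product, so both matrices must have the same number of columns. A minimal NumPy sketch follows; the function name `khatri_rao` is an assumption for illustration and is not taken from the package described in these slides.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product of A (I x J) and B (K x J).

    Column j of the result is kron(A[:, j], B[:, j]), so the result
    has shape (I * K, J).
    """
    if A.shape[1] != B.shape[1]:
        raise ValueError("A and B must have the same number of columns")
    I, J = A.shape
    K = B.shape[0]
    # Broadcast to shape (I, K, J), then merge the first two axes.
    return (A[:, None, :] * B[None, :, :]).reshape(I * K, J)

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
AB = khatri_rao(A, B)   # shape (6, 2)
```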

8 Tensor Factorization
Introduced by Hitchcock in 1927 and later developed by Cattell and, in 1966, by Tucker
Rewrites a given tensor as a finite sum of lower-rank tensors
Two models: Tucker and PARAFAC
Rank approximation is a problem

9 PARAFAC
Parallel Factor Analysis
Canonical Decomposition (CANDECOMP)
Harshman; Carroll and Chang, 1970

10 PARAFAC (cont’d)
Given a three-way tensor X and an approximation rank R, X is approximated by a sum of R rank-one components. Each factor matrix collects, as its columns, the vectors that the rank-one components contribute for one mode.
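The relationship between factor matrices and rank-one components can be sketched in NumPy. The names `A`, `B`, `C` and `rank_one_sum` are assumptions for illustration: column r of each factor matrix holds the vector that the r-th rank-one component contributes to that mode.

```python
import numpy as np

def rank_one_sum(A, B, C):
    """Sum of R rank-one tensors built from the columns of the factor
    matrices A (I x R), B (J x R) and C (K x R)."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)

rng = np.random.default_rng(0)
R = 2                                   # approximation rank
A = rng.random((4, R))
B = rng.random((3, R))
C = rng.random((5, R))
X_hat = rank_one_sum(A, B, C)           # shape (4, 3, 5)
```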

11 PARAFAC (cont’d)

12 PARAFAC (cont’d)
Alternating Least Squares (ALS)
We cycle “over all the factor matrices and performs a least-square update for one factor matrix while holding all the others constant.”[7]
NTF can be considered an extension of the PARAFAC model with the constraint of nonnegativity
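The ALS cycle can be sketched for a small dense three-way tensor. This is an illustrative, unconstrained version under stated assumptions: it does not enforce nonnegativity, it works on dense arrays rather than sparse tensors, and the names `unfold`, `khatri_rao` and `cp_als` are hypothetical rather than the package's own.

```python
import numpy as np

def unfold(X, mode):
    # Mode-n unfolding; earliest remaining mode varies fastest in the columns.
    return np.reshape(np.moveaxis(X, mode, 0), (X.shape[mode], -1), order="F")

def khatri_rao(A, B):
    # Column-wise Kronecker product; row index is i * B.shape[0] + k.
    return (A[:, None, :] * B[None, :, :]).reshape(-1, A.shape[1])

def cp_als(X, rank, n_iter=100, tol=1e-9, seed=0):
    """Unconstrained ALS for a dense three-way tensor (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    factors = [rng.random((dim, rank)) for dim in X.shape]
    norm_x = np.linalg.norm(X)
    prev_err = err = np.inf
    for _ in range(n_iter):
        for n in range(3):
            # Hold the other two factor matrices constant.
            low, high = [factors[m] for m in range(3) if m != n]
            kr = khatri_rao(high, low)
            gram = (high.T @ high) * (low.T @ low)
            # Least-squares update for factor n.
            factors[n] = unfold(X, n) @ kr @ np.linalg.pinv(gram)
        approx = np.einsum("ir,jr,kr->ijk", *factors)
        err = np.linalg.norm(X - approx) / norm_x
        if abs(prev_err - err) < tol:   # stop on convergence
            break
        prev_err = err
    return factors, err

rng = np.random.default_rng(1)
X = np.einsum("ir,jr,kr->ijk", rng.random((4, 2)), rng.random((5, 2)), rng.random((6, 2)))
factors, rel_err = cp_als(X, rank=2)
```

The loop stops either when the relative error stops changing or when `n_iter` is reached. A nonnegative variant would replace the least-squares update with one that keeps every entry nonnegative.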

13 Python
Object-oriented, interpreted
Runs on all systems
Flat learning curve
Supports object methods (everything is an object in Python)

14 Python (cont’d)
Recent interest in the scientific community
Several scientific computing packages
NumPy
SciPy
Python is extensible

15 Data Structures
Dictionary
Stores the tensor data
Mutable container type that can store any number of Python objects
Pairs of keys and their corresponding values
Well suited to the sparseness of our tensors
VAST 2007 contest data: 1,385,205,184 elements, of which 1,184,139 are nonzero
Stores only the nonzero elements and keeps track of the zeros through the default value of the dictionary
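A dictionary-backed sparse tensor can be sketched as follows. The class name `DictSparseTensor` and its methods are hypothetical; the idea shown is only that index tuples map to nonzero values and that a lookup of any missing index returns the default value. For the VAST 2007 figures above, roughly 0.09% of the entries would be stored.

```python
class DictSparseTensor:
    """Sparse tensor stored as {index tuple: value} (illustrative sketch)."""

    def __init__(self, shape, default=0.0):
        self.shape = tuple(shape)
        self.default = default
        self.data = {}

    def __setitem__(self, index, value):
        index = tuple(index)
        if value == self.default:
            # Never store explicit zeros; drop the key if it exists.
            self.data.pop(index, None)
        else:
            self.data[index] = value

    def __getitem__(self, index):
        # Missing keys read as the default value without being inserted.
        return self.data.get(tuple(index), self.default)

    def nnz(self):
        return len(self.data)

t = DictSparseTensor((1000, 1000, 1000))
t[3, 14, 15] = 2.5
t[999, 0, 7] = 1.0
stored = t.nnz()          # only the two nonzero entries are stored
missing = t[0, 0, 0]      # reads as the default value
```

Using `dict.get` with a default, rather than `collections.defaultdict`, avoids inserting a new key every time a zero entry is read.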

16 Data Structures (cont’d)
NumPy arrays
NumPy is the fundamental package for scientific computing in Python
Used for Khatri-Rao products and tensor multiplications
Speed

17 Modules

18 Modules (cont’d)
SPTENSOR
Most important module
Class (subscripts of the nonzeros, values)
Flexibility: accepts NumPy arrays, NumPy matrices, Python lists
Dictionary
Keeps a few instance variables:
Size
Number of dimensions
Frobenius norm (Euclidean norm)

19 Modules (cont’d)
PARAFAC
Coordinates the NTF
Implementation of ALS
Runs until convergence or the maximum number of iterations
Factor matrices are turned into a Kruskal tensor

20 Modules (cont’d)

21 Modules (cont’d)

22 Modules (cont’d)
INNERPROD
Inner product between SPTENSOR and KTENSOR
Used by PARAFAC to compute the residual norm
Kronecker product for matrices
TTV
Product of a sparse tensor with a (column) vector
Returns a tensor
Workhorse of our software package: most of the computation happens here
Called by the MTTKRP and INNERPROD modules
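The tensor-times-vector product can be sketched on a dictionary of nonzeros. The function name `ttv`, its signature and the returned `(dict, shape)` pair are assumptions for illustration, not the package's actual interface. Each nonzero is scaled by the vector entry selected by its index in the contracted mode, and entries that collapse onto the same reduced index are summed.

```python
def ttv(data, shape, vec, mode):
    """Multiply a sparse tensor by a vector along `mode` (illustrative sketch).

    `data` maps index tuples to nonzero values. The result has one mode fewer.
    """
    if len(vec) != shape[mode]:
        raise ValueError("vector length must match the size of the chosen mode")
    out = {}
    for index, value in data.items():
        reduced = index[:mode] + index[mode + 1:]
        out[reduced] = out.get(reduced, 0.0) + value * vec[index[mode]]
    new_shape = shape[:mode] + shape[mode + 1:]
    return out, new_shape

data = {(0, 1, 0): 2.0, (0, 1, 1): 3.0, (2, 0, 1): 1.0}
result, result_shape = ttv(data, (3, 2, 2), [10.0, 100.0], mode=2)
```

The cost is proportional to the number of nonzeros, not to the full size of the tensor.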

23 Modules (cont’d)
MTTKRP
Khatri-Rao product of all factor matrices except the one being updated
Matrix multiplication of the matricized tensor with the Khatri-Rao product obtained above
Ktensor
Kruskal tensor
Object returned after the factorization is done and the factor matrices are normalized
Class with instance variables such as the norm
The norm of the ktensor plays a big part in determining the residual norm in the PARAFAC module
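One way the Kruskal tensor norm and the inner product can combine into a residual norm is the identity ‖X − K‖² = ‖X‖² + ‖K‖² − 2⟨X, K⟩, which avoids forming the dense approximation. The sketch below assumes that identity is what is meant; the function names and the `weights` argument (the column norms pulled out during normalization) are hypothetical.

```python
import numpy as np

def ktensor_norm(weights, factors):
    # ||K||^2 = sum over r, s of w_r * w_s * prod_n <a_r^(n), a_s^(n)>
    coef = np.outer(weights, weights)
    for F in factors:
        coef = coef * (F.T @ F)
    return np.sqrt(abs(coef.sum()))

def innerprod(data, weights, factors):
    # <X, K>, touching only the nonzeros of the sparse tensor X.
    total = 0.0
    for index, value in data.items():
        row_product = np.array(weights, dtype=float)
        for mode, i in enumerate(index):
            row_product = row_product * factors[mode][i]
        total += value * row_product.sum()
    return total

def residual_norm(data, weights, factors):
    norm_x_sq = sum(v * v for v in data.values())
    norm_k_sq = ktensor_norm(weights, factors) ** 2
    sq = norm_x_sq + norm_k_sq - 2.0 * innerprod(data, weights, factors)
    return np.sqrt(max(sq, 0.0))   # guard against tiny negative round-off

rng = np.random.default_rng(0)
factors = [rng.random((3, 2)), rng.random((4, 2)), rng.random((2, 2))]
weights = np.array([1.5, 0.5])
data = {(0, 1, 0): 2.0, (2, 3, 1): -1.0, (1, 0, 1): 0.5}
res = residual_norm(data, weights, factors)
```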

24 Performance
Python Profiler
Run-time performance
Tool for detecting bottlenecks
Code optimization
Negligible improvement
Efficiency loss in some modules
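Bottleneck detection with the standard-library profiler can be sketched as below. The two toy functions are assumptions standing in for real module code; `cProfile` and `pstats` are the standard Python profiling modules.

```python
import cProfile
import io
import pstats

def recursive_sum(values):
    # Recursion over list slices: every call copies the remaining list.
    if not values:
        return 0
    return values[0] + recursive_sum(values[1:])

def iterative_sum(values):
    total = 0
    for v in values:
        total += v
    return total

def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()

values = list(range(200))
rec_result, rec_report = profile_call(recursive_sum, values)
it_result, it_report = profile_call(iterative_sum, values)
```

Comparing the call counts in the two reports shows where time is spent: the recursive version makes one call per element, while the iterative one makes a single call.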

25 Performance (cont’d)
Lists and Recursions

26 Performance (cont’d)
NumPy Arrays
Return_unique traverses the arguments given to the sparse tensor to identify duplicate data points and add up their values

27 Performance (cont’d)
After removing Recursions
Myaccumarray is a poor man’s version of the MATLAB accumarray. It is the only function whose performance I am not happy with.
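Summing the values of duplicate subscripts, which is what MATLAB's accumarray is used for here, can be sketched with vectorized NumPy calls. This is one possible alternative and not the Myaccumarray implementation from the slides; the function name `sum_duplicates` is hypothetical.

```python
import numpy as np

def sum_duplicates(subs, vals):
    """Merge repeated subscripts by adding their values (illustrative sketch).

    `subs` is an (n, d) array of subscripts, `vals` the n matching values.
    Returns the unique subscripts (sorted row-wise) and the summed values.
    """
    subs = np.asarray(subs)
    vals = np.asarray(vals, dtype=float)
    unique_subs, inverse = np.unique(subs, axis=0, return_inverse=True)
    # inverse[i] is the position of subs[i] among the unique rows.
    totals = np.bincount(inverse.ravel(), weights=vals, minlength=len(unique_subs))
    return unique_subs, totals

subs = [[0, 0, 0], [1, 2, 0], [0, 0, 0]]
vals = [1.0, 2.0, 3.0]
unique_subs, totals = sum_duplicates(subs, vals)
```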

28 Floating-Point Arithmetic
Binary floating-point
“Binary floating-point cannot exactly represent decimal fractions, so if binary floating-point is used it is not possible to guarantee that results will be the same as those using decimal arithmetic.”[12]
Makes the iterations volatile
The value 0.1, for example, would need an infinitely recurring binary fraction. In contrast, a decimal number system can represent 0.1 exactly, as one tenth (that is, 10^-1). Consequently, binary floating-point is unsuitable for exact financial calculations.
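The representation issue is easy to reproduce in Python, whose `float` type is binary floating-point. The sketch below contrasts it with the standard-library `decimal` module and shows a tolerance-based comparison, which is the usual way to test convergence when exact equality cannot be relied on.

```python
import math
from decimal import Decimal

binary_sum = 0.1 + 0.2                           # binary floating-point
decimal_sum = Decimal("0.1") + Decimal("0.2")    # decimal arithmetic

exact_in_binary = (binary_sum == 0.3)                        # False
exact_in_decimal = (decimal_sum == Decimal("0.3"))           # True
close_enough = math.isclose(binary_sum, 0.3, rel_tol=1e-9)   # True
```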

29 Convergence Issues

30 Convergence Issues (cont’d)

31 Convergence Issues (cont’d)

32 Conclusion
There is still work to do after NTF
Preprocessing of data
Post-processing of results, such as FutureLens
Expertise
Extract and identify hidden components
Tucker implementation
C extension to increase speed
GUI

33 Acknowledgments
Mr. Andrey Puretskiy
Discussions at all stages of the PILOT
Consultancy in text mining
Testing
Tensor Toolbox For MATLAB (Bader and Kolda)
Understanding of tensor decomposition
PARAFAC

34 References
http://csmr.ca.sandia.gov/~tgkolda/TensorToolbox/
Tamara G. Kolda, Brett W. Bader, “Tensor Decompositions and Applications”, SIAM Review, June 10, 2008.
Andrzej Cichocki, Rafal Zdunek, Anh Huy Phan, Shun-ichi Amari, “Nonnegative Matrix and Tensor Factorizations”, John Wiley & Sons, Ltd, 2009.
Brett W. Bader, Andrey A. Puretskiy, Michael W. Berry, “Scenario Discovery Using Nonnegative Tensor Factorization”, J. Ruiz-Schulcloper and W.G. Kropatsch (Eds.): CIARP 2008, LNCS 5197, pp , 2008.
Tamara G. Kolda, “Multilinear operators for higher-order decompositions”, SANDIA REPORT, April 2006.

35 QUESTIONS?

