Reproducibility in Matrix and Tensor Decompositions: Focus on model match, interpretability, and uniqueness


By: Tülay Adali; Furkan Kantar; Mohammad Abu Baker Siddique Akhonda; Stephen Strother; Vince D. Calhoun; Evrim Acar

Data-driven solutions are playing an increasingly important role in numerous practical problems across multiple disciplines. The shift from traditional model-driven approaches to those that are data driven naturally emphasizes the importance of the explainability of solutions, as, in this case, the connection to a physical model is often not obvious. Explainability is a broad umbrella term that includes interpretability, but it also implies that the solutions need to be complete, in that one should be able to “audit” them, ask appropriate questions, and hence gain further insight into their inner workings [1]. Thus, interpretability, reproducibility, and, ultimately, our ability to generalize these solutions to unseen scenarios and situations are all strongly tied to the starting point of explainability.

Model-based solutions, through their natural connection to the physical model, are fully interpretable. As data-driven solutions, matrix and tensor decompositions (MTDs) provide an attractive middle ground. While allowing for the discovery of structure in the data in an unsupervised manner, they can also result in fully interpretable solutions, by which we mean we can associate the rows/columns of the final factor matrices with (physical) quantities of interest. In other data-driven solutions, such as multilayered neural networks, interpretability takes an indirect form and generally requires a second-level analysis, e.g., the generation of heat maps [2]. In MTDs, interpretability is direct due to their intimate connection to the linear blind source separation problem, where the assumption is that there are a number of linearly mixed latent variables of interest. This assumption has proved useful in an array of applications, and MTDs are being adopted across multiple domains, such as medical image analysis and fusion, health care, remote sensing, chemometrics, metabolomics, recommender systems, natural language processing, and physical sciences [3][4][5][6][7][8][9][10]. Note that we use matrices for data arranged as two-way arrays and tensors for N-way arrays, where N > 2.
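
For readers who prefer a concrete picture, the short sketch below illustrates the linear mixing assumption and the direct interpretability of the resulting factor matrices. The synthetic sources, the mixing matrix, and the use of FastICA from scikit-learn are illustrative assumptions, not the specific setup considered in this article.

```python
# Minimal sketch of the linear mixing model underlying MTD interpretability.
# The synthetic sources, mixing matrix, and choice of FastICA are illustrative
# assumptions, not the authors' specific method.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)

# Latent variables of interest (e.g., temporal activation patterns).
n_samples, n_sources, n_channels = 2000, 3, 8
t = np.linspace(0, 8, n_samples)
S = np.column_stack([
    np.sin(2 * t),                # smooth oscillation
    np.sign(np.sin(3 * t)),       # square wave
    rng.laplace(size=n_samples),  # heavy-tailed, noise-like source
])

# Linear mixing: each observed channel is a weighted sum of the latent sources.
A = rng.normal(size=(n_sources, n_channels))  # mixing matrix
X = S @ A                                     # observations, (n_samples, n_channels)

# Matrix decomposition via ICA: X is factored into estimated sources and a mixing matrix.
ica = FastICA(n_components=n_sources, random_state=0)
S_hat = ica.fit_transform(X)   # columns correspond directly to latent variables
A_hat = ica.mixing_            # X is approximately S_hat @ A_hat.T + ica.mean_

# Each estimated column can be associated with one true source, up to the
# usual BSS ambiguities of sign, scaling, and permutation.
corr = np.abs(np.corrcoef(S.T, S_hat.T)[:n_sources, n_sources:])
print(np.round(corr, 2))  # close to a permutation matrix: one large entry per row/column
```

The sign and permutation ambiguities noted in the comments are precisely the kind of indeterminacies that make uniqueness, and hence reproducibility, a central concern for MTDs.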

While this rapidly growing interest in the development and use of factorization methods is very encouraging, a serious concern is the lack of formalizations, such as the “reproducibility checklist” that has been developed for supervised learning; see, e.g., [11]. The reproducibility basics in this checklist that relate to the description and specification of models, algorithms, data sets, and code are common to all machine learning approaches, including MTD. However, the last group of items in the checklist in [11], about reported experimental results, i.e., computational reproducibility, has not been emphasized for unsupervised learning and for MTD. While many success stories are reported using the factorization of a given set of observations across application areas, e.g., noting the identification of putative biomarkers of disease in the analysis of neuroimaging data, in many instances, results are reported without much consideration of their computational reproducibility.
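
One simple way to probe computational reproducibility in this setting is to rerun the same decomposition from different random initializations and quantify how consistently the estimated factors are recovered. The sketch below does this for an NMF of synthetic data; the data, the choice of NMF, and the matching-based consistency score are illustrative assumptions, not items from the checklist in [11].

```python
# Minimal sketch of a cross-run consistency check for a matrix decomposition.
# The synthetic data, the choice of NMF, and the similarity score are
# illustrative assumptions, not the authors' prescribed procedure.
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.decomposition import NMF

rng = np.random.default_rng(42)

# Synthetic nonnegative data generated from a known low-rank structure.
n_samples, n_features, rank = 300, 40, 4
W_true = rng.gamma(shape=2.0, scale=1.0, size=(n_samples, rank))
H_true = rng.gamma(shape=2.0, scale=1.0, size=(rank, n_features))
X = W_true @ H_true + 0.01 * rng.random((n_samples, n_features))

def fit_components(seed):
    """Factor X with NMF from a given random initialization; return row-normalized H."""
    model = NMF(n_components=rank, init="random", random_state=seed, max_iter=500)
    model.fit(X)
    H = model.components_
    return H / np.linalg.norm(H, axis=1, keepdims=True)

def match_score(H1, H2):
    """Mean cosine similarity after optimally pairing components across two runs."""
    sim = H1 @ H2.T                         # cosine similarities (rows are unit norm)
    row, col = linear_sum_assignment(-sim)  # pairing that maximizes total similarity
    return sim[row, col].mean()

runs = [fit_components(seed) for seed in range(5)]
scores = [match_score(runs[i], runs[j])
          for i in range(len(runs)) for j in range(i + 1, len(runs))]
print(f"cross-run consistency: mean={np.mean(scores):.3f}, min={np.min(scores):.3f}")
```

Scores close to one across all run pairs suggest that the factorization is stable under reinitialization; low or highly variable scores flag results whose computational reproducibility deserves scrutiny before interpretation.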
