Explainability in Graph Data Science: Interpretability, replicability, and reproducibility of community detection

By: Selin Aviyente and Abdullah Karaaslanli

In many modern data science problems, data are represented by a graph (network), e.g., social, biological, and communication networks. Over the past decade, numerous signal processing and machine learning (ML) algorithms have been introduced for analyzing graph structured data. With the growth of interest in graphs and graph-based learning tasks in a variety of applications, there is a need to explore explainability in graph data science. In this article, we aim to approach the issue of explainable graph data science, focusing on one of the most fundamental learning tasks, community detection, as it is usually the first step in extracting information from graphs. A community is a dense subnetwork within a larger network that corresponds to a specific function. Despite the success of different community detection methods on synthetic networks with strong modular structure, much remains unknown about the quality and significance of the outputs of these algorithms when applied to real-world networks with unknown modular structure. Inspired by recent advances in explainable artificial intelligence (AI) and ML, in this article, we present methods and metrics from network science to quantify three different aspects of explainability, i.e., interpretability, replicability, and reproducibility, in the context of community detection.
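As a minimal illustration of the evaluation setting described above, the sketch below, assuming Python with the networkx and scikit-learn packages, generates a synthetic network with planted, strongly modular communities and scores a detected partition against the known ground truth; the planted partition generator, the greedy modularity algorithm, and the normalized mutual information score are illustrative choices rather than methods prescribed here.

    # Illustrative sketch: evaluating community detection on a synthetic
    # network whose ground-truth community structure is known.
    import networkx as nx
    from networkx.algorithms import community
    from sklearn.metrics import normalized_mutual_info_score

    # Planted partition graph: 4 groups of 32 nodes, dense within groups,
    # sparse between groups (a strong modular structure).
    G = nx.planted_partition_graph(l=4, k=32, p_in=0.3, p_out=0.02, seed=42)

    # Nodes are numbered consecutively within each planted group,
    # so the ground-truth label of node v is v // 32.
    true_labels = [v // 32 for v in G.nodes()]

    # Detect communities with greedy modularity maximization
    # (one of many possible algorithms).
    detected = community.greedy_modularity_communities(G)
    pred_labels = [0] * G.number_of_nodes()
    for c_id, nodes in enumerate(detected):
        for v in nodes:
            pred_labels[v] = c_id

    # Agreement with ground truth; close to 1 for strongly modular graphs.
    print("NMI:", normalized_mutual_info_score(true_labels, pred_labels))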

Introduction

Modern data analysis involves large sets of structured data, where the structure carries critical information about the nature of the data. Graphs are typically used as mathematical tools to describe such structure. Graphs are ubiquitous in the real world, representing objects and their relationships in varied domains, including social networks, e-commerce networks, biological networks, traffic networks, and brain networks [1]. As a result, numerous signal processing and ML tasks have been extended to graph-structured data, e.g., graph signal processing (GSP), graph topology inference, node classification, link prediction, community detection, and supervised learning with graphs [2]. Among these tasks, community detection is fundamental for uncovering links between structure and function in complex networks. The community detection problem is challenging, in part, because it is not well posed. For this reason, researchers have proposed a variety of definitions of what constitutes a community and an array of algorithms corresponding to these definitions [3]. While the success of these algorithms has been quantified for synthetic networks with ground truth information, it is much harder to evaluate the accuracy, significance, and meaning of the community structure obtained for real networks. For these results to be useful in a variety of scientific and technological studies, the community detection algorithms and their outputs need to be made transparent.
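As a small illustration of how different community definitions can lead to different results on the same data, the sketch below, assuming Python with networkx, applies two algorithms built on different notions of a community (greedy modularity maximization and label propagation, chosen here purely for illustration) to a classic benchmark social network and reports the number of communities and the modularity of each partition.

    # Illustrative sketch: two community detection methods applied to the
    # same real network can return noticeably different partitions.
    import networkx as nx
    from networkx.algorithms import community

    G = nx.karate_club_graph()  # classic small social network, 34 nodes

    # Two algorithms built on different notions of what a community is.
    part_modularity = list(community.greedy_modularity_communities(G))
    part_labelprop = list(community.label_propagation_communities(G))

    print("greedy modularity:", len(part_modularity), "communities,",
          "Q =", round(community.modularity(G, part_modularity), 3))
    print("label propagation:", len(part_labelprop), "communities,",
          "Q =", round(community.modularity(G, part_labelprop), 3))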

Over the past decade, the explainability of data-driven methods, e.g., AI and ML, has been a focus of research in the ML and data mining communities. While the ML community has mostly focused on describing how black boxes work, the data mining community is more interested in explaining the decisions of such models, even without understanding how the opaque decision systems work internally. Recent survey articles on the topic offer a multitude of terminologies, such as interpretability, accountability, responsibility, transparency, comprehensibility, accuracy, and understandability, to evaluate different dimensions of explainability [4], [5]. Along with these different terminologies, a variety of methods, including black-box input–output analysis, sensitivity analysis, saliency maps, attention heat maps, and approximation of the predictions using simple proxy models, have been introduced [4].
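As an example of the last of these techniques, the sketch below, assuming Python with scikit-learn, approximates the predictions of a black-box classifier with a shallow decision tree acting as a simple proxy model; the specific models and the synthetic data set are illustrative choices only.

    # Illustrative sketch: approximating a black-box model's predictions
    # with a simple, interpretable proxy model.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.tree import DecisionTreeClassifier, export_text
    from sklearn.metrics import accuracy_score

    X, y = make_classification(n_samples=1000, n_features=8, random_state=0)

    # "Black box": an ensemble whose internal logic is hard to read directly.
    black_box = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

    # Proxy: a shallow tree trained to mimic the black box's outputs
    # (not the true labels), so its rules describe the black box's behavior.
    proxy = DecisionTreeClassifier(max_depth=3, random_state=0)
    proxy.fit(X, black_box.predict(X))

    # Fidelity: how closely the proxy reproduces the black-box predictions.
    print("fidelity:", accuracy_score(black_box.predict(X), proxy.predict(X)))
    print(export_text(proxy))  # human-readable rules of the proxy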
