Contributed by Fernando Gama, Antonio G. Marques, Geert Leus, and Alejandro Ribeiro, and based on the original article, "Convolutional Neural Network Architectures for Signals Supported on Graphs," published in the IEEE Transactions on Signal Processing, vol. 67, no. 4, 2019, and the SPS webinar, Graph Neural Networks, available on the SPS Resource Center.
Filtering is the fundamental operation upon which the field of signal processing is built. Loosely speaking, filtering is a mapping between signals, typically used to extract useful information (the output signal) from data (the input signal). Arguably, the most popular type of filter is the linear and shift-invariant filter (i.e., one whose behavior is independent of the starting point of the signal), which can be computed efficiently by leveraging the convolution operation. A convolution computes each value of the output signal as a linear combination of nearby values of the input signal, weighted by the filter coefficients. The convolution therefore relies heavily on the implicit assumption that the relationship between nearby values of the input signal carries useful information. This is the case for time signals (nearby values correspond to consecutive time instants) and for images (nearby values correspond to contiguous pixels).
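As a concrete illustration, the short sketch below computes such a convolution in NumPy; the signal and the filter coefficients are arbitrary values chosen for the example.

```python
import numpy as np

# Arbitrary example input signal and filter coefficients.
x = np.array([1.0, 2.0, 0.0, -1.0, 3.0])   # input signal
h = np.array([0.5, 0.25, 0.25])            # filter coefficients

# Each output value is a linear combination of nearby input values:
# y[n] = sum_k h[k] * x[n - k]
y = np.convolve(x, h)
```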
Many modern systems, however, generate data that do not satisfy the assumption that nearby values are related. A paradigmatic example is that of network systems, where the values of the signals are related by the connectivity structure of the network: if two elements of the system are connected, then their corresponding signal values are expected to be related. Networks are often described by a mathematical object known as a graph, which consists of two sets: one containing the nodes, describing the elements of the system, and one containing the edges, describing their interconnections. Therefore, the signals generated by network systems are known as graph signals [1].
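In code, a graph and a graph signal admit a simple representation; the four-node network below is hypothetical and serves only to fix ideas.

```python
import numpy as np

# A hypothetical 4-node undirected network, described by its adjacency
# matrix S: S[i, j] = 1 if nodes i and j are connected, 0 otherwise.
S = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# A graph signal assigns one value to each node of the graph.
x = np.array([2.0, -1.0, 0.5, 3.0])
```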
Due to the arbitrary relationship between values in a graph signal (determined by the network structure), simply applying the convolution operation to these signals will not produce a reasonable output, since contiguous entries are not necessarily related. Therefore, to process graph signals satisfactorily, we need to extend the convolution so that it leverages the underlying graph structure of the data. More specifically, we can define the graph convolution as a linear combination of signal values at neighboring nodes. In other words, the output of a graph convolution at any given node is computed by receiving the signal values of neighboring nodes, aggregating them, and weighting the result by a filter coefficient. Repeating this operation allows a node to access information located farther away in the graph. Interestingly enough, the graph convolution boils down to the traditional convolution operation when considering graphs that describe discrete-time signals. A filter that processes graph signals and can be computed using a graph convolution is known as a graph convolutional filter. There exist many other types of graph filters [2], generally defined as mappings between graph signals that exploit the underlying network structure, whose outputs cannot be computed using a graph convolution.
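Following the common convention in the graph signal processing literature of collecting the connectivity in a matrix S (e.g., the adjacency matrix above), the graph convolution can be written as y = Σ_k h_k S^k x, where each multiplication by S is one round of exchanges with one-hop neighbors. The sketch below implements this, reusing S and x from the previous snippet; the filter taps are arbitrary.

```python
def graph_convolution(S, x, h):
    """Graph convolutional filter: y = sum_k h[k] * (S^k x).

    Multiplying by S aggregates the values of one-hop neighbors, so the
    coefficient h[k] weighs information located k hops away.
    """
    y = np.zeros_like(x)
    z = x.copy()            # z holds S^k x, starting at k = 0
    for hk in h:
        y += hk * z         # add the k-hop contribution
        z = S @ z           # one more round of neighborhood exchanges
    return y

h = np.array([1.0, 0.5, 0.25])   # arbitrary filter taps
y = graph_convolution(S, x, h)
```

When S is the cyclic shift matrix of the directed cycle graph that describes discrete time, S^k x is the signal delayed by k samples, and the expression above reduces to the classical convolution.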
Designing an appropriate graph filter [3] often requires additional information about the input signal, beyond the knowledge that its structure is described by a given underlying graph support. In these cases, provided that several instances of the signals to be processed are available, we can instead use optimization techniques to find the best filters for a given task and a given dataset. This is typically known as an empirical risk minimization problem. Additionally, we must take into consideration that the convolutional filter is a linear operation, while oftentimes we want to capture nonlinear relationships between input and output.
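Since the output of a graph convolution is linear in the filter coefficients, empirical risk minimization with a squared-error loss reduces, in this simple case, to an ordinary least-squares problem. The sketch below illustrates this; the function name, the toy data, and the noise level are illustrative assumptions, not part of the original article.

```python
def fit_filter(S, X, Y, K):
    """Least-squares fit of K+1 filter taps to input/output pairs (x_i, y_i),
    minimizing the empirical risk sum_i ||y_i - sum_k h_k S^k x_i||^2."""
    A_blocks, b_blocks = [], []
    for x, y in zip(X, Y):
        Z = [x]
        for _ in range(K):
            Z.append(S @ Z[-1])            # columns S^0 x, ..., S^K x
        A_blocks.append(np.stack(Z, axis=1))
        b_blocks.append(y)
    A = np.vstack(A_blocks)                # the output is linear in the taps h
    b = np.concatenate(b_blocks)
    h, *_ = np.linalg.lstsq(A, b, rcond=None)
    return h

# Toy training set generated by a "true" filter plus noise (illustrative only),
# reusing S and graph_convolution from the sketches above.
rng = np.random.default_rng(0)
X = [rng.standard_normal(4) for _ in range(50)]
h_true = np.array([1.0, -0.5, 0.25])
Y = [graph_convolution(S, x, h_true) + 0.01 * rng.standard_normal(4) for x in X]

h_hat = fit_filter(S, X, Y, K=2)           # approximately recovers h_true
```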
To capture nonlinear relationships, we can simply apply a nonlinear function after the graph convolutional filter. This nonlinear function is typically a pointwise function (acting independently on each node) and is known as an activation function. The block consisting of a graph convolutional filter followed by a pointwise nonlinear function is known as a graph perceptron [4]. To further widen the range of nonlinear relationships between input and output that this structure can capture, we can cascade several of these blocks to obtain a graph neural network (GNN) [5]. Thus, a GNN is a cascade of graph convolutional filters, each followed by a pointwise nonlinearity, where each graph perceptron block is called a layer. Given a dataset, we can obtain the best filter coefficients for each layer by leveraging optimization techniques such as stochastic gradient descent to solve the empirical risk minimization problem. The gradient can be computed efficiently across the layers using the backpropagation algorithm.
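The following is a minimal sketch of this architecture in PyTorch, whose automatic differentiation provides the backpropagation step; the number of layers, the number of taps, the ReLU activation, and the toy data are all illustrative choices rather than prescriptions from the article.

```python
import torch

# Toy problem: a random symmetric graph on N nodes and hypothetical
# (input, target) signal pairs standing in for a real dataset.
N = 20
A = torch.rand(N, N)
S = ((A + A.T) > 1.2).float()              # symmetric 0/1 adjacency matrix
dataset = [(torch.randn(N), torch.randn(N)) for _ in range(100)]

class GraphPerceptron(torch.nn.Module):
    """One layer: a graph convolutional filter followed by a pointwise
    nonlinearity (ReLU here, a common choice)."""
    def __init__(self, K):
        super().__init__()
        self.h = torch.nn.Parameter(0.1 * torch.randn(K + 1))  # trainable taps

    def forward(self, S, x):
        y, z = 0.0, x
        for hk in self.h:
            y = y + hk * z                 # k-hop contribution
            z = S @ z                      # one more neighborhood exchange
        return torch.relu(y)

class GNN(torch.nn.Module):
    """A GNN: a cascade of graph perceptron layers."""
    def __init__(self, num_layers, K):
        super().__init__()
        self.layers = torch.nn.ModuleList(
            GraphPerceptron(K) for _ in range(num_layers))

    def forward(self, S, x):
        for layer in self.layers:
            x = layer(S, x)
        return x

# Empirical risk minimization by stochastic gradient descent; autograd
# backpropagates the gradient of the loss through the layers.
model = GNN(num_layers=2, K=3)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
for x, y in dataset:
    opt.zero_grad()
    loss = torch.mean((model(S, x) - y) ** 2)
    loss.backward()
    opt.step()
```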
There are several extensions of GNNs that can be easily obtained by leveraging signal processing concepts. For example, by using a bank of graph filters [6] at each layer (see the sketch below), we can process high-dimensional graph signals in which the value at each node is described by a vector. Another example is to go beyond graph convolutions and replace the filter at each layer with more complex (but more descriptive) non-convolutional graph filters [7]. Different implementations of graph convolutions (analogous to FIR, IIR, or ARMA filters) can also help solve the optimization problem more efficiently. Finally, we note that by leveraging nonlinear graph filters [8] (such as median filters) as activation functions, instead of pointwise nonlinearities, we can also increase the capability of GNNs. The concepts of spectral domain and frequency response can be used to derive further properties of GNNs and to understand some aspects of their observed superior performance. For example, we can prove that, with adequate frequency responses of the filters involved, GNNs are stable [9] to changes in the underlying graph topology (i.e., if we run the same GNN on two similar graphs, the outputs will be similar). We can also observe that the effect of the nonlinearities is to demodulate the signal into lower frequencies, where it can be processed stably.
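As a sketch of the first extension, and assuming the common convention that the F features per node are stored as an N × F matrix X with one F × G coefficient matrix H_k per tap, a bank of graph filters computes Y = Σ_k (S^k X) H_k; the dimensions below are arbitrary.

```python
def graph_filter_bank(S, X, H):
    """MIMO graph convolution: Y = sum_k (S^k X) H[k].

    X is N x F (an F-dimensional feature vector per node) and each H[k]
    is F x G, so Y is N x G: G output features per node."""
    Y = np.zeros((X.shape[0], H[0].shape[1]))
    Z = X                                  # Z holds S^k X
    for Hk in H:
        Y += Z @ Hk                        # mix the F features into G outputs
        Z = S @ Z                          # one more neighborhood exchange
    return Y

# Arbitrary dimensions: 4 nodes, 2 input features, 3 output features, 3 taps;
# S is the adjacency matrix from the earlier NumPy sketches.
X = np.random.randn(4, 2)
H = [np.random.randn(2, 3) for _ in range(3)]
Y = graph_filter_bank(S, X, H)
```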
GNNs have been successfully applied to a myriad of problems, ranging from abstract networks (where the graphs are inferred from data) to physical networks (where the graphs are given by the physical connections between the nodes). GNNs have been demonstrably useful in learning distributed behaviors in multi-agent systems [10], in approximately solving optimal power flow [11] in smart grids, and in efficiently allocating resources [12] in wireless networks. A main advantage of GNNs is the distributed nature of the operations involved, which allows them to scale seamlessly to larger networks.
[1] A. Sandryhaila and J. M. F. Moura, “Discrete Signal Processing on Graphs,” in IEEE Transactions on Signal Processing, vol. 61, no. 7, pp. 1644-1656, Apr. 2013, doi: 10.1109/TSP.2013.2238935.
[2] M. Coutino, E. Isufi and G. Leus, “Advances in Distributed Graph Filtering,” in IEEE Transactions on Signal Processing, vol. 67, no. 9, pp. 2320-2333, May 2019, doi: 10.1109/TSP.2019.2904925.
[3] S. Segarra, A. G. Marques and A. Ribeiro, “Optimal Graph-Filter Design and Applications to Distributed Linear Network Operators,” in IEEE Transactions on Signal Processing, vol. 65, no. 15, pp. 4117-4131, Aug. 2017, doi: 10.1109/TSP.2017.2703660.
[4] F. Gama, E. Isufi, G. Leus and A. Ribeiro, “Graphs, Convolutions, and Neural Networks: From Graph Filters to Graph Neural Networks,” in IEEE Signal Processing Magazine, vol. 37, no. 6, pp. 128-138, Nov. 2020, doi: 10.1109/MSP.2020.3016143.
[5] F. Gama, A. G. Marques, G. Leus and A. Ribeiro, “Convolutional Neural Network Architectures for Signals Supported on Graphs,” in IEEE Transactions on Signal Processing, vol. 67, no. 4, pp. 1034-1049, Feb. 2019, doi: 10.1109/TSP.2018.2887403.
[6] F. Gama, A. G. Marques, G. Leus and A. Ribeiro, “Convolutional Neural Network Architectures for Signals Supported on Graphs,” in IEEE Transactions on Signal Processing, vol. 67, no. 4, pp. 1034-1049, Feb. 2019, doi: 10.1109/TSP.2018.2887403.
[7] “EdgeNets: Edge Varying Graph Neural Networks,” arXiv preprint, https://arxiv.org/abs/2001.07620.
[8] L. Ruiz, F. Gama, A. G. Marques and A. Ribeiro, “Invariance-Preserving Localized Activation Functions for Graph Neural Networks,” in IEEE Transactions on Signal Processing, vol. 68, pp. 127-141, 2020, doi: 10.1109/TSP.2019.2955832.
[9] F. Gama, J. Bruna and A. Ribeiro, “Stability Properties of Graph Neural Networks,” in IEEE Transactions on Signal Processing, vol. 68, pp. 5680-5695, 2020, doi: 10.1109/TSP.2020.3026980.
[10] “Decentralized Control with Graph Neural Networks,” arXiv preprint, https://arxiv.org/abs/2012.14906.
[11] D. Owerko, F. Gama and A. Ribeiro, “Optimal Power Flow Using Graph Neural Networks,” in Proc. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 5930-5934, doi: 10.1109/ICASSP40776.2020.9053140.
[12] M. Eisen and A. Ribeiro, “Optimal Wireless Resource Allocation With Random Edge Graph Neural Networks,” in IEEE Transactions on Signal Processing, vol. 68, pp. 2977-2991, 2020, doi: 10.1109/TSP.2020.2988255.