What Should We Learn From... Big Data: Theoretical Aspects from Proceedings of the IEEE Jan. 2016

January 2016

Big data has burst into public awareness over the past few years as people have become increasingly aware of the massive amount of data produced by social and scientific activities, and of its potential to be used for good or harm. On the research front, big data has spurred new activity across a range of fields, including statistics, machine learning, and computer systems. Many areas have been profoundly altered by the big data revolution, including wireless communications, speech processing, social networking, online commerce, medical informatics, and finance. In these areas, and in many others, analysis of the data yields valuable information that deepens understanding, improves decision making, and enhances the performance of predictive models. The special issue Big Data: Theoretical Aspects, published in Proceedings of the IEEE in Jan. 2016, highlights a number of algorithmic approaches that are fundamental to data analysis, both in formulating and in solving problems. These methods form part of the core of the field: a set of tools that can be applied to many specific application areas.

"A Review of Relational Machine Learning for Knowledge Graphs" by Nickel et al. reviews how statistical models can be trained on large knowledge graphs and then used to predict new facts, that is, new edges in the graph. It also shows how such statistical models of graphs can be combined with text-based information extraction methods to construct knowledge graphs automatically from the Web. Google’s Knowledge Vault project is an example of such a combination.

"Learning to Hash for Indexing Big Data - A Survey" by Wang et al. describes new approaches that incorporate data-driven learning methods into the design of advanced hash functions. It also provides a comprehensive survey of the learning-to-hash framework and of representative unsupervised, semisupervised, and supervised techniques.
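To make the idea of a data-driven hash function concrete, here is a minimal sketch of one simple unsupervised scheme, PCA hashing: project each point onto the leading principal directions of the data and keep only the sign of each projection as one bit. The choice of PCA hashing, the function names (`fit_pca_hash`, `hash_codes`, `hamming_distance`), and the synthetic two-cluster data are assumptions made for illustration, not details taken from the survey.

```python
import numpy as np

def fit_pca_hash(X, n_bits):
    """Learn a hash function from data: the mean and top principal directions."""
    mean = X.mean(axis=0)
    # Right singular vectors of the centered data are the principal directions.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_bits].T  # projection matrix of shape (d, n_bits)

def hash_codes(X, mean, W):
    """Binary code: one bit per projection, set when the projection is positive."""
    return ((X - mean) @ W > 0).astype(np.uint8)

def hamming_distance(code, codes):
    """Number of differing bits between one code and each row of `codes`."""
    return np.count_nonzero(codes != code, axis=1)

# Synthetic data (hypothetical): two well-separated clusters in 16 dimensions.
rng = np.random.default_rng(0)
centers = np.zeros((2, 16))
centers[0, 0], centers[1, 0] = 6.0, -6.0
X = np.vstack([c + rng.standard_normal((100, 16)) for c in centers])

mean, W = fit_pca_hash(X, n_bits=4)
codes = hash_codes(X, mean, W)

# Rank all points by Hamming distance to the first point's code.
dist = hamming_distance(codes[0], codes)
print("mean distance within cluster:", dist[:100].mean())
print("mean distance across clusters:", dist[100:].mean())
```

Because the first principal direction here aligns with the axis that separates the two clusters, the first bit alone tells them apart. Learned hash functions of the kind the survey covers are generally more elaborate than this sign-of-projection baseline.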
"Implementing Randomized Matrix Algorithms in Parallel and Distributed Environments" by Yang et al. reviews recent work on developing and implementing randomized matrix algorithms in large-scale parallel and distributed environments. It focuses on the theory and practical implementation of random projection and random sampling algorithms for very large and very overdetermined ℓ1- and ℓ2-regression problems.

"Foundational Principles for Large-Scale Inference: Illustrations Through Correlation Mining" by Hero and Rajaratnam presents a correlation-mining framework for large-scale inference from the unifying perspective of high-dimensional learning rates and sample complexity for different structured covariance models and different inference tasks.

"Resource Allocation for Statistical Estimation" by Berthet and Chandrasekaran adopts a general view of the notion of a resource and its effect on the quality of the corresponding data source. With statistical efficiency as the objective, it discusses several stylized examples involving inferential tasks, such as parameter estimation and hypothesis testing, based on heterogeneous data sources.

"Magging: Maximin Aggregation for Inhomogeneous Large-Scale Data" by Bühlmann and Meinshausen shows that a tweak to the step that aggregates estimates from subsets of the data can produce an estimator of the effects that are common to all the data. The resulting procedure often predicts better than an estimator of pooled effects.

"Learning Reductions That Really Work" by Beygelzimer et al. summarizes the mathematical and computational techniques that have enabled learning reductions to address a wide class of tasks effectively.

"Taking the Human Out of the Loop: A Review of Bayesian Optimization" by Shahriari et al. introduces Bayesian optimization, highlights some of its methodological aspects, and showcases a wide range of applications.
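As a small illustration of the random projection idea for overdetermined ℓ2-regression mentioned in the summary of Yang et al. above, the sketch below compresses a tall least-squares problem with a random matrix and solves the smaller problem instead. The dense Gaussian projection, the function name `sketched_least_squares`, the problem sizes, and the synthetic data are all assumptions chosen for brevity; this is not the implementation from the paper.

```python
import numpy as np

def sketched_least_squares(A, b, sketch_rows, seed=0):
    """Approximately solve min_x ||A x - b||_2 using a Gaussian random projection."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    # Dense Gaussian sketch, scaled so that E[S.T @ S] is the identity.
    S = rng.standard_normal((sketch_rows, n)) / np.sqrt(sketch_rows)
    # Solve the (sketch_rows x d) problem in place of the (n x d) one.
    x, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x

# Synthetic, very overdetermined problem (hypothetical sizes): n >> d.
rng = np.random.default_rng(1)
n, d = 5000, 10
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.1 * rng.standard_normal(n)

x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
x_sketch = sketched_least_squares(A, b, sketch_rows=400)

res_exact = np.linalg.norm(A @ x_exact - b)
res_sketch = np.linalg.norm(A @ x_sketch - b)
print(f"residual ratio (sketched / exact): {res_sketch / res_exact:.3f}")
```

The sketched solution can never have a smaller residual than the exact one, so the ratio printed is at least 1; the aim is for it to stay close to 1 with far fewer rows. A dense Gaussian sketch is used here only because it is short to write: applying it costs more than solving the original problem directly, which is why practical methods rely on faster projections or on row sampling.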
"Machine Learning in Genomic Medicine: A Review of Computational Problems and Data Sets" by Leung et al. provides an introduction to machine-learning tasks in genomic medicine. Modern biology allows high-throughput measurement of many cell variables, including gene expression, splicing, and protein binding to nucleic acids, all of which can be treated as training targets for predictive models. With the growing availability of large-scale data sets and advanced computational techniques such as deep learning, researchers in machine learning can help to usher in a new era of effective genomic medicine.

Reference:

Haykin, S.; Wright, S.; Bengio, Y. Big Data: Theoretical Aspects [Scanning the Issue]. Proceedings of the IEEE, Jan. 2016, pp. 8-10.
