What Should We Learn From... Big Data: Theoretical Aspects from Proceedings of the IEEE Jan. 2016



Big data has burst into public awareness over the past few years as people have become increasingly aware of the massive amounts of data produced by social and scientific activity, and of its potential use for good or harm. On the research front, big data has spurred new activity across a range of fields, including statistics, machine learning, and computer systems. Many areas have been profoundly altered by the big data revolution, including wireless communications, speech processing, social networking, online commerce, medical informatics, and finance. In these areas, and in many others, analysis of the data yields valuable information that deepens understanding, improves decision making, and enhances the performance of predictive models.

This special issue, Big Data: Theoretical Aspects, published in the January 2016 Proceedings of the IEEE, highlights a number of algorithmic approaches that are fundamental to data analysis, both in formulating and in solving problems. These methods form part of the core of the field: a set of tools that can be applied to many specific application areas.

‘‘A Review of Relational Machine Learning for Knowledge Graphs’’ by Nickel et al. reviews how statistical models can be trained on large knowledge graphs and then used to predict new facts, such as missing edges in the graph. The authors also show how such statistical models of graphs can be combined with text-based information extraction methods to construct knowledge graphs automatically from the Web. Google’s Knowledge Vault project provides an example of such a combination.
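One representative latent-feature approach in this family is a bilinear scoring function over entity and relation embeddings (in the style of DistMult). The sketch below uses untrained toy embeddings, so the resulting ranking carries no real meaning; it is shown only to illustrate how such a model scores and ranks candidate facts. All names and dimensions are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy knowledge graph: random (untrained) entity and relation embeddings.
entities = {name: rng.normal(size=4)
            for name in ["Paris", "France", "Berlin", "Germany"]}
relations = {"capital_of": rng.normal(size=4)}

def score(subj, rel, obj):
    """Bilinear (DistMult-style) score: higher means the triple
    (subj, rel, obj) is judged more plausible by the model."""
    return float(np.sum(entities[subj] * relations[rel] * entities[obj]))

# Link prediction as ranking: answer the query (Paris, capital_of, ?)
# by scoring every candidate object entity.
candidates = ["France", "Germany", "Berlin"]
ranked = sorted(candidates,
                key=lambda o: score("Paris", "capital_of", o),
                reverse=True)
```

In a trained model the embeddings would be fitted to observed triples so that true facts score higher than corrupted ones; here the interface, not the accuracy, is the point.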

‘‘Learning to Hash for Indexing Big Data – A Survey’’ by Wang et al. describes new approaches that incorporate data-driven learning methods in the development of advanced hash functions. It also provides a comprehensive survey of the learning-to-hash framework and representative techniques of various types, including unsupervised, semisupervised, and supervised procedures.
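For context, a minimal data-independent baseline that learned hash functions aim to improve on is random-hyperplane hashing: each bit records which side of a random hyperplane a vector falls on, so nearby vectors tend to share most bits. The dimensions, bit counts, and perturbation below are arbitrary toy choices.

```python
import numpy as np

rng = np.random.default_rng(42)

def make_hash(dim, n_bits):
    """Random-hyperplane hashing: each bit is the sign of a random
    projection. Data-independent, unlike the learned hashes surveyed."""
    planes = rng.normal(size=(n_bits, dim))
    def h(x):
        return tuple((planes @ x > 0).astype(int))
    return h

def hamming(a, b):
    """Number of differing bits between two binary codes."""
    return sum(u != v for u, v in zip(a, b))

h = make_hash(dim=8, n_bits=16)
x = rng.normal(size=8)
near = x + 0.01 * rng.normal(size=8)   # tiny perturbation of x
far = rng.normal(size=8)               # unrelated random vector

# Nearby vectors should collide on (almost) all bits; unrelated
# vectors should differ on roughly half of them.
```

Learning-to-hash methods replace the random hyperplanes with projections fitted to the data (and, in the supervised case, to label information) so that semantic neighbors receive similar codes.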

‘‘Implementing Randomized Matrix Algorithms in Parallel and Distributed Environments’’ by Yang et al. reviews recent work on developing and implementing randomized matrix algorithms in large-scale parallel and distributed environments, with a focus on the theory and practical implementation of random projection and random sampling algorithms for very large and very overdetermined l1- and l2-regression problems.
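The core idea for the l2 case can be sketched in a few lines: compress a very overdetermined least-squares problem with a random projection, then solve the much smaller sketched problem. The sketch below uses a dense Gaussian projection for simplicity; the implementations the paper discusses use more scalable sketching transforms, and the problem sizes here are toy choices.

```python
import numpy as np

rng = np.random.default_rng(7)

# Very overdetermined least-squares problem: n rows >> d columns.
n, d = 5000, 10
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

# Exact solution, for reference.
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)

# Gaussian random projection: sketch the n-row problem down to
# m rows (m << n), then solve the small m x d problem instead.
m = 200
S = rng.normal(size=(m, n)) / np.sqrt(m)
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)

rel_err = np.linalg.norm(x_sketch - x_exact) / np.linalg.norm(x_exact)
```

With m chosen modestly larger than d, the sketched solution is close to the exact one at a fraction of the cost, which is what makes these algorithms attractive in distributed settings where passes over the full data are expensive.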

‘‘Foundational Principles for Large-Scale Inference: Illustrations Through Correlation Mining’’ by Hero and Rajaratnam presents a correlation-mining framework for large-scale inference from the unifying perspective of high-dimensional learning rates and sample complexity for different structured covariance models and different inference tasks.
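A toy version of the correlation-mining task reads as follows: in a high-dimensional regime with many more variables than samples, screen the sample correlation matrix for entries exceeding a threshold. The planted correlation, dimensions, and threshold below are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# High-dimensional regime: p variables, far fewer samples n.
n, p = 50, 200
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=n)  # plant one strong correlation

# Sample correlation matrix (p x p) from only n samples.
R = np.corrcoef(X, rowvar=False)

# Correlation screening: report variable pairs whose sample
# correlation magnitude exceeds a threshold.
thresh = 0.8
iu = np.triu_indices(p, k=1)
hits = [(i, j) for i, j, r in zip(iu[0], iu[1], R[iu]) if abs(r) > thresh]
```

The paper's point is that the reliability of such a screen depends sharply on how the threshold scales with n and p: with roughly 20,000 candidate pairs and only 50 samples, a threshold set too low would drown the one planted correlation in spurious discoveries.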

‘‘Resource Allocation for Statistical Estimation’’ by Berthet and Chandrasekaran adopts a general view of the notion of a resource and its effect on the quality of the corresponding data source. With statistical efficiency as the objective, several stylized examples involving inferential tasks such as parameter estimation and hypothesis testing based on heterogeneous data sources are discussed.

‘‘Magging: Maximin Aggregation for Inhomogeneous Large-Scale Data’’ by Bühlmann and Meinshausen shows that a simple tweak to the aggregation step, maximin aggregation of group-wise estimators, produces an estimator that captures effects common to all parts of the data. As a result, the procedure often yields better predictions than estimating pooled effects across an inhomogeneous data set.

‘‘Learning Reductions That Really Work’’ by Beygelzimer et al. presents a summary of the mathematical and computational techniques that have enabled learning reductions to address a wide class of tasks effectively.

‘‘Taking the Human Out of the Loop: A Review of Bayesian Optimization’’ by Shahriari et al. introduces Bayesian optimization, highlights some of its methodological aspects, and showcases a wide range of applications.
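The basic loop the review covers can be sketched as: fit a probabilistic surrogate to the evaluations made so far, then pick the next evaluation point by maximizing an acquisition function. The minimal NumPy version below uses a Gaussian-process surrogate with an RBF kernel and an upper-confidence-bound acquisition rule on a 1-D grid; the objective, kernel length scale, and UCB coefficient are all illustrative choices, not prescriptions from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

def f(x):
    """Black-box objective to maximize (unknown to the optimizer)."""
    return -(x - 0.3) ** 2 + 0.05 * np.sin(20 * x)

def rbf(a, b, ls=0.1):
    """RBF kernel matrix between two sets of 1-D points."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

grid = np.linspace(0.0, 1.0, 101)       # candidate evaluation points
X = list(rng.uniform(0.0, 1.0, size=2))  # two random initial evaluations
y = [f(x) for x in X]

for _ in range(15):
    Xa, ya = np.array(X), np.array(y)
    # Gaussian-process posterior mean and variance on the grid
    # (zero prior mean, unit prior variance, small jitter).
    K = rbf(Xa, Xa) + 1e-6 * np.eye(len(Xa))
    Ks = rbf(grid, Xa)
    mu = Ks @ np.linalg.solve(K, ya)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    # Upper confidence bound: trade off exploitation (mu) and
    # exploration (posterior uncertainty).
    ucb = mu + 2.0 * np.sqrt(np.maximum(var, 0.0))
    x_next = grid[int(np.argmax(ucb))]
    X.append(x_next)
    y.append(f(x_next))

best_x = X[int(np.argmax(y))]
```

The review covers far richer choices at every step of this loop, including other surrogates, acquisition functions such as expected improvement and Thompson sampling, and the practical issues that arise beyond this one-dimensional toy setting.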

‘‘Machine Learning in Genomic Medicine: A Review of Computational Problems and Data Sets’’ by Leung et al. provides an introduction to machine-learning tasks in genomic medicine. Modern biology allows high-throughput measurement of many cell variables, including gene expression, splicing, and protein binding to nucleic acids, all of which can be treated as training targets for predictive models. With the growing availability of large-scale data sets and advanced computational techniques such as deep learning, researchers in machine learning can help to usher in a new era of effective genomic medicine.

Reference:

S. Haykin, S. Wright, and Y. Bengio, “Big Data: Theoretical Aspects [Scanning the Issue],” Proceedings of the IEEE, vol. 104, no. 1, pp. 8–10, Jan. 2016.
