Gopalan, Prem K (Princeton University) “Scalable inference of discrete data: User behavior, networks and genetic variation” (2015)

News and Resources for Members of the IEEE Signal Processing Society

March 2015

PhD Theses

Gopalan, Prem K (Princeton University) “Scalable inference of discrete data: User behavior, networks and genetic variation” (2015) Advisor: Blei, David M.

Recent years have seen explosive growth in data, models and computation. Massive data sets and sophisticated probabilistic models are increasingly used in the fields of high-energy physics, biology, genetics and in personalization applications; however, many statistical algorithms remain inefficient, impeding scientific progress.

In this thesis, we present several efficient statistical algorithms for learning from massive discrete data sets. We focus on discrete data because complex, structured phenomena such as chromosome folding in three dimensions, human genetic variation, social network interactions and product ratings are often encoded as simple matrices of discrete numerical observations. Our algorithms derive from a Bayesian perspective and lie in the framework of directed graphical models and mean-field variational inference. Within this framework, we gain computational and statistical efficiency through modeling insights and through subsampling informative data during inference.

We begin with additive Poisson factorization models for recommending items to users based on user consumption or ratings. These models provide sparse latent representations of users and items, and capture the long-tailed distributions of user consumption. We use them as building blocks for article recommendation models by sharing latent spaces across readership and article text. We demonstrate that our algorithms scale to massive data sets, are easy to implement and provide competitive user recommendations. Then, we develop a Bayesian nonparametric model in which the latent representations of users and items grow to accommodate new data.
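As a rough illustration of the Poisson factorization idea described above, the sketch below scores a user-item pair by the inner product of non-negative latent vectors and treats that score as a Poisson rate for the observed count. The Gamma priors, the parameter values, and all function names (`sample_gamma_vector`, `poisson_rate`, `poisson_log_pmf`, `rank_items`) are assumptions made for this example; the abstract does not specify them, and this is not the thesis's inference algorithm.

```python
import math
import random


def sample_gamma_vector(k, shape, rate, rng):
    """Draw k non-negative latent components from a Gamma(shape, rate) prior.

    Assumption for illustration: a shape below 1 pushes most components
    toward zero, which is one way to obtain sparse representations.
    random.gammavariate takes (shape, scale), so scale = 1 / rate.
    """
    return [rng.gammavariate(shape, 1.0 / rate) for _ in range(k)]


def poisson_rate(theta_u, beta_i):
    """Additive rate: sum over latent components of user weight * item weight."""
    return sum(t * b for t, b in zip(theta_u, beta_i))


def poisson_log_pmf(y, rate):
    """Log-probability of observing count y under Poisson(rate), rate > 0."""
    return y * math.log(rate) - rate - math.lgamma(y + 1)


def rank_items(theta_u, item_factors, seen):
    """Rank unseen items for one user by their predicted Poisson rate."""
    scores = {
        item: poisson_rate(theta_u, beta_i)
        for item, beta_i in item_factors.items()
        if item not in seen
    }
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    rng = random.Random(0)
    k = 3  # hypothetical number of latent components
    theta = sample_gamma_vector(k, shape=0.3, rate=1.0, rng=rng)
    items = {name: sample_gamma_vector(k, 0.3, 1.0, rng) for name in "abc"}
    print(rank_items(theta, items, seen={"a"}))
```

In a fitted model the latent vectors would be posterior estimates rather than prior draws; the ranking step stays the same.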

In the second part of the thesis, we develop novel algorithms for discovering overlapping communities in large networks. These algorithms interleave non-uniform subsampling of the network with model estimation. Our network models capture the basic ways in which nodes connect to each other, through similarity and popularity, using mixed-membership representations and a generalized linear model formulation.
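One generic way to see why non-uniform subsampling can still support model estimation is stratified sampling with reweighting: node pairs are split into strata (here, linked and unlinked pairs), each stratum is sampled at its own rate, and each sample is scaled by the inverse of its sampling rate so that the estimate of a sum over all pairs stays unbiased. The sketch below is an illustration of that general principle only. The stratification, the function names `split_pairs` and `subsampled_sum`, and the toy graph are assumptions, not the thesis's actual sampling scheme.

```python
import random
from itertools import combinations


def split_pairs(nodes, links):
    """Split all node pairs of a toy graph into linked and unlinked strata.

    Enumerating every pair is only feasible for small graphs; it is done
    here so the example can be checked against the exact total.
    """
    link_set = {frozenset(edge) for edge in links}
    all_pairs = [frozenset(pair) for pair in combinations(nodes, 2)]
    linked = [p for p in all_pairs if p in link_set]
    unlinked = [p for p in all_pairs if p not in link_set]
    return linked, unlinked


def subsampled_sum(strata, sample_sizes, f, rng):
    """Unbiased estimate of sum(f(pair)) over all pairs in all strata.

    Each stratum is sampled with replacement at its own size m, and the
    sampled terms are scaled by len(stratum) / m to correct for the
    non-uniform sampling rate.
    """
    estimate = 0.0
    for pairs, m in zip(strata, sample_sizes):
        sample = [rng.choice(pairs) for _ in range(m)]
        estimate += len(pairs) / m * sum(f(p) for p in sample)
    return estimate


if __name__ == "__main__":
    rng = random.Random(0)
    linked, unlinked = split_pairs(range(6), [(0, 1), (1, 2), (2, 0), (3, 4)])
    # Sample the rare linked pairs more heavily than the unlinked ones.
    print(subsampled_sum([linked, unlinked], [3, 2], lambda p: 1.0, rng))
```

In a stochastic inference loop, `f` would be a per-pair gradient or sufficient-statistic term, and a fresh subsample would be drawn at every iteration.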

Finally, we present the TeraStructure algorithm to fit Bayesian models of genetic variation in human populations on tera-sample-sized data sets (10^12 observed genotypes, e.g., 1M individuals at 1M SNPs). On real genomic data collected from thousands of individuals, TeraStructure is faster than existing methods and recovers the latent population structure with equal accuracy. On genomic data simulated at tera-sample-size scale, TeraStructure is highly accurate and is the only method that can complete its analysis.
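To make the tera-sample scale concrete, the short calculation below multiplies out the example from the abstract (1M individuals at 1M SNPs) and estimates the raw storage for the genotype matrix. The 2-bit-per-genotype packing and the function name `genotype_matrix_size` are assumptions chosen for illustration; the abstract says nothing about how the data are stored.

```python
def genotype_matrix_size(individuals, snps, bits_per_genotype=2):
    """Return (number of genotypes, bytes needed) for a dense genotype matrix.

    Assumption for illustration: each genotype is packed into
    bits_per_genotype bits (2 bits is enough for the values 0, 1, 2
    plus a missing-data code).
    """
    genotypes = individuals * snps
    bytes_needed = genotypes * bits_per_genotype // 8
    return genotypes, bytes_needed


if __name__ == "__main__":
    genotypes, bytes_needed = genotype_matrix_size(1_000_000, 1_000_000)
    print(f"{genotypes:.0e} genotypes, about {bytes_needed / 1e9:.0f} GB packed")
```

Even under this compact packing, a single full pass over the matrix touches hundreds of gigabytes, which suggests why subsampling during inference matters at this scale.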

For details, please visit the thesis page.
