Chris Schwiegelshohn: On Coresets for Logistic Regression
26 September 2018
Manno, Galleria 1, 2nd floor, room G1-204 @12:00
Coresets are one of the central methods to facilitate the analysis of large data sets. We continue a recent line of research applying the theory of coresets to logistic regression.
First, we show a negative result, namely that no strongly sublinear-sized coresets exist for logistic regression.
To deal with intractable worst-case instances, we introduce a complexity measure $\mu(X)$, which quantifies the hardness of compressing a data set for logistic regression. $\mu(X)$ has an intuitive statistical interpretation that may be of independent interest.
For data sets with bounded $\mu(X)$-complexity, we show that a novel sensitivity sampling scheme produces the first provably sublinear $(1\pm\varepsilon)$-coreset.
Our algorithms are viable in practice, comparing favorably to uniform sampling as well as to state-of-the-art methods in the area.
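
For readers unfamiliar with the idea, the following is a minimal sketch of the general template behind sensitivity sampling: draw points with non-uniform probabilities and reweight each drawn point by the inverse of its expected count, so that the weighted loss on the sample is an unbiased estimate of the loss on the full data. It is not the scheme presented in the talk. The function names (logistic_loss, sample_coreset) are hypothetical, and the sampling scores (Euclidean row norms mixed with a uniform term) are only an assumed stand-in for true sensitivity bounds.

```python
import math
import random


def logistic_loss(points, labels, beta, weights=None):
    """Weighted sum of logistic losses ln(1 + exp(-y_i * <x_i, beta>))."""
    if weights is None:
        weights = [1.0] * len(points)
    total = 0.0
    for x, y, w in zip(points, labels, weights):
        margin = y * sum(xj * bj for xj, bj in zip(x, beta))
        # Numerically stable evaluation of ln(1 + exp(-margin)).
        if margin > 0:
            total += w * math.log1p(math.exp(-margin))
        else:
            total += w * (-margin + math.log1p(math.exp(margin)))
    return total


def sample_coreset(points, k, rng):
    """Draw k indices by importance sampling and return (indices, weights).

    Assumption: the score of a point is its Euclidean norm, mixed half and
    half with the uniform distribution. This is a placeholder proxy, not
    the sensitivity bound from the talk.
    """
    n = len(points)
    norms = [math.sqrt(sum(v * v for v in x)) for x in points]
    total_norm = sum(norms)
    if total_norm == 0.0:
        probs = [1.0 / n] * n  # all-zero data: fall back to uniform
    else:
        probs = [0.5 * nm / total_norm + 0.5 / n for nm in norms]
    idx = rng.choices(range(n), weights=probs, k=k)
    # Weight 1 / (k * p_i) makes the weighted sample loss unbiased.
    weights = [1.0 / (k * probs[i]) for i in idx]
    return idx, weights


if __name__ == "__main__":
    rng = random.Random(0)
    n, d, k = 5000, 5, 500
    points = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    labels = [1.0 if sum(x) + rng.gauss(0, 1) > 0 else -1.0 for x in points]
    beta = [0.5] * d

    idx, w = sample_coreset(points, k, rng)
    full = logistic_loss(points, labels, beta)
    approx = logistic_loss([points[i] for i in idx],
                           [labels[i] for i in idx], beta, w)
    print("full loss:", full, "estimate from sample:", approx)
```

Note that this sketch only checks the loss at a single parameter vector beta. A $(1\pm\varepsilon)$-coreset has to approximate the loss for all parameter vectors simultaneously, which is what makes the choice of sampling probabilities delicate.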

Joint work with Alexander Munteanu, Christian Sohler, and David Woodruff. To appear at NIPS 2018.

The speaker

Chris Schwiegelshohn is currently a researcher at Sapienza University of Rome. He did his PhD in Dortmund with a thesis on "Algorithms for Large-Scale Graph and Clustering Problems". Chris' research interests include streaming and approximation algorithms as well as machine learning.


Registration is welcome

Pizza and drinks will be offered at the end of the talk.
If you plan to attend, please register in a timely fashion at the
following link so that we will have no shortage of food.