List of publications
Privacy
- N. Mishra and M. Sandler, Privacy via Pseudorandom Sketches. To appear in PODS 2006.
Preliminary version available as Cornell Technical Report
TR2005-2012
Mathematical models for data
- A. Dasgupta,
J. Hopcroft,
J. Kleinberg, M. Sandler,
On Learning mixtures of heavy-tailed distributions, FOCS 2005 .
Summary:
We formulate sufficient separation conditions and present a learning algorithm
with provable guarantees for mixtures of distributions that satisfy these separation conditions.
Our bounds are independent of the variances of the distributions, and to the best of our knowledge,
the presented algorithm is the first with provable learning guarantees for distributions with infinite
variance and/or expectation. The same algorithm matches the best known separation bounds for Gaussians
and log-concave distributions.
- M. Sandler. On the use of Linear Programming for Unsupervised
Text Classification, KDD 2005.
The collection of python/matlab scripts used for experiments is available
here.
Summary :
We propose a new algorithm for dimensionality reduction
and unsupervised text classification.
We use mixture models as underlying process of generating corpus
and utilize a novel, L1-norm based approach introduced by Kleinberg and Sandler.
We show that our algorithm performs extremely well on large datasets,
with peak accuracy approaching that of supervised learning based on Support Vector
Machines with large training sets.
-
J. Kleinberg, M. Sandler.
Using Mixture Models for Collaborative Filtering.
In Proc. 35th ACM Symposium on Theory of computing, 2004, To appear in Special Issue of Journal of Computer and System Sciences.
Summary:
Consider a system, where users behave accordingly to hidden mixture
structure of the items. We show that if certain conditions are met, it is possible to reconstruct the latent
mixture model and give close to optimal recommendations.
Our bounds depend on a novel measure of independence that can be
viewed as an L_1-analogue of the smallest singular value of a matrix.
We show that standard approaches based on SVD methods
are not strong enough to yield comparable results, thereby
suggesting some inherent limitations of spectral analysis.
-
J. Kleinberg, M. Sandler.
Convergent Algorithms for Collaborative Filtering.
Proc. 4th ACM Conference on Electronic Commerce, 2003.
Summary:
We generalize a model of
collaborative filtering proposed by Kumar et al.,
in which the recommendations made by an algorithm are
compared to those of an {\em omniscient} algorithm that
knows the hidden preferences of users.
Within our generalized model, we
develop a recommendation algorithm with a
strong convergence property --- as the amount of
data increases, the quality of its recommendations
approach those of the optimal omniscient algorithm.
Network Design
-
J. Kleinberg, M. Sandler, A. Slivkins.
Network Failure Detection and Graph Connectivity.
Proc. 15th ACM-SIAM Symposium on Discrete Algorithms, 2004.
Summary:
We consider a failure detection problem, where an adversary can delete fixed
number edges or nodes from the network. We show, that using using a small set of 'agents'
placed randomly detectors on nodes of the network, with high probability we can detect when
large component become separated from the rest of the graph. We show that the number
of detectors needed is inversely proportional to the graph connectivity.