### abstract ###
We define a novel, basic, unsupervised learning problem - learning the lowest density homogeneous hyperplane separator of an unknown probability distribution
This task is relevant to several problems in machine learning, such as semi-supervised learning and clustering stability
We investigate the question of existence of a universally consistent algorithm for this problem
We propose two natural learning paradigms and prove that, given unlabeled random samples generated by any member of a rich family of distributions, they are guaranteed to converge to the optimal separator for that distribution
We complement this result by showing that no learning algorithm for our task can achieve uniform learning rates (that are independent of the data generating distribution)
### introduction ###
While the theory of machine learning has achieved extensive understanding of many aspects of supervised learning, our theoretical understanding of unsupervised learning leaves a lot to be desired
In spite of the obvious practical importance of various unsupervised learning tasks, the state of our current knowledge does not provide anything that comes close to the rigorous mathematical performance guarantees that classification prediction theory enjoys
In this paper we make a small step in that direction by analyzing one specific unsupervised learning task -- the detection of low-density linear separators for data distributions over Euclidean spaces
We consider the following task:  for an unknown data distribution over  SYMBOL , find the homogeneous hyperplane of lowest density that cuts through that distribution
We assume that the underlying data distribution has a continuous density function and that the data available to the learner are finite i.i.d. samples of that distribution
Our model can be viewed as a restricted instance of the fundamental issue of inferring information about a probability distribution from the random samples it generates
Tasks of that nature range from the ambitious problem of density estimation  CITATION , through estimation of level sets  CITATION ,  CITATION ,  CITATION , densest region detection  CITATION , and, of course, clustering
All of these tasks are notoriously difficult with respect to both the sample complexity and the computational complexity aspects (unless one presumes strong restrictions about the nature of the underlying data distribution)
Our task seems more modest than these
Although we are not aware of any previous work on this problem (from the point of view of statistical machine learning, at least), we believe that it is a rather basic problem that is relevant to various practical learning scenarios
One important domain to which the detection of low-density linear data separators is relevant is semi-supervised learning  CITATION
Semi-supervised learning is motivated by the fact that in many real-world classification problems, unlabeled samples are much cheaper and easier to obtain than labeled examples
Consequently, there is great incentive to develop tools by which such unlabeled samples can be utilized to improve the quality of sample based classifiers
Naturally, the utility of unlabeled data to classification depends on assuming some relationship between the unlabeled data distribution and the class membership of data points (see  CITATION  for a rigorous discussion of this point)
A common postulate of that type is that the boundary between data classes passes through low-density regions of the data distribution
The Transductive Support Vector Machines paradigm (TSVM)  CITATION  is an example of an algorithm that implicitly uses such a low-density boundary assumption
Roughly speaking, TSVM searches for a hyperplane that has small error on the labeled data and at the same time has wide margin with respect to the unlabeled data sample
Another area in which low-density boundaries play a significant role is the analysis of clustering stability
Recent work on the analysis of clustering stability found a close relationship between the stability of a clustering and the data density along the cluster boundaries -- roughly speaking, the lower these densities, the more stable the clustering ( CITATION ,  CITATION )
A low-density-cut algorithm for a family  SYMBOL  of probability distributions takes as input a finite sample generated by some distribution  SYMBOL  and has to output a hyperplane through the origin with low density w.r.t.  SYMBOL
In particular, we consider the family of all distributions over  SYMBOL  that have continuous density functions
We investigate two notions of success for low-density-cut algorithms -- uniform convergence (over a family of probability distributions) and consistency
For uniform convergence we prove a general negative result, showing that no algorithm can guarantee any fixed convergence rate (in terms of sample size)
This negative result holds even in the simplest case where the data domain is the one-dimensional unit interval
For consistency (i.e., allowing the learning/convergence rates to depend on the data-generating distribution), we prove the success of two natural algorithmic paradigms:  Soft-Margin  algorithms, which choose a margin parameter (depending on the sample size) and output the separator with lowest empirical weight in the margins around it, and  Hard-Margin  algorithms, which choose the separator with widest sample-free margins
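To make the two paradigms concrete, the following is a minimal sketch for the one-dimensional case, where the data lie in the unit interval and a separator is a single cut point. It is an illustration under stated assumptions, not the algorithms as formally defined later in the paper: the function names, the candidate grid used by the soft-margin search, the restriction of the hard-margin rule to gaps between consecutive sample points, and the treatment of the margin `gamma` as a caller-supplied parameter are all hypothetical choices made here.

```python
import bisect


def hard_margin_cut(sample):
    """Return the midpoint of the widest gap between consecutive sample points.

    Assumption: only gaps between adjacent sorted sample points are considered;
    the gaps between the extreme points and the interval ends are ignored.
    """
    xs = sorted(sample)
    if len(xs) < 2:
        raise ValueError("need at least two sample points")
    # Index of the left endpoint of the widest sample-free gap.
    i = max(range(len(xs) - 1), key=lambda k: xs[k + 1] - xs[k])
    return (xs[i] + xs[i + 1]) / 2.0


def soft_margin_cut(sample, gamma, grid_size=1000):
    """Return a cut point x minimizing the number of sample points
    that fall in the closed margin [x - gamma, x + gamma].

    Assumptions: candidates are taken from a uniform grid on
    [gamma, 1 - gamma]; ties are broken in favor of the leftmost candidate.
    """
    if not 0.0 < gamma < 0.5:
        raise ValueError("gamma must lie strictly between 0 and 0.5")
    xs = sorted(sample)
    best_x, best_count = None, None
    for j in range(grid_size + 1):
        x = gamma + (1.0 - 2.0 * gamma) * j / grid_size
        # Count sample points inside the margin via two binary searches.
        count = bisect.bisect_right(xs, x + gamma) - bisect.bisect_left(xs, x - gamma)
        if best_count is None or count < best_count:
            best_x, best_count = x, count
    return best_x


if __name__ == "__main__":
    # Two well-separated groups of points with an empty region in between.
    data = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
    print(hard_margin_cut(data))
    print(soft_margin_cut(data, gamma=0.1, grid_size=100))
```

In this sketch the margin is passed in explicitly; in the soft-margin paradigm described above it would be chosen as a function of the sample size.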
The paper is organized as follows: Section  provides the formal definition of our learning task as well as the success criteria that we investigate
In Section  we present two natural learning paradigms for the problem over the real line and prove their universal consistency over a rich class of probability distributions
Section  extends these results to show the learnability of lowest-density homogeneous linear cuts for probability distributions over  SYMBOL  for arbitrary dimension,  SYMBOL
In Section  we show that the previous universal consistency results cannot be improved to obtain  uniform  learning rates (by any finite-sample based algorithm)
We conclude the paper with a discussion of directions for further research
