### abstract ###
Scenarios for the emergence or bootstrap of a lexicon involve the repeated interaction between at least two agents  who must reach a consensus on how to name  SYMBOL  objects using   SYMBOL  words
Here we consider minimal models of  two types of learning algorithms: cross-situational learning, in which the individuals determine the meaning of a  word by looking for something in common across all observed uses of that word, and supervised operant conditioning learning,  in which there is strong feedback between individuals about the intended meaning of the words
Despite the  stark differences between these learning schemes, we show that they yield the same communication accuracy in the realistic  limits of large   SYMBOL  and  SYMBOL , which coincides with the result of the classical occupancy problem of randomly  assigning  SYMBOL  objects to  SYMBOL   words
### introduction ###
How a coherent lexicon can emerge in a group of interacting agents is a major open issue in the language evolution and  acquisition research area (Hurford, 1989; Nowak \& Krakauer, 1999; Steels, 2002; Kirby, 2002; Smith, Kirby, \& Brighton, 2003)
In addition, the dynamics of the self-organization of shared lexicons is one of the issues to which computational and mathematical modeling can contribute the most, as the emergence of a lexicon from scratch implies some type of self-organization and, possibly, threshold phenomena
This cannot be completely understood without a thorough exploration of the parameter space of the models  (Baronchelli, Felici, Loreto, Caglioli, \& Steels, 2006)
There are two main research avenues to investigate the emergence or bootstrapping of a lexicon
The first approach, inspired by the seminal work of Pinker and Bloom (1990), who argued that natural selection is the main design principle explaining the emergence and complex structure of language, resorts to evolutionary algorithms to evolve the shared lexicon
The key element here is that an improvement in the communication ability of an individual results, on average, in an increase in the number of offspring it produces (Hurford, 1989; Nowak \& Krakauer, 1999; Cangelosi, 2001; Fontanari \& Perlovsky, 2007, 2008)
The second research avenue, which we will follow in this paper, argues for a culturally based view of language evolution and so  it assumes that the lexicons are acquired and modified solely through learning during the individual's lifetime  (Steels, 2002; Smith, Kirby, \& Brighton, 2003)
Of course, if there is a fact about language which is uncontroversial, it is that the lexicon must be learned from the active or  passive interaction between children and language-proficient adults
The issue of whether this ability to learn the lexicon is  due to some domain-general learning mechanism, or  is an innate ability, unique to humans, is still on the table (Bates \& Elman, 1996)
In the problem we address here, there are simply no language-proficient individuals, so it is not so far-fetched to put forward a biological rather than a cultural explanation for the emergence of a self-organized lexicon
Nevertheless, in this contribution we will use many insights produced by research on language acquisition by children (see, e.g., Gleitman, 1990; Bloom, 2000) to study different learning strategies
From a developmental perspective, there are basically two competing schemes for lexicon acquisition by children  (Rosenthal \& Zimmerman, 1978)
The first scheme, termed cross-situational or observational learning, is based on the  intuitive idea that one way that a learner can determine the meaning of a word is to find something in common across all  observed uses of that word (Pinker, 1984; Gleitman, 1990; Siskind, 1996)
Hence learning takes place through the statistical  sampling of the contexts in which a word appears
Since the learner receives no feedback about its inferences, we refer to  this scheme as unsupervised learning
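The idea of finding something in common across all observed uses of a word can be illustrated with a small sketch. This is not the minimal model analyzed in the paper; it is an intersection-based toy, and the function name `cross_situational_learn`, the episode format, and the example words and objects are all assumptions introduced here for illustration.

```python
def cross_situational_learn(episodes):
    """Toy cross-situational learner (illustrative assumption, not the paper's model).

    episodes: iterable of (word, context) pairs, where context is the
    collection of objects visible when the word was heard.
    Returns a dict mapping each word to the set of objects that were
    present in every episode in which that word occurred.
    """
    candidates = {}
    for word, context in episodes:
        context = set(context)
        if word not in candidates:
            # First exposure: every object in view is a candidate meaning.
            candidates[word] = context
        else:
            # Later exposures: keep only objects common to all uses so far.
            candidates[word] &= context
    return candidates


# Hypothetical observations: no feedback is given, only co-occurrence.
observations = [
    ("wug", {"ball", "dog"}),
    ("dax", {"dog", "cup"}),
    ("wug", {"ball", "cup"}),
    ("dax", {"dog", "ball"}),
]
lexicon = cross_situational_learn(observations)
```

In this example the learner narrows `wug` to `ball` and `dax` to `dog` purely from co-occurrence, without any feedback on its inferences, which is the sense in which the scheme is unsupervised.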
The second scheme, known generally as operant conditioning, involves the active participation of the agents in the learning process, with the exchange of non-linguistic cues to provide feedback on the hearer's inferences
This supervised learning scheme has been applied to the design of a system for communication by  autonomous robots -- the so-called language game in the Talking Heads experiments (Steels, 2003)
Despite the technological appeal of the supervised scheme, the empirical evidence is that most of the lexicon is acquired by children as a product of unsupervised learning (Pinker, 1984; Gleitman, 1990; Bloom, 2000)
Interestingly, from the perspective of evolving or bootstrapping a lexicon, the unsupervised scheme is very attractive too, since it eliminates altogether the issue of honest signaling (Dawkins \& Krebs, 1978), as no signaling is involved in the learning process, which requires only observation and some elements of intuitive psychology (e.g., Theory of Mind)
Many different computational implementations and variants of these two schemes for bootstrapping a lexicon have  been proposed in the literature
For example, Smith (2003a, 2003b), Smith, Smith, Blythe, \& Vogt (2006), and De Beule, De Vylder, \& Belpaeme (2006) have addressed the unsupervised learning scheme, whereas Steels \& Kaplan (1999), Ke, Minett, Au, \& Wang (2002), Smith, Kirby, \& Brighton (2003), and Lenaerts, Jansen, Tuyls, \& De Vylder (2005) have addressed the supervised scheme
However, except for the extensive statistical analysis of a variant of the supervised learning algorithm which reduces the problem to that of naming a single object (Baronchelli, Felici, Loreto, Caglioli, \& Steels, 2006), the study of the effects of changing the parameters of those models has usually been limited to the display of the time evolution of some measure of the communication accuracy of the population
Although at first sight the supervised learning scheme may seem to be clearly superior to the unsupervised  one (albeit less realistic in the context of language acquisition by children), we are not aware of any thorough  comparison between the performances of these two learning scenarios
In fact, in this contribution we show that in  a realistic limit of very large lexicon sizes the supervised and unsupervised learning performances are essentially identical
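The abstract relates this large-lexicon limit to the classical occupancy problem of randomly assigning objects to words. The sketch below illustrates that problem only. Which occupancy statistic corresponds to the communication accuracy is not stated in this section, so the fraction of distinct words in use is chosen here purely as an example. The names `n_objects` and `n_words` are hypothetical stand-ins for the symbols elided in the text.

```python
import random


def expected_fraction_used(n_objects, n_words):
    """Expected fraction of words assigned to at least one object when each
    object independently picks a word uniformly at random.

    A given word is missed by one object with probability 1 - 1/n_words,
    hence by all objects with probability (1 - 1/n_words) ** n_objects.
    """
    return 1.0 - (1.0 - 1.0 / n_words) ** n_objects


def simulated_fraction_used(n_objects, n_words, trials=2000, seed=0):
    """Monte Carlo estimate of the same quantity (illustrative assumption)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # One random assignment: each object draws the index of a word.
        used = {rng.randrange(n_words) for _ in range(n_objects)}
        total += len(used) / n_words
    return total / trials


exact = expected_fraction_used(n_objects=10, n_words=10)
estimate = simulated_fraction_used(n_objects=10, n_words=10)
```

With equal numbers of objects and words, the exact expression approaches 1 - 1/e as both grow, a standard property of random assignments.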
In this paper we study minimal models of the supervised and unsupervised learning schemes which preserve the main  ingredients of these two classical language acquisition paradigms
For the sake of simplicity, here we interpret the  lexicon as a mapping between objects and words (or sounds) rather than as a mapping between meanings (conceptual structures)  and sounds
A more complete scenario would involve first the creation of meanings, i.e., the bootstrapping of an object-meaning mapping (Steels, 1996; Fontanari, 2006) and then the emergence of a meaning-sound mapping (see, e.g., Smith, 2003a, 2003b; Fontanari \& Perlovsky, 2006)
