### abstract ###
Given a finite set of words  SYMBOL  independently drawn according to a fixed unknown distribution law  SYMBOL  called a  stochastic language , a usual goal in Grammatical Inference is to infer an estimate of  SYMBOL  in some class of probabilistic models, such as  Probabilistic Automata  (PA)
Here, we study the class  SYMBOL  of  rational stochastic languages , which consists of the stochastic languages that can be generated by  Multiplicity Automata  (MA) and which strictly includes the class of stochastic languages generated by PA
Rational stochastic languages have a minimal normal representation, which may be very concise and whose parameters can be efficiently estimated from stochastic samples
We design an efficient inference algorithm DEES which aims at building a minimal normal representation of the target
Despite the fact that no recursively enumerable class of MA computes exactly  SYMBOL , we show that DEES strongly identifies  SYMBOL  in the limit
We study the intermediary MA output by DEES and show that they compute rational series which converge absolutely to one and which can be used to provide stochastic languages which closely estimate the target
### introduction ###
In probabilistic grammatical inference, it is assumed that data arise in the form of a finite set of words  SYMBOL , built on a predefined alphabet  SYMBOL , and independently drawn according to a fixed unknown distribution law on  SYMBOL  called a  stochastic language
Then, a usual goal is to try to infer an estimate of this distribution law in some class of probabilistic models, such as  Probabilistic Automata  (PA), which have the same expressivity as Hidden Markov Models (HMM)
PA are identifiable in the limit~ CITATION
However, to our knowledge, there exists no efficient inference algorithm able to deal with the whole class of stochastic languages that can be generated from PA
Most previous work uses restricted subclasses of PA such as Probabilistic Deterministic Automata (PDA)~ CITATION
On the other hand, Probabilistic Automata are particular cases of  Multiplicity Automata , and stochastic languages which can be generated by multiplicity automata are special cases of  rational languages  that we call  rational stochastic languages
MA have been used in grammatical inference in a variant of the exact learning model of Angluin  CITATION  but not in probabilistic grammatical inference
Let us denote by  SYMBOL  the class of rational stochastic languages over the semiring  SYMBOL
When  SYMBOL  or  SYMBOL ,  SYMBOL  is exactly the class of stochastic languages generated by PA with parameters in  SYMBOL
But, when  SYMBOL  or  SYMBOL , we obtain strictly greater classes which provide several advantages and at least one drawback: elements of  SYMBOL  may have a significantly smaller representation in  SYMBOL , which is clearly an advantage from a learning perspective; elements of  SYMBOL  have a minimal normal representation, while such normal representations do not exist for PA; parameters of these minimal representations are directly related to probabilities of some natural events of the form  SYMBOL , which can be efficiently estimated from stochastic samples; lastly, when  SYMBOL  is a field, rational series over  SYMBOL  form a vector space and efficient linear algebra techniques can be used to deal with rational stochastic languages
However, the class  SYMBOL  presents a serious drawback: there exists no recursively enumerable subset of MA which exactly generates it~ CITATION
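The estimation step mentioned above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the events being estimated are prefix events (the drawn word starts with a given prefix); the function name `prefix_frequency` and the toy sample are hypothetical and are not taken from the source.

```python
from fractions import Fraction


def prefix_frequency(sample, prefix):
    """Empirical frequency of the event 'a drawn word starts with prefix'.

    Assumption for illustration only: the events of interest are prefix events.
    """
    if not sample:
        raise ValueError("sample must be non-empty")
    hits = sum(1 for word in sample if word.startswith(prefix))
    return Fraction(hits, len(sample))


# Hypothetical toy sample over the alphabet {a, b}; "" is the empty word.
sample = ["", "a", "ab", "b", "aab", "a", "", "ba"]

# 4 of the 8 words start with "a".
print(prefix_frequency(sample, "a"))
# Every word starts with the empty prefix, so this frequency is 1.
print(prefix_frequency(sample, ""))
```

Such an estimate is a plain empirical frequency, so its cost is linear in the size of the sample.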
Moreover, this class of representations is unstable: arbitrarily close to an MA which generates a stochastic language, we may find MA whose associated rational series  SYMBOL  takes negative values and is not absolutely convergent: the global weight  SYMBOL  may be unbounded or not (absolutely) defined
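To make this concrete, here is a minimal sketch of how an MA assigns a weight to a word, assuming the usual linear representation (an initial vector, one transition matrix per letter, and a terminal vector). The function names and the two one-state toy automata are hypothetical illustrations and are not taken from the source.

```python
from fractions import Fraction as F
from itertools import product


def series_value(init, trans, term, word):
    """Weight of `word`: init . M_{w1} ... M_{wn} . term."""
    vec = list(init)
    for letter in word:
        m = trans[letter]
        # Row vector times the transition matrix of the current letter.
        vec = [sum(vec[i] * m[i][j] for i in range(len(vec)))
               for j in range(len(m[0]))]
    return sum(v * t for v, t in zip(vec, term))


def truncated_weight(init, trans, term, max_len):
    """Sum of the weights of all words of length at most max_len."""
    total = 0
    for n in range(max_len + 1):
        for word in product(sorted(trans), repeat=n):
            total += series_value(init, trans, term, word)
    return total


# Toy PA over {a}: loop on "a" with weight 1/2, stop with weight 1/2.
pa = ([F(1)], {"a": [[F(1, 2)]]}, [F(1, 2)])
# Toy MA over {a} with a negative transition weight (not a PA).
ma = ([F(1)], {"a": [[F(-1, 2)]]}, [F(3, 2)])

print(series_value(*pa, "aa"))    # 1/8
print(series_value(*ma, "a"))     # -3/4: a negative "probability"
print(truncated_weight(*pa, 10))  # 2047/2048
print(truncated_weight(*ma, 3))   # 15/16
```

In this toy case the truncated sums of the second automaton also approach 1, yet the series assigns a negative weight to the word "a", so it does not define a stochastic language.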
However, we show that  SYMBOL  is strongly identifiable in the limit: we design an algorithm DEES which, for any target  SYMBOL  and given access to an infinite sample  SYMBOL  drawn according to  SYMBOL , converges in a finite but unbounded number of steps to a minimal normal representation of  SYMBOL
Moreover, DEES is efficient: it runs in time polynomial in the size of the input and it computes a minimal number of parameters with classical statistical rates of convergence
However, before converging to the target, DEES outputs MA which are close to the target but which do not compute stochastic languages
The question is: what kind of guarantees do we have on these intermediary hypotheses and how can we use them for probabilistic inference
We show that, since the algorithm aims at building a minimal normal representation of the target, the intermediary hypotheses  SYMBOL  output by DEES have a nice property: they converge absolutely to 1, i.e.  SYMBOL  and  SYMBOL
As a consequence,  SYMBOL  is defined without ambiguity for any  SYMBOL , and it can be shown that  SYMBOL  tends to 0 as the learning proceeds
Given any such series  SYMBOL , we can efficiently compute a stochastic language  SYMBOL , which is not rational, but has the property that  SYMBOL  for any word  SYMBOL  such that  SYMBOL
Our conclusion is that, despite the fact that no recursively enumerable class of MA represents the class of rational stochastic languages, MA can be used efficiently to infer such stochastic languages
Classical notions on stochastic languages, rational series, and multiplicity automata are recalled in Section~
We study an example which shows that the representation of rational stochastic languages by MA with real parameters may be very concise
We introduce our inference algorithm DEES in Section~ and we show that  SYMBOL  is strongly identifiable in the limit
We study the properties of the MA output by DEES in Section~ and we show that they define absolutely convergent rational series which can be used to compute stochastic languages which are estimates of the target
