### abstract ###
Solomonoff completed the Bayesian framework by providing a rigorous, unique, formal, and universal choice for the model class and the prior
We discuss in breadth how and in which sense universal (non- iid  )\ sequence prediction solves various (philosophical) problems of traditional Bayesian sequence prediction
We show that Solomonoff's model possesses many desirable properties: Fast convergence and strong bounds, and in contrast to most classical continuous prior densities has no zero p(oste)rior problem, ie \ can confirm universal hypotheses, is reparametrization and regrouping invariant, and avoids the old-evidence and updating problem
It even performs well (actually better) in non-computable environments
### introduction ###
Given the weather in the past, what is the probability of rain tomorrow
What is the correct answer in an IQ test asking to continue the sequence 1,4,9,16,
Given historic stock-charts, can one predict the quotes of tomorrow
Assuming the sun rose 5000 years every day, how likely is doomsday (that the sun does not rise) tomorrow
These are instances of the important problem of inductive inference or time-series forecasting or sequence prediction
Finding prediction rules for every particular (new) problem is possible but cumbersome and prone to disagreement or contradiction
What we are interested in is a formal general theory for prediction
The Bayesian framework is the most consistent and successful framework developed thus far  CITATION
A Bayesian considers a set of environments=\-hypotheses=\-models  SYMBOL  which includes the true data generating probability distribution  SYMBOL
From one's prior belief  SYMBOL  in environment  SYMBOL  and the observed data sequence  SYMBOL , Bayes' rule yields one's posterior confidence in  SYMBOL
In a predictive setting, one directly determines the predictive probability of the next symbol  SYMBOL  without the intermediate step of identifying a (true or good or causal or useful) model
Note that classification and regression can be regarded as special sequence prediction problems, where the sequence  SYMBOL  of  SYMBOL -pairs is given and the class label or function value  SYMBOL  shall be predicted
The Bayesian framework leaves open how to choose the model class  SYMBOL  and prior  SYMBOL
General guidelines are that  SYMBOL  should be small but large enough to contain the true environment  SYMBOL , and  SYMBOL  should reflect one's prior (subjective) belief in  SYMBOL  or should be non-informative or neutral or objective if no prior knowledge is available
But these are informal and ambiguous considerations outside the formal Bayesian framework
Solomonoff's  CITATION  rigorous, essentially unique, formal, and universal solution to this problem is to consider a single large universal class  SYMBOL  suitable for  all  induction problems
The corresponding universal prior  SYMBOL  is biased towards simple environments in such a way that it dominates=\-superior to all other priors
This leads to an a priori probability  SYMBOL  which is equivalent to the probability that a universal Turing machine with random input tape outputs  SYMBOL
Many interesting, important, and deep results have been proven for Solomonoff's universal distribution  SYMBOL   CITATION
The motivation and goal of this paper is to provide a broad discussion of how and in which sense universal sequence prediction solves all kinds of (philosophical) problems of Bayesian sequence prediction, and to present some recent results
Many arguments and ideas could be further developed
I hope that the exposition stimulates such a future, more detailed, investigation
In Section  we review the excellent predictive performance of Bayesian sequence prediction for generic (non- iid  )\ countable and continuous model classes
Section  critically reviews the classical principles (indifference, symmetry, minimax) for obtaining objective priors, introduces the universal prior inspired by Occam's razor and quantified in terms of Kolmogorov complexity
In Section  (for  iid  \  SYMBOL ) and Section  (for universal  SYMBOL ) we show various desirable properties of the universal prior and class (non-zero p(oste)rior, confirmation of universal hypotheses, reparametrization and regrouping invariance, no old-evidence and updating problem) in contrast to (most) classical continuous prior densities
Finally, we show that the universal mixture performs better than classical continuous mixtures, even in uncomputable environments
Section  contains critique and summary
