### abstract ###
Vlaev and Chater CITATION demonstrated that the cooperativeness of previously seen prisoner's dilemma games biases choices and predictions in the current game.
These effects were (a) assimilation to the mean cooperativeness of the played games, caused by action reinforcement, and (b) perceptual contrast with the preceding games, depending on the range and the rank order of their cooperativeness.
We demonstrate that, when people play against choice strategies that are not biased by such factors, perceptual biases disappear and only the assimilation bias caused by reinforcement persists.
This suggests that reinforcement learning is a powerful source of inconsistency in strategic interaction, which may not be eliminated even if the other players are unbiased and markets are efficient.
### introduction ###
To explain the behaviour of markets, we need a model of the decision-making behaviour of buyers and sellers.
To understand how such agents (people and firms) interact in the economy, we need to model the strategies people (e.g., managers) select when the outcome of a situation also depends on the decisions of other agents.
Thus, an economic understanding of the various markets, of strategic interaction between firms, and indeed of the economy at large requires understanding how people trade off payoff and uncertainty when they interact with each other.
When decisions are interactive and the outcomes also depend on the decisions of other people, the choice process has a recursive quality: each player makes decisions in the context of assumptions about the decisions of the other player, but the other player may equally choose on the basis of assumptions about the decisions of the first player.
Game theory attempts to deal with this recursiveness by introducing the concept of a Nash equilibrium CITATION: a pair of decisions is in Nash equilibrium if neither player would obtain a higher expected utility by making a different decision, given that the other player's decision is fixed, and game theory has continuously refined this notion CITATION.
Using experimental methods, psychologists and economists have tested how realistic such assumptions and approximations are, and have found considerable discrepancies between actual behaviour and the predictions of game theory CITATION.
For example, in any prisoner's dilemma (PD) game, the Nash equilibrium is, notoriously, that both players behave uncooperatively. In real life, the problem of cooperation is that it can secure mutual benefits, but by cooperating people risk being exploited.
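The equilibrium claim can be checked mechanically. The Python sketch below enumerates the four pure-strategy profiles of a symmetric two-player PD and keeps those from which neither player can gain by deviating alone, which is the Nash condition described above. The payoff values (temptation T = 5, reward R = 3, punishment P = 1, sucker's payoff S = 0) are hypothetical, chosen only to satisfy the usual ordering T > R > P > S; they are not games from the experiments.

```python
from itertools import product

# Hypothetical PD payoffs satisfying T > R > P > S (illustrative only).
T, R, P, S = 5, 3, 1, 0
# Payoff to a player given (own move, other player's move).
PAYOFF = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}
MOVES = ("C", "D")


def payoff(player: int, profile: tuple[str, str]) -> int:
    """Payoff to player 0 (row) or player 1 (column) for a profile (row move, column move)."""
    own, other = profile if player == 0 else profile[::-1]
    return PAYOFF[(own, other)]


def is_nash(profile: tuple[str, str]) -> bool:
    """True if no player can raise their payoff by switching move while the other's move stays fixed."""
    for player in (0, 1):
        current = payoff(player, profile)
        for alternative in MOVES:
            deviated = list(profile)
            deviated[player] = alternative
            if payoff(player, tuple(deviated)) > current:
                return False
    return True


equilibria = [p for p in product(MOVES, repeat=2) if is_nash(p)]
print(equilibria)  # [('D', 'D')]
```

Because defection pays more than cooperation whatever the other player does (T > R and P > S), mutual defection is the only profile that survives, even though mutual cooperation would leave both players better off (R > P).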
However, many studies have shown that the behaviour of people playing PD games deviates systematically from theoretical predictions: people cooperate more than expected CITATION.
There are various accounts of this behaviour, invoking factors such as misunderstanding of the game, repetition of play and the resulting reputation and retaliation effects, irrationality, motivation and incentives, altruism, communication, and so on CITATION.
More recently, Vlaev and Chater CITATION tested one of the basic assumptions of game theory that is not typically challenged in experimental work: that each game is considered separately, and that the resulting choice of strategy should be based only on the attributes of the current game.
This study presented a psychological phenomenon, game relativity, which is an anomaly for normative theories of strategic decision-making.
Specifically, the reported results seem to indicate that people do not possess a well-defined notion of the utility of a strategy, or of the 'cooperativeness' of a game in particular; instead, people's perceived utility for a strategy appears highly context-sensitive, depending on the other recently played games.
Vlaev and Chater's CITATION experiments were based on research on fundamental cognitive processes in psychophysics, which concern the perception and representation of sensory magnitudes such as loudness, brightness, or weight.
Note that in judging the utility of decision strategies in games, people must assess the magnitudes of risk and return associated with each strategy.
In this respect, Stewart, Chater, Stott, and Reimers CITATION had earlier argued that some of the factors that determine how people assess these magnitudes might be similar to those underlying the assessment of psychophysical magnitudes.
There is substantial evidence that people are poor at providing stable absolute judgments of such magnitudes and are heavily influenced by the other options presented to them in the recent past or available at the time of choice CITATION.
Such context effects are consistent with people making perceptual judgments on the basis of relative, rather than absolute, magnitude information CITATION.
Applying these ideas to strategic decision-making in PD games: if the cooperativeness of a game is represented in a similar way to these simple perceptual dimensions (i.e., similar underlying cognitive processes are involved), then preceding material might be expected to influence current judgments and decisions in games, as it does in the perceptual case.
Vlaev and Chater CITATION tested whether game attributes such as 'cooperativeness', measured by Rapoport and Chammah's CITATION cooperation index (CI), behave like perceptual stimuli, and they found similar context effects.
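Rapoport and Chammah's cooperation index is usually written, in standard PD notation (T temptation, R reward for mutual cooperation, P punishment for mutual defection, S sucker's payoff, with T > R > P > S), as CI = (R - P) / (T - S). Because T - S always exceeds R - P, the index lies strictly between 0 and 1, and larger values mark games in which mutual cooperation gains relatively more over mutual defection while the spread between temptation and sucker's payoff is relatively small. The minimal sketch below computes it; the payoff matrices are hypothetical illustrations, not the games used in the experiments.

```python
def cooperation_index(T: float, R: float, P: float, S: float) -> float:
    """Rapoport and Chammah's cooperation index CI = (R - P) / (T - S).

    T: temptation to defect, R: reward for mutual cooperation,
    P: punishment for mutual defection, S: sucker's payoff.
    Only the ordering T > R > P > S is checked here.
    """
    if not (T > R > P > S):
        raise ValueError("expected a prisoner's dilemma ordering T > R > P > S")
    return (R - P) / (T - S)


# Hypothetical games: one highly 'cooperative', one barely so.
print(cooperation_index(T=10, R=9, P=1, S=0))
print(cooperation_index(T=10, R=6, P=5, S=0))
```

The first game has CI = 8/10 = 0.8 and the second CI = 1/10 = 0.1, although both are prisoner's dilemmas with the same temptation and sucker's payoffs.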
Here we provide a brief summary of these experiments, which are essential background for understanding the argument behind the follow-up study presented in this article.
In the various experiments and conditions of that study, participants played PD games with varying CI, and we tested whether manipulating properties of the CI distribution, namely its mean, range, and rank order, would affect the cooperation rate and the predicted cooperation of the other players.
Experiment NUMBER tested Helson's CITATION adaptation-level theory.
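Adaptation-level theory treats the mean of the recent context as the neutral point of judgment, so it predicts contrast: the same game should seem more cooperative among low-CI games and less cooperative among high-CI games. The sketch below makes that prediction explicit under a simplifying assumption, namely that the adaptation level is the unweighted mean CI of the sequence (the theory itself allows contextual stimuli to be weighted). The function name and the CI sequences are hypothetical.

```python
from statistics import mean


def adaptation_level_judgment(ci: float, context: list[float]) -> float:
    """Judged cooperativeness of a game relative to the adaptation level.

    Simplifying assumption: the adaptation level is the unweighted mean CI of
    the games in the sequence. Positive values mean the game should seem more
    cooperative than the neutral point; negative values, less cooperative.
    """
    return ci - mean(context)


# Hypothetical sequences that both contain the same target game (CI = 0.5).
low_mean_context = [0.1, 0.2, 0.3, 0.5]
high_mean_context = [0.5, 0.7, 0.8, 0.9]

for name, ctx in (("low-mean", low_mean_context), ("high-mean", high_mean_context)):
    print(name, round(adaptation_level_judgment(0.5, ctx), 3))
```

Under this model the target game is judged above neutral in the low-mean sequence (about +0.225) and below neutral in the high-mean sequence (about -0.225), which is the contrast pattern the experiment tested for.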
Contrary to the predictions of adaptation-level theory, we did not find contrast effects depending on whether a particular game was above or below the mean CI (i.e., the adaptation level): games above (below) the mean were not perceived as exaggeratedly more (less) cooperative.
Instead, the condition with a higher mean cooperativeness caused more, rather than less, cooperation across all game types.
This effect of the mean CI can be explained simply by the assumption that cooperativeness is influenced by the amount (frequency) of cooperation that participants observed from their opponents, independent of which game they were playing.
This fits with reinforcement accounts of game playing, including PD CITATION, which predict that more cooperative games on average lead to more cooperative feedback, which in turn reinforces each player to cooperate more across all games.
In other words, the mean reinforcement may have caused the observed assimilation effects.
In experiments NUMBER, the range differences between the games along the CI scale were manipulated while their ranks were kept constant.
The range of presented games produced a contrast effect: games that were further from the minimum CI value in the sequence were perceived as more cooperative.
In experiment NUMBER, we varied the rank order of the games along the CI scale while keeping their range differences constant, and found that rank had a significant impact on prediction and choice behaviour.
The same game produced significantly higher cooperation and predicted cooperation when it had a high rank amongst the other games in the sequence than when it had a low rank.
Thus, the results from experiments NUMBER and NUMBER supported the predictions about perceptual contrast, in line with range-frequency theory CITATION, according to which the neutral point of the judgment scale does not correspond to the mean of the contextual events but rather to a compromise between the midpoint defined by the range of the distribution and the median.
The neutral point thus depends on the skew of the distribution, and the judgment of a particular stimulus is affected by its rank in this distribution.
For example, satisfaction judgments would differ between two distributions of experiences whose intensities or quality levels are skewed differently, and which therefore assign different ranks to the same stimuli, even if the two distributions have the same mean.
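Range-frequency theory is commonly formalized as a weighted compromise between a range value, (s - s_min) / (s_max - s_min), and a frequency value, (k - 1) / (N - 1), where k is the rank of stimulus s among the N contextual stimuli: judgment = w * range value + (1 - w) * frequency value. The sketch below assumes w = 0.5 and averages ranks over ties; both are illustrative choices. The two hypothetical CI sequences share the same minimum (0.1), maximum (0.9), and mean (0.5), yet the target game with CI = 0.5 has a different rank in each.

```python
from statistics import mean


def range_frequency_judgment(s: float, context: list[float], w: float = 0.5) -> float:
    """Range-frequency judgment of stimulus s within its context (s must be in the context).

    range value:     position of s between the context minimum and maximum
    frequency value: proportion of the other context stimuli ranked below s (ties averaged)
    w:               weight on the range principle (0.5 is an illustrative default)
    """
    if s not in context:
        raise ValueError("the judged stimulus must belong to the context")
    lo, hi = min(context), max(context)
    range_value = (s - lo) / (hi - lo)
    below = sum(1 for x in context if x < s)
    tied = sum(1 for x in context if x == s)
    rank0 = below + (tied - 1) / 2  # zero-based rank, averaged over ties
    frequency_value = rank0 / (len(context) - 1)
    return w * range_value + (1 - w) * frequency_value


# Hypothetical CI sequences: same minimum, maximum and mean, different skew.
context_a = [0.1, 0.4, 0.48, 0.5, 0.62, 0.9]  # target 0.5 ranks 4th of 6
context_b = [0.1, 0.2, 0.5, 0.6, 0.7, 0.9]    # target 0.5 ranks 3rd of 6

for name, ctx in (("A", context_a), ("B", context_b)):
    print(name, round(mean(ctx), 3), round(range_frequency_judgment(0.5, ctx), 3))
```

The range value is 0.5 in both sequences, but the frequency value is 0.6 in A and 0.4 in B, so the same game is judged at about 0.55 in A and about 0.45 in B. A purely mean-based neutral point would rate it exactly neutral in both, since 0.5 is the mean of each sequence.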
The contextual effects caused by the mean, range, and rank of the distribution confirmed our expectation that these relativity effects are due to some general underlying cognitive mechanisms.
One is related to perception, and in particular to the representation of perceptual magnitudes, as discussed earlier.
The second fundamental mechanism is related to response (action) generation, because agents tend to repeat actions (e.g., C or D) according to the average degree of reinforcement with which each action is associated (i.e., the utility to the agent of the outcome of the game reinforces the chosen strategy).
From a psychological point of view, reinforcement corresponds to Thorndike's CITATION classic law of effect: repeating behaviours to the degree that they are followed by positive outcomes, and stamping out behaviours to the degree that they are followed by negative outcomes.
For example, in the context of PD, a reinforcement learner will follow the strategy that brings a higher payoff without reasoning about the strategic structure of the game; in other words, a reinforcement learner follows the more rewarding choice instead of inferring the dominant strategy CITATION.
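The law-of-effect account can be illustrated with a basic cumulative-propensity learner in the style of Roth and Erev's reinforcement model. This is a swapped-in textbook model, not the specific model used in the cited work. Each action's propensity grows by the payoff it earns, and the learner cooperates with probability proportional to the propensity of C. To keep the sketch deterministic, each round adds the expected reinforcement instead of sampling moves. The payoffs (T = 5, R = 3, P = 1, S = 0) and the opponents' cooperation rates (0.2 vs. 0.9) are hypothetical.

```python
# Hypothetical PD payoffs (T > R > P > S); not the experimental games.
T, R, P, S = 5.0, 3.0, 1.0, 0.0


def expected_cooperation_path(p_opponent_c: float, rounds: int, q0: float = 1.0) -> list[float]:
    """Expected-value (mean-field) dynamics of a cumulative-propensity learner.

    The learner cooperates with probability q_c / (q_c + q_d). Each round,
    every action's propensity is incremented by (probability of choosing it)
    x (its expected payoff against an opponent who cooperates with
    probability p_opponent_c), i.e. the expected law-of-effect update.
    Returns the learner's cooperation probability at the start of each round.
    """
    q_c = q_d = q0
    path = []
    for _ in range(rounds):
        prob_c = q_c / (q_c + q_d)
        path.append(prob_c)
        payoff_c = p_opponent_c * R + (1 - p_opponent_c) * S  # expected payoff of C
        payoff_d = p_opponent_c * T + (1 - p_opponent_c) * P  # expected payoff of D
        q_c += prob_c * payoff_c
        q_d += (1 - prob_c) * payoff_d
    return path


low = expected_cooperation_path(0.2, rounds=50)   # mostly uncooperative opponent
high = expected_cooperation_path(0.9, rounds=50)  # mostly cooperative opponent
print(round(sum(low) / len(low), 3), round(sum(high) / len(high), 3))
```

Cooperation declines in both environments because defection always earns more, but it declines more slowly against the more cooperative opponent: the learner never infers the dominant strategy, it only tracks reward, and more frequent mutual-cooperation payoffs keep cooperation propped up. This is the assimilation-to-the-mean pattern that the reinforcement account predicts.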
Vlaev and Chater CITATION demonstrated that, when people make interactive strategic decisions, these two principles (perceptual vs. action-related) can create biases in the form of overreaction or under-reaction to particular attributes of games, such as cooperativeness, depending on the environmental distribution of that attribute.
None of the existing studies, however, has investigated whether these context effects also hold when playing against a consistent opponent.
Here we do not mean a 'rational' opponent, who should always defect in PD.
By a consistent player, we mean a player whose responses are completely determined by the current game and are influenced neither by the structure of previous games nor by the history of past responses.
Thus, such a player is consistent across contexts.
Because consistent players are uninfluenced by context, they may act to damp down, rather than amplify, contextual effects on the experimental participant. Amplification may arise when both human players are influenced by the same contextual factors, potentially creating a 'bubble' of over- and under-cooperation driven by perceptual or response biases.
In our study, the consistent player was a computer algorithm, not a human participant; the participants were told this, although the algorithm was not specified.
The most psychologically natural model of the consistent player assumes that the probability of cooperation depends on the 'cooperativeness' of the game, which is negatively related to the incentive for each player to defect, and also negatively related to the 'damage' done to the other player if one defects.
Here we use the cooperation index (CI), which provides a good measure of the typical level of cooperation observed experimentally in PD games CITATION.
The test described in this article is important because it measures the strength and durability of the perceptual and response biases documented by Vlaev and Chater CITATION, and so reveals whether such biases are likely to persist in real markets, where any overreaction is suboptimal because it can be exploited.
In other words, it is good to be cooperative when the situation permits, as both players are better off if both cooperate CITATION, but being over-cooperative gives less cooperative players an additional incentive to exploit you.
The unbiased opponent was created by programmed play, in which the participants played against the computer.
In this setting, the computer was pre-programmed to cooperate with a frequency (probability) reflecting the CI value of each game.
For example, the computer was programmed to cooperate NUMBER percent of the time when playing games with index NUMBER.
An alternative design would be to program the computer to respond randomly (i.e., to cooperate NUMBER percent of the time).
However, such a design would provide weaker evidence about the strength of the biases in question, because the computer-generated feedback would be more ambiguous about the cooperativeness (CI) of the various games.
Players would then receive a weaker feedback signal about which games are more or less cooperative, and various biases would be more likely to thrive in such an ambiguous environment.
By contrast, the policy of cooperating according to CI leaves less freedom of interpretation about the cooperativeness of the various games.
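The design argument can be illustrated by simulating the feedback each opponent policy produces. The sketch assumes that the consistent opponent cooperates with probability exactly equal to the game's CI (the text says only that the frequency reflects the CI, and the specific percentages are elided) and that the random opponent cooperates with a fixed probability of 0.5, a hypothetical value. The CI grid, the 200 plays per game, and the seed are also hypothetical.

```python
import random
from statistics import mean
from typing import Callable

# An opponent policy maps (game CI, random source) to True (cooperate) or False (defect).
Policy = Callable[[float, random.Random], bool]


def consistent_opponent(ci: float, rng: random.Random) -> bool:
    """Assumed mapping: cooperate with probability equal to the game's CI."""
    return rng.random() < ci


def random_opponent(ci: float, rng: random.Random) -> bool:
    """Ignores the game's CI; cooperates with a hypothetical fixed probability of 0.5."""
    return rng.random() < 0.5


def observed_cooperation(policy: Policy, cis: list[float], plays: int, seed: int = 0) -> list[float]:
    """Fraction of plays on which the opponent cooperated, for each game."""
    rng = random.Random(seed)
    return [sum(policy(ci, rng) for _ in range(plays)) / plays for ci in cis]


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


cis = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]  # hypothetical game CIs
consistent = observed_cooperation(consistent_opponent, cis, plays=200)
uninformative = observed_cooperation(random_opponent, cis, plays=200)
print("consistent:", round(pearson(cis, consistent), 2))
print("random:    ", round(pearson(cis, uninformative), 2))
```

Under the consistent policy, observed cooperation tracks CI closely, so the feedback carries a strong signal about each game's cooperativeness; under the random policy, observed rates hover around the same value for every game, leaving cooperativeness open to interpretation.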
Note that, even if people think that they are playing a repeated PD game (i.e., they think of themselves as playing 'the same' program every time, rather than a different person each time), this should not affect our key argument: because the program is uninfluenced by prior context, it does not matter whether the game is repeated or not.
Indeed, if the game is seen as repeated, then the effect of the consistent opponent should be even stronger, which should in turn further weaken the context effects, because players would be more likely to reciprocate the behaviour of their consistent opponent.
To accomplish our research objective, we replicated the design of the three experiments described by Vlaev and Chater CITATION.
In particular, the manipulated contextual variables were the parameters of the statistical distribution of the cooperativeness of the games in the sequence (mean, range, and rank), corresponding to Vlaev and Chater's experiments NUMBER, NUMBER, and NUMBER, respectively.
Here we present the three studies together for brevity of exposition.
