-- Pseudo-code for a ratings algorithm -- revised: October, 1995 -- Algorithm in C and C++ by Paul Matthews -- Pseudo-code version by Wilfred J. Hansen -- This algorithm computes p.rating, the rating for each player, p, -- assuming that the player's initial rating, p.seed, is within a few -- stones of the correct value. -- The approach is to measure the likelihood of each player's rating -- and the likelihood of the outcome of each game given the difference -- in the opponent's ratings. Multiplying all these likelihoods gives -- likelihood of the set of game outcomes for the given set of ratings. -- -- Each iteration step adjusts each player's rating by +delta, -delta, -- or not at all, depending on which increases the likelihood of the -- outcomes for the player's games. Then the likelihood for the entire -- collection of ratings and game outcomes is recomputed and the change -- by the current delta is accepted if the likelihood increased. The -- iteration repeats with diminishing values of delta until the desired -- significance is achieved. -- For general discussion of the statistics, see: -- Elo, Arpad E., Ratings of Chessplayers, past and present. -- NY: Arco Pub., 1986 -- Joe, Harry, "Rating systems based on paired comparison models", -- Statistics and Probability Letters v. 11, 1991 p. 343-347 -- David, H. A., "The method of paired comparisons" -- In this algorithm, a difference of one in rating is intended to -- indicate a difference of one stone in strength. There is no built-in -- normalization, so there is no way to know if a rating of 1.00 -- corresponds to Shodan; however, relative values should be a -- fair approximation of relative strength. -- -- Strictly speaking, the algorithm has no notion of a difference of one -- stone; the ratings are based on the players announced ratings. If -- players adopt consistent ratings that differ by, say, half a stone, -- the computed ratings will be based on that measure. -- Two special data types, Rating and Likelihood, appear below. -- Both are implemented as floating values. -- Rating is a player rating: a dan value is expressed as the corresponding -- positive number and the kyu value k is entered as 1-k -- (so 1 kyu is zero, 2 kyu is -1, and so on). -- Likelihood is similar to a probability. It is a mapping of a rating -- or game result into [0,1] such that higher values correspond -- to more likely ratings or results. -- Revision: Oct, 1995 -- changed compute_player_impact to recompute p.sigma instead of -- dropping the player -- Revisions: Oct, 1993 -- added prune_unrateable_players -- added player_sensitivity to compute sigma for next cycle -- added declarations of all local variables -- fixed parameter list for call on player_pr in propose_ratings -- changed compute_player_impact to drop player if rating is too far from seed -- Here are parameters as used for the AGA rating system: -- -- p.seed: Rating -- Players's initial rating. -- The latest rating at which the player played in a tournament. -- p.sigma: Rating = -- old constant values -- .5 for players rated at least twice before -- .8 for players rated once before -- 1.0 for previously unrated players 10 kyu and above -- 2.0 for others -- As shown below in player_sensitivity, the current AGA software, -- which went into production in 1993, estimates a posterior variance -- (sigma) for each player together with the player's rating; -- both are carried forward from one rating period to the next. -- AGA tournament players begin at sigma=0.8, unless a TD indicates that -- an entry rank/rating is guesswork, in which case we use sigma=2.0. - PM -- g.handicapeqv: Rating -- Rating point equivalents of handicaps -- === if Nstones == 0 then .5 - .1 * komi else Nstones - .1*komi -- where the handicap is 'Nstones' stones on board -- and a 'komi' of additional points for white -- Nstones is 0 or 2...9 -- -20 <= komi <= 20 -- The handicap equivalents in the 1993 AGA software are the same, -- but are treated as point estimates with variance; thusly, large handicaps -- have less effect on ratings, but the effect is not dramatic. -PM px_sigma: Rating = 1.04 -- see description of game_po, below maxdelta: Rating = 4 -- In successive passes, adjust ratings by -- 4. 2. 1. 0.5 0.25 0.125 0.0625 -- 0.03125 0.015625 0.0078125 0.00390625 EPSILON: Number = 1e-15 -- Very small, positive non-zero value BIG: Number = 4 -- number of standard deviations of ratings change -- after which to disregard player in compute_player_impact MOSTLY: Number = 0.925 -- maximum fraction of wins or loses for prune_unrateable_players -- a value of 0.925 implies a rating is incorrect by about 1.5 stones -- see discussion in game_po(), below -- If players have not played enough games, MOSTLY can be set higher. -- information for each game -- Game::: handicapeqv: Rating -- see above po: Likelihood -- likelihood of the outcome given the -- current rating difference of the players oldpo: Likelihood -- po from prior iteration white: Player -- refers to player info for white player black: Player -- ditto for black player whitewins: Boolean -- TRUE if W wins -- information for each player -- Player::: rating: Rating -- current rating estimate pr: Likelihood -- what is the likelihood of the current rating? oldpr: Likelihood -- pr from prior iteration seed: Rating -- initial rating sigma: Rating -- standard deviation of curve of likelihood of -- ratings in the neighborhood of seed. direction: {UP, DOWN, NONE} -- direction to adust rating -- also: a list of all games the player has played in -- "pr" is the likelihood that a given rating is correct -- "po" similarly the likelihood that a game outcome is correct -- Multiplying all pr and pr values computes the total likelihood, -- "pt", that is, the likelihood of all game outcomes given a -- particular set of ratings for the players. -- prune_unrateable_players() -- A player is considered unratable if he or she has either mostly wins -- or mostly loses. In either case, the seed rating is probably incorrect. -- Players 5 dan and above are allowed to have mostly wins. -- (This routine is not in the AGA algorithm. I have added it as a -- hueristic for situations where initial ratings may be uncertain. -wjh) prune_unrateable_players() wins: Number := 0 loses: Number := 0 for each player, p for each game, g, which p has played if (g.whitewins) = (g.white = p) then wins = wins + 1 else loses := loses + 1 if (wins / (wins+loses) > MOSTLY and p.seed < 5) or (loses / (wins+loses) > MOSTLY) remove games in which p played remove p from list of players -- player_pr(p: Player): Likelihood -- Compute the "prior distribution", pr, of the rating for player 'p'. -- Assume the "correct" rating is in a sample space normally -- distributed about p.seed with standard deviation p.sigma. -- Compute z as the corresponding value in a normal -- distribution with mean of 0 and standard deviation of 1. -- Then compute the likelihood that z is the correct rating -- by evaluating the normal(0,1) density function at z. -- player_pr(p: Player): Likelihood === z: Number = ((p.rating - p.seed) / p.sigma) return exp(-z*z/2) / sqrt(2*pi) -- normal density function -- exp(...)/sqrt(...) is the function for the normal curve, -- that is, a probability density function with mean 0 and s.d. 1. -- game_po(g: Game): Likelihood -- Compute the likelihood, po, of the outcome of game 'g'. -- The likelihood that white wins is the value of the normal -- distribution function evaluated at the ratings difference, -- as adjusted by the handicap stones and normalized by px_sigma. -- With px_sigma of 1.04: -- ratings likelihood -- difference white wins -- 0 .5 -- 1 .83 -- 2 .97 -- That is, if the handicap is two stones too low, white will -- win 97 games out of a hundred. -- The likelihood of a black win is (1 - likelihood white wins). -- game_po(g: Game): Likelihood === rd: Number = g.white.rating - g.black.rating - g.handicapeqv p: Likelihood = .5 + erf((rd/px_sigma) / sqrt(2)) / 2 if g.whitewins return p else return 1-p -- -- erf(x) = [2/sqrt(pi)] * integral from 0 to x of exp(-t*t) dt -- Some Unix systems have erf. In the expression given, it computes -- the normal distribution function, that is, the integral of the -- normal density function from negative infinity -- to rd/px_sigma, the value whose likelihood we want. -- Ralston, Anthony, A First Course in Numerical Analysis, -- McGraw-Hill (New York, 1965), p. 21: -- "the normal distribution function corresponding to the -- normal density function with zero mean and variance n/12 -- is given by 0.5 + 0.5 * erf(sqrt(6/n)x)". -- In the algorithm, the variance is 1, so n = 12. -- propose_ratings(delta: Rating) -- For a ratings change of 'delta', adjust all players -- according to their directions and recompute po for each game. -- propose_ratings(delta: Rating) === for each player, p case p.direction UP: p.rating += delta DOWN: p.rating -= delta NONE: p.oldpr = p.pr p.pr = player_pr(p) for each game, g g.oldpo = g.po g.po = game_po(g) --revert_ratings(delta: Rating) -- For a ratings change of 'delta', revert to prior ratings. -- revert_ratings(delta: Rating) === for each player, p case p.direction UP: p.rating -= delta DOWN: p.rating += delta p.pr = p.oldpr for each game, g g.po = g.oldpo -- compute_pt(): Likelihood -- Compute the likelihood of the combination of all current ratings -- and the outcome of all games. -- In practice, repeated multiplication of probabilities can lead to -- floating underflow. It is preferable to use the logarithms of these -- values and use addition instead of multiplication. Indeed, the entire -- algorithm can be rewritten to utilize logarithms of Likelihood values. -- compute_pt(): Likelihood === pt: Likelihood = 1 for each player, p pt *= p.pr for each game, g pt *= g.po return pt --compute_player_impact(p: Player, delta: Rating): Number -- Compute the ratio ptNew/pt, where -- pt is total likelihood given the current assignment of ratings -- and ptNew is the likelihood resulting from -- changing the rating for player 'p' by 'delta'. -- If this value is greater than one, the new rating is more accurate. -- The tests against BIG are because after four or five sd's, -- the estimated rating has zero likelihood. Therefore we give -- the likelihood a little slope back toward the seed to -- minimize the possibility of a plateau in the search. -- compute_player_impact(p: Player, delta: Rating): Number === pfactor: Likelihood r_save: Rating = p.rating p.rating += delta -- temporarily change rating (for game_po) if abs(p.rating - p.seed) / p.sigma > BIG -- player's rating is far from seed: recompute sigma p.sigma = abs(p.rating - p.seed) / (BIG/2) pfactor = player_pr(p) / p.pr for each game, g, that p has played pfactor *= game_po(g) / g.po p.rating = r_save return pfactor -- player_sensitivity(p) -- Computes the sigma to use for player p the next time the program is run. -- The algorithm is basically numerical integration. As a first approximation, -- consider the how the overall likelihood function changes as p's rating is -- varied to either side of the best estimate, with other player's ratings held -- constant. Normalize the area under that curve to be one, thus defining -- a marginal probability density. Then integrate (X - RATING)**2 (the -- definition of variance). The square root of the result can be taken as an -- estimate of sigma. -PM -- {Beware that variance estimates (e.g., sigmas) in general -- are poor compared with central tendency estimates (e.g., ratings). -- Using sigma=constant may not be much worse.} -PM -- The values of 5, the multiplier for bound, and 10, the divsor of p.sigma, -- are chosen to evaluate the integral by evaluating the function -- at 100 points. -- player_sensitivity(p: Player): Rating bound: Number = 5 * p.sigma -- units: delta Rating psave: Rating = p.rating sumX2W: Number = 0 -- units: (delta Rating)^2 * Likelihood sumW: Likelihood = 0 for values, x, from -bound-p.sigma/20 upto bound incrementing by p.sigma/10 p.rating = psave + x -- compute variance of x, using probabilities as weights w: Likelihood = player_pr(p) for each game, g, in which p played w *= game_po(g) sumX2W += (x * x) * w sumW += w p.rating = psave return sqrt(sumX2W / sumW) -- estimate_ratings(): Likelihood -- Estimate ratings simultaneously for all players and games. -- Returns pt, the likelihood of the outcome, given the new ratings. -- estimate_ratings(): Likelihood === prune_unrateable_players() -- set ratings to seed values for each player, p p.rating = p.seed p.direction = NONE -- calculate initial values of all pt components propose_ratings(0) -- compute initial po and pr best_pt:Number = compute_pt() -- compute resulting pt -- search for best new rating values delta: Number = max_delta while delta >= .002 -- ensure two decimal places of accuracy for each player, p -- decide whether rating should go up or down -- Note: Need to add code to deal with the case where -- p is removed within compute_player_impact chng_plus: Number = compute_player_impact(p, delta) chng_minus: Number = compute_player_impact(p, -delta) if chng_plus < 1 and chng_minus < 1 then p.direction = NONE else if chng_plus > chng_minus p.direction = UP else -- chng_minus > chng_plus p.direction = DOWN -- change ratings and repeat with same or smaller delta propose_ratings(delta) new_pt: Number = compute_pt() if new_pt > best_pt + EPSILON best_pt = new_pt -- do next cycle with same delta else if new_pt >= best_pt -- pt is no worse, but not much better; decrease delta delta /= 2 else -- pt is no better; undo the change & decrease delta revert_ratings(delta) delta /= 2 -- compute sigmas to be used for next ratings cycle for each player, p p.sigma = player_sensitivity(p) return pt