The Elo rating system is a method for calculating the relative skill levels of players in competitor-versus-competitor games such as chess. It is named after its creator Arpad Elo, a Hungarian-born American physics professor.

The Elo system was invented as an improved chess rating system and is also used in many other games. It is also used as a rating system for multiplayer competition in a number of video games,[1] and has been adapted to team sports including association football (soccer), American college football, basketball, Major League Baseball, competitive programming, and esports.

The difference in the ratings between two players serves as a predictor of the outcome of a match. Two players with equal ratings who play against each other are expected to score an equal number of wins. A player whose rating is 100 points greater than their opponent's is expected to score 64%; if the difference is 200 points, then the expected score for the stronger player is 76%.[2]

A player's Elo rating is represented by a number which increases or decreases based upon the outcome of games between rated players. After every game, the winning player takes points from the losing one. The difference between the ratings of the winner and loser determines the total number of points gained or lost after a game. In a series of games between a high-rated player and a low-rated player, the high-rated player is expected to score more wins. If the high-rated player wins, then only a few rating points will be taken from the low-rated player. However, if the lower rated player scores an upset win, many rating points will be transferred. The lower rated player will also gain a few points from the higher rated player in the event of a draw. This means that this rating system is self-correcting. A player whose rating is too low should, in the long run, do better than the rating system predicts, and thus gain rating points until the rating reflects their true playing strength.

The Elo system was invented as an improved chess rating system and is also used in many other games. It is also used as a rating system for multiplayer competition in a number of video games,[1] and has been adapted to team sports including association football (soccer), American college football, basketball, Major League Baseball, competitive programming, and esports.

The difference in the ratings between two players serves as a predictor of the outcome of a match. Two players with equal ratings who play against each other are expected to score an equal number of wins. A player whose rating is 100 points greater than their opponent's is expected to score 64%; if the difference is 200 points, then the expected score for the stronger player is 76%.[2]

A player's Elo rating is represented by a number which increases or decreases based upon the outcome of games between rated players. After every game, the winning player takes points from the losing one. The difference between the ratings of the winner and loser determines the total number of points gained or lost after a game. In a series of games between a high-rated player and a low-rated player, the high-rated player is expected to score more wins. If the high-rated player wins, then only a few rating points will be taken from the low-rated player. However, if the lower rated player scores an upset win, many rating points will be transferred. The lower rated player will also gain a few points from the higher rated player in the event of a draw. This means that this rating system is self-correcting. A player whose rating is too low should, in the long run, do better than the rating system predicts, and thus gain rating points until the rating reflects their true playing strength.

**History**

Arpad Elo was a master-level chess player and an active participant in the United States Chess Federation (USCF) from its founding in 1939.[3] The USCF used a numerical ratings system, devised by Kenneth Harkness, to allow members to track their individual progress in terms other than tournament wins and losses. The Harkness system was reasonably fair, but in some circumstances gave rise to ratings which many observers considered inaccurate. On behalf of the USCF, Elo devised a new system with a more sound statistical basis.

Elo's system replaced earlier systems of competitive rewards with a system based on statistical estimation. Rating systems for many sports award points in accordance with subjective evaluations of the 'greatness' of certain achievements. For example, winning an important golf tournament might be worth an arbitrarily chosen five times as many points as winning a lesser tournament.

A statistical endeavor, by contrast, uses a model that relates the game results to underlying variables representing the ability of each player.

Elo's central assumption was that the chess performance of each player in each game is a normally distributed random variable. Although a player might perform significantly better or worse from one game to the next, Elo assumed that the mean value of the performances of any given player changes only slowly over time. Elo thought of a player's true skill as the mean of that player's performance random variable.

A further assumption is necessary, because chess performance in the above sense is still not measurable. One cannot look at a sequence of moves and say, "That performance is 2039." Performance can only be inferred from wins, draws and losses. Therefore, if a player wins a game, he is assumed to have performed at a higher level than his opponent for that game. Conversely if he loses, he is assumed to have performed at a lower level. If the game is a draw, the two players are assumed to have performed at nearly the same level.

Elo did not specify exactly how close two performances ought to be to result in a draw as opposed to a win or loss. And while he thought it is likely that each player might have a different standard deviation to his performance, he made a simplifying assumption to the contrary.

To simplify computation even further, Elo proposed a straightforward method of estimating the variables in his model (i.e. the true skill of each player). One could calculate relatively easily, from tables, how many games a player would be expected to win based on a comparison of his rating to the ratings of his opponents. If a player won more games than expected, his rating would be adjusted upward, while if he won fewer than expected his rating would be adjusted downward. Moreover, that adjustment was to be in exact linear proportion to the number of wins by which the player had exceeded or fallen short of his expected number.

From a modern perspective, Elo's simplifying assumptions are not necessary because computing power is inexpensive and widely available. Moreover, even within the simplified model, more efficient estimation techniques are well known. Several people, most notably Mark Glickman, have proposed using more sophisticated statistical machinery to estimate the same variables. On the other hand, the computational simplicity of the Elo system has proven to be one of its greatest assets. With the aid of a pocket calculator, an informed chess competitor can calculate to within one point what his next officially published rating will be, which helps promote a perception that the ratings are fair.

Implementing Elo's scheme

The USCF implemented Elo's suggestions in 1960,[4] and the system quickly gained recognition as being both more fair and accurate than the Harkness rating system. Elo's system was adopted by the World Chess Federation (FIDE) in 1970. Elo described his work in some detail in the book The Rating of Chessplayers, Past and Present, published in 1978.

Subsequent statistical tests have suggested that chess performance is almost certainly not distributed as a normal distribution, as weaker players have significantly (but not highly significantly) greater winning chances than Elo's model predicts.[citation needed] Therefore, the USCF and some chess sites use a formula based on the logistic distribution. Significant statistical anomalies have also been found when using the logistic distribution in chess.[5] FIDE continues to use the normal distribution. The normal and logistic distribution points are, in a way, arbitrary points in a spectrum of distributions which would work well. In practice, both of these distributions work very well for a number of different games.

(From Wikipedia)

Full article here