Examining how aging affects ability to play chess

12.12.2018 07:25 | Read: 6809× | last modified: 12.12.2018 19:15

1. Introduction

While the performance of a long-time chess player is expected to start gradually decreasing at some point in life due to a decline of cognitive functions, little is known on the subject. Specifically, it is unclear at which age a player is expected to reach peak performance and how rapidly performance decreases afterwards. A study conducted by Blanch, Aluja and Cornadó in 2015, which briefly touched on this topic, was limited to tournaments played in Spain from 2010 to 2013 [1]. According to a meta-analysis from 2016, chess skill correlated positively and significantly with fluid reasoning, comprehension-knowledge, short-term memory, and processing speed [2]. If combined with a study of how these cognitive functions progress or regress with age, these findings could be used to create a model that predicts performance at a given age (assuming other factors, such as a change of personal interests, are factored in). There are, however, no significant data (that we know of) to test the model against.

2. Dataset

2.1 Gathering the input data

We used data about players registered with the Fédération Internationale des Échecs, commonly referred to as FIDE [3]. This organization is widely respected, and it is reasonable to believe that the portion of both amateur and professional competitive over-the-board chess players registered there is representative for the purposes of this study. A list of all players can be downloaded from FIDE's website as a single compressed XML file [4].

Ratings for standard, rapid and blitz time controls are provided individually for each month as separate files. We chose the TXT format over XML because, until 05/2012, only the former is available. For standard chess, ratings are available since 01/2001. For blitz and rapid chess, ratings are available only since 07/2012, as FIDE did not recognize them prior to that date [5]. For some months, no rating list has been published for any of the time controls.

All ratings were harvested using a simple scraper on November 26, 2018. This yielded 284 files with a total size of approximately 3.8 GB (in SI units). Appendix A contains a listing of all files, including their sizes and SHA-256 checksums after unzipping.
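
A listing of file names, sizes and SHA-256 checksums like the one mentioned above could be produced along the following lines. This is only a sketch: the function names and the flat directory of unzipped files are assumptions, not a description of the tooling actually used.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def listing(directory):
    """Return (file name, size in bytes, SHA-256) for each file, sorted by name.

    Assumes (hypothetically) that all unzipped rating lists sit in one directory.
    """
    return [
        (path.name, path.stat().st_size, sha256_of_file(path))
        for path in sorted(Path(directory).iterdir())
        if path.is_file()
    ]
```

Reading in chunks keeps memory use constant even for the largest rating lists.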

2.2 Transforming the input data

The TXT format provided by FIDE is not suitable for automated processing, as it differs greatly from file to file. Differences were observed in the presence of a header (missing in a few files), the columns included, alignment, and more. Thus, it was decided to manually examine each of the files and record the offsets where each column begins. We were only interested in numerical values (namely the FIDE ID, the Elo rating, and the number of games played), and these could then be parsed easily using a simple algorithm.
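
The offset-based parsing described above might look roughly like this. The column names and the offsets in `EXAMPLE_SPEC` are made up for illustration; the real offsets were recorded by hand for each file and are not reproduced here.

```python
from typing import Dict, Optional, Tuple

# Hypothetical per-file layout: column name -> (start, end) character offsets.
ColumnSpec = Dict[str, Tuple[int, int]]

EXAMPLE_SPEC: ColumnSpec = {"id": (0, 10), "rating": (50, 54), "games": (56, 59)}


def parse_line(line: str, spec: ColumnSpec) -> Optional[Dict[str, int]]:
    """Slice one fixed-width line and convert every field to int.

    Returns None when any column does not hold a plain number, which
    also skips header lines and blank lines.
    """
    record = {}
    for name, (start, end) in spec.items():
        field = line[start:end].strip()
        if not field.isdecimal():
            return None
        record[name] = int(field)
    return record
```

Keeping one such specification per file means the parsing code itself stays the same even though the layouts differ.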

The full list of players was fed into a stream-based SAX XML parser. The FIDE ID, name, country (where the player is registered, not necessarily where they live), sex, year of birth, and current ratings in all three time controls were identified as relevant for further processing.
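
A stream-based reader of this kind can be sketched with Python's built-in `xml.sax` module. The element names below (`player`, `fideid`, `birthday` and so on) are assumptions made for the example and may not match the tags in the actual FIDE file.

```python
import xml.sax

# Assumed tag names; the actual element names in the FIDE file may differ.
PLAYER_TAG = "player"
FIELDS = {"fideid", "name", "country", "sex", "birthday",
          "rating", "rapid_rating", "blitz_rating"}


class PlayerHandler(xml.sax.ContentHandler):
    """Collects the relevant fields of each player and passes them to a callback."""

    def __init__(self, on_player):
        super().__init__()
        self.on_player = on_player
        self.current = None  # dict for the player currently being read
        self.field = None    # name of the relevant field currently open
        self.chunks = []     # text pieces of the open field

    def startElement(self, name, attrs):
        if name == PLAYER_TAG:
            self.current = {}
        elif self.current is not None and name in FIELDS:
            self.field = name
            self.chunks = []

    def characters(self, content):
        # Text may arrive in several pieces, so it is buffered until the tag closes.
        if self.field is not None:
            self.chunks.append(content)

    def endElement(self, name):
        if name == PLAYER_TAG and self.current is not None:
            self.on_player(self.current)
            self.current = None
        elif name == self.field:
            self.current[self.field] = "".join(self.chunks).strip()
            self.field = None


def parse_players(xml_bytes, on_player):
    """Parse an XML document given as bytes, calling on_player once per player."""
    xml.sax.parseString(xml_bytes, PlayerHandler(on_player))
```

Because each player is handed to the callback as soon as its closing tag is seen, the whole file never has to be held in memory at once.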

All data was imported into an SQLite backend with the following schema:

database: fide
    * table: players
        - id: text
        - name: text
        - country: text
        - sex: text
        - year_of_birth: int
        - rating_standard: int
        - rating_rapid: int
        - rating_blitz: int
    * table: ratings_(standard|rapid|blitz)
        - id: text
        - date: text
        - rating: int
        - games: int
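
The schema above could be created with Python's `sqlite3` module as sketched below. The primary keys and the `IF NOT EXISTS` clauses are additions made for this example and are not part of the schema as described.

```python
import sqlite3


def create_schema(conn):
    """Create the players table and one ratings table per time control."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS players (
               id TEXT PRIMARY KEY,      -- primary key is an assumption
               name TEXT,
               country TEXT,
               sex TEXT,
               year_of_birth INTEGER,
               rating_standard INTEGER,
               rating_rapid INTEGER,
               rating_blitz INTEGER)"""
    )
    for category in ("standard", "rapid", "blitz"):
        conn.execute(
            f"""CREATE TABLE IF NOT EXISTS ratings_{category} (
                    id TEXT,
                    date TEXT,
                    rating INTEGER,
                    games INTEGER,
                    PRIMARY KEY (id, date))  -- composite key is an assumption"""
        )
    conn.commit()
```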

2.3 Consistency verification

We successfully verified that:

Contrary to our expectations, however, we found instances of players whose ratings changed despite not playing any games (11 728 [3.66 %], 6 843 [4.71 %] and 8 725 [8.16 %] instances for the standard, rapid and blitz rating categories, respectively). We hypothesized that this could be explained by rating corrections, but on closer examination this turned out not to be the case for at least some players.

For example, we noticed the Mexican player Vicente Rendon Hidalgo (FIDE ID 5104432), whose standard rating supposedly dropped to 0 in 04/2009 after not playing any games. This exact value is indeed present in the original TXT file and is thus not caused by a parse error. However, FIDE's website reports a rating of 1 419 and a total of 10 games played for the same month [6]. The same was true for several other players we checked manually. In the case of Leonid Gnezdov (FIDE ID 34174220), the inconsistency was traced back to 09/2016. According to our database, he played 8 games and ended up with a rating of 1 271. Despite not playing any more games, his rating increased by one point to 1 272 in 11/2016. According to FIDE's website, he gained a rating of 1 272 after playing 8 games in 09/2016, and it has remained the same ever since [7].
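
A check for rating changes without any games played can be sketched against the schema from section 2.2 as follows. It assumes that `date` is stored as sortable text (for example `2016-11`) and that `games` holds the number of games counted for that particular list; neither detail is stated in the text.

```python
import sqlite3


def rating_changes_without_games(conn, category):
    """Yield (id, date, previous_rating, rating) for suspicious records.

    A record is suspicious when the rating differs from the same player's
    previous list although no games are reported for the current list.
    """
    if category not in ("standard", "rapid", "blitz"):
        raise ValueError(f"unknown category: {category}")
    query = (
        f"SELECT id, date, rating, games FROM ratings_{category} "
        "ORDER BY id, date"  # assumes dates sort chronologically as text
    )
    prev_id, prev_rating = None, None
    for player_id, date, rating, games in conn.execute(query):
        if player_id == prev_id and games == 0 and rating != prev_rating:
            yield player_id, date, prev_rating, rating
        prev_id, prev_rating = player_id, rating
```

A player's first record is never reported, since there is no earlier list to compare it with.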

This raised serious questions about the validity of the records and the credibility of FIDE itself. We subsequently attempted to harvest data directly from FIDE's website, but concluded that parsing such HTML would be unreliable as well. Furthermore, it was estimated that retrieving the pages (without processing) would take approximately 76.5 hours on a 100 Mbps connection using 32 threads, which would exceed both the time and the resources we had allotted for this project.

We therefore decided to proceed with the dataset nonetheless. Where relevant, separate data samples will be created.

3. Statistics

3.1 Registered players

There is a total of 821 762 registered players. As figure 3.1.1 shows, there are far more male than female players.

Figure 3.1.1

Of all registered players, only 367 258 (44.69 %) are rated in at least one category. The proportion of unrated women (68.09 %) is higher than the proportion of unrated men (53.21 %).

The cohort of so-called millennials [8] is by far the most widely represented. This is true for both men and women. It is also apparent that the proportion of women tends to increase in younger generations.

Figure 3.1.3 Figure 3.1.4 Figure 3.1.5

Some of the birth years are obviously invalid, however. The South African player Henry Maketo (FIDE ID 14334070) is supposedly 118 years old, but played 10 games in 10/2018. The official FIDE website supports both of these claims [9, 10]. The oldest known and verified person in the world is a few years younger [11]. Unless chess-playing vampires are a real thing, this is likely a mistake. We also found 7 players with a year of birth of 2018. Apart from the obvious possibility that the date of registration was entered instead of the year of birth when filling in the sign-up form, these could also be newborn children registered by their ambitious parents (none of them played any rated games).

The following table shows the 20 most widely represented countries:

Rank  Country    Number of players
1.    India      81 069
2.    Russia     80 119
3.    France     54 856
4.    Spain      49 104
5.    Germany    36 634
6.    Turkey     35 216
7.    Iran       34 027
8.    Poland     24 527
9.    Italy      21 825
10.   Greece     21 213
11.   USA        16 123
12.   Brazil     15 157
13.   Sri Lanka  13 594
14.   Czechia    12 700
15.   Ukraine    12 136
16.   Hungary    10 444
17.   Malaysia   9 303
18.   Serbia     9 081
19.   Argentina  8 811
20.   Colombia   8 585

3.2 Rated players

Standard time controls are by far the most popular. Only a small number of players focus solely on faster time controls. Surprisingly, players rated in all three categories simultaneously form the second-largest group.

Figure 3.2.1

In general, men achieve higher ratings than women.

Figure 3.2.2a Figure 3.2.2b Figure 3.2.3a Figure 3.2.3b Figure 3.2.4a Figure 3.2.4b

3.3 Activity

In terms of activity, no significant difference was observed between men and women for most of the groups. The most active players, however, tend to be men.

First, we compared players by the total number of (not necessarily consecutive) months during which they played at least one game. It appears that once women start playing, they are slightly more likely than men to remain active for more than three months, but less likely to remain active for more than 24 months. Different cohorts would need to be compared before any direct conclusions could be drawn.
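
Counting active months per player could be done along these lines, using the schema from section 2.2. This is a sketch only; it assumes one row per player and month in the ratings tables, and the values stored in `sex` are whatever the source data uses.

```python
import sqlite3
from collections import Counter


def active_months_histogram(conn, category):
    """Return a Counter mapping (sex, active months) to the number of players.

    A month counts as active when at least one game is reported for it.
    Players without any active month do not appear in the result.
    """
    if category not in ("standard", "rapid", "blitz"):
        raise ValueError(f"unknown category: {category}")
    query = f"""
        SELECT p.sex, COUNT(DISTINCT r.date)
        FROM ratings_{category} AS r
        JOIN players AS p ON p.id = r.id
        WHERE r.games > 0
        GROUP BY r.id, p.sex
    """
    return Counter((sex, months) for sex, months in conn.execute(query))
```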

Figure 3.3.1 Figure 3.3.2 Figure 3.3.3

We then compared players by the total number of games played.

Figure 3.3.4a Figure 3.3.4b Figure 3.3.5a Figure 3.3.5b Figure 3.3.6a Figure 3.3.6b

4. Method

For standard chess, we selected players who played at least once a year within a period of 10 years or more. This yielded a total of 48 193 players. For rapid and blitz chess, which have been recognized by FIDE only since 07/2012 [5], we decreased the threshold to 5 years. This yielded a total of 9 048 rapid and 6 654 blitz players. This dataset was again validated in the following steps:

  1. Ensure that each player's rating matches the value on the latest rating list on which the player appears
  2. Ensure that the rating did not change if no games had been played
  3. Ensure that the player's year of birth is known

We removed all players who did not conform to these criteria.

          Total players (before)  1. step  2. step  3. step  Total players (after)
standard  48 193                  0        251      573      47 369
rapid     9 048                   0        70       103      8 875
blitz     6 654                   0        621      67       5 966
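
The selection rule and the three validation steps can be sketched as follows. The selection is read here as "at least one game in each of at least 10 (or 5) consecutive calendar years", which is one possible interpretation, and an unknown year of birth is assumed to be stored as `None` or `0`.

```python
def longest_active_streak(years):
    """Length of the longest run of consecutive calendar years in `years`."""
    best = run = 0
    previous = None
    for year in sorted(set(years)):
        run = run + 1 if previous is not None and year == previous + 1 else 1
        best = max(best, run)
        previous = year
    return best


def is_long_term(years_with_games, min_years=10):
    """True if the player was active in at least `min_years` consecutive years."""
    return longest_active_streak(years_with_games) >= min_years


def passes_validation(history, current_rating, year_of_birth):
    """Apply the three validation steps to one player.

    history: list of (date, rating, games) tuples sorted by date.
    """
    # Step 1: the current rating must match the latest rating list.
    if not history or history[-1][1] != current_rating:
        return False
    # Step 2: the rating must not change when no games were played.
    for (_, previous, _), (_, rating, games) in zip(history, history[1:]):
        if games == 0 and rating != previous:
            return False
    # Step 3: the year of birth must be known (assumed None or 0 otherwise).
    return bool(year_of_birth)
```

For rapid and blitz, the same check would be called with `min_years=5`.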

Furthermore, we manually compared the data for the youngest and the oldest player from each dataset with the FIDE website. Neither ambiguities nor clearly erroneous records were found. There are instances of surprisingly young players (for example, Anastasia Barysheva played a few games in July 2006 and gained a rating of 1 923 while she was only 2 to 3 years old [12]), but chess prodigies [13, 14] do exist, and we cannot simply exclude some players without clear evidence that such data is invalid.

All further examinations were performed on these datasets.

5. Results

5.1 Development of Elo rating of long-term (≥ 10 years) standard chess players

5.1.1 All cohorts

Figure 5.1.1a Figure 5.1.1b

5.1.2 Cohorts <1920; 1930)

Figure 5.1.2a Figure 5.1.2b

5.1.3 Cohorts <1930; 1940)

Figure 5.1.3a Figure 5.1.3b

5.1.4 Cohorts <1940; 1950)

Figure 5.1.4a Figure 5.1.4b

5.1.5 Cohorts <1950; 1960)

Figure 5.1.5a Figure 5.1.5b

5.1.6 Cohorts <1960; 1970)

Figure 5.1.6a Figure 5.1.6b

5.1.7 Cohorts <1970; 1980)

Figure 5.1.7a Figure 5.1.7b

5.1.8 Cohorts <1980; 1990)

Figure 5.1.8a Figure 5.1.8b

5.1.9 Cohorts <1990; 2000)

Figure 5.1.9a Figure 5.1.9b

5.1.10 Cohorts <2000; 2010)

Figure 5.1.10a Figure 5.1.10b

5.2 Development of Elo rating of long-term (≥ 5 years) rapid chess players

5.2.1 All cohorts

Figure 5.2.1a Figure 5.2.1b

5.2.2 Cohorts <1920; 1930)

Insufficient data.

5.2.3 Cohorts <1930; 1940)

Figure 5.2.3a Figure 5.2.3b

5.2.4 Cohorts <1940; 1950)

Figure 5.2.4a Figure 5.2.4b

5.2.5 Cohorts <1950; 1960)

Figure 5.2.5a Figure 5.2.5b

5.2.6 Cohorts <1960; 1970)

Figure 5.2.6a Figure 5.2.6b

5.2.7 Cohorts <1970; 1980)

Figure 5.2.7a Figure 5.2.7b

5.2.8 Cohorts <1980; 1990)

Figure 5.2.8a Figure 5.2.8b

5.2.9 Cohorts <1990; 2000)

Figure 5.2.9a Figure 5.2.9b

5.2.10 Cohorts <2000; 2010)

Figure 5.2.10a Figure 5.2.10b

5.3 Development of Elo rating of long-term (≥ 5 years) blitz chess players

5.3.1 All cohorts

Figure 5.3.1a Figure 5.3.1b

5.3.2 Cohorts <1920; 1930)

Insufficient data.

5.3.3 Cohorts <1930; 1940)

Figure 5.3.3a Figure 5.3.3b

5.3.4 Cohorts <1940; 1950)

Figure 5.3.4a Figure 5.3.4b

5.3.5 Cohorts <1950; 1960)

Figure 5.3.5a Figure 5.3.5b

5.3.6 Cohorts <1960; 1970)

Figure 5.3.6a Figure 5.3.6b

5.3.7 Cohorts <1970; 1980)

Figure 5.3.7a Figure 5.3.7b

5.3.8 Cohorts <1980; 1990)

Figure 5.3.8a Figure 5.3.8b

5.3.9 Cohorts <1990; 2000)

Figure 5.3.9a Figure 5.3.9b

5.3.10 Cohorts <2000; 2010)

Figure 5.3.10a Figure 5.3.10b

6. Discussion

Given a reasonably large sample, the data shows that men reach their peak rating at the age of approximately 30. For women, it is about 5 years later. To our surprise, women seemed to get better at rapid in their 70s and at blitz in their 50s. Examining this more closely cohort by cohort, a similar spike can be seen for cohorts 1940 - 1950 and 1960 - 1970 (for rapid) and 1930 - 1940, 1940 - 1950 and 1960 - 1970 (for blitz). Since there was only a small number of players in these cohorts and they generally started playing at an older age on a more or less amateur level, such a spike is not really surprising.
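
One way to locate such a peak is to average ratings per age and take the age with the highest mean. The text does not state how the charts were aggregated, so the use of the mean here is an assumption, as are the function names. Since only the year of birth is known, age is approximated as the year of the rating list minus the year of birth.

```python
from collections import defaultdict
from statistics import mean


def cohort_of(year_of_birth):
    """Start of the decade cohort, e.g. 1987 -> 1980 for the cohort <1980; 1990)."""
    return year_of_birth - year_of_birth % 10


def mean_rating_by_age(samples):
    """Return {age: mean rating} for samples of (year_of_birth, list_year, rating)."""
    by_age = defaultdict(list)
    for year_of_birth, list_year, rating in samples:
        by_age[list_year - year_of_birth].append(rating)
    return {age: mean(ratings) for age, ratings in sorted(by_age.items())}


def peak_age(curve):
    """Age with the highest mean rating in a {age: mean rating} mapping."""
    return max(curve, key=curve.get)
```

With few players per age, as in the oldest cohorts, a single strong or weak player can move the mean noticeably, which is consistent with the spikes described above.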

Interestingly, both men and women show a decrease in rating between the ages of 5 and 10 on most of the charts. This could possibly be explained by competing against higher age groups rather than playing in children's tournaments. Such instances, however, raise a question about the relevance of this analysis (there are many missing pieces of important information).

7. Limitations

Elo rating [15] is, by definition, a relative metric. It is used to compare the strength of two players rather than to describe their absolute skill level. A decrease in Elo rating therefore does not imply that the player's absolute skill level decreased as well (although it is one of the possible explanations). In general, younger generations are expected to be more intelligent (Flynn effect [16]) and have access to better training (opening preparation, computer analysis, training websites with chess puzzles, and more). This is enough to cause a decrease in the rating of older players, even if their absolute skill remained the same.
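
The relative nature of the rating follows directly from the standard Elo formulas, sketched below. The K-factor of 20 is an assumption for the example, and details of FIDE's actual regulations (such as varying K-factors or rounding) are not modeled.

```python
def expected_score(rating_a, rating_b):
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def updated_rating(rating, expected, actual, k=20):
    """New rating after one game; `actual` is 1 for a win, 0.5 for a draw, 0 for a loss.

    The default K-factor of 20 is an assumption made for this example.
    """
    return rating + k * (actual - expected)
```

The expected score depends only on the difference between the two ratings, so an improvement shared by the whole pool is invisible, while a player whose skill stays constant scores below expectation against improving opposition and loses points.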

Furthermore, players can start losing their interest in chess over time. We tried to compensate for this by only including players who remained active for 10 or more years (5 or more for rapid and blitz), but that might not be enough. Sadly, there was no better data available.

Originally, we aimed for a more in-depth analysis, but given that the dataset proved somewhat unreliable, we significantly limited our efforts, as the desired output could not be provided anyway.

8. Conclusion

The project is considered unsuccessful because it did not deliver the desired outcome. The data provided by FIDE is too unreliable to justify putting significantly more effort into analysing it. This is due to mistakes or inconsistencies (unexpected changes in rating), invalid records (e.g. a player aged 118 years who simply continues playing instead of finally dying), and missing pieces of information (e.g. women can play in women's tournaments as well as open tournaments, which can have a significant impact on their rating). Some of these difficulties were to be expected, yet we underestimated them.

The paper is published as is (with major changes and/or adjustments) for tabloid-like purposes. It is not a scientific study and should not be treated as such.


