Over 65 years ago the Rand Corporation built an "electronic roulette wheel" to generate random numbers. It recorded the binary numbers, converted them to decimal, and published them as a book, A Million Random Digits (a free electronic version is available for download), which, according to the book's foreword, has "become a standard reference in engineering and econometrics textbooks and [has] been widely used in gaming and simulations that employ Monte Carlo trials. Still the largest known source of random digits and normal deviates, the work is routinely used by statisticians, physicists, poll-takers, market analysts, lottery administrators, and quality control engineers."
Recently, a Rand software engineer named Gary Briggs analyzed the numbers in the book and discovered that, while they are random, the order in which the digits are printed diverges from what probability predicts.
From The Wall Street Journal:
In a group of 50,000 random digits, mathematicians would expect 4,050 sequences of two identical digits in a row—77, for instance. They would predict 405 spots with three identical digits in a row, such as 555. There would be about 40 cases of four identical digits in a row. And four or five places with five identical digits together.
His results were "soul crushing," Mr. Briggs says. The book contains 48 runs of four digits instead of 40, an astoundingly wide divergence in statistical terms that eluded any explanation he could conjure.
It's not that the digits in the book aren't random, he says. They just don't seem to be exactly the right digits in exactly the right order, given the impulses the Douglas machine generated.
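The expected counts in the WSJ excerpt can be checked with a little arithmetic. The Python sketch below assumes that every digit is independent and uniformly distributed, and that a "sequence" means a maximal run of exactly that length (so 777 does not also count as a 77). The function name expected_runs is hypothetical, chosen only for illustration.

```python
def expected_runs(n_digits: int, run_length: int, base: int = 10) -> float:
    """Expected number of maximal runs of exactly `run_length` equal digits
    among `n_digits` independent, uniformly distributed digits.

    Assumes n_digits > run_length.
    """
    p = 1 / base                    # chance a digit equals its predecessor
    q = 1 - p                       # chance a digit differs from its predecessor
    same = p ** (run_length - 1)    # chance the digits inside the run all match

    # Runs away from the edges need a different digit on both sides.
    interior = (n_digits - run_length - 1) * q * same * q
    # Runs touching the first or last digit need a different digit on one side.
    edges = 2 * q * same
    return interior + edges


if __name__ == "__main__":
    for k in (2, 3, 4, 5):
        print(k, round(expected_runs(50_000, k), 1))
```

Under those assumptions the values come out close to 4,050, 405, 40.5, and 4, in line with the figures quoted.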
The comments on the WSJ article are interesting. One example:
Burning with curiosity, I downloaded a million digits from Rand's web page https://www.rand.org/pubs/monograph_reports/MR1418.html .
I then used a short Matlab program on a good desktop computer to compare statistics between the Rand data and 10 billion random digits computed using Matlab's modern random number generator. Define N as a random variable representing the number of runs of 4 equal digits in 50,000 digits. The Rand data is 20 samples of N. My 10 billion digits are 200,000 samples of N. The average of N is very close to 50 with standard deviation 7.8 for the 10 billion digits. The Rand data yields an N with average about 49.1 with standard deviation 7.2. The Rand sample standard deviation (for 20 samples) is about 1.6. Thus, I conclude that the Rand data has FEWER runs of 4 identical digits than expected with an error of about half of a sample standard deviation. Thus, the Rand data is fine for this statistic because this is well within expectation.
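The commenter's Matlab program is not shown, so what follows is only a rough Python sketch of that kind of check, not a reconstruction of it. It assumes N counts every position at which four consecutive digits match, with overlapping windows counted separately (55555 contributes two). That reading is a guess, though it gives a mean near 50. The function names, sample count, and seed are all hypothetical. The z_score helper shows where the figure of 1.6 plausibly comes from: 7.2 divided by the square root of 20, the standard error of the mean.

```python
import math
import random
import statistics


def count_windows(digits, width=4):
    """Count positions where `width` consecutive digits are all equal.

    Overlapping windows each count. Assumes width >= 2.
    """
    count = 0
    streak = 1  # length of the current run of equal digits
    for prev, cur in zip(digits, digits[1:]):
        streak = streak + 1 if cur == prev else 1
        if streak >= width:
            count += 1
    return count


def simulate(samples=100, block=50_000, seed=0):
    """Return (mean, standard deviation) of the count over simulated blocks."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    counts = [
        count_windows(rng.choices(range(10), k=block)) for _ in range(samples)
    ]
    return statistics.mean(counts), statistics.stdev(counts)


def z_score(sample_mean, expected_mean, sample_sd, n):
    """Distance of a sample mean from the expected mean, in standard errors."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - expected_mean) / standard_error


if __name__ == "__main__":
    print(simulate(samples=50))
    # Figures quoted in the comment: mean 49.1, sd 7.2, 20 blocks, expected 50.
    print(z_score(49.1, 50, 7.2, 20))
```

With the quoted figures the z-score is a little over half a standard error below the expected mean, which matches the commenter's conclusion that the Rand data is unremarkable on this statistic.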