“When you search for stock market prices, you may see patterns…”
Muhla1/Getty Images
Flip through the front pages of a newspaper and you are greeted by a myriad of numbers: populations, lengths, areas and more. If you were to extract these figures and compile them into a list, it might seem like a random assortment.
However, these figures are not as arbitrary as they may appear. In many real-world datasets, such as company revenues or building heights, the leading digit is disproportionately often a 1. If every digit had an equal chance of leading, each of the nine possibilities would turn up about 11 per cent of the time, yet in real data the first digit is a 1 about 30 per cent of the time. A leading 9 appears in only about 5 per cent of cases, and the digits in between become steadily less common as they get larger.
This phenomenon is known as Benford’s Law, which describes the expected distribution of first digits in datasets of a certain type, especially those spanning a wide range with no fixed upper limit. Values like human heights (which are confined to a narrow band) or dates (which also have defined limits) don’t follow the law, but many others do.
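The percentages quoted for leading 1s and 9s can be reproduced with a short calculation. The sketch below assumes the standard statement of Benford’s Law, in which the share of numbers with leading digit d is log10(1 + 1/d); the formula is not given in the article, and the function name is purely illustrative.

```python
import math

def benford_probability(digit: int) -> float:
    """Expected share of numbers whose first digit is `digit` (1 to 9)."""
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)

# Print the expected share for each possible leading digit.
for d in range(1, 10):
    print(d, f"{benford_probability(d):.1%}")
```

The nine shares add up to 1, with 1 as the most common leading digit and 9 as the least common.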
Consider your bank balance, house numbers or stock prices (as pictured). Numbers like these commonly range over values with different numbers of digits. On a street with just a handful of houses, the leading digits may be evenly balanced, whereas in a larger town, hundreds of house numbers may share the same leading digit.
Picture a street with nine houses. Each of the nine possible leading digits appears exactly once, an even split. On a street with 19 houses, by contrast, 11 of the house numbers (1 and 10 to 19), more than half, begin with 1. The pattern repeats as the number of houses increases. With 100 houses, you would see a nearly uniform distribution across all digits, yet with 200 houses, once again, more than half start with 1.
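The house-number counts are easy to check directly. This is a minimal sketch, assuming houses are numbered consecutively from 1; the function name is hypothetical.

```python
def share_starting_with(digit: int, houses: int) -> float:
    """Fraction of house numbers 1..houses whose first digit is `digit`."""
    matches = sum(1 for n in range(1, houses + 1) if str(n)[0] == str(digit))
    return matches / houses

# Street sizes taken from the examples in the text.
for houses in (9, 19, 100, 200):
    print(houses, round(share_starting_with(1, houses), 3))
```

On the 19-house street, the matching numbers are 1 and 10 to 19, which is 11 out of 19.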
Because real-world datasets draw on sources of many different sizes, the average likelihood of a number starting with 1 falls somewhere between these two extremes. Similar calculations can be made for the other digits, giving the overall frequency distribution seen in large datasets.
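To see the averaging at work, one can pool many streets of different lengths. The sketch below makes a loud assumption that is not in the article: every street length from 9 to 999 houses is equally likely. With a different mix of street lengths the average would shift, so this toy figure need not land exactly on the Benford value.

```python
def share_starting_with_one(houses: int) -> float:
    """Fraction of house numbers 1..houses whose first digit is 1."""
    return sum(str(n)[0] == "1" for n in range(1, houses + 1)) / houses

# Assumption (illustrative only): street lengths 9..999, each equally likely.
street_lengths = range(9, 1000)
shares = [share_starting_with_one(length) for length in street_lengths]
average_share = sum(shares) / len(shares)

print(f"lowest share:  {min(shares):.3f}")
print(f"highest share: {max(shares):.3f}")
print(f"average share: {average_share:.3f}")
```

The lowest share occurs on streets such as those with 9 or 99 houses, the highest on the 19-house street, and the average sits between the two.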
This property is particularly useful for spotting potential data fabrication. In a company’s genuine financial records, the sales figures would be expected to follow a Benford-like distribution. When someone makes numbers up, however, the leading digits tend not to follow this curve. The principle is one of the many tools forensic accountants use to root out dubious activity.
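A check of this kind might look something like the sketch below. The article does not say which statistical test forensic accountants use; a chi-square comparison against the Benford shares is assumed here as one common choice, and the function names and both sample datasets are hypothetical stand-ins rather than real financial records.

```python
import math
from collections import Counter

def leading_digit(value) -> int:
    """First significant digit of a non-zero number."""
    return int(str(abs(value)).lstrip("0.")[0])

def benford_chi_square(values) -> float:
    """Chi-square distance between observed leading digits and Benford's Law."""
    counts = Counter(leading_digit(v) for v in values)
    total = sum(counts.values())
    statistic = 0.0
    for d in range(1, 10):
        expected = total * math.log10(1 + 1 / d)
        statistic += (counts[d] - expected) ** 2 / expected
    return statistic

# Stand-in for Benford-like data: the first 200 powers of 2.
powers = [2 ** k for k in range(1, 201)]
# Stand-in for evenly spread figures: every whole number from 100 to 999.
evenly_spread = list(range(100, 1000))

# With 8 degrees of freedom, a value above roughly 15.51 is a poor fit
# at the 5 per cent significance level.
print(benford_chi_square(powers))
print(benford_chi_square(evenly_spread))
```

A high score is only a prompt for a closer look, since some honest datasets do not follow Benford’s Law in the first place.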
The next time you examine your bank statement or compare river lengths, take note of how often those numbers start with 1.
Katie Steckles is a mathematician, lecturer, YouTuber, and author based in Manchester, UK. She is also an adviser for BrainTwister, New Scientist’s puzzle column. Follow her @stecks
For additional projects, please visit newscientist.com/maker
Source: www.newscientist.com
