How to cheat with Frank Benford

Pick a number at random from the universe. Not just from inside your head. Open a page of the Financial Times or look up the size of a planet; convert your height to cubits or measure the weight of your favourite book. Something like that.

Don't actually do it, it's hypothetical. But ask yourself a question. What are the chances of that number starting with a 1? What are the chances of it starting with a 7? What are the chances of it starting with any particular one of the 9 possible starting digits (1, 2, 3, 4, 5, 6, 7, 8 or 9)? We're not counting 0, as in 0.5, because it's not the first significant digit.

Well you're choosing at random so the chance of your number starting with any one of those 9 digits must be 1 in 9. That's about 11%.

The surprising result of Frank Benford's work is that the number you just plucked from the universe is far more likely to start with a 1 (about 30.1%) and very unlikely to start with a 9 (about 4.6%). And there's a sliding scale for the digits in between.

You can test it yourself. Get a copy of the Financial Times. Write down every number you see on the front page. These numbers could be stock quotes, dates, ages, populations, profits. Anything (but don't include telephone numbers, because they are not proper numbers: they don't express an amount of anything).

I did it on 6th March 2007. I'm telling you the date so you can fact-check!

These are the numbers I got:

22
2008
1.50
200
11
2
22
3
50
17
20
2
8
12
19
9.6
26
8
8
25
2009
6
5000
19
18.4
9.4
13
6.1
25
26
70
4.2
3.2
700
6
9
20
20
10
16
20
12284.30
2299.78
1342.53
1330.07
3778.21
5932.2
3037.38
4858.85
6904.85
13688.28
23623.00
1.15
1.17
1.29
69
53
65
64
96
7
2.84
14
41
1.481
1.963
755
107.28
210.73
82.50
1.619
675
509
1.325
159.02
95.50
101.50
2.146
97.77
130.34
100.05
100.17
97.20
101.14
3
2.15
4.38
5.62
3.77
4.75
3.99
1.48
4.55
3.33
2.94
2.19
4.36
5.62
15
5
5
10
5
6
5
1
36622
2.20

Now count how many of these numbers start with a 1, a 2, a 3 and so on. Here are the results I got plotted on a graph as percentages. The horizontal line shows my (and hopefully your) initial guess of 1 in 9 or about 11% for all digits:

[Graph: percentage of the Financial Times numbers starting with each digit from 1 to 9, with a horizontal line at about 11%]
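If you would rather not count by hand, the tally can be sketched in a few lines of Python. This is a minimal sketch, not the method used for the graph: the function names are made up for illustration, and the sample is just the first ten numbers from the list above, so paste in the rest to redo the full count.

```python
from collections import Counter


def leading_digit(text):
    """Return the first significant digit (1-9) of a number written as text."""
    for ch in text:
        if ch in "123456789":
            return int(ch)
    raise ValueError(f"no significant digit in {text!r}")


def leading_digit_percentages(numbers):
    """Map each digit 1-9 to the percentage of the numbers that start with it."""
    counts = Counter(leading_digit(n) for n in numbers)
    return {d: 100 * counts[d] / len(numbers) for d in range(1, 10)}


# First ten numbers from the list above, kept as text so "1.50" stays "1.50".
sample = ["22", "2008", "1.50", "200", "11", "2", "22", "3", "50", "17"]
percentages = leading_digit_percentages(sample)
for digit, pct in percentages.items():
    print(f"{digit}: {pct:.0f}%")
```

Scanning for the first character between 1 and 9 skips leading zeros and decimal points, so 0.5 counts as starting with a 5, matching the rule given earlier.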

So it already looks like Benford might be right. 1s are appearing far more often than 9s. Benford's law predicts that if I keep going through the Financial Times and adding more numbers and if I do it every day then the graph will start to look more and more like this:

[Graph: the leading-digit percentages predicted by Benford's law]

With all the random fluctuations ironed out. Is this just a quirk of the Financial Times? I did the same analysis for the population sizes of all the countries in the world and got this graph:

[Graph: leading digits of the population sizes of the world's countries]

Analysing countries by land area in kilometres squared gives this graph:

[Graph: leading digits of the land areas of the world's countries]

Benford himself analysed various groups of things like heights of buildings and areas of rivers with similar results.

What's going on? The thorough explanation involves something called scale invariance and is a bit complicated. But there's an easy way to think about it...

What if you were picking a raffle ticket from a raffle instead of a number from the universe? What are the chances of the number starting with a 1 in that case? Well it depends on the size of the raffle.

Suppose there are only two tickets in this raffle, numbered 1 and 2. Then the chances of picking a ticket starting with a 1 are 50:50. If there are three tickets in the raffle the chances are 1 in 3, and so on. With 9 tickets in the raffle, numbered 1 to 9, the chance of you picking the only ticket starting with a 1 is now what we thought it might be intuitively: 1 in 9.

But now add one more ticket to the raffle. This will have "10" printed on it. Now there are 2 tickets out of 10 that start with a 1 and our chances jump back up to 1 in 5!

And the chances just get better the more tickets are added, up to 19 tickets (11 in 19). Then they fall back down again from 20 tickets all the way up to 99. Then the 100s improve matters again, and so on.

So the chance of picking a raffle ticket starting with a 1 fluctuates depending on the size of the raffle.

[Graph: the chance of picking a ticket starting with a 1, plotted against the size of the raffle]
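The raffle arithmetic is easy to check by brute force. Here is a small sketch (the function name is invented for this example) that counts the tickets starting with a 1 for a few raffle sizes, using exact fractions so that 1 in 9 stays 1 in 9.

```python
from fractions import Fraction


def chance_of_leading_one(raffle_size):
    """Chance that a ticket drawn from 1..raffle_size starts with the digit 1."""
    winners = sum(
        1 for ticket in range(1, raffle_size + 1) if str(ticket)[0] == "1"
    )
    return Fraction(winners, raffle_size)


# The sizes mentioned in the text, plus 199 to show the next peak.
for size in (2, 3, 9, 10, 19, 99, 199):
    chance = chance_of_leading_one(size)
    print(f"{size:>3} tickets: {chance} = {float(chance):.1%}")
```

Looping the same function over every size from 1 to a few thousand gives the zigzag: peaks at 19, 199, 1999 and troughs at 9, 99, 999.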

But picking a number at random from the universe or from the Financial Times is like picking a ticket from a raffle you don't know the size of. If you don't know the size of the raffle then to work out the chance of your ticket starting with a 1 you need to average the probability over all possible raffles. That's the horizontal line on this graph:

[Graph: the same zigzag, with a horizontal line marking the average]
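One way to make "average over all possible raffles" concrete is sketched below. It rests on an assumption the post does not spell out: raffle sizes are spread evenly across orders of magnitude, so a raffle of size N gets weight 1/N. The range of sizes (a thousand to just under a million tickets) is also an arbitrary choice for the example. With a plain unweighted average the answer depends on where you stop, which is why the weighting is needed.

```python
def average_chance_of_leading_one(min_size, max_size):
    """Average the leading-1 chance over raffle sizes min_size..max_size.

    Assumption for this sketch: a raffle of size N gets weight 1/N, i.e.
    sizes are spread evenly across orders of magnitude.
    """
    ones_so_far = 0      # tickets starting with 1 among tickets 1..size
    weighted_sum = 0.0
    total_weight = 0.0
    for size in range(1, max_size + 1):
        if str(size)[0] == "1":
            ones_so_far += 1
        if size >= min_size:
            weight = 1 / size
            weighted_sum += weight * (ones_so_far / size)
            total_weight += weight
    return weighted_sum / total_weight


# Three full orders of magnitude: 1,000 up to 999,999 tickets.
average = average_chance_of_leading_one(1_000, 999_999)
print(f"{average:.3f}")
```

Under these assumptions the printed value should land very close to the 30.1% quoted in the text.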

That average turns out to be about 30.1%. There's a formula for it which goes like this. The probability, P, of a number chosen at random from the universe starting with a particular digit, d, is:

$$P(d) = \log_{10}\left(1 + \frac{1}{d}\right)$$
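Tabulating the formula for each digit is a quick check on the percentages quoted earlier. The sketch below uses the form confirmed in the comments, log base 10 of (1 + 1/d). The `base` parameter is an illustrative addition, reflecting the remark in the comments that the same formula works in other bases.

```python
import math


def benford_probability(digit, base=10):
    """Benford's law: chance that a number's leading digit is `digit` in `base`."""
    return math.log(1 + 1 / digit, base)


for digit in range(1, 10):
    print(f"{digit}: {benford_probability(digit):.1%}")
```

The nine probabilities add up to 1, starting at about 30.1% for a leading 1 and sliding down to about 4.6% for a leading 9.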

That's Benford's Law and it's used by forensic accountants to detect tax fraud. It can also be used to test the results of academic research and even to look for evidence of election rigging.
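To give a flavour of what such a check might look like, here is a rough sketch that scores a set of leading-digit tallies against Benford's law with a chi-squared statistic. This is an assumption about one plausible approach, not a description of how any accountant or tax office actually works, and both sets of tallies are made up for illustration.

```python
import math


def benford_expected(total):
    """Expected count for each leading digit if `total` numbers follow Benford."""
    return {d: total * math.log10(1 + 1 / d) for d in range(1, 10)}


def chi_squared_vs_benford(counts):
    """Chi-squared statistic comparing observed leading-digit counts with Benford."""
    expected = benford_expected(sum(counts.values()))
    return sum(
        (counts.get(d, 0) - expected[d]) ** 2 / expected[d] for d in range(1, 10)
    )


# Made-up tallies, for illustration only.
benford_like = {1: 301, 2: 176, 3: 125, 4: 97, 5: 79, 6: 67, 7: 58, 8: 51, 9: 46}
evenly_spread = {d: 111 for d in range(1, 10)}  # what a naive faker might produce

print(f"Benford-like books:  {chi_squared_vs_benford(benford_like):.2f}")
print(f"Evenly spread books: {chi_squared_vs_benford(evenly_spread):.2f}")
```

A small statistic means the digits look Benford-like; a large one is a reason to look closer, not proof of fraud. With nine digits there are 8 degrees of freedom, for which the usual 5% cut-off is about 15.5.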

So if you're going to cheat, keep Benford's law to hand.

  • Matt Hope

    Nice, I loved the solids of constant curvature, the Gomboc is another interesting idea in a similar vein: http://www.wired.com/gadgetlab/2008/02/the-gomboc-the/

    perhaps you should consider seeing if ars is interested in you: http://arstechnica.com/staff/palatine/2009/11/want-to-freelance-for-ars-technica.ars

  • http://www.stevemould.com admin

    Thanks Matt. I really want one of those Gombocs. Hmmm, €149, can I justify it? Very kind to suggest Ars. Might as well give it a go.

  • FelixCQ

    I think the formula for P is actually log_10(1+1/d).

    And it works in any other base, too. Say you’re looking at binary numbers, then the probability of a number starting with a 1 is P(1) = log_2(1+1/1) = 1 ! ;)

  • Anonymous

    Hey Felix. Thanks for that. You’re absolutely right. I’ve fixed it now. And you’re right, you just have to change the base of the log to match the base you’re counting in. Fascinating stuff.

  • AndyK

    FelixCQ is right, the formula is wrong and is supposed to be log_10(1+1/d). http://en.wikipedia.org/wiki/Benford%27s_law

  • http://www.lifehacker.com.au/2010/10/use-benfords-law-to-catch-or-pull-off-fake-numbers/ Use Benford’s Law To Catch (Or Pull Off) Fake Numbers | Lifehacker Australia

    [...] How to cheat with Frank Benford [SHIFT_beep via #tips] Tagged:evil weeknumbersprobability [...]

  • Anonymous

    Thanks Andy. Absolutely. Quite a big typo that!

  • Anonymous

    I’m not sure your example of the Financial Times is “correct” – can you really consider the numbers that appear on the front page to be random? I could choose to look at the flier from my local grocery, for example, and we’d find a very different distribution (lots of 9s, I’d wager!). I think the point here is that when someone fakes numbers, they assume a completely random system – and that’s rarely the case in reality.

  • Anonymous

    The reason the Financial Times works well is because I’m pulling numbers from lots of DIFFERENT distributions (page numbers, dates, stock quotes etc.). This always shows a good match with Benford. Your grocery flyer will mainly have grocery prices all of a similar order of magnitude.

    Benford’s law does work for single distributions so long as the distribution is spread over several orders of magnitude. And assuming that there are no funny quirks like a bias for things priced at 99 cents etc. And even then Benford’s law is a great way of discovering these biases.

  • valve

    I bet most of them don’t start with 9s…

    We are talking about the probability of the number starting with a specific single digit, not of it containing the digit.

  • http://blog.patelive.com/2010/10/26/how-to-throw-off-forensic-accountants How to throw off forensic accountants… | Pate LIVE!

    [...] How to cheat with Frank Benford | SHIFT_beep Pick a number at random from the universe. Not just from inside your head. Open a page of the financial times or look up the size of a planet; convert you height to cubits or measure the weight of your favourite book. Something like that. [...]

  • dgb

    It seems that the key thing to note about this is that it is not pertaining to random numbers, but rather real-life measurements. This would then make sense, because most things in real-life we measure by counting up, and therefore, the higher the numbers get, the less likely they will be used. If we measured a subtractive value, like the distance left to reach something, we should see the opposite curve. This is why phone numbers (something truly random, or at least more random) won’t work.

  • Anonymous

    I think you’ve got to the heart of it there. It’s about real world numbers and how the higher up you go the less often they are used. A random number generator in a computer on the other hand will give a nice even distribution of leading digits.

    And phone numbers are like random numbers because they’re not really numbers! That is to say the actual value isn’t a measure of anything. It’s meaningless. Even more so in the UK where all numbers start with a zero!

  • http://www.lukasvermeer.nl Lukas Vermeer

    The catch here is indeed that you want measurements that are ‘natural’ in the sense that they appear in ordinary life and are not pre-selected by humans. The Times does seem like an unlikely candidate to me, because the numbers are ‘selected’ for journalistic value.

    I applied Benfords law to my own bank statements a while back and wrote on my blog about the experiment.

    http://lukasvermeer.wordpress.com/2010/05/24/benfords-law/

    I later found that my own behavior (at ATM machines I often withdraw 20 euros) was the root cause for an odd spike of twos in the data.

    http://lukasvermeer.wordpress.com/2010/06/16/benfords-law-revised/

    At the bottom of the first post there is a link to a test page I made where you can easily compare Benfords law to your own data. Check it out, and let me know what you find.

  • Justzisguy

    This is also why your suggestion to “convert you height to cubits” [sic] would not work. Human heights do not span orders of magnitude. You’re much more likely to get 3 cubits.

    There is a non-zero chance that this would lead to 1 cubit: there are people less than 91 cm in stature according to http://en.wikipedia.org/wiki/List_of_shortest_people
    but it is exceedingly small.

    Planet size spans orders of magnitude. Book weights would too, but there’s probably a reasonably small std. deviation.

    And adding multiple distributions that span less than an order of magnitude together would never lead you to Benford’s law.

  • Anonymous

    I disagree. If you collected loads of individual measurements from different distributions at random (like your height in cubits and so on. But just one of each thing!) that collection of disparate things absolutely WOULD follow Benford’s law.

    I do agree that the distribution of human heights in cubits on its own would not follow Benford’s Law.

  • Anonymous

    That’s awesome! I’ll have a go.

  • http://followmehere.com/2010/10/26/if-youre-going-to-cheat/ If You’re Going to Cheat… | Follow Me Here…

    [...] …Keep Benford’s Law in mind. [...]

  • http://www.johntracy.com/blog/2010/10/26/cheating-numbers-with-1/ Cheating Numbers with 1 « /blog {words}

    [...] How to cheat with Frank Benford | SHIFT_beep: [...]

  • Lukas Vermeer

    Neat! No problemo. Be sure to leave your findings in the comments.

  • Ivo

    http://pastebin.com/TRv8xqvm
    little perl proof of concept :)

  • Lukas Vermeer

    Looks like you’re using randomly generated numbers, not ‘real world’ numbers. What are you trying to prove?

    The point here is that numbers from the real world are _not_ randomly distributed.

  • Justzisguy

    I insist that you need the subdistributions to straddle orders of magnitude. It is trivial to demonstrate this. If human heights in cubits have much less than 10% chance of starting with a 1 and you combine that with some other tiny distribution, such as imdb star rankings (0-10, but virtually no movies score 10, and relatively few score 1.something), there’s simply no way that you’ll get 30%. It is only when one of the subdistributions has a 30% chance or larger of starting with 1 that the full distribution might obey Benford’s law. That requires either choosing many subsets that are artificially weighted to 1 (human height in meters?) or that already follow Benford’s law because they span over an order of magnitude.

  • Ivo

    No, it’s not about the randomness, it’s about the range of choice. If you pick at random from 1 to 99 then the distribution is the same, but if you pick from 1 to 199 then it’s a different picture. And the real world range always varies, so on average the numbers starting with 1 are 30% more.

  • Anonymous

    I still disagree! For your demonstration you’ve cherry picked two distributions. The distributions need to be picked at random. See this para from wikipedia:

    http://en.wikipedia.org/wiki/Benford%27s_law#Multiple_probability_distributions

  • Justzisguy

    The only “cherry picking” I’ve done is exactly what I said: to select distributions that do not span an order of magnitude (biased in scale/base).

    Read Hill’s paper linked from the Wikipedia article. Your distribution needs to be unbiased in scale and base. Real life distributions and sampling does not necessarily satisfy this requirement. Combining multiple distributions that are less than one order of magnitude wide will certainly not!

  • Anonymous

    You also cherry picked distributions that under represent 1s in the leading digit position. If you truly pick the distributions at random you would find that more of the distributions were biased towards having 1s as the leading digit and fewer biased towards having 9s as the leading digit in such a way as to lead to Benford’s Law with a large enough sample.

    A “mixed bag” of items picked from many different distributions IS unbiased in scale even if the underlying distributions are not.

  • anon

    I don’t see how Benford’s law applies to raffles – surely a raffle is not on a log scale? If I have for example 1,000 raffle tickets in total, then the chance is not going to be 30% of any one randomly selected ticket starting with a 1.

  • Anonymous

    You’re right, Benford’s Law does not apply to a raffle you know the size of. But what if you didn’t know the size? Assuming it could be any size you’d have to average the probability of the leading digit being 1 over all possible sizes. That gives you the 30% of Benford’s Law.

  • anon

    I still can’t see it. Surely the distribution of raffles is a linear thing, not like, say, a stock price, which could be exponential?

    I mean, if I take every raffle size from 1 to infinity, there will be as many tickets that begin with 9 as begin with 1 or any other number, right?

  • Anonymous

    “I mean, if I take every raffle size from 1 to infinity, there will be as many tickets that begin with 9 as begin with 1 or any other number, right?”

    Nope ;) That’s what the zigzag graph tries to illustrate. Pick an example from those 1 to infinity raffles and look at the probability of the winning ticket starting with a 1. How about a raffle that has 20 tickets? Just over half of them (11 of 20) start with a 1! As the size of the raffle increases that chance goes up and down (the zigzag graph). The average of those probabilities is 30%.

  • Sergy

    Day number and year number are meaningless numbers too. You should not include them in your example ;) .

  • Sergy

    Not meaningless but limited. Therefore not appliable.

  • Anonymous

    I think you’re saying that because dates don’t span several orders of magnitude they don’t follow Benford’s Law. And you’re right, they probably don’t. But the great thing is, if you’re choosing lots of different things from different distributions, those distributions you’re picking from don’t have to follow Benford’s Law. See this paragraph on wikipedia:

    http://en.wikipedia.org/wiki/Benford%27s_law#Multiple_probability_distributions

    And this discussion on age in these comments:

    http://blog.stevemould.com/how-to-cheat-with-frank-benford/#comment-90473715

  • http://blogs.bnet.com/businesstips/?p=9452 Detect Faked Data at Your Next Presentation Using Simple Science | BNET

    [...] as it sounds; it all makes sense when you read the very accessible and plain-English article at SHIFT_beep.  Check it out and it you’ll definitely feel smarter.   [via [...]

  • http://www.mostlymaths.net/ Ruben Berenguel

    I had heard of Benford’s law several times (since the times of my maths degree), but never gave it more than one or two looks (I have always considered me quite “unfond” of probability!). But as you wrote it down, it is wonderful. I think far clearer than my post And e Appears from Nowhere, where the average of some random throws result in e (with a long, long number of runs).

    Thanks for writing it so clearly,

    Ruben

  • http://www.moldremoval.net Mold Removal Cape Coral

    Nice, I loved the solids of constant curvature

  • http://blog.stevemould.com Steve Mould

    This looks like a new kind of spam! You’ve just copied the first few words from the first comment on this blog. The link to your website is in your profile. I get a lot of spam from companies related to mould (mould removal, plastic injection moulding) because of my surname. They search the internet for things related to mould, find my blog and leave a comment or send me an email! It’s pretty funny really.

  • Internalrevenue

    This is well known. The occurrence of numbers differs by what they are being used for. So numbers in tax returns have a frequency distribution. In fact one statistician set up rules for the US Revenue dept. to audit tax returns. When you fake numerical entries it is difficult to replicate the frequencies of numbers that naturally occur in tax returns. You tend to think they are random and your numbers just don’t look like they should. So they audit you.

    This is not new. . .

  • http://blog.stevemould.com Steve Mould

    It certainly isn’t new, no. Frank Benford died in 1948! I hope my post doesn’t sound like it’s declaring something new!

  • DampeS8N

    That’s not true. If the raffle is of a finite, but unknown, size, then the data will be evenly distributed. It is only when the max size of the raffle is infinite, that your raffle metaphor works. Not for an unknown size.

    Consider the results of this PHP snippet. For any maximum number, the distribution will be even.

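    <?php
    // Tally the leading digit of a million tickets drawn from one raffle of fixed size.
    $spread = array_fill(1, 9, 0);   // one counter per leading digit, 1 to 9
    for ($i = 0; $i < 1000000; $i++) {
        $n = (string) rand(1, 10000000000000000);   // this range needs 64-bit PHP
        $spread[(int) $n[0]]++;      // first character is the leading digit
    }
    var_dump($spread);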

  • http://blog.stevemould.com Steve Mould

    This is brilliant! Love the php. But what your php does is find the distribution of leading digits for a fixed raffle size of 10000000000000000. If you want to find the distribution when you don’t know the size UP TO 10000000000000000 you’ve got to modify the rand line to be like this:

    $n = rand(1, rand(1, 10000000000000000))."";

    The nested rand is giving you a random raffle size to choose from up to the maximum.

    When the max size of the raffle is infinite, the distribution fits Benford exactly. But if it’s finite, so long as it spans several orders of magnitude it’ll fit very closely. And of course the more orders of magnitude you span the better the fit gets.
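The two versions of the experiment discussed here can be compared side by side. The sketch below is a Python rendering of the same idea, not a translation checked against the PHP; the function names, the seed and the sample count are arbitrary choices for the example. It draws tickets from a raffle of one fixed size, then from raffles whose size is itself drawn at random first.

```python
import random
from collections import Counter

MAX_SIZE = 10**16  # the same upper limit as in the PHP above


def fixed_raffle(rng):
    """Draw one ticket from a single raffle with MAX_SIZE tickets."""
    return rng.randint(1, MAX_SIZE)


def unknown_size_raffle(rng):
    """Pick a raffle size at random first, then draw a ticket from that raffle."""
    size = rng.randint(1, MAX_SIZE)
    return rng.randint(1, size)


def leading_digit_shares(draw, samples, rng):
    """Share of drawn tickets starting with each digit 1-9."""
    counts = Counter(int(str(draw(rng))[0]) for _ in range(samples))
    return {d: counts[d] / samples for d in range(1, 10)}


rng = random.Random(2010)  # fixed seed so a rerun gives the same numbers
fixed = leading_digit_shares(fixed_raffle, 100_000, rng)
nested = leading_digit_shares(unknown_size_raffle, 100_000, rng)
for d in range(1, 10):
    print(f"{d}: fixed {fixed[d]:.1%}   unknown size {nested[d]:.1%}")
```

The fixed raffle should come out close to 11% for every digit, while the unknown-size raffle should show noticeably more 1s than 9s. How closely one level of nesting matches Benford's exact percentages is something to read off the output rather than take on trust.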

  • John Henschen

    All this discussion is fine and dandy, but you have yet to touch upon the most important aspect. How will this help me win the lottery? ;)

  • http://blog.stevemould.com Steve Mould

    Finally someone asking the important questions!

  • Ewerton Miglioranza

    Just in case someone else wants to take a look: http://testingbenfordslaw.com/
    Great post dude!

  • http://blog.stevemould.com Steve Mould

    That’s awesome. Thanks Ewerton
