Read the fine print: IP Statistics

Lies, damn lies and statistics.  This Kat's first appearance on this blog was in 2007 when she queried the statement, "It has been said that 80% of the information found in patents cannot be found anywhere else."  It was an oft-used number with no citation.  We never really found one.

The basic problem hasn't gone away.  IP statistics are thrown about on a regular basis without sufficient caveats and out of context and, in some cases, are so poorly calculated they should never leave the back of the envelope. The counterfeiting and piracy statistics used early in copyright debates are an excellent example (more here).

Things are improving, but there is a long way to go. Even the Daily Mail finds misleading stats and graphs funny.

All is not lost.  The PatStat (IP Statistics for Decision Makers) annual conference discusses these challenges. The UK IPO has produced a handy guide on the use of patent data, which merits a post by itself. (IPKat discussions on patent stats here and here.)

The folks at CIGI Waterloo have just published a paper looking at stats in cybercrime entitled "Global Cyberspace is Safer than You Think: Real Trends in Cybercrime" by Eric Jardine. Eric examines recent stats in cybercrime and seeks to normalise them. In this case, normalisation means adjusting absolute figures, whether totals (e.g. 1,000 attacks per year) or growth rates (e.g. 50% more attacks in 2014 than 2013), to account for the growth of the internet itself.

To illustrate normalisation, consider the following, totally made up example: 30% more thefts of iPhone 6 in 2015!!!  Without context, rather shocking.  However, the iPhone 6 was only introduced in September 2014. There are vastly more iPhone 6s in circulation in 2015, so the risk of theft per phone may actually have decreased. Dividing thefts by the number of iPhone 6s in use would normalise these figures.
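For the numerically inclined, here is a minimal sketch of that normalisation in Python. Every figure in it is invented purely for illustration (theft counts and handset numbers alike), and the helper name `rate_per_thousand` is hypothetical; it is not taken from Eric's paper.

```python
def rate_per_thousand(thefts: int, phones_in_use: int) -> float:
    """Thefts per 1,000 phones in use: the normalised figure."""
    return 1000 * thefts / phones_in_use


# Made-up numbers, for illustration only.
thefts_2014, phones_2014 = 1_000, 10_000_000
thefts_2015, phones_2015 = 1_300, 40_000_000

# Headline figure: change in the absolute number of thefts.
absolute_change = (thefts_2015 - thefts_2014) / thefts_2014

# Normalised figure: change in the theft rate per 1,000 phones.
rate_2014 = rate_per_thousand(thefts_2014, phones_2014)
rate_2015 = rate_per_thousand(thefts_2015, phones_2015)
normalised_change = (rate_2015 - rate_2014) / rate_2014

print(f"Absolute change in thefts: {absolute_change:+.1%}")
print(f"Change in thefts per 1,000 phones: {normalised_change:+.1%}")
```

With these invented inputs, thefts rise by 30% in absolute terms while the rate per phone falls, because the number of phones grows much faster than the number of thefts. Different inputs could, of course, point the other way.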

Eric investigated 13 absolute figures on cybercrime and found his normalised stats show a much less scary situation:
  • in 6 cases, the absolute figures showed the situation getting worse whereas his normalised figures showed the situation actually improving
  • in 6 cases, both the absolute and normalised figures show the situation getting better, but the normalised figures show improvement happening sooner and faster
  • in 1 case, both the absolute and normalised figures show the situation getting worse, but the normalised figures show this deterioration happening more slowly
So, good news!  Cyberspace is safer than you think.

Tips for reading stats:
  1. Check citations. Citations are like PDO.  Know where your calculations come from.
  2. Check context.  Read the fine print. Most stats come with caveats.
  3. Check methods. Read the ingredients. Your stats should have good data and processes.
  4. Check with an expert.  When in doubt, ask your friendly statistician or economist. 
Examples of egregious manipulation of data this Kat has seen in her career:
  • exploiting Excel's rounding to display 9.45% as 9.5% (0.05 percentage points can make a huge difference)
  • 'massaging' data so three firms had a market majority 
  • a protest described as '100 people' by protestors, 'approximately 50' by police, and '20' by the organisation being protested
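The rounding trick in the first example is easy to reproduce. The sketch below assumes the display rounds halves upwards (which is how spreadsheet number formatting is generally understood to behave) and uses a hypothetical base of 2,000,000 units to show what the gap is worth; the helper name `displayed` is also made up for this example.

```python
from decimal import Decimal, ROUND_HALF_UP


def displayed(value: str, places: str) -> Decimal:
    """Round a figure the way a one-decimal display would (halves round up)."""
    return Decimal(value).quantize(Decimal(places), rounding=ROUND_HALF_UP)


true_share = Decimal("9.45")           # the underlying percentage
shown_share = displayed("9.45", "0.1")  # what the reader sees at one decimal place

# Gap between displayed and true figure, in percentage points.
gap_points = shown_share - true_share

# Hypothetical base, purely for illustration.
base = Decimal("2000000")
overstated_units = gap_points / 100 * base

print(f"True: {true_share}%  Shown: {shown_share}%  Gap: {gap_points} points")
print(f"On a base of {base}, that gap is {overstated_units} units")
```

Decimal is used rather than float because 9.45 has no exact binary representation, so float rounding may not match what the spreadsheet shows.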
And remember, 78% of stats are made up on the spot, the other 29% are drivel. Check out Full Fact and the BBC's More or Less for good explanations and debunking of stats.