Presentation transcript: “A problem with the Long Tail”
Slide 1: A problem with the Long Tail
(Read the text in the notes panel at the bottom for narration.)
(Although an amazing number of things are powerlaws, a lot of things aren’t. How can you tell the difference?)
Please read these notes to get a sense of what I’d be saying while talking through these slides. Needless to say, I talk a lot more than I’m writing here, but you’ll get the main points.
Slide 2: A powerlaw
This is just a conceptual example of the real-world data sets that we see so commonly in Long Tail research. The problem is that they all look the same. So we plot them another way…
Slide 3: Shown another way…
…like this, log-log. Powerlaws are all straight lines on a log-log scale. Some have a steeper slope and some a shallower one, but all are straight lines…
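Why the log-log plot straightens things out: if sales at rank r follow y = C·r^(-α), then log y = log C − α·log r, which is a straight line with slope −α. The short Python sketch below checks this numerically on made-up curves. The constant C, the exponents α and the helper name loglog_slope are illustrative assumptions, not figures from any real market.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    cov = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    var = sum((a - mx) ** 2 for a in lx)
    return cov / var

# Hypothetical rank/sales curves y = C * r**(-alpha) with different steepness.
C = 1_000_000.0
ranks = range(1, 10_001)
for alpha in (0.7, 1.0, 1.5):
    sales = [C * r ** -alpha for r in ranks]
    print(alpha, round(loglog_slope(ranks, sales), 6))
```

Each curve, steep or shallow, comes out as a line whose fitted slope is just −α. Real data would scatter around the line, but the idea is the same.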
Slide 4: WTF?
…unless they’re not. This is US domestic box office revenue for a three-year period. What happens at rank 300, where revenue falls off a cliff? Do the movies get worse? Do they switch into Spanish? Kevin Costner? The answer is none of those. What happens is that they run out of screens. The carrying capacity of the US megaplex network is about 100 films a year, or roughly 300 over the three years shown. By contrast, about 13,000 films are shown at film festivals each year. Only a tiny fraction of them are picked up for commercial distribution, and only a tiny fraction of those make it into the megaplex. So there is an ample supply of films and, if we believe that the powerlaw describes the “natural shape” of the market, plenty of demand. What’s getting in the way is simply a bottleneck in distribution, a scarcity effect that’s distorting the marketplace.
Slide 5: The Missing Market
Presumably, if you had an infinite number of screens on which to show films, demand would follow the powerlaw shape. I call the pink part above the “dark matter of the marketplace”: the latent demand for products that’s suppressed by inefficiencies in distribution. Remove the bottlenecks and you can tap that demand. That, in a nutshell, is the theory of the Long Tail.
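To put a rough number on that “dark matter”, here is a minimal Python sketch. It assumes, purely for illustration, that every film in a catalog has latent demand proportional to r^(-α) and that anything ranked beyond the screen capacity gets no distribution at all. The capacity (100 films a year over three years) and catalog size (13,000 festival films a year over three years) are taken loosely from the slides. The exponent α and the function name missing_share are hypothetical.

```python
def missing_share(alpha, capacity, catalog_size):
    """Fraction of total demand that lies beyond rank `capacity`.

    Assumes demand for the title at rank r is proportional to r**(-alpha)
    for every title in the catalog, and that titles ranked beyond
    `capacity` get no distribution (and so no sales) at all.
    """
    demand = [r ** -alpha for r in range(1, catalog_size + 1)]
    total = sum(demand)
    served = sum(demand[:capacity])
    return (total - served) / total

# Hypothetical inputs loosely based on the slides: ~100 films a year fit in
# the megaplex, ~13,000 festival films a year, over a three-year window.
CAPACITY = 100 * 3
CATALOG = 13_000 * 3
for alpha in (0.8, 1.0, 1.2):
    print(f"alpha={alpha}: {missing_share(alpha, CAPACITY, CATALOG):.1%} of demand unserved")
```

With α = 1 this toy model leaves a bit over 40% of total demand beyond rank 300. Steeper curves (larger α) leave less in the tail, flatter ones more. That is only as good as the assumption that the powerlaw really is the natural shape, which is exactly what the rest of this deck questions.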
Slide 6 (Source: Morris Rosenthal)
Here’s an example, using book sales on Amazon. The chart comes from Morris Rosenthal of Foner Books (http://www.fonerbooks.com/surfing.htm). The dotted line reflects 2005 sales; the red line is [...]. What you can see, aside from the fact that sales are becoming more niche-centric (lower-ranked books are selling better than before, relative to the hits), is that in both cases the lines fall off the straight-line trajectory somewhere around rank 200,000 to 300,000. What might be causing this?
Slide 7
Here’s a possible explanation: an example of a book listing (mine, as it happens). Note that it has a cover, an editorial review, a blurb from some guy named Eric Schmidt and fully populated metadata. Perhaps because of all that, its sales rank is #4.
Slide 8
As it happens, there’s another book called “The Long, Long Tail”, which came out in [...]. By comparison, its listing is sadly bare. There’s no cover, no review, no blurb, and it’s missing several metadata fields. No wonder its sales rank is 2,63,800. Of course it had a cover when it was published, and presumably it was reviewed somewhere and had a page count. But all that information is now lost. Research suggests that shoppers are far more likely to buy if they can see a cover and read a bit about the book, and even more inclined to purchase if they can see a few sample pages.
This is simply an information problem: all that information exists and lies in archives somewhere; it only needs to be digitized and combined with the book information to create a fully populated product listing that would sell far better than this one. That’s a business opportunity for someone, whether Amazon or Google or some other company. It’s only a matter of time before someone does it. And when they do it for all books, presumably the Amazon sales curve will follow the straight powerlaw line much further out into its inventory than it does now.
Slide 9: The problem
But here’s the problem. I’ve been assuming that the powerlaw is the “natural” shape of all these markets. But there are other distributions that start off straight on a log-log chart and then fall off the line, not necessarily because of a bottleneck effect but simply because they’re not powerlaws. The best example is the “lognormal” distribution, which often stays nearly straight for several orders of magnitude before sloping downward. So how can you tell whether a market that falls off the line is a natural powerlaw distorted by a removable bottleneck, or a natural lognormal market that will look like that no matter what you do? In markets such as film or television archives, this could be a billion-dollar question.
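One way to see why a lognormal can pass for a powerlaw over several orders of magnitude is to compare their local slopes on a log-log plot. The derivation below uses the standard lognormal density with parameters μ and σ (symbols introduced here, not from the slides). It applies to the density rather than to the rank/sales curves in the slides, though the qualitative picture should carry over.

$$
f(x) = \frac{1}{x\,\sigma\sqrt{2\pi}}\,
\exp\!\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right),
\qquad
\ln f(x) = -\ln x - \ln\!\left(\sigma\sqrt{2\pi}\right) - \frac{(\ln x - \mu)^2}{2\sigma^2}
$$

$$
\text{lognormal: } \frac{d \ln f}{d \ln x} = -1 - \frac{\ln x - \mu}{\sigma^2},
\qquad
\text{powerlaw } f(x) = C\,x^{-\alpha}: \ \frac{d \ln f}{d \ln x} = -\alpha
$$

On log-log axes, then, the lognormal is a downward-opening parabola rather than a line. Its slope drifts by 1/σ² per unit of ln x, or ln 10/σ² ≈ 2.3/σ² per decade. With σ = 3 (an arbitrary illustrative value) that is about 0.26 per decade: easy to mistake for a straight line over a decade or two, but clearly curved across four or five.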
Slide 10
This is taken from a fantastic paper: E. Limpert, W. Stahel and M. Abbt, “Log-normal Distributions across the Sciences: Keys and Clues”, BioScience 51(5), pp. 341–352 (2001) (http://stat.ethz.ch/~stahel/lognormal/bioscience.pdf).
Another great paper is M. Mitzenmacher, “A brief history of generative models for power law and lognormal distributions”, Internet Mathematics 1, pp. 226–251 (2003) (http://www.internetmathematics.org/volumes/1/2/pp226_251.pdf).
Slide 11
Here’s another grab from that paper. It shows how a skewed version of the pin matrix (a Galton board) that produces normal distributions in a typical Monte Carlo simulation can create a lognormal distribution.
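Here is a rough way to reproduce that picture in code. In the ordinary board, each pin knocks the ball a fixed distance left or right, which gives the normal curve. In this version, each pin instead moves the ball by a fixed proportion of its current position. This multiplicative board is my own simplified stand-in for the skewed board in the paper, and the parameters (step, start, number of rows and balls) are arbitrary.

```python
import math
import random
import statistics

def skewed_pin_board(n_balls, n_rows, step=0.1, start=100.0, seed=0):
    """Drop balls through a board where each pin deflects a ball by a
    proportion of its current position rather than a fixed distance.

    At every row the position is multiplied by (1 + step) or divided by
    (1 + step) with equal probability, so log(position) performs an
    ordinary symmetric random walk.
    """
    rng = random.Random(seed)
    factor = 1.0 + step
    finals = []
    for _ in range(n_balls):
        x = start
        for _ in range(n_rows):
            x = x * factor if rng.random() < 0.5 else x / factor
        finals.append(x)
    return finals

positions = skewed_pin_board(n_balls=20_000, n_rows=50)
logs = [math.log(x) for x in positions]

# Raw positions are right-skewed: the mean sits above the median.
print(statistics.mean(positions), statistics.median(positions))
# Log-positions are roughly symmetric around log(start).
print(statistics.mean(logs), statistics.median(logs), math.log(100.0))
```

Because log(position) is just a sum of ±log(1.1) steps, it comes out close to normal. That makes the positions themselves approximately lognormal, with a long right tail and a mean above the median.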
Slide 12: Examples of phenomena that follow powerlaw distributions
- Species distribution among plants
- Square footage of Alaskan Inuit homes
- Forest fires, by size
- Cities, by population
- Death toll in wars
- Earthquakes
- Word use
- Number of papers published by scientists
Here are some well-known examples of natural phenomena that obey a powerlaw distribution.
Slide 13: Examples of phenomena that follow lognormal distributions
- Concentration of elements in the earth’s crust
- Latent periods of infectious diseases
- Survival times after cancer diagnosis
- Distribution of chemicals in the environment (including pollution)
- Species distribution among moths and diatoms
- Crystals in ice cream
- Length of words in spoken conversation
And these are some that obey the lognormal distribution. Note that while word use in English follows a powerlaw (called Zipf’s Law), the length of those words follows a lognormal curve. Species of plants distribute as a powerlaw, while species of moths and diatoms distribute as a lognormal. Weird. What’s going on?
One answer is that we may have been wrongly calling some lognormal distributions “powerlaws”. After all, the two curves can look the same for a few orders of magnitude. Some of the data sets on the previous slide (such as papers per scientist) don’t extend to large enough numbers to tell one curve from the other.
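One standard statistical way to ask “powerlaw or lognormal?” of a data set is a likelihood-ratio comparison, normalized in the style of Vuong’s test for non-nested models. The Python sketch below is a simplified version under stated assumptions. It uses a continuous powerlaw with x_min fixed at the smallest observation and an untruncated lognormal, both fitted by maximum likelihood. A careful analysis would estimate x_min and truncate the lognormal at it. The function names and the synthetic samples are hypothetical.

```python
import math
import random
import statistics

def powerlaw_loglik(xs, xmin):
    """Pointwise log-likelihoods under a continuous powerlaw
    p(x) = (alpha - 1) / xmin * (x / xmin) ** -alpha for x >= xmin,
    with alpha set to its maximum-likelihood estimate."""
    alpha = 1.0 + len(xs) / sum(math.log(x / xmin) for x in xs)
    return [math.log(alpha - 1.0) - math.log(xmin) - alpha * math.log(x / xmin)
            for x in xs]

def lognormal_loglik(xs):
    """Pointwise log-likelihoods under an untruncated lognormal whose
    mu and sigma are maximum-likelihood estimates from log(x)."""
    logs = [math.log(x) for x in xs]
    mu = statistics.fmean(logs)
    sigma = statistics.pstdev(logs, mu)  # MLE uses the population std dev
    norm = math.log(sigma * math.sqrt(2.0 * math.pi))
    return [-lx - norm - (lx - mu) ** 2 / (2.0 * sigma ** 2) for lx in logs]

def compare_fits(xs):
    """Return (R, p). R > 0 favours the powerlaw, R < 0 the lognormal;
    a small p means the sign of R is unlikely to be a fluke."""
    xmin = min(xs)  # simplification: a real analysis would estimate xmin
    diffs = [a - b for a, b in zip(powerlaw_loglik(xs, xmin), lognormal_loglik(xs))]
    n = len(diffs)
    R = sum(diffs)
    sd = statistics.pstdev(diffs)
    p = math.erfc(abs(R) / (sd * math.sqrt(2.0 * n)))
    return R, p

rng = random.Random(42)
# Synthetic samples: a Pareto (powerlaw) tail with alpha = 2.5 via inverse
# transform sampling, and a lognormal with mu = 0, sigma = 1.
pareto = [(1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(5_000)]
lognorm = [rng.lognormvariate(0.0, 1.0) for _ in range(5_000)]
print("powerlaw sample:", compare_fits(pareto))
print("lognormal sample:", compare_fits(lognorm))
```

With thousands of points, each sample should be correctly attributed. With only a few points, or data spanning a narrow range, |R| tends to be small relative to its spread and p stays large. In that case the test can’t pick a winner, which is the papers-per-scientist problem described above.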
Slide 14: What’s the difference?
Powerlaws: created by “preferential attachment” in scale-free networks.
So a better way to tell the two curves apart is to understand the underlying forces that created them in the first place. In general, it all boils down to network effects of one sort or another. Powerlaws tend to be created by “preferential attachment” in “scale-free” networks. In other words, new connections go preferentially to nodes that are already well connected, so some nodes end up far more connected than others (like Malcolm Gladwell’s “mavens” in The Tipping Point).
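Preferential attachment is easy to simulate. In the sketch below, each new node adds a single link to an existing node, chosen with probability proportional to how many links that node already has. This is a minimal Barabási–Albert-style model; the function name and parameters are my own. Switching preference off, so newcomers pick an existing node uniformly at random, gives a useful baseline.

```python
import random
from collections import Counter

def grow_network(n_nodes, preferential=True, seed=0):
    """Grow a network one node at a time; each newcomer links to one existing node.

    With preferential=True the target is chosen with probability proportional
    to its current degree (preferential attachment); otherwise uniformly.
    Returns a Counter mapping node id -> degree.
    """
    rng = random.Random(seed)
    degree = Counter({0: 1, 1: 1})  # start with a single edge 0-1
    endpoints = [0, 1]              # each node appears once per edge it touches
    for new in range(2, n_nodes):
        if preferential:
            target = rng.choice(endpoints)  # degree-proportional pick
        else:
            target = rng.randrange(new)     # uniform pick among existing nodes
        degree[new] += 1
        degree[target] += 1
        endpoints.extend((new, target))
    return degree

pref = grow_network(10_000, preferential=True, seed=1)
unif = grow_network(10_000, preferential=False, seed=1)
mean_degree = sum(pref.values()) / len(pref)
print(mean_degree, max(pref.values()), max(unif.values()))
```

Both networks have the same average degree (just under two), but with preferential attachment the best-connected node ends up with far more links than anything in the uniform baseline. For this model the degree distribution is known to approach a powerlaw (roughly k^-3), while the uniform version’s falls off exponentially.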
Slide 15: Lognormal distributions: created by “proportionate effects” (like growing by a proportion of your weight).
Lognormal distributions, on the other hand, tend to be created by more linear phenomena, such as the geometric growth of simple ratios, where each item simply grows by some proportion of its current size.
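The “proportionate effect” can be written out in a few lines. Assume (my notation, not the slides’) that an item of size X_{t-1} grows to X_t = X_{t-1}(1 + ε_t). Here the ε_t are independent, identically distributed random proportions with ε_t > −1, and ln(1 + ε_t) has finite mean m and variance s². Taking logs turns the product into a sum, and the central limit theorem does the rest:

$$
\ln X_n = \ln X_0 + \sum_{t=1}^{n} \ln(1 + \varepsilon_t)
\quad\Longrightarrow\quad
\ln X_n \overset{\text{approx.}}{\sim} \mathcal{N}\!\left(\ln X_0 + n\,m,\; n\,s^2\right)
\quad \text{for large } n
$$

So X_n is approximately lognormal, with μ = ln X_0 + n m and σ² = n s². Adding up many independent random shocks gives a normal curve; multiplying them gives a lognormal one. Nothing in this process depends on how connected an item is, which is the contrast with preferential attachment.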
Slide 16: Question
Assuming it all comes down to network effects, how can you predict whether the “natural shape” (free of bottlenecks and other scarcity distortions) is a powerlaw or a lognormal distribution?
So here’s the Big Question. Discuss.