A problem with the Long Tail

Name: A problem with the Long Tail
Uploaded: 2017-11-03T04:04:07+00:00
Duration: PTM9S3
Channel: Caroline Myers
Description: A problem with the Long Tail

(Read the text in the notes panel at the bottom for narration) A problem with the Long Tail (Although an amazing number of things are powerlaws, a lot of things aren’t. How can you tell the difference?) Please read these notes to get a sense of what I’d be saying while talking through these slides. Needless to say, I talk a lot more than I’m writing here, but you’ll get the main points.

A powerlaw This is just a conceptual example of the real-world data sets that we see so commonly in Long Tail research. The problem is that they all look the same. So we plot them another way….

Shown another way …like this, log-log. Powerlaws are all straight lines on a log-log scale. Some have a steeper slope, and some a shallower one, but all are straight lines….
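
To make the straight-line claim concrete: if sales = C × rank^(-a), then log(sales) = log(C) - a × log(rank), which is a line with slope -a. The sketch below fits that line by ordinary least squares. The data set (C = 1000, a = 1.2) is invented for illustration and is not taken from any of the charts in these slides.

```python
import math

def loglog_slope(ranks, values):
    """Least-squares slope and intercept of log10(value) against log10(rank)."""
    xs = [math.log10(r) for r in ranks]
    ys = [math.log10(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: sales = 1000 * rank^-1.2 (numbers invented for illustration)
ranks = list(range(1, 1001))
sales = [1000.0 * r ** -1.2 for r in ranks]

slope, intercept = loglog_slope(ranks, sales)
print("slope:", round(slope, 3))                # recovers the exponent, negated
print("scale C:", round(10 ** intercept, 1))    # recovers the constant C
```

A steeper powerlaw simply has a larger exponent a. Real data will scatter around the line, so the fitted slope is only an estimate.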

WTF? ….unless they’re not. These are US domestic box-office revenues for a three-year period. What happens at rank 300, where the revenues fall off a cliff? Do the movies get worse? Do they switch into Spanish? Kevin Costner? The answer is none of those. What happens is that they run out of screens. The carrying capacity of the US megaplex network is about 100 films a year, or roughly 300 over the three years charted here. By contrast, about 13,000 films are shown in film festivals each year. Only a tiny fraction of them are picked up for commercial distribution, and only a tiny fraction of those make it into the megaplex. So there is an ample supply of films and, if we believe that the powerlaw describes the “natural shape” of the market, plenty of demand. What’s getting in the way is simply a bottleneck in distribution, a scarcity effect that’s distorting the marketplace.

The Missing Market Presumably if you had an infinite number of screens on which to show films, the demand would follow the powerlaw shape. I call the pink part above the “dark matter of the marketplace”—it’s the latent demand for products that’s suppressed by inefficiencies in distribution. Remove the bottlenecks and you can tap that demand—that, in a nutshell, is the theory of the Long Tail.
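
One way to put a number on the "dark matter" is to fit a powerlaw to the head of the curve, extend it past the cliff, and add up the shortfall. The sketch below does that on invented data (a 1/rank curve that decays after rank 300); the function names and all the figures are hypothetical. It also assumes the natural shape really is a powerlaw, which is only an assumption.

```python
import math

def fit_power_law(ranks, values):
    """Least-squares line through (log rank, log value).

    Returns (c, a) for the model value = c * rank**-a.
    """
    xs = [math.log(r) for r in ranks]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return math.exp(my - slope * mx), -slope

def latent_demand(ranks, values, head):
    """Fit the first `head` ranks, extrapolate, and sum the shortfall beyond them."""
    c, a = fit_power_law(ranks[:head], values[:head])
    return sum(max(c * r ** -a - v, 0.0)
               for r, v in zip(ranks[head:], values[head:]))

# Invented data: a clean 1/rank powerlaw that falls off a cliff after rank 300
ranks = list(range(1, 1001))
revenue = [1000.0 / r * (1.0 if r <= 300 else math.exp(-(r - 300) / 50))
           for r in ranks]

observed_tail = sum(revenue[300:])
missing = latent_demand(ranks, revenue, head=300)
print(f"observed beyond rank 300: {observed_tail:.1f}")
print(f"estimated latent demand:  {missing:.1f}")
```

The estimate is only as good as the extrapolation. If the head of the curve is not a true powerlaw, the "missing" area may not exist at all.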

Source: Morris Rosenthal
Here’s an example, using book sales on Amazon. The chart comes from Morris Rosenthal of Foner Books. The dotted line reflects 2005 sales; the red line is … What you can see, aside from the fact that it’s becoming more niche-centric (lower-ranked books are selling better than before, compared to the hits), is that in both cases the lines fall off the straight-line trajectory around rank 200,000 or 300,000. What might be causing this?

Here’s a possible explanation: an example of a book listing (mine, as it happens). Note that it has a cover, an editorial review, a blurb from some guy named Eric Schmidt and fully populated metadata. Perhaps because of all that, its sales rank is #4.

As it happens, there’s another book called “The Long, Long Tail”, which came out in … By comparison, its listing is sadly bare. There’s no cover, no review, no blurb, and it’s missing several metadata fields. No wonder its sales rank is 2,63,800. Of course it had a cover when it was published, and presumably it was reviewed somewhere and had a page count. But all that information is now lost. Research suggests that shoppers are far more likely to buy if they can see a cover and read a bit about the book, and are even more inclined to purchase if they can see a few sample pages. This is simply an information problem: all that information exists and lies in some archive somewhere; it only needs to be digitized and combined with the book information to create a fully populated product listing that would sell far better than this one. That’s a business opportunity for someone, whether Amazon or Google or some other company. It’s only a matter of time before someone does it. And when they do it for all books, presumably the Amazon sales curve will follow the straight powerlaw line far further out into its inventory than it now does.

The problem But here’s the problem. I’ve been assuming that the powerlaw is the “natural” shape of all these markets. But there are other distributions that start off straight on a log-log chart, then fall off the line not necessarily because of a bottleneck effect but because they’re simply not powerlaws. The best example of this is the “lognormal” distribution, which often stays straight for several orders of magnitude before sloping downward. How to tell whether a market that falls off the line is a natural powerlaw shape distorted by a removable bottleneck or a natural lognormal market that will look like that no matter what you do? In markets such as film or television archives, this could be a billion-dollar question.
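
To see how a lognormal can pass for a powerlaw, the sketch below builds an idealized rank-size curve for 100,000 items whose values are lognormally distributed, then measures the log-log slope one decade of rank at a time. The parameters (mu = 0, sigma = 2) are arbitrary choices for illustration, not estimates from any real market.

```python
import math
from statistics import NormalDist

def lognormal_rank_size(n, mu=0.0, sigma=2.0):
    """Idealized value at each rank 1..n when n items follow a lognormal.

    Rank r is mapped to the quantile 1 - (r - 0.5) / n, so rank 1 is the largest.
    """
    std_normal = NormalDist()
    return [math.exp(mu + sigma * std_normal.inv_cdf(1 - (r - 0.5) / n))
            for r in range(1, n + 1)]

def local_slope(values, r_lo, r_hi):
    """Log-log slope of the curve between two ranks (1-indexed)."""
    rise = math.log(values[r_hi - 1]) - math.log(values[r_lo - 1])
    run = math.log(r_hi) - math.log(r_lo)
    return rise / run

values = lognormal_rank_size(100_000)
decades = [(10, 100), (100, 1_000), (1_000, 10_000), (10_000, 100_000)]
slopes = [local_slope(values, lo, hi) for lo, hi in decades]
for (lo, hi), s in zip(decades, slopes):
    print(f"ranks {lo}-{hi}: slope {s:.2f}")
```

A true powerlaw would print the same slope for every decade. Here the slope drifts only gently through the early decades and then drops sharply in the last one, which is the "falls off the line" look without any bottleneck involved.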

This is taken from a fantastic paper called Log-normal Distributions across the Sciences: Keys and Clues, E. Limpert, W. Stahel and M. Abbt, BioScience 51 (5), p. 341–352 (2001). Another great paper is Mitzenmacher, M. (2003). "A brief history of generative models for power law and lognormal distributions". Internet Mathematics 1: 226–251.

Here’s another grab from that paper
It shows how a skewed version of the pin matrix (a Galton board) that makes normal distributions in a typical Monte Carlo simulation can create a lognormal distribution.
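
The pin-matrix idea can be sketched in a few lines. On the ordinary board each pin nudges the ball left or right by a fixed step; on the skewed board each pin multiplies or divides the ball's position by a fixed factor. This is a rough stand-in for the figure, not the paper's own construction, and the row count, ball count, step and factor below are all assumed values.

```python
import random
import statistics

def galton(rows, balls, multiplicative, rng, step=1.0, factor=1.1):
    """Drop `balls` through `rows` of pins; each pin deflects either way with equal chance.

    Additive board: position changes by +/- step (tends toward a normal).
    Multiplicative board: position is multiplied or divided by `factor`
    (tends toward a lognormal).
    """
    results = []
    for _ in range(balls):
        pos = 1.0 if multiplicative else 0.0
        for _ in range(rows):
            right = rng.random() < 0.5
            if multiplicative:
                pos = pos * factor if right else pos / factor
            else:
                pos = pos + step if right else pos - step
        results.append(pos)
    return results

rng = random.Random(42)  # fixed seed so the run is repeatable
additive = galton(50, 10_000, False, rng)
skewed = galton(50, 10_000, True, rng)

for name, data in [("additive", additive), ("multiplicative", skewed)]:
    print(name,
          "mean", round(statistics.mean(data), 3),
          "median", round(statistics.median(data), 3))
```

On the additive board the mean and median sit close together, as expected for a symmetric bell curve. On the multiplicative board the mean is pulled above the median by a long right tail, the signature of a lognormal.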

Examples of phenomena that follow powerlaw distributions
Species distribution among plants
Square footage of Alaskan Inuit homes
Forest fires, by size
Cities, by population
Death toll in wars
Earthquakes
Word use
Number of papers published by scientists

Here are some well-known examples of natural phenomena that obey a powerlaw distribution.

Examples of phenomena that follow lognormal distributions
Concentration of elements in the earth's crust
Latent periods of infectious diseases
Survival times after cancer diagnosis
Distribution of chemicals in the environment (including pollution)
Species distribution among moths and diatoms
Crystals in ice cream
Length of words in spoken conversation

And these are some that obey the lognormal distribution. Note that while word use in English is distributed as a powerlaw (called Zipf’s Law), the length of those words is distributed as a lognormal curve. Species of plants distribute as a powerlaw, while species of moths and diatoms distribute as a lognormal. Weird. What’s going on? One answer is that we may have been wrongly calling some lognormal distributions “powerlaws”. After all, the two curves can look the same for a few orders of magnitude. Some of the data sets in the previous slide (such as papers per scientist) don’t extend to large enough numbers to be able to tell one curve from the other.

What’s the difference? Powerlaws: created by “preferential attachment” in scale-free networks. So a better way to tell the two curves apart is to have a better sense of the underlying forces that created them in the first place. In general, it all boils down to network effects of one sort or another. Powerlaws tend to be created by “preferential attachment” in “scale-free” networks, which is to say that some nodes are far more connected than others (like Malcolm Gladwell’s “mavens” in The Tipping Point).
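
Preferential attachment is easy to simulate. In the sketch below, in the style of the Barabási-Albert growth model, each new node links to one existing node, and already well-connected nodes are more likely to be picked. A second run with uniformly random targets is included for contrast. The network size and the seed are arbitrary.

```python
import random

def grow_network(n_nodes, preferential, rng):
    """Grow a network one node at a time; each newcomer links to one existing node.

    preferential=True: target chosen in proportion to its current degree.
    preferential=False: target chosen uniformly among existing nodes.
    Returns the list of node degrees.
    """
    degree = [1, 1]      # start with two nodes joined by one link
    endpoints = [0, 1]   # every link contributes both of its ends to this list
    for new in range(2, n_nodes):
        if preferential:
            # Picking a random link end selects nodes in proportion to degree
            target = rng.choice(endpoints)
        else:
            target = rng.randrange(new)
        degree[target] += 1
        degree.append(1)
        endpoints.extend((target, new))
    return degree

rng = random.Random(7)  # fixed seed so the run is repeatable
pref = grow_network(20_000, True, rng)
unif = grow_network(20_000, False, rng)
print("max degree, preferential:", max(pref))
print("max degree, uniform:     ", max(unif))
```

Both networks have the same number of links, but the preferential one grows a handful of very large hubs, which is the heavy tail that shows up as a straight line on a log-log plot.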

Lognormal distributions: created by "proportionate effects" (like growing by a proportion of your weight). Lognormal distributions, on the other hand, tend to be created by more linear phenomena, such as the geometric growth of simple ratios.
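
A short derivation shows why proportionate growth tends toward a lognormal. The symbols are my own notation, not from the slides: X_t is the size at step t, and the growth shocks are assumed independent and identically distributed with finite variance, with mu and sigma^2 the mean and variance of ln(1 + epsilon_t).

```latex
\begin{align*}
X_t &= X_{t-1}\,(1 + \varepsilon_t), \qquad t = 1, \dots, T \\
\ln X_T &= \ln X_0 + \sum_{t=1}^{T} \ln(1 + \varepsilon_t) \\
\ln X_T &\;\approx\; \mathcal{N}\!\left(\ln X_0 + T\mu,\; T\sigma^2\right)
  \quad \text{for large } T \text{ (central limit theorem)}
\end{align*}
```

Since the logarithm of X_T is approximately normal, X_T itself is approximately lognormal. Multiplying many small proportional changes plays the same role that adding many small changes plays for the normal distribution.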

Question Assuming it all comes down to network effects, how can you predict whether the “natural shape” (free of bottlenecks and other scarcity distortions) is a powerlaw or a lognormal distribution? So here’s the Big Question. Discuss.
