Published by Jesus Branham; modified over 9 years ago
1
Move Over, Big Data! How Small, Simple Models Can Yield Big Insights
Richard C. Larson, Ph.D., rclarson@mit.edu
Mitsui Professor of Engineering Systems and Director of the Center for Engineering Systems Fundamentals, MIT
September 8, 2014
© Richard C. Larson 2014
2
3
4
Fishing in the Ocean…
Random location? No strategy? Or location and strategy based on prior analysis?
5
In Trying to Make Sense of a Sea of Data, We Need Small, Simple Models to Guide Our Search
6
7
From an MIT SDM alum: “I work on big data ‘stuff’ in my day job, and I think simple models are too often discounted, often due to bedazzlement by big data trends, tools, and the quest for the holy grail.”
8
$F(y) = B(y) - [F(y-1) + F(y-2) + F(y-3)]$
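The slide gives only the recurrence. Rearranged, it says $F(y) + F(y-1) + F(y-2) + F(y-3) = B(y)$: B(y) is the sum of the four most recent values of F. A minimal sketch of how the recurrence propagates, assuming (our assumption, not the slide's) that F is zero before the first year and that B is supplied as a list; the function name `unroll_recurrence` is hypothetical.

```python
def unroll_recurrence(b):
    """Compute F(y) = B(y) - [F(y-1) + F(y-2) + F(y-3)] for y = 0, 1, ...

    Assumption (illustrative): F(y) = 0 for y < 0, i.e. no history before year 0.
    """
    f = []
    for y, b_y in enumerate(b):
        # Sum the (up to) three previous F values; missing history counts as 0.
        previous = sum(f[max(0, y - 3):y])
        f.append(b_y - previous)
    return f


# A constant B does not produce a constant F: it produces a repeating "echo".
print(unroll_recurrence([100] * 8))  # [100, 0, 0, 0, 100, 0, 0, 0]
```

Even this tiny model shows behavior an average would hide: B never changes, yet F swings between 100 and 0 with period four.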
9
What we are not saying about Big Data and Data Analytics…
What we are saying about small models…
Ideally, in many applications, these two approaches are complementary, going hand in hand.
[Figure: Big Data and Small Models]
10
Outline
– Flaws of Averages
– Square Root Laws
– Nonlinearities in Queueing
– Case Study: Marrying Small Models and Big Data Analysis
11
If we are about to deal with lots of data, averages will be important. An average is one of the simplest operations on any dataset. We need to be savvy customers of averages!
12
Flaws of Averages
Simple model: the average of N quantities, $X_1, X_2, \ldots, X_N$:
$\text{Average} = (X_1 + X_2 + \cdots + X_N)/N$
Simple, right?
13
Flaws of Averages
14
Flaws of Averages
We tend to think in averages, often to the point of believing that the average is a constant that describes every case!
– Warning: Average River Depth is 4 feet!
– Mutual Fund: Average total annual return: 7%
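A 7% average annual return says little about what an investor actually ends up with. A small sketch using a hypothetical two-year sequence (+50%, then -36%; our numbers, not the slide's) whose arithmetic average is exactly 7% yet loses money:

```python
import math

# Hypothetical two-year return sequence (illustrative, not from the talk).
returns = [0.50, -0.36]

arithmetic_mean = sum(returns) / len(returns)
# Money compounds: multiply the yearly growth factors.
growth = math.prod(1 + r for r in returns)
geometric_mean = growth ** (1 / len(returns)) - 1

print(f"average annual return: {arithmetic_mean:.1%}")  # 7.0%
print(f"value of $1 after two years: {growth:.2f}")     # 0.96
print(f"compound annual return: {geometric_mean:.1%}")  # -2.0%
```

The "average" return and the return actually experienced differ in sign, which is the river-depth warning in financial form.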
15
Flaws of Averages
We’ve all heard the joke: When Bill Gates walks into a crowded establishment, ON AVERAGE everyone becomes a millionaire!
The mean salary of a tech worker in San Mateo County is $291,497. $81,000 of this is due to Mark Zuckerberg!
Medians, anyone?
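A quick way to see why the median is the better summary here: one extreme value drags the mean but barely moves the median. The salaries below are hypothetical, chosen only for illustration.

```python
import statistics

# Hypothetical salaries in dollars: nine ordinary workers plus one extreme earner.
salaries = [60_000, 65_000, 70_000, 72_000, 75_000,
            80_000, 85_000, 90_000, 95_000, 50_000_000]

print(statistics.mean(salaries))    # pulled far upward by the single outlier
print(statistics.median(salaries))  # 77500.0, barely affected
```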
16
17
Flaws of Averages
Garrison Keillor: Lake Wobegon, where all the women are strong, all the men are good-looking, and all the children are above average.
Possible? Impossible?
18
Flaws of Averages
Movie Theaters: Estimate the fraction of offered seats that are sold.
Movie Theater Management: What do they see? Typically, 5%.
Selection bias occurs everywhere.
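The gap between a moviegoer's impression and management's figure can be read as size-biased sampling: a patron only observes showings that patrons attend, so crowded showings are counted once per patron. A minimal sketch with hypothetical attendance numbers (not data from the talk):

```python
# Hypothetical attendance at four showings in a 100-seat theater.
capacity = 100
attendance = [2, 3, 5, 90]

# Management averages over showings: every showing counts once.
management_view = sum(attendance) / (len(attendance) * capacity)

# A moviegoer averages over moviegoers: a showing with n patrons is
# "experienced" n times, so busy showings are over-represented.
patron_view = sum(n * n for n in attendance) / (sum(attendance) * capacity)

print(f"management sees {management_view:.0%} of seats sold")   # 25%
print(f"a typical patron sees {patron_view:.0%} of seats sold")  # 81%
```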
19
Flaws of Averages
Selection bias occurs everywhere. Think of waking up as a chocolate chip in a chocolate chip cookie!
– Your perceived distribution of chips in a cookie
– Management’s experience
20
Flaws of Averages
Selection bias extends to friends on Facebook. Yes, it is true that on average my friends on Facebook have more friends than I do!
How does this type of selection bias extend into your business?
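This is often called the friendship paradox. It is the same selection effect as in the theater: a popular person appears on many friend lists, so "my friends" over-sample popular people. A sketch on a tiny hypothetical network (a star: one hub with four friends who know only the hub):

```python
# Hypothetical friendship network: person 0 is friends with 1, 2, 3, 4,
# who have no other friends (a "star").
friends = {
    0: [1, 2, 3, 4],
    1: [0], 2: [0], 3: [0], 4: [0],
}

degree = {person: len(fs) for person, fs in friends.items()}

# Average number of friends, taken over people.
mean_friends = sum(degree.values()) / len(degree)

# Average number of friends a friend has, taken over every friend-list entry
# (each person is counted once per friend list they appear on).
friend_degrees = [degree[f] for fs in friends.values() for f in fs]
mean_friends_of_friends = sum(friend_degrees) / len(friend_degrees)

print(mean_friends)             # 1.6
print(mean_friends_of_friends)  # 2.5
```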
21
Flaws of Averages
Viral growth, $R_0$. The concept of $R_0$ originated in Germany, in the study of population growth. In epidemics, $R_0$ is the average number of new infections created by a newly infected person when almost everyone is susceptible to the disease.
22
Flaws of Averages
Suppose $R_0 = 2.0$. Consider two very different possibilities:
1: Every infection generates 2 more.
2: A new infection has a 50% chance of generating 4 new infections and a 50% chance of generating none.
Can you picture the temporal dynamics of each case?
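One way to make the two cases concrete is to treat each as a Galton-Watson branching process. That modeling choice is our assumption: infected people act independently and susceptibles never run out, consistent with "almost everyone is susceptible." The probability that an outbreak started by one infection eventually dies out is the smallest fixed point of q = f(q), where f is the probability generating function of new infections per case. A sketch; `extinction_probability` is a hypothetical helper name.

```python
def extinction_probability(pgf, iterations=1000):
    """Smallest fixed point of q = pgf(q), found by iterating from q = 0.

    pgf is the probability generating function of the number of new
    infections caused by one infected person (Galton-Watson branching process).
    """
    q = 0.0
    for _ in range(iterations):
        q = pgf(q)
    return q


# Case 1: every infection generates exactly 2 more, so pgf(s) = s**2.
case1 = extinction_probability(lambda s: s ** 2)

# Case 2: 4 new infections with probability 0.5, none with probability 0.5.
case2 = extinction_probability(lambda s: 0.5 + 0.5 * s ** 4)

print(case1)            # 0.0: the outbreak never dies out
print(round(case2, 3))  # 0.544: more than half of outbreaks fizzle
```

Both cases have $R_0 = 2$, yet in case 1 the chain grows 1, 2, 4, 8, … with certainty, while in case 2 most chains die out early and the survivors grow erratically.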
23
Ebola, Summer 2014
24
Flaws of Averages
“Outliers”: What to do with them? Many say clip them off: they distort the analysis and mislead intuition. But “outliers” have determined the course of human history:
– Meteors hitting Planet Earth
– Earthquakes of Richter 9 and above
– Financial collapses
25
Earthquakes
Richter Scale: logarithmic. Each whole-number step in the magnitude scale corresponds to the release of about 31 times more energy than the amount associated with the preceding whole-number value.
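The "about 31 times" comes from the standard energy-magnitude relation, in which log10 of radiated energy grows by 1.5 per unit of magnitude, so the ratio per step is $10^{1.5} \approx 31.6$. A two-line sketch of how quickly that compounds (the function name is ours):

```python
def energy_ratio(delta_magnitude):
    """Ratio of seismic energy released for a magnitude difference,
    using log10(E) = 1.5 * M + constant."""
    return 10 ** (1.5 * delta_magnitude)

print(round(energy_ratio(1), 1))  # 31.6: one whole-number step
print(round(energy_ratio(2)))     # 1000: e.g. a magnitude 9 versus a magnitude 7
```

Two steps on the scale mean a thousandfold difference in energy, which is why the rare Richter 9 events dominate any "average" earthquake.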
26
27
Flaws of Averages: Summary Points
– Averages can be deceiving.
– Treating a distribution as its average value usually results in incorrect inferences.
– Averages as experienced by one population may be very different from those experienced by another.
– Ignore “outliers” at your peril.
28
And we haven’t even considered…
– Regression to the Mean
– Variance
– Exponential smoothing; example: baseball batting averages
– And much more…
29
One More Average: Based on Dimensionality Arguments
Mean travel distance in a city with N police cars and area A: $E[D] \approx c\sqrt{A/N}$, where c is a constant that depends on the city's geometry and travel metric.
This is a Square Root Law. In our analysis of Big Data, we can look for this type of behavior.
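A Monte Carlo check of the square-root behavior, under assumptions that are ours rather than the slide's: cars and incidents are independently and uniformly located in a square city of area A, and travel distance is straight-line (Euclidean) distance to the nearest car. If $E[D] \approx c\sqrt{A/N}$, quadrupling the number of cars should roughly halve the mean distance. The function name is hypothetical.

```python
import math
import random

def mean_nearest_distance(n_cars, area=1.0, trials=5000, seed=0):
    """Monte Carlo estimate of the mean Euclidean distance from a random
    incident to the nearest of n_cars cars, all uniform in a square of the
    given area."""
    rng = random.Random(seed)
    side = math.sqrt(area)
    total = 0.0
    for _ in range(trials):
        ix, iy = rng.uniform(0, side), rng.uniform(0, side)
        # Fresh random car positions in each trial; keep the closest one.
        total += min(
            math.hypot(rng.uniform(0, side) - ix, rng.uniform(0, side) - iy)
            for _ in range(n_cars)
        )
    return total / trials

for n in (4, 16, 64):
    d = mean_nearest_distance(n)
    # Under a square root law, d / sqrt(A / N) should stay roughly constant.
    print(n, round(d, 4), round(d / math.sqrt(1.0 / n), 3))
```

The estimated constant drifts somewhat for small N because cars near the city boundary cover less area; that edge effect shrinks as N grows.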
30
Let’s Now Switch: From Averages to a Simple Operational Model
31
What Kinds of Queues Occur in Systems of Interest to ESD?
Queues, Queues Everywhere!
32
Queueing System
[Figure: Arriving Customers → Queue of Waiting Customers → Service Facility → Departing Customers]
33
Queues, Queues Everywhere!
Queueing Theory: 100 Years Old!
Most queues are complicated, and folks want to simulate almost all of the detail. And today there are numerous files of Big Data drawn from queues. But let’s look at simple models first, to guide us!
34
It May Be Little, But It’s the Law!
$L = \lambda W$, where
L = time-average number of customers in the system, both in queue and in service
λ = average rate of arrivals of customers into the system
W = mean time spent by a customer in the system, both in queue and in service
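Little's Law can be checked on any simulated sample path. The sketch below assumes a FIFO single-server queue with exponential interarrival and service times (an M/M/1 queue, our choice for illustration; the law itself does not need these assumptions). It measures L directly by integrating the number-in-system step function, and separately computes λ and W from the customer records. Function names are hypothetical.

```python
import random

def simulate_fifo_single_server(n, arrival_rate, service_rate, seed=0):
    """Arrival and departure times for n customers in a FIFO single-server
    queue with exponential interarrival and service times (an M/M/1 queue)."""
    rng = random.Random(seed)
    t, server_free_at = 0.0, 0.0
    arrivals, departures = [], []
    for _ in range(n):
        t += rng.expovariate(arrival_rate)
        start = max(t, server_free_at)  # wait if the server is busy
        server_free_at = start + rng.expovariate(service_rate)
        arrivals.append(t)
        departures.append(server_free_at)
    return arrivals, departures

def time_average_in_system(arrivals, departures):
    """Integrate the number-in-system step function over [0, last departure]."""
    events = sorted([(a, +1) for a in arrivals] + [(d, -1) for d in departures])
    area, count, last = 0.0, 0, 0.0
    for time, change in events:
        area += count * (time - last)
        count += change
        last = time
    return area / last

arrivals, departures = simulate_fifo_single_server(10_000, 0.8, 1.0)
horizon = departures[-1]
L = time_average_in_system(arrivals, departures)
lam = len(arrivals) / horizon
W = sum(d - a for a, d in zip(arrivals, departures)) / len(arrivals)
print(f"L = {L:.3f}, lambda * W = {lam * W:.3f}")
```

Over a window that starts and ends with an empty system, the two numbers agree up to floating-point rounding, whatever the arrival and service distributions.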
35
The formula applies in all sorts of places, including those not normally thought of as queues. Example: the annual rate λ of new hires of assistant professors at a university. MIT: L = 1,000 tenure-track faculty members; W = mean duration of a faculty career. If W moves upward from 20 to 22 years, λ moves down accordingly, since L = 1,000 remains constant.
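The arithmetic behind the faculty example is just Little's Law solved for λ, using only the slide's numbers:

```python
def hires_per_year(faculty_size, mean_career_years):
    """Little's Law rearranged: lambda = L / W."""
    return faculty_size / mean_career_years

before = hires_per_year(1000, 20)  # 50.0 hires per year
after = hires_per_year(1000, 22)   # about 45.5 hires per year
print(before, round(after, 1), f"{1 - after / before:.1%} fewer hires")
```

A two-year increase in mean career length cuts new hiring by about 9%.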
36
The M/M/k Queue
Queue notation: Input/Service/Servers
– First M: “memoryless” input process, meaning a Poisson process
– Second M: “memoryless” service time, meaning an exponential probability density function
– k = number of servers
37
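Note the Elbow!
$\rho = \lambda/\mu$ = fraction of time that the server is busy serving customers (μ = service rate, the reciprocal of the mean service time)
[Figure: M/M/1 queue performance vs. ρ; the queue explodes as ρ approaches 1]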
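The figure is not reproduced here. As a stand-in, the standard M/M/1 result for the mean number in system, $L = \rho/(1-\rho)$, shows the same elbow numerically:

```python
def mm1_mean_number_in_system(rho):
    """L = rho / (1 - rho) for a stable M/M/1 queue (0 <= rho < 1)."""
    if not 0 <= rho < 1:
        raise ValueError("M/M/1 is stable only for 0 <= rho < 1")
    return rho / (1 - rho)

for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
    print(f"rho = {rho:.2f}  L = {mm1_mean_number_in_system(rho):6.1f}")
```

The printed values are 1.0, 4.0, 9.0, 19.0 and 99.0: going from 90% to 99% utilization multiplies the mean number in system elevenfold.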
38
Elbow as we Increase the Number of Servers (k = 1,2,3,9,16) 38© Richard C. Larson 2014
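To see the elbow move as k grows, one can hold the per-server utilization $\rho = \lambda/(k\mu)$ fixed and compute the mean wait in queue from the Erlang C formula for the M/M/k queue. A sketch; the function names are ours, and waits are expressed in units of the mean service time $1/\mu$.

```python
import math

def erlang_c(k, a):
    """Probability that an arrival must wait in an M/M/k queue
    with offered load a = lambda / mu (requires a < k)."""
    if a >= k:
        raise ValueError("unstable: need a < k")
    top = (a ** k / math.factorial(k)) * (k / (k - a))
    bottom = sum(a ** n / math.factorial(n) for n in range(k)) + top
    return top / bottom

def mean_wait_in_queue(k, rho):
    """Mean time in queue, in units of 1/mu, at per-server utilization rho."""
    a = k * rho
    return erlang_c(k, a) / (k - a)

for rho in (0.5, 0.8, 0.9, 0.95):
    row = [mean_wait_in_queue(k, rho) for k in (1, 2, 3, 9, 16)]
    print(f"rho = {rho:.2f}: " + "  ".join(f"{w:7.3f}" for w in row))
```

At the same utilization, waits fall sharply as servers are pooled: at $\rho = 0.9$ the mean wait drops from 9 service times with one server to about 4.26 with two, and keeps shrinking for larger k.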
39
Do You See Why Large Call Centers are More Productive?
40
What Do You See as a Role for Big Data Analysis Here?
41
D = Deterministic
42
Averages in Queues
– Performance degrades as the arrival rate increases and/or the mean service time increases.
– Performance degrades as the variance of the time between arrivals increases and/or the variance of the service time increases.
Can you think of examples?
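For the service-time half of this point, the Pollaczek-Khinchine formula for the M/G/1 queue (Poisson arrivals, general service times, one server) makes the effect explicit: $W_q = \lambda E[S^2] / (2(1-\rho)) = \rho E[S](1 + c^2) / (2(1-\rho))$, where $c^2$ is the squared coefficient of variation of the service time. It does not cover variable interarrival times; that would need a different model, e.g. Kingman's approximation for the G/G/1 queue. A sketch with a hypothetical function name:

```python
def mg1_mean_wait_in_queue(arrival_rate, mean_service, scv_service):
    """Pollaczek-Khinchine mean wait in queue for an M/G/1 queue.

    scv_service is Var(S) / E[S]**2: 0 for deterministic (D) service,
    1 for exponential (M) service.
    """
    rho = arrival_rate * mean_service
    if rho >= 1:
        raise ValueError("unstable: need arrival_rate * mean_service < 1")
    return rho * mean_service * (1 + scv_service) / (2 * (1 - rho))

# Same arrival rate and same mean service time; only the service variance differs.
for label, scv in (("M/D/1", 0.0), ("M/M/1", 1.0), ("high variance", 4.0)):
    print(label, round(mg1_mean_wait_in_queue(0.9, 1.0, scv), 2))
```

With identical averages, the waits come out as 4.5, 9.0 and 22.5 mean service times: deterministic service halves the M/M/1 wait, and the (hypothetical) high-variance case more than doubles it.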
43
Now for the Final Switch: From Queueing Overview to Case Study
44
Queue Inference Engine: A Personal Big Data–Small Models Experience
It started with reams of old-fashioned, paper-based computer printouts.
45
Queue Inference Engine: Big Data:
– Time ATM card inserted
– Time ATM transaction completed
46
Queue Inference Engine:
Knowing the probability properties of the arrival process, a “Poisson process,” we were able to derive a mathematically valid algorithm to determine many statistics of customers’ queue delays. It’s called an $O(N^3)$ algorithm, since the number of computations grows as the 3rd power of the number of customers in a busy period.
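The $O(N^3)$ recursion itself is in the references and is not reproduced here. What follows is a deliberately brute-force Monte Carlo stand-in (not the QIE algorithm) for the same conditional-probability idea, under our own simplifying assumptions: one server, a busy period that starts at time 0 with customer 1's arrival, observed service-completion times, and each later customer starting service when the previous one departs. Given Poisson arrivals, the unobserved arrival times of customers 2 to N behave like sorted uniform random times on the busy period, conditioned on the server never going idle. The function name is hypothetical.

```python
import random

def mean_queue_wait_given_departures(departures, samples=100_000, seed=0):
    """Monte Carlo estimate of the mean queue wait per customer in one busy
    period of a single-server queue with Poisson arrivals.

    Assumptions (for this sketch): the busy period starts at time 0 with
    customer 1's arrival; departures[i] is customer i+1's service completion
    time; each customer's service starts at the previous departure.
    """
    rng = random.Random(seed)
    n = len(departures)
    end = departures[-1]
    total_wait, accepted = 0.0, 0
    for _ in range(samples):
        # Given Poisson arrivals, the n - 1 later arrival times are
        # uniform order statistics on (0, end).
        arrivals = sorted(rng.uniform(0, end) for _ in range(n - 1))
        # The busy period must not end early: customer i+2 arrives
        # before customer i+1 departs.
        if all(a < d for a, d in zip(arrivals, departures)):
            total_wait += sum(d - a for a, d in zip(arrivals, departures))
            accepted += 1
    if accepted == 0:
        raise RuntimeError("no samples satisfied the busy-period constraints")
    return total_wait / (accepted * n)  # customer 1 waits 0

print(round(mean_queue_wait_given_departures([1.0, 2.0, 3.0, 4.0]), 3))
```

Rejection sampling like this can become hopeless for long busy periods, because the fraction of samples satisfying every constraint falls off sharply as N grows; that is exactly why an exact polynomial-time algorithm matters.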
47
Queue Inference Engine:
Imagine receiving your monthly bank statement and with it a statement of the times you spent waiting in bank queues. The queues could include both those involving human tellers and automatic teller machines (ATMs). With the technology of the Queue Inference Engine (QIE), such an innovation is now well within the realm of possibility.
48
Queue Inference Engine:
Our first results were published in 1990. Dr. David Simchi-Levi and others call this one of the first applications of Big Data analysis to modern-day problems: “This is just a beautiful example of how data drive new research…” (Simchi-Levi, 2014). But this “QIE” Big Data algorithm could not have been derived without marrying Small Models (with their behavior) with Big Data recursive thinking.
49
Big Data and Small Models
50
51
References
Larson, Richard C., “The Queue Inference Engine: Deducing Queue Statistics from Transactional Data,” Management Science, vol. 36, no. 5, May 1990, pp. 586–601.
Larson, Richard C., “Queue Inference Engine,” in Encyclopedia of Operations Research and Management Science, Centennial Edition, Saul I. Gass and Carl M. Harris (eds.), Kluwer, Boston, 2001, pp. 674–679.
Jones, Lee K. and Richard C. Larson, “Efficient Computation of Probabilities of Events Described by Order Statistics and Applications to Queue Inference,” ORSA Journal on Computing, vol. 7, no. 1, Winter 1995, pp. 89–100.
Gross, Donald and Richard C. Larson, “Queuing Systems,” in International Encyclopedia of Business and Management (IEBM), 2nd edition, 8-volume set, Malcolm Warner (ed.), Thomson Learning, London, U.K., 2001, pp. 5502–5513.
Larson, Richard C. and Mauricio Gomez Diaz, “Nonfixed Retirement Age for University Professors: Modeling Its Effects on New Faculty Hires,” Service Science, vol. 4, no. 1, March 2012, pp. 69–78.
Simchi-Levi, David, “OM Research: From Problem-Driven to Data-Driven Research,” M&SOM, vol. 16, no. 1, 2014, pp. 2–10.