Move Over, Big Data! How Small, Simple Models Can Yield Big Insights Richard C. Larson, Ph.D., Mitsui Professor of Engineering Systems.


1 Move Over, Big Data! How Small, Simple Models Can Yield Big Insights. Richard C. Larson, Ph.D., rclarson@mit.edu, Mitsui Professor of Engineering Systems and Director of the Center for Engineering Systems Fundamentals, MIT. September 8, 2014. © Richard C. Larson 2014


4 Fishing in the Ocean… Random location? No strategy? Or, location and strategy based on prior analysis?

5 In Trying to Make Sense of a Sea of Data, We Need Small, Simple Models to Guide Our Search


7 From an MIT SDM alum: “I work on big data ‘stuff’ in my day job and I think simple models are too often discounted, often due to bedazzlement by big data trends, tools, and the quest for the holy grail.”

8 F(y) = B(y) – [F(y-1) + F(y-2) + F(y-3)]

9 What we are not saying about Big Data and Data Analytics… What we are saying about small models… Ideally, in many applications, these two approaches are complementary, going hand in hand: Big Data and Small Models.

10 Outline: Flaws of Averages; Square Root Laws; Nonlinearities in Queueing; Case Study: Marrying Small Models and Big Data Analysis

11 If we are about to deal with lots of data, averages will be important. An average is one of the simplest operations on any dataset. We need to be savvy customers of averages!

12 Flaws of Averages Simple model: The average of N quantities, X_1, X_2, …, X_N. Average = (X_1 + X_2 + … + X_N)/N. Simple, right?

13 Flaws of Averages

14 Flaws of Averages We tend to think in averages, often to the point of believing that the average is a constant describing all! Warning: Average River Depth is 4 feet! Mutual Fund: Average total annual returns: 7%.

15 Flaws of Averages We’ve all heard the joke: When Bill Gates walks into a crowded establishment, ON AVERAGE everyone becomes a millionaire! The mean salary of a tech worker in San Mateo County is $291,497. $81,000 of this is due to Mark Zuckerberg! Medians, anyone?
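A minimal sketch of the joke above, using made-up numbers: the five patron figures and the newcomer's figure are hypothetical, chosen only to show how one extreme value drags the mean while barely moving the median.

```python
from statistics import mean, median

# Hypothetical net worths (in dollars) of five patrons; not real data.
patrons = [40_000, 55_000, 60_000, 75_000, 90_000]
# Hypothetical net worth of one extremely wealthy newcomer.
newcomer = 80_000_000_000

before_mean, before_median = mean(patrons), median(patrons)
after_mean, after_median = mean(patrons + [newcomer]), median(patrons + [newcomer])

# The mean jumps by orders of magnitude; the median shifts only slightly.
print(f"mean:   {before_mean:,.0f} -> {after_mean:,.0f}")
print(f"median: {before_median:,.0f} -> {after_median:,.0f}")
```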


17 Flaws of Averages Garrison Keillor: Lake Wobegon, where all the women are strong, all the men are good looking and all the children are above average. Possible? Impossible?

18 Flaws of Averages Movie Theaters: Estimate the fraction of offered seats that are sold. Movie Theater Management: What do they see? Typically – 5% Selection bias – occurs everywhere.

19 Flaws of Averages Selection bias – occurs everywhere. Think of waking up and being a chocolate chip in a chocolate chip cookie! – Your perceived distribution of chips in a cookie – Management’s experience.
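The gap between management's view and the customer's (or the chocolate chip's) view can be sketched with a small calculation. The attendance figures below are hypothetical; the point is only that averaging per screening and averaging per ticket holder give very different answers on the same data.

```python
# Hypothetical attendance at ten screenings in a 100-seat theater.
capacity = 100
attendance = [5, 5, 5, 5, 5, 5, 5, 5, 5, 95]

# Management's view: every screening counts once.
management_view = sum(a / capacity for a in attendance) / len(attendance)

# Customers' view: every ticket holder counts once, so a screening is
# weighted by how many people were there to experience it.
customer_view = sum(a * (a / capacity) for a in attendance) / sum(attendance)

print(f"fraction of seats sold, per screening: {management_view:.2f}")
print(f"fraction of seats sold, per customer:  {customer_view:.2f}")
```

Most customers were in the one crowded screening, so the typical customer remembers a nearly full house even though most screenings were nearly empty.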

20 Flaws of Averages Selection bias – Extends to friends on Facebook. Yes, it is true that on average my friends on Facebook have more friends than I do! How does this type of selection bias extend into your business?
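The friends-of-friends effect can be checked on a toy network. The five-person graph below is invented for illustration; any network with unequal friend counts shows the same direction of bias, because popular people appear on many friend lists.

```python
# Hypothetical friendship network (symmetric adjacency lists).
friends = {
    "Ann": ["Bob", "Cat", "Dan", "Eve"],
    "Bob": ["Ann", "Cat"],
    "Cat": ["Ann", "Bob"],
    "Dan": ["Ann"],
    "Eve": ["Ann"],
}

degree = {person: len(their_friends) for person, their_friends in friends.items()}

# Average number of friends per person.
mean_degree = sum(degree.values()) / len(degree)

# For each person, the average friend count among their friends;
# then average that quantity over everyone.
mean_friend_degree = sum(
    sum(degree[f] for f in their_friends) / len(their_friends)
    for their_friends in friends.values()
) / len(friends)

print(f"average friends per person:        {mean_degree:.1f}")
print(f"average friends of one's friends:  {mean_friend_degree:.1f}")
```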

21 Flaws of Averages Viral growth, R_0. The concept of R_0 originated in Germany, in the study of population growth. In epidemics, R_0 is the average number of new infections created by a newly infected person when almost everyone is susceptible to the disease.

22 Flaws of Averages Suppose R_0 = 2.0. Consider two very different possibilities… 1: Every infection generates 2 more. 2: A new infection has a 50% chance of generating 4 new infections and a 50% chance of generating none. Can you picture the temporal dynamics of each case?
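One way to picture the difference is to treat each case as a simple branching process (an assumption made here for illustration, not something stated on the slide) and compute the probability that an outbreak started by a single case dies out on its own. The function name and iteration count are hypothetical choices.

```python
def extinction_probability(offspring_pmf, iterations=200):
    """Smallest fixed point of q = G(q), where G is the offspring
    probability generating function, found by iterating from q = 0."""
    q = 0.0
    for _ in range(iterations):
        q = sum(p * q**k for k, p in offspring_pmf.items())
    return q

# Case 1: every infection generates exactly 2 more.
case_1 = {2: 1.0}
# Case 2: 50% chance of 4 new infections, 50% chance of none.
case_2 = {4: 0.5, 0: 0.5}

for name, pmf in [("case 1", case_1), ("case 2", case_2)]:
    r0 = sum(k * p for k, p in pmf.items())
    q = extinction_probability(pmf)
    print(f"{name}: R_0 = {r0:.1f}, chance the outbreak fizzles = {q:.3f}")
```

Both cases have the same average, yet under this model the first never dies out while the second fizzles more than half the time and grows faster when it does take hold.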

23 Ebola Summer 2014

24 Flaws of Averages “Outliers”: What to do with them? Many say, clip them off, they distort the analysis, they mislead intuition. But, “outliers” have determined the course of human history. – Meteors hitting Planet Earth – Richter 9 and above earthquakes – Financial collapses.

25 Earthquakes Richter Scale: logarithmic. Each whole number step in the magnitude scale corresponds to the release of about 31.6 times more energy (a factor of 10^1.5) than the amount associated with the preceding whole number value.
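The energy scaling can be worked out directly, assuming the standard relation that radiated energy grows as 10 raised to 1.5 times the magnitude. The function name below is a hypothetical helper.

```python
def energy_ratio(magnitude_big, magnitude_small):
    """Ratio of energy released, assuming energy scales as 10**(1.5 * M)."""
    return 10 ** (1.5 * (magnitude_big - magnitude_small))

# One whole step on the scale, then two whole steps.
print(f"magnitude 9 vs 8: about {energy_ratio(9, 8):.1f} times the energy")
print(f"magnitude 9 vs 7: about {energy_ratio(9, 7):.0f} times the energy")
```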


27 Flaws of Averages: Summary Points Averages can be deceiving. Treating a distribution as its average value usually results in incorrect inferences. Averages as experienced by one population may be very different from those experienced by another. Ignore “outliers” at your peril.

28 And we haven’t even considered… Regression to the Mean. Variance. Exponential smoothing – Example: Baseball batting averages. And much more…

29 One More Average: Based on Dimensionality Arguments Mean travel distance in a city, N police cars, area A. This is a Square Root Law. In our analysis of Big Data, we can look for this type of behavior.
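The formula itself does not appear in this transcript. A hedged sketch of the dimensionality argument, assuming the N cars are spread roughly evenly over the area A: the only length scale available is the square root of the area per car, so the mean travel distance E[D] must be proportional to it. The constant c is an assumption here; it depends on the travel metric and on how the cars are positioned.

```latex
\[
\text{area per car} = \frac{A}{N}
\quad\Longrightarrow\quad
E[D] \approx c\,\sqrt{\frac{A}{N}},
\qquad c \text{ dimensionless.}
\]
```

Under this form, quadrupling the number of cars only halves the mean travel distance.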

30 Let’s Now Switch: From Averages to a Simple Operational Model

31 What Kinds of Queues Occur in Systems of Interest to ESD? Queues, Queues Everywhere!

32 Queueing System [Figure: Arriving Customers → Queue of Waiting Customers → SERVICE FACILITY → Departing Customers]

33 Queues, Queues Everywhere! Queueing Theory: 100 Years Old! Most queues are complicated, and folks want to simulate almost all of the detail. And today there are numerous files of Big Data drawn from queues. But let’s look at simple models first, to guide us!

34 It May be Little, But It’s The Law! L = λW, where L = time-average number of customers in the system, both in queue and in service; λ = average rate of arrivals of customers into the system; W = mean time spent by a customer in the system, both in queue and in service.

35 Formula applies in all sorts of places, including those not normally thought of as queues. Example: Annual rate of new hires of assistant professors in a university. MIT: L = 1,000 tenure-track faculty members; W = mean duration of a faculty career; λ = annual rate of new hires. If W moves upwards from 20 to 22 years, λ moves down accordingly, since L = 1,000 remains constant.
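The arithmetic behind the faculty example follows from Little's Law, L = λW, rearranged as λ = L / W. The function name is a hypothetical helper; the numbers are the ones on the slide.

```python
def hiring_rate(faculty_size, mean_career_years):
    """Little's Law rearranged: arrivals per year = L / W."""
    return faculty_size / mean_career_years

L = 1_000  # tenure-track faculty members, held constant
for W in (20, 22):
    print(f"W = {W} years -> about {hiring_rate(L, W):.1f} new hires per year")
```

A two-year lengthening of the average career cuts annual hiring from 50 to roughly 45.5, a drop of about 9%.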

36 The M/M/k Queue Queue notation: Input/Service/Servers First M: “Memoryless” input process, meaning Poisson process Second M: “Memoryless” service time, meaning exponential probability density function k = number of servers.

37 Note the Elbow! ρ (rho) = fraction of time that the server is busy serving customers. [Figure: M/M/1 queue; the queue explodes as ρ approaches 1.]
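The elbow can be reproduced from the standard M/M/1 result that the mean number of customers in the system is ρ / (1 − ρ). This is a minimal sketch; the function name and the sampled utilizations are illustrative choices.

```python
def mm1_mean_in_system(rho):
    """Mean number in an M/M/1 system at utilization rho (0 <= rho < 1)."""
    if not 0 <= rho < 1:
        raise ValueError("utilization must satisfy 0 <= rho < 1")
    return rho / (1 - rho)

# The curve is nearly flat at low utilization and blows up near 1.
for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
    print(f"rho = {rho:.2f} -> mean number in system = {mm1_mean_in_system(rho):6.1f}")
```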

38 Elbow as we Increase the Number of Servers (k = 1,2,3,9,16)

39 Do You See Why Large Call Centers are More Productive?
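A hedged sketch of why pooling servers helps, using the standard Erlang C formula for the M/M/k queue. Holding utilization per server fixed at 0.9 is an assumption chosen for illustration, as are the function names and the unit mean service time.

```python
from math import factorial

def erlang_c(k, offered_load):
    """Probability that an arrival must wait in an M/M/k queue.
    offered_load = arrival rate * mean service time, and must be < k."""
    a = offered_load
    rho = a / k
    tail = a**k / factorial(k) / (1 - rho)
    head = sum(a**n / factorial(n) for n in range(k))
    return tail / (head + tail)

def mean_wait_in_queue(k, rho, mean_service=1.0):
    """Mean queueing delay for M/M/k at per-server utilization rho."""
    a = k * rho
    return erlang_c(k, a) * mean_service / (k * (1 - rho))

# Same 90% utilization per server, increasing numbers of servers.
for k in (1, 2, 3, 9, 16):
    print(f"k = {k:2d}: mean wait = {mean_wait_in_queue(k, 0.9):.3f} service times")
```

At the same utilization, the larger pool delivers much shorter waits, so a large call center can run its agents busier for the same service level.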

40 What Do You See as a Role for Big Data Analysis Here?

41 D = Deterministic

42 Averages in Queues Performance degrades as arrival rate increases and/or mean service time increases. Performance degrades as variance of time between arrivals increases and/or variance of the service time increases. Can you think of examples?
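The effect of service-time variance can be sketched with the Pollaczek-Khinchine formula for a single-server queue with Poisson arrivals (M/G/1). This covers only the service-time half of the statement above; arrival variability is held at the Poisson level. The function name and parameter values are illustrative.

```python
def mg1_mean_wait_in_queue(rho, mean_service, service_cv_squared):
    """Pollaczek-Khinchine mean queueing delay for M/G/1.
    service_cv_squared = variance of service time / (mean service time)**2."""
    return rho * mean_service * (1 + service_cv_squared) / (2 * (1 - rho))

rho, mean_service = 0.9, 1.0
# Exponential service (M/M/1) has squared coefficient of variation 1;
# deterministic service (M/D/1) has 0.
print("M/M/1 mean wait:", mg1_mean_wait_in_queue(rho, mean_service, 1.0))
print("M/D/1 mean wait:", mg1_mean_wait_in_queue(rho, mean_service, 0.0))
```

With the same arrival rate and the same mean service time, removing service-time variability halves the mean wait.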

43 Now for the Final Switch: From Queueing Overview to Case Study

44 Queue Inference Engine: A Personal Big Data–Small Models Experience It started with Reams of Old-fashioned Paper-based Computer Printouts

45 Queue Inference Engine: Big Data: – Time ATM card inserted; – Time ATM Transaction completed.

46 Queue Inference Engine: Knowing the probability properties of the Arrival Process, a “Poisson Process,” we were able to derive a mathematically valid algorithm to determine many statistics of customers’ queue delays. It’s called an O(N^3) algorithm, since the number of computations grows as the 3rd power of the number of customers in a busy period.
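The sketch below is not the O(N^3) algorithm itself, which is not reproduced in this transcript. It illustrates only a plausible first step on the kind of data described: grouping transactions into busy periods, on the assumption that a card inserted almost immediately after the previous transaction ends belongs to a customer who was already waiting. The function name, the tolerance parameter, and the sample times are all hypothetical.

```python
def busy_periods(transactions, gap_tolerance=0.0):
    """Group (card_inserted, completed) times into back-to-back runs.
    A transaction joins the current run when it starts within
    gap_tolerance of the previous completion (an assumed rule)."""
    periods, current = [], []
    for start, end in sorted(transactions):
        if current and start - current[-1][1] <= gap_tolerance:
            current.append((start, end))
        else:
            if current:
                periods.append(current)
            current = [(start, end)]
    if current:
        periods.append(current)
    return periods

# Hypothetical ATM log, times in seconds since opening.
log = [(0, 60), (62, 130), (131, 200), (400, 450), (900, 960), (961, 1000)]
for period in busy_periods(log, gap_tolerance=5):
    print(f"busy period with {len(period)} customers: {period}")
```

Within each busy period of N customers, the inference step would then use the Poisson arrival assumption to estimate how long each customer waited.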

47 Queue Inference Engine: Imagine receiving your monthly bank statement and with it is a statement of the times you spent waiting in bank queues. The queues could include both those involving human tellers and automatic teller machines (ATMs). With the technology of the Queue Inference Engine (QIE) such an innovation is now well within the realm of possibility.

48 Queue Inference Engine: With our first results published in 1990, Dr. David Simchi-Levi and others call this one of the first applications of Big Data analysis to modern-day problems: “This is just a beautiful example of how data drive new research…” (Simchi-Levi, 2014) But this “QIE” Big Data algorithm could not have been derived without marrying Small Models (with their behavior) with Big Data recursive thinking.

49 Big Data and Small Models


51 References
Larson, Richard C., “The Queue Inference Engine: Deducing Queue Statistics from Transactional Data,” Management Science, 36(5), May 1990, pp. 586-601.
Larson, Richard C., “Queue Inference Engine,” in Encyclopedia of Operations Research and Management Science, Centennial Edition, Saul I. Gass and Carl M. Harris (eds.), Kluwer, Boston, 2001, pp. 674-679.
Jones, Lee K. and Richard C. Larson, “Efficient Computation of Probabilities of Events Described by Order Statistics and Applications to Queue Inference,” ORSA Journal on Computing, 7(1), Winter 1995, pp. 89-100.
Gross, Donald and Richard C. Larson, “Queuing Systems,” in International Encyclopedia of Business and Management (IEBM), 2nd edition, 8-volume set, Malcolm Warner (ed.), Thomson Learning, London, U.K., 2001, pp. 5502-5513.
Larson, Richard C. and Mauricio Gomez Diaz, “Nonfixed Retirement Age for University Professors: Modeling Its Effects on New Faculty Hires,” Service Science, 4(1), March 2012, pp. 69-78.
Simchi-Levi, David, “OM Research: From Problem-Driven to Data-Driven Research,” M&SOM, 16(1), 2014, pp. 2-10.

