Move Over, Big Data! How Small, Simple Models Can Yield Big Insights Richard C. Larson, Ph.D., Mitsui Professor of Engineering Systems and Director of the Center for Engineering Systems Fundamentals, MIT September 8, © Richard C. Larson 2014
Fishing in the Ocean…. Random location? No strategy? Or, location and strategy based on prior analysis?
In Trying to Make Sense of a Sea of Data, We Need Small Simple Models to Guide our Search
From an MIT SDM alum: I work on big data “stuff” in my day job and I think simple models are too often discounted, often due to bedazzlement by big data trends, tools, and the quest for the holy grail.
F(y) = B(y) – [F(y-1) + F(y-2) + F(y-3)]
What we are not saying about Big Data and data analytics… What we are saying about small models… Ideally, in many applications, these two approaches are complementary, going hand in hand. [Figure: Big Data and Small Models]
Outline: Flaws of Averages; Square Root Laws; Nonlinearities in Queueing; Case Study: Marrying Small Models and Big Data Analysis.
If we are about to deal with lots of data, averages will be important. An average is one of the simplest operations on any dataset. We need to be savvy customers of averages!
Flaws of Averages Simple model: The average of N quantities, X_1, X_2, …, X_N. Average = (X_1 + X_2 + … + X_N)/N. Simple, right?
Flaws of Averages We tend to think in averages, often to the point of believing that the average is a constant describing all! Warning: Average River Depth is 4 feet! Mutual Fund: Average total annual returns: 7%.
Flaws of Averages We’ve all heard the joke: When Bill Gates walks into a crowded establishment, ON AVERAGE everyone becomes a millionaire! The mean salary of a tech worker in San Mateo County is $291,497. $81,000 of this is due to Mark Zuckerberg! Medians anyone?
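A small sketch of the Bill Gates joke, with made-up numbers: the ten net worths and the newcomer's fortune below are illustrative assumptions, not figures from the talk. The mean jumps into the billions while the median barely moves.

```python
from statistics import mean, median

# Hypothetical net worths (dollars) of ten patrons; illustrative values only.
patrons = [40_000, 55_000, 60_000, 75_000, 80_000,
           90_000, 110_000, 120_000, 150_000, 200_000]

# One extremely wealthy newcomer (assumed figure).
newcomer = 80_000_000_000

before_mean, before_median = mean(patrons), median(patrons)

everyone = patrons + [newcomer]
after_mean, after_median = mean(everyone), median(everyone)

print(f"before: mean = {before_mean:,.0f}, median = {before_median:,.0f}")
print(f"after:  mean = {after_mean:,.0f}, median = {after_median:,.0f}")
```

On average everyone is now a multi-billionaire, yet the median patron is essentially unchanged.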
Flaws of Averages Garrison Keillor: Lake Wobegon, where all the women are strong, all the men are good looking, and all the children are above average. Possible? Impossible?
Flaws of Averages Movie Theaters: Estimate the fraction of offered seats that are sold. Movie Theater Management: What do they see? Typically – 5% Selection bias – occurs everywhere.
Flaws of Averages Selection bias – occurs everywhere. Think of waking up and being a chocolate chip in a chocolate chip cookie! – Your perceived distribution of chips in a cookie – Management’s experience.
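The movie theater point can be sketched with two averages over the same screenings. The attendance figures and the 100-seat capacity are hypothetical; the two function names are ours, not from the talk. Management averages over screenings, while moviegoers "sample" a screening in proportion to how many people are in it.

```python
def management_view(sold, capacity):
    """Fraction of offered seats that were sold, averaged over all screenings."""
    return sum(sold) / (capacity * len(sold))

def customer_view(sold, capacity):
    """Average occupancy as experienced by a randomly chosen ticket holder.

    Each screening is weighted by its attendance, because a full house
    contains many more observers than an empty one.
    """
    return sum(s * (s / capacity) for s in sold) / sum(sold)

# Hypothetical day: four nearly empty screenings and one popular one.
sold = [5, 5, 5, 5, 80]
capacity = 100

print(f"management sees {management_view(sold, capacity):.0%} of seats sold")
print(f"customers experience {customer_view(sold, capacity):.0%} occupancy")
```

With these assumed numbers management sees 20% of seats sold while the typical customer sits in a room that is 65% full. The chocolate chip in a chip-rich cookie is the same effect.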
Flaws of Averages Selection bias – Extends to friends on Facebook. Yes, it is true that on average my friends on Facebook have more friends than I do! How does this type of selection bias extend into your business?
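The friends-of-friends effect shows up even in a four-person network. The toy graph below is an assumption chosen for illustration: one popular person "A" who is friends with everyone else.

```python
# Hypothetical friendship graph (symmetric): A is friends with B, C and D.
friends = {
    "A": ["B", "C", "D"],
    "B": ["A"],
    "C": ["A"],
    "D": ["A"],
}

# Number of friends each person has.
degree = {person: len(their_friends) for person, their_friends in friends.items()}

# Average number of friends per person.
mean_own = sum(degree.values()) / len(degree)

# For each person, the average friend count among their friends,
# then averaged over all people.
mean_of_friends = sum(
    sum(degree[f] for f in their_friends) / len(their_friends)
    for their_friends in friends.values()
) / len(friends)

print(f"average number of friends:            {mean_own}")
print(f"average number of friends of friends: {mean_of_friends}")
```

Popular people appear on many friend lists, so they are over-sampled whenever we look at "my friends".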
Flaws of Averages Viral growth, R_0. The concept of R_0 came initially from Germany, in the study of population growth. In epidemics, R_0 is the average number of new infections created by a newly infected person when almost everyone is susceptible to the disease.
Flaws of Averages Suppose R_0 = 2.0. Consider two very different possibilities… 1: Every infection generates 2 more. 2: A new infection has a 50% chance of generating 4 new infections and a 50% chance of generating none. Can you picture the temporal dynamics of each case?
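One way to picture the difference is to treat each case as a simple branching process (our modeling assumption, not stated on the slide) and ask how likely it is that an outbreak started by a single case dies out on its own. The extinction probability q is the smallest solution of q = sum over k of p_k * q^k, found here by fixed-point iteration from q = 0.

```python
def mean_offspring(pmf):
    """Average number of new infections per case, i.e. R_0 for this distribution."""
    return sum(k * p for k, p in pmf.items())

def extinction_probability(pmf, iterations=200):
    """Smallest fixed point of the offspring generating function, starting from 0."""
    q = 0.0
    for _ in range(iterations):
        q = sum(p * q**k for k, p in pmf.items())
    return q

case_1 = {2: 1.0}          # every infection generates exactly 2 more
case_2 = {0: 0.5, 4: 0.5}  # 50% chance of 4 new infections, 50% chance of none

for name, pmf in [("case 1", case_1), ("case 2", case_2)]:
    print(name, "R_0 =", mean_offspring(pmf),
          "P(outbreak dies out) =", round(extinction_probability(pmf), 4))
```

Both cases have R_0 = 2.0, but under this model case 1 never dies out while case 2 fizzles more than half the time. The average alone does not tell us which world we are in.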
Ebola Summer 2014
Flaws of Averages “Outliers”: What to do with them? Many say, clip them off, they distort the analysis, they mislead intuition. But, “outliers” have determined the course of human history. – Meteors hitting Planet Earth – Richter 9 and above earthquakes – Financial collapses.
Earthquakes Richter Scale: logarithmic. Each whole number step in the magnitude scale corresponds to the release of about 31 times more energy than the amount associated with the preceding whole number value.
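The "about 31 times" figure is consistent with the standard relation that released energy scales as 10^(1.5 × magnitude). A minimal sketch under that assumption (the function name is ours):

```python
def energy_ratio(magnitude_high, magnitude_low):
    """Ratio of energy released, assuming energy scales as 10**(1.5 * magnitude)."""
    return 10 ** (1.5 * (magnitude_high - magnitude_low))

print(f"one whole step:  x{energy_ratio(7, 6):.1f}")
print(f"two whole steps: x{energy_ratio(9, 7):.0f}")
```

A magnitude 9 event is therefore not "a bit bigger" than a magnitude 7: it releases roughly a thousand times the energy, which is why clipping such outliers discards most of the story.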
Flaws of Averages: Summary Points Averages can be deceiving. Treating a distribution as its average value usually results in incorrect inferences. Averages as experienced by one population may be very different from those experienced by another. Ignore “outliers” at your peril.
And we haven’t even considered… Regression to the mean; variance; exponential smoothing (example: baseball batting averages); and much more…
One More Average: Based on Dimensionality Arguments Mean travel distance in a city, N police cars, area A. This is a Square Root Law. In our analysis of Big Data, we can look for this type of behavior.
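The slide's formula did not survive extraction. A dimensional argument suggests the general shape: each of the N cars covers an area of about A/N, and the only length that can be built from that is its square root. The sketch below assumes that form, with a hypothetical constant c that would depend on street geometry and on how cars are positioned.

```python
from math import sqrt

def mean_travel_distance(area, n_units, c=1.0):
    """Square-root-law sketch: distance is proportional to sqrt(area / n_units).

    c is a placeholder proportionality constant (an assumption); ratios
    between scenarios do not depend on it.
    """
    return c * sqrt(area / n_units)

# Hypothetical city of 100 square miles.
for n in (4, 16, 64):
    print(n, "cars ->", mean_travel_distance(100, n), "miles (for c = 1)")
```

Quadrupling the number of cars only halves the mean travel distance, which is the kind of signature one can look for in dispatch data.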
Let’s Now Switch: From Averages to a Simple Operational Model
What Kinds of Queues Occur in Systems of Interest to ESD? Queues, Queues Everywhere!
Queueing System [Diagram: Arriving Customers → Queue of Waiting Customers → Service Facility → Departing Customers]
Queues, Queues Everywhere! Queueing Theory: 100 Years Old! Most queues are complicated, and folks want to simulate almost all of the detail. And today there are numerous files of Big Data drawn from queues. But let’s look at simple models first, to guide us!
It May be Little, But It’s The Law! L = λW, where L = time-average number of customers in the system, both in queue and in service; λ = average rate of arrivals of customers into the system; W = mean time spent by a customer in the system, both in queue and in service.
Formula applies in all sorts of places, including those not normally thought of as queues. Example: λ = annual rate of new hires of assistant professors in a university. MIT: L = 1,000 tenure-track faculty members; W = mean duration of a faculty career. If W moves upwards from 20 to 22 years, λ moves down accordingly, since L = 1,000 remains constant.
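The faculty example is one line of arithmetic once Little's Law is rearranged as λ = L / W. A minimal sketch using the slide's own numbers (the function name is ours):

```python
def arrival_rate(mean_in_system, mean_time_in_system):
    """Little's Law rearranged: lambda = L / W."""
    return mean_in_system / mean_time_in_system

L = 1_000  # tenure-track faculty members, held constant

for career_years in (20, 22):
    hires = arrival_rate(L, career_years)
    print(f"W = {career_years} years -> about {hires:.1f} new hires per year")
```

Lengthening the average career by two years cuts hiring from 50 to roughly 45 new faculty per year, with no other assumptions about how the system works.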
The M/M/k Queue Queue notation: Input/Service/Servers. First M: “Memoryless” input process, meaning Poisson process. Second M: “Memoryless” service time, meaning exponential probability density function. k = number of servers.
Note the Elbow! Rho = ρ = fraction of time that the server is busy serving customers. [Plot: M/M/1 queue; the queue explodes as ρ approaches 1]
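The elbow can be reproduced from the standard M/M/1 result that the mean number of customers in the system is ρ / (1 − ρ). The utilization values below are arbitrary sample points.

```python
def mm1_mean_in_system(rho):
    """Mean number in an M/M/1 system (queue plus service): rho / (1 - rho)."""
    if not 0 <= rho < 1:
        raise ValueError("a stable M/M/1 queue needs 0 <= rho < 1")
    return rho / (1 - rho)

for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
    print(f"rho = {rho:.2f} -> mean number in system = {mm1_mean_in_system(rho):6.1f}")
```

Going from 50% to 90% utilization multiplies congestion by nine; the last few points of utilization are where the curve turns sharply upward.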
Elbow as we Increase the Number of Servers (k = 1,2,3,9,16)
Do You See Why Large Call Centers are More Productive?
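The call center question can be explored with the Erlang C formula for the M/M/k queue. The sketch below holds per-server utilization fixed at an assumed 90% and measures the mean wait in queue in units of the mean service time; the function names are ours.

```python
from math import factorial

def erlang_c(k, rho):
    """Probability that an arriving customer must wait in an M/M/k queue.

    k is the number of servers and rho the per-server utilization (0 <= rho < 1).
    """
    a = k * rho                               # offered load, lambda / mu
    tail = a**k / factorial(k) / (1 - rho)    # weight of states with all servers busy
    head = sum(a**n / factorial(n) for n in range(k))
    return tail / (head + tail)

def mean_wait_in_queue(k, rho):
    """Mean time waiting in queue, expressed in mean service times."""
    return erlang_c(k, rho) / (k * (1 - rho))

for k in (1, 2, 3, 9, 16):
    print(f"k = {k:2d}: mean wait = {mean_wait_in_queue(k, 0.9):.3f} service times")
```

At the same utilization, a pooled group of 16 servers makes customers wait far less than a single server does, so a large center can run its agents busier for the same service quality.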
What Do You See as a Role for Big Data Analysis Here?
D = Deterministic
Averages in Queues Performance degrades as arrival rate increases and/or mean service time increases. Performance degrades as variance of time between arrivals increases and/or variance of the service time increases. Can you think of examples?
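The effect of service-time variance can be illustrated with the Pollaczek-Khinchine formula for a single-server queue with Poisson arrivals (M/G/1). This sketch covers only the service-time half of the slide's claim; the utilization of 0.8 is an assumed example value.

```python
def mg1_mean_wait(rho, service_cv2, mean_service=1.0):
    """Mean wait in queue for M/G/1 (Pollaczek-Khinchine).

    rho          : server utilization, 0 <= rho < 1
    service_cv2  : squared coefficient of variation of service time
                   (variance divided by mean squared)
    mean_service : mean service time
    """
    return rho * (1 + service_cv2) / (2 * (1 - rho)) * mean_service

rho = 0.8
print("M/D/1 (deterministic service, cv^2 = 0):", mg1_mean_wait(rho, 0.0))
print("M/M/1 (exponential service,   cv^2 = 1):", mg1_mean_wait(rho, 1.0))
print("highly variable service      (cv^2 = 4):", mg1_mean_wait(rho, 4.0))
```

Same arrival rate, same mean service time, yet removing the variability (D = Deterministic) halves the wait relative to exponential service, and adding variability makes it several times worse.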
Now for Final Switch: From Queueing Overview to Case Study
Queue Inference Engine: A Personal Big Data–Small Models Experience It started with Reams of Old-fashioned Paper-based Computer Printouts
Queue Inference Engine: Big Data: – Time ATM card inserted; – Time ATM Transaction completed.
Queue Inference Engine: Knowing the probability properties of the Arrival Process, a “Poisson Process,” we were able to derive a mathematically valid algorithm to determine many statistics of customers’ queue delays. It’s called an O(N^3) algorithm, since the number of computations grows as the 3rd power of the number of customers in a busy period.
Queue Inference Engine: Imagine receiving your monthly bank statement and with it is a statement of the times you spent waiting in bank queues. The queues could include both those involving human tellers and automatic teller machines (ATMs). With the technology of the Queue Inference Engine (QIE) such an innovation is now well within the realm of possibility.
Queue Inference Engine: With our first results published in 1990, Dr. David Simchi-Levi and others call this one of the first applications of Big Data analysis to modern-day problems: “This is just a beautiful example of how data drive new research…” (Simchi-Levi, 2014) But this “QIE” Big Data algorithm could not have been derived without marrying Small Models (with their behavior) with Big Data recursive thinking.
Big Data and Small Models
References
Larson, Richard C., “The Queue Inference Engine: Deducing Queue Statistics From Transactional Data,” Management Science 36(5): , May
Larson, Richard C., “Queue Inference Engine,” chapter in Encyclopedia of Operations Research and Management Science, Centennial Edition, Saul I. Gass and Carl M. Harris (eds.), Kluwer, Boston, 2001, pp
Jones, Lee K. and Richard C. Larson, “Efficient Computation of Probabilities of Events Described by Order Statistics and Applications to Queue Inference,” ORSA Journal on Computing, vol. 7, no. 1, Winter 1995, pp
Gross, Donald and Richard C. Larson, “Queuing Systems,” in International Encyclopedia of Business and Management (IEBM), 2nd edition, 8-volume set, Malcolm Warner (ed.), Thomson Learning, London, U.K., 2001, pp
Larson, Richard C. and Mauricio Gomez Diaz, “Nonfixed Retirement Age for University Professors: Modeling Its Effects on New Faculty Hires,” Service Science, vol. 4, no. 1, March 2012, p
Simchi-Levi, David, “OM Research: From Problem-Driven to Data-Driven Research,” M&SOM, 16(1), 2014, pp