WALMART RECRUITING – STORE SALES FORECASTING Chaoyi Liu, Yuqing Lu, Haoran Wu Rajendran, Goutham.



The dataset we chose

Features table:
·Store - the store number
·Date - the week
·Temperature - average temperature in the region
·Fuel Price - cost of fuel in the region
·MarkDown1-5 - data is only available after Nov 2011
·CPI - the consumer price index
·Unemployment - the unemployment rate
·IsHoliday - whether the week is a special holiday week
This table provides parameters that may affect weekly sales, but it does not provide the weekly sales themselves.

Sales table:
·Store - the store number
·Dept - the department number
·Date - the week
·Weekly Sales - sales for the given department in the given store
·IsHoliday - whether the week is a special holiday week
This table provides sales data for 45 stores with up to 99 departments in more than 421,000 records, but it does not sum up each store's weekly sales.

Then we integrated the two datasets

We first integrated these two massive tables into one table with 6,435 records that has everything we need:
Store, Date, Temperature, Fuel_Price, MarkDown1-5, CPI, Unemployment, IsHoliday, Weekly_Sales

We then sorted the 6,435 records by weekly sales from smallest to largest and divided them equally into 5 groups (quintiles) of 1,287 records each:

Level     Mark as   Description
Level 1   D         More than $0.00
Level 2   C         More than $497,
Level 3   B         More than $748,
Level 4   A         More than $1,056,
Level 5   S         More than $1,414,343.53
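The integration and grouping steps can be sketched in plain Python. This is a minimal illustration, not the tool the slides used: the function names, the toy rows, and the choice to join on (Store, Date) are assumptions made for the example.

```python
from collections import Counter, defaultdict

LEVELS = ["D", "C", "B", "A", "S"]  # lowest to highest weekly sales

def sum_sales_per_store_week(sales_rows):
    """Sum department-level Weekly_Sales into one total per (Store, Date)."""
    totals = defaultdict(float)
    for row in sales_rows:
        totals[(row["Store"], row["Date"])] += row["Weekly_Sales"]
    return totals

def integrate(feature_rows, sales_rows):
    """Attach the store-level weekly total to each feature row with the same key."""
    totals = sum_sales_per_store_week(sales_rows)
    merged = []
    for row in feature_rows:
        key = (row["Store"], row["Date"])
        if key in totals:
            merged.append({**row, "Weekly_Sales": totals[key]})
    return merged

def label_by_quintile(records):
    """Sort by Weekly_Sales and split into five equal-sized groups."""
    ordered = sorted(records, key=lambda r: r["Weekly_Sales"])
    n = len(ordered)
    return [
        {**rec, "Level": LEVELS[rank * len(LEVELS) // n]}
        for rank, rec in enumerate(ordered)
    ]

# Hypothetical toy data: 2 stores, 5 weeks, 2 departments per store.
weeks = ["w0", "w1", "w2", "w3", "w4"]
features = [{"Store": s, "Date": d, "IsHoliday": False}
            for s in (1, 2) for d in weeks]
sales = [{"Store": s, "Dept": dept, "Date": d,
          "Weekly_Sales": 1000.0 * s + 100.0 * i + dept}
         for s in (1, 2) for i, d in enumerate(weeks) for dept in (1, 2)]

merged = integrate(features, sales)
labelled = label_by_quintile(merged)
level_counts = Counter(rec["Level"] for rec in labelled)
print(level_counts)
```

With 6,435 records the same rank-based split gives five groups of 1,287 records each, because 6,435 is divisible by 5.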

Neural Network Model
·It is suited to complicated prediction problems
·Visualization or understanding of the rules is not needed
·Accuracy is very important

Result
Learning Rate / Training Cycles = 0.03 / 2000
Accuracy = 70.61%

[Confusion matrix: rows pred. D, pred. S, pred. C, pred. B, pred. A; columns true D, true S, true C, true B, true A; class precision per row]

               true D    true S    true C    true B    true A
class recall   78.32%    68.09%    76.92%    65.00%    65.67%

Accuracy reaches 70.61% when the learning rate is 0.03, and it increases as the number of training cycles increases.
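For reference, accuracy, class precision, and class recall are all read off a confusion matrix like the one in this result. The sketch below assumes rows are predicted classes and columns are true classes, as in the slide; the counts are invented for illustration and are not the values from the presentation.

```python
LABELS = ["D", "S", "C", "B", "A"]

def accuracy(matrix):
    """Share of all records that lie on the diagonal (predicted == true)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

def class_precision(matrix):
    """Per predicted class (row): diagonal count divided by the row total."""
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(matrix)]

def class_recall(matrix):
    """Per true class (column): diagonal count divided by the column total."""
    columns = list(zip(*matrix))
    return [col[j] / sum(col) if sum(col) else 0.0
            for j, col in enumerate(columns)]

# Hypothetical counts, NOT the slide's values.
example = [
    [8, 0, 1, 1, 0],  # pred. D
    [0, 7, 0, 0, 3],  # pred. S
    [2, 0, 6, 2, 0],  # pred. C
    [0, 0, 3, 5, 2],  # pred. B
    [0, 3, 0, 2, 5],  # pred. A
]

print(accuracy(example))
print(dict(zip(LABELS, class_recall(example))))
print(dict(zip(LABELS, class_precision(example))))
```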

Neural Network Weights

Hidden Layer:
[Weight table: rows Store, Date, Temperature, Fuel_Price, MarkDown1-5, CPI, Unemployment, IsHoliday, Bias; columns Node 1 to Node 10]

Output:
[Weight table: rows Node 1 to Node 10, Threshold; columns Class 'S', Class 'A', Class 'B', Class 'C', Class 'D']
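To show how the two weight tables are used, here is a rough sketch of a forward pass through a network of the same shape: 12 inputs, 10 hidden nodes with a bias each, and 5 output classes with a threshold each. The sigmoid activation, the random weights, and the scaled input values are assumptions for illustration; the slides do not state the activation function, and the trained weights are not reproduced here.

```python
import math
import random

INPUTS = ["Store", "Date", "Temperature", "Fuel_Price",
          "MarkDown1", "MarkDown2", "MarkDown3", "MarkDown4", "MarkDown5",
          "CPI", "Unemployment", "IsHoliday"]
CLASSES = ["S", "A", "B", "C", "D"]
N_HIDDEN = 10

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(values, weights, offsets):
    """One dense layer: weighted sum of the inputs plus an offset, then sigmoid."""
    return [sigmoid(sum(w * v for w, v in zip(node_weights, values)) + offset)
            for node_weights, offset in zip(weights, offsets)]

def predict(x, hidden_w, hidden_b, out_w, out_t):
    """Return the class with the highest output score, plus all scores."""
    hidden = layer(x, hidden_w, hidden_b)
    scores = layer(hidden, out_w, out_t)
    best = max(range(len(CLASSES)), key=lambda i: scores[i])
    return CLASSES[best], scores

rng = random.Random(0)
# Hypothetical weights; a trained model would supply these.
hidden_w = [[rng.uniform(-1, 1) for _ in INPUTS] for _ in range(N_HIDDEN)]
hidden_b = [rng.uniform(-1, 1) for _ in range(N_HIDDEN)]
out_w = [[rng.uniform(-1, 1) for _ in range(N_HIDDEN)] for _ in CLASSES]
out_t = [rng.uniform(-1, 1) for _ in CLASSES]

x = [rng.random() for _ in INPUTS]  # one record, inputs scaled to [0, 1]
label, scores = predict(x, hidden_w, hidden_b, out_w, out_t)
print(label, [round(s, 3) for s in scores])
```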

Naïve Bayes
Accuracy = 18.63%

[Confusion matrix: rows pred. D, pred. S, pred. C, pred. B, pred. A; columns true D, true S, true C, true B, true A; class precision per row; class recall per column, of which only one value, 0.00%, is legible]

Why does Naïve Bayes perform like an "idiot" on this sample?
The model treats the variables Store, Date through IsHoliday as independent of each other, so:
P(Store, Date, Temperature, ..., IsHoliday) = P(Store) * P(Date) * ... * P(IsHoliday)
Because so many values in the columns Store, Date, ..., IsHoliday do not repeat, the probability of each individual value is very small. The product P(Store) * P(Date) * ... * P(IsHoliday) will therefore be far lower than 1/6435. This means that estimating the probability of a sales level with such a model is infeasible.
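The size of that product can be checked with a little arithmetic. The sketch below uses the 45 stores and 6,435 records from the slides; the number of weeks per store (6,435 / 45 = 143) assumes every store covers the same weeks, and the "rare value" probability of 1/6435 for a continuous column is a hypothetical worst case, not a figure from the presentation.

```python
from fractions import Fraction

N_RECORDS = 6435
N_STORES = 45
N_WEEKS = N_RECORDS // N_STORES  # 143, assuming equal coverage per store

def joint_under_independence(marginals):
    """Multiply per-variable probabilities, as the independence assumption does."""
    product = Fraction(1)
    for p in marginals:
        product *= p
    return product

p_store = Fraction(1, N_STORES)  # one particular store
p_date = Fraction(1, N_WEEKS)    # one particular week
p_rare = Fraction(1, N_RECORDS)  # hypothetical: a value seen once in the table

p_two = joint_under_independence([p_store, p_date])
p_three = joint_under_independence([p_store, p_date, p_rare])

print(p_two)           # Store and Date alone already give 1/6435
print(float(p_three))  # one more near-unique column pushes it far below that
```

Every additional column multiplies in another factor of at most 1, so the product only shrinks as more variables are included.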

K-NN

When K = 1, Accuracy = 26%
[Confusion matrix: rows pred. D, pred. S, pred. C, pred. B, pred. A; columns true D, true S, true C, true B, true A; class precision per row]

               true D    true S    true C    true B    true A
class recall   32.11%    37.84%    15.58%    15.00%    25.17%

When K = 10, Accuracy = 29.03%
[Confusion matrix: rows pred. D, pred. S, pred. C, pred. B, pred. A; columns true D, true S, true C, true B, true A; class precision per row]

               true D    true S    true C    true B    true A
class recall   35.78%    50.98%    15.50%    28.92%    6.00%
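As a reminder of what K controls, here is a minimal k-nearest-neighbours classifier: it finds the K training records closest to the query and takes a majority vote over their labels. The Euclidean distance, the two-feature toy points, and the function name are assumptions for illustration; the slides do not say which distance measure or scaling was used.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training points nearest to the query."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical, already-scaled feature vectors with their sales level.
train = [
    ((0.0, 0.0), "D"),
    ((0.1, 0.2), "D"),
    ((0.2, 0.1), "D"),
    ((0.3, 0.3), "S"),
    ((0.9, 1.1), "S"),
    ((1.0, 1.0), "S"),
]
query = (0.25, 0.25)

print(knn_predict(train, query, k=1))  # decided by the single nearest point
print(knn_predict(train, query, k=3))  # decided by a vote of three points
```

In this toy example the single nearest point and the three-point vote disagree, which is the kind of sensitivity to K seen in the two results above.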

Conclusion
·MarkDown 1 to 5 have the highest weight, 16, which means they have an enormous impact on sales. Promotions increase weekly sales remarkably.
·Fuel price and temperature also have a positive impact: a higher fuel price goes with higher sales.
·CPI and the unemployment rate have a heavy negative impact on sales prospects. The higher the CPI and the unemployment rate, the lower the weekly sales.
·Holidays affect weekly sales only slightly. I think customers don't care whether today is a holiday or not; the only reason they buy items is promotion.