WALMART RECRUITING – STORE SALES FORECASTING Chaoyi Liu, Yuqing Lu, Haoran Wu Rajendran, Goutham.

The dataset we chose

The first table provides parameters that may affect weekly sales, but does not provide weekly sales:
·Store - the store number
·Date - the week
·Temperature - average temperature in the region
·Fuel Price - cost of fuel in the region
·MarkDown1-5 - promotional markdown data, only available after Nov 2011
·CPI - the consumer price index
·Unemployment - the unemployment rate
·IsHoliday - whether the week is a special holiday week

The second table provides sales data for 45 stores with up to 99 departments in more than 421,000 records, but does not sum up each store's weekly sales:
·Store - the store number
·Dept - the department number
·Date - the week
·Weekly Sales - sales for the given department in the given store
·IsHoliday - whether the week is a special holiday week
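Because the sales table is recorded per department, a store-level weekly total is needed before the two tables can be matched row for row. The following is a minimal sketch of that step, not the authors' actual procedure. The function names `store_week_totals` and `integrate` and the tiny sample rows are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

# Hypothetical miniature versions of the two tables (all values are made up).
features = [
    {"Store": 1, "Date": "week-1", "Temperature": 40.0, "IsHoliday": False},
    {"Store": 1, "Date": "week-2", "Temperature": 38.0, "IsHoliday": True},
]
dept_sales = [
    {"Store": 1, "Dept": 1, "Date": "week-1", "Weekly_Sales": 100.0},
    {"Store": 1, "Dept": 2, "Date": "week-1", "Weekly_Sales": 250.5},
    {"Store": 1, "Dept": 1, "Date": "week-2", "Weekly_Sales": 80.0},
]


def store_week_totals(rows):
    """Sum department-level sales into one total per (Store, Date)."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["Store"], row["Date"])] += row["Weekly_Sales"]
    return dict(totals)


def integrate(feature_rows, sales_rows):
    """Attach the store-level weekly total to each matching feature row."""
    totals = store_week_totals(sales_rows)
    merged = []
    for row in feature_rows:
        key = (row["Store"], row["Date"])
        if key in totals:  # keep only weeks that have sales data
            merged.append({**row, "Weekly_Sales": totals[key]})
    return merged


merged = integrate(features, dept_sales)
for row in merged:
    print(row)
```

With one row per store and week, the merged table has the same granularity as the parameter table.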

Then we integrated the two datasets

We first integrated these two large tables into a single table of 6,435 records that has everything we need, with these columns:
Store  Date  Temperature  Fuel_Price  MarkDown1-5  CPI  Unemployment  IsHoliday  Weekly_Sales

We then sorted the 6,435 records by weekly sales from small to big and divided them equally into 5 groups (quintiles) of 1,287 records each, like this:

Level    Mark as  Description
Level 1  D        More than $0.00
Level 2  C        More than $497,250.22
Level 3  B        More than $748,435.20
Level 4  A        More than $1,056,282.91
Level 5  S        More than $1,414,343.53
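The level table can be read as a lookup from weekly sales to a mark. The sketch below uses the five lower bounds from the table and treats "More than" as a strict comparison, which is an assumption. The function name `sales_level` is hypothetical.

```python
from bisect import bisect_left

# Lower bounds of Level 1..5 in dollars, taken from the level table.
LOWER_BOUNDS = [0.00, 497_250.22, 748_435.20, 1_056_282.91, 1_414_343.53]
MARKS = ["D", "C", "B", "A", "S"]


def sales_level(weekly_sales):
    """Return the mark for a store's weekly sales, or None if sales <= 0."""
    # bisect_left counts the bounds strictly below the value,
    # which matches a strict "more than" rule.
    idx = bisect_left(LOWER_BOUNDS, weekly_sales)
    return MARKS[idx - 1] if idx > 0 else None


for amount in (100.0, 500_000.0, 800_000.0, 1_200_000.0, 2_000_000.0):
    print(amount, sales_level(amount))
```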

Neural Network Model

·It is suited to complicated prediction problems
·Visualization or understanding of the rules is not needed
·Accuracy is very important

Result

Learning Rate / Training Cycles = 0.03 / 2000
Accuracy = 70.61%

              true D  true S  true C  true B  true A  class precision
pred. D          939      49      83      36      95           78.12%
pred. S           17    1114      58      75     101           81.61%
pred. C          160     102     923     148      29           67.77%
pred. B           60      95      93     780     187           64.20%
pred. A           23     276      43     161     788           61.04%
class recall  78.32%  68.09%  76.92%  65.00%  65.67%

Accuracy reaches 70.61% when the learning rate is 0.03, and it increases further as the number of training cycles increases.
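The accuracy, class precision and class recall figures follow directly from the counts in the result table. The sketch below recomputes them. The matrix is our reading of the table rows (predictions in rows, true classes in columns), and the helper names are hypothetical.

```python
LABELS = ["D", "S", "C", "B", "A"]

# Rows are predicted classes, columns are true classes (same order as LABELS).
CONFUSION = [
    [939, 49, 83, 36, 95],
    [17, 1114, 58, 75, 101],
    [160, 102, 923, 148, 29],
    [60, 95, 93, 780, 187],
    [23, 276, 43, 161, 788],
]


def accuracy(matrix):
    """Share of all records that lie on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def precision(matrix, i):
    """Correct predictions of class i divided by all predictions of class i."""
    return matrix[i][i] / sum(matrix[i])


def recall(matrix, i):
    """Correct predictions of class i divided by all true members of class i."""
    return matrix[i][i] / sum(row[i] for row in matrix)


print(f"accuracy = {accuracy(CONFUSION):.2%}")
for i, label in enumerate(LABELS):
    print(f"{label}: precision = {precision(CONFUSION, i):.2%}, "
          f"recall = {recall(CONFUSION, i):.2%}")
```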

Neural Network Weights

Hidden Layer:

Input           Node 1   Node 2   Node 3   Node 4   Node 5   Node 6   Node 7   Node 8   Node 9  Node 10
Store          -30.224   27.949    -20.9  -31.401    5.072  -17.014    0.862   27.088   18.266   16.821
Date            -1.661   -7.492    2.159     0.94    2.782     1.02    2.519    2.277   -3.398    0.436
Temperature     -0.192     0.68   -0.965   -0.601    0.038   -0.847   -0.436   -1.513   -1.094    0.126
Fuel_Price       1.054    1.459    -0.46    0.632   -1.242   -1.713    0.698    1.303    2.589   -0.007
MarkDown1        0.114   -2.679    -0.24   -0.636   -4.109   -1.322    1.004    0.648   -1.147   -2.402
MarkDown2         0.88    0.476    -0.03     2.65   -0.387   -1.306    0.328    0.633    1.983   -0.821
MarkDown3        9.209   -0.635    0.631    7.522   -6.247   -2.194   -6.743   -0.075    1.157    6.285
MarkDown4       -0.613   -0.554    0.131    1.257   -2.967   -0.445    0.753    0.691   -0.115    0.599
MarkDown5         0.41   -3.728   -3.697    1.786  -14.643   -2.111    0.341   -0.999   -7.048   -0.908
CPI             -6.316   29.004  -12.179  -12.975    0.568   11.217  -27.856  -15.442   -5.764    1.045
Unemployment   -12.906  -17.876    4.064   -1.446    8.895    3.908    7.056   12.174  -31.068    1.142
IsHoliday        0.056        0   -0.191   -0.484   -0.068   -0.244   -0.151    -0.27   -0.036   -0.025
Bias           -13.162  -10.766   -9.052   -8.572  -25.596    9.488  -10.875   20.103   -2.328   -9.733

Output:

           Class 'S'  Class 'A'  Class 'B'  Class 'C'  Class 'D'
Node 1         5.157     -8.027      1.098    -24.108     26.999
Node 2        -19.11      3.641      4.706    -13.214      7.698
Node 3        -22.37      2.263      1.269     -3.579      3.409
Node 4         2.287     -9.109      5.791      11.71    -13.986
Node 5       -10.202     -0.749      1.646     10.488     17.509
Node 6        18.024     11.672     -1.375      7.909    -29.879
Node 7       -10.334     13.095     19.417     -9.269    -20.259
Node 8          9.43      3.386    -17.172    -17.694     11.896
Node 9        -5.568     -7.946      4.503     13.568     -4.628
Node 10       13.851    -13.848     -8.731     13.088     -9.074
Threshold      1.257    -21.581     -8.326     -2.985     -0.562
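To show how the output weights turn hidden-node activations into a class, here is a minimal sketch of the output layer only. It assumes a sigmoid activation and treats each Threshold as an additive bias; the slides state neither, so both are assumptions. The weights are our reading of the Output table, and the hidden activations in the usage lines are hypothetical.

```python
import math

CLASSES = ["S", "A", "B", "C", "D"]

# One row per hidden node (Node 1..Node 10), one column per class.
OUTPUT_WEIGHTS = [
    [5.157, -8.027, 1.098, -24.108, 26.999],
    [-19.11, 3.641, 4.706, -13.214, 7.698],
    [-22.37, 2.263, 1.269, -3.579, 3.409],
    [2.287, -9.109, 5.791, 11.71, -13.986],
    [-10.202, -0.749, 1.646, 10.488, 17.509],
    [18.024, 11.672, -1.375, 7.909, -29.879],
    [-10.334, 13.095, 19.417, -9.269, -20.259],
    [9.43, 3.386, -17.172, -17.694, 11.896],
    [-5.568, -7.946, 4.503, 13.568, -4.628],
    [13.851, -13.848, -8.731, 13.088, -9.074],
]
THRESHOLDS = [1.257, -21.581, -8.326, -2.985, -0.562]  # assumed to act as biases


def sigmoid(x):
    """Numerically stable logistic function (assumed activation)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def output_scores(hidden):
    """Weighted sum of the 10 hidden activations plus bias, per class."""
    scores = []
    for c in range(len(CLASSES)):
        z = THRESHOLDS[c] + sum(h * row[c] for h, row in zip(hidden, OUTPUT_WEIGHTS))
        scores.append(sigmoid(z))
    return scores


def predict(hidden):
    """Pick the class with the largest output score."""
    scores = output_scores(hidden)
    return CLASSES[scores.index(max(scores))]


# Hypothetical activations: only Node 1 fires.
only_node_1 = [1.0] + [0.0] * 9
print(predict(only_node_1))
```

In this sketch, a strongly active Node 1 pushes the prediction toward class D, because its weight toward D (26.999) is much larger than toward any other class.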

Naïve Bayes

Accuracy = 18.63%

              true D   true S  true C  true B  true A  class precision
pred. D         1199     1636    1200    1200    1200           18.63%
pred. S            0        0       0       0       0            0.00%
pred. C            0        0       0       0       0            0.00%
pred. B            0        0       0       0       0            0.00%
pred. A            0        0       0       0       0            0.00%
class recall 100.00%    0.00%   0.00%   0.00%   0.00%

Why does Naïve Bayes perform so poorly on this sample? Naïve Bayes treats the variables Store, Date, ..., IsHoliday as independent of each other, so:

P(Store, Date, Temperature, …, IsHoliday) = P(Store) × P(Date) × … × P(IsHoliday)

Because the columns Store, Date, …, IsHoliday contain so many values that do not repeat, the probability of each individual value is very small. The product P(Store) × P(Date) × … × P(IsHoliday) is therefore far lower than 1/6435, which makes predicting the sales level from such a model infeasible.
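The size of that product can be illustrated numerically. The sketch below is a hypothetical worst case in which three columns hold values that never repeat, so each value has probability 1/6435. The column count and the function name are assumptions, and the sum is done in log space to avoid underflow.

```python
import math

N_RECORDS = 6435  # size of the integrated table


def log10_joint_probability(per_column_probs):
    """log10 of the product of per-column probabilities (independence assumed)."""
    return sum(math.log10(p) for p in per_column_probs)


# Hypothetical worst case: 3 columns whose values each occur exactly once.
probs = [1 / N_RECORDS] * 3
log_joint = log10_joint_probability(probs)
log_single = math.log10(1 / N_RECORDS)

print(f"log10 of one factor:  {log_single:.2f}")
print(f"log10 of the product: {log_joint:.2f}")
```

Each additional non-repeating column lowers the product by another factor of 6435, which is why the product falls so far below 1/6435.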

K-NN

When K = 1, Accuracy = 26%

              true D  true S  true C  true B  true A  class precision
pred. D          385     207     373     180     199           28.65%
pred. S          190     619     259     311     301           36.85%
pred. C          352     271     187     225     188           15.29%
pred. B          113     267     124     180     210           20.13%
pred. A          159     272     257     304     302           23.34%
class recall  32.11%  37.84%  15.58%  15.00%  25.17%

When K = 10, Accuracy = 29.03%

              true D  true S  true C  true B  true A  class precision
pred. D          429     234     403     168     206           29.79%
pred. S          272     834     288     421     428           37.18%
pred. C          219     123     186      97     213           22.20%
pred. B          195     266     162     347     281           27.74%
pred. A           84     179     161     167      72           10.86%
class recall  35.78%  50.98%  15.50%  28.92%   6.00%
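For reference, K-NN classifies a record by a majority vote among its K closest training records. The sketch below is a generic illustration with Euclidean distance, not the tool or settings used for the results above. The function name `knn_predict` and the toy points are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Majority label among the k training points closest to `query`.

    `train` is a list of (feature_vector, label) pairs.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two features per record, labels are sales marks.
train = [
    ((0.0, 0.0), "D"),
    ((0.1, 0.2), "D"),
    ((1.0, 1.0), "S"),
    ((0.9, 1.1), "S"),
    ((1.2, 0.8), "S"),
]

print(knn_predict(train, (0.05, 0.1), k=1))
print(knn_predict(train, (0.05, 0.1), k=5))
```

Columns on very different scales, such as CPI and IsHoliday, would dominate or vanish in a raw Euclidean distance, so normalizing the inputs first is a common choice. Whether that was done here is not stated.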

Conclusion

·MarkDown 1 to 5 has the highest weight, 16, which means it has an enormous impact on sales. Promotions increase weekly sales remarkably.
·Fuel price and temperature also have a positive impact: a higher price comes with higher sales.
·CPI and the unemployment rate have a heavy negative impact on sales prospects. The higher the CPI and the unemployment rate, the lower the weekly sales.
·Holidays affect weekly sales only slightly. I think customers don't care whether today is a holiday or not; the only reason they buy items is promotion.
