Artificial Neural Networks and Data Mining
Uwe Lämmel, Wismar Business School

Slide 1: Artificial Neural Networks and Data Mining
Uwe Lämmel, Wismar Business School
www.wi.hs-wismar.de/~laemmel
Uwe.Laemmel@hs-wismar.de

Slide 2: Content
• Data Mining
• Classification: approach
• Data Mining Cup
  – 2004: Who will cancel?
  – 2007: Who will get a rebate coupon?
  – 2008: How long will someone participate in a lottery?
  – 2009: Forecast of book sales figures
  – 2010: ?
• Clustering: approach
  – Behaviour of bank customers

Slide 3: Data Mining
Data Mining is the
  – systematic and automated discovery and extraction
  – of previously unknown knowledge
  – out of huge amounts of data.
"KDD – Knowledge Discovery in Databases" is used as a synonym.
The term is misleading: gold mining extracts gold, but data mining does not extract data; it extracts knowledge from data.

Slide 4: Data Mining – Applications
• classification
• clustering
• association
• prediction
• text mining
• web mining
Clustering
• partitioning a data set into subsets (clusters) so that the data in each subset (ideally) share some common features
  – similarity or proximity with respect to a defined distance measure
• builds the classes
Classification
• items are placed in subsets (classes)
• the classes have known properties
  – customer is bad, average, good
  – pattern recognition
  – …
• a set of training items is used to train the classification algorithm

Slide 5: Data Mining Process
[Figure: CRISP-DM model]

Slide 6: Content
• Data Mining
• Classification: approach using NN
• Data Mining Cup
• Clustering: approach

Slide 7: Classification using NN
Prerequisite
• a set of training patterns (many patterns)
Approach
• code the values
• divide the set of training patterns into:
  – a training set
  – a test set
• build a network
• train the network using the training set
• check the network quality using the test set
[Figure: real data → training patterns → coded patterns → training set and test set]
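The first two steps of this approach, coding the values and dividing the patterns, can be sketched in Python. This is a minimal sketch under assumptions: categorical attributes are coded one-hot, numeric attributes are scaled linearly into [0, 1], and the helper names, the 30% test share and the age range 18–90 are hypothetical, not taken from the slides.

```python
import random


def code_record(record, schema):
    """Code one raw record as a list of numbers in [0, 1].

    schema: list of (attribute, spec) pairs; spec is either a list of
    categories (coded one-hot) or a (min, max) tuple (scaled linearly).
    """
    coded = []
    for attribute, spec in schema:
        value = record[attribute]
        if isinstance(spec, list):
            # one input neuron per category
            coded.extend(1.0 if value == category else 0.0 for category in spec)
        else:
            low, high = spec
            # min-max scaling to [0, 1]
            coded.append((value - low) / (high - low))
    return coded


def split_patterns(patterns, test_fraction=0.3, seed=42):
    """Shuffle the coded patterns and divide them into a training and a test set."""
    shuffled = list(patterns)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# hypothetical schema using two attributes named on the DMC 2008 data slide
schema = [("age", (18, 90)), ("new_customer", ["yes", "no"])]
print(code_record({"age": 54, "new_customer": "yes"}, schema))  # [0.5, 1.0, 0.0]
```

The coding determines the number of input neurons: here one record becomes three input values.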

Slide 8: Development of an NN application
Flowchart:
1. Build a network architecture.
2. Input the training patterns, calculate the network output and compare it to the teaching output.
3. If the error is too high, modify the weights and repeat step 2.
4. If the quality is good enough, use the test set data: evaluate the output and compare it to the teaching output.
5. If the quality on the test set is not good enough, change the parameters and train again; otherwise the network is ready.
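The loop in this flowchart can be sketched as follows. To keep the sketch short, a single sigmoid neuron trained with the delta rule stands in for a multi-layer network; the toy data, the learning rate 0.5 and the error threshold 0.01 are assumptions. It only illustrates the cycle of comparing with the teaching output, modifying the weights, and finally checking on a separate test set.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def output(weights, bias, x):
    """Network output of a single sigmoid neuron."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def mean_squared_error(weights, bias, patterns):
    return sum((output(weights, bias, x) - t) ** 2 for x, t in patterns) / len(patterns)


def train(training_set, learning_rate=0.5, good_enough=0.01, max_epochs=500, seed=1):
    rng = random.Random(seed)
    weights = [rng.uniform(-0.5, 0.5) for _ in training_set[0][0]]
    bias = 0.0
    for _ in range(max_epochs):
        # compare the network output to the teaching output
        if mean_squared_error(weights, bias, training_set) <= good_enough:
            break  # quality is good enough
        # error is too high: modify the weights (delta rule)
        for x, teaching in training_set:
            y = output(weights, bias, x)
            delta = (teaching - y) * y * (1.0 - y)
            weights = [w + learning_rate * delta * xi for w, xi in zip(weights, x)]
            bias += learning_rate * delta
    return weights, bias


def accuracy(weights, bias, test_set):
    """Evaluate on unseen test patterns: an output above 0.5 counts as class 1."""
    hits = sum((output(weights, bias, x) > 0.5) == (t == 1) for x, t in test_set)
    return hits / len(test_set)


# hypothetical toy task: class 1 if x0 + x1 > 1 (points close to the border are skipped)
rng = random.Random(0)
patterns = []
while len(patterns) < 200:
    x = [rng.random(), rng.random()]
    if abs(x[0] + x[1] - 1.0) > 0.1:
        patterns.append((x, 1 if x[0] + x[1] > 1.0 else 0))
training_set, test_set = patterns[:150], patterns[150:]
weights, bias = train(training_set)
print(f"test accuracy: {accuracy(weights, bias, test_set):.2f}")
```

If the test accuracy is too low, the outer loop of the flowchart applies: change the learning rate, the coding or the architecture and train again.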

Slide 9: Build an Artificial Neural Network
• Number of input neurons?
  – depends on the number of attributes
  – depends on the coding
• Number of output neurons?
  – depends on the coding of the class attribute
• Number of hidden neurons?
  – experiments are necessary
  – generally: not more than the number of input neurons
  – a quarter to a half of the number of input neurons may work
  – see capacity of a neural network
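The rules of thumb on this slide can be written down as a small helper. It is a hypothetical function, not part of any tool mentioned here: input_neurons is the number of values per coded pattern, the output layer has either one neuron per class (one-hot) or just enough neurons to hold the class index in binary, and the hidden range is a quarter to a half of the inputs. Experiments still decide the final size.

```python
def suggest_layers(input_neurons, n_classes, output_coding="one-hot"):
    """Rule-of-thumb layer sizes; the final choice needs experiments."""
    if output_coding == "one-hot":
        output_neurons = n_classes  # one neuron per class
    elif output_coding == "binary":
        output_neurons = (n_classes - 1).bit_length()  # class index in binary
    else:
        raise ValueError(f"unknown coding: {output_coding}")
    # hidden layer: a quarter to a half of the inputs, so never more than the inputs
    hidden_range = (max(1, input_neurons // 4), max(1, input_neurons // 2))
    return input_neurons, hidden_range, output_neurons


print(suggest_layers(100, 5))           # (100, (25, 50), 5)
print(suggest_layers(20, 3, "binary"))  # (20, (5, 10), 2)
```

For instance, the 100–40–20–5 network used for DMC 2008 has five output neurons for the five classes 0–4, and its first hidden layer of 40 neurons lies inside the suggested range of 25–50.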

Slide 10: Experiments using the JavaNNS
• build a network
• load the training patterns
• open the Error Graph
• open the Control Panel
• initialize the network
• try different learning parameters: 0.1, 0.2, 0.5, 0.8
• start learning

Slide 11: Getting Results
• assess the error
• finally:
  – make the test patterns the current pattern set
  – Save Data … (include output files, save as a .res file)
• evaluate the .res file
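Evaluating the .res file usually means comparing each network output with its teaching output. The sketch below assumes both vectors have already been read from the file (parsing the JavaNNS file format is not shown) and decodes each vector with winner-takes-all: the neuron with the largest activation gives the class. The function names are hypothetical.

```python
from collections import Counter


def winner(vector):
    """Index of the largest value (winner-takes-all decoding)."""
    return max(range(len(vector)), key=vector.__getitem__)


def evaluate(results):
    """results: list of (network_output, teaching_output) vectors, one per test pattern."""
    confusion = Counter()
    for network_output, teaching_output in results:
        # key: (true class, predicted class)
        confusion[(winner(teaching_output), winner(network_output))] += 1
    correct = sum(n for (true, predicted), n in confusion.items() if true == predicted)
    return confusion, correct / len(results)


results = [([0.9, 0.1], [1, 0]), ([0.3, 0.7], [1, 0]), ([0.2, 0.8], [0, 1])]
confusion, acc = evaluate(results)
print(dict(confusion), round(acc, 3))  # {(0, 0): 1, (0, 1): 1, (1, 1): 1} 0.667
```

The confusion matrix, rather than the accuracy alone, is what a cost function like the ones in the Data Mining Cup is applied to.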

Slide 12: Experiments
How can we improve the results?
  – data pre-processing?
  – architecture of the ANN?
  – learning parameters?
  – evaluation of the results: post-processing?
Record your work!

Slide 13: Content
• Data Mining
• Classification: approach
• Data Mining Cup
  – 2004: Who will cancel?
  – 2007: Who will get a rebate coupon?
  – 2008: How long will someone participate in a lottery?
  – 2009: Forecast of book sales figures
  – 2010: ?
• Clustering: approach
  – Behaviour of bank customers

Slide 14: Data Mining Cup (www.data-mining-cup.de)
• annual competition for students
• runs from April to May/June
• real-world problem:
  – problem
  – set of training data
  – set of data for classification
  – to be developed: a classification
• supported by many companies (data/software)
• ~200–300 participants
• workshop (user day)

Slide 15: DMC2004: A Mailing Action
• mailing action of a company:
  – special offer
  – estimated annual income per customer (see table)
• given:
  – 10,000 sets of customer data containing 1,000 cancellers (training)
• problem:
  – the test set contains 10,000 sets of customer data
  – Who will cancel?
  – Whom to send an offer?

Estimated annual income per customer:
                  customer will cancel   customer will not cancel
gets an offer     43.80 €                66.30 €
gets no offer      0.00 €                72.00 €

Slide 16: Mailing Action – Aim?
• no mailing action:
  – 9,000 × 72.00 € = 648,000 €
• everybody gets an offer:
  – 1,000 × 43.80 € + 9,000 × 66.30 € = 640,500 €
• maximum (100% correct classification):
  – 1,000 × 43.80 € + 9,000 × 72.00 € = 691,800 €
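The three scenarios can be recomputed from the DMC2004 income table. This is a minimal sketch; amounts are kept in euro cents so that the integer arithmetic is exact, and the function name and parameters are illustrative.

```python
# Estimated annual income per customer in euro cents (DMC2004 table).
INCOME = {
    ("cancel", True): 4380,  # canceller who gets an offer
    ("cancel", False): 0,    # canceller without an offer
    ("loyal", True): 6630,   # loyal customer who gets an offer
    ("loyal", False): 7200,  # loyal customer without an offer
}


def total_income(cancellers_mailed, loyal_mailed, cancellers=1_000, loyal=9_000):
    """Total income in euros when the given numbers of cancellers/loyal customers get an offer."""
    cents = (
        INCOME[("cancel", True)] * cancellers_mailed
        + INCOME[("cancel", False)] * (cancellers - cancellers_mailed)
        + INCOME[("loyal", True)] * loyal_mailed
        + INCOME[("loyal", False)] * (loyal - loyal_mailed)
    )
    return cents / 100


print(total_income(0, 0))          # 648000.0  (no mailing action)
print(total_income(1_000, 9_000))  # 640500.0  (everybody gets an offer)
print(total_income(1_000, 0))      # 691800.0  (100% correct classification)
```

Mailing everybody is worse than doing nothing, so the value of the classifier lies entirely in selecting the cancellers.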

Slide 17: Goal Function: Lift
Basis: no mailing action, 9,000 · 72.00 € = 648,000 €.
Goal: the extra income (lift) of a mailing action M:
$$\mathit{lift}_M = 43.80\,c_M + 66.30\,nk_M - 72.00\,nk_M$$
where c_M is the number of cancellers and nk_M the number of non-cancellers who get an offer in M.
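Since 66.30 − 72.00 = −5.70, the lift simplifies to 43.80 c_M − 5.70 nk_M: every canceller who gets an offer adds 43.80 €, every loyal customer who gets one costs 5.70 €. The sketch below computes the lift and, as an assumption beyond the slides, turns it into a mailing rule for a classifier that outputs a cancellation probability p: mailing pays off on average when 43.80 p − 5.70 (1 − p) > 0, i.e. when p > 5.70 / 49.50 ≈ 0.115. The names are hypothetical.

```python
def lift(cancellers_mailed, loyal_mailed):
    """Extra income (euros) of a mailing action compared with no mailing action.

    lift_M = 43.80 * c_M + 66.30 * nk_M - 72.00 * nk_M, computed in cents.
    """
    cents = 4380 * cancellers_mailed + 6630 * loyal_mailed - 7200 * loyal_mailed
    return cents / 100


def worth_mailing(p_cancel):
    """Assumed rule: mail a customer if the expected lift 43.80*p - 5.70*(1-p) is positive."""
    return 4380 * p_cancel - 570 * (1 - p_cancel) > 0


BREAK_EVEN = 570 / (4380 + 570)  # about 0.115

print(lift(1_000, 0))      # 43800.0
print(lift(1_000, 9_000))  # -7500.0
```

Both values match the previous slide: 691,800 − 648,000 = 43,800 and 640,500 − 648,000 = −7,500.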

Slide 18: Data
[Figure: the 32 input attributes, annotated with "results", "important" and "missing values"]

Slide 19: Feed-Forward Network – What to do?
• train the net with the training set (10,000)
• test the net using the test set (another 10,000)
  – classify all 10,000 customers as canceller or loyal
  – evaluate the additional income

Slide 20: Results
[Figure: results of the Data Mining Cup 2002 and of the neural network project 2004]
Gain:
  – additional income from the mailing action if the target group was chosen according to the analysis

Slide 21: DMC 2007: Rebate System
Check-out couponing allows individual coupons to be generated at the check-out. The coupon is printed at the end of the sales slip, depending on the current customer.
Questions:
  – How can the retailer identify whether a customer is a potential couponing customer?
  – Which coupons will the customer respond to?

Slide 22: Couponing
• print:
  – coupon A
  – coupon B
  – no coupon
• 50,000 customer cards for training
• classify another 50,000 customers!
• cost function:
  – coupon not redeemed (false assignment to A or B): −1
  – coupon A redeemed (correct assignment to A): +3
  – coupon B redeemed (correct assignment to B): +6
Maximize the value!

Slide 23: Data Understanding
• What is the meaning of the attributes?
• What are the type and range of the values?

Slide 24: 20–20–2 Network
Profit = 3 · AA + 6 · BB − (NA + NB + BA + AB)
Results:
• winner 2007: 7,890
• my version: 6,714
• our students: 6,468 (73/230)
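The profit formula can be evaluated from a confusion matrix. Reading a pair such as NA as "true class N, predicted class A" is an assumption, chosen because it matches the cost function of the couponing task: a coupon printed for a customer who does not redeem it costs 1, and predicting "no coupon" costs nothing. The function name and the dictionary layout are hypothetical.

```python
def couponing_profit(counts):
    """counts[(true, predicted)] for the classes 'A', 'B' and 'N' (no coupon).

    +3 per redeemed coupon A, +6 per redeemed coupon B, -1 per printed coupon
    that is not redeemed; customers predicted as 'N' cost nothing.
    """
    def c(true, predicted):
        return counts.get((true, predicted), 0)

    wrong_coupons = c("N", "A") + c("N", "B") + c("B", "A") + c("A", "B")
    return 3 * c("A", "A") + 6 * c("B", "B") - wrong_coupons


counts = {("A", "A"): 10, ("B", "B"): 5, ("N", "A"): 4, ("A", "B"): 1,
          ("N", "N"): 100, ("A", "N"): 7}
print(couponing_profit(counts))  # 55
```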

Slide 25: DMC2008: Participation in a Lottery
Predict, at the beginning of the lottery, how long participants will participate:
• 0 – the first ticket has not been paid for
• 1 – only the ticket for the first class has been paid for
• 2 – only the first two classes were played
• 3 – the lottery was played until the end, but no ticket was purchased for the following lottery
• 4 – at least the first ticket for the following lottery was purchased
[Figure: cost matrix]

Slide 26: Data
• 113,476 patterns!
• 69 attributes
  – new customer (yes/no)
  – age
  – bank
  – car
  – …

Slide 27: 100–40–20–5 Network
Results:
• 1,030,240 RWTH Aachen (1)
• …
• 1,024,535 RWTH Aachen (8)
• 865,565 Bauhaus Univ. Weimar (100)
• Univ. Wismar: 878,550 – 835,035
• … –1,494,315 (212)

Slide 28: DMC 2009 – online bookshop "Libri"
• sales figures for training:
  – more than 1,800 books
  – 2,418 shops
• sales figures to forecast:
  – 8 books
  – 2,394 shops

Slide 29: DMC 2009 – online bookshop "Libri"

Slide 30: DMC 2009 – 83-25-9-3 network

Slide 31: DMC 2010: Revenue maximisation by intelligent couponing
• many customers order from an online shop only once
• decide whether to send a voucher worth €5.00
• the voucher should go to those who would not have decided to re-order by themselves
• 32,427 data sets for training
• 32,428 data sets for prediction
• 37 attributes per set, plus the target attribute in the training set

Slide 32: DMC 2010
• out of 67 teams!

Slide 33: Content
• Data Mining
• Classification: approach
• Data Mining Cup
• Clustering: approach
  – Behaviour of bank customers

Slide 34: Clustering Transaction Data
Co-operation:
• Hochschule Wismar
• HypoVereinsbank
• Medienhaus Rostock
Issue:
• What information can be extracted from turnover time series?
Strategy:
1. Cluster the time series data.
2. Assign customers/accounts to clusters.
3. Examine the clusters.

Slide 35: Transaction Data & Time Series
The original financial data is not suitable:
• the order of the values is important
• time displacements are problematic
Corporate clients:
• 223 branches
Cumulated transactions per
• month
• account
• type of transaction
… for a total of 6 years.

Slide 36: Fourier versus Original Data
• No displacement: the similarity is detected on both the transaction curve and the frequency spectrum.
• Data is displaced: only the frequency spectrum shows the similarity.
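Why the frequency spectrum tolerates displacements can be checked with a few lines of Python. The sketch assumes the spectrum is the magnitude of the discrete Fourier transform and uses a circular shift as the displacement; for a circular shift the magnitudes are exactly equal, while real, non-circular displacements only give approximately similar spectra. The turnover values are made up.

```python
import cmath


def spectrum(series):
    """Magnitudes of the discrete Fourier transform (naive O(n^2) version)."""
    n = len(series)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(series)))
        for k in range(n)
    ]


def shift(series, months):
    """Circularly displace a monthly series by the given number of months."""
    return series[-months:] + series[:-months]


def distance(a, b):
    """Euclidean distance between two equally long vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


turnover = [1, 5, 2, 8, 3, 0, 4, 6, 7, 2, 1, 9]  # hypothetical 12 monthly values
displaced = shift(turnover, 3)

print(round(distance(turnover, displaced), 2))                      # 15.17: curves differ
print(round(distance(spectrum(turnover), spectrum(displaced)), 6))  # 0.0: spectra agree
```

A clustering based on the raw curves would separate these two series, while one based on the spectra puts them together.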

Slide 37: Using a classification model
1. Building the model: the customer turnover sequence A (months t_0 … t_m) is pre-processed and clustered; the initial clusters are used to build a classification model.
2. Applying the model: sequence B (months t_{0+n} … t_{m+n}) is pre-processed and classified by the model, giving a new cluster.
3. Comparing cluster assignments: is the new cluster identical to the initial cluster, or different?
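Step 3, comparing the cluster assignments, amounts to counting the accounts whose cluster differs between sequence A and sequence B. A minimal sketch, assuming the assignments are given as dictionaries from account to cluster id; the account names and the sector labels are hypothetical.

```python
from collections import defaultdict


def assignment_changes(initial, new, sector):
    """Share of accounts whose cluster changed, overall and per business sector.

    initial, new: account -> cluster id (from sequence A and sequence B)
    sector: account -> business sector
    """
    changed = defaultdict(int)
    total = defaultdict(int)
    for account, cluster in initial.items():
        s = sector[account]
        total[s] += 1
        if new[account] != cluster:  # "different" branch of the comparison
            changed[s] += 1
    overall = sum(changed.values()) / len(initial)
    per_sector = {s: changed[s] / total[s] for s in total}
    return overall, per_sector


initial = {"acc1": 3, "acc2": 7, "acc3": 7, "acc4": 12}
new = {"acc1": 3, "acc2": 8, "acc3": 7, "acc4": 12}
sector = {"acc1": "Taxi", "acc2": "Taxi", "acc3": "Churches", "acc4": "Churches"}
print(assignment_changes(initial, new, sector))  # (0.25, {'Taxi': 0.5, 'Churches': 0.0})
```

With the taxi figures from the results slide, 239 changed accounts out of 1,070 give 239 / 1070 ≈ 0.223, the 22.3% listed there.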

Slide 38: Clustering & Prediction Results
• 140,000 records
• 1 record = 1 account
• 6×5 SOM = max. 30 clusters
• average change of cluster assignments: approx. 19%
Variability per business sector:
Variability   Business sector       Changed/total
22.3%         Taxi                  239/1070
22.3%         Ship broker offices   64/471
20.9%         Churches              228/1091
20.2%         Trucking              1010/5008
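A 6×5 self-organising map has 30 neurons, and every pattern is assigned to the neuron whose weight vector is closest (its best-matching unit), which is why the map yields at most 30 clusters. The sketch is a minimal textbook SOM in plain Python, not the project's implementation; the learning-rate schedule, the Gaussian neighbourhood and the toy data are assumptions.

```python
import math
import random

ROWS, COLS = 6, 5  # 6x5 map: at most 30 clusters, one per neuron


def best_matching_unit(weights, x):
    """Grid position of the neuron whose weight vector is closest to x."""
    return min(weights, key=lambda node: sum((w - xi) ** 2 for w, xi in zip(weights[node], x)))


def train_som(data, epochs=40, lr_start=0.5, radius_start=2.0, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)] for r in range(ROWS) for c in range(COLS)}
    steps, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # visit patterns in random order
            progress = step / steps
            lr = lr_start * (1.0 - progress)                    # learning rate decays
            radius = max(radius_start * (1.0 - progress), 0.5)  # neighbourhood shrinks
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                # Gaussian neighbourhood on the grid around the winning neuron
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * radius ** 2))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


# hypothetical coded profiles: two groups of accounts with low and high monthly turnover
rng = random.Random(1)
low = [[rng.uniform(0.0, 0.2) for _ in range(4)] for _ in range(10)]
high = [[rng.uniform(0.8, 1.0) for _ in range(4)] for _ in range(10)]
som = train_som(low + high)
clusters = {best_matching_unit(som, x) for x in low + high}
print(len(som), len(clusters) <= 30)  # 30 True
```

Only neurons that win at least one pattern form an actual cluster, so a 6×5 map can also produce fewer than 30 clusters.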

Slide 39: The End

