© Prof. Dr.-Ing. Wolfgang Lehner | Komplexpraktikum Datenbank-Anwendungen.

2 © Claudio Hartmann | Organizational Matters
Who? Claudio, Ulrike
For whom?
- Diplom (PO 2004) Computer Science, Media Informatics: Komplexpraktikum (course certificate)
- Diplom (PO 2010) Computer Science: module PM-FPA
- Bachelor Computer Science: B-510, B-520; Media Informatics: B-530, B-540
- Master Computer Science: PM-FPA; Media Informatics: E-4
Scope: 4 SWS or 8 SWS (semester hours per week)

3 Organizational Matters
Goals: working independently to
- get acquainted with the subject matter
- identify problems and develop solution approaches
- implement and evaluate your own approaches and ideas
Schedule
- Kick-off in the first week of lectures (today)
- Sync meetings at regular intervals, every 1-2 weeks (date to be arranged)
- Final presentation at the end of the semester: presenting the designed approaches and results
Communication/material
- Discussion: Auditorium (https://auditorium.inf.tu-dresden.de/courses/)
- Code & test data: SVN (access details to follow)

4 Scenario & Challenges
Monthly report (diagram labels: Updates; Estimation with PTV, COOL, MFD, …, ESCC, …; History; FIRA with Adjust, Forecast, Impute, Refine, Aggregate)
Targets
- Sneak peek reporting
- Missing data: fill gaps
Further targets
- Outlier detection
- Fraud detection
- Development reports

5 The data look like this…
Time column
- period (monthly date stamp)
Measure columns
- sales_units / _nc_ne / _nc_e / _c_ne
- purchase_units
- stock_new_units
Attribute columns
- some product-group specific (id, price; e.g. color, energy_label, size, brand, …)
- some outlet specific (id, distributionfactor, extrapolationfactor, turnover_class, nuts1, channel, …)
Size (cooling): 2.3 million item x outlet tuples over 5,051 items, 1,116 outlets and 36 periods
(Diagram labels: FIRA / SIS, outlet, item)
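
To experiment locally before touching Hadoop or the DB-Server, the cooling relation can be mocked in SQLite. A minimal sketch, assuming the columns named on the slides plus nindex (the period index used in the later queries, assumed to run from 1 to 36); the column name outletid, the column types and the primary key are hypothetical.

```python
import sqlite3

# Reduced, hypothetical schema of the "cooling" relation. Column names follow the
# slides; outletid, the types and the primary key are assumptions for illustration.
COOLING_DDL = """
CREATE TABLE cooling (
    nindex          INTEGER NOT NULL,  -- assumed: running month index 1..36
    itemid          INTEGER NOT NULL,
    outletid        INTEGER NOT NULL,  -- hypothetical name for the outlet id
    channel         TEXT,              -- e.g. 'ELECTRICSP', 'TECSUPERST'
    nofrost         TEXT,              -- 'YES', 'NO' or NULL (N.A.)
    brand           TEXT,
    price           REAL,
    sales_units     REAL,
    purchase_units  REAL,
    stock_new_units REAL,
    PRIMARY KEY (nindex, itemid, outletid)
)
"""


def create_cooling(conn: sqlite3.Connection) -> None:
    """Create the reduced cooling table in the given database."""
    conn.execute(COOLING_DDL)


def column_names(conn: sqlite3.Connection) -> list[str]:
    """Return the column names of the cooling table in declaration order."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    return [row[1] for row in conn.execute("PRAGMA table_info(cooling)")]
```

For example, `create_cooling(sqlite3.connect(":memory:"))` yields an empty table ready for test rows; further attributes (color, energy_label, nuts1, …) would be added as extra columns in the same way.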

6 Common scenario
Model usage
- Train a statistical model on historical data
  - models seasonal and trend effects
  - requires equidistant values
- Calculate forecast values
(Chart: time series of Liebherr KT 1434 over the 1st, 2nd and 3rd year; labels "Model", "optimize")
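
The model usage above (fit trend and seasonality on equidistant history, then forecast) can be illustrated with a deliberately simple additive model. This is a sketch, not the statistical model used in the project: the trend is estimated from year-over-year differences and the seasonal pattern from monthly means of the detrended series; `fit_trend_seasonal` and `forecast` are hypothetical names.

```python
def fit_trend_seasonal(y: list[float], season: int = 12) -> tuple[float, list[float]]:
    """Fit y[t] ~ slope * t + level[t % season] to equidistant monthly values."""
    n = len(y)
    if n < 2 * season:
        raise ValueError("need at least two full seasons of equidistant values")
    # Year-over-year differences cancel the seasonal pattern: y[t+s] - y[t] = s * slope.
    diffs = [y[t + season] - y[t] for t in range(n - season)]
    slope = sum(diffs) / (len(diffs) * season)
    # Remove the trend, then average what remains per calendar month
    # (overall level plus the seasonal effect of that month).
    detrended = [value - slope * t for t, value in enumerate(y)]
    levels = []
    for month in range(season):
        values = detrended[month::season]
        levels.append(sum(values) / len(values))
    return slope, levels


def forecast(model: tuple[float, list[float]], n_observed: int, horizon: int) -> list[float]:
    """Extrapolate the trend and reuse the seasonal level of the matching month."""
    slope, levels = model
    season = len(levels)
    return [slope * t + levels[t % season] for t in range(n_observed, n_observed + horizon)]
```

With three years of monthly history (36 values, as in the cooling data), `forecast(fit_trend_seasonal(history), 36, 12)` extrapolates the following year. The guard requiring two full seasons mirrors why such per-series models break down for short time series.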

7 Problem
- Time series are too short
- Data are very sparse on low aggregation levels
- No statistical model is available for some specific time series
(Chart: 1st to 3rd year of the time series Liebherr KT 1434, Bosch KSl 20s53, SEG MS210 A, …)

8 Solution
Cross-sectional forecasting
- Assume similar behavior within groups of time series
- Use month-to-month transitions from a set of time series
- Train the model on the transitions of several time series
- Use the last known period as input to calculate forecasts
(Chart: 1st to 3rd year of Liebherr KT 1434, Bosch KSl 20s53, SEG MS210 A, …; labels "Report calculation on previous period", "Model")
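
A minimal sketch of the cross-sectional idea, assuming a simple linear transition model (the slides do not say which model is trained): month-to-month pairs are pooled across all series of a group, one least-squares line is fitted on them, and each series is forecast from its last known value. Item names and function names are hypothetical.

```python
from statistics import mean


def pooled_transitions(series_by_item: dict[str, list[float]]) -> list[tuple[float, float]]:
    """Collect (value in month t, value in month t+1) from every series in the group."""
    pairs = []
    for values in series_by_item.values():
        pairs.extend(zip(values[:-1], values[1:]))
    return pairs


def fit_transition_model(pairs: list[tuple[float, float]]) -> tuple[float, float]:
    """Least-squares fit of next = intercept + slope * previous over all pooled pairs."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_mean, y_mean = mean(xs), mean(ys)  # requires at least one pair
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        return y_mean, 0.0  # no variation in the inputs: predict the mean
    slope = sum((x - x_mean) * (y - y_mean) for x, y in pairs) / sxx
    return y_mean - slope * x_mean, slope


def forecast_next(series_by_item: dict[str, list[float]],
                  model: tuple[float, float]) -> dict[str, float]:
    """Forecast each series from its last known period, even series too short to train on."""
    intercept, slope = model
    return {item: intercept + slope * values[-1]
            for item, values in series_by_item.items() if values}
```

A series with a single observation contributes no transitions but still receives a forecast from the pooled model, which is the point of training across series.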

9 Attribute hierarchies
(Diagram: the outlet x item data of FIRA / SIS, partitioned by channel (ESP, TSS), by no_frost (YES, NO) and by channel x no_frost (ESP - YES, ESP - NO, TSS - YES, TSS - NO); top level: All x item)
- There are many different ways to partition the data
- Different partitions yield different forecast errors for different forecast targets
Research goal: what is the best partition for which forecast target?
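
The search space behind the research question can be made concrete by enumerating the candidate partitions. A sketch, assuming each attribute has a small finite domain as in the diagram (channel: ESP, TSS; no_frost: YES, NO) and that crosses of at most two attributes are considered; `enumerate_partitions` and `matches` are hypothetical names.

```python
from itertools import combinations, product


def enumerate_partitions(domains: dict[str, list[str]], max_attributes: int = 2):
    """Yield partition keys as tuples of (attribute, value) pairs.

    The empty tuple stands for the unpartitioned level (All x item); tuples of
    length 1 and 2 correspond to single attributes and attribute crosses.
    """
    attributes = sorted(domains)
    for k in range(max_attributes + 1):
        for combo in combinations(attributes, k):
            for values in product(*(domains[a] for a in combo)):
                yield tuple(zip(combo, values))


def matches(row: dict[str, str], key: tuple[tuple[str, str], ...]) -> bool:
    """True if a data row belongs to the partition identified by key."""
    return all(row.get(attribute) == value for attribute, value in key)
```

For the diagram's two attributes this yields 9 partitions (1 + 4 + 4). In general, k attributes with d values each give k*d single-attribute partitions and k(k-1)/2 * d^2 pairwise crosses, so the number of runs grows quickly.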

10 Parallel FIRA processes
- Distribute each value of each attribute to a dedicated node
- Execute the FIRA process in parallel for each value
- Split the execution into two phases
(Diagram: relation cooling on the DB-Server; Node 1: channel ESP; Node 2: channel TS; Node 3: nofrost YES; Node 4: nofrost NO; Node 5: nofrost N.A.; Node 6: brand AEG; Node 7: brand SIEMENS; Node 8: channel x nofrost ESP & YES; Node 9: channel x nofrost ESP & NO; …; further labels: Updates, Estimation process (PTV, COOL, MFD, …, ESCC, …), History, FIRA (Adjust, Forecast, Impute, Refine, Aggregate; variants 2 and 3), FIRA process configuration, system configuration, Self-Adjusting Imputation System, configuration repository database, model exploitation / usage)
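
A single-machine sketch of the distribution scheme: every attribute value (or value combination) gets a dedicated worker, and the FIRA process runs concurrently per partition. Threads stand in for cluster nodes here, which is an assumption for illustration; `run_fira` is a hypothetical placeholder for the real per-partition process.

```python
from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor


def assign_nodes(partitions: list[object]) -> dict[str, object]:
    """Give every partition (attribute value or value combination) its own node."""
    return {f"Node {i}": partition for i, partition in enumerate(partitions, start=1)}


def run_in_parallel(assignment: dict[str, object],
                    run_fira: Callable[[object], object],
                    max_workers: int = 4) -> dict[str, object]:
    """Run the per-partition task concurrently and collect the results per node."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {node: pool.submit(run_fira, partition)
                   for node, partition in assignment.items()}
        return {node: future.result() for node, future in futures.items()}
```

Threads suit the I/O-heavy part (queries against the DB-Server); CPU-bound model training would rather use separate processes or real cluster nodes.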

11 Phases of the prediction approach
Prediction phase
- Fetch all data necessary for model training: the transitions of all time series covered by the attribute value
- Train the model
- Fetch the data of the preceding period
- Calculate predictions
Evaluation phase
- Fetch the data of the predicted period
- Join them with the predicted data
- Calculate the forecast error
(Diagram labels: Model, error)
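
The evaluation phase boils down to an inner join on the item and an error measure. A sketch with in-memory dictionaries keyed by itemid; the slides do not name the error measure, so the mean absolute error (MAE) used here is an assumption, and `evaluate` is a hypothetical name.

```python
def evaluate(predictions: dict[int, float],
             actuals: dict[int, float]) -> tuple[dict[int, tuple[float, float]], float]:
    """Join predictions with actual values of the predicted period and compute the MAE."""
    # Inner join on itemid: items missing on either side are ignored.
    joined = {item: (pred, actuals[item]) for item, pred in predictions.items() if item in actuals}
    if not joined:
        raise ValueError("no item occurs in both predictions and actual data")
    mae = sum(abs(pred - actual) for pred, actual in joined.values()) / len(joined)
    return joined, mae
```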

12 Prediction phase, one attribute
1st Map-Reduce
- Map: create the model on item x attribute level and calculate forecasts on item x attribute level
Model training data, query workload (fetch the time series corresponding to the node's task, once per time slice):
  SELECT …
  FROM (SELECT … FROM cooling
        WHERE nindex IN (15, 3) AND channel = 'ELECTRICSP' AND sales_units > 0
        GROUP BY itemid, nindex) AS foo,
       (SELECT … FROM cooling
        WHERE nindex IN (14, 2) AND channel = 'ELECTRICSP'
        GROUP BY itemid, nindex) AS bar
  WHERE foo.nindex = bar.nindex + 1 AND foo.itemid = bar.itemid AND sales_units_1 > 0
Target period: 27
(Diagram: partition outlet x item, ESP; Node 1: channel ELECTRICSP; DB-Server)
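
The query can be tried out against a small SQLite mock. The nindex values on the slide appear to follow a seasonal pattern: for target period 27, the training transitions are 14 → 15 and 2 → 3, i.e. the same calendar month one and two years earlier. `training_periods` encodes that reading, which is an inference from the numbers rather than something the slide states; the select lists and the aliases sales_units_0/sales_units_1 are assumptions because the slide elides them.

```python
import sqlite3

SEASON = 12  # monthly periods


def training_periods(target: int, years: int = 2, season: int = SEASON) -> tuple[list[int], list[int]]:
    """Same calendar month in earlier years and, for each, the month before it."""
    to_periods = [target - k * season for k in range(1, years + 1)]
    return to_periods, [p - 1 for p in to_periods]


# Reconstructed query; aggregates and aliases are assumptions (elided on the slide).
TRANSITIONS_SQL = """
SELECT foo.itemid, foo.nindex, foo.sales_units_0, bar.sales_units_1
FROM (SELECT itemid, nindex, SUM(sales_units) AS sales_units_0
      FROM cooling
      WHERE nindex IN ({to_marks}) AND channel = ? AND sales_units > 0
      GROUP BY itemid, nindex) AS foo,
     (SELECT itemid, nindex, SUM(sales_units) AS sales_units_1
      FROM cooling
      WHERE nindex IN ({from_marks}) AND channel = ?
      GROUP BY itemid, nindex) AS bar
WHERE foo.nindex = bar.nindex + 1
  AND foo.itemid = bar.itemid
  AND sales_units_1 > 0
ORDER BY foo.itemid, foo.nindex
"""


def fetch_transitions(conn: sqlite3.Connection, target: int, channel: str, years: int = 2) -> list[tuple]:
    """Training transitions (target-month value, previous-month value) for one channel."""
    to_p, from_p = training_periods(target, years)
    marks = ", ".join("?" for _ in to_p)
    sql = TRANSITIONS_SQL.format(to_marks=marks, from_marks=marks)
    # Parameter order follows the text: foo's periods and channel, then bar's.
    return conn.execute(sql, [*to_p, channel, *from_p, channel]).fetchall()
```

Binding the channel as a parameter also avoids the unquoted-literal problem of writing ELECTRICSP directly into the SQL text.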

13 Prediction phase, one attribute
1st Map-Reduce
- Map: create the model on item x attribute level and calculate forecasts on item x attribute level
Model input data, query workload (fetch the time series corresponding to the node's task, once per time slice):
  SELECT *
  FROM (SELECT … FROM cooling
        WHERE nindex = 26 AND channel = 'ELECTRICSP'
        GROUP BY itemid) AS foo
  WHERE sales_units_1 > 0 AND stock_new_units_1 >= 0
Target period: 27
(Diagram: Node 1: channel ELECTRICSP; DB-Server; partition outlet x item, ESP)
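
In the same kind of SQLite mock, the model input for target period 27 comes from the previous period 26, aggregated per item. A sketch assuming that the `_1` suffix denotes these lag-1 aggregates (the slide elides the select list); `fetch_model_input` is a hypothetical name.

```python
import sqlite3

# Model input: lag-1 aggregates of the period before the target, one row per item.
# The aggregates and aliases are assumptions; the slide elides the select list.
MODEL_INPUT_SQL = """
SELECT *
FROM (SELECT itemid,
             SUM(sales_units) AS sales_units_1,
             SUM(stock_new_units) AS stock_new_units_1
      FROM cooling
      WHERE nindex = ? AND channel = ?
      GROUP BY itemid) AS foo
WHERE sales_units_1 > 0 AND stock_new_units_1 >= 0
ORDER BY itemid
"""


def fetch_model_input(conn: sqlite3.Connection, target: int, channel: str) -> list[tuple]:
    """Inputs for forecasting `target`: aggregated values of period target - 1."""
    return conn.execute(MODEL_INPUT_SQL, (target - 1, channel)).fetchall()
```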

14 Prediction phase, one attribute
1st Map-Reduce
- Map: create the model on item x attribute level and calculate forecasts on item x attribute level
Model training data, query workload (fetch the time series corresponding to the node's task, once per time slice):
  SELECT …
  FROM (SELECT … FROM cooling
        WHERE nindex IN (15, 3) AND channel = 'TECSUPERST' AND sales_units > 0
        GROUP BY itemid, nindex) AS foo,
       (SELECT … FROM cooling
        WHERE nindex IN (14, 2) AND channel = 'TECSUPERST'
        GROUP BY itemid, nindex) AS bar
  WHERE foo.nindex = bar.nindex + 1 AND foo.itemid = bar.itemid AND sales_units_1 > 0
Target period: 27
(Diagram: Node 2: channel TECSUPERST; DB-Server; partition outlet x item, TSS)

15 Prediction phase, two attributes
1st Map-Reduce
- Map: create the model on item x attribute level and calculate forecasts on item x attribute level
Model training data, query workload (fetch the time series corresponding to the node's task, once per time slice):
  SELECT …
  FROM (SELECT … FROM cooling
        WHERE nindex IN (15, 3) AND channel = 'ELECTRICSP' AND nofrost = 'YES' AND …
        GROUP BY itemid, nindex) AS foo,
       (SELECT … FROM cooling
        WHERE nindex IN (14, 2) AND channel = 'ELECTRICSP' AND nofrost = 'YES'
        GROUP BY itemid, nindex) AS bar
  WHERE foo.nindex = bar.nindex + 1 AND foo.itemid = bar.itemid AND sales_units_1 > 0
Target period: 27
(Diagram: partition outlet x item, ESP - YES; Node 8: channel x nofrost ELECTRICSP & YES; DB-Server)
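
Slides 12, 14 and 15 differ only in the attribute predicates, so each node's query can be generated from its partition. A sketch, assuming a whitelist of attribute columns (`ALLOWED_ATTRIBUTES`) and hypothetical helper names; values are passed as bound parameters, which also takes care of quoting literals such as ELECTRICSP or YES.

```python
import sqlite3

# Attribute columns that may define a partition (assumed whitelist). Column names
# cannot be bound as SQL parameters, so they are checked before interpolation.
ALLOWED_ATTRIBUTES = ("channel", "nofrost", "brand")


def partition_filter(partition: dict[str, str]) -> tuple[str, list[str]]:
    """Build 'attr = ? AND ...' plus parameters for one or more attribute values."""
    clauses, params = [], []
    for attribute, value in sorted(partition.items()):
        if attribute not in ALLOWED_ATTRIBUTES:
            raise ValueError(f"unknown attribute: {attribute}")
        clauses.append(f"{attribute} = ?")
        params.append(value)
    # An empty partition is the unpartitioned level: match every row.
    return (" AND ".join(clauses) or "1 = 1"), params


def fetch_period(conn: sqlite3.Connection, nindex: int, partition: dict[str, str]) -> list[tuple]:
    """Per-item sales of one period, restricted to the partition."""
    where, params = partition_filter(partition)
    sql = ("SELECT itemid, SUM(sales_units) FROM cooling "
           f"WHERE nindex = ? AND {where} GROUP BY itemid ORDER BY itemid")
    return conn.execute(sql, [nindex, *params]).fetchall()
```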

16 Evaluation phase
2nd Map-Reduce
- Map: get the real data from the database and join them with the predictions
- Reduce: aggregate the data to the required aggregation level and calculate the error
Query workload (fetch all data for one task to join with the predictions and calculate errors, only once):
  SELECT …
  FROM (SELECT nindex AS time, itemid AS itemid FROM cooling
        WHERE nindex > 12 AND channel = 'ELECTRICSP'
        GROUP BY time, itemid
        HAVING SUM(sales_units) > 0) AS t1,
       (SELECT … FROM cooling
        WHERE nindex > 13 AND channel = 'ELECTRICSP'
        GROUP BY …) AS t2
  WHERE t1.time + 1 = t2.time AND t1.itemid = t2.itemid
(Diagram: Node 1: channel ELECTRICSP; DB-Server; error)
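
The second Map-Reduce can be simulated in plain Python: map joins real values with predictions and keys them by the target aggregation level, reduce sums both sides per key and computes the error. A sketch; the relative error of the aggregated sums is an assumed measure, and `level` is a hypothetical callback that chooses the aggregation key.

```python
from collections import defaultdict
from collections.abc import Callable, Iterable, Iterator
from typing import Optional


def map_phase(actuals: dict[tuple[int, int], float],
              predictions: dict[tuple[int, int], float],
              level: Callable[[int, int], object]) -> Iterator[tuple[object, tuple[float, float]]]:
    """Join real and predicted values per (itemid, period); emit them under the aggregation key."""
    for (itemid, period), actual in actuals.items():
        predicted = predictions.get((itemid, period))
        if predicted is not None:
            yield level(itemid, period), (actual, predicted)


def reduce_phase(mapped: Iterable[tuple[object, tuple[float, float]]]) -> dict[object, Optional[float]]:
    """Aggregate per key, then compute the relative error of the aggregated forecast."""
    sums = defaultdict(lambda: [0.0, 0.0])
    for key, (actual, predicted) in mapped:
        sums[key][0] += actual
        sums[key][1] += predicted
    # None marks keys whose aggregated actual value is zero (relative error undefined).
    return {key: (abs(p - a) / a if a else None) for key, (a, p) in sums.items()}
```

For instance, with actual values 10 and 30 and predictions 12 and 24 for two items in period 27, aggregated by period, the item deviations of +2 and -6 partly cancel and the aggregated error is 4 / 40 = 0.1. This is why the error depends on the chosen aggregation level.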

18 Objectives
- Get acquainted with the necessary technologies
  - Hadoop, RDBMS, distributed databases
  - the forecasting approach (workload)
- Reduce the execution time by optimizing data transfer
  - develop different approaches to data storage: Hadoop-based solutions, RDBMS-based solutions, other approaches?
  - possibly including adapted forecast processing
- Evaluation (extended scope for 8 SWS)
  - comparison against a setup with a central database
  - identify and justify the particular suitability of individual approaches

19 Getting started
- Get acquainted with Hadoop: Hadoop v
- Set up a single-node cluster as a first test environment
- Set up R and RHadoop: https://github.com/RevolutionAnalytics/RHadoop/wiki
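
As a first exercise on the single-node cluster, a Map-Reduce job over cooling-style data can be written with Hadoop Streaming, which runs any executable that reads records from stdin and writes tab-separated key/value lines to stdout. Note that this swaps in Python for the R/RHadoop stack named on the slide; the CSV layout (nindex,itemid,outletid,channel,sales_units) and the file name job.py are assumptions.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit 'channel<TAB>sales_units' for each CSV record nindex,itemid,outletid,channel,sales_units."""
    for line in lines:
        fields = line.rstrip("\n").split(",")
        if len(fields) != 5 or fields[0] == "nindex":
            continue  # skip the header line and malformed records
        _nindex, _itemid, _outletid, channel, sales_units = fields
        yield f"{channel}\t{sales_units}"


def reducer(lines):
    """Sum sales per channel; Hadoop Streaming delivers the mapper output sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for channel, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(float(value) for _, value in group)
        yield f"{channel}\t{total}"


# Entry point only when called as "job.py map" or "job.py reduce".
if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in ("map", "reduce"):
    step = mapper if sys.argv[1] == "map" else reducer
    for record in step(sys.stdin):
        print(record)
```

Locally, the pipeline can be checked with `cat cooling.csv | python3 job.py map | sort | python3 job.py reduce`, where cooling.csv is a hypothetical export. On the cluster, the same script is passed to the Hadoop Streaming jar via its -input, -output, -mapper and -reducer options; the jar's location depends on the Hadoop version.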

