Komplexpraktikum Datenbank-Anwendungen


Organization: who, for whom, scope
Who: Claudio Hartmann, Ulrike Fischer
For whom:
- Diplom, PO 2004 (examination regulations of 2004), Computer Science and Media Informatics: Komplexpraktikum (course certificate)
- Diplom, PO 2010, Computer Science: module PM-FPA
- Bachelor Computer Science: B-510, B-520; Bachelor Media Informatics: B-530, B-540
- Master Computer Science: PM-FPA; Master Media Informatics: E-4
Scope: 4 SWS or 8 SWS (semester hours per week)

Organization: goals, schedule, communication/material
Goals: work independently…
- Familiarize yourself with the subject matter
- Identify problems and develop approaches to solving them
- Implement and evaluate your own approaches and ideas
Schedule:
- Kick-off in the first week of lectures (today)
- Sync meetings at regular intervals, every 1-2 weeks (date to be arranged)
- Final presentation at the end of the semester: present the approaches you designed and your results
Communication/material:
- Discussion: Auditorium
- Code & test data: SVN (access details to follow)

Scenario & challenges
[Figure: monthly estimation process: the product groups PTV, COOL, MFD, …, ES, CC with their history and updates feed the FIRA process (Forecast, Impute, Refine, Adjust, Aggregate), which produces the estimation]
Monthly report targets:
- Sneak peek reporting
- Missing data: fill gaps
Further targets:
- Outlier detection
- Fraud detection
- Development reports

The data look like this…
- Time column: period (monthly date stamp)
- Measure columns: sales_units / _nc_ne / _nc_e / _c_ne, purchase_units, stock_new_units
- Attribute columns:
  - some product-group specific (id, price; e.g. color, energy_label, size, brand, …)
  - some outlet specific (id, distributionfactor, extrapolationfactor, turnover_class, nuts1, channel, …)
- Size (cooling): 2.3 million "item x outlet" tuples on 5051 items, 1116 outlets and 36 periods
[Figure: item x outlet data feeding FIRA / SIS]
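
A minimal sketch of the cooling relation in SQLite. Assumptions: nindex (an integer period index standing in for the monthly date stamp), itemid, channel, nofrost, sales_units and stock_new_units are the names used in the queries on the later slides; outletid, the reduced column set and the toy rows are hypothetical. The slide does not say whether the 2.3 million counts series or rows, so the sketch counts both.

```python
import sqlite3

# Hypothetical, reduced schema: one row per item x outlet x period.
SCHEMA = """
CREATE TABLE cooling (
    nindex          INTEGER NOT NULL,  -- period index (monthly)
    itemid          INTEGER NOT NULL,  -- product attribute
    outletid        INTEGER NOT NULL,  -- outlet attribute (hypothetical column name)
    channel         TEXT,              -- outlet attribute, e.g. 'ELECTRICSP'
    nofrost         TEXT,              -- product attribute, e.g. 'YES'
    sales_units     INTEGER,           -- measure
    stock_new_units INTEGER,           -- measure
    PRIMARY KEY (nindex, itemid, outletid)
)
"""

con = sqlite3.connect(":memory:")
con.execute(SCHEMA)

# Toy data: 2 items x 2 outlets x 3 periods = 12 rows.
rows = [
    (period, item, outlet, channel, nofrost, 10 * item + period, 5)
    for period in (1, 2, 3)
    for item, nofrost in ((1, "YES"), (2, "NO"))
    for outlet, channel in ((100, "ELECTRICSP"), (200, "TECSUPERST"))
]
con.executemany("INSERT INTO cooling VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# Count distinct item x outlet combinations (series) and total rows.
n_series = con.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT itemid, outletid FROM cooling)"
).fetchone()[0]
n_rows = con.execute("SELECT COUNT(*) FROM cooling").fetchone()[0]
print(n_series, n_rows)  # 4 12
```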

Common scenario: model usage
- Train a statistical model on historical data
- Model seasonal and trend effects (this requires equidistant values)
- Calculate forecast values
[Figure: time series of the item Liebherr KT 1434 over the 1st, 2nd and 3rd year; the model is optimized on the history]
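
The slides do not name the statistical model. As an illustrative stand-in, here is a classical additive decomposition: a least-squares linear trend plus a mean seasonal offset per calendar month. It also shows why equidistant values matter: the month of value t is derived from its position as t % 12, so a missing month would shift every later value into the wrong season. The function names and the toy series are hypothetical.

```python
def fit_trend_season(y, season=12):
    """Fit y[t] ~ a + b*t + s[t % season]: least-squares trend, then mean residual per season slot."""
    n = len(y)
    if n < 2 * season:
        raise ValueError("need at least two full seasons of equidistant values")
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    b = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y)) / sum(
        (t - t_mean) ** 2 for t in range(n)
    )
    a = y_mean - b * t_mean
    sums = [0.0] * season
    counts = [0] * season
    for t, v in enumerate(y):
        sums[t % season] += v - (a + b * t)  # detrended residual
        counts[t % season] += 1
    s = [total / c for total, c in zip(sums, counts)]
    return a, b, s


def forecast(model, n_hist, horizon, season=12):
    """Extend trend and seasonal pattern for `horizon` periods after the `n_hist` known ones."""
    a, b, s = model
    return [a + b * t + s[t % season] for t in range(n_hist, n_hist + horizon)]


# Toy series: two years, trend 0.5 per month, seasonal pattern chosen so the fit recovers it exactly.
pattern = {0: 1.0, 5: -1.0, 6: -1.0, 11: 1.0}
y = [10 + 0.5 * t + pattern.get(t % 12, 0.0) for t in range(24)]
model = fit_trend_season(y)
print(forecast(model, len(y), 3))  # [23.0, 22.5, 23.0]
```

With real data the seasonal pattern is usually correlated with time, so the trend estimate absorbs part of it; a proper seasonal model would fit both jointly.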

… Problem: too short time series
- Very sparse data on low aggregation levels
- No statistical model available for some specific time series
[Figure: time series of the items Liebherr KT 1434, Bosch KSl 20s53, SEG MS210 A, … over the 1st to 3rd year]

… Solution: cross-sectional forecasting
- Assume similar behavior within some groups of time series
- Use month-to-month transitions from a set of time series
- Train the model on the transitions of several time series
- Use the last known period as input to calculate forecasts
[Figure: the transitions of Liebherr KT 1434, Bosch KSl 20s53, SEG MS210 A, … over the 1st to 3rd year train one model; the report is calculated on the previous period]
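
A minimal sketch of cross-sectional forecasting, assuming the model is a simple linear regression curr = a + b * prev fitted on month-to-month transitions pooled across all series of a group (the slides do not fix the model type). Even a series with a single known value gets a forecast, because it only needs its last period as input. Item keys and values are hypothetical toy data.

```python
def transitions(series_by_item):
    """Pool (value at t-1, value at t) pairs from every time series in the group."""
    pairs = []
    for values in series_by_item.values():
        pairs.extend(zip(values, values[1:]))
    return pairs


def fit_transition_model(pairs):
    """Least-squares fit of curr = a + b * prev over the pooled transitions."""
    if not pairs:
        raise ValueError("no transitions to train on")
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("need transitions with at least two distinct previous values")
    b = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / sxx
    return mean_y - b * mean_x, b


def predict_next(model, series_by_item):
    """Use each series' last known period as model input."""
    a, b = model
    return {item: a + b * values[-1] for item, values in series_by_item.items()}


group = {"item_a": [10, 11], "item_b": [4, 5, 6], "item_c": [8]}  # item_c alone has no transition
model = fit_transition_model(transitions(group))
print({k: round(v, 6) for k, v in predict_next(model, group).items()})
# {'item_a': 12.0, 'item_b': 7.0, 'item_c': 9.0}
```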

Attribute hierarchies
- There are many different ways to partition the data
- The forecast error differs between forecast targets
- Research goal: which partition is best for which forecast target?
[Figure: partitions of the item x outlet data: All x item; channel (ESP, TSS); no_frost (YES, NO); channel x no_frost (ESP & YES, ESP & NO, …); each feeding FIRA / SIS]
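
The partitions on this slide form a lattice over the attribute set: the unpartitioned level (All), each single attribute, and their cross product. A small sketch that enumerates the levels with itertools.combinations and groups toy rows at each level; each group would then get its own model, and the research question is which level minimizes the forecast error for a given target. The rows and function names are illustrative.

```python
from itertools import combinations


def attribute_lattice(attributes):
    """All attribute combinations, from () = 'All' up to the full cross product."""
    levels = []
    for k in range(len(attributes) + 1):
        levels.extend(combinations(attributes, k))
    return levels


def partition(rows, attrs):
    """Group rows by their values on `attrs`; attrs == () yields one group holding everything."""
    groups = {}
    for row in rows:
        key = tuple(row[a] for a in attrs)
        groups.setdefault(key, []).append(row)
    return groups


rows = [
    {"itemid": 1, "channel": "ESP", "no_frost": "YES"},
    {"itemid": 2, "channel": "ESP", "no_frost": "NO"},
    {"itemid": 3, "channel": "TSS", "no_frost": "YES"},
]
for level in attribute_lattice(["channel", "no_frost"]):
    print(level or "All", sorted(partition(rows, level)))
```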

Parallel FIRA processes
[Figure: architecture of the Self-Adjusting Imputation System: process and system configuration in a configuration repository; database; estimation process over the product groups PTV, COOL, MFD, …, ES, CC with FIRA, Variant 2 and Variant 3; model exploitation/usage]
- Distribute each value of each attribute to a dedicated node
- Execute the FIRA process for each value in parallel
- Split into two phases
Node assignment (the nodes read the relation cooling from the DB server):
- Node 1: channel ESP
- Node 2: channel TS
- Node 3: nofrost YES
- Node 4: nofrost NO
- Node 5: nofrost N.A.
- Node 6: brand AEG
- Node 7: brand SIEMENS
- Node 8: channel x nofrost ESP & YES
- Node 9: channel x nofrost ESP & NO
- …
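
A sketch of the task distribution, with a thread pool standing in for the cluster nodes: one task per attribute value (or value combination), numbered as on the slide. In this sketch the two phases run one after the other, so evaluation only starts once all predictions exist. prediction_phase and evaluation_phase are hypothetical placeholders that only select and count a task's rows; in the real system each would run the queries and models of the following slides.

```python
from concurrent.futures import ThreadPoolExecutor

# Node tasks as listed on the slide (node number = list position + 1).
TASKS = [
    {"channel": "ESP"},
    {"channel": "TS"},
    {"nofrost": "YES"},
    {"nofrost": "NO"},
    {"nofrost": "N.A."},
    {"brand": "AEG"},
    {"brand": "SIEMENS"},
    {"channel": "ESP", "nofrost": "YES"},
    {"channel": "ESP", "nofrost": "NO"},
]


def prediction_phase(task, rows):
    """Placeholder for phase 1: select the rows covered by this task (training and forecasting go here)."""
    return [r for r in rows if all(r.get(attr) == value for attr, value in task.items())]


def evaluation_phase(task, selected):
    """Placeholder for phase 2: report the task and how many rows it covered (error calculation goes here)."""
    return task, len(selected)


def run_parallel(rows, tasks, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Phase 1 for all tasks in parallel; list() waits until every task has finished.
        selections = list(pool.map(lambda task: prediction_phase(task, rows), tasks))
        # Phase 2 in parallel; map() returns results in task order.
        return list(pool.map(evaluation_phase, tasks, selections))


rows = [
    {"channel": "ESP", "nofrost": "YES", "brand": "AEG"},
    {"channel": "ESP", "nofrost": "NO", "brand": "SIEMENS"},
    {"channel": "TS", "nofrost": "YES", "brand": "AEG"},
]
for node, (task, n) in enumerate(run_parallel(rows, TASKS), start=1):
    print(f"Node {node}: {task} -> {n} rows")
```

Threads keep the example self-contained; on a real cluster each task would be shipped to a separate node, which is where the data transfer from the DB server becomes the cost to optimize.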

Phases of the prediction approach
Prediction phase:
- Fetch all data necessary for model training: the transitions of all time series covered by the attribute value
- Train the model
- Fetch the data of the preceding period
- Calculate predictions
Evaluation phase:
- Fetch the data of the predicted period
- Join it with the predicted data
- Calculate the forecast error
[Figure: model → model error]
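
The evaluation phase boils down to an inner join of predictions and actual values on the item id, followed by an error measure. The slides do not name the measure; mean absolute error is used here as an assumption, and the function name and toy numbers are hypothetical.

```python
def evaluate(predictions, actuals):
    """Inner-join predictions and actuals (dicts keyed by itemid); return joined rows and the mean absolute error."""
    common = sorted(predictions.keys() & actuals.keys())
    if not common:
        raise ValueError("no overlapping items to evaluate")
    joined = [(item, predictions[item], actuals[item]) for item in common]
    mae = sum(abs(pred - actual) for _, pred, actual in joined) / len(joined)
    return joined, mae


predictions = {1: 16.0, 2: 7.0, 4: 3.0}  # item 4 has no actual value for the period
actuals = {1: 14.0, 2: 8.0, 3: 5.0}      # item 3 was not predicted
joined, mae = evaluate(predictions, actuals)
print(joined, mae)  # [(1, 16.0, 14.0), (2, 7.0, 8.0)] 1.5
```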

Prediction phase: one attribute
1st Map-Reduce
Map: create the model on item x attribute level; calculate forecasts on item x attribute level
Model training data, query workload: fetch the time series corresponding to the node's task, once per time slice
Target period: 27. The training transitions 14 → 15 and 2 → 3 are the same calendar-month transition as 26 → 27, one and two years earlier.
SELECT …
FROM ( SELECT … FROM cooling
       WHERE nindex IN ( 15, 3 ) AND channel = 'ELECTRICSP' AND sales_units > 0
       GROUP BY itemid, nindex ) AS foo,
     ( SELECT … FROM cooling
       WHERE nindex IN ( 14, 2 ) AND channel = 'ELECTRICSP'
       GROUP BY itemid, nindex ) AS bar
WHERE foo.nindex = bar.nindex + 1
  AND foo.itemid = bar.itemid
  AND sales_units_1 > 0
[Figure: Node 1 (channel 'ELECTRICSP', ESP) fetches its item x outlet data from the DB server]
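
The training query runs against SQLite once the elided select lists are filled in. In this sketch they are assumptions: foo sums the current period's sales_units per item (as cur_sales) and bar sums the previous period's (as sales_units_1); the toy rows and the function name training_transitions are hypothetical. The attribute filter comes from a dict checked against a column whitelist, because column names cannot be bound as query parameters; the same function therefore also covers combined tasks such as channel x nofrost.

```python
import sqlite3

ALLOWED = {"channel", "nofrost", "brand"}  # hypothetical whitelist of attribute columns


def training_transitions(con, target, task, season=12):
    """Fetch per-item (prev, curr) sales for the transition target-1 -> target one and two seasons back."""
    if not task or not set(task) <= ALLOWED:
        raise ValueError(f"invalid task: {task}")
    attrs = sorted(task)
    pred = " AND ".join(f"{a} = ?" for a in attrs)
    values = [task[a] for a in attrs]
    currs = [target - season, target - 2 * season]  # 27 -> 15, 3
    prevs = [c - 1 for c in currs]                  # 27 -> 14, 2
    sql = f"""
        SELECT foo.itemid, foo.nindex, bar.sales_units_1, foo.cur_sales
        FROM (SELECT itemid, nindex, SUM(sales_units) AS cur_sales FROM cooling
              WHERE nindex IN (?, ?) AND {pred} AND sales_units > 0
              GROUP BY itemid, nindex) AS foo,
             (SELECT itemid, nindex, SUM(sales_units) AS sales_units_1 FROM cooling
              WHERE nindex IN (?, ?) AND {pred}
              GROUP BY itemid, nindex) AS bar
        WHERE foo.nindex = bar.nindex + 1 AND foo.itemid = bar.itemid
          AND sales_units_1 > 0
        ORDER BY foo.itemid, foo.nindex"""
    return con.execute(sql, [*currs, *values, *prevs, *values]).fetchall()


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cooling (nindex INT, itemid INT, outletid INT,"
            " channel TEXT, nofrost TEXT, sales_units INT)")
con.executemany("INSERT INTO cooling VALUES (?, ?, ?, ?, ?, ?)", [
    (2, 1, 100, "ELECTRICSP", "YES", 5), (3, 1, 100, "ELECTRICSP", "YES", 6),
    (14, 1, 100, "ELECTRICSP", "YES", 3), (14, 1, 101, "ELECTRICSP", "YES", 4),
    (15, 1, 100, "ELECTRICSP", "YES", 8),
    (14, 2, 200, "TECSUPERST", "NO", 9), (15, 2, 200, "TECSUPERST", "NO", 9),
])
print(training_transitions(con, 27, {"channel": "ELECTRICSP"}))
# [(1, 3, 5, 6), (1, 15, 7, 8)]
```

Each row is one training transition (item, target period, previous sales, current sales); the two outlets of item 1 in period 14 are summed to 7.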

1st Map-Reduce
Map: create the model on item x attribute level; calculate forecasts on item x attribute level
Model input data, query workload: fetch the time series corresponding to the node's task, once per time slice
Target period: 27 (the input is the preceding period 26)
SELECT *
FROM ( SELECT … FROM cooling
       WHERE nindex = 26 AND channel = 'ELECTRICSP'
       GROUP BY itemid ) AS foo
WHERE sales_units_1 > 0 AND stock_new_units_1 >= 0
[Figure: Node 1 (channel 'ELECTRICSP', ESP) fetches its item x outlet data from the DB server]
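
The input query feeds the trained model. A sketch under stated assumptions: the elided select list sums the outlet-level measures per item, the toy rows and the function name forecast_target are hypothetical, and the transition model curr = a + b * prev is passed in with fixed coefficients rather than trained. It fetches the preceding period target - 1, keeps items with positive sales and non-negative new stock as in the slide's filter, and applies the model.

```python
import sqlite3

INPUT_SQL = """
    SELECT * FROM (
        SELECT itemid,
               SUM(sales_units)     AS sales_units_1,
               SUM(stock_new_units) AS stock_new_units_1
        FROM cooling
        WHERE nindex = ? AND channel = ?
        GROUP BY itemid
    ) AS foo
    WHERE sales_units_1 > 0 AND stock_new_units_1 >= 0
    ORDER BY itemid"""


def forecast_target(con, model, target, channel):
    """Apply curr = a + b * prev to every qualifying item of the preceding period target - 1."""
    a, b = model
    rows = con.execute(INPUT_SQL, (target - 1, channel)).fetchall()
    return {itemid: a + b * sales_1 for itemid, sales_1, _stock_1 in rows}


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cooling (nindex INT, itemid INT, outletid INT,"
            " channel TEXT, sales_units INT, stock_new_units INT)")
con.executemany("INSERT INTO cooling VALUES (?, ?, ?, ?, ?, ?)", [
    (26, 1, 100, "ELECTRICSP", 4, 2),
    (26, 1, 101, "ELECTRICSP", 6, 0),
    (26, 2, 100, "ELECTRICSP", 0, 1),   # no sales in the pre-period: filtered out
    (25, 3, 100, "ELECTRICSP", 5, 1),   # different period: not fetched for target 27
])
print(forecast_target(con, (1.0, 1.5), 27, "ELECTRICSP"))  # {1: 16.0}
```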

1st Map-Reduce
Map: create the model on item x attribute level; calculate forecasts on item x attribute level
Model training data, query workload: fetch the time series corresponding to the node's task, once per time slice
Target period: 27
SELECT …
FROM ( SELECT … FROM cooling
       WHERE nindex IN ( 15, 3 ) AND channel = 'TECSUPERST' AND sales_units > 0
       GROUP BY itemid, nindex ) AS foo,
     ( SELECT … FROM cooling
       WHERE nindex IN ( 14, 2 ) AND channel = 'TECSUPERST'
       GROUP BY itemid, nindex ) AS bar
WHERE foo.nindex = bar.nindex + 1
  AND foo.itemid = bar.itemid
  AND sales_units_1 > 0
[Figure: Node 2 (channel 'TECSUPERST', TSS) fetches its item x outlet data from the DB server]

Prediction phase: two attributes
1st Map-Reduce
Map: create the model on item x attribute level; calculate forecasts on item x attribute level
Model training data, query workload: fetch the time series corresponding to the node's task, once per time slice
Target period: 27
SELECT …
FROM ( SELECT … FROM cooling
       WHERE nindex IN ( 15, 3 ) AND channel = 'ELECTRICSP' AND nofrost = 'YES' AND …
       GROUP BY itemid, nindex ) AS foo,
     ( SELECT … FROM cooling
       WHERE nindex IN ( 14, 2 ) AND channel = 'ELECTRICSP' AND nofrost = 'YES'
       GROUP BY itemid, nindex ) AS bar
WHERE foo.nindex = bar.nindex + 1
  AND foo.itemid = bar.itemid
  AND sales_units_1 > 0
[Figure: Node 8 (channel x nofrost, 'ELECTRICSP' & 'YES', ESP - YES) fetches its item x outlet data from the DB server]

Evaluation phase: 2nd Map-Reduce
Map: get the real data from the database; join it with the predictions
Reduce: aggregate the data to the demanded aggregation level; calculate the error
Query workload: fetch all data for one task to join with the predictions and calculate the errors; only once
SELECT …
FROM ( SELECT nindex AS time, itemid AS itemid FROM cooling
       WHERE nindex > 12 AND channel = 'ELECTRICSP'
       GROUP BY time, itemid
       HAVING sum(sales_units) > 0 ) AS t1,
     ( SELECT … FROM cooling
       WHERE nindex > 13 AND channel = 'ELECTRICSP'
       GROUP BY … ) AS t2
WHERE t1.time + 1 = t2.time AND t1.itemid = t2.itemid
[Figure: Node 1 (channel 'ELECTRICSP') fetches the real data from the DB server and computes the error]
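
The Map/shuffle/Reduce flow of the evaluation phase can be simulated in plain Python, which also shows why the aggregation level matters: item-level errors can partly cancel once predictions and actual values are summed per group. In this sketch the map input is assumed to be already joined (one record per item with prediction and actual value), the relative error of the aggregate is an assumed error measure, and the function names and numbers are hypothetical.

```python
from collections import defaultdict


def map_phase(records, level):
    """Map: emit (aggregation key, (prediction, actual)) for each joined item-level record."""
    for rec in records:
        yield tuple(rec[attr] for attr in level), (rec["pred"], rec["actual"])


def shuffle(pairs):
    """Group the mapped values by key, as the framework does between Map and Reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: aggregate to the demanded level, then compute the relative error of the aggregate."""
    pred = sum(p for p, _ in values)
    actual = sum(a for _, a in values)
    error = abs(pred - actual) / actual if actual else float("inf")
    return key, pred, actual, error


def evaluate_at_level(records, level):
    grouped = shuffle(map_phase(records, level))
    return sorted(reduce_phase(key, values) for key, values in grouped.items())


records = [
    {"itemid": 1, "channel": "ELECTRICSP", "pred": 16.0, "actual": 14.0},
    {"itemid": 2, "channel": "ELECTRICSP", "pred": 7.0, "actual": 8.0},
    {"itemid": 3, "channel": "TECSUPERST", "pred": 5.0, "actual": 4.0},
]
for row in evaluate_at_level(records, ("channel",)):
    print(row)
```

Items 1 and 2 are off by 2/14 and 1/8 individually, but their channel aggregate is off by only 1/22 because the over- and under-forecast partly cancel.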

Objectives
- Get familiar with the necessary technologies: Hadoop, RDBMS, distributed databases, and the forecasting approach (workload)
- Reduce execution time by optimizing data transfer
- Develop different approaches to data management:
  - Hadoop-based solutions
  - RDBMS-based solutions
  - Other approaches?
  - Possibly including adapted forecast processing
- Evaluation (extended scope for 8 SWS):
  - Comparison against a setup with a central database
  - Identify and justify the particular suitability of individual approaches

Getting started: familiarize yourself with Hadoop (Hadoop 1.2.1)
- Set up a single-node cluster as a first test environment
- Set up R and RHadoop
