The role of the data scientist


1 The role of the data scientist
Why a car parts manufacturing company needs data experts
ICT Innovation, Andrea Condorelli, Data Science Manager
Statale di Milano, Italy, 5 June 2017

2
- MAGNETI MARELLI: a quick overview of MM, with a special focus on the most innovative products
- WHY BIG DATA: an overview of some hot topics, opportunities and ongoing projects regarding Big Data in manufacturing
- FROM PRODUCT TO SERVICES PARADIGM: some companies whose business model moved from selling pieces of hardware to selling services
- WHO WORKS WITH DATA: Data Scientist vs Data Engineer, identikit of the perfect data expert
- ENTERPRISE DATA SCIENTIST TOOLBOX: technologies and frameworks, security and privacy in companies
- OPPORTUNITIES IN MAGNETI MARELLI
- Q&A

3

4 HR Leadership Development Team
HR ORGANIZATION
- Giovanni Quaglia, CHRO
- Stefano Facchetti, Head of Leadership Development and Process & Systems
- Donatella Callerio, Staffing and Recruitment Manager / Leadership Development Finance & ICT
- Marta Ragazzi, Staffing and Recruitment / Leadership Development ICT

5 Data Science team (ICT Governance, ICT Innovation)
- Dario Castello, CIO
- Luca Demarchi, Head of ICT Innovation
- Andrea Condorelli, Data Science Manager
- Valentina Arrigoni, Alberto Catena: Data Science team

6 MAGNETI MARELLI

7 Company Overview
Magneti Marelli is an international company committed to the design and production of hi-tech systems and components for the automotive sector. Business areas:
- AUTOMOTIVE LIGHTING (Headlamp, Rear Lamp, Lighting and Body Electronics)
- POWERTRAIN (Gasoline and Diesel engine control, Electric Motor, Inverter and Transmission)
- ELECTRONICS (Instrument Clusters, Infotainment & Telematics)
- SUSPENSION SYSTEMS AND SHOCK ABSORBERS (Suspension Systems, Shock Absorbers and Dynamic Systems)
- EXHAUST SYSTEMS (Manifolds, Catalytic Converter, Diesel Particulate Filter and Mufflers)
- PLASTIC COMPONENTS AND MODULES (Bumper, Dashboard, Central Console, Pedals, Hand Brake Levers and Fuel System)
- AFTERMARKET PARTS & SERVICES (Mechanical, Bodywork, Electrics and Electronics, and Consumables)
- MOTORSPORT (Injection Systems, Electronic Control Units, Hybrid Systems, Telemetry Systems, Electric Actuators)

8 Magneti Marelli Worldwide Presence
Legend: PP = Production Plant, R&D = R&D Center, AC = Application Center
- France: PP, R&D, AC
- Germany: PP, R&D, AC
- UK
- Poland: PP, AC
- Czech Rep.: PP, AC
- USA: PP, R&D, AC
- Slovakia: PP
- Mexico: PP, AC
- Russia: PP, AC
- Serbia: PP
- Brazil: PP, R&D, AC
- China: PP, R&D, AC
- Argentina: PP
- Japan: AC
- Malaysia: PP, AC
- Italy: PP, R&D, AC
- India: PP, R&D, AC
- Korea
- Spain: PP, AC
- Turkey: PP, AC
Key figures: 7.9 bn € sales (2016); 42,830 employees; 86 production units; 12 R&D centers; 30 application centers; R&D at 5.9% of sales; investments at 5.8% of sales.
[Chart: Sales (€ bn), actuals 2009 to 2016]

9 Organization
- Business areas: Automotive Lighting, Shock Absorbers, Aftermarket Parts & Services, Electronics, Exhaust Systems, Plastic Components & Modules, Powertrain, Suspension Systems, Motorsport
- Central functions: Information & Communication Technology, Human Resources, Manufacturing, Technology Innovation, Quality, Marketing, Communication, Project Management Office, Risk Governance, Purchasing, General Affairs, Business Development, Finance
- Country/region representatives: LATAM, NAFTA, India, China, Japan
- Global Key Account

10 WHY BIG DATA

11 How to turn data into money
- Piece cost reduction: decrease the number of scraps, lower stocks, enhance productivity
- Making new business: sell new services

12 Complexity behind a “simple” product
[Diagram: raw materials → external logistics → factory (material warehouse, pre-production lines, WIP warehouse, assembly lines and finished product warehouse, connected by internal logistics) → external logistics → customer]

13 Industry 4.0

14 Industry 4.0 – Traceability and IoT
[Diagram: the same factory flow, tracked step by step. Raw-material lots are recorded on arrival (lot ID, supplier, material info, arrival timestamp); each item carries an item ID and material ID, arrival and leave timestamps at each warehouse, and per-step data (Step 1 to Step 4, machine used) on the pre-production and assembly lines]
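A minimal sketch of what one traceability record could look like, assuming a per-item history linked to its raw-material lot. The class and field names (`ItemTrace`, `StepEvent`, `dwell_times_minutes`) are hypothetical, chosen to mirror the IDs and timestamps in the diagram; a real plant would keep these records in a database rather than in memory.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class StepEvent:
    """One pass of an item through a production step (hypothetical schema)."""
    step: str
    machine: str
    arrive_ts: datetime
    leave_ts: datetime


@dataclass
class ItemTrace:
    """Full history of one item, linked back to the raw-material lot it came from."""
    item_id: str
    lot_id: str
    supplier: str
    events: list[StepEvent] = field(default_factory=list)

    def dwell_times_minutes(self) -> dict[str, float]:
        """Minutes the item spent at each step (leave timestamp minus arrive timestamp)."""
        return {
            e.step: (e.leave_ts - e.arrive_ts).total_seconds() / 60
            for e in self.events
        }


trace = ItemTrace(item_id="ITEM-001", lot_id="LOT-42", supplier="SupplierA")
trace.events.append(StepEvent("Step 1", "Machine 1",
                              datetime(2017, 6, 5, 8, 0), datetime(2017, 6, 5, 8, 12)))
trace.events.append(StepEvent("Step 2", "Machine 3",
                              datetime(2017, 6, 5, 8, 20), datetime(2017, 6, 5, 8, 50)))
print(trace.dwell_times_minutes())  # {'Step 1': 12.0, 'Step 2': 30.0}
```

Per-step dwell times like these are one of the raw ingredients for understanding each process and costing each piece.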

15 Industry 4.0 – Traceability and IoT
- Enhancing recall campaigns
- Deeply understanding each process
- Computing the real cost of each piece
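With lot-level traceability, a recall can be limited to the items actually built from a faulty lot instead of a whole production window. The sketch below assumes hypothetical trace tuples `(item_id, lot_id, last_known_location)`; the function names are illustrative only.

```python
from collections import defaultdict

# Hypothetical trace records: (item_id, lot_id, last_known_location)
traces = [
    ("ITEM-001", "LOT-42", "Customer"),
    ("ITEM-002", "LOT-42", "Finished Product Warehouse"),
    ("ITEM-003", "LOT-43", "Customer"),
    ("ITEM-004", "LOT-42", "WIP Warehouse"),
]


def build_lot_index(records):
    """Map each raw-material lot to the items produced from it."""
    index = defaultdict(list)
    for item_id, lot_id, location in records:
        index[lot_id].append((item_id, location))
    return index


def recall_scope(index, lot_id):
    """Items to recall for a faulty lot, grouped by where they currently are."""
    scope = defaultdict(list)
    for item_id, location in index.get(lot_id, []):
        scope[location].append(item_id)
    return dict(scope)


index = build_lot_index(traces)
print(recall_scope(index, "LOT-42"))
# {'Customer': ['ITEM-001'], 'Finished Product Warehouse': ['ITEM-002'], 'WIP Warehouse': ['ITEM-004']}
```

Grouping by location separates items that only need to be blocked in-house from those that require a customer-facing recall.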

16 Industry 4.0 – Predictive Quality
[Diagram: material warehouse → Step 1 → Step 2 → Step 3 → WIP warehouse; each step involves a worker, machine parameters and machine sensors, and each step can produce scrap]

17 Industry 4.0 – Predictive Quality
[Diagram: the same three-step line (worker, machine parameters and machine sensors at each step), with question marks next to the scrap points]

18 Industry 4.0 – Predictive Quality
Classification/prediction problem: “Given a context, predict the probability that the specific item will reach the following station or will be discarded.”
A piece can be scrapped for several reasons:
- human error
- a hardware/software machine failure
- a material problem
- a wrong process or issues in the line design
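As an illustration of this classification problem, here is a sketch of logistic regression trained with plain gradient descent, using only the standard library. The slides do not say which classifier was used; logistic regression is our choice here because it outputs a probability directly. The two context features (a machine-temperature deviation and a normalized time-into-shift "fatigue" value) and the synthetic data are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=300):
    """Batch gradient descent on the log-loss; returns weights and bias."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_scrap_proba(w, b, x):
    """Probability that an item with context x will be discarded."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Synthetic data (hypothetical): scrap becomes likelier with temperature
# deviation and with fatigue. Each context is [temp_dev, fatigue] in [-1, 1].
rng = random.Random(42)
X, y = [], []
for _ in range(400):
    x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    true_logit = 3 * x[0] + 2 * x[1] - 1
    X.append(x)
    y.append(1 if rng.random() < sigmoid(true_logit) else 0)

w, b = train_logistic(X, y)
p_nominal = predict_scrap_proba(w, b, [-1.0, -1.0])
p_critical = predict_scrap_proba(w, b, [1.0, 1.0])
print(f"nominal context: {p_nominal:.2f}, critical context: {p_critical:.2f}")
```

On real line data the hard part is not this model but describing the context, which is why the feature engineering step dominates the effort.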

19 Industry 4.0 – Predictive Quality
The context is pretty hard to describe (feature engineering):
- the pieces processed right before a scrap share a very similar context
- tasks are complex and different
- people are involved, and their state is hard to quantify: fatigue, experience in a given task, mood, stress
- a lot of machines have no sensors
- the data change over time
- the problem is nonlinear and has “memory”
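One way to give a model some “memory” is to add features computed from the recent history of the same station, using only items processed before the current one so that no future information leaks in. A minimal sketch, assuming hypothetical event dicts with `station`, `ts`, `shift_start` and `scrap` keys; `hours_into_shift` is only a crude stand-in for fatigue, which is hard to quantify directly.

```python
from collections import deque
from datetime import datetime


def add_memory_features(events, window=3):
    """Enrich time-ordered events with features built from each station's past.

    recent_scrap_rate: share of scraps among the previous `window` items
        processed at the same station (0.0 when there is no history yet).
    hours_into_shift: elapsed time since the shift started.
    """
    history = {}  # station -> scrap flags of its most recent items
    enriched = []
    for ev in events:
        past = history.setdefault(ev["station"], deque(maxlen=window))
        rate = sum(past) / len(past) if past else 0.0
        hours = (ev["ts"] - ev["shift_start"]).total_seconds() / 3600
        enriched.append({**ev, "recent_scrap_rate": rate, "hours_into_shift": hours})
        past.append(ev["scrap"])  # update history only after computing features
    return enriched


shift_start = datetime(2017, 6, 5, 6, 0)
events = [
    {"station": "S1", "ts": datetime(2017, 6, 5, 7, 0), "shift_start": shift_start, "scrap": 0},
    {"station": "S1", "ts": datetime(2017, 6, 5, 7, 30), "shift_start": shift_start, "scrap": 1},
    {"station": "S1", "ts": datetime(2017, 6, 5, 8, 0), "shift_start": shift_start, "scrap": 1},
    {"station": "S1", "ts": datetime(2017, 6, 5, 8, 30), "shift_start": shift_start, "scrap": 0},
]
for row in add_memory_features(events):
    print(row["recent_scrap_rate"], row["hours_into_shift"])
```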

20 Industry 4.0 – Predictive Quality
Data Extraction → Feature Engineering → Classification → Show Results
Precision: very high / zero
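The slide does not say what the two figures refer to. Assuming they are the precision on good pieces and on scraps, one classic way to get “very high / zero” with rare scraps is a model that simply learns to predict the majority class. The helper below is a hypothetical stdlib sketch, not a library API; it defines precision as 0.0 for a class that is never predicted.

```python
def per_class_precision(y_true, y_pred, labels=(0, 1)):
    """Precision for each class; 0.0 when the class is never predicted."""
    result = {}
    for label in labels:
        predicted = [t for t, p in zip(y_true, y_pred) if p == label]
        hits = sum(1 for t in predicted if t == label)
        result[label] = hits / len(predicted) if predicted else 0.0
    return result


# 98 good pieces (0) and 2 scraps (1); a model that always predicts "good"
y_true = [0] * 98 + [1] * 2
y_pred = [0] * 100
print(per_class_precision(y_true, y_pred))  # {0: 0.98, 1: 0.0}
```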

21 Industry 4.0 – Predictive Quality
Data Extraction → Descriptive Statistics, Visual Exploration, Data Cleaning → Feature Engineering → Classification → Show Results
Precision: medium / low
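A sketch of the descriptive-statistics and cleaning steps, assuming hypothetical sensor rows with a `temp_c` column and a plausible range of 150 to 250 °C chosen only for illustration: summary statistics expose missing values and a suspicious maximum, and cleaning drops the rows that cannot be trusted.

```python
import statistics


def describe(rows, column):
    """Basic descriptive statistics for one numeric column, ignoring missing values."""
    values = [r[column] for r in rows if r.get(column) is not None]
    return {
        "count": len(values),
        "missing": len(rows) - len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


def clean(rows, column, low, high):
    """Drop rows whose reading is missing or outside a plausible physical range."""
    return [r for r in rows if r.get(column) is not None and low <= r[column] <= high]


rows = [{"temp_c": t} for t in (180.0, 182.0, None, 181.0, 999.0, 179.0)]
print(describe(rows, "temp_c"))  # a max of 999.0 hints at a sensor glitch
cleaned = clean(rows, "temp_c", low=150.0, high=250.0)
print(len(cleaned))  # 4
```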

22 Industry 4.0 – Predictive Quality
Data Extraction → Descriptive Statistics, Visual Exploration, Data Cleaning → Feature Engineering → Classification → Show Results
Precision: above medium / above low
Lesson learned: some shifts must be filtered out, and we must add additional pieces of information.
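The shift-filtering lesson can be sketched as a per-shift summary followed by a filter. The slide does not give the filtering criterion; here we assume, purely for illustration, that shifts with too few processed items (for example, shifts mostly spent on maintenance) are unrepresentative. The field names `shift` and `scrap` are hypothetical.

```python
from collections import defaultdict


def shift_summary(rows):
    """Items processed and scrap rate for each shift."""
    counts = defaultdict(lambda: [0, 0])  # shift -> [items, scraps]
    for r in rows:
        counts[r["shift"]][0] += 1
        counts[r["shift"]][1] += r["scrap"]
    return {s: {"items": n, "scrap_rate": k / n} for s, (n, k) in counts.items()}


def filter_shifts(rows, min_items):
    """Keep only rows from shifts that processed at least `min_items` items."""
    summary = shift_summary(rows)
    keep = {s for s, v in summary.items() if v["items"] >= min_items}
    return [r for r in rows if r["shift"] in keep]


rows = (
    [{"shift": "A", "scrap": s} for s in (0, 1, 0, 0)]
    + [{"shift": "B", "scrap": s} for s in (0, 0, 0, 0)]
    + [{"shift": "N", "scrap": 1}]  # a near-empty shift with a misleading 100% scrap rate
)
print(shift_summary(rows))
filtered = filter_shifts(rows, min_items=3)
```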

23 Industry 4.0 – Predictive Quality
Data Extraction → Hard Cleaning → Clean Data → Feature Engineering → Classification → Show Results
Precision: high / high

24 Industry 4.0 – Predictive Quality
- Reducing scraps by working on “critical” contexts
- Simulating different contexts to “explore” new configurations (e.g., a multi-armed bandit over team configurations)
- Reducing the cost of each scrap
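The bandit idea can be sketched with epsilon-greedy, one of the simplest multi-armed bandit strategies (the slide does not say which strategy was used). Each arm is a team configuration and the reward is 1 when a piece is not scrapped. The success probabilities below are simulated and hypothetical; on a real line they are exactly what is unknown.

```python
import random


def epsilon_greedy(success_probs, rounds=5000, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy over team configurations (arms).

    success_probs: simulated probability that a piece is NOT scrapped under
    each configuration; the algorithm only observes the sampled rewards.
    """
    rng = random.Random(seed)
    n_arms = len(success_probs)
    pulls = [0] * n_arms
    values = [0.0] * n_arms  # running mean reward per arm
    for _ in range(rounds):
        if rng.random() < epsilon or 0 in pulls:
            arm = rng.randrange(n_arms)  # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        reward = 1.0 if rng.random() < success_probs[arm] else 0.0
        pulls[arm] += 1
        values[arm] += (reward - values[arm]) / pulls[arm]  # incremental mean
    return pulls, values


pulls, values = epsilon_greedy([0.80, 0.95, 0.70])
print(pulls, [round(v, 3) for v in values])
```

The epsilon parameter trades off trying other configurations against exploiting the best one found so far; in a plant, every exploration round has a real scrap cost.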

25 FROM PRODUCT TO SERVICES PARADIGM

26 Rolls-Royce
- 1904: Charles Stewart Rolls and Frederick Henry Royce join forces to found Rolls-Royce.
- 1915: the Rolls-Royce Eagle was the first aero engine developed by Rolls-Royce Limited.
- 1987: in April 1987 the government offered all Rolls-Royce plc shares for sale.
- 1996: birth of TotalCare®, an engine repair service for American Airlines.
- 2013: services account for 47% of total aircraft-engine revenue (£7.3 bn).
- 2016: 80% of Rolls-Royce engines are not sold but rented out on an hourly basis.

27 WHO WORKS WITH DATA

28 Data team members: Data Scientist, Data Engineer, Data Architect

29 Data Scientist
Definition: “Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician.” (Josh Wills, Slack Director of Data Engineering)
Must know: Python, SQL, supervised/unsupervised models, linear algebra, statistics
Main everyday tasks:
- Formalizing any given problem into specific research questions and looking for state-of-the-art solutions to them
- Designing and developing proofs of concept and prototypes to show the real value behind data and algorithms
- Translating proofs of concept into something business people can understand, and creating stunning presentations

30 Data Engineer
Definition: a Data Engineer is a Data Scientist who prefers talking about infrastructures and design patterns over Bayesian statistics and XGBoost classifiers.
Must know: Java/Scala, Python, SQL and NoSQL DBs, design patterns
Main everyday tasks:
- Moving proofs of concept from the data scientist's playground to production
- Designing, constructing, installing, testing and maintaining highly scalable data management systems
- Employing a variety of languages and tools (e.g., scripting languages) to marry systems together

31 Data Architect
Definition: a Data Architect is a senior Data Scientist/Engineer with a lot of experience in enterprise infrastructures.
Must know: Hadoop, SQL and NoSQL DBs, design patterns, enterprise infrastructures, security, lambda/gamma architecture, Docker
Main everyday tasks:
- Building data products from scratch
- Designing and deploying complex data processing workflows
- Deploying Big Data-ready environments to support the work of Data Engineers and Data Scientists

32 The team
- Ideal world: 1 Data Architect / data team manager; 2-3 Data Engineers to support development and go-lives; 3-5 Data Scientists for fast prototyping and complex models and analyses
- Real world: 1 Data Superman

38 Data Superman
- Learn how to use the whole data toolbox: you must be able to face the majority of IT challenges by yourself, from bash to Dockerfiles.
- Learn how to explain your work: a good story with average results is better than a boring story with good results (stunning results always win).
- Learn how to write “good code”: work as if the person you will one day show your code to is a psychotic killer.
- Learn how to understand business needs: well-posed questions are very rare.
- Learn how to make models work: a Data Superman needs to be comfortable with mathematics and statistics.

39 ENTERPRISE DATA SCIENTIST TOOLBOX

40 2-3 tools…

41 A short list basket
- Programming language

42 A short list basket – Programming Language
- Higher-level: Python, R
- Lower-level: C++, Scala, Java

43 A short list basket – Programming Language
Why Python:
- fast prototyping language
- easy to install and to manage
- lambda functions
- TensorFlow + scikit-learn
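A small illustration of the lambda-function point: anonymous functions make it quick to filter, transform and sort records inline while prototyping. The readings, machine names and the 180 °C setpoint are hypothetical.

```python
readings = [
    {"machine": "M1", "temp_c": 182.5},
    {"machine": "M2", "temp_c": 175.0},
    {"machine": "M3", "temp_c": 191.2},
]
SETPOINT_C = 180.0  # hypothetical nominal temperature

# Keep machines above the setpoint, compute their deviation, sort worst first.
too_hot = filter(lambda r: r["temp_c"] > SETPOINT_C, readings)
deviations = map(lambda r: (r["machine"], r["temp_c"] - SETPOINT_C), too_hot)
ranked = sorted(deviations, key=lambda pair: pair[1], reverse=True)
print([machine for machine, _ in ranked])  # ['M3', 'M1']
```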

44 A short list basket
- Programming language: Python
- Big Data framework

45 A short list basket – Big Data Framework
Apache Flink, Apache Spark, MapReduce, Apache Storm

46 A short list basket – Big Data Framework
Apache Spark. Why Spark:
- in-memory computation
- great community
- stable and easy to manage
- MLlib is great for fast prototyping
- offered “as a service” by different players (Amazon EMR, Google Dataproc, Cloudera, Databricks, …)
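Spark's programming model is built on transformations such as `map` and `reduceByKey` over partitioned data; in PySpark, summing scrap flags per machine from `(machine, scrap)` pairs would look roughly like `sc.parallelize(records).reduceByKey(lambda a, b: a + b).collect()`. To show what that does without a cluster, here is a pure-Python imitation (not the Spark API): each partition first combines its own partial sums, then the partial results are merged, which is how `reduceByKey` limits the data shuffled between nodes. The records are hypothetical.

```python
from collections import Counter


def combine_partition(partition):
    """Local (map-side) combine: sum scrap flags per machine within one partition."""
    partial = Counter()
    for machine, scrap in partition:
        partial[machine] += scrap
    return partial


def reduce_by_key(partitions):
    """Merge the per-partition partial sums into global totals."""
    total = Counter()
    for partial in map(combine_partition, partitions):
        total.update(partial)  # Counter.update adds counts instead of replacing them
    return dict(total)


# Hypothetical (machine, scrap) records split across two partitions
partitions = [
    [("M1", 1), ("M2", 0), ("M1", 0)],
    [("M3", 1), ("M1", 1), ("M2", 1)],
]
print(reduce_by_key(partitions))  # {'M1': 2, 'M2': 1, 'M3': 1}
```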

47 A short list basket
- Programming language: Python
- Big Data framework: Apache Spark
- DB

48 A short list basket – DB
MySQL, MariaDB, ElasticSearch + Kibana, Cassandra, MongoDB

49 A short list basket – DB
Why ElasticSearch + Kibana:
- Big Data ready
- fast full-text search (based on Lucene)
- ultra-fast dashboarding (Kibana)
Why MongoDB:
- Big Data ready
- document-based queries
- BSON + arrays
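Full-text search in Lucene, and therefore in ElasticSearch, rests on an inverted index: a map from each term to the documents that contain it. A toy version over hypothetical maintenance notes is sketched below; the real engines add text analyzers, relevance scoring and compressed on-disk structures on top of this idea.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split into alphanumeric terms (a very crude analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing ALL query terms (boolean AND)."""
    sets = [index.get(t, set()) for t in tokenize(query)]
    return set.intersection(*sets) if sets else set()


notes = {
    1: "Spindle vibration on Machine 3",
    2: "Machine 3 calibrated after shift change",
    3: "Vibration alarm on press line",
}
index = build_inverted_index(notes)
print(search(index, "machine vibration"))  # {1}
```

A query only touches the posting sets of its own terms, which is why lookups stay fast as the number of documents grows.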

50 A short list basket
- Programming language: Python
- Big Data framework: Apache Spark
- DB: ElasticSearch + Kibana or MongoDB
- Other

51 A short list basket – Other
Plotly, scikit-learn, TensorFlow, Jupyter

52 A short list basket
- Programming language: Python
- Big Data framework: Apache Spark
- DB: ElasticSearch + Kibana or MongoDB
- Other: Python libraries (Plotly, scikit-learn, TensorFlow); “IDE”: Jupyter

53 A data product
- Backend: Spark (ETL, ML algorithm) and MongoDB
- Frontend: Python (Plotly, Django)
- GUI: HTML+JS (HTML5, Bootstrap)

54 A short list basket
- Programming language: Python
- Big Data framework: Apache Spark
- DB: ElasticSearch + Kibana or MongoDB
- Other: Python libraries (Plotly, scikit-learn, TensorFlow); “IDE”: Jupyter
- Container engine: Docker

55 OPPORTUNITIES IN MAGNETI MARELLI

56 Opportunities in MAGNETI MARELLI
Marta Ragazzi: Andrea Condorelli: Big Data Workshop October 31st, 2015

