EXPLORING PROCESS OF DOING DATA SCIENCE VIA AN ETHNOGRAPHIC STUDY OF A MEDIA ADVERTISING COMPANY J.SALTZ, I.SHAMSHURIN 2015 IEEE INTERNATIONAL CONFERENCE.

1 EXPLORING PROCESS OF DOING DATA SCIENCE VIA AN ETHNOGRAPHIC STUDY OF A MEDIA ADVERTISING COMPANY J.SALTZ, I.SHAMSHURIN 2015 IEEE INTERNATIONAL CONFERENCE ON BIG DATA 29 OCTOBER, SANTA CLARA, CA, USA

2 OUTLINE
Introduction
Related Work
Data Collection
Findings
Observed Issues
Possible Improvements
SCHOOL OF INFORMATION STUDIES | SYRACUSE UNIVERSITY

3 INTRODUCTION
Data science teams lack an explicit, team-based process methodology that answers questions such as:
  o What steps should be done first?
  o How long should each phase of a project take?
  o Which people, with what skills, should be involved in the project?

4 RELATED WORK
Data is increasingly viewed as a strategic resource for the organization [Wade, 2004]
Big Data can enable new and improved business models that were not feasible in the past [Tiefenbacher, 2015]
There has been little focus on the process teams should use to actually carry out a data science project [Saltz, 2015]
Teams do data analysis and data science work in an ad hoc fashion, using trial and error to identify the right tools [Bhardwaj, 2015]
Data science as a step-by-step process:
  o Acquisition, information extraction and cleaning, data integration, modeling, analysis, interpretation and deployment [Jagadish, 2014]
  o Preparation, analysis, reflection and dissemination [Guo, 2013]
One way to understand what an appropriate data science process methodology might be is to document case studies of how teams actually do data science, especially within a corporate context [Saltz, 2015]

5 BACKGROUND AND STAKEHOLDER ANALYSIS
One of the researchers was embedded within the data science team
A global media advertising software company headquartered in New York City
The company had a total of 100 people distributed globally

6 RESEARCH QUESTIONS
RQ1. What methodology does the team currently follow?
RQ2. What are some possible ways to improve the current methodology, i.e., to make projects more efficient in time and cost?

7 DATA COLLECTION
Phase I: information was collected before one of the researchers was embedded within the data science team
Phase II: during a 9-week period, one of the researchers participated as a member of the data science team; in addition to collecting data and observing how the team functioned, the researcher actually helped the team with various tasks
Phase III: an interview with the VP of Data Science

8 DATA SCIENCE TEAM
Two data scientists, including the VP of Data Science
Three data operations staff
Three software developers
One data engineer
The team was divided across multiple locations

9 FINDINGS: TYPES OF PROJECTS
Routine projects
  o performed on a regular basis
  o more external
  o involve data transformation and pre-processing
  o performed by the data group
  o have deadlines
Exploratory projects
  o research-oriented
  o no standard methodology is used
  o performed by the VP of Data Science and the embedded researcher
  o duration can vary from a week to a year
  o no official deadlines

10 FINDINGS: ROLES
Data Science
  o Explores the data and generates insight from it, including tasks such as data mining and data visualization. This team included the data scientist who was the embedded observer.
Data Operations
  o Obtains data from data providers, then transforms and prepares it for analysis (i.e., for use by the data science team)
Software Development
  o Develops software tools to help the data science team perform data analysis
Data Engineering
  o Supports and improves the existing system and participates in some data science projects

11 FINDINGS: HIGH-LEVEL PROCESS DESCRIPTION

12 PROCESS FLOW DESCRIPTION: PREPARATION

13 PROCESS FLOW DESCRIPTION: ANALYSIS

14 PROCESS FLOW DESCRIPTION: DISSEMINATION

15 OBSERVED ISSUES / CHALLENGES
No specific deadlines for the data science project as a whole or for any individual phase of it
Project organization and planning
Whenever the data science team needs a task completed, it sends a request to the developers, who typically respond only the following day
Developers are involved in several projects at the same time

16 POSSIBLE PROCESS IMPROVEMENTS
Documenting the current process
Better structuring developer interactions
Imposing deadlines
Process automation
Better preparation

17 FEEDBACK ON SUGGESTIONS
Suggestion                                  Makes sense?   Can implement?   Short or long term?
Documenting the current process             4              4                Short
Better structuring developer interactions   3              2                Long
Imposing deadlines                          4              3                Long
Process automation                          4              3                Long
Better preparation                          3              3                Short

18 EFFECTIVE PRACTICES OBSERVED
Pre-processing
Engaging senior management through frequent dialog
Using a defined SDLC (software development life cycle) with the software team

19 CONCLUSION
The data science team was not explicitly thinking about the process it used to carry out projects
The suggestions received positive feedback
Studying additional organizations could help determine whether the suggestions and feedback from this study depend on the company's current size, organizational structure, or domain, and whether any patterns emerge across organizations doing data science projects

20 THANK YOU

