Toolbox of a data scientist: multiple approaches to work with behavioural data Philippe J. Giabbanelli, PhD Data Insight Meetup, February 5th 2015.


Presentation transcript:

1 Toolbox of a data scientist: multiple approaches to work with behavioural data Philippe J. Giabbanelli, PhD Data Insight Meetup, February 5th 2015

2 Outline: 1 – What’s data science? 2 – What questions can we ask of behavioural data? 3 – How do we use data science tools to get answers? Food behaviours, drinking behaviours, insurgencies.

3 What’s data science? Visualization, data mining, simulation and modelling.

4 Imagine that people have completed some kind of questionnaire. Typically you get an Excel spreadsheet, and you’d like to understand what relates to the target behaviour. Tool: Tableau.

5 Imagine that you have a very complex system, where tons of variables interact… You may want to look at it as a network. Tool: Gephi.

6

7 What if you have a lot of text instead?

8 Here I am primarily concerned with visualization as seen from a data scientist’s viewpoint. I would use…
Tool: Data
Tableau, Qlik, Spotfire: relational (spreadsheet)
Gephi or Visone: network
Datawatch: streaming relational
Many-eyes: a bit of everything
GeoTime: spatial data over time
Jigsaw, CZSaw, InSpire, Leximancer: text
Some of these tools are marked $ on the slide.
Viz as data scientist ≠ making pretty pictures. If you’re producing a visual for an audience, you show what you found. When you start with viz as a data scientist, you want to find something! (Image: Visual Capitalist)

9 Abusing the tool
Wasting computer resources for useless displays. If you watch CSI, you’ll see that when they search for a fingerprint match, the software shows all fingerprints it has!
Proper statistical testing. If it looks like your data is normally distributed, that must be it, right? Relying on visuals instead of doing proper statistics.
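
To make the "proper statistical testing" point concrete, here is a minimal sketch of checking normality with a test rather than by eye. It uses the Jarque-Bera statistic, which is my choice for illustration and not something named in the talk; the function name and both samples are made up. The p-value relies on a large-sample approximation, so treat it as rough for small data sets.

```python
import math
from statistics import NormalDist, fmean


def jarque_bera(sample):
    """Return (JB statistic, approximate p-value) for the hypothesis of normality."""
    n = len(sample)
    mean = fmean(sample)
    # Central moments of order 2, 3 and 4.
    m2 = fmean((x - mean) ** 2 for x in sample)
    m3 = fmean((x - mean) ** 3 for x in sample)
    m4 = fmean((x - mean) ** 4 for x in sample)
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    # Under normality JB is asymptotically chi-square with 2 degrees of freedom,
    # whose survival function is exactly exp(-x / 2).
    return jb, math.exp(-jb / 2)


# Made-up samples: evenly spaced normal quantiles, and a right-skewed series.
bell = [NormalDist().inv_cdf((i + 0.5) / 200) for i in range(200)]
skewed = [i ** 2 for i in range(200)]

for name, sample in (("bell", bell), ("skewed", skewed)):
    statistic, p_value = jarque_bera(sample)
    print(f"{name}: JB={statistic:.2f}, p={p_value:.4f}")
```

A small p-value is evidence against normality; a large one only means the test found no departure, not that the data is normal.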

10 Abusing the tool. When all you have is a hammer, everything starts looking like a nail.

11 What’s data science? Visualization, data mining, simulation and modelling.

12 What’s data science? Imagine that you’re working for CSI (again!) and you want to identify the dude in the picture. When you know what you’re after, and it can be mathematically expressed, data mining helps.

13 [Figure: rules separating binge drinkers from non-binge drinkers. Variables: Rules, Communication. Answer levels: Often, Very often, Daily, Weekly, Never. Splits: A: rules (if ≥ often / if < often); B: comm.]

14 What’s data science? Data mining involves automatically testing lots of hypotheses by searching for combinations of variables that might show a correlation. Which variables are in the winning combination? You partly do data mining to answer this question… A. Wood, Data Manager. « For every variable that you seek to collect, provide a detailed rationale. » V. Lo, Ethics Board.
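
As a rough illustration of "automatically testing lots of hypotheses", the sketch below tries every combination of up to two variables and every observed cut-off, and keeps the rule that best separates the two groups. The variable names, the 0 to 4 scores and the labels are invented toy data, and this brute-force search is a stand-in for a real classifier, not the method used in the talk.

```python
from itertools import combinations, product


def rule_accuracy(rows, labels, rule):
    """Share of respondents whose label matches the rule's prediction.

    A rule is a tuple of (variable, cut-off) pairs; it predicts True
    when every variable is at or above its cut-off.
    """
    hits = sum(
        all(row[var] >= cut for var, cut in rule) == label
        for row, label in zip(rows, labels)
    )
    return hits / len(rows)


def best_rule(rows, labels, max_vars=2):
    """Try every combination of up to max_vars variables and every observed cut-off."""
    variables = sorted(rows[0])
    best = (0.0, ())
    for size in range(1, max_vars + 1):
        for combo in combinations(variables, size):
            cut_options = [sorted({row[v] for row in rows}) for v in combo]
            for cuts in product(*cut_options):
                rule = tuple(zip(combo, cuts))
                score = rule_accuracy(rows, labels, rule)
                if score > best[0]:
                    best = (score, rule)
    return best


# Invented questionnaire scores (0 = never ... 4 = very often) and labels.
rows = [
    {"social": 4, "coping": 3, "conformity": 0},
    {"social": 3, "coping": 2, "conformity": 2},
    {"social": 4, "coping": 4, "conformity": 1},
    {"social": 4, "coping": 1, "conformity": 2},
    {"social": 1, "coping": 3, "conformity": 0},
    {"social": 2, "coping": 4, "conformity": 1},
    {"social": 0, "coping": 0, "conformity": 2},
    {"social": 3, "coping": 1, "conformity": 0},
]
binge = [True, True, True, False, False, False, False, False]

score, rule = best_rule(rows, binge)
print(score, rule)
```

The winning rule tells you which variables are in the winning combination. Because so many hypotheses are tested, some rules will fit by chance, so a rule found this way should be checked on data that was not used in the search.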

15 What’s data science? Visualization, data mining, simulation and modelling.

16 I offered coupons to some customers. Would they spend more? Who should I target? I raised prices of fast foods. Would it curb obesity? Who would benefit the most? I put people on antiretroviral therapy when they don’t have AIDS. Would it help? For whom? There are lots of big questions for which you don’t necessarily have all the data. Also, methods that help you understand what happened may not help you know what may happen if…

17 What’s data science? Imagine that you want to change the urban environment to see if it helps people exercise more. You hopefully won’t be doing that. Rather you might want to create a virtual environment that simplifies reality so you can test your hypothesis safely.

18 What’s data science?

19 What’s data science? There are lots of ways to do modelling, depending on desired spatial & individual resolution. The most common approaches are agent-based modelling (ABM) and system dynamics (SD).
Tool: Approach
Anylogic: ABM / SD
NetLogo: ABM
Vensim, iThink: SD
Some of these tools are marked $ on the slide.

20 What’s data science? Also: The emergence of Computational Sociology (J. of Math. Soc., ‘95); Why model? (JASSS ’08)

21 Data Science as a Technique: Visualization, Modelling & Simulation, Data mining & Machine Learning. Applications: Defense; Health (chronic diseases, infectious diseases).

22 Why? Tell me what people will do in the future!

23 Applications of Data Science. How would climate change policies impact the health of Canadians by 2030? Inputs: simulated data for 2030 (dietary patterns, built environment, socio-economics). Systems model. Outputs: expected health impacts (physical health, well-being).

24

25 Applications of Data Science. There are many reasons other than prediction to do data science. Explaining: to simulate far into the future, you need to understand what you have now and how it changes. 1 - Explain, 2 - Predict.

26 Applications of Data Science. There are many reasons other than prediction to do data science. Explaining: “Electrostatics explains lightning, but we cannot predict when or where the next bolt will strike.” “Plate tectonics explains earthquakes, but does not permit us to predict the time and place of their occurrence.”

27 Applications of Data Science. There are many reasons other than prediction to do data science. Explaining: Schelling’s model of segregation. A preference that one's neighbors be of the same color, or even a preference for a mixture "up to some limit", could lead to total segregation.
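
A minimal sketch of a Schelling-style model shows the effect. The grid size, the share of empty cells, the 40% tolerance threshold and the "move to a random empty cell" rule are all assumptions chosen for illustration, not parameters from the talk.

```python
import random


def make_grid(size, empty_frac, rng):
    """Shuffle two equally sized groups ('A', 'B') and empty cells (None) onto a grid."""
    n_cells = size * size
    n_empty = int(n_cells * empty_frac)
    n_a = (n_cells - n_empty) // 2
    cells = [None] * n_empty + ["A"] * n_a + ["B"] * (n_cells - n_empty - n_a)
    rng.shuffle(cells)
    return [cells[r * size:(r + 1) * size] for r in range(size)]


def like_fraction(grid, r, c):
    """Share of occupied neighbours (8 cells, wrapping edges) in the same group."""
    size = len(grid)
    me = grid[r][c]
    same = total = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            other = grid[(r + dr) % size][(c + dc) % size]
            if other is not None:
                total += 1
                same += other == me
    return same / total if total else 1.0


def segregation(grid):
    """Average like-neighbour share over all agents (about 0.5 when fully mixed)."""
    size = len(grid)
    shares = [like_fraction(grid, r, c)
              for r in range(size) for c in range(size)
              if grid[r][c] is not None]
    return sum(shares) / len(shares)


def step(grid, threshold, rng):
    """Move each agent that was unhappy at the start of the step to a random empty cell."""
    size = len(grid)
    empties = [(r, c) for r in range(size) for c in range(size) if grid[r][c] is None]
    unhappy = [(r, c) for r in range(size) for c in range(size)
               if grid[r][c] is not None and like_fraction(grid, r, c) < threshold]
    rng.shuffle(unhappy)
    for r, c in unhappy:
        i = rng.randrange(len(empties))
        er, ec = empties[i]
        grid[er][ec], grid[r][c] = grid[r][c], None
        empties[i] = (r, c)  # the vacated cell becomes available
    return len(unhappy)


def run(size=20, empty_frac=0.1, threshold=0.4, max_steps=100, seed=1):
    """Return the segregation index before and after, plus the final grid."""
    rng = random.Random(seed)
    grid = make_grid(size, empty_frac, rng)
    before = segregation(grid)
    for _ in range(max_steps):
        if step(grid, threshold, rng) == 0:
            break
    return before, segregation(grid), grid


before, after, _ = run()
print(f"like-neighbour share: {before:.2f} -> {after:.2f}")
```

With a threshold of 0.4, every agent is content to be in a local minority, yet the average share of like neighbours still rises well above the mixed starting point. The model explains a mechanism; it does not predict any particular city.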

28 Applications of Data Science. There are many reasons other than prediction to do data science. What are the core dynamics in my problem? Where are the gaps? Where do I need to collect data? What would happen if? How can we best do monitoring and surveillance?

29 Illuminate core dynamics. “There is increasing evidence that social influence and social network structures are significant factors in obesity.” Eating, exercising.

30 Illuminate core dynamics. To what extent could social influences account for the dynamics of obesity? Let’s tackle the question using modelling & simulation.

31 Illuminate core dynamics

32 Illuminate core dynamics

33 Illuminate core dynamics. Motivating question: to what extent is this model supported by interviewees? Let’s tackle this question using interactive visualizations.

34 We measured the strength of a relationship between two factors as the number of responses in the interviews that used words relevant to both factors.
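
The measure above can be sketched as a simple co-occurrence count. The factor names, keyword lists and interview responses below are invented for illustration, and matching on exact lowercase words is a simplifying assumption; the actual study may have matched words differently.

```python
import re
from itertools import combinations


def factors_mentioned(response, keywords):
    """Return the factors for which the response uses at least one relevant word."""
    words = set(re.findall(r"[a-z']+", response.lower()))
    return {factor for factor, terms in keywords.items() if words & terms}


def relationship_strength(responses, keywords):
    """Count, for each pair of factors, the responses that mention both."""
    strength = {pair: 0 for pair in combinations(sorted(keywords), 2)}
    for response in responses:
        present = sorted(factors_mentioned(response, keywords))
        for pair in combinations(present, 2):
            strength[pair] += 1
    return strength


# Hypothetical factors and the words considered relevant to each.
keywords = {
    "eating": {"eat", "food", "snack"},
    "exercise": {"gym", "walk", "exercise"},
    "stress": {"stress", "stressed", "anxious"},
}

# Hypothetical interview responses.
responses = [
    "When I am stressed I snack a lot.",
    "I walk to the gym after work.",
    "Stress makes me skip the gym and eat junk food.",
]

for pair, count in relationship_strength(responses, keywords).items():
    print(pair, count)
```

Each count can then be used as the weight of the edge between two factors when the model is drawn as a network.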

35 Explaining. Can we explain why people engage in binge drinking? Let’s start with modelling and simulation, and make some hypotheses. Structure and process: you select peers with whom to drink… and then, their drinking habits influence yours.

36 Explaining. If we assume (1) that individuals select similar peers, (2) that individuals are prompted to drink if at least a fraction of their peers drink, and (3) that one’s context, known from drinking motives, may deter or promote drinking, then we can correctly infer the behaviour of half of the binge drinkers and 4 out of 5 non-binge drinkers. But without making any assumptions ourselves, if we just used data mining we would get roughly the same accuracy. The computer would build an explanation for us.
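
The three assumptions can be sketched as a tiny simulation. This is not the published model: the single "trait" score used for similarity, the fixed number of peers, the 0.5 threshold and the way context shifts the peer share are all hypothetical choices made to keep the example short.

```python
def select_similar_peers(traits, k):
    """Assumption 1: each person picks the k others whose trait value is closest."""
    peers = {}
    for i, trait in enumerate(traits):
        others = sorted(
            (j for j in range(len(traits)) if j != i),
            key=lambda j: abs(traits[j] - trait),
        )
        peers[i] = others[:k]
    return peers


def simulate(drinks, peers, context, threshold, rounds):
    """Assumptions 2 and 3: a person drinks when the share of drinking peers,
    shifted by personal context (negative deters, positive promotes),
    reaches the threshold. Everyone updates at the same time."""
    state = list(drinks)
    for _ in range(rounds):
        state = [
            sum(state[j] for j in peers[i]) / len(peers[i]) + context[i] >= threshold
            for i in range(len(state))
        ]
    return state


# Six hypothetical people: two groups with similar trait scores.
traits = [0.0, 0.1, 0.2, 0.8, 0.9, 1.0]
peers = select_similar_peers(traits, k=2)
initial = [False, False, False, True, True, False]

no_context = simulate(initial, peers, [0.0] * 6, threshold=0.5, rounds=5)
deterred = simulate(initial, peers, [0, 0, 0, 0, 0, -0.6], threshold=0.5, rounds=5)
print(no_context)
print(deterred)
```

In the first run the last person starts drinking because both peers do; in the second, a deterring context keeps that person from drinking even under the same peer pressure. Comparing such simulated outcomes with observed behaviour is how the accuracy figures above would be obtained.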

37 Monitoring. The situation might change as you are intervening. How can you monitor changes and adapt? Timeline: March 2011: emergence; escalation; early 2012: militarisation.

38 Visualizations allow the analyst to interactively explore the data and improve the model. The model guides the analyst in the exploration of the new data.

39 There is a lot of potential in the tight coupling of techniques (e.g., modelling / interactive visualizations) but currently you’d have to come up with a technical solution yourself for that.

40 Data science in the world: Visualization, Modelling & Simulation, Data mining & Machine Learning; Defense, Health (chronic diseases, infectious diseases). Challenges: interdisciplinary work (shock of cultures); getting good quality data; needing to understand a very wide range of tools; continuously needing to improve the tools.

41 Challenges – Need new tools

42 Challenges – Interdisciplinary

43 Challenges – Interdisciplinary
In my field, good papers are published in conferences. / In my field, good papers are published in journals.
In my field, we just put data on our website for others. / In my field, we own the data and selectively share it.
Why don’t I just pick a book and learn your whole field? / Why don’t I just watch a couple videos to learn your job?
We need to build mutual trust and accommodate each other in a system that’s unsupportive.

44 Challenges – Getting good data. There is a lot of data out there, but most is unstructured (text, video…) and hard to deal with. There are public repositories for data, but much of it is lists of junk, localisations, or population-level data split at best by age and gender.

45 Challenges – Getting good data. Kaggle.

46 Get in touch? PJ Giabbanelli, Investigator Scientist, University of Cambridge; Founder, Vancouver Computational Modelling.
PJ Giabbanelli. Modelling the spatial and social dynamics of insurgency. Security Informatics ‘14. (Simulation & Modelling in Defense)
Pratt, Giabbanelli & Mercier. Detecting unfolding crises with visual analytics and conceptual maps: emerging phenomena and big data. Proc. of IEEE ISI ‘13. (Visual Analytics + Simulation & Modelling in Defense)
Crutzen & Giabbanelli. Using classifiers to identify binge drinkers based on drinking motives. Substance Use & Misuse ‘14. (Data mining in Health)
Giabbanelli et al. Modeling the influence of social networks and environment on energy balance and obesity. Journal of Computational Science ‘12. (Simulation & Modelling in Health)

