Division of Social Sciences Social Network Data Dan Ryan Mills College Spring 2012
Outline Methods Issues Representations & data structures Pragmatics Research on methods General considerations and best practices
METHODS Surveys and questionnaires Ethnography, observation, fieldwork Unobtrusive observation, trace studies Data mining
Doing it by Hand If you wanted to get something improved or done on behalf of a student at Mills whom would you contact? Axel Bernie Cormac Dagmar Ernesto
Doing it by Hand If you wanted to get something improved or done on behalf of a student at Mills whom would you contact? Ernesto Bernie Cormac Florian Goldi Axel
Node List or Edgelist
Node list:
Axel: Bernie Cormac Ernesto Florian Goldi
Ernesto: Axel Bernie Cormac Dagmar
Edgelist:
Axel Bernie
Axel Cormac
Axel Ernesto
Axel Florian
Axel Goldi
Ernesto Axel
Ernesto Bernie
Ernesto Cormac
Ernesto Dagmar
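The two layouts on this slide hold the same information, and converting between them is mechanical. A minimal Python sketch, assuming the node list is stored as a dictionary from each respondent to the people they named (the function name `node_list_to_edgelist` is illustrative, not from the slides):

```python
# Node list from the slide: each ego followed by the alters they named.
node_list = {
    "Axel": ["Bernie", "Cormac", "Ernesto", "Florian", "Goldi"],
    "Ernesto": ["Axel", "Bernie", "Cormac", "Dagmar"],
}

def node_list_to_edgelist(nodes):
    """Expand {ego: [alters]} into one (ego, alter) pair per nomination."""
    return [(ego, alter) for ego, alters in nodes.items() for alter in alters]

edgelist = node_list_to_edgelist(node_list)
for ego, alter in edgelist:
    print(ego, alter)
```

The node list is more compact to write by hand; the edgelist has one row per tie, which makes it easier to attach attributes to each tie later.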
Actual (ego) Network Survey Check boxes for “yes” or “no”
Class Exercise Select a partner Collect partner demographics Get partner to generate list of “confidantes” For each confidante, ask a set of link Qs Construct Square matrix for a CSS interview Include: have you X with A and B
Ink or Electrons? Paper surveys – Researcher fill-out vs. respondent fill-out – May feel more confidential, etc. to respondents – Expense (copying, postage, etc.) – Data entry errors and time Electronic – Email, web, tablet – Learning curve on tool – Span distances, time zones, larger N – Lower response rate and less respondent selectivity – Data integrity advantages (errors and time)
How often do you turn to colleague for…
Columns: Problem Solving | Support
Rows (colleagues): Atkins, Bair, Baker, Benson, Calder, Carlson, Church, Daven, Fiola, Fleming, Harris, Hoberman, Huttle
Problem Solving: How often do I go to this person for help with technical work-related problems?
Support: How often do I go to this person when I need help with a difficult situation at work?
Answer each question on a scale from 0 to 4: (0) Rarely or never; (1) Every few months; (2) Every few weeks; (3) Every week; (4) Every day
Hoppe, B.
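A completed form of this kind yields valued (weighted) ties, one network per question. A rough Python sketch, assuming one respondent's answers are stored as a dictionary; the ratings below are made up for illustration and are not data from the slide, and treating a rating of 0 as "no tie" is an assumption, not a rule from the survey:

```python
# Hypothetical answers from one respondent (not data from the slide),
# on the 0-4 scale: colleague -> (problem_solving, support).
respondent = "Atkins"
answers = {"Bair": (3, 0), "Baker": (0, 0), "Benson": (4, 2)}

def valued_edges(ego, ratings, question_index):
    """Return (ego, alter, weight) for every nonzero rating on one question."""
    return [
        (ego, alter, rating[question_index])
        for alter, rating in ratings.items()
        if rating[question_index] > 0  # assumption: 0 means no tie
    ]

problem_solving = valued_edges(respondent, answers, 0)
support = valued_edges(respondent, answers, 1)
```

Keeping the two questions as separate edge lists preserves the distinction between the problem-solving and support networks.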
META-ISSUES Ethics, human subjects, informed consent Validity Reliability Accuracy Precision Scaling and calibration
Ethics Fundamental principle: informed consent & do no harm Problem: in network research we “name names” Example: Mills College residential life “social network project”
Michael Zimmer, “‘But the data is already public’: on the ethics of research in Facebook,” Ethics Inf Technol, June 2010
Charles Kadushin, “Who benefits from network analysis: ethics of social network research,” Social Networks, Volume 27, Issue 2, May 2005, Pages 139–153 (Ethical Dilemmas in Social Network Research)
Measurement Validity = is it what we say it is? – Does “I ask for info” = “alter is expert”? – Does “on CC list” = “considered on the team”? Reliability = same result next time? – Do two Twitter dumps give the same tweets? Accuracy = is it correct? – “List everyone you share meals with…” Precision = how many decimal places? – How often do you get email from her?
DATA STRUCTURES Edgelists Nodelists Full matrix Databases
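To make the "full matrix" structure concrete, here is a small Python sketch that turns a directed edgelist into a square 0/1 matrix. The edges are a subset of the hand-collected example from earlier in the deck; the function name `to_matrix` and the alphabetical row order are choices made for this illustration:

```python
# A few directed edges (source, target) from the hand-collected example.
edgelist = [
    ("Axel", "Bernie"), ("Axel", "Cormac"), ("Axel", "Ernesto"),
    ("Ernesto", "Axel"), ("Ernesto", "Dagmar"),
]

def to_matrix(edges):
    """Build a square adjacency matrix; rows are senders, columns receivers."""
    nodes = sorted({name for edge in edges for name in edge})
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target in edges:
        matrix[index[source]][index[target]] = 1
    return nodes, matrix

nodes, matrix = to_matrix(edgelist)
for name, row in zip(nodes, matrix):
    print(name.ljust(8), row)
```

The matrix has a cell for every possible pair, so it grows with the square of the number of nodes, while the edgelist grows only with the number of ties.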
PRAGMATICS Collecting ego network data Snowball methods Web crawling, scraping, databases, APIs Wearable computers "Data exhaust" Cognitive social structure interviews Bounding and sampling issues
Snowball Sampling Start with small set and ask about alters Then repeat with the alters
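The two steps on this slide can be sketched as a loop over "waves". This is an illustrative Python sketch only: the `named_by` dictionary stands in for actually interviewing people and is invented for the example, and the `snowball` function and its `waves` parameter are assumptions, not part of the original slides.

```python
# Hypothetical answers: who each person would name if asked (not real data).
named_by = {
    "Axel": ["Bernie", "Cormac"],
    "Bernie": ["Dagmar"],
    "Cormac": ["Axel", "Ernesto"],
    "Dagmar": [],
    "Ernesto": ["Florian"],
}

def snowball(seeds, ask, waves):
    """Interview the seeds, then their newly named alters, for `waves` rounds."""
    sampled = set(seeds)
    edges = []
    current = list(seeds)
    for _ in range(waves):
        next_wave = []
        for ego in current:
            for alter in ask(ego):
                edges.append((ego, alter))
                if alter not in sampled:
                    sampled.add(alter)
                    next_wave.append(alter)
        current = next_wave
    return sampled, edges

sampled, edges = snowball(["Axel"], lambda person: named_by.get(person, []), waves=2)
```

With two waves starting from Axel, Florian is never reached, which illustrates how the choice of seeds and the number of waves bound the network that gets observed.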
What is a bot? aka web robots, WWW robots program that visits web pages & does things… …like… …recording what page points to …indexing all the words on the page …scraping page content into a database
Web Crawling
X = getStartURL()
createEdgesForPage(X)

createEdgesForPage(url)
    stopIfDone
    if anyUnvisitedLinks(url) then
        Y = getNextUrl
        createVertex(Y)
        createEdge(url, Y)
        createEdgesForPage(Y)
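A runnable variation on the crawling pseudocode, as a Python sketch. To keep it self-contained it crawls a small made-up dictionary of links rather than fetching real pages; the `links` data, the `crawl` function, and the `max_pages` limit (standing in for "stopIfDone") are all assumptions for illustration. It also differs from the slide in two ways: it uses an explicit stack instead of recursion, and it records an edge for every link on a page, not only links to unvisited pages.

```python
# Hypothetical link structure standing in for real pages (no network access).
links = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html"],
    "c.html": ["a.html"],
}

def crawl(start, get_links, max_pages=100):
    """Visit pages reachable from `start`, recording a directed edge per link."""
    vertices = {start}
    edges = []
    visited = set()
    stack = [start]
    while stack and len(visited) < max_pages:
        url = stack.pop()
        if url in visited:
            continue
        visited.add(url)
        for target in get_links(url):
            vertices.add(target)
            edges.append((url, target))  # direction: linking page -> linked page
            if target not in visited:
                stack.append(target)
    return vertices, edges

vertices, edges = crawl("a.html", lambda url: links.get(url, []))
```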
STOP AND THINK Does web crawling produce a directed or an undirected graph? What does the degree of a vertex in this graph tell you? Might the degree mislead? How can we improve on this?
Web Scraping aka web harvesting or web data extraction Start with knowledge of data on page Scraper told “look here for this and put it here”
Data Base Maintained by Scraper
CRN SUB SEC CRED COURSE DAYS TIME INSTR BLDG ROOM
10283 SOC Methods of Social Research TR 11:00AM-2:15PM Stryker NSB
SOC 1121 Social Control TR 9:30AM-10:45AM Ryan NSB 212
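A minimal scraping sketch in Python using the standard library's `html.parser`. The HTML below is invented for illustration (a real course listing page would be laid out differently), only four of the columns are included, and the `TableScraper` class is an assumption, not the tool shown in the deck. The idea is the one from the previous slide: the scraper is told where on the page the data lives (table cells) and where to put it (one record per row).

```python
from html.parser import HTMLParser

# Hypothetical markup; a real course listing page would differ.
PAGE = """
<table>
<tr><th>COURSE</th><th>DAYS</th><th>TIME</th><th>INSTR</th></tr>
<tr><td>Methods of Social Research</td><td>TR</td><td>11:00AM-2:15PM</td><td>Stryker</td></tr>
<tr><td>Social Control</td><td>TR</td><td>9:30AM-10:45AM</td><td>Ryan</td></tr>
</table>
"""

class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

scraper = TableScraper()
scraper.feed(PAGE)
header, *body = scraper.rows
records = [dict(zip(header, row)) for row in body]  # one dict per course
```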
Example of Commercial Scraper
STOP AND THINK Is scraped data automatically a network?
Web Indexing
X = getStartURL()
createIndexForPage(X)

createIndexForPage(url)
    stopIfDone
    for all word in page(url)
        createEdge(url, word)
    for all Y in unvisitedLinks(url)
        createIndexForPage(Y)
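The indexing pseudocode can be sketched in Python over a small in-memory set of pages. The `pages` dictionary, its contents, and the function name `build_index` are made up for illustration; a real indexer would fetch and parse live pages, and this version uses an explicit stack where the slide uses recursion. Each (url, word) pair is one edge linking a page to a word.

```python
# Hypothetical page text and links (no real pages are fetched).
pages = {
    "a.html": {"text": "social network data", "links": ["b.html"]},
    "b.html": {"text": "network methods", "links": ["a.html"]},
}

def build_index(start, pages):
    """Follow links from `start`, creating one (url, word) edge per word seen."""
    edges = set()
    visited = set()
    stack = [start]
    while stack:
        url = stack.pop()
        if url in visited or url not in pages:
            continue
        visited.add(url)
        for word in pages[url]["text"].lower().split():
            edges.add((url, word))
        stack.extend(pages[url]["links"])
    return edges

index_edges = build_index("a.html", pages)

# Invert the edges to look up which pages contain a given word.
pages_for_word = {}
for url, word in index_edges:
    pages_for_word.setdefault(word, set()).add(url)
```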
STOP AND THINK What will index look like as a network? What relationships can we derive… – …between words? – …between pages?