Human Migration of Open-Source Contributors Kick-off Presentation Erik Kouters Graduation supervisor: A. Serebrenik Graduation tutor: B. Vasilescu.

1 Human Migration of Open-Source Contributors Kick-off Presentation Erik Kouters Graduation supervisor: A. Serebrenik Graduation tutor: B. Vasilescu

2 What have I done so far? Geographical Movement of Mailing List Participants Seminar SET Capita Selecta SET Who’s who in GNOME: using LSA to merge software repository identities ICSM 2012 ERA Track / Software Engineering & Technology PAGE 1, 16-5-2015

3 What are the main topics? Human migration of open-source contributors Identity matching Case study: GNOME

4 Why is human migration of open-source contributors interesting? A passionate contributor would visit a conference. Don't program on Fridays! Contributors that appear as weekend commuters are less likely to introduce bugs on Fridays. Translators that reside in a different country than the country of the target language are expected to deliver translations of lower quality.

5 What’s so interesting about this human migration of open-source contributors? What (geographical) patterns does the migration of open-source contributors follow? Which patterns (source → destination) are most popular? −Commute −Conferences What are the factors that influence this migration? Which factors are most influential?

6 How am I planning to trace these migrations? Extract emails from mailing list archive Resolve emails to location Email A is sent from locationA at timestampA + Email B is sent from locationB at timestampB = migration! But what if the contributor uses multiple email addresses?
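A minimal sketch of the tracing idea on this slide, assuming each email has already been resolved to a location. The `Email` record, its field names, and the `find_migrations` helper are hypothetical names chosen for illustration; the slides do not say how locations are resolved or stored.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Email:
    sender: str        # email address of the contributor
    location: str      # assumed already resolved, e.g. a country code
    sent_at: datetime  # timestamp from the mailing list archive


def find_migrations(emails):
    """Return (sender, from_location, to_location, timestamp) tuples.

    A migration is reported whenever two consecutive emails from the
    same sender were sent from different locations.
    """
    migrations = []
    last_location = {}
    for email in sorted(emails, key=lambda e: e.sent_at):
        previous = last_location.get(email.sender)
        if previous is not None and previous != email.location:
            migrations.append(
                (email.sender, previous, email.location, email.sent_at)
            )
        last_location[email.sender] = email.location
    return migrations
```

The sketch keys on the raw sender address, so a contributor who uses several addresses would be split into several people; that is the gap identity matching is meant to close.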

7 What exactly is Identity Matching? Identifying which aliases belong to the same individual Common in the form george.stefanakis@domainA g.stephanakis@domainB Needs some similarity measure (e.g. edit distance)
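As an illustration of the edit-distance measure mentioned on this slide, here is a small sketch of Levenshtein distance applied to the local parts of two addresses. The helper names `local_part` and `normalised_similarity` are assumptions made for this example, not part of the presented tool.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def local_part(address: str) -> str:
    """Lower-cased part of an email address before the '@'."""
    return address.split("@", 1)[0].lower()


def normalised_similarity(a: str, b: str) -> float:
    """Scale the distance to [0, 1], where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest
```

For the surnames in the example aliases, `levenshtein("stefanakis", "stephanakis")` is 2 (substitute `f` with `p`, insert `h`). Any threshold for treating two aliases as the same person would have to be tuned on real data.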

8 How am I going to match these identities?

9 What will I be doing to improve the identity matching? Increase confidence when merging email addresses: look at fellow recipients (mailing list); look at coauthors (source code repository). Use multiple similarity measures: currently Levenshtein and Cosine Similarity; compare performance with others (e.g. Jaccard, Jaro-Winkler, Dice’s coefficient, etc.). Improve implementation: currently slow; data set limited to system’s memory. Release the tool as open-source (e.g. GitHub). Compare to current implementations.
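To make the comparison of measures concrete, the following sketch computes Jaccard, Dice's coefficient, and cosine similarity over character bigrams. Using bigrams as the features is an assumption for this example; the cosine similarity in the existing tool may well operate on different vectors (for instance LSA vectors), which the slides do not spell out.

```python
from collections import Counter
from math import sqrt


def bigrams(text: str) -> list:
    """Overlapping two-character substrings of the lower-cased text."""
    text = text.lower()
    return [text[i:i + 2] for i in range(len(text) - 1)]


def jaccard(a: str, b: str) -> float:
    """Shared bigrams divided by all distinct bigrams."""
    set_a, set_b = set(bigrams(a)), set(bigrams(b))
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def dice(a: str, b: str) -> float:
    """Twice the shared bigrams divided by the sum of both set sizes."""
    set_a, set_b = set(bigrams(a)), set(bigrams(b))
    if not set_a and not set_b:
        return 1.0
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


def cosine(a: str, b: str) -> float:
    """Cosine of the angle between the two bigram-count vectors."""
    counts_a, counts_b = Counter(bigrams(a)), Counter(bigrams(b))
    dot = sum(counts_a[g] * counts_b[g] for g in counts_a)
    norm = (sqrt(sum(v * v for v in counts_a.values()))
            * sqrt(sum(v * v for v in counts_b.values())))
    return dot / norm if norm else 0.0
```

All three return values in [0, 1], so they can be compared side by side or combined with a normalised edit distance when deciding whether to merge two aliases.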

10 So, what will I be doing? 1. Improve the identity matching algorithm’s performance 2. Run the algorithm on the data from the mailing list archive 3. Send out a questionnaire to verify the results 4. While waiting for the questionnaire, improve the algorithm with more advanced techniques 5. When we have received sufficient responses to the questionnaire, analyse the data and look for patterns

11 A questionnaire? What about privacy? Only the individual can access the data Participation by entering their email address Unique URL (hash) mailed to the email address Data will not be made public Research published based on the data will be anonymised
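One way the "unique URL (hash)" could be produced is sketched below, assuming a keyed hash (HMAC-SHA256) of the normalised email address. The function name, the `secret_key` parameter, and the `https://example.org/questionnaire` base URL are all hypothetical; the slides do not specify which hash or URL scheme is used.

```python
import hashlib
import hmac


def questionnaire_url(email: str, secret_key: bytes,
                      base_url: str = "https://example.org/questionnaire") -> str:
    """Build a per-participant URL from a keyed hash of the email address.

    Without secret_key the token cannot be recomputed from the address,
    so only someone who received the mail should know the URL.
    """
    normalised = email.strip().lower().encode("utf-8")
    token = hmac.new(secret_key, normalised, hashlib.sha256).hexdigest()
    return f"{base_url}/{token}"
```

A keyed hash is used here rather than a plain hash of the address because a plain hash could be recomputed by anyone who knows the address.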

12 How do I confirm the identity matching?

13 How do I confirm the migrations?

14 Looks promising…

15 And what am I hoping to achieve? A more advanced and better performing identity matching algorithm than currently exists Versatile and open-source tool Insight into which patterns skilled workers (open-source contributors) follow when they migrate, and why Work during holiday → Hobbyist Visits conferences → High activity in project More publications!

16 Thank you! Questions?

