Presentation is loading. Please wait.

Presentation is loading. Please wait.

Growing the Site Reliability Team at LinkedIn: Hiring is Hard Greg Leffler Manager, Site Reliability

Similar presentations

Presentation on theme: "Growing the Site Reliability Team at LinkedIn: Hiring is Hard Greg Leffler Manager, Site Reliability"— Presentation transcript:

1 Growing the Site Reliability Team at LinkedIn: Hiring is Hard Greg Leffler Manager, Site Reliability

2 Who am I?  Site Reliability Manager (New York)  MS in Industrial/Organizational Psychology  Responsible for interview process for SREs -Took this responsibility as an IC, so originated from the bottom up  Team grew 10x from August 2011 to May 2014

3 Who are SREs at LinkedIn?  100+ SREs  5 sites, 2 countries  1000+ SW Engineers  8 th busiest website in the world  10k+ prod machines per DC: 2 DCs today +1 in 2014  300+ RESTful services, 300MM+ members  Services with 99 th %ile latencies as low as 10 ms

4 What matters for a great company?  Funding?  Good idea?  Execution?  Product?  People.

5 Obligatory LinkedIn culture plug  Talent is our #1 operating priority.  Our culture is what sets us apart. We are committed to supporting the career transformation of our employees.  Transparency is encouraged and emphasized at every company all- hands meeting -Which occur every other week  Our commitment to our employees is emphasized in how we behave -Everyone is encouraged to do interviews! Yes, everyone. -~60% of SREs participate in the interview process

6 What do we want from SREs?  Excited about LinkedIn and the SRE role -We have the luxury of being picky  Fit our culture and embody our values -These matter. If you haven’t set them or can’t articulate them, you need that 1st  Have the skills needed to do the job -These also matter. You need to know what these are before you screen for them  AND NOTHING ELSE

7 These don’t always work  Coding puzzles  “Fermi problems”  Algorithm design questions  If you were a zebra, what pattern would your stripes have?  Homework  Personality tests  Trivia (quick, which signal is #7 in RHEL 6.4 on x86?)

8 Here’s why  Industrial Psychology has figured this out already  Schmidt & Hunter, 1998 -The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings  Even if they hadn’t, you should collect your own data -And not rely on hunches or cargo cults  Further reading in the notes on this slide.

9 What does work?  Good funnel at the start  Realistic job previews  Structured interviews  Situational judgment tests

10 The LinkedIn SRE funnel  Sourcing/screening  Recruiter prescreen  Operationally-focused phone screen (TPS 1)  Code-focused phone screen (TPS 2)  By the time onsite, we expect they will pass. 82% 24 %

11 Realistic job preview

12 Structured Interview

13 Situational Judgment Test

14 How do we implement these?  Live Troubleshooting (Realistic job preview)  Systems Internals, Web Architecture (Structured interviews)  Triage & Investigation (Situational judgment test)  Host Manager (structured interview for culture and role fit)  Lunch (not an interview… or is it?)

15 Live Troubleshooting  Here’s a broken service (in EC2)  Fix it  (As realistic as it gets)  No ‘man voldemort’  You are probably the 1 st person in the world to troubleshoot the exact situation in question

16 Technical Modules  Added structure and scoring guidelines  Scoring guidelines are what matter  Consistency is the only way you can scientifically prove if these are working  High # of interviewers = need to be able to compare results

17 Triage & Investigation Module  Situational Judgment and Triage -It’s your first day oncall and the NOC calls to say the site is on fire. Here’s the alert board – what do you look at first? Why?  Assesses standard troubleshooting/investigation ability -“The CEO calls you and says ‘the site is slow’ – what do you do?” -“Disk is full. You delete a file but df still shows the disk being full. What’s wrong?”

18 Results of implementing changes  Happier candidates -In fact, no unhappy candidates -“The troubleshooting module was the most fun I’ve ever had in an interview” -“I thought the troubleshooting module was hard but I learned so much”  Happier interviewers -Some hesitation at first -Live Troubleshooting is stressful for the interviewer too! -Solve with training and apprenticeships

19 Data, data, data  We’re collecting scores from each module  Correlating them to performance ratings  Re-evaluating the utility of each module -If a module doesn’t predict performance, get rid of it -This is hard, especially with things people ‘need’ -However, if there’s no correlation, it is worthless.

20 How to make your process better  Make talent your first priority  Implement the good stuff from I/O psych -Realistic job previews -Situational judgment tests -STRUCTURED interviews  Collect data on interview performance (module scores) -Correlate this to job performance! -Re-evaluate your process

21 Want to experience it for real?  We’re hiring. See me afterwards.  Office hours are at 2 pm -Any hiring or culture related questions are fair game

Download ppt "Growing the Site Reliability Team at LinkedIn: Hiring is Hard Greg Leffler Manager, Site Reliability"

Similar presentations

Ads by Google