Presentation is loading. Please wait.

Presentation is loading. Please wait.

Growing the Site Reliability Team at LinkedIn: Hiring is Hard

Similar presentations


Presentation on theme: "Growing the Site Reliability Team at LinkedIn: Hiring is Hard"— Presentation transcript:

1 Growing the Site Reliability Team at LinkedIn: Hiring is Hard
Here to tell you why hiring is hard How to make it not so hard And how to know if it’s still working Greg Leffler Manager, Site Reliability

2 Who am I? Site Reliability Manager (New York)
MS in Industrial/Organizational Psychology Responsible for interview process for SREs Took this responsibility as an IC, so originated from the bottom up Team grew 10x from August 2011 to May 2014

3 Who are SREs at LinkedIn?
5 sites, 2 countries 1000+ SW Engineers 8th busiest website in the world 10k+ prod machines per DC: 2 DCs today +1 in 2014 300+ RESTful services, 300MM+ members Services with 99th %ile latencies as low as 10 ms 3 time zones  (PT, ET, IST)

4 What matters for a great company?
Funding? Good idea? Execution? Product? People. They all matter, but people matter the most. Great people can make up for weaknesses in other areas. Great people can help you make your plans better. Great people are what make a company great.

5 Obligatory LinkedIn culture plug
Talent is our #1 operating priority. Our culture is what sets us apart. We are committed to supporting the career transformation of our employees. Transparency is encouraged and emphasized at every company all- hands meeting Which occur every other week Our commitment to our employees is emphasized in how we behave Everyone is encouraged to do interviews! Yes, everyone. ~60% of SREs participate in the interview process 60% Stat cited from HireIn. Talent is our #1 operating priority. Our culture is what sets us apart. We are committed to supporting the career transformation of our employees. We will be transparent in our programs to ensure trust, quality and effectiveness. If something is wrong, we will fix it. We will strive for consistency but distinguish programs where business or market necessitates. We will provide opportunities for high performing employees to achieve top rewards. We believe in recognition that goes beyond the cash and imbeds itself in our day to day interactions with our employees. We strive to create a working environment where people want to come and stay.

6 What do we want from SREs?
Excited about LinkedIn and the SRE role We have the luxury of being picky Fit our culture and embody our values These matter. If you haven’t set them or can’t articulate them, you need that 1st Have the skills needed to do the job These also matter. You need to know what these are before you screen for them AND NOTHING ELSE Not skills that are ‘nice to have’, not somebody you’d want to hang out with on the weekend, not somebody who has won a programming contest, not somebody who went to an elite school.

7 These don’t always work
Coding puzzles “Fermi problems” Algorithm design questions If you were a zebra, what pattern would your stripes have? Homework Personality tests Trivia (quick, which signal is #7 in RHEL 6.4 on x86?) First 2: you can buy book & study Aside: coding is important to LI, but we test it pragmatically by having them do actual SRE coding tasks before they even come onsite – we don’t bring people in who can’t code Algo design: likely not appropriate except in very limited situations where that is part of the role. REALLY think about if they need it Zebra question: what’s the right answer? Really? You sure? Homework: CIO mag surveyed self-appointed “expert” devs, only 8% would even consider doing homework Personality tests: not correlated very well to performance in technical fields (re: signal it’s SIGBUS – 7 in i for other architectures. Would you know that if I hadn’t said x86 in the question, or told you just now?) Coding puzzle article:

8 Here’s why Industrial Psychology has figured this out already
Schmidt & Hunter, 1998 The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings Even if they hadn’t, you should collect your own data And not rely on hunches or cargo cults Further reading in the notes on this slide. Industrial psych has been around for almost 100 years at this point, coming into its own with the development of the assembly line and Frederick Taylor’s principles of “scientific management”. Industrial Psychology applies the scientific method to hiring, screening, motivation, and everything else that happens at work. Scientists have been studying how work works for a loooong time and have published these findings. They apply across industries and to nearly every work situation that we’ve been able to dream up, including studies of space shuttle pilots, elite military units, etc. Further reading: Google “Hawthorne Works” and read about the Hawthorne Effect Anderson, N., Ones, D. S., Sinangil, H. K., & Viswesvaran, C. (Eds.). (2002). Handbook of industrial, work and organizational psychology, Volume 1: Personnel psychology. Thousand Oaks, CA: Sage Publications Ltd. Campbell, J. P., Gasser, M. B., & Oswald, F. L. (1996). The substantive nature of job performance variability. In K. R. Murphy (Ed.), Individual differences and behavior in organizations (pp. 258–299). San Francisco: Jossey-Bass. Guion, R. M. (1998). Assessment, measurement and prediction for personnel decisions. Mahwah, NJ: Lawrence Erlbaum Associates. Muchinsky, P. M. (Ed.). (2002). Psychology Applied to Work. Wadsworth Publishing Company. (this is a textbook but is very well written) Sackett, P. R., & Wilk, S. L. (1994). Within group norming and other forms of score adjustment in pre-employment testing. American Psychologist, 49,

9 What does work? Good funnel at the start Realistic job previews
Structured interviews Situational judgment tests These have the highest standalone validity (i.e., the best predictors) as well as high incremental validity (adding them to other predictors also helps.) They also have high face validity – which means the candidates like them and they look like they actually evaluate things that are relevant to the job.

10 The LinkedIn SRE funnel
Sourcing/screening Recruiter prescreen Operationally-focused phone screen (TPS 1) Code-focused phone screen (TPS 2) By the time onsite, we expect they will pass. 82% Only combined pass rate for TPSes due to way data is stored. #s are for 2013. 24%

11 Realistic job preview RJP is a way to show the candidate what they’ll do on a day to day basis. It gives them a chance to put themselves in the shoes of an SRE at LinkedIn, and more importantly, it gives them a chance to figure out if they like doing it or not

12 Structured Interview A structured interview is different from most interviews in one very key way – it is structured. The same candidates get the same questions and you can then compare apples to apples.

13 Situational Judgment Test
Situational judgment tests ask candidates to determine how they would react in a given situation. SREs or devops folks or star-ops, have the keys to the kingdom and need to be trusted. Their judgment is a HUGE part of their qualifications for the role, and as such, SJTs are the way to get at that as early as possible.

14 How do we implement these?
Live Troubleshooting (Realistic job preview) Systems Internals, Web Architecture (Structured interviews) Triage & Investigation (Situational judgment test) Host Manager (structured interview for culture and role fit) Lunch (not an interview… or is it?) This is LinkedIn’s interview process. This is everything we do – onsite. We also do some phone screens, see earlier slides.

15 Live Troubleshooting Here’s a broken service (in EC2) Fix it
(As realistic as it gets) No ‘man voldemort’ You are probably the 1st person in the world to troubleshoot the exact situation in question

16 Technical Modules Added structure and scoring guidelines
Scoring guidelines are what matter Consistency is the only way you can scientifically prove if these are working High # of interviewers = need to be able to compare results This is systems internals & web architecture/design

17 Triage & Investigation Module
Situational Judgment and Triage It’s your first day oncall and the NOC calls to say the site is on fire. Here’s the alert board – what do you look at first? Why? Assesses standard troubleshooting/investigation ability “The CEO calls you and says ‘the site is slow’ – what do you do?” “Disk is full. You delete a file but df still shows the disk being full. What’s wrong?” You’ll notice all of these are still situational judgment questions. They aren’t technical trivia or require “a-ha” moments. You can reason through it even if you haven’t been in that exact situation before.

18 Results of implementing changes
Happier candidates In fact, no unhappy candidates “The troubleshooting module was the most fun I’ve ever had in an interview” “I thought the troubleshooting module was hard but I learned so much” Happier interviewers Some hesitation at first Live Troubleshooting is stressful for the interviewer too! Solve with training and apprenticeships The candidate statements do not contradict each other. Especially for this role, we want candidates who find the interviews hard – but then who find that intellectually stimulating and want to come work for us! Also, even candidates we don’t accept: we still want to have a good experience! Selfishly (they might have a friend who’d be great) but for them too – no need to be mean to people. It’s not our culture. We trained candidates, but we also provided practice sessions with other employees & we also have a voluntary apprenticeship program where we let people shadow an experienced interviewer for any module

19 Data, data, data We’re collecting scores from each module
Correlating them to performance ratings Re-evaluating the utility of each module If a module doesn’t predict performance, get rid of it This is hard, especially with things people ‘need’ However, if there’s no correlation, it is worthless. Don’t mention the actual LI numbers here, use a nominal scale Analysis to occur after a full year of data due to power issues If power is interesting to you, come talk to me at office hours 

20 How to make your process better
Make talent your first priority Implement the good stuff from I/O psych Realistic job previews Situational judgment tests STRUCTURED interviews Collect data on interview performance (module scores) Correlate this to job performance! Re-evaluate your process

21 Want to experience it for real?
We’re hiring. See me afterwards. Office hours are at 2 pm Any hiring or culture related questions are fair game


Download ppt "Growing the Site Reliability Team at LinkedIn: Hiring is Hard"

Similar presentations


Ads by Google