9 Infrastructure Icebergs: 90k lines of tooling and monitoring, 30k lines of logic; Dedicated engineers, operations; Training; First three nines come from operations
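To make the "three nines" point concrete, the arithmetic behind availability targets can be sketched as follows. This is an illustrative calculation only; the function name `downtime_hours_per_year` is hypothetical and the year is assumed to have 365 days.

```python
# Yearly downtime budget implied by an availability target of N "nines".
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years (assumption)


def downtime_hours_per_year(nines: int) -> float:
    """Return the allowed downtime in hours per year for `nines` nines."""
    unavailability = 10 ** (-nines)  # e.g. 3 nines -> 0.001
    return HOURS_PER_YEAR * unavailability


for n in range(1, 6):
    availability_pct = 100 * (1 - 10 ** (-n))
    hours = downtime_hours_per_year(n)
    print(f"{n} nines ({availability_pct:.3f}%): {hours:.2f} hours/year")
```

Three nines (99.9%) leaves roughly 8.76 hours of downtime per year, a budget that unmonitored or poorly operated systems can easily exceed regardless of how good the core logic is.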
10 This is (still) a very immature space. Which systems should we have? Good news for users, bad news for distributed systems nerds. Filesystems take a decade to mature. Don’t expect this will be easier.
11 Infrastructure is sculpted by applications and constraints. Projects are defined by trade-offs.
12 Constraints. Hardware: Jeff Dean, "Numbers everyone should know"; David Patterson, "Latency lags bandwidth"; $$$. Other: Path dependence; Complexity; Resources
27 Batch: Hadoop. Uses: Ad hoc; Production batch. Ecosystem: Hive, Pig; Azkaban (workflow); Avro data; Data in: Kafka; Data out: Voldemort, Kafka
28 Why do batch if you have real-time? Batch advantages: Safety; Easy; Throughput; Simplicity; Economics. Tricky bit: engineering the data cycle
29 Why do streaming? You have to glue all these systems together; Throughput as good as batch; Latency much better; Metaphor more natural for low latency than Hadoop
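The latency difference between the two models can be illustrated with a toy sketch. This is not Kafka or Hadoop code; the function names and the sample events are made up for illustration. Both functions compute the same counts, but the streaming version has an up-to-date result after every event instead of only once the whole data set has been collected.

```python
from collections import Counter
from typing import Iterable, Iterator


def batch_counts(events: list[str]) -> Counter:
    """Batch model: wait for the complete data set, then compute once."""
    return Counter(events)


def streaming_counts(events: Iterable[str]) -> Iterator[Counter]:
    """Streaming model: update the running result as each event arrives."""
    counts: Counter = Counter()
    for event in events:
        counts[event] += 1
        yield counts.copy()  # a fresh snapshot is available after every event


events = ["view", "click", "view"]  # hypothetical sample data
final_batch = batch_counts(events)
snapshots = list(streaming_counts(events))

# The last streaming snapshot matches the batch result.
print(snapshots[-1] == final_batch)
```

The batch function is simpler and easy to re-run from scratch, which matches the batch advantages listed on the previous slide; the streaming function trades that simplicity for results that are current after each event.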
30 What makes successful infrastructure systems? Operability and Operations; Monitoring; Simplicity; Documentation; Broad adoption; Lazy users; Open source
31 Open Source: Data > Infrastructure; Open source creates better code, even with few outside contributors; Commercial infrastructure not interesting
32 Open Source Projects. We made: Voldemort (Key/Value storage); Sensei, Bobo, Zoie (Elastic, faceted, real-time search with Lucene); Kafka (Persistent, distributed data streams); Norbert (Cluster aware RPC, load balancing, and group membership); And others… We stole: Hadoop, Pig, Hive; Lucene; Netty, Jetty; Zookeeper; Avro; Apache Traffic Server