Published by Winifred McDonald. Modified over 9 years ago.
1
PIER & PHI: Overview of Challenges & Opportunities
  Ryan Huebsch †, Joe Hellerstein †°, Boon Thau Loo †, Sam Mardanbeigi †, Scott Shenker †‡, Ion Stoica †
  p2p@db.cs.berkeley.edu
  † UC Berkeley, CS Division
  ‡ International Computer Science Institute, Berkeley CA
  ° Intel Research Berkeley
  STREAM DAY 5/7/04
2
PIER: P2P Information Exchange & Retrieval
  A wide-area distributed dataflow engine
    Outfitted with relational operators
    Designed to scale to thousands or millions of nodes
  Motivation:
    It’s an interesting challenge
    Lowers the barrier of entry for large-scale applications
      No massive infrastructure for server farms
      Cost is distributed among participants
    Provide a viable solution where other options are not socially acceptable
  We are NOT trying to be better than other (centralized) solutions; we are trying to be different.
3
Challenges
  (Labels from the slide diagram, in extraction order)
  Physical Network, Overlay Network, Query Plan, Declarative Queries
  Query Optimization, Multi-Query Optimization, Catalogs, Persistent Storage, Recursion, Query Dissemination, Replication, Soft-State, Quality of Service, Resilience, Route Flapping, Efficiency, Security, Privacy, Quality of Service
  General Challenges
4
Applications & Requirements
  File sharing
    Flooding works for popular items
    Need something better for rare items
    May want ‘triggers’ when a new item matches an old search
  Network Monitoring
    Aggregation & grouping very common
    Continuous queries with well-defined semantics
  PHI is one use of PIER…
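
The ‘triggers’ requirement on this slide (fire when a new item matches an old search) can be illustrated with a small, single-process sketch. This is not PIER code: the class name `StandingSearches`, its methods, and the keyword-matching rule are all hypothetical, chosen only to show the idea of a search that stays registered and is re-evaluated as items arrive.

```python
from typing import Callable, List, Set, Tuple


class StandingSearches:
    """Hypothetical registry of searches that stay active after they are issued."""

    def __init__(self) -> None:
        # Each entry pairs a set of required keywords with a callback to fire.
        self._searches: List[Tuple[Set[str], Callable[[str], None]]] = []

    def register(self, keywords: List[str], on_match: Callable[[str], None]) -> None:
        """Store a search so that later items can still match it."""
        self._searches.append(({k.lower() for k in keywords}, on_match))

    def publish(self, filename: str) -> int:
        """Check a newly shared item against every stored search.

        Returns the number of searches that fired.
        """
        # Assumption: items are matched on lowercase words in the file name.
        terms = set(filename.lower().replace(".", " ").replace("_", " ").split())
        fired = 0
        for keywords, on_match in self._searches:
            if keywords <= terms:  # every keyword appears in the item
                on_match(filename)
                fired += 1
        return fired


# Usage: an old search for a rare item fires when the item finally appears.
hits: List[str] = []
searches = StandingSearches()
searches.register(["rare", "song"], hits.append)
searches.publish("popular.mp3")
searches.publish("Rare_Song.mp3")
```

In a wide-area setting the registry itself would have to be partitioned across nodes; the sketch leaves that out on purpose.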
5
PHI: Public Health for the Internet
  Community-based monitoring
  The metaphor:
    Old way – Treat computers with medicine
      Virus protection
    New way – Monitor the community
      Like the Center for Disease Control
  Global CDC has social implications
    Central repository, privacy, who controls it, who pays for it…
  PHI wants to create the Center for Disease Control without the Center (of control)
  Motivation is to inform users about the dangers of the Internet
6
PHI Example
  PIER is currently deployed on 150-300 PlanetLab nodes
    ~100 sites
    Some nodes on DSL, 1 Mbps, 10 Mbps, etc.
    Very unreliable
  SNORT is the primary data source
    ~2400 rules
    10s to 1000s of tuples per day per node
    Schema: time, rule, source socket, destination socket
  Quick Demo: Shows the top ten sources of events across all of PlanetLab (live), i.e. who are the bad guys?
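
The demo query (top ten sources of events across all nodes) is a group-by count followed by a top-k. A minimal sketch of that computation follows, assuming each node first counts its own events and only the partial counts are merged. The `Event` type mirrors the schema on the slide, but the field names, the `"ip:port"` socket encoding, and the choice to group by source IP are assumptions made for illustration; this is not how PIER itself expresses or executes the query.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class Event(NamedTuple):
    """One SNORT tuple; field names are assumed from the slide's schema."""
    time: float
    rule: int
    src: str  # source socket, assumed to be "ip:port"
    dst: str  # destination socket, assumed to be "ip:port"


def local_counts(events: Iterable[Event]) -> Counter:
    """Per-node partial aggregate: number of events per source IP."""
    return Counter(e.src.rsplit(":", 1)[0] for e in events)


def top_sources(partials: Iterable[Counter], k: int = 10) -> List[Tuple[str, int]]:
    """Merge the partial counts from all nodes and keep the k largest."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # adds counts key by key
    return total.most_common(k)


# Usage with two hypothetical nodes.
node_a = [
    Event(1.0, 2001, "10.0.0.1:4000", "192.168.0.5:80"),
    Event(2.0, 2001, "10.0.0.1:4001", "192.168.0.5:80"),
    Event(3.0, 1384, "10.0.0.2:5000", "192.168.0.5:22"),
]
node_b = [
    Event(1.5, 2001, "10.0.0.1:4002", "192.168.0.9:80"),
    Event(2.5, 1384, "10.0.0.2:5001", "192.168.0.9:22"),
]
ranking = top_sources([local_counts(node_a), local_counts(node_b)])
```

Counting locally before merging keeps the data shipped per node proportional to the number of distinct sources rather than the number of raw tuples, which matters when some nodes sit on slow or unreliable links.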
8
What’s next…
  PIER
    Lots of problems, including the meta-problem of what problem to work on
    No streaming semantics, no language to describe windows, etc…
    Additional challenges: interaction with soft-state, no synchronized clocks, unknown (changing) network latencies
  PHI
    Create a complete application
      Gets intrusion data from a variety of sources (including the built-in Windows Firewall)
      Develop a snazzy visualization
    Release to the world, first using PlanetLab as the query processor, eventually the world
    Scale to at least 10,000s of nodes and explore the design space