“Creating Data Repositories..” Sanjay Rao ECE Dept, Purdue University.


1 “Creating Data Repositories..” Sanjay Rao ECE Dept, Purdue University

2 Group Members
Dave Maltz, Rebecca Isaacs, Ratul Mahajan, Yin Zhang, Aditya Akella, David Kotz, Charles DiFatta, …..

3 Motivation
Network Management Research:
–Barrier to entry is high
–Data/insights from operators/industry critical
Examples:
–Failure characterization of enterprise network
–VLAN characterization and use
–Configuration Management

4 What happens today..?
End-user centric measurement studies
–Network “black-box”: no operator involvement
–Real need: “white-box”
Campus Networks
–Difficulties in bootstrapping relationships with operators
Enterprise/Operator Network
–Sprint or AT&T (Microsoft with end-user)
–Limited pool of researchers
Data across multiple enterprises??
Trends over many years??

5 Bottom line
Need a data repository
–Contributors from operators, researchers, industry
–Accessible to all researchers
Facilitate research much like PlanetLab
Vital to have “critical mass” of researchers on Network Management
–Research along high-impact real problems

6 Data Sharing: what inhibits it?
Sensitivity of data
–Security Issues (firewall policies, network structure)
–Privacy Issues (records of individual activity)
Proprietary nature of data
–E.g. how many calls got, mobility models
–Possible to have others use it?
“Secret weapon” for research
–Competition vs. collaboration
Inertia/too much effort

7 Solutions
Carrots/sticks to promote data sharing
–“Must release data” to publish
–IMC: best paper award only to work releasing data
Technical ways to address concerns with sharing

8 Positive Example
Example: HSARPA “PREDICT”: make research on network security possible
Firewalls and IDS network security data

9 Research: Anonymization
Hiding provider, hiding individual information
Need framework to reason about it
–What trade-offs do you make?
–What risks are posed?
–How to expose trade-offs in a way we can appreciate?
Anonymization very domain specific
–E.g. configuration file vs. packet trace
–Are there common themes?
Other Models:
–NDA-based
–“Give me a question” -> “return answer”
–“Exploratory” nature of research
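The trade-off question on the anonymization slide can be made concrete with a minimal sketch. It assumes one simple technique, keyed-hash (HMAC) pseudonymization of IPv4 addresses, which the slides do not prescribe; the record fields `src`, `dst`, and `bytes` and the helper names are hypothetical. The mapping is stable, so per-host counts survive, but subnet structure is destroyed, which is exactly the kind of loss a framework would need to expose.

```python
import hashlib
import hmac
import ipaddress


def pseudonymize_ip(addr: str, key: bytes) -> str:
    """Map an IPv4 address to a stable pseudonym with HMAC-SHA256.

    The same (address, key) pair always yields the same pseudonym, so
    per-host analysis still works. Prefix relationships between
    addresses are NOT preserved.
    """
    packed = ipaddress.IPv4Address(addr).packed
    digest = hmac.new(key, packed, hashlib.sha256).digest()
    # Reuse the first 4 digest bytes as a stand-in IPv4 address.
    return str(ipaddress.IPv4Address(digest[:4]))


def anonymize_record(record: dict, key: bytes) -> dict:
    """Return a copy of a flow record with src/dst pseudonymized."""
    out = dict(record)
    for field_name in ("src", "dst"):  # hypothetical field names
        out[field_name] = pseudonymize_ip(record[field_name], key)
    return out


if __name__ == "__main__":
    key = b"example-secret-key"  # illustrative only; keep real keys private
    trace = [
        {"src": "10.0.0.1", "dst": "10.0.0.2", "bytes": 1500},
        {"src": "10.0.0.1", "dst": "192.168.1.7", "bytes": 40},
    ]
    for rec in trace:
        print(anonymize_record(rec, key))
```

A configuration file would need a different treatment (for example, consistent renaming of hostnames and community strings), which is the domain-specific point the slide raises.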

10 Community effort: Cooperate on IRB
Social Sciences:
–Lots of experience with IRB
Networking:
–Lack of clear guidelines on IRB process
–Admins feel happier if IRB can “sanction” things
As community:
–Must appreciate need/process for IRB
–Develop guidelines for IRB process
–Share IRB documents

11 Creating shareable data
75% of time spent figuring out how to use data
Researcher needs vary
–Different forms of datum
–Historical vs. streaming. Dated? Trending?
–Assumptions made/gaps in data
–“timing info crucial at sub-RTT level”?
Sharing hard, many idiosyncrasies
–Data collection infrastructure, annotate

12 User Diagnostics
One-on-one: exact data provided
Create shared repository(ies)
–What data do most users want?
–Is that 20% of stuff most critical to provide?
Data Collection Tools
Meta-data part of problem
–Create data in standard formats
–“Observatory”: How to discover, describe, explain data
Access policy, use policy
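One way to picture the meta-data point is a small machine-readable descriptor published alongside each dataset. The sketch below is an assumption for illustration, not a format proposed in the talk: every field name, the `DatasetDescriptor` class, and the sample values are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class DatasetDescriptor:
    """Hypothetical metadata record describing one shared dataset."""
    name: str
    data_type: str          # e.g. "packet trace", "router configuration"
    file_format: str        # e.g. "pcap", "text"
    collected_from: str     # ISO 8601 date
    collected_to: str       # ISO 8601 date
    anonymization: str      # how the data was sanitized, if at all
    known_gaps: List[str] = field(default_factory=list)
    access_policy: str = "unspecified"
    use_policy: str = "unspecified"


def to_json(descriptor: DatasetDescriptor) -> str:
    """Serialize a descriptor so a repository index can store or search it."""
    return json.dumps(asdict(descriptor), indent=2, sort_keys=True)


if __name__ == "__main__":
    example = DatasetDescriptor(
        name="campus-border-2007",          # made-up dataset name
        data_type="packet trace",
        file_format="pcap",
        collected_from="2007-01-01",
        collected_to="2007-01-07",
        anonymization="keyed-hash IP pseudonyms",
        known_gaps=["collector offline 2007-01-03 02:00-04:00"],
        access_policy="registered researchers",
    )
    print(to_json(example))
```

Recording assumptions and gaps next to the data is aimed at the time users otherwise spend working out how to use it.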

13 Other
Streaming Data: Online vs. Offline
Scalable collection:
–What to collect? Over how long?
–Compression techniques
–Fine-grained: overhead, coarse-grained: information loss
What does it take to build this infrastructure?
–Get all types of data as painlessly as possible
–Massage, orchestrate data to fit researcher needs
–Simple APIs to get data out, fast analysis tools
–Federated Access
–Data Management: lifecycle of data
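The fine-grained versus coarse-grained trade-off can be shown with a toy example. The sketch assumes a made-up series of per-second byte counts and a hypothetical `downsample` helper; it is not a collection tool from the talk.

```python
from typing import List


def downsample(samples: List[int], factor: int) -> List[int]:
    """Sum consecutive groups of `factor` samples.

    The last group may be shorter if len(samples) is not a multiple
    of `factor`.
    """
    if factor < 1:
        raise ValueError("factor must be at least 1")
    return [sum(samples[i:i + factor]) for i in range(0, len(samples), factor)]


if __name__ == "__main__":
    # Made-up per-second byte counts containing one short burst.
    per_second = [10, 12, 11, 900, 9, 10, 11, 12]
    per_4s = downsample(per_second, 4)
    print(per_second)  # 8 values stored; the burst at index 3 is visible
    print(per_4s)      # [933, 42]: 2 values stored; burst timing is lost
```

Storing 4-second bins cuts the volume by a factor of four, but a researcher can no longer tell whether the first bin held a one-second burst or steady load.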

14 Action Items
Community-Wide Efforts:
–Initiate efforts to create data repository
  How to manage? Who contributes? Who arbitrates? How much storage?
  Lifecycle: how long to store data?
–Create IRB guidelines for networking data
Research:
–Anonymization
–Usage diagnostics -> what to collect, release: widely applicable
–Data Collection Tools, metadata information
Industry, operators must be as actively involved as possible

