Cyberinfrastructure Supporting Social Science
Cyberinfrastructure Workshop, October 16, 2012, Chicago
Geoffrey Fox
https://portal.futuregrid.org


1 Cyberinfrastructure Supporting Social Science
Cyberinfrastructure Workshop, October 16, 2012, Chicago
Geoffrey Fox, gcf@indiana.edu
Informatics, Computing and Physics
Indiana University Bloomington
https://portal.futuregrid.org

2 Goal of Day
Come up with a few (3-5) projects that advance Social Sciences Cyberinfrastructure
Choose so that together they cover spectrum of characteristics
Characteristics: A B C … Z
Project 1: X X X
Project 2: X X X
…
Project N: X X

3 Data Type
What is large? #Collections v. Collection Size v. #Users
“Big (Social) Science” v. Long Tail
# rows v. # columns v. time dependence
Structured (defined) v. unstructured (inferred/discovered) metadata
Granularity of metadata
Data modality: streaming, video, image, text, “binary” – vector space or not (genomics, network)
Distributed v. centralized data (production/storage/processing)
Complex objects v. tables
Observed v. simulation or modeling

4 Data Nature (“ilities”)
Open data
Sharable Data
Publication model / Data citation models? – DOI or Handler
Reproducibility
Sustainability
Standards
Management
Integration
Dramatic change in next 10 years
Data availability as in Public Windy Grid

5 Mining/Analyzing data
Access: role of community comments, crowd sourcing
Processing: “simple” statistics, linkage software, data visualization, GIS, analytics (SVM, LDA, clustering...); (new) management tools
Data Mining (discovering the unexpected) v. Data Analysis (discovering with excellence the ~expected)
Modeling for data components and regression
More data v. more/better algorithms (in simulation, algorithm advances ~ as important as machine advances)
Programming model: Excel, SQL, R, SPSS, other scripting, MapReduce, "Fortran/C++/Java", libraries, workflow, portal/gateway
Open software & sustainability of it
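To make the MapReduce entry in the programming-model list concrete, here is a minimal single-process sketch of the map, shuffle and reduce steps applied to word counting. It is only an illustration, not something shown in the talk: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical, and a real MapReduce framework would distribute these steps across machines.

```python
from collections import defaultdict
from functools import reduce


def map_phase(document):
    """Map step: emit a (word, 1) pair for every whitespace-separated token."""
    return [(word, 1) for word in document.lower().split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values under their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(values):
    """Reduce step: combine the values of one key by summing them."""
    return reduce(lambda a, b: a + b, values, 0)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of text documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(pairs)
    return {key: reduce_phase(values) for key, values in grouped.items()}
```

Calling `word_count` on a list of strings returns a dictionary from each lowercased word to its total count. The same three-step shape carries over to other aggregations by swapping the map and reduce functions.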

6 Security & Privacy
Support sharing
The law
Risk of identification, harm from disclosure
Differential Privacy and nifty obfuscation ideas
IRB
Federated Identity
Enclave
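As one illustration of the differential privacy entry, the sketch below applies the standard Laplace mechanism to a counting query. It is an assumption-laden example rather than anything from the talk: `private_count`, `laplace_noise` and the `epsilon` parameter name are hypothetical, and production systems need more care (privacy budgets across queries, floating-point issues).

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records that satisfy `predicate`.

    A counting query changes by at most 1 when one record is added or removed
    (sensitivity 1), so the Laplace mechanism adds noise with scale 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)
```

A smaller `epsilon` gives stronger privacy but noisier answers. Each released count is a single noisy value, and repeating the query consumes additional privacy budget.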

7 The Infrastructure
Repository/Archive v. Active (compute + storage) data
Bring Computing to data
Commercial Clouds v. XSEDE v. University
Local v. cloud v. department/university
Distributed (Federated) clouds as collections distributed
DropBox, Google docs, Skype etc. v. customized
Generality of DuraCloud, Dataverse, DataUp etc.
Tool repository/library
Cloudbursting (public-private hybrid cloud)
Connectivity to cloud (can be addressed by I2?)
Backup v. Main Home

8 Other Characteristics
Satisfying NSF Data Management requirements
Breadth of applicability of solutions
# Organizations collaborating on project
Interdisciplinary collaborations
Data (science) Curricula
Relation to issues in other fields
Support and Governance
Industry ahead of Academia

