Center for Computational Intelligence, Learning, and Discovery Artificial Intelligence Research Laboratory Department of Computer Science Supported in.


Center for Computational Intelligence, Learning, and Discovery
Artificial Intelligence Research Laboratory
Department of Computer Science

Supported in part by grants from the National Science Foundation (IIS 0219699, IIS 0711356) to Vasant Honavar.

Students, Computer Science: Doina Caragea (Ph.D., 2004), Jun Zhang (Ph.D., 2005), Jie Bao (Ph.D., 2007), Jyotishman Pathak (Ph.D., 2007), Cornelia Caragea, Oksana Yakhnenko, Neeraj Koul, Yeaser El-Manzalawy, Kewei Tu, Raphael Osorio, Flavian Vasile, Adrian Silvescu.
Students, Bioinformatics: Changhui Yan (Ph.D., 2004), Michael Terribilini, Feihong Wu, Tim Alcon, Carson Andorf, Laron Hughes.

Algorithms and Software for Distributed, Collaborative, Integrative e-Science

From data to knowledge
Statistically based machine learning offers one of the most cost-effective approaches to data-driven knowledge discovery in emerging data-rich application domains (e.g., Bioinformatics, Security Informatics, Medical Informatics, Social Informatics).

Cyber-enabled Discovery Applications
Bioinformatics and Computational Molecular and Systems Biology:
- Plant Genome Annotation (with Brendel)
- Protein Function Prediction (with Dobbs, funded by NIH GM066387)
- Prediction of Protein-Protein, Protein-DNA, and Protein-RNA interfaces (with Dobbs and Jernigan, funded by NIH GM066387)
- Integrating Quantitative and Functional Genomics (with Tuggle et al., funded by USDA)
- Synthesis of Gene Networks (with Greenlee and Serb)
- Cross-species Comparative Animal Genomics (with Reecy, funded by USDA)
Critical Infrastructure Protection:
- Distributed power systems management, monitoring, and protection (with McCalley et al., funded by NSF CNS 0540293)

Work in Progress
- INDUS (with Caragea, KSU, funded by NSF IIS 0711356)
- Ontology Federation and Distributed Inference (with Slutzki, funded by NSF IIS 0639230)
- Interactive service composition and adaptation (with Basu and Lutz, funded by NSF CCF 702758)

Challenges
- Scalability: massive, distributed, autonomous data
- Differences in data semantics: terminological differences, different levels of abstraction
- Access constraints, e.g., due to privacy
- Multiple points of view

Research Questions
- Can we construct predictive models without centralized access to data?
- Can we learn in the presence of semantic gaps between the user and the data sources?
- How do the results compare with the centralized setting?

Learning from Distributed Data [Caragea et al., 2004]
- Decompose learning into an interleaving of statistical queries and computation.
- Learning classifiers from distributed data then reduces to answering statistical queries from distributed data under:
  - different types of data fragmentation
  - different constraints on access and query capabilities
  - different bandwidth and resource constraints
Results:
- Efficient algorithms for learning predictive models from distributed data
- Strong performance guarantees relative to their centralized counterparts
- Scalable implementations of the resulting algorithms

Learning from Semantically Heterogeneous Distributed Data [Caragea et al., 2005]
- Make data sources self-describing: ontology-extended data sources (OEDS), each with
  - a data source schema ontology
  - a data source content ontology
- Establish semantic correspondences from each data source ontology to the user ontology.
- Query the data sources from the user's point of view.
[Figure: user ontology O_U and data source ontologies O_1 and O_2, each shown as an is-a hierarchy]
Mappings between ontologies:
- Rainy:O_1 = Rain:O_U
- Snow:O_1 = Snow:O_U
- NoPrec:O_U < Outlook:O_1
- {Sunny, Cloudy}:O_1 = NoPrec:O_U
- Unit conversion (e.g., deg. F to deg. C)
Results:
- Tools for associating ontologies with data and for specifying mappings between ontologies
- Algorithms for querying distributed, semantically heterogeneous data

Learning from Partially Specified Data [Zhang et al., 2003, 2004, 2006]
- Semantic gaps lead to partially specified data.
- Different data sources may describe data at different levels of abstraction.
- If a source describes the data more abstractly than the user expects, additional statistical assumptions become necessary.
Results:
- Efficient algorithms for learning concise predictive models from partially specified data under user-specified statistical assumptions

INDUS: open-source software for building predictive models from distributed, semantically heterogeneous, autonomous data sources

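The mappings between O_1 and O_U can be made concrete as a small translation layer. In this sketch, which is an assumption rather than the OEDS or INDUS machinery, each source is paired with a term map into the user ontology and with the temperature unit it reports. A user-level query is answered after translating every record into the user's vocabulary and into degrees Celsius. Only the equivalence-style correspondences are encoded; a mapping such as NoPrec:O_U < Outlook:O_1 would require a hierarchy, which is omitted here. The record layout, `O1_TO_OU`, `to_user_view`, and `count_user_query` are hypothetical.

```python
# Hypothetical term map from data source ontology O_1 to user ontology O_U,
# following the equivalence correspondences listed on the slide.
O1_TO_OU = {"Rainy": "Rain", "Snow": "Snow", "Sunny": "NoPrec", "Cloudy": "NoPrec"}

# A second (hypothetical) source already uses the user's terms.
IDENTITY = {t: t for t in ("NoPrec", "Rain", "Snow")}

def fahrenheit_to_celsius(f):
    """Unit conversion, deg. F to deg. C."""
    return (f - 32) * 5 / 9

def to_user_view(record, term_map, temp_unit):
    """Translate one source record into the user's vocabulary and units."""
    temp = record["temp"]
    if temp_unit == "F":
        temp = fahrenheit_to_celsius(temp)
    return {"outlook": term_map[record["outlook"]], "temp_c": temp}

def count_user_query(sources, outlook):
    """Count records whose outlook, expressed in O_U terms, equals `outlook`."""
    total = 0
    for records, term_map, unit in sources:
        total += sum(1 for r in records
                     if to_user_view(r, term_map, unit)["outlook"] == outlook)
    return total

# Usage: source 1 reports O_1 terms in deg. F; source 2 reports O_U terms in deg. C.
source1 = [{"outlook": "Sunny", "temp": 77},
           {"outlook": "Cloudy", "temp": 50},
           {"outlook": "Rainy", "temp": 41}]
source2 = [{"outlook": "NoPrec", "temp": 20},
           {"outlook": "Snow", "temp": -3}]
sources = [(source1, O1_TO_OU, "F"), (source2, IDENTITY, "C")]

print(to_user_view(source1[0], O1_TO_OU, "F"))
print(count_user_query(sources, "NoPrec"))
```

Keeping the term map and the unit beside each source, rather than rewriting the data, leaves the sources autonomous: the user's point of view is applied only at query time.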

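When a source reports a value only at a more abstract level (for example, NoPrec where the user distinguishes Sunny from Cloudy), counts for the specific values can only be estimated under some statistical assumption. The sketch below is not the algorithm of Zhang et al.; it only illustrates two simple, user-selectable assumptions over a hypothetical one-level taxonomy. Under the first, each abstract-valued record is spread over its children in the same proportions as the fully specified records. Under the second, it is spread uniformly. `TAXONOMY`, `leaf_counts`, and the sample values are illustrative.

```python
from collections import Counter

# Hypothetical one-level attribute value taxonomy: abstract value -> children.
TAXONOMY = {"NoPrec": ["Sunny", "Cloudy"], "Prec": ["Rain", "Snow"]}

def leaf_counts(values, taxonomy, assumption="proportional"):
    """Estimate counts of the most specific values when some observations
    are reported only at an abstract level of the taxonomy."""
    fully = Counter(v for v in values if v not in taxonomy)    # fully specified
    partial = Counter(v for v in values if v in taxonomy)      # abstract only
    estimated = Counter({k: float(c) for k, c in fully.items()})
    for parent, n in partial.items():
        children = taxonomy[parent]
        observed = sum(fully[c] for c in children)
        for child in children:
            if assumption == "proportional" and observed > 0:
                # Same proportions as the fully specified records.
                share = fully[child] / observed
            else:
                # Uniform spread (also the fallback when nothing was observed).
                share = 1 / len(children)
            estimated[child] += n * share
    return estimated

# Usage: two records report only "NoPrec".
values = ["Sunny", "Sunny", "Sunny", "Cloudy", "NoPrec", "NoPrec", "Rain"]
print(leaf_counts(values, TAXONOMY))
print(leaf_counts(values, TAXONOMY, assumption="uniform"))
```

Both assumptions preserve the total number of records; they differ only in how the abstract-valued records are apportioned, which is the kind of user-specified choice the slide refers to. A deeper taxonomy would apply the same step recursively.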