Correlation: Reform Voters vs. Votes for Buchanan
  [Plot; labeled point: Palm Beach]
2. Internet Infrastructures for Data
  Data Webs, Semantic Webs, Data Grids, Distributed Data Mining, Digital Libraries and all that
Data Mining
  Data mining is the semi-automatic extraction of patterns, models, changes, associations, and anomalies from large data sets.
  [Diagram: learning set, data mining algorithm, statistical model (e.g. <tree-node node-id=8 threshold = 0.239494 etc. >)]
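The slide's flow from a learning set through a data mining algorithm to a statistical model can be sketched with a one-node decision tree (a "stump"). This is only an illustration, not the algorithm behind the slide: `TreeNode`, `fit_stump`, `predict`, and the learning set values are all hypothetical, and the printed line merely imitates the slide's tree-node fragment rather than following any model-exchange schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TreeNode:
    node_id: int
    threshold: float
    left_label: int   # predicted label when x <= threshold
    right_label: int  # predicted label when x > threshold


def fit_stump(xs: List[float], ys: List[int], node_id: int = 0) -> TreeNode:
    """Choose the threshold on one numeric feature with the fewest misclassifications."""
    values = sorted(set(xs))
    if len(values) < 2:
        raise ValueError("need at least two distinct feature values")
    # Candidate thresholds are midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best_errors = None
    best_node = None
    for t in candidates:
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= t else right) != y for x, y in zip(xs, ys))
            if best_errors is None or errors < best_errors:
                best_errors = errors
                best_node = TreeNode(node_id, t, left, right)
    return best_node


def predict(node: TreeNode, x: float) -> int:
    return node.left_label if x <= node.threshold else node.right_label


# Made-up learning set (binary labels), for illustration only.
learning_x = [0.05, 0.10, 0.20, 0.30, 0.40, 0.55]
learning_y = [0, 0, 0, 1, 1, 1]

model = fit_stump(learning_x, learning_y)
print(f"<tree-node node-id={model.node_id} threshold={model.threshold}>")
```

The learning set goes in, the algorithm searches candidate thresholds, and the resulting model is a small record that could be serialized and shipped to another system.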
Data Mining Process - End to End Viewpoint
  Phase 1. Exploratory Analysis
  Phase 2. Data Analysis & Mining
  Phase 3. Deployment & Decision
  [Diagram labels: NCAR, WHO, DataSpace; 50%, 0%, 50%]
DataSpace: One Approach to Making Data Useful
  Today's Multi-media Web: 16 terabytes of documents; 4 billion documents; html; http; search by keyword; workstations; servers.
  Tomorrow's Data Web: petabytes of data; tens of billions to trillions of records; pmml & dtml; dstp; correlate & mine; data & compute clusters.
  Complementary to the grid, which we view as a distributed computer.
View Data as a Collection of Distributed Columns
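One way to picture this view is to treat each column as its own keyed object that may live on a different server, with rows assembled only when a client asks for them. The sketch below is a minimal in-memory illustration under that assumption; `Column`, `assemble_rows`, the server labels, and all values are hypothetical and do not represent the DataSpace implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Column:
    name: str                  # attribute name
    server: str                # where the column lives (hypothetical label)
    values: Dict[str, float]   # key -> value


def assemble_rows(columns: List[Column]) -> Dict[str, Tuple[float, ...]]:
    """Build rows from the keys that every column has in common."""
    if not columns:
        return {}
    shared = set(columns[0].values)
    for col in columns[1:]:
        shared &= set(col.values)
    return {k: tuple(col.values[k] for col in columns) for k in sorted(shared)}


# Made-up values, for illustration only.
temperature = Column("temperature", "server-1", {"k1": 14.2, "k2": 15.1, "k3": 13.8})
cases = Column("cases", "server-2", {"k2": 120.0, "k3": 95.0, "k4": 210.0})

table = assemble_rows([temperature, cases])
print(table)
```

Keys that appear in only one column (`k1` and `k4` here) drop out of the assembled table, which is the price of combining columns that were collected independently.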
Data Servers and Data Browsers
  [Diagram: NCAR data in Boulder and WHO data in Geneva, connected through DataSpace to a data browser in Chicago]
[Diagram labels: attributes [aid]; UCK [uckid]; k[i], y[j]; k[i], x[i]; DSTP Server 1; DSTP Server 2]
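The diagram's labels suggest two servers that each hold a column of values attached to keys k, with x on one server and y on the other. A client that wants to correlate x with y would first align the two columns on their common keys. The sketch below assumes exactly that and nothing more: `fetch_column` is a hypothetical stand-in for a network request (it does not implement DSTP), and the server names and values are invented.

```python
import math
from typing import Dict, List

# Hypothetical in-memory stand-ins for two remote servers.
_SERVERS: Dict[str, Dict[str, float]] = {
    "server-1": {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0},   # key -> x
    "server-2": {"b": 4.0, "c": 6.0, "d": 8.0, "e": 10.0},  # key -> y
}


def fetch_column(server: str) -> Dict[str, float]:
    """Pretend to retrieve a keyed column from a remote server."""
    return dict(_SERVERS[server])


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return sxy / math.sqrt(sxx * syy)


x_col = fetch_column("server-1")
y_col = fetch_column("server-2")

# Align the columns on the keys both servers share, then correlate.
shared_keys = sorted(set(x_col) & set(y_col))
r = pearson([x_col[k] for k in shared_keys], [y_col[k] for k in shared_keys])
print(shared_keys, r)
```

The alignment step only works if both servers use the same key vocabulary, which is why the keys need to be agreed on in advance.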
Terra Mining Testbed
  Optical testbed for distributed tera mining of scientific data.
  A further goal is to serve as a testbed for broadband-based business services.
Lessons Learned
  1. It's the data, stupid. Cycles, cylinders & lambdas are all commodities.
  2. The fundamental challenge: lower the cost to make data useful.
  3. The emergence of internet infrastructure for data is inevitable. It opens up possibilities for new types of scientific discoveries.
For More Information
  DataSpace: http://www.dataspaceweb.net, http://www.ncdm.uic.edu
  DataSpace Standards: http://www.dmg.org
  Selected articles: http://www.twocultures.net
  Magnify: http://www.magnify.com