Chandrika Kamath and Imola K. Fodor Center for Applied Scientific Computing Lawrence Livermore National Laboratory Gatlinburg, TN March 26-27, 2002 Dimension.

1 Dimension Reduction and Sampling
First SDM ISIC All-Hands Meeting, Gatlinburg, TN, March 26-27, 2002
Chandrika Kamath and Imola K. Fodor
Center for Applied Scientific Computing, Lawrence Livermore National Laboratory
UCRL.
This work was performed under the auspices of the U.S. Department of Energy by University of California Lawrence Livermore National Laboratory under contract W-7405-Eng-48.

2 The SDM ISIC aims to minimize the effort researchers spend in managing their data
- LLNL is participating in several of the tasks, including
  - data mining to improve the management of data
- Problem: data from simulations and experiments is high dimensional (i.e., has many features)
- Querying the features can help in understanding the data
  - but searching in a high-dimensional space is difficult
- We may want to cluster similar objects for efficient access
  - but clustering is expensive in high dimensions
→ We plan to address the problem of high dimensionality using techniques for dimension reduction and sampling originally developed in data mining.
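
A small numerical sketch of one reason high-dimensional search is hard (distance concentration), using synthetic uniform random points rather than SDM data: as the dimension grows, the nearest and farthest neighbors of a query end up at nearly the same distance, which weakens distance-based indexing. The helper `relative_contrast` is illustrative only.

```python
import numpy as np

def relative_contrast(n_points: int, dim: int, seed: int = 0) -> float:
    """(farthest - nearest) / nearest distance from a random query point
    to n_points drawn uniformly from the unit hypercube in `dim` dimensions."""
    rng = np.random.default_rng(seed)
    data = rng.random((n_points, dim))
    query = rng.random(dim)
    dists = np.linalg.norm(data - query, axis=1)
    return float((dists.max() - dists.min()) / dists.min())

for dim in (2, 10, 100, 1000):
    # the contrast shrinks as dim grows, so "near" and "far" become hard to tell apart
    print(dim, round(relative_contrast(1000, dim), 3))
```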

3 Our work on dimension reduction will help both data management and mining
- Reducing the dimensions will improve
  - searching (task 3.1, LBNL)
  - clustering (task 2.1, ORNL)
- Dimension reduction is expensive if there are many data items
  - use a sample of the data items
  - techniques for sampling in the presence of rare events
- We will focus on climate and high-energy-physics (HEP) data
  - complements work at ORNL (climate) and LBNL (HEP)
  - but the techniques are applicable to other data as well
→ We report only the 0.8 FTE of work funded under SciDAC; however, our data mining research is more extensive. See www.llnl.gov/casc/sapphire
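
A minimal sketch of the "use a sample of the data items" idea, assuming PCA as the dimension reduction step and synthetic data with one dominant direction of variance: the leading principal direction computed from a 5% uniform random sample is compared with the one computed from all items.

```python
import numpy as np

def leading_direction(X):
    """First principal direction (unit vector) of the rows of X."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[0]

rng = np.random.default_rng(1)
n_items, n_features = 100_000, 20
# synthetic data: one dominant direction of variance plus isotropic noise
axis = rng.normal(size=n_features)
axis /= np.linalg.norm(axis)
X = 5.0 * rng.normal(size=(n_items, 1)) * axis + rng.normal(size=(n_items, n_features))

full = leading_direction(X)
sample = X[rng.choice(n_items, size=n_items // 20, replace=False)]  # 5% uniform sample
approx = leading_direction(sample)
print(abs(full @ approx))  # |cosine| between the two directions (their signs are arbitrary)
```

A uniform sample like this one can easily miss rare events, which is why the slide calls for sampling techniques that account for them; the sketch shows only the uniform baseline.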

4 There are two different ways in which we can view dimension reduction
- Reduce the number of features representing a data item
- Reduce the number of basis vectors used to describe the data: if some of the … are small, they can be ignored
[Figure: data matrix diagrams labeled Features and Data items]
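
A minimal sketch of the second view, assuming PCA computed with the SVD and a data matrix with one row per data item and one column per feature (both assumptions, not taken from the slide): basis vectors whose squared singular values add little to the total variance are dropped. The helper `reduce_basis` and the 99% threshold are hypothetical.

```python
import numpy as np

def reduce_basis(X, energy=0.99):
    """PCA via the SVD: keep the fewest basis vectors whose squared singular
    values account for `energy` of the total variance; drop the rest."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    frac = np.cumsum(s**2) / np.sum(s**2)
    k = min(int(np.searchsorted(frac, energy)) + 1, len(s))
    scores = U[:, :k] * s[:k]     # coordinates of each data item in the reduced basis
    return scores, Vt[:k], k

rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.normal(size=(10, 3)))              # 3 orthonormal directions in 10-D
latent = rng.normal(size=(500, 3)) * np.array([3.0, 2.0, 1.0])
X = latent @ q.T + 0.01 * rng.normal(size=(500, 10))       # 500 items x 10 features, nearly rank 3
scores, basis, k = reduce_basis(X)
print(k, scores.shape, basis.shape)
```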

5 Our work on climate data focuses on reducing the number of basis vectors
- Domain expert: Dr. Benjamin Santer (LLNL climate)
- Climate scientists are interested in understanding the change in the earth’s surface temperature
- Simulated and observed data are mixtures of volcano, El Niño, and other effects
- Our goal is to separate the signals corresponding to different effects
  - traditional approaches such as principal component analysis (PCA) have not worked
  - separation is difficult because the El Chichón and Pinatubo volcanic eruptions coincided with El Niño events
  - our approach is to use independent component analysis (ICA)
→ Dimension reduction supporting scientific discovery
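
One standard explanation of why PCA alone cannot un-mix such signals (the slides do not give the authors' own diagnosis): PCA uses only variances and covariances, and after whitening, every rotation of the data has identical second-order statistics. The sketch uses two synthetic non-Gaussian sources and a hypothetical mixing matrix; ICA resolves the remaining rotation by using higher-order statistics.

```python
import numpy as np

t = np.linspace(0, 20, 2000)
S = np.vstack([np.sin(t), np.sign(np.sin(3 * t))])   # two synthetic non-Gaussian sources
A = np.array([[1.0, 0.6], [0.4, 1.0]])               # hypothetical mixing matrix
X = A @ S
X = X - X.mean(axis=1, keepdims=True)

# PCA whitening: project onto the principal axes and rescale each to unit variance
eigvals, eigvecs = np.linalg.eigh(np.cov(X))
Z = (eigvecs / np.sqrt(eigvals)).T @ X

# Any rotation of whitened data is still white, so second-order statistics
# (all that PCA uses) cannot tell which rotation un-mixes the sources
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.round(np.cov(Z), 6))
print(np.round(np.cov(R @ Z), 6))
```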

6 The raw data consists of monthly temperatures on a 144x73 spatial grid at 17 vertical levels
[Diagram labels: ICA; Volcano; El Niño; Other effects]
[Figure: January 1979 raw temperatures (Kelvin) on the 144x73 (longitude by latitude) grid at the 1000hPa pressure level. Data from NCEP.]
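
Before PCA or ICA can be applied, the gridded field is typically flattened into a data matrix with one row per month (data item) and one column per grid cell (feature). A minimal sketch, assuming a `(time, level, lat, lon)` array layout and the Jan 1979 to Dec 2000 period shown on slide 7; `to_data_matrix` is a hypothetical helper, not part of any NCEP tool.

```python
import numpy as np

N_LON, N_LAT, N_LEV = 144, 73, 17
N_MONTHS = 22 * 12   # assumed period: Jan 1979 - Dec 2000

def to_data_matrix(temps):
    """Flatten an array assumed to be laid out as (time, level, lat, lon)
    into a (time, features) matrix: one row per month, one column per grid cell."""
    return temps.reshape(temps.shape[0], -1)

print(N_LON * N_LAT)                   # 10512 grid points per level
print(N_LON * N_LAT * N_LEV)           # 178704 values (features) per month
print(N_MONTHS * N_LON * N_LAT * N_LEV * 4 / 1e6, "MB as float32")
```

With all 17 levels, each month is a point in a space of 178,704 features, which is the kind of dimensionality the earlier slides describe.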

7 Initially, we applied ICA to global monthly mean anomaly temperatures
[Figure: time series of global monthly mean anomalies, Jan 1979 - Dec 2000, at 17 vertical levels; level 1: 1000hPa (lowest altitude), level 17: 10hPa (highest altitude)]
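
The slides do not spell out how the anomalies were computed. A common recipe, assumed here, is to average over the globe with cos(latitude) area weights and then subtract the long-term mean of each calendar month. The sketch uses a synthetic, spatially uniform field; `global_monthly_anomalies` is a hypothetical helper.

```python
import numpy as np

def global_monthly_anomalies(temps, lats_deg):
    """temps: (n_months, n_lat, n_lon) for one level, starting in January and
    covering whole years. Returns the area-weighted global mean series minus
    the mean of each calendar month (the seasonal climatology)."""
    weights = np.cos(np.deg2rad(lats_deg))              # grid-cell area shrinks toward the poles
    zonal_mean = temps.mean(axis=2)                     # average over longitude
    global_mean = zonal_mean @ weights / weights.sum()  # weighted average over latitude
    by_year = global_mean.reshape(-1, 12)               # (n_years, 12)
    climatology = by_year.mean(axis=0)                  # long-term mean for each calendar month
    return (by_year - climatology).ravel()

lats = np.linspace(-90, 90, 73)                         # 2.5-degree latitudes
months = np.arange(24)
seasonal = 10.0 * np.sin(2 * np.pi * months / 12)       # synthetic seasonal cycle
trend = 0.1 * (months // 12)                            # synthetic +0.1 K step in year 2
temps = 280.0 + (seasonal + trend)[:, None, None] * np.ones((1, 73, 144))
anoms = global_monthly_anomalies(temps, lats)
print(anoms)
```

The seasonal cycle cancels exactly, leaving only the year-to-year difference.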

8 Next, we ran experiments with simulated data to understand the behavior of ICA
[Figure panels: (i) two original sources; (ii) two mixed signals formed from (i) ("mix"); (iii) sources (ICs) recovered from (ii) ("ICA")]
ICA correctly estimates the shapes of the two independent components (ICs). With additional processing, we can also estimate the relative contributions of the two ICs to the two mixed signals.
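
A minimal sketch of this kind of experiment, assuming the symmetric FastICA algorithm with a tanh nonlinearity (the slides do not say which ICA algorithm was used). The "volcano" source, an abrupt cooling with exponential recovery, and the mixing matrix are hypothetical stand-ins; `fast_ica` is written here from the published fixed-point update, not taken from the authors' software.

```python
import numpy as np

def fast_ica(X, n_iter=500, tol=1e-8, seed=0):
    """Symmetric FastICA with the tanh nonlinearity.

    X: (n_signals, n_samples) mixed signals. Returns unit-variance source
    estimates; their order, sign, and scale are not identifiable."""
    X = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(X))
    Z = (E / np.sqrt(d)).T @ X                      # PCA whitening
    n = Z.shape[0]
    rng = np.random.default_rng(seed)
    u, _, vt = np.linalg.svd(rng.normal(size=(n, n)))
    W = u @ vt                                      # random orthogonal start
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        # fixed-point step: E{z g(w^T z)} - E{g'(w^T z)} w for every row w of W
        W_new = G @ Z.T / Z.shape[1] - np.diag((1.0 - G**2).mean(axis=1)) @ W
        u, _, vt = np.linalg.svd(W_new)
        W_new = u @ vt                              # symmetric decorrelation: (W W^T)^(-1/2) W
        converged = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0)) < tol
        W = W_new
        if converged:
            break
    return W @ Z

t = np.linspace(0, 8 * np.pi, 2000)
sine = np.sin(t)
tau = 1.0
# hypothetical volcano-like source: abrupt cooling with exponential recovery at two eruption times
volcano = sum(np.where(t >= t0, -np.exp(-(t - t0) / tau), 0.0)
              for t0 in (0.75 * np.pi, 4.75 * np.pi))
A = np.array([[1.0, 0.5], [0.7, 1.0]])          # hypothetical mixing matrix
X = A @ np.vstack([sine, volcano])              # two mixed signals
S_hat = fast_ica(X)
```

Correlating the rows of `S_hat` with `sine` and `volcano` (for example with `np.corrcoef`) shows which estimated component matches which source; a good separation gives correlations close to +1 or -1, in some order.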

9 Original decomposition of the two mixed signals (-) into sine (--) and volcano (-.)
[Figure panels: (i) Signal 1; (ii) Signal 2]

10 After proper post-processing, ICA estimates the underlying independent components, and their contributions to the mixed signals, remarkably well
ICA decomposition of the two mixed signals (-) into sine (--) and volcano (-.)
[Figure panels: (i) Signal 1; (ii) Signal 2]
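
ICA returns components in arbitrary order, sign, and scale, so its raw amplitudes are not directly meaningful. The slides do not say which post-processing was used; one standard option, assumed here, is to regress the mixed signals on the estimated ICs by least squares. The product of each regression coefficient and its IC is the same whatever scale and sign ICA happened to choose. The sketch fakes the ICA output by swapping, rescaling, and sign-flipping known sources; `ic_contributions` is a hypothetical helper.

```python
import numpy as np

def ic_contributions(X, S_hat):
    """Least-squares mixing matrix A_hat with X ~= A_hat @ S_hat (both centered),
    plus the contribution of each IC j to each mixed signal i:
    C[i, j, :] = A_hat[i, j] * S_hat[j, :]."""
    Xc = X - X.mean(axis=1, keepdims=True)
    Sc = S_hat - S_hat.mean(axis=1, keepdims=True)
    A_hat = np.linalg.lstsq(Sc.T, Xc.T, rcond=None)[0].T
    return A_hat, A_hat[:, :, None] * Sc[None, :, :]

t = np.linspace(0, 4 * np.pi, 500)
S = np.vstack([np.sin(t), np.sign(np.cos(3 * t))])
S = S - S.mean(axis=1, keepdims=True)
A = np.array([[1.0, 0.5], [0.7, 1.0]])          # hypothetical mixing matrix
X = A @ S
# stand-in for ICA output: components swapped, rescaled, and one sign flipped
S_hat = np.vstack([-3.0 * S[1], 0.5 * S[0]])
A_hat, C = ic_contributions(X, S_hat)
```

Although `A_hat` differs from `A`, each contribution `C[i, j]` equals the true term `A[i, k] * S[k]` for the matching source, and the contributions add back up to the mixed signals.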

11 ICA can also separate “noise” used as an extra component in the mixing
[Figure panels: 3 original sources → ("mix") 3 mixed signals → ("ICA") 3 estimated ICs]
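
ICA can recover the sources as long as at most one of them is Gaussian, so a Gaussian noise term can be included as an extra component; whether the noise in these experiments was Gaussian is an assumption here. Excess kurtosis is one classic measure of non-Gaussianity used as an ICA contrast. The sketch compares a sine, Gaussian noise, and a spiky (Laplace-distributed) signal.

```python
import numpy as np

def excess_kurtosis(x):
    """Fourth standardized moment minus 3: 0 for a Gaussian, < 0 for
    sub-Gaussian and > 0 for super-Gaussian signals."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z**4) - 3.0)

rng = np.random.default_rng(0)
n = 100_000
t = np.linspace(0, 200 * np.pi, n, endpoint=False)    # exactly 100 periods
k_sine = excess_kurtosis(np.sin(t))                  # -1.5 for a sine sampled over full periods
k_noise = excess_kurtosis(rng.normal(size=n))        # near 0
k_spiky = excess_kurtosis(rng.laplace(size=n))       # near 3 (Laplace distribution)
print(k_sine, k_noise, k_spiky)
```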

12 Original decomposition of 3 mixed signals (-) into El Niño (--), volcano (-.), and noise (..)
The cooling in the global series at the arrow is in fact a combination of an ENSO warming and a volcanic cooling. Without the volcanic eruption, the El Niño warming would dominate, resulting in warmer global temperatures.
[Figure panels: (i) Signal 1; (ii) Signal 2; (iii) Signal 3]

13 ICA decomposition of 3 mixed signals (-) into El Niño (--), volcano (-.), and noise (..)
Although the exact amplitudes are not recovered perfectly, ICA clearly separates the cooling effect of the volcano from the warming effect of El Niño.
[Figure panels: (i) Signal 1; (ii) Signal 2; (iii) Signal 3]
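
Part of the amplitude mismatch reflects an ambiguity built into the ICA model rather than only numerical error. Writing the mixed signals as $X = AS$ (notation assumed here, not taken from the slides), any permutation matrix $P$ and invertible diagonal matrix $D$ give an equally valid factorization:

```latex
X = AS = \left(A P^{\top} D^{-1}\right)\left(D P S\right),
\qquad P^{\top} P = I,
```

so the ICs are determined only up to order, sign, and scale, and their amplitudes have to be fixed by additional constraints or post-processing.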

14 Our future plans include work with HEP data and collaborators at ORNL and LBNL
- Complete the work on the climate problem
  - our results with artificial data are encouraging
  - identify an appropriate ICA model for climate data
- Make the ICA software accessible to SciDAC scientists
- Try ICA and other dimension reduction techniques on the STAR high-energy-physics data
  - reduce the number of features
  - investigate sampling to reduce computation
  - collaborate with LBNL (data, searching)
- Investigate incremental PCA
  - monitor climate simulations using indices based on the principal components
  - collaborate with ORNL (data, clustering)
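
The slides do not describe which incremental PCA method will be investigated. A minimal sketch of one simple variant, assumed here: keep a running mean and covariance with Welford's update, eigendecompose on demand, and project each new time step onto the leading components to get a monitoring index. `StreamingPCA` is a hypothetical class; scikit-learn's `IncrementalPCA` (with `partial_fit`) is an existing batch-based alternative.

```python
import numpy as np

class StreamingPCA:
    """Hypothetical incremental PCA: updates the mean and scatter matrix one
    sample (e.g. one simulation time step) at a time with Welford's method and
    eigendecomposes on demand. Needs O(d^2) memory for d features."""

    def __init__(self, n_features):
        self.n = 0
        self.mean = np.zeros(n_features)
        self.scatter = np.zeros((n_features, n_features))

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.scatter += np.outer(delta, x - self.mean)

    def components(self, k):
        """Top-k eigenvalues (descending) and eigenvectors (as rows) of the sample covariance."""
        vals, vecs = np.linalg.eigh(self.scatter / (self.n - 1))
        order = np.argsort(vals)[::-1][:k]
        return vals[order], vecs[:, order].T

    def index(self, x, k=1):
        """Projection of a new sample onto the leading k components: a PC-based index."""
        _, comps = self.components(k)
        return comps @ (x - self.mean)

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4)) * np.array([3.0, 2.0, 1.0, 0.5])
pca = StreamingPCA(n_features=4)
for row in X:            # samples arrive one at a time, as from a running simulation
    pca.update(row)
vals, comps = pca.components(2)
print(vals, pca.index(X[-1], k=2))
```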

