1 SDM center Questions – Dave Nelson
What kinds of processing, queries, and searches do biologists perform over microarray data?
–Range query on a spot?
–Range query on multiple spots?
–Correlation between spots?
10,000 arrays x 40,000 spots x 10 values/spot = 4 billion values
–Organize as 40,000 indexes, each over 10,000 values?
–Organize all 4 billion values into one index?
Explain the problem of chip variation
–How to pick the winner of 10^4 tests?
How do you represent pathways in a data structure?
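The per-spot option above can be sketched in a few lines. This is a minimal Python sketch, assuming one chosen measurement per spot (not all 10 values) and an in-memory matrix where `matrix[a][s]` is the value of spot `s` on array `a`; the function names are hypothetical. Each spot gets its own sorted list of (value, array) pairs, so a range query on one spot is two binary searches, and a query on several spots intersects the per-spot results.

```python
from bisect import bisect_left, bisect_right


def build_spot_index(matrix):
    """Build one sorted index per spot.

    Assumes matrix[a][s] is the (single) value of spot s on array a.
    Returns a list indexed by spot; each entry is a sorted list of
    (value, array_id) pairs.
    """
    n_spots = len(matrix[0])
    return [
        sorted((row[s], a) for a, row in enumerate(matrix))
        for s in range(n_spots)
    ]


def range_query(index, spot, lo, hi):
    """Return the ids of arrays whose value for `spot` lies in [lo, hi]."""
    entries = index[spot]
    # Array ids are >= 0, so (lo, -1) sorts before every entry with value lo.
    i = bisect_left(entries, (lo, -1))
    # (hi, inf) sorts after every entry with value hi.
    j = bisect_right(entries, (hi, float("inf")))
    return sorted(a for _, a in entries[i:j])


def multi_spot_query(index, conditions):
    """Intersect range queries; conditions is a list of (spot, lo, hi)."""
    result = None
    for spot, lo, hi in conditions:
        hits = set(range_query(index, spot, lo, hi))
        result = hits if result is None else result & hits
    return sorted(result or ())


matrix = [[1.0, 5.0], [2.0, 3.0], [3.0, 4.0]]  # 3 arrays x 2 spots
index = build_spot_index(matrix)
print(range_query(index, 0, 1.5, 3.0))                          # [1, 2]
print(multi_spot_query(index, [(0, 1.5, 3.0), (1, 4.0, 5.0)]))  # [2]
```

The single-index alternative would key every entry by (spot, value) in one structure instead; the per-spot layout keeps each index small but needs 40,000 of them.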

2 SDM center Question – P1
Who will do “workflow”?
–NCSU, SDSC, GATech?
–Why not use NCSU?
–What about WFS from W3C?
Why have Xwrap composition?
–Can workflow be used instead?
How would dynamic workflow fit the services flow model?

3 SDM center Question – P1
Is there a chance that Matt will really use it for real work within a year? (Same question to P2, P3, P4)

4 SDM center Question – P2
Astrophysics
–3D hydro run: about 20 TB (5 variables x 1024^3 grid points x 1000 time steps x 4 bytes)
–Can you apply data reduction techniques to that?
–How long will it take?
–Reduced data: 100:1 factor => about 200 GB
–Can you visualize that?
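The size estimate on this slide can be checked directly. A minimal sketch; `dataset_bytes` is a hypothetical helper, and decimal (TB, GB) and binary (TiB) units are both printed. The exact product is about 21.5 TB, so "20 TB" and "200 GB" are round figures.

```python
def dataset_bytes(variables, grid_points, time_steps, bytes_per_value):
    """Raw size of a dense simulation output, in bytes."""
    return variables * grid_points * time_steps * bytes_per_value


raw = dataset_bytes(variables=5, grid_points=1024**3, time_steps=1000,
                    bytes_per_value=4)
reduced = raw / 100  # 100:1 reduction factor from the slide

print(f"raw:     {raw / 1e12:.1f} TB ({raw / 2**40:.1f} TiB)")  # 21.5 TB (19.5 TiB)
print(f"reduced: {reduced / 1e9:.0f} GB")                       # 215 GB
```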

5 SDM center Questions – P2
Monitoring (Ghaleb)
–How can you minimize monitoring interference?
–Relationship to Grid monitoring activities
   –Grid Forum WG: defining schemas now

6 SDM center Questions – P3
How do you migrate a terabyte of data?
–e.g. 1000 files of a gigabyte each
–At 10 MB/s: 10^5 seconds, about 30 hours. OK?
–At 1 MB/s: about 300 hours. OK?
–Is using HRMs and DRMs an OK solution?
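The timing figures on this slide follow from size divided by rate. A minimal sketch, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes); the `streams` parameter is a hypothetical idealization that assumes parallel streams add up linearly, which only holds while no shared link is saturated.

```python
def transfer_hours(total_bytes, rate_bytes_per_s, streams=1):
    """Idealized wall-clock hours to move total_bytes.

    Assumes each stream sustains rate_bytes_per_s and that streams add up
    linearly (no shared bottleneck).
    """
    return total_bytes / (rate_bytes_per_s * streams) / 3600


TB, MB = 10**12, 10**6
print(f"{transfer_hours(TB, 10 * MB):.1f} h")  # 27.8 h at 10 MB/s
print(f"{transfer_hours(TB, 1 * MB):.1f} h")   # 277.8 h at 1 MB/s
```

The slide's "30 hours" and "300 hours" are these values rounded up.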

7 SDM center Questions – P3
How do you filter NetCDF files from HPSS (Randy)?
–Can you select one variable (temp) over the Pacific Ocean for 10 years?
–With Hsi, can you pick the “temp” values out of 30 variables?
–Would it make sense for HRM to call the NetCDF library instead? (discussion with Dean Williams)
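A filter like this only needs the byte ranges that hold the selected region of one variable. A minimal sketch, assuming the variable is stored contiguously in row-major (C) order with a fixed element size, as non-record variables are in the NetCDF classic format; record variables, compression, and HPSS access itself are not modeled, and `hyperslab_byte_ranges` is a hypothetical name, not part of the NetCDF library.

```python
from itertools import product


def hyperslab_byte_ranges(shape, start, count, itemsize, base_offset=0):
    """Byte runs (offset, length) covering a hyperslab of a C-ordered array.

    shape, start, count: per-dimension sizes, first indices, and extents.
    Each run spans count[-1] consecutive elements of the last dimension;
    adjacent runs are merged.
    """
    ndim = len(shape)
    # Element strides for row-major order: the last dimension varies fastest.
    strides = [1] * ndim
    for d in range(ndim - 2, -1, -1):
        strides[d] = strides[d + 1] * shape[d + 1]

    run_len = count[-1] * itemsize
    ranges = []
    outer = [range(s, s + c) for s, c in zip(start[:-1], count[:-1])]
    for idx in product(*outer):
        elem = sum(i * st for i, st in zip(idx, strides[:-1])) + start[-1]
        offset = base_offset + elem * itemsize
        if ranges and ranges[-1][0] + ranges[-1][1] == offset:
            # Contiguous with the previous run: extend it.
            ranges[-1] = (ranges[-1][0], ranges[-1][1] + run_len)
        else:
            ranges.append((offset, run_len))
    return ranges


# Hypothetical float32 variable (time=4, lat=3, lon=5):
# times 1-2, lat 1, lon 2-4.
print(hyperslab_byte_ranges((4, 3, 5), (1, 1, 2), (2, 1, 3), itemsize=4))
# [(88, 12), (148, 12)]
```

A mover that understood this layout could fetch just these ranges instead of the whole file, which is one way to frame the HRM-versus-NetCDF-library question.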

8 SDM center Questions – P4
PVFS, MPI-IO, ROMIO
–When to use what?
–Give examples
–Can they be layered? How?
–When would you layer them?
How would you use these over GridFTP?

9 SDM center Questions – P4
Suppose you got a hint to organize NetCDF files one variable at a time.
–How would you use that?
   –Reorganize the dataset over all time steps? How?
   –Is tiled data effective enough that no reorganization is needed?
AMR (Wei-Kang)
–How is the tree structure stored?
–How does it relate to Colella’s work?
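Whether tiling makes reorganization unnecessary depends on how many tiles a typical access touches. A minimal sketch, assuming a chunked (tiled) layout in which a whole chunk must be read to get any value in it; the variable shape, the two chunk shapes, and the helper names are hypothetical. Both chunk shapes hold 64,800 values.

```python
from math import prod


def chunks_touched(start, count, chunk):
    """Number of chunks (tiles) a hyperslab intersects."""
    n = 1
    for s, c, k in zip(start, count, chunk):
        first = s // k             # first chunk index along this dimension
        last = (s + c - 1) // k    # last chunk index along this dimension
        n *= last - first + 1
    return n


def bytes_read(start, count, chunk, itemsize=4):
    """Bytes read if every touched chunk is read in full."""
    return chunks_touched(start, count, chunk) * prod(chunk) * itemsize


# Hypothetical variable: 1000 time steps x 180 lat x 360 lon.
# Time series at one grid point (lat 10, lon 10):
series = ((0, 10, 10), (1000, 1, 1))
print(chunks_touched(*series, (1, 180, 360)))   # 1000
print(chunks_touched(*series, (100, 18, 36)))   # 10
```

The trade-off reverses for a full map at one time step (1 chunk versus 100), so the answer depends on the access pattern the hint describes.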

10 SDM center Question – P4
If security is not important:
–Can SRMs use your specialized FTP?
–What needs to be installed on both ends? (Can it be done dynamically?)
–If you move data very effectively:
   –Why bother if the network is the bottleneck?
   –Is it OK to hog the network?
What about doing simple concurrent GridFTP?
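The "simple concurrent" idea can be illustrated by splitting one file into byte ranges and moving each range in its own stream. A minimal local sketch, assuming plain file reads and writes stand in for network streams; it does not use GridFTP or its API, and the function names are hypothetical. Whether more streams help depends on where the bottleneck is, which is the question the slide raises.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def split_ranges(size, streams):
    """Split [0, size) into `streams` contiguous (offset, length) pieces."""
    base, extra = divmod(size, streams)
    ranges, off = [], 0
    for i in range(streams):
        length = base + (1 if i < extra else 0)
        ranges.append((off, length))
        off += length
    return ranges


def copy_range(src, dst, offset, length, bufsize=1 << 20):
    """Copy one byte range from src to the same offset in dst."""
    with open(src, "rb") as fin, open(dst, "r+b") as fout:
        fin.seek(offset)
        fout.seek(offset)
        remaining = length
        while remaining > 0:
            buf = fin.read(min(bufsize, remaining))
            if not buf:
                break
            fout.write(buf)
            remaining -= len(buf)


def concurrent_copy(src, dst, streams=4):
    """Copy src to dst using one thread per byte range."""
    size = os.path.getsize(src)
    with open(dst, "wb") as f:
        f.truncate(size)  # preallocate so each stream can seek and write
    with ThreadPoolExecutor(max_workers=streams) as pool:
        futures = [pool.submit(copy_range, src, dst, off, n)
                   for off, n in split_ranges(size, streams) if n > 0]
        for fut in futures:
            fut.result()  # re-raise any error from a worker
```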

11 SDM center Organizational issues
Next meeting
–September (conflicts?)
–San Diego
–Registration fee: $100
Conference calls
–Format OK? Useful?
–Attendance?
–Reports on the web?
Wedge white paper
–Based on a “study” (National Academy of Sciences)
–Something we are interested in seeing funded
–Not an institutional view

12 SDM center General
Intellectual property
–DOE position: layered collection of licenses
–Do we have to have an open source license?
–Decide project by project
–Get a DOE policy statement to all (ask Fred Johnson)

13 SDM center General
Public relations
–Presence in conferences and meetings
   –Tutorials at the next all-hands meeting
   –On-line tutorials
   –Tutorials at conferences (Arie)
   –Acknowledgement in papers
   –Consider a workshop in addition to the all-hands meeting
–Web site
–Conf calls: lab notebook
–Services, products?

14 SDM center General
Supercomputing
–Integrate projects: one story for each of our projects
–Assuming a centralized booth
–Common view: a poster at least
–Arie to ask for space!
What’s the product of the SDM center?
–Services (backed up by products)
–Based on WSDL/SOAP?
–Also “components”

15 SDM center Wedge white paper
Data preservation across DOE
Moving terabytes for security reasons
–Evacuation / natural disaster
–Move data to an emergency team
Automatic replication
–Resilience in case of a cyber-attack
–Metadata: replicated / distributed
–Look at Garcia-Molina
Identify scientific experts
–Automatic collections of metadata, etc.
–Data mining for discovering
Sensors in sensitive locations
–Event monitoring of sensor databases
–Data mining from image data
–Soft sensors
   –Collected from multiple distributed sensors

16 SDM center P3 - Next
Getting ORNL involved with BNL and NERSC
–Plan to get HRM to have up to TBs
–Plan to get tape cartridges up to 3-5 TBs
–Does PVFS work on Solaris?
–Or run HRM on Linux?

17 SDM center P3 - Scenarios
Bulk data transfer scenario
–Get HRMs used to replicate STAR data
   –Demo: show performance
   –Develop: dynamic log visualization
–Experimentation
   –Parallel striping
   –Partial file transfers
   –Filtering of files (NetCDF, HDF)
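Parallel striping spreads fixed-size blocks of one file across several servers or streams so they can be moved in parallel. A minimal in-memory sketch, assuming a fixed block size and round-robin placement (real striped transfers may place blocks differently); `stripe` and `unstripe` are hypothetical names.

```python
def stripe(data, stripes, block):
    """Deal fixed-size blocks of `data` round-robin onto `stripes` buffers."""
    parts = [bytearray() for _ in range(stripes)]
    for i, off in enumerate(range(0, len(data), block)):
        parts[i % stripes] += data[off:off + block]
    return [bytes(p) for p in parts]


def unstripe(parts, block, total):
    """Reassemble the original byte string from round-robin stripes."""
    pos = [0] * len(parts)  # read position within each stripe
    out = bytearray()
    i = 0
    while len(out) < total:
        s = i % len(parts)
        chunk = parts[s][pos[s]:pos[s] + block]
        if not chunk:
            break  # stripes exhausted early: inputs were inconsistent
        out += chunk
        pos[s] += len(chunk)
        i += 1
    return bytes(out)


data = bytes(range(10))
parts = stripe(data, stripes=3, block=2)
print([len(p) for p in parts])  # [4, 4, 2]
assert unstripe(parts, block=2, total=len(data)) == data
```

Because block boundaries are fixed, the same arithmetic tells a partial file transfer which stripes hold a given byte range.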

18 SDM center P3 - Scenarios
HENP data analysis scenario
–STAR analysis framework on BNL & NERSC
–Replicate “medium” level access files at ORNL
–Use bitmap index
–Long term: use PAM
–Demo: select files from the most accessible location
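As a rough illustration of the bitmap index idea, here is a minimal binned, equality-encoded bitmap index in Python, assuming one numeric attribute per event and Python integers used as bitsets. The class, bin edges, and sample values are hypothetical, and this is not the API of any particular bitmap index library. Bins that lie wholly inside the query range are answered from bitmaps alone; only boundary bins need a check against the raw values.

```python
from bisect import bisect_right


class BinnedBitmapIndex:
    """Binned, equality-encoded bitmap index over one numeric attribute.

    Bin b covers [edges[b], edges[b + 1]); each bin's bitmap is a Python int
    whose bit r is set when row r falls in that bin. Values outside
    [edges[0], edges[-1]) are not indexed.
    """

    def __init__(self, values, edges):
        self.values = list(values)
        self.edges = list(edges)
        self.bitmaps = [0] * (len(self.edges) - 1)
        for row, v in enumerate(self.values):
            b = bisect_right(self.edges, v) - 1
            if 0 <= b < len(self.bitmaps):
                self.bitmaps[b] |= 1 << row

    def range_query(self, lo, hi):
        """Return the sorted row ids with lo <= value < hi."""
        hits = 0
        for b, bitmap in enumerate(self.bitmaps):
            b_lo, b_hi = self.edges[b], self.edges[b + 1]
            if b_hi <= lo or b_lo >= hi:
                continue  # bin lies entirely outside the query
            if lo <= b_lo and b_hi <= hi:
                hits |= bitmap  # bin lies entirely inside: no raw-data check
                continue
            # Boundary bin: check each candidate row against the raw value.
            row = 0
            while bitmap:
                if bitmap & 1 and lo <= self.values[row] < hi:
                    hits |= 1 << row
                bitmap >>= 1
                row += 1
        return [r for r in range(len(self.values)) if hits >> r & 1]


idx = BinnedBitmapIndex([0.5, 1.5, 2.5, 3.5, 1.2], edges=[0, 1, 2, 3, 4])
print(idx.range_query(1, 3))    # [1, 2, 4]: answered from bitmaps only
print(idx.range_query(1.3, 3))  # [1, 2]: bin [1, 2) needs a candidate check
```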


Download ppt "SDM center Questions – Dave Nelson What kind of processing / queries / searches biologists do over microarray data? –Range query on a spot? –Range query."

Similar presentations


Ads by Google