SAXS Scatter Performance Analysis CHRIS WILCOX 2/6/2008.

1 SAXS Scatter Performance Analysis CHRIS WILCOX 2/6/2008

2 Scatter Status
- Prototype of basic algorithm, arbitrary number of atoms and topology.
- Atom types: C, N, O, H, P, S, Zn, and very easy to add more.
- Matches results with original R prototype from Stefan, for several small molecules.
- Computes intensity function divided into a specified number of steps.

3 Scatter Performance (Current)
- Original algorithm, no optimization, debug version: 5000 atoms = ~60 hours.
- Original algorithm, no optimization, release version: 5000 atoms = ~4 hours.
- Obvious restructuring, pre-computed factors, release version: 5000 atoms = ~39 minutes.
- Redundant work avoided, compiler flags tuned, release version: 5000 atoms = ~19 minutes.
- Test machine: Pentium Core Duo, mobile CPU, 1.66 GHz.

4 Scatter Performance (Analysis)
- Scatter factors are pre-computed and require ~0% of the run time of the fastest calculation.
- Distance calculations are step independent and require ~3%, only because of the SQRT function.
- The FSIN function appears to consume ~60% of processor cycles; is there an alternative?
- The intensity calculation itself uses ~86% of the cycles; this needs to be verified again on the latest calculation.
- No real optimization yet; the compiler wins anyway!
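The slides do not name the formula behind the intensity calculation. The sketch below assumes the Debye equation, I(q) = sum over i, j of f_i(q) f_j(q) sin(q r_ij) / (q r_ij), which is consistent with one sine per atom pair per step. It shows why the distances can be computed once and reused for every step. The function names and the `factors[s][i]` layout are hypothetical, not taken from the Scatter code.

```python
import math

def pairwise_distances(coords):
    """Distance for each unordered atom pair (i < j); computed once, reused at every step."""
    pairs = []
    n = len(coords)
    for i in range(n):
        xi, yi, zi = coords[i]
        for j in range(i + 1, n):
            xj, yj, zj = coords[j]
            dx, dy, dz = xi - xj, yi - yj, zi - zj
            pairs.append((i, j, math.sqrt(dx * dx + dy * dy + dz * dz)))
    return pairs

def intensity(coords, factors, q_values):
    """Assumed Debye sum. factors[s][i] is the pre-computed scatter factor
    of atom i at step s (hypothetical layout)."""
    pairs = pairwise_distances(coords)
    result = []
    for s, q in enumerate(q_values):
        f = factors[s]
        total = sum(fi * fi for fi in f)  # i == j terms, where sin(x)/x -> 1
        for i, j, r in pairs:
            x = q * r
            sinc = math.sin(x) / x if x != 0.0 else 1.0
            total += 2.0 * f[i] * f[j] * sinc  # each unordered pair counts twice
        result.append(total)
    return result
```

The inner loop over `pairs` is where the single sine call per iteration sits, which matches the observation that the intensity calculation dominates the run time.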

5 Scatter Performance (Model)
- N = # of atoms, S = # of steps, A = # of types.
- Scatter factors are O(SA) * (4 exp + 4 pow + 4 fmul), i.e. 10K iterations for 1000 steps, 10 types.
- Distance math is O(N^2/2) * (1 sqrt + 3 fmul + 2 fadd), i.e. 12.5M iterations for 5000 atoms (distances are step independent).
- Intensity math is O(S N^2/2) * (1 fsin + 9 fmul + 2 fadd), i.e. 12.5G iterations for the same 1000 steps and 5000 atoms.
- Operations shown are based on code reading; actual floating point instructions are ~2X more frequent.
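The iteration counts in the model can be checked with a few lines of arithmetic. This is only a restatement of the slide's formulas; the function name is made up for illustration.

```python
def iteration_counts(n_atoms, n_steps, n_types):
    """Iteration counts from the slide's model: S*A, N^2/2, and S*N^2/2."""
    scatter = n_steps * n_types          # scatter factor table
    distance = n_atoms * n_atoms // 2    # one per atom pair, step independent
    intensity = n_steps * distance       # one per atom pair per step
    return scatter, distance, intensity

print(iteration_counts(5000, 1000, 10))
# prints (10000, 12500000, 12500000000)
```

The intensity term is larger than the distance term by exactly the number of steps, which is why it dominates once S reaches the hundreds or thousands.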

6 Scatter Performance (Future)
- Complete optimizations, convert sine function to lookup table: 5000 atoms = ~500 seconds?
- Find faster floating point performance, not hard to beat by 8x: 5000 atoms = ~60 seconds?
- Intensity calculations are independent, so use more processors: 5000 atoms = ~10 seconds?
- Question: How many molecules need to be run to represent a non-rigid structure?
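One way the sine lookup table could look is sketched below, using linear interpolation between tabulated values. The table size and the names are assumptions for illustration; the slides do not specify either. In Python this will not outrun `math.sin`, so the sketch only illustrates the technique and its accuracy trade-off, not the expected speedup in compiled code.

```python
import math

TABLE_SIZE = 4096                      # hypothetical resolution
_TWO_PI = 2.0 * math.pi
_STEP = _TWO_PI / TABLE_SIZE
# One extra entry so interpolation at the last interval has a right neighbour.
_TABLE = [math.sin(k * _STEP) for k in range(TABLE_SIZE + 1)]

def table_sin(x):
    """Approximate sin(x) by linear interpolation in a pre-computed table."""
    pos = (x % _TWO_PI) / _STEP
    k = min(int(pos), TABLE_SIZE - 1)  # guard against rounding up to the table end
    frac = pos - k
    return _TABLE[k] + frac * (_TABLE[k + 1] - _TABLE[k])
```

With 4096 entries the interpolation error stays below about 3e-7, bounded by step^2 / 8. Whether that is accurate enough for the intensity curve would need to be checked against the exact calculation.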

7 Next Steps (Short Term)
- Add precise timing; develop a model to predict performance for an arbitrary number of atoms.
- Analyze instructions in the inner loop of scatter, though it may be impossible to improve on the compiler.
- Extend to read the .pdb file format, or integrate with existing Python code.
- Try on a processor with better floating point, or on a parallel machine; what is required to do this?
- Project setup takes precedence for several weeks.
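A first-cut prediction model could simply scale a measured run by the S * N^2 term that dominates the cost model. The sketch below assumes the run time is entirely quadratic in the atom count and linear in the step count, and it uses the ~19 minute figure for 5000 atoms as the reference point. The reference step count of 1000 is borrowed from the cost model's example and is an assumption, as are the function names.

```python
import time

def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

def predict_seconds(n_atoms, n_steps=1000,
                    ref_atoms=5000, ref_steps=1000, ref_seconds=19 * 60):
    """Scale a reference measurement by S * N^2 (assumed dominant term)."""
    scale = (n_atoms / ref_atoms) ** 2 * (n_steps / ref_steps)
    return ref_seconds * scale

print(predict_seconds(10000))  # doubling the atoms quadruples the estimate
```

Timing a few real runs at different atom counts with `timed` would show how far the measured curve departs from this pure quadratic estimate.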

8 Next Steps (Long Term)
- Close the loop with experimental data on a known molecule, with algorithm changes as necessary.
- Develop a streaming version of the program that accepts multiple molecules and averages them.
- New program for modeling elastic topology, previously called the “parametric” model.
- Investigate a change to a streaming architecture; may prototype a simple framework user interface.

