3rd ACES WG mtg. 2003/06/06 Brisbane.

Current Target
Transcurrent Plate Boundary
Preliminary Study
– e.g. San Andreas Faults, CA.

San Andreas Faults, CA.
Transcurrent Plate Boundaries > 1,000 km
US Geological Survey

Problem Configuration
Double Fault Patches for Initial Condition.
150~1,200 km (Length) × 45 km (Depth).
– 705~6,000 parameters.
Plate Motion: 50 mm/yr.
Earth Simulator (1-64 PEs).

Overview
Hashimoto Code
– Tectonic stress accumulation simulation at transcurrent plate boundaries
– Boundary Integral Method
– Fault Length: 150 km – 1,200 km

Parallel Matrix Assembling for Linear EQNs: MRQCOF

Original

    do ip= 1, PETOT
      is= (ip-1)*gN
      if (iflagM.eq.1) then
        do j= 1, gN
          wt= dydamatP(is+j)*sig2imatP(ip)
    !CDIR NODEP
          do k= 1, gM
            k1= gMTBL(k)
            gA2(j,k)= gA2(j,k) + wt*dydamatP(is+k1)
          enddo
          gB2(j)= gB2(j) + dymatP(ip)*wt
        enddo
      endif
      chisq= chisq + dymatP(ip)*dymatP(ip)*sig2imatP(ip)
    enddo

Optimized

    if (iflagM.eq.1) then
      do ip= 1, PETOT
        is= (ip-1)*gN
    ! k= 1 is peeled off so that gB2 is updated only once per j
        k= 1
        k1= gMTBL(k)
    !CDIR NODEP
        do j= 1, gN
          wt= dydamatP(is+j)*sig2imatP(ip)
          gA2(j,k)= gA2(j,k) + wt*dydamatP(is+k1)
          gB2(j)  = gB2(j)   + wt*dymatP(ip)
        enddo
        do k= 2, gM
          k1= gMTBL(k)
    !CDIR NODEP
          do j= 1, gN
            wt= dydamatP(is+j)*sig2imatP(ip)
            gA2(j,k)= gA2(j,k) + wt*dydamatP(is+k1)
          enddo
        enddo
        chisq= chisq + dymatP(ip)*dymatP(ip)*sig2imatP(ip)
      enddo
    else
    !CDIR NODEP
      do ip= 1, PETOT
        is = (ip-1)*gN
        chisq= chisq + dymatP(ip)*dymatP(ip)*sig2imatP(ip)
      enddo
    endif

– gM = gN/PETOT
– Optimized: wt is recomputed for every k, i.e. gM times more computation for wt.
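
The loop interchange on this slide can be checked on a small case. The following is a minimal sketch in free-form Fortran, not the Hashimoto code itself: the array sizes, the random test data, the contents of the column table, and the shortened array names are all assumptions made for illustration. It assembles the matrix and right-hand side with both loop orders and prints the largest difference. For brevity it guards the right-hand-side update with a test on k instead of peeling the first iteration as the slide does.

```fortran
program mrqcof_interchange
  implicit none
  ! Hypothetical small sizes; the slides do not give actual values.
  integer, parameter :: petot = 4, gn = 8
  integer, parameter :: gm = gn/petot
  double precision :: dyda(petot*gn), sig2i(petot), dy(petot)
  double precision :: a1(gn,gm), b1(gn), a2(gn,gm), b2(gn), wt
  integer :: gmtbl(gm), ip, is, j, k, k1

  call random_number(dyda)
  call random_number(sig2i)
  call random_number(dy)
  do k = 1, gm
     gmtbl(k) = 2*k          ! arbitrary column map, assumed for this sketch
  enddo
  a1 = 0.d0
  b1 = 0.d0
  a2 = 0.d0
  b2 = 0.d0

  ! Original order: the innermost loop runs over k, length gM = gN/PETOT
  do ip = 1, petot
     is = (ip-1)*gn
     do j = 1, gn
        wt = dyda(is+j)*sig2i(ip)
        do k = 1, gm
           k1 = gmtbl(k)
           a1(j,k) = a1(j,k) + wt*dyda(is+k1)
        enddo
        b1(j) = b1(j) + dy(ip)*wt
     enddo
  enddo

  ! Interchanged order: the innermost loop runs over j, length gN
  do ip = 1, petot
     is = (ip-1)*gn
     do k = 1, gm
        k1 = gmtbl(k)
        do j = 1, gn
           wt = dyda(is+j)*sig2i(ip)      ! recomputed for every k
           a2(j,k) = a2(j,k) + wt*dyda(is+k1)
           if (k == 1) b2(j) = b2(j) + wt*dy(ip)
        enddo
     enddo
  enddo

  print *, 'max |gA2 difference| =', maxval(abs(a1 - a2))
  print *, 'max |gB2 difference| =', maxval(abs(b1 - b2))
end program mrqcof_interchange
```

Both variants accumulate over ip in the same order for each matrix entry, so the printed differences should be zero or at rounding level. The trade-off is the one noted on the slide: the vectorized loop becomes longer at the price of recomputing wt.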

Matrix Component: FUNCS
called NDATA times at every time step

Original

    gs_d= 0.d0
    do is= 1, stepj-1
      if (dp_d.ne.0) then
        do p= 1, ma
    ! search the time table for every (is,p) pair
          do it= 0, itcnt-1
            if ((t(it).le.tau(stepj)-tau(is)).and.
    &           (t(it+1).gt.tau(stepj)-tau(is))) then
              gst= gss(p,it)
              goto 111
            endif
          enddo
          gst= gss(p,itcnt)
    111   continue
          gs_d= gs_d + aaj(p,is)*gst
        enddo
      endif
    enddo

Optimized

    ! time-table search done once per time step (itflag.eq.0)
    if (itflag.eq.0) then
      do is= 1, stepj-1
        do it= 0, itcnt-1
          if ((t(it).le.tau(stepj)-tau(is)).and.
    &         (t(it+1).gt.tau(stepj)-tau(is))) then
            itCUR(is)= it
            goto 111
          endif
        enddo
        itCUR(is)= itcnt
    111 continue
      enddo
    endif
    ...
    gs_d= 0.0d0
    if (dp_d.ne.0) then
      do is= 1, stepj-1
    !CDIR NODEP
        do p= 1, ma
          gs_d= gs_d + aaj(p,is)*gss(p,itCUR(is))
        enddo
      enddo
    endif

– “Subroutine FUNCS” is called “NDATA” times.
– “stepj” is the current step number, so the amount of computation in this part increases as the simulation proceeds.
– “gst” depends only on the time and the location of the parameter point.
– An additional array “ITCUR(is)” is defined and is calculated just once at every time step.
– The operations for computing “gs_d” become very simple and easy to optimize.
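
The idea of hoisting the time-table search out of the parameter loop can be tried in isolation. The sketch below is a free-form Fortran illustration under stated assumptions: the sizes, the uniform time table t, the step times tau, and the random gss and aaj values are all made up for the example, and `exit` is used in place of the `goto 111` of the slide. It computes gs_d once with the search repeated for every parameter point and once with a precomputed index array, then prints the difference.

```fortran
program itcur_lookup
  implicit none
  ! Hypothetical sizes and data, for illustration only.
  integer, parameter :: ma = 5, itcnt = 4, stepj = 6
  double precision :: t(0:itcnt), tau(stepj)
  double precision :: gss(ma,0:itcnt), aaj(ma,stepj)
  double precision :: gs_orig, gs_opt, dt, gst
  integer :: itcur(stepj), is, it, p

  do it = 0, itcnt
     t(it) = dble(it)             ! assumed uniform time table
  enddo
  do is = 1, stepj
     tau(is) = 0.7d0*dble(is)     ! assumed step times
  enddo
  call random_number(gss)
  call random_number(aaj)

  ! Original pattern: search the time table for every (is, p) pair
  gs_orig = 0.d0
  do is = 1, stepj-1
     dt = tau(stepj) - tau(is)
     do p = 1, ma
        gst = gss(p,itcnt)        ! fallback when no interval matches
        do it = 0, itcnt-1
           if (t(it) <= dt .and. t(it+1) > dt) then
              gst = gss(p,it)
              exit
           endif
        enddo
        gs_orig = gs_orig + aaj(p,is)*gst
     enddo
  enddo

  ! Optimized pattern: one search per is, stored in itcur
  do is = 1, stepj-1
     dt = tau(stepj) - tau(is)
     itcur(is) = itcnt
     do it = 0, itcnt-1
        if (t(it) <= dt .and. t(it+1) > dt) then
           itcur(is) = it
           exit
        endif
     enddo
  enddo

  ! The remaining kernel is a plain multiply-add over p
  gs_opt = 0.d0
  do is = 1, stepj-1
     do p = 1, ma
        gs_opt = gs_opt + aaj(p,is)*gss(p,itcur(is))
     enddo
  enddo

  print *, 'difference =', abs(gs_orig - gs_opt)
end program itcur_lookup
```

The search condition does not involve p, which is why the index can be stored per is and reused for all parameter points.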

Results on Earth Simulator
Single PE, 15 steps for 150km length region

Profile columns: PROG.UNIT, FREQUENCY, EXCLUSIVE TIME [sec] (%), AVER.TIME [msec], MOPS, MFLOPS, V.OP RATIO, AVER. V.LEN, I-CACHE MISS, O-CACHE MISS, BANK CONF

EXCLUSIVE TIME (%)
PROG.UNIT           Original   Optimized
mrqcof                64.3       42.1
funcs                 28.3       46.2
srcinput               3.7        5.7
pgauss                 2.2        3.5
quasi_static           0.8        1.2
consti_parameter       0.7        1.2
mrqmin                 0.0        0.0
total                100.0      100.0

– Computational time reduced dramatically.
– “MRQCOF” sped up in spite of the larger amount of computation.
– Bank conflict in FUNCS.

Array Access Pattern: FUNCS
in order to avoid bank conflict

Original

    idX1= idint(zz_d)
    do p= 1, ma
      ipp= idnint(dabs(kk(p)-xx_d))
      if (ipp.gt.xmax0) then
        uu(p)= 0.d0
    !CDIR NODEP
        do it= 0, itcnt
          gss(p,it)= 0.d0
        enddo
      else
        idX2= idint(ll(p)/3.d0)
        uu(p)= u(ipp, idX1, idX2)
    !CDIR NODEP
        do it= 0, itcnt
          gss(p,it)= gs(ipp, idX1, idX2, it)
        enddo
      endif
    enddo

Optimized

    idX1= idint(zz_d)
    ! it= 0 is handled separately because it also fills uu(p)
    it= 0
    !CDIR NODEP
    do p= 1, ma
      ipp= idnint(dabs(kk(p)-xx_d))
      if (ipp.gt.xmax0) then
        uu (p)   = 0.d0
        gss(p,it)= 0.d0
      else
        idX2= idint(ll(p)/3.d0)
        uu (p)   = u (ipp, idX2, idX1)
        gss(p,it)= gs(ipp, it, idX2, idX1)
      endif
    enddo
    do it= 1, itcnt
    !CDIR NODEP
      do p= 1, ma
        ipp= idnint(dabs(kk(p)-xx_d))
        if (ipp.gt.xmax0) then
          gss(p,it)= 0.d0
        else
          idX2= idint(ll(p)/3.d0)
          gss(p,it)= gs(ipp, it, idX2, idX1)
        endif
      enddo
    enddo

– Innermost loop changed from “it” to “p” for “gss(p,it)”.
– Index order changed: “gs(ipp,idX1,idX2,it)” -> “gs(ipp,it,idX2,idX1)”.
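
The effect of these two changes on memory stride can be worked out from Fortran's column-major layout, in which the first index is contiguous. The sketch below is an illustration only: the array extents are hypothetical (the slides do not state them), the helper functions off2 and off4 are introduced just for this example, and it assumes that ipp advances by one between successive p in the new loop, which depends on the actual values of kk(p). It prints the distance, in array elements, between the locations touched by two successive innermost iterations.

```fortran
program access_stride
  implicit none
  ! Hypothetical extents, chosen only to illustrate strides.
  integer, parameter :: ma = 512, nx = 256, nz = 16, nl = 8, nt = 32
  integer :: s_gss_it, s_gss_p, s_gs_old, s_gs_new

  ! gss(p,it) with first extent ma
  s_gss_it = off2(1, 1, ma) - off2(1, 0, ma)    ! innermost loop over it
  s_gss_p  = off2(2, 0, ma) - off2(1, 0, ma)    ! innermost loop over p

  ! Original layout gs(ipp,idX1,idX2,it), innermost loop over it
  s_gs_old = off4(1,1,1,1, nx,nz,nl) - off4(1,1,1,0, nx,nz,nl)

  ! New layout gs(ipp,it,idX2,idX1), innermost loop over p,
  ! assuming ipp increases by one from one p to the next
  s_gs_new = off4(2,0,1,1, nx,nt,nl) - off4(1,0,1,1, nx,nt,nl)

  print *, 'gss stride, inner loop it :', s_gss_it
  print *, 'gss stride, inner loop p  :', s_gss_p
  print *, 'gs  stride, original order:', s_gs_old
  print *, 'gs  stride, new order     :', s_gs_new

contains

  ! Linear offset of element (i1,i2) in a column-major array
  integer function off2(i1, i2, n1)
    integer, intent(in) :: i1, i2, n1
    off2 = i1 + n1*i2
  end function off2

  ! Linear offset of element (i1,i2,i3,i4) in a column-major array
  integer function off4(i1, i2, i3, i4, n1, n2, n3)
    integer, intent(in) :: i1, i2, i3, i4, n1, n2, n3
    off4 = i1 + n1*(i2 + n2*(i3 + n3*i4))
  end function off4

end program access_stride
```

With these assumed extents the original loops step through memory 512 and 32768 elements at a time, while the reordered loops step by one element. Large power-of-two strides are the typical cause of bank conflicts on banked vector memory, which is consistent with the motivation stated on the slide.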

Results on Earth Simulator
Single PE, 15 steps for 150km length region

Profile columns: PROG.UNIT, FREQUENCY, EXCLUSIVE TIME [sec] (%), AVER.TIME [msec], MOPS, MFLOPS, V.OP RATIO, AVER. V.LEN, I-CACHE MISS, O-CACHE MISS, BANK CONF

EXCLUSIVE TIME (%)
PROG.UNIT           Original   Optimized   Final
mrqcof                64.3       42.1       60.2
funcs                 28.3       46.2       22.8
srcinput               3.7        5.7        8.4
pgauss                 2.2        3.5        5.1
quasi_static           0.8        1.2        1.8
consti_parameter       0.7        1.2        1.7
mrqmin                 0.0        0.0        0.0
total                100.0      100.0      100.0

Results on Earth Simulator
Single PE, 50 steps for 150km length region

Runs: Original, Optimized, Final
Metrics per run: Real Time (sec), User Time (sec), System Time (sec), Vector Time (sec), Instruction Count, Vector Instruction Count, Vector Element Count, FLOP Count, MOPS, MFLOPS, Average Vector Length, Vector Operation Ratio (%), Memory size used (MB), MIPS, Instruction Cache miss (sec), Operand Cache miss (sec), Bank Conflict Time (sec)

SR PEs 2205.sec sec sec.

Results on Earth Simulator
Single PE, 5 steps for 300km length region

Profile columns: PROG.UNIT, FREQUENCY, EXCLUSIVE TIME [sec] (%), AVER.TIME [msec], MOPS, MFLOPS, V.OP RATIO, AVER. V.LEN, I-CACHE MISS, O-CACHE MISS, BANK CONF

EXCLUSIVE TIME (%)
PROG.UNIT           Optimized   Final
mrqcof                52.1       72.8
funcs                 39.7       15.7
pgauss                 5.7        8.0
srcinput               1.6        2.3
consti_parameter       0.5        0.7
quasi_static           0.2        0.3
mrqmin                 0.0        0.0
total                100.0      100.0

Results on Earth Simulator
16PEs, 20 steps for 1200km length region
Optimized

Global Data of 16 processes     Min [U,R]   Max [U,R]   Average
===========================
Real Time (sec)                 [0,4]       [0,8]
User Time (sec)                 [0,14]      [0,1]
System Time (sec)               [0,9]       [0,14]
Vector Time (sec)               [0,14]      [0,8]
Instruction Count               [0,8]       [0,6]
Vector Instruction Count        [0,14]      [0,8]
Vector Element Count            [0,7]       [0,8]
FLOP Count                      [0,3]       [0,0]
MOPS                            [0,7]       [0,14]
MFLOPS                          [0,3]       [0,14]
Average Vector Length           [0,8]       [0,14]
Vector Operation Ratio (%)      [0,6]       [0,8]
Memory size used (MB)           [0,1]       [0,0]
MIPS                            [0,8]       [0,6]
Instruction Cache miss (sec)    [0,7]       [0,0]
Operand Cache miss (sec)        [0,12]      [0,0]
Bank Conflict Time (sec)        [0,15]      [0,0]

Results on Earth Simulator
16PEs, 20 steps for 1200km length region
Final

Global Data of 16 processes     Min [U,R]   Max [U,R]   Average
===========================
Real Time (sec)                 [0,4]       [0,8]
User Time (sec)                 [0,2]       [0,1]
System Time (sec)               [0,5]       [0,2]
Vector Time (sec)               [0,6]       [0,8]
Instruction Count               [0,8]       [0,6]
Vector Instruction Count        [0,7]       [0,8]
Vector Element Count            [0,7]       [0,12]
FLOP Count                      [0,3]       [0,0]
MOPS                            [0,7]       [0,2]
MFLOPS                          [0,4]       [0,2]
Average Vector Length           [0,8]       [0,7]
Vector Operation Ratio (%)      [0,6]       [0,8]
Memory size used (MB)           [0,3]       [0,0]
MIPS                            [0,8]       [0,6]
Instruction Cache miss (sec)    [0,4]       [0,0]
Operand Cache miss (sec)        [0,3]       [0,0]
Bank Conflict Time (sec)        [0,15]      [0,0]

Results on Earth Simulator
Parallel Efficiency for 1st Linear Step

Panels: Final, Original
Legend: ● : L=150km, ○ : L=300km, ■ : L=450km, □ : L=600km, ▲ : L=1200km
Original: Length of the Innermost Loop = m/PE
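
The relation "innermost loop length = m/PE" can be tabulated to see how quickly the vectorized loop shrinks as processors are added. The sketch below is illustrative only. It assumes that m can be read as the parameter count from the problem configuration (705 and 6,000, the two ends of the quoted range) and that the PE counts are the powers of two from 1 to 64; neither reading is confirmed by the slides.

```fortran
program inner_loop_length
  implicit none
  ! Assumption: m is taken as the parameter count (705 and 6,000).
  integer, parameter :: m(2) = (/ 705, 6000 /)
  integer :: pe, i

  print '(a6,2a12)', 'PEs', 'm=705', 'm=6000'
  pe = 1
  do while (pe <= 64)
     ! integer division gives the innermost loop length per PE
     print '(i6,2i12)', pe, (m(i)/pe, i = 1, 2)
     pe = pe*2
  enddo
end program inner_loop_length
```

Under these assumptions the loop length at 64 PEs falls to 11 for the smallest problem and 93 for the largest. Short vector loops generally run less efficiently on a vector processor, which would be one plausible reason for the original code's efficiency to drop with PE count.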