
1 Scalable systems for reservoir modeling on modern hardware platforms. Dmitry Eydinov. SPE London, November 24th, 2015.

2 Demands in Simulations
- Field development relies more and more on static and dynamic reservoir modeling, which has come a long way from simple material-balance estimators to full-physics numerical simulators
- As simulation practice has developed, the models have become more demanding and more complex: rock properties, fluid and reservoir description, well models, surface networks, compositional and thermal effects, EOR, etc.
- Grid dimensions are chosen based on available resources and project time frames
- Proper uncertainty analysis is often skipped due to limited time

3 Grid Resolution Effects. (Figures: fine grid, 1 m x 50 m x 0.7 m, vs. coarse grid, 2 m x 50 m x 0.7 m.)

4 Moore’s Law “The number of transistors in a dense integrated circuit doubles approximately every two years” - Gordon Moore, co-founder of Intel, 1965

5 Evolution of microprocessors: only the number of transistors/cores continues to rise!

6 2005 - the first mass-produced multicore CPUs. In old clusters, all computational cores are isolated by distributed memory (MPI is required), and most conventional algorithms are designed around this paradigm. In shared-memory systems, all cores communicate directly, which is significantly faster than communication between cluster nodes. Simulation software has to take this into account to maximize parallel performance.
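To make the shared-memory point concrete, here is a minimal sketch (mine, not from the presentation) in C with OpenMP: every core of the node works on the same arrays directly, with no message passing involved. The vector length and the dot-product workload are arbitrary stand-ins for real solver work.

```c
/* Shared-memory parallelism sketch (not from the talk): on a single
 * multicore node every thread reads and writes the same arrays directly,
 * so no explicit message passing is needed.  Compile: gcc -fopenmp demo.c */
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define N 1000000  /* illustrative vector length */

int main(void)
{
    double *x = malloc(N * sizeof *x);
    double *y = malloc(N * sizeof *y);
    for (int i = 0; i < N; i++) { x[i] = 1.0; y[i] = 2.0; }

    double dot = 0.0;
    /* All cores work on the same memory; OpenMP handles the partitioning
     * and the reduction of the partial sums. */
    #pragma omp parallel for reduction(+:dot)
    for (int i = 0; i < N; i++)
        dot += x[i] * y[i];

    printf("dot = %f (threads: %d)\n", dot, omp_get_max_threads());
    free(x); free(y);
    return 0;
}
```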

7 HPC for Numerical Modeling. All industries run massive high-performance computing simulations on a daily basis: climate modeling and weather forecasting, digital content, financial analysis, space technologies, medicine, technical design.

8 In the meantime, in reservoir simulation…

9 Desktops and Workstations. Shared-memory systems:
- Fast interactions between the cores
- No need to introduce grid domains
- The system of equations can be solved directly on the matrix level
(Diagram: a dual-socket node with two Intel Xeon E5 v3 processors - up to 18 cores per CPU, up to 30 MB shared cache and 4 channels of up to DDR3 2133 MHz memory per socket.)
* Other names and brands may be claimed as the property of others

10 Desktops and Workstations. The software uses the following hardware features for maximum performance:
- Shared memory: blocks are selected automatically on the matrix level
- Non-Uniform Memory Access: memory is allocated dynamically through NUMA
- Hyperthreading: system threads are accessed directly
- Fast CPU cache: big enough to fit matrix blocks
- All parts of the code are parallel: not just the linear solver
- Special compiler settings
- "Bandwidth machine": up to 51 GB/s (~10 times the Infiniband speed)
(Diagram: DDR3 channels, QPI link and NUMA domains of a dual-socket node.)
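The slide only names the hardware features; as one illustrative technique for the NUMA point, here is a sketch of first-touch placement (an assumption on my part about how NUMA-aware allocation is commonly done, not a description of the simulator's code): data is initialized by the same threads that will later use it, so the pages end up on the local memory controller.

```c
/* First-touch NUMA placement sketch (illustrative, not from the talk):
 * pages are physically allocated on the NUMA node of the thread that
 * first writes them, so initializing data with the same thread layout
 * that later uses it keeps memory accesses local.
 * Compile: gcc -fopenmp numa_demo.c */
#include <omp.h>
#include <stdlib.h>

#define N 50000000  /* illustrative array length (~400 MB of doubles) */

int main(void)
{
    double *p = malloc(N * sizeof *p);

    /* First touch: each thread initializes the chunk it will work on,
     * placing those pages in its local NUMA domain. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < N; i++)
        p[i] = 0.0;

    /* Later compute loops reuse the same static schedule, so each thread
     * mostly reads and writes memory attached to its own socket. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < N; i++)
        p[i] += 1.0;

    free(p);
    return 0;
}
```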

11 High-end Desktops and Workstations. (Chart: speed-up vs. single core as a function of the number of threads.)
2011: Dual Xeon X5650, (2x6) 12 cores, 2.66 GHz, 3 channels DDR3 1333 MHz (e.g. HP Z800)
2012: Dual Xeon E5-2680, (2x8) 16 cores, 2.7 GHz, 4 channels DDR3 1600 MHz (e.g. HP Z820)
2013: Dual Xeon E5-2697 v2, (2x12) 24 cores, 2.7 GHz, 4 channels DDR3 1866 MHz (e.g. HP Z820)
2014: Dual Xeon E5-2697 v3, (2x14) 28 cores, 2.6 GHz, 4 channels DDR4 2133 MHz (e.g. HP Z840)

12 Modern HPC clusters are not as complex as space shuttles anymore:
- 10-core Xeon E5 v2, 2.8 GHz
- 8 dual-CPU nodes with 160 cores in total (= 8 workstations connected with Infiniband 56 Gb/s)
- 1.024 TB of DDR3 1866 MHz memory
- ~ $75K
- Models with up to 300 million active grid blocks
- Parallel speed-up ≈ 80-100 times

13 Hybrid algorithm. Removing the bottlenecks. (SPE 163090)
- The simulator solver software integrates both MPI and OS thread system calls
- Node level: parallelization between the CPU cores is done on the level of the solver matrix using OS threads
- As a result, the number of MPI processes is limited to the number of cluster nodes, not the total number of cores
(Diagram: a cluster node with 2 CPUs - MPI over the cluster network at ~5 GB/s, OS threads over NUMA shared memory at ~50 GB/s, both feeding the solver matrix.)
This removes one of the major performance bottlenecks - the network throughput limit!
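The presentation gives no source code; the following is a minimal sketch, assuming a generic MPI + OpenMP setup, of the hybrid layout described above: one MPI rank per cluster node, with OS threads sharing that node's memory. Rank counts, array sizes and the dummy reduction are illustrative only.

```c
/* Hybrid MPI + threads sketch (illustrative): launch one MPI rank per
 * node (e.g. "mpirun -np <nodes> --map-by node ...") and let OpenMP
 * threads use all cores of that node through shared memory.
 * Compile: mpicc -fopenmp hybrid.c */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define N_LOCAL 1000000  /* illustrative: rows owned by this node */

int main(int argc, char **argv)
{
    int provided, rank, nodes;
    /* Ask for an MPI library that tolerates threaded regions. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nodes);

    static double x[N_LOCAL];
    double local_sum = 0.0, global_sum = 0.0;

    /* Intra-node parallelism: all cores of the node share x[] directly,
     * no messages are exchanged between them. */
    #pragma omp parallel for reduction(+:local_sum)
    for (int i = 0; i < N_LOCAL; i++) {
        x[i] = 1.0;              /* stand-in for local solver work */
        local_sum += x[i];
    }

    /* Inter-node communication: only one MPI process per node talks
     * over the network, so message count scales with nodes, not cores. */
    MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE,
                  MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("nodes=%d threads/node=%d sum=%g\n",
               nodes, omp_get_max_threads(), global_sum);

    MPI_Finalize();
    return 0;
}
```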

14 Model grid domains. Suppose we have:
- Model: 3-phase with 2.5 million active grid cells
- Cluster: 10 nodes x 20 cores = 200 cores in total
Conventional MPI: 200 grid domains exchanging boundary conditions.
Multilevel hybrid method: 10 grid domains exchanging boundary conditions.
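A rough back-of-envelope sketch (my own idealization, assuming cubic subdomains and a one-cell halo, not figures from the talk) shows why fewer, larger domains exchange relatively less boundary data:

```c
/* Back-of-envelope sketch (not from the talk): compare how much halo
 * (boundary) data a one-domain-per-core decomposition and a
 * one-domain-per-node decomposition exchange for the example above.
 * Assumes idealized cubic subdomains of a 2.5 million cell model.
 * Compile: cc domains.c -lm */
#include <math.h>
#include <stdio.h>

static void report(const char *name, double cells, int domains)
{
    double per_domain = cells / domains;          /* cells per subdomain  */
    double side = cbrt(per_domain);               /* cells along one edge */
    double halo = 6.0 * side * side;              /* one-cell-thick halo  */
    printf("%-20s %3d domains, %8.0f cells each, ~%6.0f halo cells each,"
           " total halo ~%8.0f\n",
           name, domains, per_domain, halo, halo * domains);
}

int main(void)
{
    const double cells = 2.5e6;                  /* active cells from the slide */
    report("Conventional MPI:", cells, 200);     /* one domain per core */
    report("Multilevel hybrid:", cells, 10);     /* one domain per node */
    return 0;
}
```

With 200 small domains the halo is a much larger fraction of each domain's cells than with 10 larger domains, which is one way to see why the hybrid decomposition puts less traffic on the network.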

15 Memory footprint for an 8-core/node cluster: the hybrid method needs 5 times less memory on 64 nodes. (Chart: total memory used, GB, vs. number of nodes.)

16 Cluster Parallel Scalability. (Chart: acceleration vs. number of cores for the Xeon X5650 and Xeon E5-2680 v2 clusters.)
Old cluster: 20 dual-CPU (12-core) nodes, 40 Xeon X5650, 240 cores, 24 GB DDR3 1333 MHz, Infiniband 40 Gb/s
New cluster: 8 dual-CPU (20-core) nodes, 16 Xeon E5-2680 v2, 160 cores, 128 GB DDR3 1866 MHz, Infiniband 56 Gb/s

17 Testing the limits (SPE 163090). Top-20 cluster:
- 512 nodes used
- Dual Xeon 5570
- 4096 cores
- DDR3 1333 MHz
- 21.8 million active blocks
- 39 wells
(Chart: acceleration vs. number of cores.)
3-phase "black oil" - parallel speed-up of 1328 times: from 2.5 weeks to 19 minutes (2.5 weeks ≈ 25,200 minutes; 25,200 / 19 ≈ 1,330).

18 Easy to install - easy to use. (Power comparison: Xeon X5650 cluster 6.4 kW; Xeon E5-2680 v2 cluster 3.2 kW; Bosch TWK 7603 3.0 kW; Tefal FV9630 2.6 kW.)
In-house clusters:
- Can be installed in a regular office space
- Take only 4-6 weeks to build
- Need an air-conditioned room and a LAN connection
- Significantly more economical than 5-10 years ago

19 In-house Cluster Setup. (Diagram: users with GUI clients connect over the network to a head node running the dispatcher, which passes data and control over the cluster network to the cluster nodes and shared storage.)

20 User Interface
- Job queue management (start, stop, results view)
- Full graphical monitoring of simulation results at runtime (2D, 3D, wells, perforations, 3D streamlines)

21 Project Workload. Typically strongly non-uniform due to decision-making cycles in the companies; the peaks require significant investment in computational resources.

22 Amazon cloud. (Map of cloud locations.) Thousands of CPUs/cluster nodes can be accessed in the cloud for a very reasonable price.

23 How does it work in the cloud?
- Users choose how many nodes/cores they would like to use
- The software gets installed automatically in several minutes
- Data is uploaded once in a packed format, and the models can then be changed directly in the cloud storage
- All files in the cloud are encrypted to ensure data security
- Simulation results are visualized directly on a remote workstation connected to the cluster nodes
- When the simulations are complete, the data can be deleted or left in the cloud storage
- Users are charged just for the time they access the technology

24 Case Study #1. Giant Field. (SPE 171226)
- Three-phase black oil model
- Complex geology with an active gas cap
- Production history: 45 years
- Number of producers and injectors: 14,000 (vertical, inclined, horizontal)
- 2.7 billion tons of oil produced
- 8+ reservoir volumes injected

25 Case Study #1. Giant Field.
To select the optimal spatial grid resolution, 4 grids have been generated:
- Original model: 150 m x 150 m, 7 mil. blocks (4.5 mil. active)
- Lateral grid refinement (block sizes in XY reduced 3 times): 50 m x 50 m, 70 mil. blocks (40 mil. active)
- Vertical grid refinement (block sizes in Z reduced 4 times): 50 m x 50 m, 280 mil. blocks (160 mil. active)
- Vertical grid refinement (block sizes in Z reduced 10 times): 50 m x 50 m, 700 mil. blocks (404 mil. active)

26 Case Study #1. Giant Field.
The most complex cases were run using a massive cluster:
- 64 nodes, 2 Xeon E5-2680 v2 2.8 GHz CPUs per node
- 4 channels of DDR3 1866 MHz with 128 GB per node
- Local network: FDR Infiniband 56 Gb/s
- Total: 1280 CPU cores, 8.2 TB RAM, 200 TB of disk space

Active blocks | Well perforations | Total memory size | Total CPU time on the 1280-core cluster
40 mil        | 0.13 mil          | 120 GB            | 5 hours 30 min
162 mil       | 0.51 mil          | 561 GB            | 54 hours 04 min
404 mil       | 1.28 mil          | 1.29 TB           | 490 hours

Taking into account the number of active grid blocks, well perforations and history, this is one of the world's most complex dynamic models.

27 Case Study #1. Giant Field.
- The distributions of reservoir pressure in grid blocks located directly beneath the gas cap, calculated at the last historic time step, are compared
- A systematic shift of 3-4 bars in the reservoir pressure distributions for grid blocks under the gas cap is observed
(Charts: relative frequency vs. reservoir pressure (bars); bottom-hole pressure at producers for the 40 mil. and 162 mil. models.)
To produce the same historic amount of liquid, the model with 162 mil. active grid blocks requires more intense pumping!

28 Case Study #1. Giant Field. (SPE 171226)
- The presence of additional sub-layers in the model with 162 mil. active grid blocks causes reservoir liquids to be produced first from high-permeability layers, with typically higher bottom-hole pressure at the producers than in the 40 mil. model
- Then, as more liquids are extracted, production starts to affect layers with lower permeabilities and thus requires reduced bottom-hole pressure at the production wells
(Charts: comparison of average pressure dynamics shown separately for oil, water and gas, 40 mil. vs. 162 mil. active blocks.)

29 Case Study #2*. Key objective: maximise the value of the asset to the business. Optimise the development plan, taking uncertainty into account. Target the P70 value of the NPV.
*From "Computer Optimisation of Development Plans in the Presence of Uncertainty" by Dr Jonathan Carter, Head of the Technology and Innovation Centre for Exploration and Production, E.ON

30 Case Study #2. Conclusions*
- We used about 34,000 simulations over a three-week period, on 31 nodes each with 16 cores
- We estimate that another well-known simulator would have needed almost a whole year to complete the same task
- The final solution obtained is only slightly better than the engineer-designed case (about 2%)
- The total effort was much reduced: it is easier to set up 31 models to cover the uncertainty than to hold meetings about what the reference case should look like, and most of the strain was taken by the computer, leaving the engineer free to do other things
- The final optimised well placements have some interesting features that challenge the normal design process
*From "Computer Optimisation of Development Plans in the Presence of Uncertainty" by Dr Jonathan Carter, Head of the Technology and Innovation Centre for Exploration and Production, E.ON

31 Case Study #3. Integrated uncertainty study.
(Diagram: geological variables + dynamic variables -> probabilistic forecast accounting for uncertainty.)
Key objective: a probabilistic production forecast for a big field, accounting for uncertainty, based on the defined development scenario for a 25-year period.

32 Case Study #3
- 3 different structure models
- 300 geological models (100 realizations for each of the structure models)
- 8100 simulation models
- 83 history-matched solutions
- Probabilistic forecast (P10, P50, P90)

33 Case Study #3. An HPC cluster (96 nodes, 1920 cores) is a great assistant for developing our good ideas: 8100 simulation models for the history-matching cycles in less than two days, and two weeks for the whole scope of work.

34 Conclusions
- The standard bottlenecks for parallel performance in reservoir simulation are mostly on the software side. They can be removed by modern software products that properly handle modern hardware architectures.
- Today, hardware and software technology makes it possible to reach parallel acceleration rates of 100, 300 and even 1000+ times
- Technically, the industry has everything it needs to move towards
  o finer geological grids
  o uncertainty assessment workflows
  without significant growth in project time frames and without dramatic investment in computational resources

35 Thank you!

36 Simulations on GPU. (Chart: deceleration with respect to 2 Xeon E5-2680 CPUs vs. matrix number.)
The matrices are sparse, with lots of empty space, so it is rather difficult to keep all GPU kernels busy… The memory and I/O of the Xeon E5 seem to handle it much better. The CUDA Tesla did well on one (!) matrix.
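For illustration only (not code from the presentation): the heart of such a solver is a sparse matrix-vector product in a compressed format such as CSR. The tiny hand-written matrix below is a made-up example; the point is that row lengths vary and column indices jump around, producing the irregular, bandwidth-bound access pattern that makes it hard to keep thousands of GPU threads busy.

```c
/* Sparse matrix-vector product y = A*x in CSR format (illustrative).
 * Row lengths differ and col[] indices are scattered, so memory access
 * is irregular and bandwidth-bound - the behaviour discussed above. */
#include <stdio.h>

int main(void)
{
    /* A 4x4 sparse matrix with 7 non-zeros (made-up values):
     * row_ptr[i]..row_ptr[i+1] delimit the non-zeros of row i. */
    int    row_ptr[5] = {0, 2, 3, 6, 7};
    int    col[7]     = {0, 2, 1, 0, 2, 3, 3};
    double val[7]     = {4.0, -1.0, 3.0, -1.0, 5.0, -2.0, 2.0};
    double x[4] = {1.0, 2.0, 3.0, 4.0};
    double y[4];

    for (int i = 0; i < 4; i++) {           /* one output row at a time  */
        double sum = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            sum += val[k] * x[col[k]];      /* indirect, scattered reads */
        y[i] = sum;
    }

    for (int i = 0; i < 4; i++)
        printf("y[%d] = %g\n", i, y[i]);
    return 0;
}
```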

