Multi-core and tera-scale computing - Andrzej Nowak, CERN

The “free” bonus
> Silicon technology advances more quickly than design capabilities
> Single CPU complexity is rising slowly
> Moving from 90nm and 65nm processes to 45nm and 32nm processes
> Free transistors available
Take all you want… eat all you take

The multi-core revolution
> What do we do with extra silicon?
    Copy what we already have
> First shot at the PC consumer market – Intel’s Hyper-Threading in the Xeons and Pentium 4 (SMT)
    Idea: do work when nothing is happening
    Some resources in the CPU core were shared
    The relation to extra space on die was not direct
> First popular dual-core CPU for Joe Average – the Intel Core Duo
    Idea: copy a big part of the processor
    Fewer resources are shared
> Next generations of x86-like CPUs are coming
    6, 8, 16 cores

Multi-core designs
> Many other multi-core CPUs are on the market
    AMD x2 (and x4 coming soon)
    ARM specifications for multi-core CPUs (your iPod is dual core!)
    Sun’s Niagara processor (8 cores)
    Cell processor in Playstation 3 units
> Programmers need to take advantage of the new features
    CERN openlab and Intel are organizing a multi-threading and parallelism workshop at the beginning of October!

Tera-scale computing
> Computer performance is traditionally expressed in FLOPS (floating point operations per second)
    CDC 6600 (1966) – 10 MFLOPS, 64kB memory
    Your iPod – 100 MFLOPS
    Your iMac – 3-4 GFLOPS
    Your graphics card – 300-500 GFLOPS
> Not so far from the magical limit - 1 Teraflop…?
    Hence the name, tera-scale
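
The FLOPS figures on this slide can be put on a common scale with a little arithmetic. The sketch below uses only the numbers quoted there; where the slide gives a range (3-4 GFLOPS, 300-500 GFLOPS) it takes the midpoint, which is an assumption made here for illustration, and the dictionary and function names are hypothetical.

```python
# Rough figures quoted on the slide, in floating point operations per second.
# Midpoints are used where the slide gives a range (an assumption for illustration).
MEGA, GIGA, TERA = 1e6, 1e9, 1e12

quoted_flops = {
    "CDC 6600 (1966)": 10 * MEGA,
    "iPod": 100 * MEGA,
    "iMac": 3.5 * GIGA,           # slide: 3-4 GFLOPS
    "graphics card": 400 * GIGA,  # slide: 300-500 GFLOPS
}

def units_needed_for_teraflop(flops):
    """How many such devices would add up to 1 TFLOPS, ignoring all overheads."""
    return TERA / flops

for name, flops in quoted_flops.items():
    ratio = units_needed_for_teraflop(flops)
    print(f"{name:16s} {flops:10.3g} FLOPS  x{ratio:,.1f} to reach 1 TFLOPS")
```

On these numbers a single graphics card is within a small factor of a teraflop, while it would take on the order of a hundred thousand CDC 6600s to get there.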

Processors in GPUs (digression)
> Newest trend – heavily multi-core (up to 128)
> Blazing fast
> Toolkits available (e.g. NVIDIA CUDA)
> But…
    Floating point operations are not precise enough or non-standard
    Data types are limited
    Memory handling is not optimized for general purpose computing
    Tiny cache, if at all
    ~150W… for the chip only
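
The precision concern can be illustrated without a GPU. The sketch below assumes the limitation in question is single-precision (32-bit) arithmetic, which the slide does not state explicitly, and emulates it on the CPU by rounding every intermediate result to 32 bits with Python's `struct` module. The helper names are hypothetical.

```python
import struct

def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def accumulate(step, count, single_precision):
    """Add `step` to a running total `count` times, optionally rounding to 32 bits."""
    total = 0.0
    if single_precision:
        step = to_float32(step)
    for _ in range(count):
        total = total + step
        if single_precision:
            # Emulate a 32-bit accumulator by rounding after every addition.
            total = to_float32(total)
    return total

# Adding 0.1 one million times should give 100000 in exact arithmetic.
err64 = abs(accumulate(0.1, 1_000_000, single_precision=False) - 100_000.0)
err32 = abs(accumulate(0.1, 1_000_000, single_precision=True) - 100_000.0)
print(f"double precision error: {err64:.3e}")
print(f"single precision error: {err32:.3e}")
```

The single-precision total drifts visibly from the expected value, while the double-precision one stays close to it. Long accumulations of this kind are common in scientific codes, which is one reason the precision of the hardware matters.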

Tera-scale computing ctd.
> Intel’s Polaris
    80-core prototype
    ~1 TFLOPS
> Intel’s Larrabee design
    16-24 core x86-GPU hybrid
    ~3 TFLOPS
> Research directions
    How do you feed 80 hungry cores?
    Parallelism – fine grained or coarse?
    Effective virtualization
    Memory access and bus optimization
    Resource sharing
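
The fine-grained versus coarse-grained question can be made concrete with a small sketch. The example below splits the same summation into many small tasks or a few large ones, with `chunk_size` as the granularity knob. All names are hypothetical, and a thread pool is used only to keep the sketch portable: in CPython, threads will not speed up CPU-bound work, so a process pool or native code would be the more realistic choice.

```python
from concurrent.futures import ThreadPoolExecutor

def split(data, chunk_size):
    """Cut `data` into consecutive pieces of at most `chunk_size` items."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def parallel_sum(data, chunk_size, workers=4):
    """Sum `data` by handing one chunk per task to a pool of workers.

    Returns the total and the number of tasks that were created.
    """
    chunks = split(data, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, chunks))
    return sum(partial_sums), len(chunks)

data = list(range(100_000))
fine_total, fine_tasks = parallel_sum(data, chunk_size=100)         # many small tasks
coarse_total, coarse_tasks = parallel_sum(data, chunk_size=25_000)  # few large tasks
print(fine_total == coarse_total, fine_tasks, coarse_tasks)
```

Both runs compute the same result; what changes is the number of tasks. Many small tasks tend to balance load across cores at the price of more scheduling overhead per task, while a few large tasks amortize that overhead but can leave cores idle when the work is uneven.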

Questions for the future
> How many cores does your mother need?
> How many cores do you, a scientist, need?
> How do you effectively use what you have?
> What is the best level to introduce parallelism?
    Do you need to redesign your software?
> GRID computing or tera-scale homogeneous computers?
    Will virtualization be effective enough?

Q&A (1 Swiss minute)

This research project has been supported by a Marie Curie Early Stage Research Training Fellowship of the European Community’s Sixth Framework Programme under contract number (MEST-CT-2004-504054)