Presentation on theme: "The HPEC Challenge Benchmark Suite"— Presentation transcript:
1 The HPEC Challenge Benchmark Suite Ryan Haney, Theresa Meuse, Jeremy Kepner and James LebakMassachusetts Institute of Technology Lincoln LaboratoryHPEC 2005The title of this talk is, “The HPEC Challenge Benchmark Suite.”This work is sponsored by the Defense Advanced Research Projects Agency under Air Force Contract FA C Opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the United States Government.11
2 Acknowledgements Lincoln Laboratory PCA Team Shomo Tech Systems Matthew AlexanderJeanette Baran-GaleHector ChanEdmund WongShomo Tech SystemsMarti BancroftSilicon Graphics IncorporatedWilliam HarrodSponsorRobert Graybill, DARPA PCA and HPCS Programs
3 HPEC Challenge Benchmark Suite PCA program kernel benchmarksSingle-processor operationsDrawn from many different DoD applicationsRepresent both “front-end” signal processing and “back-end” knowledge processingHPCS program Synthetic SAR benchmarkMulti-processor compact applicationRepresentative of a real application workloadDesigned to be easily scalable and verifiableThe Polymorphous Computing Architectures (PCA), and High Productivity Computing Systems (HPCS) programs have combined work to form a new set of benchmarks called the “High Performance Embedded Computing Challenge Benchmarks.” From the PCA program, eight single-processor kernel-level benchmarks were drawn from a survey of DoD applications. These kernels are representative of operations found in “front-end” signal processing and “back-end” knowledge processing. From the HPCS program, a Synthetic SAR multi-processor application was developed. The application was designed to be easily scalable in terms of its parallelism as well as computation sizes, easily verifiable, and representative of workloads found in real-world applications.
5 Spotlight SAR System Principal performance goal: Throughput Maximize rate of resultsOverlapped IO and computingPrincipal performance goal is throughput (rate at which answers are produced by a supercomputer).Overlapping IO and computing is allowed.Intent of the Compact Application:Scalable – operates on a range of systems, from workstation to petascale computerHigh compute fidelity – representative computations of SAR processingLow physical fidelity – not a full spotlight SAR system (reduces unneeded benchmark complexity)Self-verifyingBenchmark is serial (sequential processing), and must be parallelizable.Intent of Compact App:ScalableHigh Compute FidelitySelf-Verifying
6 SAR System Architecture Front-End Sensor ProcessingScalable Dataand TemplateGeneratorSARImageKernel #1Data Readand ImageFormationTemplateInsertionKernel #2ImageStorageSARImageTemplatesRawSARFilesRawSARFileSARImageFilesGroups ofTemplateFilesTemplateFilesFile IOComputationRaw SARData FilesImageFilesGroups ofTemplateFilesSub-ImageDetectionFilesSARImageFilesThe full system architecture of the SAR system benchmark involves stressful computational and I/O requirements. However, the benchmark can be run in various modes turning on or off I/O or computation. For the purposes of this talk, we’ll be concentrating on the computational components only.TemplateFilesSub-ImageDetectionFilesTemplateFilesKernel #3ImageRetrievalSARImagesKernel #4DetectionValidationHPEC community has traditionally focused on Computation …Detections… but File IO performance is increasingly importantTemplatesBack-End Knowledge Formation
7 Data Generation and Computational Stages SARImageKnowledge FormationFilesRawFileTemplateGroups ofKernel #2StorageDetectionKernel #3RetrievalSub-ImageSensor ProcessingRaw SARData FilesRawSARTemplatesImageTemplateInsertionScalable Dataand TemplateGeneratorKernel #1FormationThere are 4 major computational components of the SAR system benchmark. The components are introduced here and discussed in more detail in the subsequent slides.The Scalable Data Generator produces raw SAR data as well as template images that will later be inserted as targets. The raw data is taken into the Image Formation kernel, converted from the temporal to the spatial domain, and interpolated from a polar to a rectangular swath, to form the desired image. The SAR images and target templates are passed to the Template Insertion block where the template targets are pseudorandomly inserted into the SAR image. Next the SAR image with templates is passed to the Detection kernel where, with simple image differencing, thresholding and small correlations, target detections are found and reported. These detections are passed on to a Validation block which requires 100% object recognition with no false positives.<see animation with three left mouse clicks>ValidationDetectionsKernel #4DetectionSARImageTemplates
8 SAR OverviewRadar captures echo returns from a ‘swath’ on the groundNotional linear FM chirp pulse train, plus two ideally non-overlapping echoes returned from different positions on the swathSummation and scaling of echo returns realizes a challengingly long antenna aperture along the flight pathSynthetic Aperture, LFixed to Broadside. . .The Scalable Synthetic Data Generator produces raw SAR data approximating what would be obtained from a real SAR system. As an airplane flies adjacent to the field of interest or ‘swath,’ pulse trains are transmitted. The echo returns are scaled (to mimic different reflection coefficients at various points on the swath) and time delayed (to mimic different times at which echoes are returned from different points on the swath) and summed. The size of the SAR synthetic aperture is then determined by the distance that the sensor flies while the radar is capturing returns from the ground; this realizes a challengingly long antenna aperture length.To alleviate its coding complexity, the benchmark makes some simplifications of the SAR problem (which are shown in red):Broadside only processing (instead of being able to process different angular looks)Its synthetic aperture is set equal to its swath’s cross-rangeRange,X = 2X0delayed transmitted SAR waveformreflection coefficient scale factor, different for each return from the swathCross-Range, Y = 2Y0received ‘raw’ SAR
9 Scalable Synthetic Data Generator Generates synthetic raw SAR complex dataData size is scalable to enable rigorous testing of high performance computing systemsUser defined scale factor determines the size of images generatedGenerates ‘templates’ that consist of rotated and pixelated capitalized lettersCross-RangeRangeSpotlight SAR ReturnsThe raw complex data generated by the Data Generator is scalable. A user defined scale factor determines the size of the ‘swath’ that an echo return is gathered from, determining the size of the images generated. This allows the user to scale data sizes to better stress high performance systems.Along with the raw complex SAR data, target templates are generated that consist of rotated pixelated capitalized letters. The templates will be passed on so that later they can be inserted as random targets into the raw SAR images; these templates are also passed on to the Detection kernel.
10 Kernel 1 — SAR Image Formation Spatial Frequency Domain Interpolations(w,ku)f(x,y)F(kx,ky)Interpolationkx = sqrt(4k2 –ku2)ky = kuMatchedFilteringFourierTransform(t,u)B(w,ku)InverseFourier Transform(kx,ky) B (x,y)s*0(w,ku)s(t,u)Received Samples Fit a Polar SwathProcessed Samples Fit a Rectangular SwathfokxkySpotlight SAR ReconstructionCross-Range, PixelsThe SAR image formation step, kernel 1, consists of:fast Fourier transform (FFT) into the frequency domain - to ease processing further down the processing chainpulse compression (also called matched filtering) – to remove the transmitted waveform’s spectral componentsspatial frequency domain interpolation – to change the swath’s representation from a polar to rectangular coordinate systemand two-dimensional inverse fast Fourier transform (IFFT) – to produce an observable spatial-domain imageThe synthesized SAR image displays a grid pattern of lobes corresponding to what would have been the placement of reflectors on the swath.Range, Pixels
11 Template Insertion (untimed) Inserts rotated pixelated capital letter templates into each SAR imageNon-overlapping locations and rotationsRandomly selects 50%Used as ideal detection targets in Kernel 4If inserted with%100 TemplatesImage only inserted with%50 random TemplatesThe template insertion step inserts pixelated rotated capital letter templates into the SAR image: these letters are the “targets” that will later be detected.Templates are placed at and non-overlapping locations and rotations (to later forgo image alignment problems), and in the valleys of the SAR lobes.Template magnitude is set to the average power of the original raw SAR data (well below the magnitude of the SAR peaks). [To make them visible, template magnitude is shown here is exaggerated.]Y PixelsY PixelsX PixelsX Pixels
12 Thresholded Difference Kernel 4 — DetectionDetects targets in SAR imagesImage differenceThresholdSub-regionsCorrelate with every template max is target IDComputationally difficultMany small correlations over random pieces of a large image100% recognition no false alarmsImage AImage DifferenceThresholded DifferenceSub-regionThe detection kernel compares two images to detect changes that represent moving targets.Two images are differenced, effectively removing the SAR components.1. If SAR reflector peaks are not formed (e.g.: under focused or blurred), templates may be irrecoverably buried in the blurred SAR image.2. If SAR reflector peaks are not well formed (floor-to-peak power ratio, and placement) part/all of a template could be irrecoverably buried under a SAR lobe (so it becomes unidentifiable).Image is threshold, to distinguish newly visible targets from targets that moved out of the picture and thus ceased to be of interest.Image is broken-up into sub-regions; each region is correlated against each template; max correlation decides the target’s ID.Requirement: Throughput cannot be meaningfully measured until 100% recognition and no false alarms is achieved.Image B
13 Benchmark Summary and Computational Challenges Front-EndSensor ProcessingBack-EndKnowledge FormationRawSARTemplatesImageTemplateInsertionScalable Dataand TemplateGeneratorKernel #1FormationValidationDetectionsKernel #4DetectionSARImageTemplatesSAR Image Formation, has large scale parallel two-dimensional (2D) Inverse Fast Fourier Transform (IFFT); may require a ‘corner turn’ or a ‘gather scatter’ (depending on architecture), with large quantities of data. Polar interpolation is known to be even more computationally intense than IFFT.Streaming image data storage to an I/O device (write) may involve large block data transfers, storing one large image after another (Kernel 2)Random location image sequence retrieval from an I/O device (read) also involving large quantities of data, with stressful memory access patterns, and locality issues (Kernel 3)Kernel 4, detection, involves performing many small correlations on random pieces of large images.Scalable synthetic data generationPulse compressionPolar InterpolationFFT, IFFT (corner turn)Sequential storeNon-sequentialretrieveLarge & small IOLarge Imagesdifference &ThresholdMany smallcorrelations onrandom pieces oflarge image
14 Outline Introduction Kernel Level Benchmarks SAR Benchmark Release InformationSummary
15 HPEC Challenge Benchmark Release Future site of documentation and softwareInitial release is available to PCA, HPCS, and HPEC SI program members through respective program web pagesDocumentationANSI C Kernel BenchmarksSingle processor MATLAB SAR System BenchmarkComplete release will be made available to the public in first quarter of CY06The HPEC challenge benchmarks will be made available to the public and released on the web site listed in the first quarter of calendar year In the meantime, a preliminary release of the benchmarks will be made through respective program web pages for PCA, HPCS, and HPEC-SI members. This release will include documentation, source code for the ANSI C Kernel Benchmarks, and a single processor Matlab SAR system benchmark.
16 SummaryThe HPEC Challenge is a publicly available suite of benchmarks for the embedded spaceRepresentative of a wide variety of DoD applicationsBenchmarks stress computation, communication and I/OBenchmarks are provided at multiple levelsKernel: small enough to easily understand and optimizeCompact application: representative of real workloadsSingle-processor and multi-processorFor more information, see