Kathy Grimes. Signals Electrical Mechanical Acoustic Most real-world signals are Analog – they vary continuously over time Many Limitations with Analog.

Kathy Grimes

Signals: electrical, mechanical, acoustic. Most real-world signals are analog: they vary continuously over time. Analog processing has many limitations: repeatability, component tolerances, and difficulty storing information or implementing certain operations. This leads us to DSP.

DSP represents signals by sequences of numbers. Pros: repeatable; accuracy can be controlled; time-varying operations are easier to implement. Cons: sampling causes loss of information; round-off errors; A/D and D/A mixed-signal hardware is required.

An analog-to-digital converter turns a continuous-time signal into a discrete-time signal. Figure 11.1 shows the sampling of a signal. Common signals: the step discontinuity (Figure 11.2) and the impulse (Figure 11.3). FIGURE 11.1 Discrete Time Signals. FIGURE 11.2 Step Function. FIGURE 11.3 Impulse Function.
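The sampling step and the two common signals can be sketched in C as follows. This is a minimal illustration, not code from the presentation: the function names (`unit_step`, `unit_impulse`, `sample`, `ramp`) and the choice of a ramp as the example continuous-time signal are assumptions made for the example.

```c
#include <stddef.h>

/* Unit step u[n]: 0 for n < 0, 1 for n >= 0. */
double unit_step(int n) { return n >= 0 ? 1.0 : 0.0; }

/* Unit impulse d[n]: 1 at n == 0, 0 everywhere else. */
double unit_impulse(int n) { return n == 0 ? 1.0 : 0.0; }

/* Hypothetical continuous-time signal used for illustration: x(t) = 2t. */
double ramp(double t) { return 2.0 * t; }

/* Sample a continuous-time signal x_t every ts seconds:
   out[n] = x_t(n * ts) for n = 0 .. count-1. */
void sample(double (*x_t)(double), double ts, double *out, size_t count)
{
    for (size_t n = 0; n < count; ++n)
        out[n] = x_t((double)n * ts);
}
```

Calling `sample(ramp, 0.5, out, 4)` fills `out` with the values of the ramp at t = 0, 0.5, 1.0, and 1.5 seconds; everything between those instants is discarded, which is the information loss listed among the cons of DSP.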

DSP algorithms are built from three basic functions: delay, add, and multiply. The raw performance of a DSP algorithm is usually measured by the number of operations needed to execute it. FIGURE 11.4 Add Function. FIGURE 11.5 Multiply Function. FIGURE 11.6 Delay Function.

Two systems, feedforward and feedback, can be used in combination to develop any discrete difference equation. FIGURE 11.7 Feedforward System. FIGURE 11.8 Feedback System.
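As a sketch of how delay, add, and multiply combine, the two functions below implement a first-order feedforward system and a first-order feedback system. The coefficient names (`b0`, `b1`, `a1`), the first-order structure, and the zero initial state are assumptions chosen for illustration; the figures in the original deck may use a different arrangement.

```c
#include <stddef.h>

/* Feedforward system: y[n] = b0*x[n] + b1*x[n-1], with x[-1] assumed 0.
   x and y must not overlap. */
void feedforward(const double *x, double *y, size_t n, double b0, double b1)
{
    double x_prev = 0.0;                 /* delay element holding x[n-1] */
    for (size_t i = 0; i < n; ++i) {
        y[i] = b0 * x[i] + b1 * x_prev;  /* two multiplies, one add */
        x_prev = x[i];
    }
}

/* Feedback system: y[n] = x[n] + a1*y[n-1], with y[-1] assumed 0.
   x and y must not overlap. */
void feedback(const double *x, double *y, size_t n, double a1)
{
    double y_prev = 0.0;                 /* delay element holding y[n-1] */
    for (size_t i = 0; i < n; ++i) {
        y[i] = x[i] + a1 * y_prev;       /* one multiply, one add */
        y_prev = y[i];
    }
}
```

Feeding an impulse into each shows the difference: the feedforward output dies out after one delayed sample, while the feedback output keeps decaying by a factor of `a1` per sample.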

Fixed-point DSPs perform integer operations over a fixed range: 16 bits give 65,536 distinct values. Floating-point DSPs perform both integer and floating-point operations over a dynamic operating range. Analog real-world signals have effectively infinite precision, and floating point mimics that “infinite” range better. It is also easier to program, since it largely avoids the scaling, rounding, and overflow handling that fixed point requires. Why not always use floating point? Cost, availability, price, and performance. Precision: with the same number of bits, floating point resolves small values finely but is coarser at large values.
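To make the fixed-range point concrete, here is a minimal sketch of 16-bit fixed-point arithmetic in the common Q15 convention (an `int16_t` value `v` stands for `v / 32768`, so the representable range is [-1, 1)). The Q15 format and the helper names are assumptions for the example; the presentation does not name a specific format.

```c
#include <stdint.h>

/* Clamp a 32-bit intermediate result into the 16-bit range. */
int16_t q15_saturate(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Saturating add: without the clamp, 30000 + 10000 would wrap around. */
int16_t q15_add(int16_t a, int16_t b)
{
    return q15_saturate((int32_t)a + (int32_t)b);
}

/* Q15 multiply: the 32-bit product is in Q30, so round and shift back.
   Note: right-shifting a negative value is implementation-defined in C;
   common compilers perform an arithmetic shift. */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;  /* Q30 product */
    p += 1 << 14;                         /* round to nearest */
    return q15_saturate(p >> 15);         /* back to Q15, clamped */
}
```

The explicit widening, rounding, and saturation are the bookkeeping that floating-point hardware handles implicitly, which is why floating point is described as easier to implement.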

SIMD microarchitecture and instructions: one instruction operates on four data values in one clock cycle, instead of one instruction per value. This increases performance for low-level DSP functions such as multiply-accumulate (MAC). FIGURE 11.10 SIMD Instruction.

Processor: clock speed and cache size. DSP architectures usually partition the memory space manually in order to reduce the number of accesses to external memory, because latency is costly in terms of time and resources. Intel architectures have large amounts of cache and can hide the gap between fast and slow memory; however, all data starts out in the “far” levels of the memory hierarchy. Output data should be generated sequentially; accessing memory in a scattered pattern (while using threads) should be avoided.
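The difference between sequential and scattered access can be sketched with two loops over the same row-major buffer. Both functions are hypothetical examples, not code from the presentation; they compute identical results and differ only in the order in which memory is touched.

```c
#include <stddef.h>

/* Sequential: the inner loop walks adjacent addresses, so each cache
   line fetched from far memory is fully used before moving on. */
void scale_sequential(const float *in, float *out,
                      size_t rows, size_t cols, float gain)
{
    for (size_t r = 0; r < rows; ++r)
        for (size_t c = 0; c < cols; ++c)
            out[r * cols + c] = gain * in[r * cols + c];
}

/* Strided: same arithmetic, but consecutive accesses are cols elements
   apart, so large buffers touch a new cache line on almost every step. */
void scale_strided(const float *in, float *out,
                   size_t rows, size_t cols, float gain)
{
    for (size_t c = 0; c < cols; ++c)
        for (size_t r = 0; r < rows; ++r)
            out[r * cols + c] = gain * in[r * cols + c];
}
```

On small buffers that fit in cache the two behave alike; the gap typically appears once the working set exceeds the cache, which is the situation the slide warns about.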

Three approaches: intrinsics, vectorization, and Intel Performance Primitives.

Intrinsics: C code that calls special built-in compiler capabilities that map closely to the underlying SSE instruction set. Added data types: __m64, __m128, __m128d, __m128i. Intrinsic operation types: arithmetic (fixed- and floating-point), shift, logical, compare, set, shuffle, concatenation. Example: an intrinsic add takes four FP values packed into each of a and b and performs four additions in one instruction.
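A minimal sketch of the packed add described above, using the SSE intrinsic `_mm_add_ps` on two `__m128` values. The wrapper name `add4` and the scalar fallback for targets without SSE are assumptions added for the example; the presentation's own listing is not reproduced here.

```c
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>  /* SSE: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */

/* Add four packed single-precision values from a and b in one instruction. */
void add4(const float *a, const float *b, float *out)
{
    __m128 va = _mm_loadu_ps(a);        /* load a[0..3], no alignment required */
    __m128 vb = _mm_loadu_ps(b);        /* load b[0..3] */
    __m128 vsum = _mm_add_ps(va, vb);   /* four additions at once */
    _mm_storeu_ps(out, vsum);           /* store the four sums */
}
#else
/* Fallback for targets without SSE: four scalar additions. */
void add4(const float *a, const float *b, float *out)
{
    for (int i = 0; i < 4; ++i)
        out[i] = a[i] + b[i];
}
#endif
```

The unaligned load and store variants are used so the sketch works on any buffer; when data is known to be 16-byte aligned, `_mm_load_ps` and `_mm_store_ps` can be used instead.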

Vectorization: use the compiler to apply vectorization techniques to loops within the data-processing iteration. The compiler looks for opportunities to convert loops from a single-element implementation to a vector-based one, so that multiple operands can be operated on at the same time. Compilers such as GCC generate code aligned with the SIMD instruction set. Use #pragma directives to guide the compiler and avoid overheads such as assumed data dependences. Listing 11.4 Explicitly Don’t Vectorize Loop. Listing 11.7 Memory Alignment Property and Discarding Assumed Data Dependences.
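The kind of guidance those two listings describe can be sketched as below. The pragmas shown (`novector`, `ivdep`, `vector aligned`) are Intel C/C++ compiler directives; other compilers generally ignore unknown pragmas, possibly with a warning. The function names and loop bodies are hypothetical and are not the contents of Listings 11.4 or 11.7.

```c
#include <stddef.h>

/* Ask the Intel compiler NOT to vectorize this loop, for example because
   the trip count is known to be too small to pay for vector setup. */
void copy_small(const float *src, float *dst, size_t n)
{
#pragma novector
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i];
}

/* Tell the compiler that assumed dependences between dst and src can be
   discarded (ivdep) and that the data is suitably aligned (vector aligned).
   The caller is responsible for both promises being true when a compiler
   that honors these pragmas is used. */
void scale_add(float *restrict dst, const float *restrict src,
               float gain, size_t n)
{
#pragma ivdep
#pragma vector aligned
    for (size_t i = 0; i < n; ++i)
        dst[i] += gain * src[i];
}
```

The `restrict` qualifiers express the no-aliasing promise in standard C as well, so compilers that do not recognize the pragmas still receive the dependence information.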

Comparisons on performance: this performance would be vastly different if the memory were not already aligned.

Intel Performance Primitives: Intel libraries with highly optimized implementations for many different applications (including audio codecs, image processing, data compression, etc.). The libraries take full advantage of the CPU and SIMD, and most are written for performance. They are threaded and can obtain performance gains by parallelizing the algorithm. Domains covered: signal processing (convolution and correlation, finite impulse response (FIR) filters, FIR coefficient generation functions, infinite impulse response (IIR) filters, transforms); image processing; small matrices and realistic rendering; cryptography.

FIR filter equation: y[n] = a·x[n] + b·x[n-1] + c·x[n-2]. Listing 11.8 FIR Filter C Code Example. Listing 11.9 FIR Using Intel Performance Primitives.
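A direct C rendering of the three-tap equation might look like the sketch below. It is not the text of Listing 11.8; the function name `fir3` and the assumption that samples before the start of the buffer are zero are choices made for the example.

```c
#include <stddef.h>

/* Three-tap FIR: y[n] = a*x[n] + b*x[n-1] + c*x[n-2].
   Samples before the start of the buffer (x[-1], x[-2]) are taken as 0.
   x and y must not overlap. */
void fir3(const float *x, float *y, size_t n, float a, float b, float c)
{
    for (size_t i = 0; i < n; ++i) {
        float x1 = (i >= 1) ? x[i - 1] : 0.0f;  /* x[n-1] */
        float x2 = (i >= 2) ? x[i - 2] : 0.0f;  /* x[n-2] */
        y[i] = a * x[i] + b * x1 + c * x2;
    }
}
```

Feeding a unit impulse through `fir3` returns the coefficients `a`, `b`, `c` in order, which is a quick way to check a filter implementation.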

Loop unrolling gets rid of data dependences. By rearranging how the data elements are used, we can reduce the number of times we need to read data.
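One way to apply this to the three-tap FIR filter is to compute two outputs per iteration and keep recently read samples in local variables, so each input sample is read from memory once. This is an illustrative sketch under that assumption (unroll factor 2, zero initial history, hypothetical name `fir3_unrolled`), not the presentation's own code.

```c
#include <stddef.h>

/* Three-tap FIR, unrolled by two: y[n] = a*x[n] + b*x[n-1] + c*x[n-2].
   x1 and x2 carry x[i-1] and x[i-2] between iterations, so every input
   sample is loaded exactly once. x and y must not overlap. */
void fir3_unrolled(const float *x, float *y, size_t n,
                   float a, float b, float c)
{
    float x1 = 0.0f;  /* x[i-1], zero before the start of the buffer */
    float x2 = 0.0f;  /* x[i-2] */
    size_t i = 0;

    for (; i + 1 < n; i += 2) {
        float s0 = x[i];                       /* single read of x[i]   */
        float s1 = x[i + 1];                   /* single read of x[i+1] */
        y[i]     = a * s0 + b * x1 + c * x2;
        y[i + 1] = a * s1 + b * s0 + c * x1;   /* independent of y[i]   */
        x2 = s0;                               /* history for next pair */
        x1 = s1;
    }
    if (i < n)                                 /* odd-length tail */
        y[i] = a * x[i] + b * x1 + c * x2;
}
```

The two output computations in the loop body do not depend on each other, which gives the compiler and the processor more freedom to schedule or vectorize them.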

Example application (ultrasound imaging): computation intensive, needing a significant amount of embedded computational performance. Systems share the same basic algorithmic pattern even though physical configurations, parameters, and functionality are different: beamforming, envelope extraction, and polar-to-Cartesian coordinate translation.

FIGURE 11.12 Block Diagram of a Typical Ultrasound Imaging Application.

FIGURE 11.15 Block Diagram of the Envelope Detector.

FIGURE 11.16 Polar-to-Cartesian Conversion of a Hypothetically Scanned Rectangular Object. Listing 11.11 Code Sample for Envelope Detector.
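The two stages named in these captions are commonly expressed with the relations below. They are given as a hedged sketch: the presentation does not state its formulas, so the quadrature form of the envelope (in-phase component I, quadrature component Q) and the angle convention (beam angle measured from the probe axis) are assumptions.

```latex
% Envelope of a signal with in-phase component I[n] and quadrature component Q[n]
e[n] = \sqrt{I[n]^2 + Q[n]^2}

% Polar-to-Cartesian conversion of a sample at range r and beam angle \theta
x = r \sin\theta, \qquad y = r \cos\theta
```

In a scan converter these relations are usually applied in reverse: for each output pixel (x, y), the corresponding (r, θ) is computed and the nearest scanned samples are interpolated.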

Why such a large difference?

Digital signal processing on general-purpose processors extends their processing capabilities. It simplifies the overall application when platforms require control, communications, and general-purpose processing alongside DSP. There are many ways to improve DSP performance on an Intel system: special C code (intrinsics), vectorization, and specific libraries. Performance is greatly enhanced when DSP is implemented properly.
