3 Agenda
- Discuss overall DSP architecture
- Lab 4 recap
- Clock domains
- Channel selection filter
  - Design
  - CoreGen implementation
- Discuss the tools which will be useful in development / verification of our design
  - UDP streaming of data (as in Lab 4)
  - ISIM / Modelsim
  - Chipscope introduction
4 Radio Receiver Core
[Block diagram: input of 25 Msps (25 million 14-bit numbers / sec), +/- 3 MHz; Radio Receiver Core; output audio at 48k samples / second]
5 Tuning / Downconversion
Given that we have sampled at 25 MHz, the input to the signal processing blocks is depicted below.
[Figure: input spectrum, frequency axis from 0 (DC) to 12.5 MHz, with markers at 2.75 and 9.25]
Multiple information channels exist in this signal. A typical job of the DSP blockset would be to:
- “Tune” to the appropriate section of spectrum
- Mix it to baseband
- Filter out other channels
- Reduce data rate (decimate)
- Demodulate
6 Frequency Translation
Graphics reprinted from :
We are going to accomplish this tunable translation with a DDS for generation of a complex sinusoid, followed by a complex mixer. Details next week.
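The frequency translation described above can be sketched in a few lines. This is a minimal floating point model, not the DDS / mixer design covered next week: the function name is hypothetical, the local oscillator is an ideal complex exponential, and the 2.75 MHz channel frequency is borrowed from a marker on the spectrum figure purely as an illustration.

```python
import cmath
import math

def mix_to_baseband(samples, f_tune_hz, fs_hz):
    """Multiply each sample by exp(-j*2*pi*f_tune*n/fs), shifting f_tune down to DC."""
    out = []
    for n, x in enumerate(samples):
        # Ideal complex sinusoid standing in for the DDS output (assumption).
        lo = cmath.exp(-2j * math.pi * f_tune_hz * n / fs_hz)
        out.append(x * lo)
    return out

FS_HZ = 25e6          # sample rate from the slides
F_CHANNEL_HZ = 2.75e6 # illustrative channel frequency (assumption)

# A complex tone at the channel frequency...
tone = [cmath.exp(2j * math.pi * F_CHANNEL_HZ * n / FS_HZ) for n in range(8)]
# ...lands at 0 Hz after mixing, i.e. every output sample is (1 + 0j).
baseband = mix_to_baseband(tone, F_CHANNEL_HZ, FS_HZ)
```

A real input signal would also produce an image at the sum frequency, which is one reason a lowpass filter follows the mixer.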
7 Filtering
Two goals:
- Remove contributions of signals from outside our channel
- Reduce bandwidth so that data rate becomes manageable
  - 25 Msps is not necessary to represent our small channel bandwidth (ex: 150 kHz for FM radio)
8 Lab 5 : Filter Demonstration
Implement the channel selection filter which will be used in the SDR.
[Block diagram: ADC (25 Msps) -> L.P. Filter -> decimate by 520 (48 kHz) -> FIFO -> uBlaze -> Ethernet to PC for analysis]
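As a quick check of the rates on this slide, and reading the "520" on the diagram as the decimation factor (an assumption), the output rate works out slightly above the nominal 48 kHz. The function name is hypothetical.

```python
def output_rate(fs_in_hz, decimation):
    """Output sample rate of a decimator that keeps one sample in every `decimation`."""
    return fs_in_hz / decimation

fs_out = output_rate(25_000_000, 520)          # about 48076.9 samples / second
error_vs_48k = (fs_out - 48_000) / 48_000      # about 0.16 % above 48 kHz
```

The small mismatch matters only if the audio is later played back at exactly 48 kHz.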
10 FIR Implementation
[Figure: FIR filter structure built from "Delay" elements]
Convolution in the time domain is computationally complex, yet fairly straightforward.
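To make "straightforward" concrete, here is a minimal sketch of direct-form FIR convolution, y[n] = sum over k of h[k] * x[n-k]. The function name is hypothetical and the filter starts from an all-zero delay line.

```python
def fir_filter(x, h):
    """Direct-form FIR: one output per input, zero initial state."""
    y = []
    for n in range(len(x)):
        acc = 0
        for k, coeff in enumerate(h):
            if n - k >= 0:
                acc += coeff * x[n - k]  # one multiply-accumulate (MAC)
        y.append(acc)
    return y

# An impulse in returns the coefficients themselves (the impulse response).
impulse_response = fir_filter([1, 0, 0, 0], [1, 2, 3])
```

The inner loop runs once per tap for every sample, which is where the computational cost comes from.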
11 Example Filter Design Process
Filter specs:
- Fs = 25 Msps
- 75 kHz passband
- 100 kHz stopband
- Attenuate stopband signals by > 60 dB
FIR filter with < 2048 taps will meet this requirement.
Matlab: “fdatool”
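12 Scaling and Quantizing h[n]
Output from FDAtool is a set of floating point coefficients h[NUMTAPS] (aka “impulse response”). In the absence of floating point multipliers, this is not directly implementable on our FPGA. We need to transform this impulse response to integers while preserving the function.
    NumInt = int32(Num*ScaleFactor);
Matlab's int32 already rounds to the nearest integer, so adding 0.5 before the conversion is not needed (it would bias every coefficient upward).
- Num: floating point coefficients from FDAtool
- ScaleFactor: scaling factor. How to choose?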
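The scale-and-round step can be sketched as follows. The three coefficient values are made up for illustration (they are not from the designed filter), and the function names are hypothetical. Note that Python's `round` resolves exact .5 ties to the nearest even integer, which differs slightly from Matlab's int32.

```python
def quantize(coeffs, scale):
    """Scale each floating point coefficient and round to the nearest integer."""
    return [int(round(c * scale)) for c in coeffs]

def max_error(coeffs, scale):
    """Largest difference between a coefficient and its quantized value, in original units."""
    q = quantize(coeffs, scale)
    return max(abs(qi / scale - c) for qi, c in zip(q, coeffs))

example_h = [0.001, -0.0005, 0.007]       # illustrative values only
example_q = quantize(example_h, 32700)    # [33, -16, 229]
```

Rounding keeps each coefficient within half an integer step of its ideal value, so a larger scale factor gives a smaller relative error at the cost of wider coefficients.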
13 Quantized Coefficients
Overall, making the coefficients integers (after multiplying by 32700) doesn’t affect our response too badly. With this scaling factor, the largest coefficient needs only 9 bits, so the effective coefficient width is 9 bits. Some optimization between coefficient width and filter order could be undertaken if we chose.
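The 9-bit figure can be sanity-checked with a rough estimate. This sketch assumes the largest tap of a unity-gain lowpass is about 2*fc/Fs (true for an ideal windowed-sinc design) and places fc midway between the 75 kHz passband and 100 kHz stopband edges; both are assumptions, and the function name is hypothetical.

```python
def signed_width(values):
    """Smallest two's-complement width (in bits) that holds every value."""
    width = 1
    while any(v < -(1 << (width - 1)) or v > (1 << (width - 1)) - 1 for v in values):
        width += 1
    return width

FS_HZ = 25e6
CUTOFF_HZ = 87.5e3   # midpoint of the transition band (assumption)
SCALE = 32700

peak_coeff = round(2 * CUTOFF_HZ / FS_HZ * SCALE)  # about 229
coeff_bits = signed_width([peak_coeff])            # 8 magnitude bits plus sign
```

Because the filter is so narrow relative to the sample rate, its taps are small, and most of the 16-bit range implied by a 32700 scale factor goes unused.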
15 Scaling and Quantizing h[n]
    NumInt = int32(Num * ScaleFactor);
    fid = fopen('filt2030.coe', 'w');
    fprintf(fid, 'radix=10;\ncoefdata=\n');
    fprintf(fid, '%d,\n', NumInt(1:end-1));   % all but the last coefficient
    fprintf(fid, '%d;\n', NumInt(end));       % last coefficient ends the list with ';'
    fclose(fid);
Resulting file:
    radix=10;
    coefdata=
    15,
    -1,
    …etc
- ScaleFactor: scaling factor. Choose this for a reasonable performance vs. utilization tradeoff.
- Result: the impulse response of the filter in a “coe” file, which we will use later when designing the filter.
- Note that the filter gain is no longer the unity gain we designed in Matlab. Signals in the passband will now come out at 32700 times the input level.
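For anyone generating the file outside Matlab, a minimal sketch of the same output follows. The function name is hypothetical, and the layout (a radix line, then comma-separated integers with a terminating semicolon) reflects my understanding of the COE format rather than anything checked against the tool.

```python
def format_coe(coeffs):
    """Return COE text: radix line, then one integer per line, list ended by ';'."""
    body = ",\n".join(str(int(c)) for c in coeffs)
    return "radix=10;\ncoefdata=\n" + body + ";\n"

coe_text = format_coe([15, -1, 7])  # illustrative coefficients only
```

Building the text as a string first makes it easy to inspect before writing it to disk with an ordinary `open(path, "w")`.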
16 FIR Implementation
[Figure: FIR filter structure built from "Delay" elements]
- How many Multiply-Accumulate operations (MAC) are required per sample to implement the filter we designed?
- What is the overall rate of MACs per second?
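One way to work the two questions, as a rough sketch: a direct-form FIR needs one MAC per tap per input sample. The tap count below uses the 2048 upper bound from the design slide as a stand-in for the actual filter length (an assumption), and the function name is hypothetical.

```python
def macs_per_second(num_taps, sample_rate_hz):
    """Direct-form FIR with no decimation: one MAC per tap for every input sample."""
    return num_taps * sample_rate_hz

NUM_TAPS = 2048          # upper bound from the filter spec (assumption)
FS_HZ = 25_000_000

total_macs = macs_per_second(NUM_TAPS, FS_HZ)  # 51.2e9 MACs / sec
```

This ignores coefficient symmetry, which lets roughly half of the multiplies be shared.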
18 Tool takes as input the “COE” file created previously to get h[n]
19 Implementation details show what we computed previously, with a savings due to the symmetrical structure of the coefficients.
20 As an example of high performance FPGA capabilities, consider Virtex 6. DSP48E slices run at clock rates up to 600 MHz.
Theoretical: 172 GMACs / sec to 1.2 TMACs / sec
21 If we let the filter run at a higher clock rate than the input sample rate, the structure is automatically adapted so that a convolution takes multiple clocks using shared multipliers.
22 Spartan 6 LX45
- DSP: 58 * 390M = max 22.6 GMACs / sec
- Our FPGA is capable of doing about ½ of what we want
- There will be other features of the FPGA that need multipliers…
- These figures should help put FPGA capabilities for DSP in perspective
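The peak figure is simply DSP slices times clock rate, one MAC per slice per clock. The comparison below assumes a 2048-tap filter running at the full 25 Msps with no symmetry savings, which is a stand-in for "what we want" rather than a number from the slides; the function name is hypothetical.

```python
def peak_macs(dsp_slices, clock_hz):
    """Theoretical peak: every DSP slice completes one MAC on every clock."""
    return dsp_slices * clock_hz

spartan6_peak = peak_macs(58, 390e6)      # 22.62e9 MACs / sec
required = 2048 * 25e6                    # assumed requirement, 51.2e9 MACs / sec
fraction_available = spartan6_peak / required  # about 0.44
```

Under these assumptions the device covers a bit under half of the load, consistent with the slide's "about ½".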
23 Allowing the filter to use the entire time between output samples for the convolution makes the task easily achievable (even with a slower clock to the filter).
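A rough sketch of why decimation helps: each output still needs one MAC per tap, but only one output is needed for every `decimation` input samples, so a handful of shared multipliers can keep up. The 2048 taps and the factor of 520 are assumptions carried over from earlier slides, the function name is hypothetical, and coefficient symmetry is ignored.

```python
import math

def multipliers_needed(num_taps, decimation, clocks_per_input_sample=1):
    """MACs per output divided by clock cycles available per output, rounded up."""
    clocks_per_output = decimation * clocks_per_input_sample
    return math.ceil(num_taps / clocks_per_output)

no_decimation = multipliers_needed(2048, 1)        # 2048 multipliers
with_decimation = multipliers_needed(2048, 520)    # 4 multipliers at a 25 MHz clock
faster_clock = multipliers_needed(2048, 520, 2)    # 2 multipliers at a 50 MHz clock
```

The actual core may choose a different structure, so treat these counts as an order-of-magnitude guide.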
24 Decimating FIR Filter Core (“AXI-Stream Interface”)
    COMPONENT channel_selector
      PORT (
        aclk               : IN  STD_LOGIC;
        s_axis_data_tvalid : IN  STD_LOGIC;
        s_axis_data_tready : OUT STD_LOGIC;
        s_axis_data_tdata  : IN  STD_LOGIC_VECTOR(15 DOWNTO 0);
        m_axis_data_tvalid : OUT STD_LOGIC;
        m_axis_data_tdata  : OUT STD_LOGIC_VECTOR(31 DOWNTO 0)
      );
    END COMPONENT;
[Timing diagram: ACLK, s_axis_data_tvalid, s_axis_data_tdata, tready, m..tvalid, m..tdata. Annotation on the diagram: "Changes with implementation and coeffs!"]
- s_axis_data_tvalid: enable signal; new data is latched on the rising edge of the clock when this is high
- s_axis_data_tdata (Din): data to filter
- s_axis_data_tready: ready for new data
- m_axis_data_tvalid: enable signal; the result is present on the output on the rising edge of the clock when this signal is high
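A behavioral sketch of what the core computes may help when checking simulation results. It models only the arithmetic and the output rate (one result per `decimation` accepted inputs, corresponding to a pulse on m_axis_data_tvalid); it does not model the core's latency, tready behavior, or output rounding, and the function name is hypothetical.

```python
def decimating_fir(samples, h, decimation):
    """Return one filtered output for every `decimation` input samples."""
    history = [0] * len(h)           # delay line, newest sample first
    outputs = []
    for n, x in enumerate(samples):
        history = [x] + history[:-1] # shift the new sample in
        if (n + 1) % decimation == 0:
            # Only here would the hardware assert its output valid signal.
            outputs.append(sum(c * s for c, s in zip(h, history)))
    return outputs

demo = decimating_fir([1, 2, 3, 4, 5, 6], [1, 1], 2)  # [3, 7, 11]
```

Feeding the same integer coefficients and input vector to this model and to the HDL simulation gives a reference for the values that appear on the output, though not for their timing.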
25 Issues / Decisions
- Clock domain for filter vs. clock domain for A/D samples
  - 25, 50, higher?
  - Higher clocks allow fewer multipliers and higher performance, but require some thought at the clock domain boundaries
- Coefficient width / scaling
  - Fewer bits for coefficients will save RAM, but will decrease filter performance
  - Scaling factor and its effect on filter output will need to be understood and compensated for. Full scale input (14 bits) should generally map to full scale output (16 bits)
  - Evaluate the various output rounding / precision options in the FIR filter block to decide how to achieve this objective. Simulate if confused!
  - Full precision output is straightforward: output = input * scaling factor
- FIFO size
  - Not super important to change yet; but with reduced data rate, you can now stream UDP data continuously and use a much smaller FIFO
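The scaling bullet can be worked through numerically. This sketch assumes the passband gain equals the scale factor (32700, from the earlier slides) and that a simple right shift of the full precision output is an acceptable way to pick the 16 output bits; it ignores any peaks beyond a full-scale passband signal. The function name is hypothetical.

```python
def shift_for_output(input_bits, scale_factor, output_bits=16):
    """Right shift that maps a full-scale passband input onto `output_bits` signed bits."""
    peak_in = (1 << (input_bits - 1)) - 1       # 8191 for a 14-bit signed input
    peak_full = peak_in * scale_factor          # full precision passband peak
    full_width = peak_full.bit_length() + 1     # magnitude bits plus sign bit
    return max(full_width - output_bits, 0)

shift = shift_for_output(14, 32700)             # 13
peak_out = (8191 * 32700) >> shift              # 32696, just under the 16-bit limit of 32767
```

With these numbers a 14-bit full-scale input lands almost exactly on 16-bit full scale, which is the mapping the slide asks for.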
26 Development: Matlab Simulation
Use this to solve all the real DSP issues:
- Mixer frequency for tuning
- Filter number of taps / coefficients
- Scaling issues
FAST, easy to change
27 Development: Modelsim / VHDL Simulation
- Is useful to prove functionality of your design in known situations
- Sometimes difficult to fully model real world
- Simulations of individual pieces (particularly those which you did not write) can be very informative
  - Even more so than documentation
  - COREGEN cores easily simulated
28 SOC Debugging
Premise: full simulation is often impractical
- Visibility of internal signals is helpful to thoroughly debug / verify a design
- Even external signals can be difficult to probe on high density boards
- Observing the functionality of your system as it interacts with an unpredictable real world is crucial
29 SDR Debugging
- Build proven, reliable data pipes first, i.e. your UDP or serial port
- Build in the ability to send pieces of data from various points in the system out to be observed
- Ethernet data pipe developed in Lab 4 can be used to grab data from different points in your signal processing chain
  - Simply provide a means for different things to be written into the FSL input FIFO
30 Chipscope Pro
- Internal FPGA logic resources are used to capture internal signals / events
- Data is read out via JTAG cable
- Essentially a logic analyzer inside the FPGA
- FPGA resource limited
31 Example of Logic Analyzer view while system is running. Real data from target
42 Notes
- ICON core uses a “BSCAN” resource much like the Microblaze MDM debugger
  - Spartan 3A DSP has only 1!
  - Effort beyond the scope of this demo is required to get both working concurrently
  - Online description will follow
- System without Microblaze, or without debuggable Microblaze, is the easiest to experiment with first