Selected MaxCompiler Examples

Slides:



Advertisements
Similar presentations
Simplifications of Context-Free Grammars
Advertisements

1
Basic Java Constructs and Data Types – Nuts and Bolts
Chapter 7 Constructors and Other Tools. Copyright © 2006 Pearson Addison-Wesley. All rights reserved. 7-2 Learning Objectives Constructors Definitions.
© 2008 Pearson Addison Wesley. All rights reserved Chapter Seven Costs.
Copyright © 2003 Pearson Education, Inc. Slide 1 Computer Systems Organization & Architecture Chapters 8-12 John D. Carpinelli.
Copyright © 2011, Elsevier Inc. All rights reserved. Chapter 6 Author: Julia Richards and R. Scott Hawley.
Author: Julia Richards and R. Scott Hawley
1 Copyright © 2013 Elsevier Inc. All rights reserved. Chapter 3 CPUs.
Properties Use, share, or modify this drill on mathematic properties. There is too much material for a single class, so you’ll have to select for your.
UNITED NATIONS Shipment Details Report – January 2006.
© 2010 Pearson Addison-Wesley. All rights reserved. Addison Wesley is an imprint of Chapter 5: Repetition and Loop Statements Problem Solving & Program.
10 Copyright © 2005, Oracle. All rights reserved. Reusing Code with Inheritance and Polymorphism.
1 RA I Sub-Regional Training Seminar on CLIMAT&CLIMAT TEMP Reporting Casablanca, Morocco, 20 – 22 December 2005 Status of observing programmes in RA I.
Jeopardy Q 1 Q 6 Q 11 Q 16 Q 21 Q 2 Q 7 Q 12 Q 17 Q 22 Q 3 Q 8 Q 13
Properties of Real Numbers CommutativeAssociativeDistributive Identity + × Inverse + ×
FACTORING ax2 + bx + c Think “unfoil” Work down, Show all steps.
Year 6 mental test 5 second questions
Chapter 5 Input/Output 5.1 Principles of I/O hardware
Excel Functions. Part 1. Introduction 2 An Excel function is a formula or a procedure that is performed in the Visual Basic environment, outside the.
REVIEW: Arthropod ID. 1. Name the subphylum. 2. Name the subphylum. 3. Name the order.
1 Chapter 10 - Structures, Unions, Bit Manipulations, and Enumerations Outline 10.1Introduction 10.2Structure Definitions 10.3Initializing Structures 10.4Accessing.
Break Time Remaining 10:00.
Turing Machines.
PP Test Review Sections 6-1 to 6-6
EU market situation for eggs and poultry Management Committee 20 October 2011.
Semantic Analysis and Symbol Tables
EIS Bridge Tool and Staging Tables September 1, 2009 Instructor: Way Poteat Slide: 1.
2000 Prentice Hall, Inc. All rights reserved. 1 Chapter 10 - Structures, Unions, Bit Manipulations, and Enumerations Outline 10.1Introduction 10.2Structure.
VOORBLAD.
1 public class Newton { public static double sqrt(double c) { double epsilon = 1E-15; if (c < 0) return Double.NaN; double t = c; while (Math.abs(t - c/t)
Copyright © 2012, Elsevier Inc. All rights Reserved. 1 Chapter 7 Modeling Structure with Blocks.
1 RA III - Regional Training Seminar on CLIMAT&CLIMAT TEMP Reporting Buenos Aires, Argentina, 25 – 27 October 2006 Status of observing programmes in RA.
Factor P 16 8(8-5ab) 4(d² + 4) 3rs(2r – s) 15cd(1 + 2cd) 8(4a² + 3b²)
Basel-ICU-Journal Challenge18/20/ Basel-ICU-Journal Challenge8/20/2014.
1..
CONTROL VISION Set-up. Step 1 Step 2 Step 3 Step 5 Step 4.
© 2012 National Heart Foundation of Australia. Slide 2.
Adding Up In Chunks.
While Loop Lesson CS1313 Spring while Loop Outline 1.while Loop Outline 2.while Loop Example #1 3.while Loop Example #2 4.while Loop Example #3.
 2000 Prentice Hall, Inc. All rights reserved. Chapter 14 - Advanced C Topics Outline 14.1Introduction 14.2Redirecting Input/Output on UNIX and DOS Systems.
1 Variables and Data Types. 2 Variable Definition a location in memory, referenced by a name (identifier), where data of a given type can be stored, changed,
1 of 31 Images from Africa. 2 of 31 My little Haitian friend Antoine (1985)
Note to the teacher: Was 28. A. to B. you C. said D. on Note to the teacher: Make this slide correct answer be C and sound to be “said”. to said you on.
Chapter 5 Loops Liang, Introduction to Java Programming, Tenth Edition, (c) 2015 Pearson Education, Inc. All rights reserved.
Model and Relationships 6 M 1 M M M M M M M M M M M M M M M M
25 seconds left…...
Datorteknik TopologicalSort bild 1 To verify the structure Easy to hook together combinationals and flip-flops Harder to make it do what you want.
Analyzing Genes and Genomes
Types of selection structures
©Brooks/Cole, 2001 Chapter 12 Derived Types-- Enumerated, Structure and Union.
Essential Cell Biology
Intracellular Compartments and Transport
PSSA Preparation.
Essential Cell Biology
Datorteknik TopologicalSort bild 1 To verify the structure Easy to hook together combinationals and flip-flops Harder to make it do what you want.
Immunobiology: The Immune System in Health & Disease Sixth Edition
Energy Generation in Mitochondria and Chlorplasts
Presenter MaxAcademy Lecture Series – V1.0, September 2011 Dataflow Programming with MaxCompiler.
User Defined Functions Lesson 1 CS1313 Fall User Defined Functions 1 Outline 1.User Defined Functions 1 Outline 2.Standard Library Not Enough #1.
Presenter MaxAcademy Lecture Series – V1.0, September 2011 Loops and cyclic graphs.
Sasa Stojanovic Veljko Milutinovic
Sasa Stojanovic Zivojin Sustran
Sasa Stojanovic Veljko Milutinovic
Sasa Stojanovic Zivojin Sustran
Sasa Stojanovic Veljko Milutinovic
Sasa Stojanovic 2/x  One has to know how to program Maxeler machines, in order to get the best possible speedup out of them!  For some.
Presentation transcript:

Selected MaxCompiler Examples Sasa Stojanovic stojsasa@etf.rs

Content E#1: Hello world E#2: Vector addition E#3: Type mixing E#4: Addition of a constant and a vector E#5: Input/output control E#6: Conditional execution E#7: Moving average 1D E#8: Moving average 2D E#9: Array summation E#10: Optimization of E#9

Example No.1: Hello World! Write a program that sends the “Hello World!” string to the MAX2 card, for the MAX2 card kernel to return it back to the host. To be learned through this example: How to make the configuration of the accelerator (MAX2 card) using Java: How to make a simple kernel (ops description) using Java (the only language), How to write the standard manager (config description based on kernel(s)) using Java, How to test the kernel using a test (code+data) written in Java, How to compile the Java code for MAX2, How to write a simple C code that runs on the host and triggers the kernel, How to write the C code that streams data to the kernel, How to write the C code that accepts data from the kernel, How to simulate and execute an application program in C that runs on the host and periodically calls the accelerator.

Standard Files in a MAX Project Example No. 1 Standard Files in a MAX Project One or more kernel files, to define operations of the application: <app_name>Kernel[<additional_name>].java One (or more) Java file, for simulation of the kernel(s): <app_name>SimRunner.java One manager file for transforming the kernel(s) into the configuration of the MAX card (instantiation and connection of kernels): <app_name>Manager.java Simulator builder: <app_name>HostSimBuilder.java Hardware builder: <app_name>HWBuilder.java Application code that uses the MAX card accelerator: <app_name>HostCode.c Makefile A script file that defines the compilation related commands

example1Kernel.java Example No. 1 package ind.z1; import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel; import com.maxeler.maxcompiler.v1.kernelcompiler.KernelParameters; import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWVar; public class helloKernel extends Kernel { public helloKernel(KernelParameters parameters) { super(parameters); // Input: HWVar x = io.input("x", hwInt(8)); HWVar result = x; // Output: io.output("z", result, hwInt(8)); } It is possible to substitute the last three lines with: io.output("z", result, hwInt(8));

example1SimRunner.java Example No. 1 package ind.z1; import com.maxeler.maxcompiler.v1.managers.standard.SimulationManager; public class helloSimRunner { public static void main(String[] args) { SimulationManager m = new SimulationManager(“helloSim"); helloKernel k = new helloKernel( m.makeKernelParameters() ); m.setKernel(k); m.setInputData("x", 1, 2, 3, 4, 5, 6, 7, 8); m.setKernelCycles(8); m.runTest(); m.dumpOutput(); double expectedOutput[] = { 1, 2, 3, 4, 5, 6, 7, 8 }; m.checkOutputData("z", expectedOutput); m.logMsg("Test passed OK!"); }

example1HostSimBuilder.java Example No. 1 package ind.z1; import static config.BoardModel.BOARDMODEL; import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel; import com.maxeler.maxcompiler.v1.managers.standard.Manager; import com.maxeler.maxcompiler.v1.managers.standard.Manager.IOType; public class helloHostSimBuilder { public static void main(String[] args) { Manager m = new Manager(true,”helloHostSim", BOARDMODEL); Kernel k = new helloKernel(m.makeKernelParameters(“helloKernel")); m.setKernel(k); m.setIO(IOType.ALL_PCIE); m.build(); }

example1HwBuilder.java Example No. 1 package ind.z1; import static config.BoardModel.BOARDMODEL; import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel; import com.maxeler.maxcompiler.v1.managers.standard.Manager; import com.maxeler.maxcompiler.v1.managers.standard.Manager.IOType; public class helloHWBuilder { public static void main(String[] args) { Manager m = new Manager(“hello", BOARDMODEL); Kernel k = new helloKernel( m.makeKernelParameters() ); m.setKernel(k); m.setIO(IOType.ALL_PCIE); m.build(); }

example1HostCode.c 1/2 Example No. 1 #include <stdio.h> #include <MaxCompilerRT.h> int main(int argc, char* argv[]) { char *device_name = (argc==2 ? argv[1] : "/dev/maxeler0"); max_maxfile_t* maxfile; max_device_handle_t* device; char data_in1[16] = "Hello world!"; char data_out[16]; printf("Opening and configuring FPGA.\n"); maxfile = max_maxfile_init_hello(); device = max_open_device(maxfile, device_name); max_set_terminate_on_error(device);

Example No. 1 example1HostCode.c 2/2 printf("Streaming data to/from FPGA...\n"); max_run(device, max_input("x", data_in1, 16 * sizeof(char)), max_output("z", data_out, 16 * sizeof(char)), max_runfor(“helloKernel", 16), max_end()); printf("Checking data read from FPGA.\n"); max_close_device(device); max_destroy(maxfile); return 0; }

Example No. 1 Makefile # Root of the project directory tree BASEDIR=../../.. # Java package name PACKAGE=ind/z1 # Application name APP=example1 # Names of your maxfiles HWMAXFILE=$(APP).max HOSTSIMMAXFILE=$(APP)HostSim.max # Java application builders HWBUILDER=$(APP)HWBuilder.java HOSTSIMBUILDER=$(APP)HostSimBuilder.java SIMRUNNER=$(APP)SimRunner.java # C host code HOSTCODE=$(APP)HostCode.c # Target board BOARD_MODEL=23312 # Include the master makefile.include nullstring := space := $(nullstring) # comment MAXCOMPILERDIR_QUOTE:=$(subst $(space),\ ,$(MAXCOMPILERDIR)) include $(MAXCOMPILERDIR_QUOTE)/examples/common/Makefile.include

Example No. 1 BoardModel.java package config; import com.maxeler.maxcompiler.v1.managers.MAX2BoardModel; public class BoardModel { public static final MAX2BoardModel BOARDMODEL = MAX2BoardModel.MAX2336B; }

Types Hardware Types

Hardware Primitive Types Floating point numbers - HWFloat: hwFloat(exponent_bits, mantissa_bits); float ~ hwFloat(8,24) double ~ hwFloat(11,53) Fixed point numbers - HWFix: hwFix(integer_bits, fractional_bits, sign_mode) SignMode.UNSIGNED SignMode.TWOSCOMPLEMENT Integers - HWFix: hwInt(bits) ~ hwFix(bits, 0, SignMode.TWOSCOMPLEMENT) Unsigned integers - HWFix: hwUint(bits) ~ hwFix(bits, 0, SignMode.UNSIGNED) Boolean – HWFix: hwBool() ~ hwFix(1, 0, SignMode.UNSIGNED) 1 ~ true 2 ~ false Raw bits – HWRawBits: hwRawBits(width)

Example No. 2: Vector Addition Write a program that adds two arrays of floating point numbers. Program reads the size of arrays, makes two arrays with an arbitrary content (test inputs), and adds them using a MAX card.

example2Kernel.Java Example No. 2 package ind.z2; import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel; import com.maxeler.maxcompiler.v1.kernelcompiler.KernelParameters; import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWVar; public class example2Kernel extends Kernel { public example2Kernel(KernelParameters parameters) { super(parameters); // Input HWVar x = io.input("x", hwFloat(8,24)); HWVar y = io.input("y", hwFloat(8,24)); HWVar result = x + y; // Output io.output("z", result, hwFloat(8,24)); }

example2SimRunner.java Example No. 2 package ind.z2; import com.maxeler.maxcompiler.v1.managers.standard.SimulationManager; public class example2SimRunner { public static void main(String[] args) { SimulationManager m = new SimulationManager("example2Sim"); example2Kernel k = new example2Kernel( m.makeKernelParameters() ); m.setKernel(k); m.setInputData("x", 1, 2, 3, 4, 5, 6, 7, 8); m.setInputData("y", 2, 3, 4, 5, 6, 7, 8, 9); m.setKernelCycles(8); m.runTest(); m.dumpOutput(); double expectedOutput[] = { 3, 5, 7, 9, 11, 13, 15, 17 }; m.checkOutputData("z", expectedOutput); m.logMsg("Test passed OK!"); }

example2HostSimBuilder.java Example No. 2 package ind.z2; import static config.BoardModel.BOARDMODEL; import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel; import com.maxeler.maxcompiler.v1.managers.standard.Manager; import com.maxeler.maxcompiler.v1.managers.standard.Manager.IOType; public class example2HostSimBuilder { public static void main(String[] args) { Manager m = new Manager(true,"example2HostSim", BOARDMODEL); Kernel k = new example2Kernel( m.makeKernelParameters("example2Kernel") ); m.setKernel(k); m.setIO(IOType.ALL_PCIE); m.build(); }

example2HWBuilder.java Example No. 2 package ind.z2; import static config.BoardModel.BOARDMODEL; import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel; import com.maxeler.maxcompiler.v1.managers.standard.Manager; import com.maxeler.maxcompiler.v1.managers.standard.Manager.IOType; public class example2HWBuilder { public static void main(String[] args) { Manager m = new Manager("example2", BOARDMODEL); Kernel k = new example2Kernel( m.makeKernelParameters() ); m.setKernel(k); m.setIO(IOType.ALL_PCIE); m.build(); }

example2HostCode.c 1/2 Example No. 2 #include <stdio.h> #include <stdlib.h> #include <MaxCompilerRT.h> int main(int argc, char* argv[]) { char *device_name = (argc==2 ? argv[1] : "/dev/maxeler0"); max_maxfile_t* maxfile; max_device_handle_t* device; float *data_in1, *data_in2, *data_out; unsigned long N, i; printf("Enter size of array: "); scanf("%lu",&N); data_in1 = malloc(N * sizeof(float)); data_in2 = malloc(N * sizeof(float)); data_out = malloc(N * sizeof(float)); for(i = 0; i < N; i++){ data_in1[i] = i%10; data_in2[i] = i%3; } printf("Opening and configuring FPGA.\n");

example2HostCode.c 2/2 Example No. 2 maxfile = max_maxfile_init_example2(); device = max_open_device(maxfile, device_name); max_set_terminate_on_error(device); printf("Streaming data to/from FPGA...\n"); max_run(device, max_input("x", data_in1, N * sizeof(float)), max_input("y", data_in2, N * sizeof(float)), max_output("z", data_out, N * sizeof(float)), max_runfor("example2Kernel", N), max_end()); printf("Checking data read from FPGA.\n"); for(i = 0; i < N; i++) if (data_out[i] != i%10 + i%3){ printf("Error on element %d. Expected %f, but found %f.", i, (float)(i%10+i%3), data_out[i]); break; } max_close_device(device); max_destroy(maxfile); return 0;

Example No. 3: Type Mixing Do the same as in the example no 2, with the following modification: one input array contains floating point numbers, and the other one contains integers.

Type Conversion Example No. 3 Casting here means moving data from one form to another, without changing their essence. Type is: specified for inputs and outputs, propagated from inputs, down the dataflow graph to outputs, used to check that output stream has correct type. If conversion is needed, explicit conversion (cast) is required How to do it? use the method cast in class HWVar, Additional hardware required (especially for conversion to or from floating point numbers), introduces additional latency. Cast between a floating point number and an integer number is done by rounding to the nearest integer!

example3Kernel.Java Example No. 3 package ind.z3; import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel; import com.maxeler.maxcompiler.v1.kernelcompiler.KernelParameters; import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWVar; public class example3Kernel extends Kernel { public example3Kernel(KernelParameters parameters) { super(parameters); // Input HWVar x = io.input("x", hwFloat(8,24)); HWVar y = io.input("y", hwInt(32)); HWVar result = x + y.cast(hwFloat(8,24)); // Output io.output("z", result, hwFloat(8,24)); }

example3SimRunner.java Example No. 3 package ind.z3; import com.maxeler.maxcompiler.v1.managers.standard.SimulationManager; public class example3SimRunner { public static void main(String[] args) { SimulationManager m = new SimulationManager("example3Sim"); example3Kernel k = new example3Kernel( m.makeKernelParameters() ); m.setKernel(k); m.setInputData("x", 1, 2, 3, 4, 5, 6, 7, 8); m.setInputData("y", 2, 3, 4, 5, 6, 7, 8, 9); m.setKernelCycles(8); m.runTest(); m.dumpOutput(); double expectedOutput[] = { 3, 5, 7, 9, 11, 13, 15, 17 }; m.checkOutputData("z", expectedOutput); m.logMsg("Test passed OK!"); }

example3HostCode.c 1/2 Example No. 3 #include <stdio.h> #include <stdlib.h> #include <MaxCompilerRT.h> int main(int argc, char* argv[]) { char *device_name = (argc==2 ? argv[1] : "/dev/maxeler0"); max_maxfile_t* maxfile; max_device_handle_t* device; float *data_in1, *data_out; int *data_in2; unsigned long N, i; printf("Enter size of array: "); scanf("%lu",&N); data_in1 = malloc(N * sizeof(float)); data_in2 = malloc(N * sizeof(int)); data_out = malloc(N * sizeof(float)); for(i = 0; i < N; i++){ data_in1[i] = i%10; data_in2[i] = i%3; } printf("Opening and configuring FPGA.\n");

example3HostCode.c 2/2 Example No. 3 maxfile = max_maxfile_init_example3(); device = max_open_device(maxfile, device_name); max_set_terminate_on_error(device); printf("Streaming data to/from FPGA...\n"); max_run(device, max_input("x", data_in1, N * sizeof(float)), max_input("y", data_in2, N * sizeof(int)), max_output("z", data_out, N * sizeof(float)), max_runfor("example3Kernel", N), max_end()); printf("Checking data read from FPGA.\n"); for(i = 0; i < N; i++){ if (data_out[i] != i%10 + i%3){ printf("Error on element %d. Expected %f, but found %f.", i, (float)(i%10+i%3), data_out[i]); break; } max_close_device(device); max_destroy(maxfile); return 0;

Generating Graph Command: maxRenderGraphs <build_dir> <build_dir> - directory where the design is compiled In the virtual machine, directory “Desktop/MaxCompiler-Builds” contains the build directories. Example for application “example2”: maxRenderGraphs example2HostSim Renders graphs for the resulting max file

Final Kernel Graph for Example No 2 Generating Graph Final Kernel Graph for Example No 2

Final Kernel Graph for Example No 3 Generating Graph Final Kernel Graph for Example No 3

Example No. 4: Addition of a Constant and a Vector Write a program that adds a constant to an array that contains floating point numbers. Program: reads the size of the array and the constant that will add to elements of the array, makes one array in an arbitrary way, and adds the constant to the array using the MAX card.

Example4Kernel.java Example No. 4 package ind.z4; import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel; import com.maxeler.maxcompiler.v1.kernelcompiler.KernelParameters; import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWVar; public class example4Kernel extends Kernel { public example4Kernel(KernelParameters parameters) { super(parameters); // Input HWVar x = io.input("x", hwFloat(8,24)); HWVar y = io.scalarInput("y", hwFloat(8,24)); HWVar result = x + y; // Output io.output("z", result, hwFloat(8,24)); }

Other Modifications in Example 4 Example No. 4 Other Modifications in Example 4 example4SimRunner.java: Before the kernel run, invoke: setScalarInput(“y”,2); example4HostCode.c: Read const from standard input, After the device is opened, but before run, set scalar inputs: max_set_scalar_input_f(device, “example4Kernel.y”, const_add, FPGA_A); max_upload_runtime_params(device, FPGA_A);

Example No. 5: Input/Output Control Do the same as in example no 4, with the following modification: use controlled inputs and counters.

example5Kernel.java Example No. 5 package ind.z5; import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel; import com.maxeler.maxcompiler.v1.kernelcompiler.KernelParameters; import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWVar; public class example5Kernel extends Kernel { public example5Kernel(KernelParameters parameters) { super(parameters); HWVar ie = control.count.simpleCounter(32); // Input HWVar x = io.input("x", hwFloat(8,24)); HWVar y = io.input("y", hwFloat(8,24), ie.eq(0)); HWVar result = x + y; // Output io.output("z", result, hwFloat(8,24)); }

example5SimRunner.java Example No. 5 package ind.z5; import com.maxeler.maxcompiler.v1.managers.standard.SimulationManager; public class example5SimRunner { public static void main(String[] args) { SimulationManager m = new SimulationManager("example5Sim"); example5Kernel k = new example5Kernel( m.makeKernelParameters() ); m.setKernel(k); m.setInputData("x", 1, 2, 3, 4, 5, 6, 7, 8); m.setInputData("y", 2); m.setKernelCycles(8); m.runTest(); m.dumpOutput(); double expectedOutput[] = { 3, 4, 5, 6, 7, 8, 9, 10 }; m.checkOutputData("z", expectedOutput); m.logMsg("Test passed OK!"); }

example5HostCode.c 1/2 Example No. 5 #include <stdio.h> #include <stdlib.h> #include <MaxCompilerRT.h> int main(int argc, char* argv[]) { char *device_name = (argc==2 ? argv[1] : "/dev/maxeler0"); max_maxfile_t* maxfile; max_device_handle_t* device; float *data_in1, data_in2[2], *data_out; unsigned long N, i; printf("Enter size of array: "); scanf("%lu%f",&N, data_in2); data_in1 = malloc(N * sizeof(float)); data_out = malloc(N * sizeof(float)); for(i = 0; i < N; i++) data_in1[i] = i%10; printf("Opening and configuring FPGA.\n"); maxfile = max_maxfile_init_example5(); device = max_open_device(maxfile, device_name); max_set_terminate_on_error(device);

example5HostCode.c 1/2 Example No. 5 printf("Streaming data to/from FPGA...\n"); max_run(device, max_input("x", data_in1, N * sizeof(float)), max_input("y", data_in2, 2 * sizeof(float)), max_output("z", data_out, N * sizeof(float)), max_runfor("example5Kernel", N), max_end()); printf("Checking data read from FPGA.\n"); for(i = 0; i < N; i++){ if (data_out[i] != i%10 + data_in2[0]){ printf("Error on element %d. Expected %f, but found %f.", i, (float)(i%10+data_in2[0]), data_out[i]); break; } max_close_device(device); max_destroy(maxfile); return 0;

Example No. 6: Conditional Execution Translate the following part of code for the Maxeler MAX2 card: for(int i=0; i<N; i++) if(a[i] != b[i]){ c[i] = b[i]-a[i]; d[i] = a[i]*b[i]/c[i]; }else { c[i] = a[i]; d[i] = a[i]+b[i]; }

example6Kernel.java Example No. 6 package ind.z6; import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel; import com.maxeler.maxcompiler.v1.kernelcompiler.KernelParameters; import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWVar; public class example6Kernel extends Kernel { public example6Kernel(KernelParameters parameters) { super(parameters); // Input HWVar a = io.input("a", hwFloat(8,24)); HWVar b = io.input("b", hwFloat(8,24)); HWVar c = ~a.eq(b)?b-a:a; HWVar d = ~a.eq(b)?a*b/c:a+b; // Output io.output("c", c, hwFloat(8,24)); io.output("d", d, hwFloat(8,24)); }

example6SimRunner.java Example No. 6 package ind.z6; import com.maxeler.maxcompiler.v1.managers.standard.SimulationManager; public class example6SimRunner { public static void main(String[] args) { SimulationManager m = new SimulationManager("example6Sim"); example6Kernel k = new example6Kernel( m.makeKernelParameters() ); m.setKernel(k); m.setInputData("a", 1, 3); m.setInputData("b", 2, 3); m.setKernelCycles(2); m.runTest(); m.dumpOutput(); double expectedOutputc[] = { 1, 3 }; double expectedOutputd[] = { 2, 6 }; m.checkOutputData("c", expectedOutputc); m.checkOutputData("d", expectedOutputd); m.logMsg("Test passed OK!"); }

Example No. 7: Moving Average 1D Write a program that calculates moving average over an array, calculating the average value for each one of the three successive elements of the input array. (a[0]+a[1])/2 , for i = 0; avg[i] = (a[i-1]+a[i]+a[i+1])/3 , for 0 < i < n-1; (a[n-2]+a[n-3], for i = n-1.

example7Kernel.java Example No. 7 package ind.z7; import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel; import com.maxeler.maxcompiler.v1.kernelcompiler.KernelParameters; import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWVar; public class example7Kernel extends Kernel { public example7Kernel(KernelParameters parameters) { super(parameters); HWVar N = io.scalarInput("N", hwUInt(64)); HWVar count = control.count.simpleCounter(64); // Input HWVar x = io.input("x", hwFloat(8,24)); HWVar result = ( (count>0?stream.offset(x,-1):0) + x + (count<N-1?stream.offset(x,1):0) )/ (count>0&count<N-1? constant.var(hwFloat(8,24),3):2); // Output io.output("z", result, hwFloat(8,24)); }

Example No. 8: Moving Average 2D Write a program that calculates moving average along a 2D matrix of the size MxN. Transfer the matrix to the MAX2 card through one stream, row by row.

example8Kernel.java 1/2 Example No. 8 package ind.z8; import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel; import com.maxeler.maxcompiler.v1.kernelcompiler.KernelParameters; import com.maxeler.maxcompiler.v1.kernelcompiler.stdlib.core.CounterChain; import com.maxeler.maxcompiler.v1.kernelcompiler.stdlib.core.Stream.OffsetExpr; import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWVar; public class example8Kernel extends Kernel { public example8Kernel(KernelParameters parameters) { super(parameters); HWVar M = io.scalarInput("M", hwUInt(32)); OffsetExpr Nof = stream.makeOffsetParam("Nof", 3, 128); HWVar N = io.scalarInput("N", hwUInt(32)); CounterChain cc = control.count.makeCounterChain(); HWVar j = cc.addCounter(M,1); HWVar i = cc.addCounter(N,1);

example8Kernel.java 2/2 Example No. 8 // Input HWVar mat = io.input("mat", hwFloat(8,24)); // Extract 8 point window around current point HWVar window[] = new HWVar[9]; int ii = 0; for ( int x=-1; x<=1; x++) for ( int y= -1; y<=1; y++) window[ii++] = (i.cast(hwInt(33))+x>=0 & i.cast(hwInt(33))+x<= N.cast(hwInt(33))-1 & j.cast(hwInt(33))+y >= 0 & j.cast(hwInt(33))+y<=M.cast(hwInt(33))-1)?stream.offset(mat, y*Nof+x):0; // Sum points in window and divide by 9 to average HWVar sum = constant.var(hwFloat(8, 24), 0); for ( HWVar hwVar : window) { sum = sum + hwVar; } HWVar divider = i.eq(0)|i.eq(N-1)|j.eq(0)|j.eq(M-1)?((i.eq(0)|i.eq(N-1))&(j.eq(0)|j.eq(M- 1))?constant.var(hwFloat(8,24),4):6):9; HWVar result = sum / divider; // Output io.output("z", result, hwFloat(8,24));

example8SimRunner.java Example No. 8 package ind.z8; import com.maxeler.maxcompiler.v1.managers.standard.SimulationManager; public class example8SimRunner { public static void main(String[] args) { SimulationManager m = new SimulationManager("example8Sim"); example8Kernel k = new example8Kernel( m.makeKernelParameters() ); m.setKernel(k); m.setInputData("mat", 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,11,12, 13,14,15,16); m.setScalarInput("M", 4); m.setScalarInput("N", 4); m.setStreamOffsetParam("Nof",4); m.setKernelCycles(16); m.runTest(); m.dumpOutput(); double expectedOutput[] = { 3.5, 4, 5, 5.5, 5.5, 6, 7, 7.5, 9.5, 10, 11, 11.5, 11.5, 12, 13, 13.5 }; m.checkOutputData("z", expectedOutput); m.logMsg("Test passed OK!"); }

example8HostCode.java 1/2 Example No. 8 example8HostCode.java 1/2 #include <stdio.h> #include <stdlib.h> #include <MaxCompilerRT.h> int main(int argc, char* argv[]) { char *device_name = (argc==2 ? argv[1] : "/dev/maxeler0"); max_maxfile_t* maxfile; max_device_handle_t* device; float *data_in1, *data_in2, *data_out; unsigned long M, N, i; printf("Enter size of matrix (MxN, max 1024x1024): "); scanf("%lu%lu",&M,&N); data_in1 = malloc(M*N * sizeof(float)); data_out = malloc(M*N * sizeof(float)); for(i = 0; i < M*N; i++){ data_in1[i] = i%10; } printf("Opening and configuring FPGA.\n"); maxfile = max_maxfile_init_example8(); device = max_open_device(maxfile, device_name); max_set_terminate_on_error(device);

example8HostCode.java 2/2 Example No. 8 example8HostCode.java 2/2 max_set_scalar_input_f(device, "example8Kernel.M", M, FPGA_A); max_set_scalar_input_f(device, "example8Kernel.N", N, FPGA_A); max_set_runtime_param(device, "example8Kernel.Nof", N); max_upload_runtime_params(device, FPGA_A); printf("Streaming data to/from FPGA...\n"); max_run(device, max_input("mat", data_in1, M*N * sizeof(float)), max_output("z", data_out, M*N * sizeof(float)), max_runfor("example8Kernel", M*N), max_end()); printf("Checking data read from FPGA.\n"); for(i = 0; i < M*N; i++){ float expected=0, divider = 9; for (int ii = -1; ii<2; ii++) for(int jj = -1; jj<2; jj++) expected += i/N+ii>=0 && i/N+ii<M && i%N+jj>=0 && i%N+jj<N ?data_in1[i+ii*N+jj]:0; if (i/N==0 || i/N==M-1) divider = 6; if (i%N==0 || i%N==N-1) divider = divider == 6? 4:6; expected /= divider; if (data_out[i] != expected){ printf("Error on element %d. Expected %f, but found %f.", i, expected, data_out[i]); break; } } max_close_device(device); max_destroy(maxfile); return 0;

Example No. 9: Array summation Write a program that calculates the sum of n floating point numbers.

example9Kernel.java, try #1 Example No. 9 example9Kernel.java, try #1 package ind.z9; import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel; import com.maxeler.maxcompiler.v1.kernelcompiler.KernelParameters; import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWVar; import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWType; public class example9Kernel extends Kernel { public example9Kernel(KernelParameters parameters) { super(parameters); final HWType scalarType = hwFloat(8,24); HWVar cnt = control.count.simpleCounter(64); // Input HWVar x = io.input("x", hwFloat(8,24)); HWVar sum = scalarType.newInstance(this); HWVar result = x + (cnt>0?sum:0.0); sum <== stream.offset(result, -1); // Output io.output("z", result, hwFloat(8,24)); } Problem?

Graph of Dataflow for Summation Example No. 9 Graph of Dataflow for Summation

example9Kernel.java #2 Example No. 9 Solution: New offset = package ind.z9; import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel; import com.maxeler.maxcompiler.v1.kernelcompiler.KernelParameters; import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWVar; import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWType; public class example9Kernel extends Kernel { public example9Kernel(KernelParameters parameters) { super(parameters); final HWType scalarType = hwFloat(8,24); HWVar cnt = control.count.simpleCounter(64); // Input HWVar x = io.input("x", hwFloat(8,24)); HWVar sum = scalarType.newInstance(this); HWVar result = x + (cnt>12?sum:0.0); sum <== stream.offset(result, -13); // Output io.output("z", result, hwFloat(8,24)); } Solution: New offset = Depth of pipeline loop

example9SimRunner.java #2 Example No. 9 example9SimRunner.java #2 Still, we need to send 13 times mor data then needed package ind.z9; import com.maxeler.maxcompiler.v1.managers.standard.SimulationManager; public class example9SimRunner { public static void main(String[] args) { SimulationManager m = new SimulationManager("example9Sim"); example9Kernel k = new example9Kernel( m.makeKernelParameters() ); m.setKernel(k); m.setInputData("x", 1, 0, 0, 0, 3 , 0, 0, 0, 9 , 0, 0, 0, 0, 2 , 0, 0, 0, 3 , 0, 0, 0, 3 , 0, 0, 0, 0, 3); m.setKernelCycles(27); m.runTest(); m.dumpOutput(); double expectedOutput[] = { 1, 3, 6 }; m.checkOutputData("z", expectedOutput); m.logMsg("Test passed OK!"); } 12 unnecessarily data 12 unnecessarily data

example9HostCode.c 1/2 #2 Example No. 9 #include <stdio.h> #include <stdlib.h> #include <MaxCompilerRT.h> int main(int argc, char* argv[]) { char *device_name = (argc==2 ? argv[1] : "/dev/maxeler0"); max_maxfile_t* maxfile; max_device_handle_t* device; float *data_in1, *data_out, expected = 0; unsigned long N, i; printf("Enter size of array: "); scanf("%lu",&N); data_in1 = malloc(N * 13 * sizeof(float)); data_out = malloc(N * 13 * sizeof(float)); for(i = 0; i < N; i++) for( int j=0; j<13; j++) data_in1[13*i+j] = i%10; printf("Opening and configuring FPGA.\n");

example9HostCode.c 2/2 #2 Example No. 9 maxfile = max_maxfile_init_example9(); device = max_open_device(maxfile, device_name); max_set_terminate_on_error(device); printf("Streaming data to/from FPGA...\n"); max_run(device, max_input("x", data_in1, N * 13 * sizeof(float)), max_output("z", data_out, N * 13* sizeof(float)), max_runfor("example9Kernel", N * 13), max_end()); printf("Checking data read from FPGA.\n"); for(i = 0; i < N; i++){ expected += !(i%13) ? i%10 : 0; if (data_out[i] != expected){ printf("Error on element %d. Expected %f, but found %f.", i, expected, data_out[i]); break; } max_close_device(device); max_destroy(maxfile); return 0;

example9Kernel.java #3 Example No. 9 package ind.z9; import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel; import com.maxeler.maxcompiler.v1.kernelcompiler.KernelParameters; import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWVar; import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWType; import com.maxeler.maxcompiler.v1.kernelcompiler.stdlib.core.CounterChain; public class example9Kernel extends Kernel { public example9Kernel(KernelParameters parameters) { super(parameters); final HWType scalarType = hwFloat(8,24); CounterChain cc = control.count.makeCounterChain(); HWVar cnt = cc.addCounter(1000000,1); HWVar depth = cc.addCounter(13,1); // Input HWVar x = io.input("x", hwFloat(8,24), depth.eq(0) ); HWVar sum = scalarType.newInstance(this); HWVar result = x + (cnt>0?sum:0.0); sum <== stream.offset(result, -13); // Output io.output("z", result, hwFloat(8,24), depth.eq(0)); }

example9SimRunner.java #3 Example No. 9 example9SimRunner.java #3 package ind.z9; import com.maxeler.maxcompiler.v1.managers.standard.SimulationManager; public class example9SimRunner { public static void main(String[] args) { SimulationManager m = new SimulationManager("example9Sim"); example9Kernel k = new example9Kernel( m.makeKernelParameters() ); m.setKernel(k); m.setInputData("x", 1, 2 , 3); m.setKernelCycles(27); m.runTest(); m.dumpOutput(); double expectedOutput[] = { 1, 3, 6 }; m.checkOutputData("z", expectedOutput); m.logMsg("Test passed OK!"); } We still need at least 27 cycles.

example9HostCode.c 1/2 #3 Example No. 9 #include <stdio.h> #include <stdlib.h> #include <MaxCompilerRT.h> int main(int argc, char* argv[]) { char *device_name = (argc==2 ? argv[1] : "/dev/maxeler0"); max_maxfile_t* maxfile; max_device_handle_t* device; float *data_in1, *data_out, expected = 0; unsigned long N, i; printf("Enter size of array: "); scanf("%lu",&N); data_in1 = malloc(N * sizeof(float)); data_out = malloc(N * sizeof(float)); for(i = 0; i < N; i++) data_in1[i] = i%10; printf("Opening and configuring FPGA.\n");

example9HostCode.c 2/2 #3 Example No. 9 maxfile = max_maxfile_init_example9(); device = max_open_device(maxfile, device_name); max_set_terminate_on_error(device); printf("Streaming data to/from FPGA...\n"); max_run(device, max_input("x", data_in1, N * sizeof(float)), max_output("z", data_out, N * sizeof(float)), max_runfor("example9Kernel", N * 13 - 12), max_end()); printf("Checking data read from FPGA.\n"); for(i = 0; i < N; i++){ expected += i%10; if (data_out[i] != expected){ printf("Error on element %d. Expected %f, but found %f.", i, expected, data_out[i]); break; } max_close_device(device); max_destroy(maxfile); return 0;

Example No. 10: Optimized Array Summation Write an optimized program that calculates the sum of numbers in an input array First, calculate several parallel/partial sums; then, add them at the end

example10Kernel1.java Example No. 10 package ind.z10; import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel; import com.maxeler.maxcompiler.v1.kernelcompiler.KernelParameters; import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWVar; import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWType; public class example10Kernel1 extends Kernel { public example10Kernel1(KernelParameters parameters) { super(parameters); final HWType scalarType = hwFloat(8,24); HWVar cnt = control.count.simpleCounter(64); // Input HWVar N = io.scalarInput("N", hwUInt(64)); HWVar x = io.input("x", hwFloat(8,24) ); HWVar sum = scalarType.newInstance(this); HWVar result = x + (cnt>0?sum:0.0); sum <== stream.offset(result, -13); // Output io.output("z", result, hwFloat(8,24), cnt > N-14); }

example10Kernel2.java Example No. 10 package ind.z10; import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel; import com.maxeler.maxcompiler.v1.kernelcompiler.KernelParameters; import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWVar; import com.maxeler.maxcompiler.v1.kernelcompiler.types.base.HWType; import com.maxeler.maxcompiler.v1.kernelcompiler.stdlib.core.CounterChain; public class example10Kernel2 extends Kernel { public example10Kernel2(KernelParameters parameters) { super(parameters); final HWType scalarType = hwFloat(8,24); CounterChain cc = control.count.makeCounterChain(); HWVar cnt = cc.addCounter(14,1); HWVar depth = cc.addCounter(13,1); // Input HWVar x = io.input("x", hwFloat(8,24), depth.eq(0) ); HWVar sum = scalarType.newInstance(this); HWVar result = x + (cnt>0?sum:0.0); sum <== stream.offset(result, -13); // Output io.output("z", result, hwFloat(8,24), cnt.eq(12)); }

example10SimRunner.java Example No. 10 package ind.z10; import com.maxeler.maxcompiler.v1.managers.standard.SimulationManager; public class example10SimRunner { public static void main(String[] args) { SimulationManager m = new SimulationManager("example10Sim"); example10Kernel1 k = new example10Kernel1( m.makeKernelParameters() ); m.setKernel(k); m.setInputData("x", 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26); m.setKernelCycles(26); m.runTest(); m.dumpOutput(); double exOutput[] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39 }; m.checkOutputData("z", exOutput); m.logMsg("Test passed OK!"); }

example10Manager.java Example No. 10 package ind.z10; import com.maxeler.maxcompiler.v1.managers.custom.blocks.KernelBlock; import com.maxeler.maxcompiler.v1.managers.custom.CustomManager; import com.maxeler.maxcompiler.v1.managers.MAXBoardModel; class example10Manager extends CustomManager { public example10Manager(boolean is_simulation, String name, MAXBoardModel board_model ){ super(is_simulation, board_model, name); KernelBlock kb1 = addKernel(new example10Kernel1(makeKernelParameters("example10Kernel1"))); KernelBlock kb2 = addKernel(new example10Kernel2(makeKernelParameters("example10Kernel2"))); kb1.getInput("x") <== addStreamFromHost("x"); kb2.getInput("x") <== kb1.getOutput("z"); addStreamToHost("z") <== kb2.getOutput("z"); }

example10HostSimBuilder.java Example No. 10 package ind.z10; import static config.BoardModel.BOARDMODEL; import com.maxeler.maxcompiler.v1.managers.BuildConfig; import com.maxeler.maxcompiler.v1.managers.BuildConfig.Level; public class example10HostSimBuilder { public static void main(String[] args) { example10Manager m = new example10Manager(true,"example10HostSim", BOARDMODEL); m.setBuildConfig(new BuildConfig(Level.FULL_BUILD)); m.build(); }

example10HWBuilder.java Example No. 10 package ind.z10; import static config.BoardModel.BOARDMODEL; import com.maxeler.maxcompiler.v1.kernelcompiler.Kernel; import com.maxeler.maxcompiler.v1.managers.standard.Manager; import com.maxeler.maxcompiler.v1.managers.standard.Manager.IOType; public class example10HWBuilder { public static void main(String[] args) { example10Manager m = new example10Manager(false,"example10HostSim", BOARDMODEL); m.setBuildConfig(new BuildConfig(Level.FULL_BUILD)); m.build(); }

example10HostCode.c 1/2 Example No. 10 #include <stdio.h> #include <stdlib.h> #include <MaxCompilerRT.h> int main(int argc, char* argv[]) { char *device_name = (argc==2 ? argv[1] : "/dev/maxeler0"); max_maxfile_t* maxfile; max_device_handle_t* device; float *data_in1, *data_out, expected = 0; unsigned long N, i; printf("Enter size of array (it will be truncated to the firs lower number dividable with 13): "); scanf("%lu",&N); N /= 13; N *= 13; data_in1 = malloc(N * sizeof(float)); data_out = malloc(1 * sizeof(float)); for(i = 0; i < N; i++){ data_in1[i] = i%10; expected += data_in1[i]; }

example10HostCode.c 2/2 Example No. 10 printf("Opening and configuring FPGA.\n"); maxfile = max_maxfile_init_example10(); device = max_open_device(maxfile, device_name); max_set_terminate_on_error(device); max_set_scalar_input_f(device, "example10Kernel1.N", N, FPGA_A); max_upload_runtime_params(device, FPGA_A); printf("Streaming data to/from FPGA...\n"); max_run(device, max_input("x", data_in1, N * sizeof(float)), max_output("z", data_out, 2 * sizeof(float)), max_runfor("example10Kernel1", N), max_runfor("example10Kernel2", 13*12+2), max_end()); printf("Checking data read from FPGA.\n"); printf("Expected: %f, returned: %f\n", expected, *data_out); max_close_device(device); max_destroy(maxfile); return 0; }