Why you should make an emulator… REVISION 2015.

1 Why you should make an emulator… REVISION 2015

2 Aims
Make you think about “emulation” in a broader way
Convince you that writing an emulator is fun (and you’ll learn useful skills)
Highlight a few tricky areas
Give a lightning introduction to FPGAs

3 About me
First computer 1983 – Dragon 32
Learning how to adapt type-ins for other systems to the Dragon
Gradually co-opted my Dad’s Amstrad CPC 464 between 1985 and 1986
Spectrum loader type-in -> learning how to adapt Spectrum save routine to CPC
RM Nimbus PC with “IBM emulator” at school from 1987
Bought an Amiga around 1989
University in 1994, discovered UNIX and “source compatibility”
Working life – half “business things” and half “games development”
Experience on many varied types of machines and CPUs
Oh, and all views and opinions are my own!

4 Classic arcade games

5 Nostalgia?

6 Definition
em·u·late (ĕm′yə-lāt′)
From Latin aemulātiō (“strive”) = aemulor (“I rival, emulate”) + -ātiō (“-ation”).
1. To strive to equal or excel; imitate with effort to equal or surpass.
2. To compete with successfully; approach or attain equality with.
3. Computers
◦to imitate the function of (another system), by using a software system, often including a microprogram or another computer that enables it to do the same work and run the same programs, and achieve the same results.
◦to replace (software) with hardware to perform the same task.

7 Related terms
Retro gaming
Compatible
◦(of software) capable of being run on another computer without change.
◦(of hardware) capable of being connected to another device without the use of special equipment or software.
Legacy
◦of or relating to old or outdated computer hardware, software, or data that, while still functional, does not work well with up-to-date systems.
Reverse engineering

8 What’s the legacy of these systems?

9 Why should we emulate?
Compatibility with the previous generation allows easy transition
Ready-made market and existing customer momentum
Doesn’t have to be an all-or-nothing transition
People can continue to use their existing software, but can do more with it
More memory, faster execution, etc.
New features will encourage more sales

10 Why should we emulate?
Hardware used to be considered hard and expensive
◦Billions of dollars to set up a chip fab
◦Slow process, mistakes are costly to rectify
◦Despite the cost of development, older hardware becomes obsolete rapidly
Software by comparison was traditionally considered “easy”
◦Barrier to entry for software is low – just a compiler or assembler
Result: lots of software, and people want to use older, unmaintained software
Is this true today?
◦Modern software complexity is measured in MLOC, with hundreds to thousands of developers on a project
◦Modern CPUs have ~7bn transistors, but many repeating structures
◦How many CPU designers are there in the world compared to the number of software developers?

11 Why should we emulate?
Systems that don’t get emulated are expensive to maintain
Ultimately all of this software becomes unusable or impractical due to scarcity of hardware
Reasons not to be emulated
◦Why buy your product if another does everything yours does and more?
◦Regional lock-in, copy protection, etc…

12 Why me? Why should I make an emulator?

13 Understanding the machine
Great way of getting to know a new machine
Will be able to fully appreciate the instruction set of the CPU – often applicable to other CPUs
Better appreciation for compiler generated code and how to optimise code
State machines might well be the simplest solution
Really understand the entire machine

14 Much to learn you still have…

15 Following a design
Building a project to someone else’s design
In some cases, trying to figure out what their design was
Learning to understand the problem from their perspective and why they chose their solutions
You’ll probably learn a new approach that helps you think about other problems in a new way
If you’re a software guy, a hardware perspective will definitely give you new insights!
A combination of detective work and research

16 Test-driven development
Emulators are great candidates for test-driven development
Great to log everything at first
◦Disassemble each instruction as it’s executed along with relevant registers
◦Great for debugging and seeing what changes between runs
◦Can compare against known results, e.g. NESTEST for a 6502 core
◦For speed, you’ll probably want to be able to turn this off selectively
Unit tests for subsystems, they can probably all be used independently and incrementally
Try to test everything you’ve implemented!
You can throw away parts as you realise a better way, so it can grow with your knowledge
You probably already have programs you can use to test your progress so you can be confident when you’ve hit your goal
Write test programs and compare against real hardware whenever possible
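The "log everything and compare against known results" idea can be sketched as below. This is a minimal illustration, not the presenter's code: `CpuState`, `format_trace` and `first_mismatch` are hypothetical names, and the line layout is only loosely modelled on nestest-style logs, so adjust it to whatever reference log you actually have.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical snapshot of the 6502 registers, taken before each instruction.
struct CpuState {
    uint16_t pc;
    uint8_t a, x, y, p, sp;
};

// Format one trace line. The layout is an assumption for this sketch.
std::string format_trace(const CpuState &s) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%04X A:%02X X:%02X Y:%02X P:%02X SP:%02X",
                  static_cast<unsigned>(s.pc), static_cast<unsigned>(s.a),
                  static_cast<unsigned>(s.x), static_cast<unsigned>(s.y),
                  static_cast<unsigned>(s.p), static_cast<unsigned>(s.sp));
    return buf;
}

// Index of the first differing line within the common length, or -1 if none.
int first_mismatch(const std::vector<std::string> &ours,
                   const std::vector<std::string> &reference) {
    const std::size_t n = std::min(ours.size(), reference.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (ours[i] != reference[i]) return static_cast<int>(i);
    }
    return -1;
}
```

Collecting one line per executed instruction and calling `first_mismatch` against a reference log points straight at the first instruction that diverges, which is usually far more useful than noticing a crash thousands of instructions later.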

17 Optimisation
For a software emulator, it’s always worth starting with un-optimised code
Quickly get the system up and running, don’t be afraid to throw away code!
Especially with your unit tests, you can optimise later with confidence that it still works
Might be best not to implement things until they’re needed… or until you have a test case ready
The target system is probably deterministic from a known set of conditions
Different approaches
◦Parsing bit patterns in CPU instructions – slowest but less code and closer to hardware implementation
◦Switch block with a case for each instruction – long and possibly unwieldy; C/C++ macros can simplify this
◦Function per instruction – as specialised as required
Heat maps
Dynamic code generation
Memory map markup – read/written flags, PC when modified, data breakpoints, etc.
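As an illustration of the "function per instruction" approach, here is a minimal sketch using a 256-entry table of function pointers. The `Cpu` struct, the handler names and `make_table` are hypothetical, status flags are left out for brevity, and only three real 6502 opcodes (LDA immediate $A9, INX $E8, NOP $EA, two cycles each) are filled in.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, heavily cut-down 6502 state (no flags, no stack).
struct Cpu {
    uint16_t pc = 0;
    uint8_t a = 0, x = 0;
    bool halted = false;
    std::vector<uint8_t> memory = std::vector<uint8_t>(0x10000, 0);
    uint8_t fetch() { return memory[pc++]; }
};

// Each handler executes one instruction and returns the cycles it took.
using Handler = int (*)(Cpu &);

int op_unimplemented(Cpu &cpu) { cpu.halted = true; return 0; }
int op_nop(Cpu &) { return 2; }
int op_inx(Cpu &cpu) { ++cpu.x; return 2; }
int op_lda_imm(Cpu &cpu) { cpu.a = cpu.fetch(); return 2; }

// Build the opcode table; anything not yet written halts the CPU so that
// missing instructions are noticed immediately.
std::array<Handler, 256> make_table() {
    std::array<Handler, 256> table;
    table.fill(op_unimplemented);
    table[0xEA] = op_nop;
    table[0xE8] = op_inx;
    table[0xA9] = op_lda_imm;
    return table;
}

// Fetch one opcode and dispatch through the table.
int step(Cpu &cpu, const std::array<Handler, 256> &table) {
    const uint8_t opcode = cpu.fetch();
    return table[opcode](cpu);
}
```

Defaulting every slot to a halting handler fits the "don't implement things until they're needed" advice: the first program that uses a missing opcode tells you which one to write next.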

18 How to start making an emulator

19 Components of a typical system
CPU
RAM
ROM
Video output, possibly GPU
Audio
Storage
Other IO
Glue logic – 74 series chips, ULAs, PLDs, FPGAs, ASICs.

20 Need to understand your system Google it!

21 Need to understand your system


23 Datasheets are full of… data!


25 Start with the CPU

26 Decapping

27 So, now you understand what you’re emulating…

28 Decide on your goal
As fast as possible?
◦Maybe you just want a serial-based CP/M system…
◦Or only care about certain pieces of software or hardware
Exact CPU timings?
◦Each chip will usually be synchronised in hardware
◦Games and demos will usually require more accuracy
◦Most things are driven by the CPU, but it needs to be able to respond to interrupts in a timely manner…
◦If the target system is slow enough, you can interleave CPU cycles and other hardware; that requires state in each system… just like real hardware!
Increased capabilities and optional add-ons?
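Interleaving CPU cycles with the other hardware might look like the sketch below, where every component keeps its own state and is ticked once per master cycle. Everything here is hypothetical: `VideoTimer` stands in for whatever raises interrupts on your system, and `StubCpu` pretends every instruction takes four cycles and only accepts interrupts on instruction boundaries.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical video timer: requests an interrupt every `period` cycles.
struct VideoTimer {
    uint32_t period;
    uint32_t counter = 0;
    bool irq_pending = false;
    explicit VideoTimer(uint32_t p) : period(p) {}
    void tick() {
        if (++counter == period) {
            counter = 0;
            irq_pending = true;
        }
    }
};

// Stand-in CPU: every instruction lasts 4 cycles, and a pending interrupt
// is only accepted at the start of an instruction.
struct StubCpu {
    int cycles_left = 0;            // cycles remaining in current instruction
    uint32_t instructions = 0;
    uint32_t interrupts_taken = 0;
    void tick(VideoTimer &video) {
        if (cycles_left == 0) {
            if (video.irq_pending) {
                video.irq_pending = false;
                ++interrupts_taken;
            }
            ++instructions;
            cycles_left = 4;
        }
        --cycles_left;
    }
};

// One iteration is one master clock cycle: CPU first, then video.
void run_cycles(StubCpu &cpu, VideoTimer &video, uint32_t n) {
    for (uint32_t i = 0; i < n; ++i) {
        cpu.tick(video);
        video.tick();
    }
}
```

The per-component counters (`cycles_left`, `counter`) are the "state in each system" the slide mentions. Ticking one cycle at a time is slow but makes interrupt latency fall out naturally.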

29 Start with the CPU
Do you need to consider the instruction pipeline?
◦IF – ID – EX – MEM – WB is typical, but some pipelines are deeper (INSTRUCTION FETCH – INSTRUCTION DECODE – EXECUTE – MEMORY ACCESS – REGISTER WRITEBACK)
Some architectures have latency before registers are updated
◦Delay cycles / stall
◦Old results returned
Branch delay slots
◦SPARC, MIPS
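One common way to model a branch delay slot without emulating the whole pipeline is to keep both `pc` and `next_pc`, so a taken branch only changes what runs after the following instruction. The sketch below uses a made-up two-instruction machine (`Op::AddOne`, `Op::Jump`) with word-indexed addresses; it is an assumption-laden illustration, not MIPS or SPARC decoding.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical instruction set, just enough to show the delay slot.
enum class Op { AddOne, Jump };
struct Instr {
    Op op;
    uint32_t target;  // only used by Jump; a word index, not a byte address
};

struct Cpu {
    uint32_t pc = 0;       // instruction executed on this step
    uint32_t next_pc = 1;  // instruction executed on the following step
    int counter = 0;
    std::vector<uint32_t> executed;  // trace of executed addresses

    void step(const std::vector<Instr> &program) {
        assert(pc < program.size());
        const Instr instr = program[pc];
        executed.push_back(pc);
        // Advance first: the instruction after a branch is already committed.
        pc = next_pc;
        next_pc = pc + 1;
        switch (instr.op) {
        case Op::AddOne:
            ++counter;
            break;
        case Op::Jump:
            next_pc = instr.target;  // takes effect after the delay slot
            break;
        }
    }
};
```

With a jump at address 0 targeting address 4, the trace comes out as 0, 1, 4: the instruction in the slot at address 1 still runs before control reaches the target.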

30 Memory access
Most systems have memory maps split into distinct regions, i.e. they use the higher address bits for region decode, e.g. 4 x 16KB banks on Amstrad
Many systems (especially RISC and 6502) use memory-mapped IO, e.g. C64: $D000-$DFFF
Consider masking off these region bits and using them as lookups into a table of read/write functions, e.g. page size of 4KB on many 32-bit systems.
    // example with 256-byte pages on a 6502:
    class MemoryHandler {
    public:
        virtual ~MemoryHandler() = default;
        virtual uint8_t read_byte(uint16_t addr) = 0;
        virtual void write_byte(uint16_t addr, uint8_t data) = 0;
    };
    MemoryHandler *memory_handlers[0x100]; // one handler per 256-byte page, indexed by addr >> 8
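The handler table from the slide could be wired up roughly as follows. `MemoryHandler` and `memory_handlers` come from the slide; `Ram`, `DummyIo`, `map_pages` and the free `read_byte`/`write_byte` functions are hypothetical additions for this sketch, and the $D000-$DFFF range is simply the C64 IO example quoted above.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

class MemoryHandler {
public:
    virtual ~MemoryHandler() = default;
    virtual uint8_t read_byte(uint16_t addr) = 0;
    virtual void write_byte(uint16_t addr, uint8_t data) = 0;
};

// Plain RAM backing the whole 64KB address space.
class Ram : public MemoryHandler {
public:
    uint8_t read_byte(uint16_t addr) override { return bytes_[addr]; }
    void write_byte(uint16_t addr, uint8_t data) override { bytes_[addr] = data; }
private:
    std::array<uint8_t, 0x10000> bytes_{};
};

// Hypothetical IO block: remembers the last write, always reads back 0xFF.
class DummyIo : public MemoryHandler {
public:
    uint8_t last_write = 0;
    uint8_t read_byte(uint16_t) override { return 0xFF; }
    void write_byte(uint16_t, uint8_t data) override { last_write = data; }
};

MemoryHandler *memory_handlers[0x100];  // one entry per 256-byte page

// Point an inclusive range of pages at one handler.
void map_pages(int first_page, int last_page, MemoryHandler *handler) {
    assert(first_page >= 0 && last_page <= 0xFF && first_page <= last_page);
    for (int page = first_page; page <= last_page; ++page) {
        memory_handlers[page] = handler;
    }
}

// The top 8 address bits select the page; the handler sees the full address.
uint8_t read_byte(uint16_t addr) {
    return memory_handlers[addr >> 8]->read_byte(addr);
}

void write_byte(uint16_t addr, uint8_t data) {
    memory_handlers[addr >> 8]->write_byte(addr, data);
}
```

Mapping RAM over every page first and then overlaying the IO pages keeps the CPU core free of address comparisons: it only ever calls `read_byte` and `write_byte`.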

31 Example 6502 memory maps: NES, C64

32 Video output
Very simple fixed layout, e.g. Spectrum
Frame at a time?
Rendering in parallel to and synchronised with CPU instructions – e.g. loading borders
Racing the beam? Essential to have accurate cycle counting
Programmable hardware, e.g. in CPC, IBM, BBC – can even change width of a line!
Tiled video memory, e.g. NES, C64, IBM text modes
Memory contention – very common between CPU and video hardware
◦Usually resolved with a CPU stall or video snow

33 Video output
What frame rate are you running at?
◦Is that the same as the emulated system?
◦Change speed to match -> audio will change speed
◦59.94Hz of most NTSC systems cf. 60Hz of most monitors – speed up probably wouldn’t be noticed
◦50Hz PAL systems on a 60Hz display – a new frame on only 5 of every 6 refreshes (juddery), or blend frames
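The 50Hz-on-60Hz case can be worked through with a small accumulator that decides, for each display refresh, whether the emulated machine should advance a frame. `FramePacer` is a hypothetical helper for this sketch, and it only covers the "repeat a frame" option, not blending.

```cpp
#include <cassert>

// Decides on each display refresh whether to run one emulated frame.
// Assumes the display refreshes at least as fast as the emulated system.
struct FramePacer {
    int source_hz;
    int display_hz;
    int accumulator = 0;

    FramePacer(int source, int display)
        : source_hz(source), display_hz(display) {
        assert(source > 0 && display >= source);
    }

    // Returns true when a new emulated frame is due on this refresh;
    // false means the previous frame is shown again.
    bool advance_on_refresh() {
        accumulator += source_hz;
        if (accumulator >= display_hz) {
            accumulator -= display_hz;
            return true;
        }
        return false;
    }
};
```

With 50 and 60 the pattern repeats every six refreshes: one repeated frame followed by five new ones, which is the judder the slide describes.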


35 Sound
Can be very simple (on/off for Spectrum) or complicated (many channels or MIDI)
Especially for the simple beeper case, you’ll definitely need to cycle count accurately as incorrect timing can drastically change the sound
Less important on a chip like the AY-3-8912
◦Timing for tone generation is done on the chip itself
◦CPU will usually change registers infrequently, e.g. 50Hz or 100Hz
◦Things like hardware envelope reset are timing dependent
You may need a LUT on the output to correct the volume to emulate any filter circuitry
Not all channels are the same volume, e.g. CPC – 1000R on L&R, 2200R on C, split between L&R
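To see why the beeper case depends on cycle counts, here is a minimal sketch that turns cycle-stamped speaker toggles into output samples. `render_beeper` and its parameters are hypothetical, the toggle list is assumed to be sorted, and the conversion is naive point sampling with no filtering, so a real implementation would want something better to avoid aliasing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert speaker toggles (recorded as CPU cycle counts, ascending) into
// 0/1 samples. Sample i is taken at cycle i * cpu_hz / sample_rate.
std::vector<int> render_beeper(const std::vector<uint64_t> &toggle_cycles,
                               uint64_t total_cycles, uint64_t cpu_hz,
                               uint64_t sample_rate) {
    assert(cpu_hz > 0 && sample_rate > 0);
    std::vector<int> samples;
    std::size_t next_toggle = 0;
    int level = 0;  // speaker starts low
    const uint64_t sample_count = total_cycles * sample_rate / cpu_hz;
    for (uint64_t i = 0; i < sample_count; ++i) {
        const uint64_t cycle = i * cpu_hz / sample_rate;
        // Apply every toggle that happened at or before this sample point.
        while (next_toggle < toggle_cycles.size() &&
               toggle_cycles[next_toggle] <= cycle) {
            level ^= 1;
            ++next_toggle;
        }
        samples.push_back(level);
    }
    return samples;
}
```

If the emulator's cycle counts drift, the toggle timestamps drift with them and the pitch changes, which is the failure mode the slide warns about.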

36 Input – Amstrad CPC

37 Storage
Are there any established formats for your platform or similar?
◦e.g. tzx for Spectrum also used on Amstrad CPC, d64 for C64 disk etc.
◦Disk image format not shared, even though the hardware is basically the same
◦Could you extend your emulator to allow it to discover the new system?
◦E.g. selecting disk image from inside the emulator, flashing a “ROM” from a disk file, etc…
Snapshots / image of running system
◦Could be done at a vsync boundary, but the state of each component needs to be stored, especially timing data
◦Very useful for debugging, especially if snapshots are created automatically at regular intervals – can replay and single step through a problem rather than trying to diagnose it post mortem

38 Glue logic
Probably toughest part documentation-wise.
Hopefully all the programmer-accessible parts are well documented
Internals probably not documented
Strange behaviour, e.g. interrupt handling rules on Amstrad
◦May be simpler than
Probably best to google for “how to program” documents
Remember, there’s probably a good reason for every odd looking decision!
Proprietary systems probably have all this information under NDA or never disclosed to developers at all.
Patents often document systems very well, however.

39 And when it all comes together…

40 Doing it in hardware

41 Why hardware
Hardware is “cooler”
Can make a “drop-in” replacement, e.g. output to a real TV
Nice to have a dedicated emulator system, feels more authentic
Easier to think in the cycle-counting mindset – it’s a lot closer to the original hardware this way
You get to learn something new!

42 FPGA macroblock
Forget everything you know about software… well almost everything!
FPGA is nothing like a CPU!
Lots of parallel circuits; if you want sequential operation you need to make a state machine.
These macroblocks are essentially 4-bit input, 1-bit output lookup tables, but can also be used as RAM or shift registers.
Synthesizer takes care of most of the details.
Don’t think in terms of bytes, words, ints, etc… data is as wide as you need.

43 FPGA elements
BRAM blocks – for registers, delay lines, small caches, complicated lookup tables, boot ROMs
◦Very configurable, e.g. 16Kx1, 8Kx2, 4Kx4, 2Kx9, 1Kx18, 512x36
◦Can be wired even wider using multiple blocks
◦Dual ported
◦FAST
Modularise everything
◦May be able to reuse elements in other designs, e.g. PS/2 keyboard, flash ROM, etc.
◦FPGAs are parallel, so you can use a module multiple times
◦Easier to replace

44 Clocks
Clocks are especially problematic – you want very few clock domains, and ideally convert all sources to the master clock as soon as possible.
Limited global clock resources…
Clock dividers and phase problems
Don’t want too fast a clock or the design won’t synthesize correctly
You may be able to ignore some of these errors if you know the clock is divided from a faster source.

45 FPGAs
Often you’ll want to step back from the problem and try to work out how the chip was originally implemented…
If you see any comparisons apart from equality, it’s probably wrong!
Preferable to reset a counter to a known value and increment / decrement until all bits are 0 or 1 or a carry occurs
You can use an LFSR instead of a counter to optimise gate counts, but it’s harder to determine the initial values
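The difficulty with LFSR initial values can be explored in software before committing to hardware. The sketch below models a 4-bit Fibonacci LFSR (feedback from the top two bits, polynomial x^4 + x^3 + 1, period 15) and finds the seed that reaches a chosen terminal state after exactly n steps. The function names are hypothetical and the 4-bit width is chosen only to keep the example small.

```cpp
#include <cassert>
#include <cstdint>

// One step of a 4-bit LFSR: shift left, feed back bit3 XOR bit2.
// State 0 is a lock-up state and must never be used as a seed.
uint8_t lfsr4_next(uint8_t s) {
    const uint8_t feedback = static_cast<uint8_t>(((s >> 3) ^ (s >> 2)) & 1u);
    return static_cast<uint8_t>(((s << 1) | feedback) & 0x0F);
}

// Number of steps before the sequence returns to `start`.
int lfsr4_period(uint8_t start) {
    assert(start != 0 && start <= 0x0F);
    int steps = 0;
    uint8_t s = start;
    do {
        s = lfsr4_next(s);
        ++steps;
    } while (s != start);
    return steps;
}

// Seed that reaches `terminal` after exactly n steps (1 <= n <= 15).
// Because the period is 15, stepping forward 15 - n times from the
// terminal state lands on the state that is n steps before it.
uint8_t lfsr4_seed_for(int n, uint8_t terminal) {
    assert(n >= 1 && n <= 15 && terminal != 0 && terminal <= 0x0F);
    uint8_t s = terminal;
    for (int i = 0; i < 15 - n; ++i) {
        s = lfsr4_next(s);
    }
    return s;
}
```

In hardware the "counter expired" test then becomes a single equality comparison against the terminal state, in keeping with the slide's advice to avoid anything other than equality checks.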

46 CPC FPGA board
5V power
512KB RAM, 512KB ROM
PS/2 keyboard
Audio jack
SCART
USB (serial and programming)
SD card slot
2 x joystick
Expansion pins – lots of expandability

47 Resources
The ZX Spectrum ULA: How to design a microcomputer – Chris Smith, ISBN
Rapid Prototyping of Digital Systems: A Tutorial Approach – James Hamblen, ISBN:
Ken Shirriff’s blog


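49 Example VHDL
C:
    int a,b; // inputs
    int sum,product; // outputs
    void run_one_cycle(void) { sum = a+b; product = a*b; }
VHDL:
    library ieee;
    use ieee.std_logic_1164.all;
    entity example is
      port (
        clock: in std_logic;
        a: in integer range 0 to 65535;
        b: in integer range 0 to 65535;
        sum: out integer range 0 to 131070; -- 65535 + 65535
        -- 65535 * 65535; exceeds the 32-bit integer range many VHDL tools implement
        product: out integer range 0 to 4294836225
      );
    end example;
    architecture rtl of example is
    begin
      process(clock) begin
        if rising_edge(clock) then
          sum <= a+b;
          product <= a*b;
        end if;
      end process;
    end rtl;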

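50 Example using bit vectors
C:
    int a,b; // inputs
    int sum,product; // outputs
    void run_one_cycle(void) { sum = a+b; product = a*b; }
VHDL:
    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all; -- arithmetic on unsigned vectors
    entity example is
      port (
        clock: in std_logic;
        a: in std_logic_vector(15 downto 0);
        b: in std_logic_vector(15 downto 0);
        sum: out std_logic_vector(16 downto 0);
        product: out std_logic_vector(31 downto 0)
      );
    end example;
    architecture rtl of example is
    begin
      process(clock) begin
        if rising_edge(clock) then
          -- extend both operands to 17 bits so the carry is kept
          sum <= std_logic_vector(('0' & unsigned(a)) + ('0' & unsigned(b)));
          product <= std_logic_vector(unsigned(a) * unsigned(b));
        end if;
      end process;
    end rtl;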
