Dr A Sahu Dept of Comp Sc & Engg. IIT Guwahati 1.


1 Dr A Sahu, Dept of Comp Sc & Engg., IIT Guwahati

2 Intel 945 motherboard architecture
GMCH
ICH7 (8254, 8259, 8237)
PCI and PCI Express
Video RAM, built-in GPU
DirectX, OpenGL, OpenCL
Advanced GPUs from ATI and AMD
– Introduction to Nvidia CUDA programming

3 [Block diagram: Intel 945 motherboard]
82945 GMCH/MCH (North Bridge): Intel Pentium D Processor; DDR2; Support for Media Ext Card; Intel GMA 950 Graphics; PCI Express* x16 Graphics
82801 GR ICH7 (I/O controller hub 7, South Bridge): 4 Serial ATA Ports; Integrated Matrix Storage Technology; 6 PCI Slots; BIOS Support; Intel HD Audio; 8 High-Speed USB Ports; 6 PCI Express* x1 slots; Intel Pro 100/1000 LAN; Intel Active Management Technology

4 Graphics and Memory Controller Hub (GMCH)
Graphics Interface (GI) and PCI Express for graphics card support
Host Interface (HI) – connects to the processor and supports HT, interrupt delivery, a 12-deep in-order queue, etc.
System Memory Interface (SMI) – connects to two-channel DDR2
Direct Media Interface (DMI) – connects to ICH7

5 I/O Controller Hub version 7 (South Bridge)
Enhanced DMA controller, interrupt controller, and timer
– Two cascaded 8259 PICs
– One 82C54 PIT
– One 8237 DMA controller
Low Pin Count (LPC) interface
PCI and PCI Express (Peripheral Component Interconnect)
AC97 & HD Audio codec
Serial Peripheral Interface (SPI) support
Firmware support (BIOS)
ACPI, SATA, USB

6 Peripherals: HD monitor
Interfaces: intermediate hardware – Nvidia GPU card
Interfaces: intermediate software/program – Nvidia GPU driver
[Diagram: 82945 GMCH/MCH North Bridge with Intel Pentium D Processor, DDR2, Support for Media Ext Card, Intel GMA 950 Graphics, PCI Express* x16 Graphics]

7 Character display (80x25 characters, 5x7 pixels each = 400x175)
CRT monitor (400x600, 640x480, 800x600)
LCD monitor (1024x768, 1280x1024, …)
Graphics are visually more appealing
Display line, circle, rectangle, curve, polygon
– Characters are drawn using these primitives
– TrueType font
[Figure labels: red arrow, circle]

8 [Diagram: 1024x768-pixel LCD, columns 0 to 1023 and rows 0 to 767, scanned by row and column counters]
Counter clock: CLK > 1024x768x50 Hz
Frame buffer: 8 bits each for R, G, and B (8x3 = 24 bits per pixel)
The screen is refreshed 50 times a second
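The clock figure on this slide follows from simple arithmetic. The sketch below is only an illustration of that arithmetic; the function names are hypothetical and it ignores blanking intervals, which a real display timing would add.

```c
#include <stdint.h>

/* Pixels that must be scanned out per second to refresh the whole screen.
 * Assumption: no blanking intervals, just width x height x refresh rate. */
uint64_t pixel_rate_hz(uint32_t width, uint32_t height, uint32_t refresh_hz)
{
    return (uint64_t)width * height * refresh_hz;
}

/* Bytes per second read from the frame buffer during scan-out. */
uint64_t scanout_bytes_per_second(uint32_t width, uint32_t height,
                                  uint32_t bits_per_pixel, uint32_t refresh_hz)
{
    return pixel_rate_hz(width, height, refresh_hz) * bits_per_pixel / 8;
}
```

For the 1024x768 panel at 50 Hz and 24 bits per pixel, `pixel_rate_hz(1024, 768, 50)` gives 39,321,600 pixels per second, which is the lower bound the slide writes as CLK > 1024x768x50 Hz.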

9 [Figure: pixels in the frame buffer and the corresponding pixels on the screen, 24 bits per pixel; graphical representation of 24-bit color]

10 GPU: a specialized processor that accelerates 3D or 2D graphics primitive operations
Lots of floating-point operations
Accelerates primitives
– Line, circle, polygon, mesh, projection, sphere, …

11 [Diagram: programmable graphics pipeline]
Stages: 3D application; 3D API (OpenGL, DirectX/3D); CPU-GPU boundary; GPU command; primitive assembly; rasterization and interpolation; raster operations; frame buffer
Programmable units: programmable vertex processor; programmable fragment processors
Data streams: 3D API commands; GPU command & data stream; vertex index stream; pretransformed vertices; transformed vertices; assembled polygons, lines & points; pixel location stream; rasterized pretransformed fragments; transformed fragments; pixel updates

12 [Diagram: memory system, texture memory, and frame buffer]
Vertex processing: vertices (x, y, z), vertex shader
Pixel processing: pixels (R, G, B), pixel shader

13 Access to video memory
We create a Linux device-driver that gives applications access to the graphics frame-buffer
Accessing the frame buffer through the PCI Express slot
Assume a graphics card is installed in your system

14 [Diagram: in user space, a user application calls the standard “runtime” libraries (call/ret); the libraries enter the operating system kernel in kernel space (syscall/sysret); the kernel calls a device-driver module (call/ret), which reaches the hardware device (in/out for i/o, memory in RAM)]
A device-driver is a software module that controls a hardware device in response to OS kernel requests relayed, often, from an application

15 The graphics screen is a two-dimensional array of picture elements (‘pixels’)
Each pixel’s color is an individually programmable mix of red, green, and blue
These pixels are redrawn sequentially, left-to-right, by rows from top to bottom

16 [Diagram: CPU, RAM, VRAM, CRT]
16 MB of VRAM
2048 MB of RAM

17 This depends on
– the total number of pixels
– the number of bits-per-pixel
The total number of pixels
– Determined by the screen’s width and height
– 1280-by-960 = 1,228,800 pixels
The number of bits-per-pixel (“color depth”) is a programmable parameter (varies from 1 to 32)
Certain types of applications also need to use extra VRAM
– for multiple displays, or for “special effects” like computer game animations
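As a rough check of how the two factors combine, the following sketch computes the VRAM needed for one frame and how many such frames fit in a given amount of VRAM. The function names are illustrative only; the 16 MB figure comes from the earlier slide.

```c
#include <stdint.h>

/* Bytes needed for one frame: pixels x bits-per-pixel, rounded up to bytes. */
uint64_t vram_bytes_needed(uint32_t width, uint32_t height,
                           uint32_t bits_per_pixel)
{
    uint64_t bits = (uint64_t)width * height * bits_per_pixel;
    return (bits + 7) / 8;
}

/* Whole frames of the given size that fit in the available VRAM. */
uint64_t frames_that_fit(uint64_t vram_bytes, uint64_t frame_bytes)
{
    return frame_bytes ? vram_bytes / frame_bytes : 0;
}
```

A 1280-by-960 screen at 32 bits per pixel needs 4,915,200 bytes, so a 16 MB card holds three full frames, which leaves room for the extra buffers that animation or multiple displays require.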

18 [Diagram: pixel longword with blue in bits 0-7, green in bits 8-15, red in bits 16-23, alpha in bits 24-31]
The intensity of each color-component within a pixel is an 8-bit value
Alpha represents a pre-multiplied value (values shown on the slide: 0.5, 0, 1, 0 and 0, 0.5, 0)

19 [Diagram: VRAM bytes 0, 1, 2, 3 hold B, G, R, A for the first pixel on the video screen; bytes 4, 5, 6, 7 hold B, G, R, A for the next pixel; and so on]
“truecolor” graphics-modes use 4-bytes per picture-element
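The longword layout and the byte order in VRAM can be sketched together. This is a minimal illustration assuming the layout drawn on the slides (alpha in the top byte, blue in the bottom byte, bytes stored as B, G, R, A); the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Build a 32-bit truecolor pixel: alpha 31..24, red 23..16,
 * green 15..8, blue 7..0. */
uint32_t pack_pixel(uint8_t alpha, uint8_t red, uint8_t green, uint8_t blue)
{
    return ((uint32_t)alpha << 24) | ((uint32_t)red << 16) |
           ((uint32_t)green << 8)  |  (uint32_t)blue;
}

/* Store pixel number `index` into a byte-addressed frame buffer.
 * Bytes are written explicitly in B, G, R, A order, so the result
 * does not depend on the host CPU's endianness. */
void store_pixel(uint8_t *vram, size_t index, uint32_t pixel)
{
    uint8_t *p = vram + index * 4;
    p[0] = (uint8_t)(pixel);        /* blue  */
    p[1] = (uint8_t)(pixel >> 8);   /* green */
    p[2] = (uint8_t)(pixel >> 16);  /* red   */
    p[3] = (uint8_t)(pixel >> 24);  /* alpha */
}
```

On a little-endian x86 machine, writing the longword directly to `((uint32_t *)vram)[index]` produces the same byte order.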

20 Linux is a “protected-mode” operating system
I/O devices normally are not directly accessible
Linux on x86 platforms uses “virtual memory”
Privileged software must “map” the VRAM
A device-driver module is needed: ‘vram.c’
We can compile it using: $ mmake vram
Device-node: # mknod /dev/vram c 98 0
Make it ‘writable’: # chmod a+w /dev/vram

21 It’s a character-mode Linux device-driver
It implements four device-file ‘methods’:
– ‘read()’: lets a program read from video memory
– ‘write()’: lets a program write to video memory
– ‘llseek()’: lets a program ‘move’ the file’s pointer
– ‘mmap()’: lets a program ‘map’ vram to user-space
It also implements a pseudo-file that lets users view the RADEON X300 graphics controller’s PCI Configuration Space parameter-values: $ cat /proc/vram

22 It’s an acronym for “Peripheral Component Interconnect” and refers to a collection of industry standards for devices used in PCs
An Intel-sponsored initiative (from 1992-9) having several ambitious goals:
– Reduce diversity inherent in legacy PC devices
– Improve speed and efficiency of data-transfers
– Eliminate (or reduce) platform dependencies
– Simplify adding/removing peripheral adapters
– Lower PC’s total consumption of electrical power

23 [Diagram: PCI Configuration Space, 64 doublewords in total]
PCI Configuration Space Header (16 doublewords – fixed format)
PCI Configuration Space Body (48 doublewords – variable format)
A non-volatile parameter-storage area for each PCI device-function

24 [Diagram: PCI Configuration Space Header, 16 doublewords; fields listed from bit 31 down to bit 0]
Dword 0: Device ID | Vendor ID
Dword 1: Status Register | Command Register
Dword 2: Class Code (Class/SubClass/ProgIF) | Revision ID
Dword 3: BIST | Header Type | Latency Timer | Cache Line Size
Dwords 4-9: Base Address 0 through Base Address 5
Dword 10: CardBus CIS Pointer
Dword 11: Subsystem Device ID | Subsystem Vendor ID
Dword 12: Expansion ROM Base Address
Dword 13: reserved | capabilities pointer
Dword 14: reserved
Dword 15: Maximum Latency | Minimum Grant | Interrupt Pin | Interrupt Line
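One way to make the header layout concrete is a C structure whose field offsets match the 16 doublewords. This is only a sketch: the structure and field names are invented here, and overlaying it on raw configuration bytes assumes a little-endian host such as x86.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of the 64-byte fixed-format PCI configuration header.
 * Within each doubleword the lowest-order field comes first. */
struct pci_config_header {
    uint16_t vendor_id;            /* 0x00 */
    uint16_t device_id;            /* 0x02 */
    uint16_t command;              /* 0x04 */
    uint16_t status;               /* 0x06 */
    uint8_t  revision_id;          /* 0x08 */
    uint8_t  prog_if;              /* 0x09 */
    uint8_t  subclass;             /* 0x0A */
    uint8_t  base_class;           /* 0x0B */
    uint8_t  cache_line_size;      /* 0x0C */
    uint8_t  latency_timer;        /* 0x0D */
    uint8_t  header_type;          /* 0x0E */
    uint8_t  bist;                 /* 0x0F */
    uint32_t base_address[6];      /* 0x10 to 0x27 */
    uint32_t cardbus_cis_pointer;  /* 0x28 */
    uint16_t subsystem_vendor_id;  /* 0x2C */
    uint16_t subsystem_device_id;  /* 0x2E */
    uint32_t expansion_rom_base;   /* 0x30 */
    uint8_t  capabilities_pointer; /* 0x34 */
    uint8_t  reserved0[3];         /* 0x35 */
    uint32_t reserved1;            /* 0x38 */
    uint8_t  interrupt_line;       /* 0x3C */
    uint8_t  interrupt_pin;        /* 0x3D */
    uint8_t  min_grant;            /* 0x3E */
    uint8_t  max_latency;          /* 0x3F */
};

/* Compile-time checks that the layout really is 16 doublewords. */
_Static_assert(sizeof(struct pci_config_header) == 64,
               "header must be 16 doublewords");
_Static_assert(offsetof(struct pci_config_header, base_address) == 0x10,
               "Base Address 0 starts at dword 4");
```

Every field is naturally aligned, so no compiler padding is inserted and the static assertions hold without packing attributes.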

25 0x8086 – Intel Corporation
0x1022 – Advanced Micro Devices, Inc
0x1002 – ATI Technologies, Inc (My office machine)
0x10EC – RealTek, Incorporated
0x10DE – Nvidia Corporation
0x10B7 – 3Com Corporation
0x101C – Western Digital, Inc
0x1014 – IBM Corporation
0x0E11 – Compaq Corporation
0x1057 – Motorola Corporation
0x106B – Apple Computers, Inc
0x5333 – S3, Inc

26 0x5347: ATI RAGE128 SG
0x4C58: ATI RADEON LX
0x5950: ATI RS480
0x436E: ATI IXP300 SATA
0x438C: ATI IXP600 IDE
0x5B60: ATI Radeon HD 3200 Graphics
See this Linux header-file for lots more examples:

27 0x00: Legacy Device (i.e., built before class-codes were defined)
0x01: Mass Storage controller
0x02: Network controller
0x03: Display controller
0x04: Multimedia device
0x05: Memory Controller
0x06: Bridge device
0x07: Simple Communications controller
0x08: Base System peripherals
0x09: Input device
0x0A: Docking stations
0x0B: Processors
0x0C: Serial Bus controllers
0x0D: Wireless controllers
0x0E: Intelligent I/O controllers
0x0F: Encryption/Decryption controllers
0x10: Satellite Communications controllers
0x11: Data Acquisition and Signal Processing controllers

28 Class Code 0x01: Mass Storage controller
– 0x00: SCSI controller
– 0x01: IDE controller
– 0x02: Floppy Disk controller
– 0x03: IPI controller
– 0x04: RAID controller
– 0x80: Other Mass Storage controller

29 Class Code 0x02: Network controller
– 0x00: Ethernet controller
– 0x01: Token Ring controller
– 0x02: FDDI controller
– 0x03: ATM controller
– 0x04: ISDN controller
– 0x80: Other Network controller

30 Class Code 0x03: Display Controller
– 0x00: VGA-compatible controller
– 0x01: XGA controller
– 0x02: 3D controller
– 0x80: Other display controller
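The class, subclass, and programming-interface bytes share one doubleword of the header with the Revision ID. The sketch below splits that doubleword and looks up the base-class names listed on the slides; the type and function names are hypothetical, chosen only for this illustration.

```c
#include <stdint.h>

struct pci_class {
    uint8_t base_class;  /* bits 31..24 of header dword 2 */
    uint8_t subclass;    /* bits 23..16 */
    uint8_t prog_if;     /* bits 15..8; bits 7..0 hold the Revision ID */
};

/* Split header dword 2 into its three class-code bytes. */
struct pci_class split_class_dword(uint32_t dword2)
{
    struct pci_class c;
    c.base_class = (uint8_t)(dword2 >> 24);
    c.subclass   = (uint8_t)(dword2 >> 16);
    c.prog_if    = (uint8_t)(dword2 >> 8);
    return c;
}

/* Name of a base class, using the table from the slides. */
const char *base_class_name(uint8_t base_class)
{
    static const char *const names[] = {
        "Legacy Device", "Mass Storage controller", "Network controller",
        "Display controller", "Multimedia device", "Memory Controller",
        "Bridge device", "Simple Communications controller",
        "Base System peripherals", "Input device", "Docking stations",
        "Processors", "Serial Bus controllers", "Wireless controllers",
        "Intelligent I/O controllers", "Encryption/Decryption controllers",
        "Satellite Communications controllers",
        "Data Acquisition and Signal Processing controllers",
    };
    if (base_class < sizeof names / sizeof names[0])
        return names[base_class];
    return "Unknown";
}

/* True for class 0x03 (display), subclass 0x00 (VGA-compatible). */
int is_vga_compatible(uint32_t dword2)
{
    struct pci_class c = split_class_dword(dword2);
    return c.base_class == 0x03 && c.subclass == 0x00;
}
```

A driver looking for a graphics card could use a test like `is_vga_compatible()` instead of matching a single vendor and device ID.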

31 Graphics controllers use vendor-specific mechanisms to perform similar operations
There’s a common core of compatibility with IBM’s VGA (Video Graphics Array) developed in the mid-1980s
But since IBM’s loss of market dominance, each manufacturer has added enhancements which employ incompatible programming interfaces
You need a vendor’s manual! (Download from vendor site)

32 Today’s PCI graphics systems all provide a dedicated amount of display memory to control the screen-image’s pixel-coloring
But how much memory there is will vary with price
And its location within the CPU’s physical address-space can’t be predicted because it depends upon what other PCI devices are installed (and mapped) during startup

33 The PCI Configuration Header has several so-called Base Address fields, and vendors use one of these to hold the frame-buffer’s starting address and to indicate how much vram the video controller can actually use
The Linux kernel provides driver-writers with some convenient functions for getting the location and size of the frame-buffer

34 Our ‘vram.c’ module’s initialization routine employs these kernel helper-functions:

    #include <linux/pci.h>

    struct pci_dev *devp;   // will point to a kernel-structure

    // get a pointer to the PCI device's Linux data-structure
    devp = pci_get_device( VENDOR_ID, DEVICE_ID, NULL );
    if ( !devp ) return -ENODEV;   // device is not present

    // get starting address and length for memory-resource 0
    vram_base = pci_resource_start( devp, 0 );
    vram_size = pci_resource_len( devp, 0 );

35 You can use our ‘fileview’ utility to see the current contents of the video frame-buffer: $ fileview /dev/vram
Our ‘vram.c’ driver’s ‘read()’ method gets invoked when an application-program attempts to ‘read’ from the ‘/dev/vram’ device-file
The read-method is implemented by our driver using ‘ioremap()’ (and ‘iounmap()’) to temporarily map a 4KB-page of physical vram to the kernel’s virtual address-space
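Mapping one 4KB page at a time means a read has to be limited to the bytes that remain in the current page and in vram. The slides do not show that bookkeeping, so the user-space sketch below is an assumption about how it could be done; the structure and function names are hypothetical and it assumes the frame-buffer base is page-aligned.

```c
#include <stddef.h>
#include <stdint.h>

#define VRAM_PAGE_SIZE 4096u   /* page size named on the slide */

struct vram_chunk {
    uint64_t page_phys;    /* physical address of the page to map */
    uint32_t page_offset;  /* offset of the first byte within that page */
    size_t   nbytes;       /* bytes that may be copied from this page */
};

/* Given the file position and requested count, work out which page to
 * map and how many bytes can be copied without crossing the page end
 * or the end of vram. nbytes == 0 means end-of-file. */
struct vram_chunk next_vram_chunk(uint64_t vram_base, uint64_t vram_size,
                                  uint64_t pos, size_t count)
{
    struct vram_chunk c = { 0, 0, 0 };
    if (pos >= vram_size)
        return c;

    uint64_t phys = vram_base + pos;
    c.page_phys   = phys & ~(uint64_t)(VRAM_PAGE_SIZE - 1);
    c.page_offset = (uint32_t)(phys - c.page_phys);

    uint64_t left_in_page = VRAM_PAGE_SIZE - c.page_offset;
    uint64_t left_in_vram = vram_size - pos;
    uint64_t n = count;
    if (n > left_in_page) n = left_in_page;
    if (n > left_in_vram) n = left_in_vram;
    c.nbytes = (size_t)n;
    return c;
}
```

In a driver, `page_phys` would be the address handed to ‘ioremap()’, and the copy would start at `page_offset` within the returned mapping. A short read is normal here: the application simply calls ‘read()’ again for the rest.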

36 Linux provides a ‘platform-independent’ way to do copying from an i/o-device’s memory into an application’s buffer (or vice-versa):
– A ‘read’ copies from vram to a user’s buffer: memcpy_fromio( buf, vaddr, len );
– A ‘write’ copies to vram from a user’s buffer: memcpy_toio( vaddr, buf, len );

37 This is a standard UNIX system-call that lets an application ‘map’ a file into its virtual address-space, where it can then treat the file as if it were an ordinary array
See the man-page: $ man mmap
This same system-call can also work on a device-file if that device’s driver provided ‘mmap()’ among its file-operations

38 In the application-program, six arguments get passed to the ‘mmap()’ library-function, which returns a pointer to the mapped region:

    void *mmap( void *baseaddress, size_t memorysize, int accessattributes, int flags, int filehandle, off_t offset );
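The six arguments are easiest to follow in a small program. The sketch below maps a file and treats it as an ordinary byte array; the helper name `fill_mapped_file` is made up for this illustration, and it is shown against a generic path rather than the actual ‘/dev/vram’ device.

```c
#define _POSIX_C_SOURCE 200809L  /* expose POSIX declarations under strict -std= modes */
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map the first `length` bytes of the file at `path` read/write,
 * fill them with `value`, and unmap again.
 * Returns 0 on success, -1 on failure. */
int fill_mapped_file(const char *path, size_t length, unsigned char value)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    unsigned char *p = mmap(NULL,                   /* let the kernel pick the address */
                            length,                 /* size of the mapping */
                            PROT_READ | PROT_WRITE, /* access attributes */
                            MAP_SHARED,             /* writes reach the file or device */
                            fd,                     /* file handle */
                            0);                     /* offset within the file */
    close(fd);                  /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;

    memset(p, value, length);   /* the file now behaves like an array */
    munmap(p, length);
    return 0;
}
```

The same call pattern would apply to a device-file whose driver supplies ‘mmap()’, with the length kept within the size of the device's memory.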

39 In the kernel, those six arguments will get validated and processed, then the driver’s ‘mmap()’ callback-function will be invoked to supply missing information and perform further sanity-checks and do appropriate page-mapping actions:

    int mmap( struct file *file, struct vm_area_struct *vma );

40
    int mmap( struct file *file, struct vm_area_struct *vma )
    {
        // extract the parameters we will need from the 'vm_area_struct'
        unsigned long region_length = vma->vm_end - vma->vm_start;
        unsigned long region_origin = vma->vm_pgoff * PAGE_SIZE;
        unsigned long physical_addr = fb_base + region_origin;
        unsigned long user_virtaddr = vma->vm_start;

        // sanity check: mapped region cannot extend past end of vram
        if ( region_origin + region_length > fb_size ) return -EINVAL;

        // tell the kernel not to try 'swapping out' this region to the disk
        // (VM_RESERVED exists only in older kernels; it was removed in Linux 3.7)
        vma->vm_flags |= VM_RESERVED;

        // tell the kernel to exclude this region from any core dumps
        vma->vm_flags |= VM_IO;

41
        // invoke a helper-function that will set up the page-table entries
        if ( remap_pfn_range( vma, user_virtaddr, physical_addr >> 12,
                              region_length, vma->vm_page_prot ) ) return -EAGAIN;

        return 0;   // SUCCESS
    }
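The sanity check in the callback adds the region's origin and length before comparing with the frame-buffer size, and that sum could in principle wrap around for extreme arguments. The user-space sketch below shows a wrap-safe variant of the same test. It is not the driver's code: the function name and the explicit `page_size` parameter are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a mapping that starts `pgoff` pages into the frame buffer and
 * covers `region_length` bytes lies entirely within `fb_size` bytes.
 * Uses division and subtraction so that no intermediate value can wrap. */
bool mapping_fits(uint64_t pgoff, uint64_t region_length,
                  uint64_t fb_size, uint64_t page_size)
{
    if (page_size == 0 || pgoff > fb_size / page_size)
        return false;                       /* origin is past the end */

    uint64_t region_origin = pgoff * page_size;  /* cannot exceed fb_size */
    return region_length <= fb_size - region_origin;
}
```

For a 16 MB frame buffer and 4KB pages, a mapping of the whole buffer at page offset 0 fits, while the same length at page offset 1 does not.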

42 This application-program will demonstrate use of our ‘vram.c’ device-driver’s ‘read()’, ‘write()’ and ‘llseek()’ methods (i.e., device-file operations)
It will perform a rotation of the color-components (R,G,B) in every displayed ‘truecolor’ pixel: R → G, G → B, B → R
After three rotations the screen will look normal again
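The per-pixel rotation can be sketched on an in-memory buffer. This is not the demo program itself, which works through the device-file; it assumes the 32-bit layout shown earlier (alpha, red, green, blue from high byte to low byte) and reads "R → G" as the red value moving into the green position. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate the color components of one truecolor pixel:
 * red -> green, green -> blue, blue -> red. Alpha is left unchanged. */
uint32_t rotate_rgb(uint32_t pixel)
{
    uint32_t alpha = pixel & 0xFF000000u;
    uint32_t red   = (pixel >> 16) & 0xFFu;
    uint32_t green = (pixel >> 8)  & 0xFFu;
    uint32_t blue  =  pixel        & 0xFFu;
    return alpha | (blue << 16) | (red << 8) | green;
}

/* Apply the rotation to a whole buffer of pixels, for example one that
 * was filled by a read from the frame buffer. */
void rotate_buffer(uint32_t *pixels, size_t count)
{
    for (size_t i = 0; i < count; i++)
        pixels[i] = rotate_rgb(pixels[i]);
}
```

Because the rotation is a cycle of length three, applying `rotate_buffer()` three times restores every pixel, which matches the behavior described on the slide.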
