UNIX’s “grand illusion” How Linux makes a hardware device appear to be a ‘file’
Basic char-driver components
    Device-driver LKM layout:
    init: registers the ‘fops’
    exit: unregisters the ‘fops’
    (‘init’ and ‘exit’ are the usual pair of module-administration functions)
    The module’s ‘payload’ is a collection of callback-functions having prescribed prototypes AND a ‘package’ of function-pointers (the ‘fops’)
Background
    To appreciate the considerations that have motivated the overall design of Linux drivers, you need to understand how normal application-programs get access to the services that the operating system offers.
    This access is indirect: it goes through specially protected interfaces (i.e., system calls), which are usually wrapped by ‘library’ functions.
Standard File-I/O functions
    int open( const char *pathname, int flags, ... );
    ssize_t read( int fd, void *buf, size_t count );
    ssize_t write( int fd, const void *buf, size_t count );
    off_t lseek( int fd, off_t offset, int whence );
    int close( int fd );
    (and other less-often-used file-I/O functions)
Our ‘elfcheck.cpp’ example
    #include <fcntl.h>      // for open()
    #include <stdio.h>      // for perror(), printf()
    #include <unistd.h>     // for read(), close()
    #include <string.h>     // for strncmp()

    char buf[4];

    int main( int argc, char *argv[] )
    {
        if ( argc == 1 ) return -1;             // command-line argument is required

        int fd = open( argv[1], O_RDONLY );     // open specified file
        if ( fd < 0 ) { perror( argv[1] ); return -1; }     // quit if open failed

        int nbytes = read( fd, buf, 4 );        // read first 4 bytes
        if ( nbytes < 0 ) { perror( argv[1] ); return -1; } // quit if read failed

        if ( strncmp( buf, "\177ELF", 4 ) == 0 )    // check for ELF id
            printf( "File \'%s\' has ELF signature \n", argv[1] );
        else    printf( "File \'%s\' is not an ELF file \n", argv[1] );

        close( fd );                            // close the file
    }
Special ‘device’ files UNIX systems treat hardware-devices as special files, so that familiar functions can be used by application programmers to access devices (e.g., open, read, close) But a System Administrator has to create these device-files (in the ‘/dev’ directory) Or alternatively (as we’ve seen), an LKM could create these necessary device-files
UNIX ‘man’ pages A convenient online guide to prototypes and semantics of the C library functions Example of usage: $ man 2 open
The ‘open’ function
    #include <fcntl.h>
    int open( const char *pathname, int flags, ... );
    Converts a pathname to a file-descriptor
    File-descriptor is a nonnegative integer
    Used as a file-ID in subsequent functions
    ‘flags’ is a symbolic constant: O_RDONLY, O_WRONLY, O_RDWR
The ‘close’ function
    #include <unistd.h>
    int close( int fd );
    Breaks link between file and file-descriptor
    Returns 0 on success, or -1 if an error occurred
The ‘write’ function
    #include <unistd.h>
    ssize_t write( int fd, const void *buf, size_t count );
    Attempts to write up to ‘count’ bytes
    Bytes are taken from ‘buf’ memory-buffer
    Returns the number of bytes written
    Or returns -1 if some error occurred
    Return-value 0 means no data was written
The ‘read’ function
    #include <unistd.h>
    ssize_t read( int fd, void *buf, size_t count );
    Attempts to read up to ‘count’ bytes
    Bytes are placed in ‘buf’ memory-buffer
    Returns the number of bytes read
    Or returns -1 if some error occurred
    Return-value 0 means ‘end-of-file’
Notes on ‘read()’ and ‘write()’
    These functions have (as a “side-effect”) the advancement of a file-pointer variable.
    They return -1 if an error occurs, indicating that no actual data could be transferred; otherwise, they return the number of bytes read or written.
    The ‘read()’ function normally does not return 0 unless ‘end-of-file’ is reached.
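Because ‘read()’ may transfer fewer bytes than were requested, and advances the file-pointer by whatever it did transfer, a caller that needs a full buffer usually calls it in a loop. The following is only a minimal sketch of that idea; the helper name ‘read_fully()’ is hypothetical and is not part of the C library.

```c
#include <errno.h>      // for errno, EINTR
#include <unistd.h>     // for read()

// Hypothetical helper: keeps calling read() until 'count' bytes have
// arrived, end-of-file is reached, or an error occurs.
// Returns the number of bytes actually read, or -1 on error.
ssize_t read_fully( int fd, void *buf, size_t count )
{
    char   *dst = buf;
    size_t  total = 0;

    while ( total < count ) {
        ssize_t n = read( fd, dst + total, count - total );
        if ( n < 0 ) {
            if ( errno == EINTR ) continue;   // interrupted: try again
            return -1;                        // genuine error
        }
        if ( n == 0 ) break;                  // end-of-file
        total += n;                           // file-pointer moved ahead by n
    }
    return (ssize_t)total;
}
```

A return-value smaller than ‘count’ then tells the caller that end-of-file was reached before the buffer was filled.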
The ‘lseek’ function
    #include <unistd.h>
    off_t lseek( int fd, off_t offset, int whence );
    Modifies the file-pointer variable, based on the value of whence:
        enum { SEEK_SET, SEEK_CUR, SEEK_END };
    Returns the new value of the file-pointer (or returns -1 if any error occurred)
Getting the size of a file
    For normal files, your application can find out how many bytes belong to a file using the ‘lseek()’ function:
        off_t filesize = lseek( fd, 0, SEEK_END );
    But afterward you need to ‘rewind’ the file if you want to read its data:
        lseek( fd, 0, SEEK_SET );
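The two ‘lseek()’ calls above can be packaged into one small function. This is a sketch under our own assumptions: the names ‘file_size()’ and ‘show_size()’ are hypothetical, and instead of rewinding to offset 0 the helper restores whatever position the file-pointer had before the call.

```c
#include <fcntl.h>      // for open()
#include <stdio.h>      // for printf(), perror()
#include <sys/types.h>  // for off_t
#include <unistd.h>     // for lseek(), close()

// Hypothetical helper: returns the size in bytes of the file open on 'fd'
// (or -1 on error) and leaves the file-pointer where it was found.
off_t file_size( int fd )
{
    off_t saved = lseek( fd, 0, SEEK_CUR );     // remember current position
    if ( saved < 0 ) return -1;
    off_t size = lseek( fd, 0, SEEK_END );      // new position == file size
    if ( size < 0 ) return -1;
    if ( lseek( fd, saved, SEEK_SET ) < 0 ) return -1;  // put it back
    return size;
}

// Example use: report the size of the file named by 'path'.
int show_size( const char *path )
{
    int fd = open( path, O_RDONLY );
    if ( fd < 0 ) { perror( path ); return -1; }
    off_t size = file_size( fd );
    if ( size < 0 ) perror( path );
    else printf( "%s contains %lld bytes\n", path, (long long)size );
    close( fd );
    return ( size < 0 ) ? -1 : 0;
}
```

Files that cannot seek (a pipe, for instance) make ‘lseek()’ fail, so the helper returns -1 for them. A device-file reports a size this way only if its driver implements ‘llseek’.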
Device knowledge
    Before you can write a device-driver, you must understand how the hardware works.
    Usually this means you need to obtain the programmer manual (from the manufacturer).
    Nowadays this can often be an obstacle.
    But some equipment is standardized, or is well understood (because of its simplicity).
Our RTC example
    We previously learned how the Real-Time Clock device can be accessed by module code, using the ‘inb()’ and ‘outb()’ macros:
        outb( addr, 0x70 ); outb( data, 0x71 );     // write ‘data’ to RTC location ‘addr’
        outb( addr, 0x70 ); data = inb( 0x71 );     // read RTC location ‘addr’ into ‘data’
    So we can create a simple char driver that lets application-programs treat the RTC’s memory as if it were in an ordinary file.
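How device-access works
    [Figure: layered diagram of device access. User space (unprivileged) holds the application program and the standard runtime library; supervisor space (privileged) holds the operating system and the device-driver module (both software), above the physical peripheral device (hardware). The transitions are labeled ‘call’/‘ret’ (application to library, operating system to driver), ‘int $0x80’/‘iret’ (user space into the operating system), and ‘in’/‘out’ (driver to hardware).]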
Our ‘cmosram.c’ driver
    We implement three callback functions:
    – llseek:   // sets file-pointer’s position
    – read:     // inputs a byte from CMOS
    – write:    // outputs a byte to CMOS
    We omit other callback functions, such as:
    – open:     // we leave this function-pointer NULL
    – release:  // we leave this function-pointer NULL
    The kernel has its own ‘default’ implementation for any function with NULL as its pointer-value.
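The ‘fops’ syntax
    The GNU C-compiler supports a syntax for ‘struct’ field-initializations that lets you give your field-values in any convenient order:
        struct file_operations my_fops = {
            llseek: my_llseek,
            write:  my_write,
            read:   my_read,
        };
    (The ‘field: value’ form shown here is an older GNU extension; standard C99 writes the same initializers as ‘.llseek = my_llseek’, and so on.)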
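The same idea can be tried outside the kernel. The sketch below uses the standard C99 spelling (‘.field = value’) on a small stand-in structure. Everything in it (‘struct demo_operations’, the ‘stub_’ functions, ‘demo_open_or_default()’) is hypothetical and only illustrates named initializers and NULL function-pointers; it is not the kernel’s ‘struct file_operations’.

```c
#include <stddef.h>     // for NULL

// Hypothetical stand-in for the kernel's much larger 'struct file_operations'
struct demo_operations {
    long (*llseek)( long pos, long offset );
    int  (*read)( void );
    int  (*write)( int value );
    int  (*open)( void );
};

static long stub_llseek( long pos, long offset ) { return pos + offset; }
static int  stub_read( void )                    { return 42; }
static int  stub_write( int value )              { return value; }

// Fields are named, so they may appear in any order; 'open' is omitted
// and is therefore initialized to NULL by the compiler.
struct demo_operations demo_fops = {
    .write  = stub_write,
    .llseek = stub_llseek,
    .read   = stub_read,
};

// A caller (the kernel, in the real case) checks for NULL before calling
// and falls back to a default action when no callback was supplied.
int demo_open_or_default( const struct demo_operations *ops )
{
    if ( ops->open == NULL ) return 0;      // default: nothing to do
    return ops->open();
}
```

Leaving a field out of the initializer is what makes the “NULL means use the default” convention convenient: a driver names only the callbacks it actually implements.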
‘init’ and ‘exit’
    The module’s initialization function has to take care of registering the driver’s ‘fops’:
        register_chrdev( major, devname, &fops );
    Then the module’s cleanup function must make sure to unregister the driver’s ‘fops’:
        unregister_chrdev( major, devname );
    (These are prototyped in <linux/fs.h>)
Our ‘dump.cpp’ utility
    We have written an application that lets users display the contents of any file in both hexadecimal and ascii formats.
    It also works on ‘device’ files!
    With our driver-module installed, you can use it to view the CMOS memory-values:
        $ ./dump /dev/cmos
Now for a useful char-driver…
    We can create a character-mode driver for the processor’s physical memory (i.e., ram).
    (Our machines have 2-GB of physical ram.)
    But another device-file is already named ‘/dev/ram’, so ours will be ‘/dev/dram’ (dynamic ram).
    We’ve picked 85 as its ‘major’ ID-number.
    Our SysAdmin set up a device-node using:
        root# mknod /dev/dram c 85 0
2-GB RAM has ‘zones’
    [Figure: installed physical memory divided into three zones]
    ZONE_LOW:      0-MB to 16-MB
    ZONE_NORMAL:  16-MB to 896-MB
    ZONE_HIGH:   896-MB to 2048-MB (= 2GB)
Legacy DMA
    Various older devices rely on the PC/AT’s DMA controller to perform data-transfers.
    This chip could only use 24-bit addresses.
    Only the lowest 16-megabytes of physical memory are ‘visible’ to these devices: 2^24 = 0x01000000 (16-megabytes)
    Linux tries to conserve its use of memory from this ZONE_LOW region (so devices will find free DMA memory if it’s needed).
‘Normal’ memory zone This zone extends from 16MB to 896MB Linux uses the lower portion of this zone for an important data-structure that tracks how all the physical memory is being used It’s an array of records: mem_map[] (We will soon be studying this structure) The remainder of ZONE_NORMAL is free for ‘dynamic’ allocations by the OS kernel
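The zone boundaries quoted on these slides (16-MB and 896-MB) are enough to classify any physical address. Here is a small user-space sketch; the function ‘zone_of()’ and the ‘MB’ macro are hypothetical, and the zone names follow the slides’ labels rather than any kernel header.

```c
#include <stdint.h>     // for uint64_t

#define MB  ((uint64_t)1 << 20)     // one megabyte, in bytes

enum zone { ZONE_LOW, ZONE_NORMAL, ZONE_HIGH };    // names as used on the slides

// Hypothetical helper: which zone does a physical address fall into?
enum zone zone_of( uint64_t phys_addr )
{
    if ( phys_addr <  16 * MB ) return ZONE_LOW;      // visible to legacy DMA
    if ( phys_addr < 896 * MB ) return ZONE_NORMAL;   // 16-MB up to 896-MB
    return ZONE_HIGH;                                 // everything above 896-MB
}
```

For example, address 0x00FFFFFF is the last byte of ZONE_LOW, and 0x01000000 (= 2^24) is the first byte of ZONE_NORMAL.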
‘HIGH’ memory
    Linux traditionally tried to ‘map’ as much physical memory as possible into virtual addresses allocated to the kernel-space.
    Before the days when systems had 1-GB or more of installed memory, Linux could linearly map ALL of the physical memory into the 1-GB ‘virtual’ kernel-region: 0xC0000000 – 0xFFFFFFFF
    But with 2-GB there’s not enough room!
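A ‘linear’ mapping simply adds a fixed base to each physical address. The sketch below assumes the conventional 32-bit x86 layout, in which kernel-space begins at 0xC0000000 and only the first 896-MB of physical memory is permanently mapped (the limit shown on the next slide). The names KERNEL_BASE, LINEAR_LIMIT and ‘linear_virt()’ are hypothetical; this is user-space arithmetic, not kernel code.

```c
#include <stdint.h>     // for uint32_t

#define KERNEL_BASE   0xC0000000u               // assumed start of kernel-space
#define LINEAR_LIMIT  (896u * 1024u * 1024u)    // assumed 896-MB limit

// Hypothetical helper: returns 1 and stores the kernel virtual address if
// 'phys' lies in the permanently mapped region; returns 0 if it is 'high'
// memory, which has no fixed kernel address.
int linear_virt( uint32_t phys, uint32_t *virt )
{
    if ( phys >= LINEAR_LIMIT ) return 0;
    *virt = KERNEL_BASE + phys;
    return 1;
}
```

The highest permanently mapped byte, physical 0x37FFFFFF, lands at virtual 0xF7FFFFFF, which leaves the top of the kernel-region free for other uses.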
The 896-MB limit
    [Figure: the virtual address-space (user space, 3-GB, where the application program’s code and data go; kernel space, 1-GB, at the high addresses) shown beside the physical address-space (installed ram, 2-GB). The first 896-MB of physical memory is always mapped; HIGH MEMORY above 896-MB is reached through temporary mappings using kmap().]
    A special pair of kernel-functions named ‘kmap()’ and ‘kunmap()’ can be called by device-drivers to temporarily map ‘vacant’ areas in the kernel’s ‘high’ address-space to pages of actual physical memory.
‘dram.c’ module-structure
    We will need three kernel header-files:
    – #include <linux/module.h>     // for printk(), register_chrdev(), unregister_chrdev()
    – #include <linux/highmem.h>    // for kmap(), kunmap(), and ‘num_physpages’
    – #include <asm/uaccess.h>      // for copy_to_user()
Our ‘dram_size’ global
    Our ‘init_module()’ function will compute the size of the installed physical memory.
    It will be stored in a global variable, so it can be accessed by our driver ‘methods’.
    It is computed from the kernel global ‘num_physpages’, using the PAGE_SIZE constant (=4096 for x86):
        dram_size = num_physpages * PAGE_SIZE;
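The multiplication is worth checking by hand: 2-GB of ram is 524288 pages of 4096 bytes. The sketch below is user-space arithmetic only, with hypothetical names (‘bytes_from_pages()’, DEMO_PAGE_SIZE). It uses a 64-bit result, on the assumption that one may also want the product to stay correct on machines with 4-GB or more, where a 32-bit product would wrap around.

```c
#include <stdint.h>     // for uint64_t

#define DEMO_PAGE_SIZE 4096u    // x86 page size, as stated on the slide

// Hypothetical helper: number of bytes covered by 'pages' page-frames.
// The cast widens the product to 64 bits before multiplying.
uint64_t bytes_from_pages( unsigned long pages )
{
    return (uint64_t)pages * DEMO_PAGE_SIZE;
}
```

With 524288 pages the result is 2147483648 bytes (0x80000000), i.e. exactly 2-GB.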
‘major’ ID-number Our ‘major’ device ID-number is needed when we ‘register’ our device-driver with the kernel (during initialization) and later when we ‘unregister’ our device-driver (during the cleanup procedure): int my_major = 85; // static ID-assignment
Our ‘file_operations’ Our ‘dram’ device-driver does not need to implement special ‘methods’ for doing the ‘open()’, ‘write()’, or ‘release()’ operations (the kernel ‘default’ operations will suffice) But we DO need to implement ‘read()’ and ‘llseek()’ methods Our ‘llseek()’ code here is very standard But ‘read()’ is specially crafted for DRAM
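The slides do not show the driver’s source at this point, so the following is only a user-space sketch of the kind of logic such methods commonly contain, not the actual ‘dram.c’ code. All names are hypothetical. ‘demo_llseek()’ computes a new file-pointer from ‘whence’; ‘demo_chunk()’ limits one transfer so that it neither crosses a page boundary nor runs past the end of memory, which is our assumption about how a read built on page-at-a-time ‘kmap()’ mappings would need to behave.

```c
#include <stdint.h>     // for int64_t, uint64_t
#include <stdio.h>      // for SEEK_SET, SEEK_CUR, SEEK_END

#define DEMO_PAGE_SIZE 4096u

// New file position for a device of 'size' bytes, or -1 if out of range.
int64_t demo_llseek( int64_t pos, int64_t size, int64_t offset, int whence )
{
    int64_t newpos;
    switch ( whence ) {
    case SEEK_SET: newpos = offset;         break;
    case SEEK_CUR: newpos = pos + offset;   break;
    case SEEK_END: newpos = size + offset;  break;
    default:       return -1;
    }
    if ( newpos < 0 || newpos > size ) return -1;
    return newpos;
}

// Number of bytes one read() call could transfer starting at 'pos'
// without leaving the current page or passing the end of the device.
uint64_t demo_chunk( uint64_t pos, uint64_t size, uint64_t count )
{
    if ( pos >= size ) return 0;                            // end-of-file
    uint64_t in_page = DEMO_PAGE_SIZE - ( pos % DEMO_PAGE_SIZE );
    if ( count > in_page )    count = in_page;              // stay in this page
    if ( count > size - pos ) count = size - pos;           // stay in the device
    return count;
}
```

Seeking to offset 0 from SEEK_END returns the device size, which is how an application can discover the amount of memory. A read that starts 96 bytes before a page boundary transfers at most 96 bytes, and the application simply calls ‘read()’ again for the rest.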
Using our driver We have provided a development tool on the class website (named ‘fileview.cpp’) which can be used to display the contents of arbitrary files -- including device-files! The data is shown in hex and ascii formats The arrow-keys can be used for navigation The enter-key allows an offset to be typed Keys ‘b’, ‘w’, ‘d’ and ‘q’ adjust data-widths
In-class exercise #1
    Install the ‘dram.ko’ device-driver module; then use ‘fileview’ to browse the contents of the processor’s physical memory:
        $ ./fileview /dev/dram
    Be sure to try the ‘b’, ‘w’, ‘d’, and ‘q’ keys, as well as the arrow-keys and the enter-key.
    Also try:
        $ ./fileview /dev/cmos
In-class exercise #2 The read() and write() callback functions in our ‘cmosram.c’ device-driver only transfer a single byte of data for each time called, so it takes 128 system-calls to read all of the RTC storage-locations! Can you improve this driver’s efficiency, by modifying its read() and write() functions, so they’ll transfer as many valid bytes as the supplied buffer’s space could hold?