BLOCK DRIVERS. Ted Baker, Andy Wang. CIS 4930 / COP 5641.

Presentation transcript:

BLOCK DRIVERS
Ted Baker, Andy Wang
CIS 4930 / COP 5641

TOPICS
Block drivers
Registration
Block device operations
Request processing
Other details

OVERVIEW OF DATA STRUCTURES

BLOCK DRIVERS
Provide access to devices that transfer randomly accessible data in blocks, or fixed-size chunks of data (e.g., 4KB)
Note that the underlying HW uses sectors (e.g., 512B)
Bridge core memory and secondary storage
Performance is essential, or the system cannot perform well
Lecture example: sbd (Simple Block Device), a ramdisk
driver-for-linux-kernel /

BLOCK DRIVER REGISTRATION
To register a block device, call
    int register_blkdev(unsigned int major, const char *name);
major: major device number; if 0, the kernel will allocate and return a new major number
name: as displayed in /proc/devices
To unregister, call
    int unregister_blkdev(unsigned int major, const char *name);

DISK REGISTRATION
register_blkdev obtains a major number, but does not make disk drives available to the system
Need additional mechanisms to register a disk
Need to know two data structures:
struct block_device_operations Defined in
struct gendisk Defined in

BLOCK DEVICE OPERATIONS
struct block_device_operations is similar to file_operations
Important fields
    /* may need to lock the door for removable media (unlock in the
       release method); may need to spin the disk up or down */
    int (*open) (struct block_device *dev, fmode_t mode);
    int (*release) (struct gendisk *gd, fmode_t mode);

BLOCK DEVICE OPERATIONS
    int (*ioctl) (struct block_device *bdev, fmode_t mode,
                  unsigned int cmd, unsigned long arg);
    /* check whether the media has been changed; gendisk represents a disk */
    int (*media_changed) (struct gendisk *gd);
    /* makes new media ready to use */
    int (*revalidate_disk) (struct gendisk *gd);

BLOCK DEVICE OPERATIONS
    int (*getgeo) (struct block_device *bdev, struct hd_geometry *geo);
    struct module *owner; /* = THIS_MODULE */

BLOCK DEVICE OPERATIONS
Note that there are no read or write operations
Reads and writes are handled by the request function
Will be discussed later

THE GENDISK STRUCTURE
struct gendisk represents a disk or a partition
Must initialize the following fields
    int major;
    int first_minor;
    /* need one minor number per partition */
    int minors;
    /* as shown in /proc/partitions & sysfs */
    char disk_name[32];

THE GENDISK STRUCTURE
    struct block_device_operations *fops;
    /* holds I/O requests for this device */
    struct request_queue *queue;
    /* set to GENHD_FL_REMOVABLE for removable media;
       GENHD_FL_CD for CD-ROMs */
    int flags;
    /* in 512B sectors; use set_capacity() */
    sector_t capacity;

THE GENDISK STRUCTURE
    /* pointer to internal data */
    void *private_data;

THE GENDISK STRUCTURE
To allocate, call
    struct gendisk *alloc_disk(int minors);
minors: number of minor numbers for this disk; cannot be changed later
To make the disk available to the system, call
    void add_disk(struct gendisk *gd);
To make the disk unavailable, call
    void del_gendisk(struct gendisk *gd);
To release the structure once it is no longer needed, call
    void put_disk(struct gendisk *gd);

INITIALIZATION IN SBD
Allocate a major device number
    ...
    major_num = register_blkdev(major_num, "sbd");
    if (major_num <= 0) {
        /* error handling */
    }
    ...

SBD DATA STRUCTURE
    struct sbd_device {
        int size;            /* device size in bytes */
        u8 *data;
        spinlock_t lock;
        struct gendisk *gd;
    } Device;

SBD DATA STRUCTURE INITIALIZATION
    ...
    spin_lock_init(&Device.lock);
    Device.size = nsectors * logical_block_size;
    Device.data = vmalloc(Device.size);
    if (Device.data == NULL) {
        printk(KERN_NOTICE "vmalloc failure.\n");
        return -ENOMEM;
    }
    /* sbd_request is the request function */
    Queue = blk_init_queue(sbd_request, &Device.lock);
    if (Queue == NULL) {
        /* error handling: out of memory */
    }
    ...

INSTALL THE GENDISK STRUCTURE
    ...
    Device.gd = alloc_disk(16);
    if (!Device.gd) {
        /* error handling */
    }
    Device.gd->major = major_num;
    Device.gd->first_minor = 0;
    Device.gd->fops = &sbd_ops;
    Device.gd->queue = Queue;
    Device.gd->private_data = &Device;
    ...

INSTALL THE GENDISK STRUCTURE
    ...
    snprintf(Device.gd->disk_name, 32, "sbd%c", which + 'a');
    set_capacity(Device.gd, nsectors * (hardsect_size / KERNEL_SECTOR_SIZE));
    add_disk(Device.gd);
    ...
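The capacity passed to set_capacity() is counted in 512B kernel sectors, whatever sector size the device itself uses. The following is a minimal userspace sketch of that conversion, not kernel code; the function name capacity_in_kernel_sectors is hypothetical, and it assumes the device sector size is a multiple of 512.

```c
#define KERNEL_SECTOR_SIZE 512UL

/* Hypothetical helper: convert a size given in device sectors of
 * hardsect_size bytes into 512-byte kernel sectors.
 * Assumes hardsect_size is a multiple of KERNEL_SECTOR_SIZE. */
static unsigned long capacity_in_kernel_sectors(unsigned long nsectors,
                                                unsigned long hardsect_size)
{
    return nsectors * (hardsect_size / KERNEL_SECTOR_SIZE);
}
```

For example, a device with 1024 sectors of 4096 bytes each would be reported as 8192 kernel sectors.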

SUPPORTING REMOVABLE MEDIA
To check whether the media has been changed, call
    int sbd_media_changed(struct gendisk *gd)
    {
        struct sbd_device *dev = gd->private_data;
        return Device.media_change;
    }
To prepare the driver for the new media, call
    int sbd_revalidate(struct gendisk *gd)
    {
        struct sbd_device *dev = gd->private_data;
        if (Device.media_change) {
            Device.media_change = 0;
            memset(Device.data, 0, Device.size);
        }
        return 0;
    }

SBD IOCTL
See drivers/block/ioctl.c for built-in commands
To support fdisk and partitions, a driver needs to provide disk geometry information
The kernel has a dedicated block device operation for this, called getgeo, so it is no longer an ioctl call

SBD GETGEO
    int sbd_getgeo(struct block_device *bdev, struct hd_geometry *geo)
    {
        long size;

        size = Device.size * (logical_block_size / KERNEL_SECTOR_SIZE);
        /* 4 heads * 16 sectors = 64 sectors per cylinder:
           clear the low 6 bits, then divide by 64 */
        geo->cylinders = (size & ~0x3f) >> 6;
        geo->heads = 4;
        geo->sectors = 16;
        geo->start = 0;
        return 0;
    }
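A ramdisk has no real geometry, so the driver invents one. With 4 heads and 16 sectors per track, one cylinder holds 64 sectors, so the cylinder count is the size divided by 64, which means clearing the low six bits with ~0x3f and shifting right by 6. The following is a userspace sketch of that arithmetic; struct fake_geometry and fill_fake_geometry are hypothetical names, and the size is assumed to be given in 512B sectors.

```c
/* Hypothetical userspace stand-in for struct hd_geometry. */
struct fake_geometry {
    unsigned char heads;
    unsigned char sectors;
    unsigned long cylinders;
    unsigned long start;
};

/* Fill in a made-up geometry for a disk of size_in_sectors 512B sectors. */
static void fill_fake_geometry(unsigned long size_in_sectors,
                               struct fake_geometry *geo)
{
    geo->heads = 4;
    geo->sectors = 16;
    /* 4 * 16 = 64 sectors per cylinder; a partial cylinder is dropped */
    geo->cylinders = (size_in_sectors & ~0x3fUL) >> 6;
    geo->start = 0;
}
```

A 2048-sector disk then reports 32 cylinders, and 4 * 16 * 32 covers the whole disk.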

THE ANATOMY OF A REQUEST
The bio structure
Contains everything that a block driver needs to carry out an IO request
Defined in
Some important fields
    /* the first sector in this transfer */
    sector_t bi_sector;
    /* size of transfer in bytes */
    unsigned int bi_size;

THE ANATOMY OF A REQUEST
    /* use bio_data_dir(bio) to check the direction of IOs */
    unsigned long bi_flags;
    /* number of segments within this bio */
    unsigned short bi_phys_segments;

    struct bio_vec {
        struct page *bv_page;
        unsigned int bv_offset; /* within a page */
        unsigned int bv_len;    /* of this transfer */
    };

THE BIO STRUCTURE

For portability, use macros to operate on bio_vec
    int segno;
    struct bio_vec *bvec;

    bio_for_each_segment(bvec, bio, segno) {
        /* bvec points to the current bio_vec entry */
        /* Do something with this segment */
    }
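To make the iteration concrete, here is a userspace model of walking the segments of one bio and adding up their lengths. It is only a sketch: the model_ structures and the model_for_each_segment macro are hypothetical stand-ins that mimic the shape of bio_vec and bio_for_each_segment, and the bi_vcnt and bi_io_vec field names are assumptions not shown on the slides.

```c
#include <stddef.h>

/* Hypothetical userspace stand-ins for struct bio_vec and struct bio. */
struct model_bio_vec {
    void *bv_page;          /* page holding the data (opaque here) */
    unsigned int bv_offset; /* offset within the page */
    unsigned int bv_len;    /* bytes in this segment */
};

struct model_bio {
    unsigned short bi_vcnt;          /* number of segments (assumed name) */
    struct model_bio_vec *bi_io_vec; /* segment array (assumed name) */
};

/* Same calling shape as bio_for_each_segment(bvec, bio, segno). */
#define model_for_each_segment(bvec, bio, i)              \
    for ((i) = 0, (bvec) = (bio)->bi_io_vec;              \
         (i) < (bio)->bi_vcnt;                            \
         (i)++, (bvec)++)

/* Sum the lengths of all segments in one bio. */
static unsigned int model_bio_bytes(struct model_bio *bio)
{
    struct model_bio_vec *bvec;
    int segno;
    unsigned int total = 0;

    model_for_each_segment(bvec, bio, segno)
        total += bvec->bv_len;
    return total;
}
```

A real driver would copy or map each segment inside the loop body rather than only counting bytes.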

LOW-LEVEL BIO OPERATIONS
To access the pages directly, use
    char *__bio_kmap_atomic(struct bio *bio, int i, enum km_type type);
    void __bio_kunmap_atomic(char *buffer, enum km_type type);

LOW-LEVEL BIO MACROS
    /* returns the page to be transferred next */
    struct page *bio_page(struct bio *bio);
    /* returns the offset within the current page to be transferred */
    int bio_offset(struct bio *bio);
    /* returns a kernel logical (shifted) address pointing to the data
       to be transferred; the address should not be in high memory */
    char *bio_data(struct bio *bio);

LOW-LEVEL BIO MACROS
    /* returns a kernel virtual (page-table-mapped) address pointing to
       the data to be transferred; the address can be in either high or
       low memory; atomic; can only map one segment at a time */
    char *bio_kmap_irq(struct bio *bio, unsigned long *flags);
    void bio_kunmap_irq(char *buffer, unsigned long *flags);

THE REQUEST STRUCTURE
A request structure is implemented as a linked list of bio structures, with some additional info
Some important fields
    /* first sector that has not been transferred */
    sector_t __sector;
    /* amount of data yet to transfer, in bytes */
    unsigned int __data_len;

THE REQUEST STRUCTURE
    /* linked list of bios, access via __rq_for_each_bio */
    struct bio *bio;
    /* same as calling bio_data() on current bio */
    char *buffer;

THE REQUEST STRUCTURE
    /* number of segments after merging */
    unsigned short nr_phys_segments;
    struct list_head queuelist;
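Since a request is a linked list of bios plus bookkeeping, its total size is what remains after walking the chain. The following userspace model sketches that walk. The model_ structures and functions are hypothetical; only the bi_size and bio field names come from the slides, and bi_next is an assumed link name.

```c
#include <stddef.h>

/* Hypothetical userspace stand-in for struct bio: only size and link. */
struct model_bio {
    unsigned int bi_size;      /* size of this bio's transfer in bytes */
    struct model_bio *bi_next; /* next bio in the request (assumed name) */
};

/* Hypothetical stand-in for struct request. */
struct model_request {
    unsigned long long sector; /* first sector not yet transferred */
    struct model_bio *bio;     /* head of the bio list */
};

/* Walk the bio list and add up the bytes still to be transferred. */
static unsigned int model_rq_bytes(const struct model_request *req)
{
    unsigned int total = 0;
    const struct model_bio *bio;

    for (bio = req->bio; bio != NULL; bio = bio->bi_next)
        total += bio->bi_size;
    return total;
}

/* The same amount expressed in 512-byte sectors. */
static unsigned int model_rq_sectors(const struct model_request *req)
{
    return model_rq_bytes(req) >> 9;
}
```

The kernel keeps this total up to date in the request itself, so a driver does not normally need to recompute it.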

THE REQUEST STRUCTURE

REQUEST QUEUES
struct request_queue or request_queue_t
Include
Keep track of pending block IO requests
Create requests with proper parameters
    Maximum size, segments
    Hardware sector size
    Alignment requirement
Allow the use of multiple IO schedulers
    Maximize performance in device-specific ways
    Sort blocks
    Apply deadlines
    Merge adjacent requests

QUEUE CREATION AND DELETION
To create and initialize a queue, call
    request_queue_t *blk_init_queue(request_fn_proc *request, spinlock_t *lock);
request is the request function
The spinlock controls access to the queue
Need to check for out-of-memory errors
To deallocate a queue, call
    void blk_cleanup_queue(request_queue_t *);

QUEUEING FUNCTIONS
Need to hold the queue lock
To get a reference to the next request while leaving it in the queue, call
    struct request *blk_peek_request(request_queue_t *queue);
To remove a peeked request from the queue, call
    void blk_start_request(struct request *req);
Used when a driver operates on multiple requests from a queue concurrently
To do both in one step, call
    struct request *blk_fetch_request(request_queue_t *queue);

QUEUEING FUNCTIONS
To put a dequeued request back, call
    void blk_requeue_request(request_queue_t *queue, struct request *req);
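The three queue operations (look at the next request, take it off the queue, put a dequeued request back) can be modeled in userspace with a small ring buffer. This is a sketch only: every model_ name and the fixed QUEUE_DEPTH are hypothetical, and the locking that the kernel requires around these calls is left out.

```c
#include <stddef.h>

#define QUEUE_DEPTH 8 /* hypothetical fixed capacity for the model */

struct model_request { int id; };

struct model_queue {
    struct model_request *slots[QUEUE_DEPTH];
    size_t head;  /* index of the next request to hand out */
    size_t count; /* number of queued requests */
};

/* Add a new request at the tail; returns -1 if the queue is full. */
static int model_enqueue(struct model_queue *q, struct model_request *req)
{
    if (q->count == QUEUE_DEPTH)
        return -1;
    q->slots[(q->head + q->count) % QUEUE_DEPTH] = req;
    q->count++;
    return 0;
}

/* Return the next request but leave it in the queue. */
static struct model_request *model_peek(const struct model_queue *q)
{
    return q->count ? q->slots[q->head] : NULL;
}

/* Return the next request and remove it from the queue. */
static struct model_request *model_fetch(struct model_queue *q)
{
    struct model_request *req = model_peek(q);

    if (req != NULL) {
        q->head = (q->head + 1) % QUEUE_DEPTH;
        q->count--;
    }
    return req;
}

/* Put a dequeued request back at the head so it is handed out next. */
static int model_requeue(struct model_queue *q, struct model_request *req)
{
    if (q->count == QUEUE_DEPTH)
        return -1;
    q->head = (q->head + QUEUE_DEPTH - 1) % QUEUE_DEPTH;
    q->slots[q->head] = req;
    q->count++;
    return 0;
}
```

Requeueing at the head rather than the tail keeps the request's original position, which is the point of putting it back instead of submitting it again.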

QUEUE CONTROL FUNCTIONS
    /* if a device cannot handle more pending requests, call */
    void blk_stop_queue(request_queue_t *queue);
    /* to restart the queue, call */
    void blk_start_queue(request_queue_t *queue);
    /* set the highest physical address to which a device can perform
       DMA; the address can also be BLK_BOUNCE_HIGH, BLK_BOUNCE_ISA,
       or BLK_BOUNCE_ANY */
    void blk_queue_bounce_limit(request_queue_t *queue, u64 dma_addr);

MORE QUEUE CONTROL FUNCTIONS
    /* max in sectors */
    void blk_queue_max_sectors(request_queue_t *queue, unsigned short max);
    /* for scatter gather */
    void blk_queue_max_phys_segments(request_queue_t *queue, unsigned short max);
    void blk_queue_max_hw_segments(request_queue_t *queue, unsigned short max);
    /* in bytes */
    void blk_queue_max_segment_size(request_queue_t *queue, unsigned int max);

YET MORE QUEUE CONTROL FUNCTIONS
    /* if a device cannot cross a 4MB boundary, use 0x3fffff as mask */
    void blk_queue_segment_boundary(request_queue_t *queue, unsigned long mask);
    void blk_queue_dma_alignment(request_queue_t *queue, int mask);
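Both masks are "low bits" masks: 0x3fffff covers the offset within a 4MB window, and an alignment mask such as 511 covers the offset within a 512-byte unit. The following userspace sketch shows how such masks can be tested against an address. The function names are hypothetical, and this illustrates the arithmetic only, not how the block layer applies the limits internally.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the byte range [addr, addr + len) spans more than one window
 * of size (mask + 1). Two addresses lie in the same window when they
 * agree on every bit above the mask. Assumes len > 0. */
static bool crosses_boundary(uint64_t addr, uint64_t len, uint64_t mask)
{
    return (addr | mask) != ((addr + len - 1) | mask);
}

/* True if addr has none of the mask bits set, i.e. it is aligned to
 * (mask + 1) bytes. */
static bool is_aligned(uint64_t addr, uint64_t mask)
{
    return (addr & mask) == 0;
}
```

With mask 0x3fffff, a 4KB segment starting at 0x3ff000 ends exactly at the 4MB boundary and does not cross it, while one more byte would.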

REQUEST COMPLETION FUNCTIONS
After a device has completed transferring the current request chunk, call
    bool __blk_end_request_cur(struct request *req, int error);
Indicates that the driver has finished transferring the current chunk of the request
Returns false if all sectors in this request have been transferred and the request is complete
Returns true if there are still buffers pending
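The return convention is easy to get backwards: true means "not done yet". The following userspace model sketches the bookkeeping behind it, assuming a request that is completed one fixed-size chunk at a time. struct model_request and model_end_request_cur are hypothetical names, and the real function also handles error reporting, which is omitted here.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for the parts of a request that
 * completion updates. */
struct model_request {
    unsigned int cur_sectors;   /* sectors in the current chunk */
    unsigned int sectors_left;  /* sectors not yet transferred */
    unsigned long long pos;     /* first sector not yet transferred */
};

/* Account for the current chunk as transferred.
 * Returns true while sectors are still pending, false once the whole
 * request is complete. */
static bool model_end_request_cur(struct model_request *req)
{
    unsigned int done = req->cur_sectors < req->sectors_left
                            ? req->cur_sectors
                            : req->sectors_left;

    req->pos += done;
    req->sectors_left -= done;
    return req->sectors_left > 0;
}
```

A 20-sector request processed in 8-sector chunks returns true twice and false on the third call, at which point the driver moves on to the next request.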

REQUEST PROCESSING
Every device is associated with a queue
To read or write a block device, the kernel calls the driver's request function
    void request(request_queue_t *queue);
Runs in an atomic context
Cannot access the current process
May return before completing the request

WORKING WITH SBD BIOS
    static void sbd_request(request_queue_t *q)
    {
        struct request *req;

        req = blk_fetch_request(q);
        while (req != NULL) {
            /* skip non-fs request */
            if (!blk_fs_request(req)) {
                __blk_end_request_all(req, -EIO);
                /* the ended request must not be reused:
                   fetch the next one before continuing */
                req = blk_fetch_request(q);
                continue;
            }

WORKING WITH SBD BIOS
            sbd_transfer(&Device, blk_rq_pos(req), blk_rq_cur_sectors(req),
                         req->buffer, rq_data_dir(req));
            /* once the current request is complete, fetch the next one */
            if (!__blk_end_request_cur(req, 0)) {
                req = blk_fetch_request(q);
            }
        }
    }

SBD_TRANSFER
    static void sbd_transfer(struct sbd_device *dev, sector_t sector,
                             unsigned long nsect, char *buffer, int write)
    {
        unsigned long offset = sector * logical_block_size;
        unsigned long nbytes = nsect * logical_block_size;

SBD_TRANSFER
        if ((offset + nbytes) > dev->size) {
            /* error: write beyond the limit */
            return;
        }
        if (write)
            memcpy(dev->data + offset, buffer, nbytes);
        else
            memcpy(buffer, dev->data + offset, nbytes);
    }
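The transfer logic itself does not depend on the kernel, so it can be tried out in userspace. The sketch below mirrors the bounds check and the two memcpy directions. The names struct ram_disk and ram_disk_transfer, the fixed LOGICAL_BLOCK_SIZE of 512, and the int return code are assumptions made for this example rather than part of sbd.

```c
#include <string.h>

#define LOGICAL_BLOCK_SIZE 512UL /* assumed sector size for the model */

/* Hypothetical userspace stand-in for the sbd device structure. */
struct ram_disk {
    unsigned long size;  /* device size in bytes */
    unsigned char *data; /* backing memory */
};

/* Copy nsect sectors starting at sector between the disk and buffer.
 * write != 0 copies buffer to disk, otherwise disk to buffer.
 * Returns 0 on success, -1 if the range is outside the device. */
static int ram_disk_transfer(struct ram_disk *dev, unsigned long sector,
                             unsigned long nsect, char *buffer, int write)
{
    unsigned long offset = sector * LOGICAL_BLOCK_SIZE;
    unsigned long nbytes = nsect * LOGICAL_BLOCK_SIZE;

    if (offset + nbytes > dev->size)
        return -1; /* access beyond the end of the device */

    if (write)
        memcpy(dev->data + offset, buffer, nbytes);
    else
        memcpy(buffer, dev->data + offset, nbytes);
    return 0;
}
```

Returning an error code instead of silently returning makes the out-of-range case visible to the caller; this model does not guard against arithmetic overflow for very large sector numbers.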

BARRIER REQUESTS
Reordering can be problematic
Databases must be sure that their journals are flushed to storage
Barrier requests
If a request is marked with the REQ_HARDBARRIER flag, it must be written to the storage before the next request is initiated
A driver needs to force HW caches to flush

BARRIER REQUESTS
To indicate driver support of barrier requests, use
    void blk_queue_ordered(request_queue_t *queue, int flag, prepare_flush_fn *pff);
Set the flag to nonzero
To test this flag, call
    int blk_barrier_rq(struct request *req);
Returns nonzero for a barrier request