A Wander through GHC’s New IO library Simon Marlow.

1 A Wander through GHC’s New IO library Simon Marlow

2 The 100-mile view
The API changes:
– Unicode: putStr "A légpárnás hajóm tele van angolnákkal" works! (if your editor is set up right…) The string is Hungarian for "My hovercraft is full of eels".
– The locale encoding is used by default, except for Handles in binary mode (openBinaryFile, hSetBinaryMode).
– The encoding can be changed on the fly:

    hSetEncoding :: Handle -> TextEncoding -> IO ()
    hGetEncoding :: Handle -> IO (Maybe TextEncoding)   -- Nothing for a binary-mode Handle

    data TextEncoding
    latin1, utf8, utf16, utf32, … :: TextEncoding
    mkTextEncoding :: String -> IO TextEncoding
    localeEncoding :: TextEncoding
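
To see the encoding API in action, here is a small sketch (not from the talk) that writes the slide's Hungarian string through Handles set to different encodings and measures the bytes that reach the disk. The helper encodedSize and the scratch file name are hypothetical; hSetEncoding, hGetEncoding, mkTextEncoding, utf8, latin1 and utf16le are real System.IO exports (utf16le is one of the "…" variants above).

```haskell
import System.IO

-- The string from the slide (Hungarian for "My hovercraft is full of eels").
eels :: String
eels = "A légpárnás hajóm tele van angolnákkal"

-- Hypothetical helper: write a String through a Handle using the given
-- encoding, then report how many bytes ended up in the file.
encodedSize :: FilePath -> TextEncoding -> String -> IO Integer
encodedSize path enc s = do
  withFile path WriteMode $ \h -> do
    hSetEncoding h enc          -- switch codec before any output
    hPutStr h s
  withBinaryFile path ReadMode hFileSize

main :: IO ()
main = do
  let path = "encoding-demo.txt"   -- hypothetical scratch file
  sizes <- mapM (\enc -> encodedSize path enc eels) [utf8, latin1, utf16le]
  print sizes                      -- [43,38,76]
  -- Encodings can also be looked up by name at run time.
  enc <- mkTextEncoding "UTF-8"
  withFile path WriteMode $ \h -> do
    hSetEncoding h enc
    hGetEncoding h >>= print       -- the Handle reports its current codec
  -- A binary-mode Handle has no encoding at all.
  withBinaryFile path ReadMode $ \h -> hGetEncoding h >>= print   -- Nothing
```

The 38 characters fit in one byte each under Latin-1; UTF-8 needs a second byte for each of the five accented letters, and UTF-16LE uses two bytes for every character.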

3 The 100-mile view (cont.)
Better newline support: teletypes needed both CR and LF to start a new line, and we’ve been paying for it ever since.

    hSetNewlineMode :: Handle -> NewlineMode -> IO ()

    data Newline = LF    -- "\n"
                 | CRLF  -- "\r\n"
    nativeNewline :: Newline   -- CRLF on Windows, LF elsewhere

    data NewlineMode = NewlineMode { inputNL :: Newline, outputNL :: Newline }

    noNewlineTranslation = NewlineMode { inputNL = LF,            outputNL = LF }
    universalNewlineMode = NewlineMode { inputNL = CRLF,          outputNL = nativeNewline }
    nativeNewlineMode    = NewlineMode { inputNL = nativeNewline, outputNL = nativeNewline }

With inputNL = CRLF, "\r\n" in the input is read as '\n' while a bare '\n' is still accepted, which is what lets universalNewlineMode read text files using either convention.
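
A minimal sketch of newline translation in both directions, assuming a writable scratch file: text is written with outputNL = CRLF, then read back raw (binary mode), with universalNewlineMode, and with noNewlineTranslation. The helpers writeCRLF and readWith are hypothetical.

```haskell
import System.IO

-- Hypothetical helper: write text with CRLF output translation.
writeCRLF :: FilePath -> String -> IO ()
writeCRLF path s = withFile path WriteMode $ \h -> do
  hSetNewlineMode h (NewlineMode { inputNL = LF, outputNL = CRLF })
  hPutStr h s                       -- each '\n' is written as "\r\n"

-- Hypothetical helper: read a file strictly with the given newline mode,
-- or in binary mode (no decoding, no newline translation) given Nothing.
readWith :: Maybe NewlineMode -> FilePath -> IO String
readWith mode path = withOpen $ \h -> do
    s <- hGetContents h
    length s `seq` return s         -- force the lazy read before the Handle closes
  where
    withOpen k = case mode of
      Nothing -> withBinaryFile path ReadMode k
      Just nl -> withFile path ReadMode (\h -> hSetNewlineMode h nl >> k h)

main :: IO ()
main = do
  let path = "newline-demo.txt"     -- hypothetical scratch file
  writeCRLF path "one\ntwo\n"
  readWith Nothing path >>= print                      -- "one\r\ntwo\r\n"
  readWith (Just universalNewlineMode) path >>= print  -- "one\ntwo\n"
  readWith (Just noNewlineTranslation) path >>= print  -- "one\r\ntwo\r\n"
```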

4 The 10-mile view
Unicode codecs:
– built-in codecs for UTF-8, UTF-16 (LE, BE) and UTF-32 (LE, BE)
– other codecs use iconv on Unix systems
– built-in codecs only on Windows (no code pages) yet…
– the pieces for building a codec are provided…
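
The built-in codecs differ in how many bytes they emit per character, which is easiest to see with a character outside the Basic Multilingual Plane. A sketch, assuming a writable scratch file; encodeVia and report are hypothetical helpers, while utf8, utf16be, utf16le and utf32be are System.IO exports.

```haskell
import Data.Char (ord)
import System.IO
import Text.Printf (printf)

-- Hypothetical helper: encode a String with a built-in codec by writing it
-- through a Handle, then read the file back as raw bytes.
encodeVia :: TextEncoding -> String -> IO [Int]
encodeVia enc s = do
  let path = "codec-demo.bin"       -- hypothetical scratch file
  withFile path WriteMode $ \h -> hSetEncoding h enc >> hPutStr h s
  withBinaryFile path ReadMode $ \h -> do
    bytes <- hGetContents h         -- binary mode: one Char per byte
    let xs = map ord bytes
    sum xs `seq` return xs          -- force the lazy read before closing

main :: IO ()
main = mapM_ report
  [ ("UTF-8   ", utf8)      -- F0 9F 98 80
  , ("UTF-16BE", utf16be)   -- D8 3D DE 00 (a surrogate pair)
  , ("UTF-16LE", utf16le)   -- 3D D8 00 DE
  , ("UTF-32BE", utf32be)   -- 00 01 F6 00
  ]
  where
    report (name, enc) = do
      bs <- encodeVia enc "\x1F600"   -- U+1F600, outside the BMP
      putStrLn (name ++ ": " ++ unwords (map (printf "%02X") bs))
```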

5 The 10-mile view
Build your own codec: the API in GHC.IO.Encoding

    data BufferCodec from to state = BufferCodec {
      encode   :: Buffer from -> Buffer to -> IO (Buffer from, Buffer to),
      close    :: IO (),
      getState :: IO state,
      setState :: state -> IO ()
    }

    type TextEncoder state = BufferCodec Char Word8 state
    type TextDecoder state = BufferCodec Word8 Char state

    data TextEncoding = forall dstate estate. TextEncoding {
      mkTextDecoder :: IO (TextDecoder dstate),
      mkTextEncoder :: IO (TextEncoder estate)
    }

Saving and restoring state is important since Handles support buffering, random access, and changing encodings.
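
Why getState/setState matter can be shown with a toy model. This is not GHC's API: plain lists stand in for Buffer, and ToyCodec, Endian and newUtf16Decoder are hypothetical names. The toy decoder learns the byte order from a UTF-16 byte-order mark and keeps it as state, so a decoder that resumes mid-stream without that state misreads the same bytes.

```haskell
import Data.Char (chr)
import Data.IORef
import Data.Word (Word8)

-- Hypothetical, simplified analogue of BufferCodec: lists stand in for
-- Buffers, and the codec's state lives in an IORef.
data ToyCodec from to state = ToyCodec
  { toyEncode   :: [from] -> IO ([from], [to])  -- (unconsumed input, output)
  , toyGetState :: IO state
  , toySetState :: state -> IO ()
  }

data Endian = Unknown | BE | LE deriving (Eq, Show)

-- Toy UTF-16 decoder (BMP only, no surrogate pairs) whose state is the
-- byte order learned from the byte-order mark at the start of the stream.
newUtf16Decoder :: IO (ToyCodec Word8 Char Endian)
newUtf16Decoder = do
  ref <- newIORef Unknown
  let unit :: Word8 -> Word8 -> Char
      unit hi lo = chr (fromIntegral hi * 256 + fromIntegral lo)

      go :: [Char] -> [Word8] -> IO ([Word8], [Char])
      go acc (a : b : rest) = do
        e <- readIORef ref
        case (e, a, b) of
          (Unknown, 0xFE, 0xFF) -> writeIORef ref BE >> go acc rest
          (Unknown, 0xFF, 0xFE) -> writeIORef ref LE >> go acc rest
          (Unknown, _, _)       -> writeIORef ref BE >> go (unit a b : acc) rest
          (LE, _, _)            -> go (unit b a : acc) rest
          (BE, _, _)            -> go (unit a b : acc) rest
      go acc leftover = return (leftover, reverse acc)  -- odd byte is left over

  return ToyCodec { toyEncode = go [], toyGetState = readIORef ref
                  , toySetState = writeIORef ref }

main :: IO ()
main = do
  dec <- newUtf16Decoder
  (_, s1) <- toyEncode dec [0xFF, 0xFE, 0x68, 0x00]   -- LE BOM, then 'h'
  saved <- toyGetState dec                             -- LE
  -- A second decoder that starts mid-stream, without the saved state,
  -- assumes big-endian and misreads 'i' as U+6900.
  fresh <- newUtf16Decoder
  (_, wrong) <- toyEncode fresh [0x69, 0x00]
  -- Restoring the saved state makes the same bytes decode correctly.
  toySetState fresh saved
  (_, right) <- toyEncode fresh [0x69, 0x00]
  print (s1, saved, wrong, right)   -- ("h",LE,"\26880","i")
```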

6 The 1-mile view
Make your own Handles! (Why mkFileHandle, not mkHandle?)

    mkFileHandle :: (IODevice dev, BufferedIO dev, Typeable dev)
                 => dev
                 -> FilePath
                 -> IOMode
                 -> Maybe TextEncoding
                 -> NewlineMode
                 -> IO Handle

– IODevice: type class providing I/O device operations: close, seek, getSize, …
– BufferedIO: type class providing buffered reading/writing
– Typeable: in case we need to take the Handle apart again later
– FilePath: for error messages
– IOMode: ReadMode/WriteMode/…
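
mkFileHandle can be tried out with the device GHC itself ships, the file descriptor from GHC.IO.FD, which is roughly what openFile does internally. A sketch, assuming a recent base in which GHC.IO.FD.openFile takes an extra Bool (open in non-blocking mode) and returns the device type alongside the FD, and in which GHC.IO.Handle exports mkFileHandle; these GHC-internal signatures have changed between releases.

```haskell
import qualified GHC.IO.FD as FD
import GHC.IO.Handle (mkFileHandle)
import System.IO

main :: IO ()
main = do
  let path = "fd-demo.txt"                     -- hypothetical scratch file
  -- Open a raw file descriptor: a device with IODevice and BufferedIO instances.
  (fd, _devType) <- FD.openFile path WriteMode False
  -- Wrap it in a Handle with an explicit codec and newline mode.
  h <- mkFileHandle fd path WriteMode (Just utf8) nativeNewlineMode
  hPutStrLn h "written through a hand-built Handle"
  hClose h
  readFile path >>= putStr
```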

7 IODevice

    -- | I/O operations required for implementing a 'Handle'.
    class IODevice a where
      -- | closes the device.  Further operations on the device should
      -- produce exceptions.
      close :: a -> IO ()

      -- | seek to the specified position in the data.
      seek :: a -> SeekMode -> Integer -> IO ()
      seek _ _ _ = ioe_unsupportedOperation

      -- | return the current position in the data.
      tell :: a -> IO Integer
      tell _ = ioe_unsupportedOperation

      -- | returns 'True' if the device is a terminal or console.
      isTerminal :: a -> IO Bool
      isTerminal _ = return False

      … etc …

The default is for an operation to be unsupported.

8 BufferedIO

    class BufferedIO dev where
      newBuffer         :: dev -> BufferState -> IO (Buffer Word8)
      fillReadBuffer    :: dev -> Buffer Word8 -> IO (Int, Buffer Word8)
      fillReadBuffer0   :: dev -> Buffer Word8 -> IO (Maybe Int, Buffer Word8)
      emptyWriteBuffer  :: dev -> Buffer Word8 -> IO (Buffer Word8)
      flushWriteBuffer  :: dev -> Buffer Word8 -> IO (Buffer Word8)
      flushWriteBuffer0 :: dev -> Buffer Word8 -> IO (Int, Buffer Word8)

– The device gets to allocate the buffer. This allows the device to make the buffer point directly at the data in memory, for example.
– The 0-versions are non-blocking; the non-0 versions must read or write at least one byte (but may transfer less than the whole buffer).

9 RawIO

    -- | A low-level I/O provider where the data is bytes in memory.
    class RawIO a where
      read             :: a -> Ptr Word8 -> Int -> IO Int
      readNonBlocking  :: a -> Ptr Word8 -> Int -> IO (Maybe Int)
      write            :: a -> Ptr Word8 -> Int -> IO ()
      writeNonBlocking :: a -> Ptr Word8 -> Int -> IO Int

    readBuf             :: RawIO dev => dev -> Buffer Word8 -> IO (Int, Buffer Word8)
    readBufNonBlocking  :: RawIO dev => dev -> Buffer Word8 -> IO (Maybe Int, Buffer Word8)
    writeBuf            :: RawIO dev => dev -> Buffer Word8 -> IO ()
    writeBufNonBlocking :: RawIO dev => dev -> Buffer Word8 -> IO (Int, Buffer Word8)

10 Example: a memory-mapped Handle
Random-access read/write doesn’t perform very well with ordinary buffered I/O.
– Let’s implement a Handle backed by a memory-mapped file.
– We need to:
  1. define our device type
  2. make it an instance of IODevice and BufferedIO
  3. provide a way to create instances

11 Example: memory-mapped files
1. Define our device type

    data MemoryMappedFile = MemoryMappedFile {
      mmap_fd     :: FD,             -- ordinary file descriptor, provided by GHC.IO.FD
      mmap_addr   :: !(Ptr Word8),   -- address in memory where our file is mapped,
      mmap_length :: !Int,           --   and its length
      mmap_ptr    :: !(IORef Int)    -- the current file pointer
    } deriving Typeable

– mmap_ptr: Handles have a built-in notion of the "current position" that we have to emulate.
– Typeable is one of the requirements for making a Handle.

12 Aside: Buffers

    module GHC.IO.Buffer ( Buffer(..), ... ) where

    data Buffer e = Buffer {
      bufRaw   :: !(ForeignPtr e),
      bufState :: BufferState,   -- ReadBuffer | WriteBuffer
      bufSize  :: !Int,          -- in elements, not bytes
      bufL     :: !Int,          -- offset of first item in the buffer
      bufR     :: !Int           -- offset of last item + 1
    }

[Diagram: bufRaw holds bufSize elements; the valid data lies between offsets bufL and bufR.]
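
The bufL/bufR bookkeeping can be exercised directly. A sketch, assuming the helpers newByteBuffer, writeWord8Buf, readWord8Buf, bufferAdd, bufferRemove, bufferElems and bufferAvailable that GHC.IO.Buffer exports in current base (GHC internals, subject to change): data is written at bufR and consumed from bufL, and the buffer holds bufR - bufL elements.

```haskell
import Data.Word (Word8)
import GHC.IO.Buffer

main :: IO ()
main = do
  -- A 16-byte write buffer; bufL = bufR = 0, so it starts empty.
  buf0 <- newByteBuffer 16 WriteBuffer
  -- Producer side: store bytes at bufR, bufR+1, ... then advance bufR.
  let bytes = [104, 105, 33] :: [Word8]            -- "hi!"
  mapM_ (\(i, w) -> writeWord8Buf (bufRaw buf0) (bufR buf0 + i) w) (zip [0 ..] bytes)
  let buf1 = bufferAdd (length bytes) buf0
  print (bufL buf1, bufR buf1, bufferElems buf1, bufferAvailable buf1)  -- (0,3,3,13)
  -- Consumer side: read the byte at bufL, then advance bufL.
  w <- readWord8Buf (bufRaw buf1) (bufL buf1)
  let buf2 = bufferRemove 1 buf1
  print (w, bufL buf2, bufR buf2, bufferElems buf2)                     -- (104,1,3,2)
```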

13 Example: memory-mapped files
2. (a) Make it an instance of BufferedIO

    instance BufferedIO MemoryMappedFile where
      newBuffer m state = do
        fp <- newForeignPtr_ (mmap_addr m)
        return (emptyBuffer fp (mmap_length m) state)

      -- fillReadBuffer returns the entire file!
      fillReadBuffer m buf = do
        p <- readIORef (mmap_ptr m)
        let l = mmap_length m
        if p >= l
          then return (0, buf{ bufL=p, bufR=p })
          else do
            writeIORef (mmap_ptr m) l
            return (l-p, buf{ bufL=p, bufR=l })

      -- flush is a no-op: just remember where to read from next
      flushWriteBuffer m buf = do
        writeIORef (mmap_ptr m) (bufR buf)
        return buf{ bufL = bufR buf }

14 Example: memory-mapped files
2. (b) Make it an instance of IODevice

    instance IODevice MemoryMappedFile where
      close = IODevice.close . mmap_fd

      seek m mode val = do
        let sz = mmap_length m
        ptr <- readIORef (mmap_ptr m)
        let off = case mode of
                    AbsoluteSeek -> fromIntegral val
                    RelativeSeek -> ptr + fromIntegral val
                    SeekFromEnd  -> sz + fromIntegral val
        when (off < 0 || off >= sz) $ ioe_seekOutOfRange
        writeIORef (mmap_ptr m) off

      tell m = do
        o <- readIORef (mmap_ptr m)
        return (fromIntegral o)

      getSize = return . fromIntegral . mmap_length

      … etc …
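
The offset arithmetic in seek is easy to test on its own. seekOffset is a hypothetical pure version of it that returns Nothing where the slide's code raises ioe_seekOutOfRange. Note that the bounds check off < 0 || off >= sz rejects a seek to exactly the end of the mapping, which an ordinary file would allow.

```haskell
import System.IO (SeekMode (..))

-- Hypothetical pure version of the offset arithmetic in the slide's 'seek':
-- given the mapping's size and the current pointer, compute the new pointer,
-- or Nothing where the slide raises ioe_seekOutOfRange.
seekOffset :: Int -> Int -> SeekMode -> Integer -> Maybe Int
seekOffset sz ptr mode val
  | off < 0 || off >= sz = Nothing
  | otherwise            = Just off
  where
    off = case mode of
      AbsoluteSeek -> fromIntegral val
      RelativeSeek -> ptr + fromIntegral val
      SeekFromEnd  -> sz + fromIntegral val

main :: IO ()
main =
  -- A 100-byte mapping with the file pointer currently at offset 10.
  print [ seekOffset 100 10 mode val
        | (mode, val) <- [ (AbsoluteSeek, 5), (RelativeSeek, -3)
                         , (SeekFromEnd, -1), (SeekFromEnd, 0) ] ]
  -- [Just 5,Just 7,Just 99,Nothing]
```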

15 Example: memory-mapped files
3. Provide a way to create instances

    mmapFile :: FilePath -> IOMode -> Bool -> IO Handle
    mmapFile filepath iomode binary = do
      -- open the file and mmap() it
      (fd, _devtype) <- FD.openFile filepath iomode
      sz <- IODevice.getSize fd
      addr <- c_mmap nullPtr (fromIntegral sz) prot flags (FD.fdFD fd) 0
      ptr <- newIORef 0
      let m = MemoryMappedFile { mmap_fd     = fd,
                                 mmap_addr   = castPtr addr,
                                 mmap_length = fromIntegral sz,
                                 mmap_ptr    = ptr }
      let (encoding, newline)
            | binary    = (Nothing, noNewlineTranslation)
            | otherwise = (Just localeEncoding, nativeNewlineMode)
      -- call mkFileHandle to build the Handle
      mkFileHandle m filepath iomode encoding newline

(c_mmap, prot and flags are not shown on the slide.)

16 Demo…

    $ ./Setup configure
    Configuring mmap-handle-0.0...
    $ ./Setup build
    Preprocessing library mmap-handle-0.0...
    Building mmap-handle-0.0...
    [1 of 1] Compiling System.Posix.IO.MMap ( dist/build/System/Posix/IO/MMap.hs, dist/build/System/Posix/IO/MMap.o )
    Registering mmap-handle-0.0...
    $ ./Setup register --inplace --user
    Registering mmap-handle-0.0...
    $ ghc-pkg list --user
    /home/simonmar/.ghc/x86_64-linux-6.11.20090816/package.conf.d:
        mmap-handle-0.0

17 Demo…

    $ cat test.hs
    import System.IO
    import System.Posix.IO.MMap
    import System.Environment
    import Data.Char

    main = do
      [file,test] <- getArgs
      h <- if test == "mmap"
             then mmapFile file ReadWriteMode True
             else openBinaryFile file ReadWriteMode
      sequence_ [ do hSeek h SeekFromEnd (-n)
                     c <- hGetChar h
                     hSeek h AbsoluteSeek n
                     hPutChar h c
                | n <- [1..10000] ]
      hClose h
      putStrLn "done"
    $ ghc test.hs --make
    [1 of 1] Compiling Main             ( test.hs, test.o )
    Linking test...

Each iteration reads the byte n positions from the end of the file and writes it at offset n, so the program keeps jumping between the two ends of the file: the random-access pattern that ordinary buffered I/O handles poorly.

18 Timings…

    $ time ./test /tmp/words file
    done
    0.24s real  0.14s user  0.10s system  99%  ./test /tmp/words file
    $ time ./test /tmp/words mmap
    done
    0.09s real  0.09s user  0.00s system  99%  ./test /tmp/words mmap
    $ time ./test ./words file    # ./ is NFS-mounted
    done
    10.44s real  0.20s user  0.52s system  6%  ./test tmp file
    $ time ./test ./words mmap    # ./ is NFS-mounted
    done
    0.10s real  0.09s user  0.00s system  93%  ./test tmp mmap

19 More examples
– a Handle that pipes output bytes to a Chan
– Handles backed by Win32 HANDLEs
– a Handle that reads from a ByteString or Text

20 The -1 mile view
Inside the IO library:
– The file-descriptor functionality is cleanly separated from the implementation of Handles:
  – GHC.IO.FD implements file descriptors, with instances of IODevice and BufferedIO
  – GHC.IO.Handle.FD defines openFile, using FDs as the underlying device
  – GHC.IO.Handle has nothing to do with FDs

21 Implementation of Handle

    data Handle__
      = forall dev enc_state dec_state .
          (IODevice dev, BufferedIO dev, Typeable dev) =>
        Handle__ {
          haDevice     :: !dev,
          haType       :: HandleType,   -- read/write/append etc.
          haByteBuffer :: !(IORef (Buffer Word8)),
          haCharBuffer :: !(IORef (Buffer CharBufElem)),
          haEncoder    :: Maybe (TextEncoder enc_state),
          haDecoder    :: Maybe (TextDecoder dec_state),
          haCodec      :: Maybe TextEncoding,
          haInputNL    :: Newline,
          haOutputNL   :: Newline,
          .. some other things ..
        }
      deriving Typeable

– Existential: packs up the IODevice, BufferedIO and Typeable dictionaries; the codec state is existentially quantified too.
– Two buffers: one for bytes, one for Chars.
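
The role of the packed Typeable dictionary ("in case we need to take the Handle apart again later") can be shown in miniature. A toy sketch, not GHC's code: SomeHandle, FileDev, MemDev and fileDevOf are hypothetical names; cast from Data.Typeable recovers the concrete device only when the hidden type matches.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
import Data.Typeable (Typeable, cast)

-- Hypothetical stand-ins for two different device types.
newtype FileDev = FileDev Int deriving (Eq, Show)     -- e.g. a descriptor number
newtype MemDev  = MemDev String deriving (Eq, Show)

-- Analogue of Handle__: the device's concrete type is hidden, but its
-- Typeable dictionary is packed alongside it.
data SomeHandle = forall dev. Typeable dev => SomeHandle dev

-- Take the "handle" apart again: succeeds only if the device is a FileDev.
fileDevOf :: SomeHandle -> Maybe FileDev
fileDevOf (SomeHandle d) = cast d

main :: IO ()
main = print (map fileDevOf [SomeHandle (FileDev 3), SomeHandle (MemDev "buffer")])
-- [Just (FileDev 3),Nothing]
```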

22 Where to go from here
This is a step in the right direction, but there is still some obvious ugliness:
– We haven’t changed the external API, only added to it.
– There should be a binary I/O layer:
  – hPutBuf working on Handles is wrong: binary Handles should have a different type.
  – In a sense, BufferedIO is a binary I/O layer: it is efficient, but inconvenient.
– FilePath should be an abstract type. On Windows the natural representation is FilePath = String, but on Unix it is FilePath = [Word8].
– Should we rethink Handles entirely?
  – OO-style layers: binary I/O, buffering, encoding
  – Separate read Handles from write Handles? Read/write Handles are a pain.
