T215A Communication and information technologies (I) Block 1 Storing and sharing Session 2 Arab Open University - Fall 2012 1.


1 T215A Communication and information technologies (I) Block 1 Storing and sharing Session 2 Arab Open University - Fall 2012 1

2 Session Outline. Part 2: Data storage (cont.): Storing Data; Data Organisation and Management; Data Storage Devices; Archiving Data; Online Data Storage.

3 Part 2: Data Storage. 3. Storing Data

4 Storing Data [1] Before any data can be stored it has to be created. Computer data is created using specialised software applications. Example: word-processor software takes the data created by pressing keys and converts it into a text representation on the computer screen. Different software applications may use different codes to produce a similar representation on screen, so there needs to be a way of identifying which software application was used to create the data so that it can be retrieved and interpreted correctly.

5 Storing Data [2] How this identification is done depends on the operating system being used; normally it is supplied automatically by the software application as part of the saving process. Information about the software application that created the data, the file itself, and the date and time of saving is provided by ‘metadata’, that is, ‘data about data’. When a file is saved, metadata about the file is created.

6 3.1 File Types The file type is indicated by the part of the filename called the file extension, or simply extension: the group of letters after the dot. Interestingly, although early versions of Windows used a file’s extension as metadata to identify the relevant software application, more recent versions do not: the metadata resides in a header inside the file itself. The extension has therefore become metadata intended for the computer’s user rather than for the computer itself.
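The idea of identifying metadata held in a header inside the file can be sketched in a few lines of Python. This is a minimal illustration, not how any particular operating system works: the `SIGNATURES` table and the `sniff_type` function are hypothetical names, and the table lists only a handful of well-known file signatures.

```python
# Illustrative only: a few well-known signatures found at the start of files.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"GIF87a": "GIF image",
    b"GIF89a": "GIF image",
    b"\xff\xd8\xff": "JPEG image",
    b"%PDF": "PDF document",
    b"PK\x03\x04": "ZIP archive",
}


def sniff_type(header: bytes) -> str:
    """Guess a file type from its first bytes, ignoring the filename."""
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return "unknown"


# Content that starts with a PNG header is reported as PNG,
# whatever extension the file happens to carry.
print(sniff_type(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))
```

In practice the header bytes would be read with something like `open(path, "rb").read(8)`.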

7 3.2 File Sizes [1]

8 3.2 File Sizes [2]

9 3.2 File Sizes [3]

10 3.2 File Sizes [4]

11 3.3 Data Compression [1] If a camera can capture photos in a ‘raw’ format, the resulting files are very big. This is because every single pixel in the image (and there will likely be more than 10 million of them) needs data stored regarding its colour and brightness, so each ‘raw’ image needs millions of bytes of storage space on the memory card. Moving images are even more memory-hungry, because they are made up of a sequence of still images.
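A rough calculation shows the scale involved. The figures below are assumptions for illustration only: 3 bytes per pixel (one byte each for red, green and blue) and 25 frames per second for video. Real cameras and video systems vary.

```python
def raw_image_bytes(pixels: int, bytes_per_pixel: int = 3) -> int:
    """Uncompressed size of one image; 3 bytes per pixel is an assumption."""
    return pixels * bytes_per_pixel


one_image = raw_image_bytes(10_000_000)
print(one_image)                # 30000000 bytes, about 30 MB

# One second of uncompressed video at an assumed 25 frames per second.
frames_per_second = 25
print(one_image * frames_per_second)  # 750000000 bytes, about 750 MB
```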

12 3.3 Data Compression [2] Such huge amounts of data are normally handled by performing an operation called compression on them before they are stored or transmitted. Compression is the re-coding of data into a more compact form for storage or transmission. Smaller data files are thus achieved at the expense of more processing of the data. There are two types of compression: lossless and lossy.

13 3.3 Data Compression [3] Lossless compression: the compressed data can subsequently be decompressed back to its exact original uncompressed form. Examples: (1) the zipping compression algorithm called ‘deflate’, which is often used on large files before transmission (leading to filenames with the extension zip); (2) the GIF compression algorithm, which reduces the size of some types of image files (leading to filenames with the extension gif).
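The defining property of lossless compression, an exact round trip, can be checked with Python's standard `zlib` module, which implements the deflate algorithm. The sample data is made up and deliberately repetitive so that it compresses well.

```python
import zlib

# Highly repetitive sample data (an assumption chosen to compress well).
original = b"storing and sharing " * 500

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), len(compressed))  # compressed is much smaller
print(restored == original)            # True: nothing was lost
```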

14 3.3 Data Compression [4] Lossy compression: there is some unrecoverable loss of data. Examples: (1) the JPEG compression algorithm, used on photos and other images (leading to filenames with the extension jpg); (2) the MPEG compression algorithm used on video (leading to filenames with the extensions mpg or mpeg); (3) the MPEG compression algorithm used on sound (leading to filenames with the extension mp3 or mp4). Note: JPEG stands for ‘Joint Photographic Experts Group’; MPEG stands for ‘Moving Picture Experts Group’.
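The notion of unrecoverable loss can be illustrated with a toy example. The sketch below is not JPEG or MPEG; it simply rounds sample values to coarser steps (quantisation), which is one ingredient of many lossy schemes. The function names and the step size of 16 are assumptions made for illustration.

```python
def quantise(samples, step=16):
    """Keep only which 'step' each sample falls in; fine detail is discarded."""
    return [s // step for s in samples]


def restore(codes, step=16):
    """Rebuild approximate samples from the middle of each step."""
    return [c * step + step // 2 for c in codes]


samples = [0, 7, 100, 103, 200, 255]
codes = quantise(samples)     # smaller numbers, so fewer bits are needed
approx = restore(codes)

print(codes)                  # [0, 0, 6, 6, 12, 15]
print(approx)                 # [8, 8, 104, 104, 200, 248]
print(approx == samples)      # False: the original cannot be recovered exactly
```

The restored values are close to the originals but not identical, which is the trade-off that lossy compression accepts.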

15 3.3 Data Compression [5] Why might anyone want to compress data in such a way that some of the original data is irretrievably lost? Answer: the loss almost certainly doesn’t matter. Lossy compression algorithms, which are used on image and sound files, can exploit characteristics of human sight and hearing in such a way that the viewer or listener will not notice, or will only barely notice, any degradation in the image or sound. In fact, the algorithms used in JPEG and MPEG have been carefully designed to have minimal impact on the viewer or listener. Compression designed on this principle is sometimes described as ‘perceptual coding’ because it takes into account people’s perceptions.

16 3.3 Data Compression [6]

17 3.3 Data Compression [7]

18 3.3 Data Compression [7]

19 3.4 File Compatibility [1] Files created by one software application are often not compatible with files produced by another application, even if they are both the same sort of software, for example both word processors or both graphics software. Why is this? Some of the answer lies in the metadata the application inserts in the file as it saves it. Example: for a word processor, the metadata is information about how the text is to be displayed: the fonts to be used, which text is to be bold, italic or coloured, and how big the page margins are to be.

20 3.4 File Compatibility [2] Each word-processor application has its own format for this metadata, and this format is not necessarily compatible with the format used by any other word-processor application. Similar ideas apply to audio files, picture files, and so on. The term file format is often used to describe the way the metadata has been formatted and embedded in the file. This problem of non-compatibility has led to a section of the software industry dedicated to producing programs to convert the file format of one application to the format needed by another application.

21 3.4 File Compatibility [3] Unfortunately, such conversion programs rarely provide an exact conversion. This is not because of deficiencies in the conversion programs, but because different applications use metadata in such completely different ways that an exact conversion is almost impossible. The problem of compatibility is further compounded when software companies make new versions of their application packages with embedded metadata that is not compatible with the metadata used in earlier versions. Example: Microsoft Word 2003’s inability to read the docx documents created by Word 2007.

22 3.4 File Compatibility [4] A file format for text that is standardised, in theory at least, does exist, in the form of rich text format, or RTF. Unfortunately, a standardised file format like this may not cater for all the features individual software applications provide. Thus it is either not universally used or is modified by a software company to suit its own needs, and hence becomes incompatible again. One software application for text that does not add metadata is a plain text editor. Text editors use only standard ASCII characters: letters, numbers and punctuation marks. This means that, in general, files produced by different text editors are compatible with each other, and a file produced by one text editor can be read and edited by another.

23 3.4 File Compatibility [5] ASCII codes. ASCII stands for American Standard Code for Information Interchange and is a set of 7-bit binary codes that correspond to 128 characters: the letters of the English alphabet, numbers, punctuation marks and some additional codes used for control purposes. These control codes are invisible: they have no printable equivalent and thus are never displayed on the computer’s screen or printed. Some examples of ASCII codes are: 1000001 is A; 1100001 is a; 0111111 is ?; 1111011 is {; 0000001 is the ‘start of header’ control code.
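The 7-bit codes listed on this slide can be reproduced with Python's built-in `ord` and `format` functions. The helper name `ascii_bits` is an assumption introduced for this sketch.

```python
def ascii_bits(ch: str) -> str:
    """Return the 7-bit ASCII code of a single character as a bit string."""
    code = ord(ch)
    if code > 127:
        raise ValueError("not an ASCII character")
    return format(code, "07b")


for ch in "Aa?{":
    print(ch, ascii_bits(ch))

# Control code 1 ('start of header') has no printable form.
print(ascii_bits("\x01"))
```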

24 3.4 File Compatibility [6]

25 3.4 File Compatibility [7] Example: plain text file (Figure 2.4). Size: 1 KB. All of the original formatting (Times New Roman, bold, italic and red) was lost. There were no document statistics, nor was there any additional information about the file.

26 3.4 File Compatibility [8] Example: RTF (Figure 2.5). File size: 4 KB. The file included a quantity of additional information, most of it in recognisable language, although the purpose of some portions was difficult to identify.

27 3.4 File Compatibility [9] Example: HTML, another text-based format that marks up text with readable metadata (Figure 2.7). It is relatively easy to read HTML as the metadata is fairly self-explanatory. Metadata items are enclosed in angled brackets, < and >, rather than in the curly brackets, { and }, used in RTF. These metadata items in angled brackets are often called tags.
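To see how tags separate metadata from the text itself, the sketch below uses Python's standard `html.parser` module to pull the two apart. The `TagCollector` class and the sample HTML string are made up for illustration.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collect opening tags (metadata) separately from the visible text."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


sample = "<p>Some <b>bold</b> and <i>italic</i> text.</p>"
collector = TagCollector()
collector.feed(sample)
collector.close()

print(collector.tags)  # ['p', 'b', 'i']
print(collector.text)  # ['Some', 'bold', 'and', 'italic', 'text.']
```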

28 3.4 File Compatibility [10] Example: native format, Microsoft Word’s (Figure 2.8). File size: 24 KB. When viewed in a text editor, most of the file consists of symbols that are meaningless to us and sections of blank space. It is relatively easy to pick out the original text, as this appears together without any interspersed symbols or formatting information, unlike in the RTF version. It is very hard (if not impossible) to pick out the kind of formatting detail that could be read in the RTF file.

29 3.4 File Compatibility [11] Example: native format (Microsoft Word’s)

30 Part 2: Data Storage. 4. Data Organisation and Management

31 Data Organisation and Management In order to work efficiently, it’s not enough just to be able to store data; users need to be able to retrieve it as well. The stored data therefore needs to be organised in some coherent way that makes sense to the user. Windows, in common with most operating systems intended for desktop and notebook computers, offers help in two principal ways, both via the file manager. (1) It allows users to organise files into folders, sub-folders, sub-sub-folders, and so on, creating a hierarchy of folders. (2) It allows users to search for files by name, by date last used, by some text in the file, and so on.

32 4.1 Hierarchical file structures: folders [1] A hierarchical diagram is sometimes called a tree diagram because the way it spreads out resembles the roots of a tree spreading underground.

33 4.1 Hierarchical file structures: folders [2] An important point about file organisation is that users are not free to organise their files in just any way; they have to fit with the ‘rules’ laid down by the operating system on their computer. The operating system builds file structures in accordance with a template that it expects both users and all application software to use.

34 4.1 Hierarchical file structures: folders [3] (1) The Windows folder or System folder holds many of the programs and other associated files that the system needs in order to operate. Often these are ‘hidden’ in the sense that they do not show up when you use your file manager, nor do you have ready access to them. (2) The Program Files or Applications folder is created to house the different programs installed and run on the computer. (3) The Documents folder is the default folder for data files (documents, images, media files, and so on).

35 4.1 Hierarchical file structures: folders [4] Linux doesn’t organise files in quite the same way as Windows and Mac OS X do, and it calls folders ‘directories’. But like Windows and Mac OS X, Linux has folders for system files (/bin, /usr/bin, /sbin, /usr/sbin), for application files (/usr) and for a particular user’s data files (typically /home/username).

36 4.2 Searching for files Even with a workable hierarchy of folders it’s not always easy to find a particular file; this is where searching can be helpful. For example, a file manager can use the metadata it holds about files to sort the files in a particular folder in different ways: in alphabetical or reverse alphabetical order; in date order or reverse date order of last modification; in size order (largest or smallest first, as preferred); or by type. It can also search by text contained in the files.
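Sorting by file metadata can be sketched with Python's standard `os.scandir`, which exposes each file's name, size and last-modification time. The function name `list_sorted` and the sample files are assumptions; a real file manager does considerably more.

```python
import os
import tempfile


def list_sorted(folder, key="name", reverse=False):
    """List files in a folder, sorted by a metadata field: name, size or modified."""
    entries = []
    with os.scandir(folder) as it:
        for entry in it:
            if entry.is_file():
                info = entry.stat()
                entries.append(
                    {"name": entry.name, "size": info.st_size, "modified": info.st_mtime}
                )
    return sorted(entries, key=lambda record: record[key], reverse=reverse)


# Demonstration on a temporary folder containing two small files.
with tempfile.TemporaryDirectory() as folder:
    for name, content in [("b.txt", "xxxx"), ("a.txt", "xx")]:
        with open(os.path.join(folder, name), "w") as f:
            f.write(content)
    by_name = [r["name"] for r in list_sorted(folder, key="name")]
    by_size = [r["name"] for r in list_sorted(folder, key="size", reverse=True)]

print(by_name)  # ['a.txt', 'b.txt']  alphabetical
print(by_size)  # ['b.txt', 'a.txt']  largest first
```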

37 4.3 Databases [1] A database is a collection of individual records about items (such as people, geographical areas, warehouse components, etc.) where the records are organised in such a way that retrieval of data about groups of items is straightforward. Note: we are not talking here about relatively simple personal databases such as you might have on your own computer. We consider the sort of complex databases that are found in mid-size and large organisations. Such databases hold very large amounts of data and are accessed by many people.

38 4.3 Databases [2] What requirements might such an organisation have of its database? (1) The database must be able to hold many items of data about many individual people or objects, each of which must be uniquely identifiable. (Each individual data item is called a field, and the collection of fields for an individual person or object is called a record.) (2) Data in the database must be readily updatable. (3) Users must be able to pull out data from a variety of combinations of fields. (4) Users must be able to have access rights to subsets of the individual fields within a record according to their role in the organisation. (5) The database must be kept secure from all those with no right to access it, and from careless or malicious use by those who do have access rights.

39 4.3 Databases [3] Metadata and searching. Metadata can be used to facilitate searching for files on a hard disk. It can also be used in other sorts of searches. One example relates to photographs and other images, which are increasingly saved with metadata known as EXIF. This metadata contains keywords that can be used for searching. Another example relates to HTML files on the Web, which incorporate metadata in the form of tags that give information about the contents of the page, for example keywords. Such tags can help search engines to create a database of information they can use to offer relevant pages during a search.

40 4.3 Databases [4] Why is hierarchical file storage not well suited to a database? One very significant reason is that the records in a database, and the fields within a record, aren’t in any way hierarchical. While it is possible to impose some sort of hierarchy on them, doing so would make the data fit the requirements of the storage scheme rather than designing a storage scheme to fit the needs of the data.

41 4.3 Databases [5] In practice, databases use files based on tables rather than on single records, but the data in these tables is also linked appropriately. It is this linking that is the key notion in modern databases, and databases that use linked tables are called relational databases. By suitable linking of the tables, duplication of records can be avoided, thus eliminating possible problems with updating. The way the records in the tables are organised and linked, and the tables stored, is so important that a program called a database management system (DBMS) is used for the task.
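Linked tables can be tried out with SQLite, a small relational DBMS that ships with Python as the `sqlite3` module. The table names, column names and data below are invented for illustration. Because each department name is stored once and employees link to it by `dept_id`, a single update changes what every linked record shows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript(
    """
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER REFERENCES department(dept_id)  -- the link between tables
    );
    INSERT INTO department VALUES (1, 'Finance'), (2, 'IT');
    INSERT INTO employee VALUES (10, 'Amal', 1), (11, 'Omar', 2), (12, 'Sara', 2);
    """
)

# The department name is stored once, so one update is enough.
conn.execute("UPDATE department SET name = 'Information Technology' WHERE dept_id = 2")

# A query that combines fields from both tables through the link.
rows = conn.execute(
    """
    SELECT e.name, d.name
    FROM employee AS e JOIN department AS d ON e.dept_id = d.dept_id
    ORDER BY e.emp_id
    """
).fetchall()
print(rows)
```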

42 4.3 Databases [6] A database management system acts as an interface between users and the database itself. It can be used to enforce rules about who has access to what data, and who can update the data. It can also be designed to prevent careless or malicious misuse of the database by those permitted to use it. In summary, a relational database with a suitable management system fulfils the requirements: ability to hold large numbers of records, each of many fields; updatability; ability to combine fields in various ways in a query; access rights; protection from abuse.

43 Part 2: Data Storage. 5. Data Storage Devices

44 Data Storage Devices [1] Example: one early personal computer, back in the early 1980s, had 4 KB of ROM and 1 KB of RAM, and required a connection to a TV set to provide an output display. It had no internal long-term memory for the user’s data (any data in the RAM was lost on power-down), but it was possible to connect it to an external audiocassette recorder to store data on magnetic tape.

45 Data Storage Devices [2] ROM stands for read-only memory. Its contents can be read by a computer’s processor but cannot be altered by the processor. A ROM’s contents are non-volatile: that is, they are retained permanently and are not lost when the computer is powered down. In a desktop or notebook computer the ROM contains the program the computer needs on start-up (the so-called boot-up program, or boot program). In an embedded computer in a device such as a controller for a central-heating system there will only be one program, or perhaps a small collection of programs, and the ROM will contain it.

46 Data Storage Devices [3] RAM stands for random-access memory. The computer’s processor can both read this memory’s contents and alter them. A RAM’s contents are volatile, so they are lost when the computer is powered down. In a desktop or notebook computer there will be large amounts of RAM, probably several gigabytes. In an embedded computer there will normally be very much less, sometimes as little as a few kilobytes in simple systems. In the case of desktop and notebook computers, RAM and ROM together comprise what is known as the computer’s main memory.

47 Data Storage Devices [4] Secondary memory is used for long-term storage. Examples: a hard disk; a DVD or CD-RW drive; USB ports used to connect memory devices in the form of external hard disks or USB flash drives.

48 5.1 Magnetic storage devices [1] Hard disks, in common with the now obsolescent floppy disks, are examples of magnetic storage devices. A disk has a series of concentric circular tracks, and files are stored on these. Retrieving a file is thus a matter of spinning the disk under a read/write head that is moved in towards the centre or out towards the perimeter as necessary, a much quicker process than winding through a magnetic tape to find the data.

49 5.1 Magnetic storage devices [2] Magnetic disks used for data storage are coated with a thin layer of magnetic material. The data itself is stored as magnetisation patterns on circular tracks in this thin layer. The read/write head does not come into contact with the disk: it is held at a very small distance above the disk. One area of the disk is used for a master file table that links each file to the portion of the disk where it is stored. Thus when you ask your computer to retrieve a particular file from a disk, your operating system will first consult the master file table to determine where the file is, and then instruct the head to move to that area of the disk so the file can be read. To facilitate this process, the tracks are numbered, and each track is divided into sectors, also numbered. Typically, each sector can hold 512 bytes.

50 5.1 Magnetic storage devices [3] Similarly, when you save a file, the operating system will consult the master file table to find the necessary free space and then instruct the head to move appropriately. It’s important to realise that when a file is deleted, only its entry in the master file table is deleted. The contents of the file itself remain in their tracks and sectors until some other data overwrites them (and even then traces of the previous pattern may remain). This explains how computer forensic experts are able to read files that the computer user believes to be deleted.
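The role of the master file table, and the reason deleted data lingers, can be modelled in a few lines. This is a toy model under stated assumptions, not a real file system: the `ToyDisk` class is hypothetical, sectors are simply numbered from 0, and only the 512-byte sector size is taken from the slides.

```python
import math

SECTOR_SIZE = 512  # bytes per sector, as quoted on the slides


class ToyDisk:
    """A toy disk: numbered sectors plus a table mapping filenames to sectors."""

    def __init__(self, sector_count):
        self.sectors = [b""] * sector_count
        self.table = {}  # the 'master file table': name -> list of sector numbers

    def free_sectors(self):
        used = {n for numbers in self.table.values() for n in numbers}
        return [n for n in range(len(self.sectors)) if n not in used]

    def save(self, name, data):
        needed = max(1, math.ceil(len(data) / SECTOR_SIZE))
        free = self.free_sectors()
        if len(free) < needed:
            raise OSError("disk full")
        chosen = free[:needed]
        for i, n in enumerate(chosen):
            self.sectors[n] = data[i * SECTOR_SIZE:(i + 1) * SECTOR_SIZE]
        self.table[name] = chosen

    def read(self, name):
        return b"".join(self.sectors[n] for n in self.table[name])

    def delete(self, name):
        # Only the table entry goes; the sector contents are left untouched.
        del self.table[name]


disk = ToyDisk(8)
disk.save("secret.txt", b"x" * 700)   # 700 bytes need two 512-byte sectors
print(disk.table)                     # {'secret.txt': [0, 1]}
disk.delete("secret.txt")
print("secret.txt" in disk.table)     # False: the file looks deleted
print(len(disk.sectors[0]))           # 512: the old data is still on the disk
```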

51 5.1 Magnetic storage devices [4] Over the years the quantity of data that can be stored on a single side of a magnetic disk has increased, as the technologies for storing data in smaller and smaller areas and for locating the head more and more precisely have developed. The very first hard disks installed in IBM PCs in the early 1980s had a capacity of 10 MB. And now? The most common diameter for hard disks in desktop computers is 3.5 inches, but 2.5-inch disks are beginning to replace them (and notebook computers may use 1.8-inch disks).

52 5.2 Optical storage devices [1] Compact discs (CDs) and digital video or digital versatile discs (DVDs) are examples of optical storage devices used as secondary memory devices for computers. ‘Disc’ with a ‘c’ is normally used for optical storage devices; ‘disk’ with a ‘k’ is normally used for magnetic storage devices. Data is stored on a track on an optical disc and is accessed by a sensor head moving over the spinning disc. Optical discs rely on the optical properties of the disc’s material. Data is stored as a sequence of small pits in a thin metal layer that is arranged in a long spiral track that runs from the centre of the disc towards the edge and is protected by a thin transparent layer of plastic. This contrasts with a magnetic disk, where there are many concentric tracks.

53 5.2 Optical storage devices [2]

54 5.2 Optical storage devices [3] To read the data on a CD-ROM, a small laser beam is focused on the track as the disc rotates. This beam detects the change in reflectivity at the transitions from no-pit to pit and from pit to no-pit. Such transitions are interpreted as 1s, and the absence of transitions as 0s.
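The rule that a transition reads as 1 and no transition reads as 0 can be sketched as follows. The representation is an assumption made for illustration: the track is written as a string with one symbol per timing period, 'P' for pit and 'L' for no-pit, and `decode_transitions` is a hypothetical name.

```python
def decode_transitions(surface: str) -> str:
    """Turn a pit/no-pit sequence into bits: a change is 1, no change is 0."""
    bits = []
    for previous, current in zip(surface, surface[1:]):
        bits.append("1" if previous != current else "0")
    return "".join(bits)


# Three periods of no-pit, a pit four periods long, then no-pit again.
print(decode_transitions("LLLPPPPLLL"))  # 001000100
```

Note that the length of the pit only determines how many 0s separate the two 1s, which is why accurate timing matters when reading the disc.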

55 5.2 Optical storage devices [4] Some elaborate timing helps to ensure that the length of a pit, or of a gap between pits, is interpreted as the right number of 0s. In addition, the raw data is encoded in such a way that there can never be fewer than two consecutive 0s, nor more than ten. This encoding adds extra data to the data to be stored: data that is redundant in terms of its contribution to the stored data but necessary in terms of the ability of the CD-ROM drive to read the stored data accurately.
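The constraint on runs of 0s can be expressed as a simple check. This sketch only tests whether a bit string obeys the rule; it does not perform the actual encoding used on CDs. The function name is hypothetical, and it assumes the minimum applies to runs of 0s between two 1s.

```python
def satisfies_run_length(bits: str, min_zeros: int = 2, max_zeros: int = 10) -> bool:
    """Check that runs of 0s obey the two-to-ten rule described on the slide."""
    runs = bits.split("1")  # the runs of 0s around and between the 1s
    # No run of 0s anywhere may be longer than the maximum.
    if any(len(run) > max_zeros for run in runs):
        return False
    # Runs lying between two 1s must also reach the minimum length.
    return all(len(run) >= min_zeros for run in runs[1:-1])


print(satisfies_run_length("1001000100"))        # True
print(satisfies_run_length("101"))               # False: only one 0 between 1s
print(satisfies_run_length("1" + "0" * 11 + "1"))  # False: eleven 0s in a row
```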

56 5.2 Optical storage devices [5] DVDs store data in a similar way to CD-ROMs, but have a larger capacity. The first DVDs had a maximum capacity of around 4.7 GB. This increase over CD-ROM capacity was achieved by: having a narrower track with more turns on it; permitting shorter pits; using more efficient coding schemes for storing the data. More recent DVDs have used both sides of the disc for storing data, and have also used two recording layers on each side (with a laser beam that can be refocused as necessary), giving a maximum capacity of around 17 GB. One notable development is the shift from red laser light to blue laser light. Blue light has a shorter wavelength, which enables the laser beam to detect smaller features on the surface of the disc. This has enabled the pits to become both shorter and narrower. Blu-ray discs had a maximum capacity of 50 GB.

57 5.2 Optical storage devices [6] Future project: holographic discs, in which holographic principles are used to store the data. Initial capacities of hundreds of gigabytes are suggested, potentially rising to several terabytes. Parallel to the development of the capacity of optical discs have been developments that have provided a means to write data to a disc as well as to read data from it. Instead of representing the data as indentations on the reflective surface, writable discs use a material that can exist in two different states, each of which has different reflective properties.

58 5.3 Solid-state storage devices [1] The term ‘solid-state’ can be used either to refer to electronic devices made from semiconductors such as silicon or to indicate an absence of moving parts. In the case of solid-state memory devices, both of these hold: they are semiconductor devices and they have no moving parts. Advantages of the absence of moving parts: (1) solid-state storage devices are less susceptible to mechanical failure; (2) they are more resistant to shock and vibration; (3) the read/write access times can be faster because there is no inbuilt delay while a disk is spun and a read/write head moves to locate the correct position; (4) they have lower power consumption.

59 5.3 Solid-state storage devices [2] Solid-state storage devices (flash memory) are small and lightweight, and are described as ‘read/write memory’. They consist of a group of memory cells in which data is stored and from which data is read. Single-level cell (SLC) devices store just one bit in each cell. Multi-level cell (MLC) devices are capable of storing multiple bits per cell. The trade-off for this higher capacity is increased access time, higher power consumption and lower reliability than their SLC cousins, though the manufacturing cost per megabyte is lower. Unlike other types of memory, however, where a writing operation can change a previously stored 1 to a 0 or a previously stored 0 to a 1, in flash memory a writing operation can only change a 1 to a 0.

60 5.3 Solid-state storage devices [3] So how are the 1s in the new data to be stored? Previously stored data must first be erased and all cells reset to 1 by applying a high-voltage electrical pulse to a block of cells; new data can then be written to this block. Ultimately, this use of a high-voltage pulse to reset a block of cells to 1 causes damage to the structure of the cells where the data is stored and limits flash memory’s lifetime to a finite number of write cycles. Flash memory devices need a source of power during both read and write operations. USB ports are designed to be able to supply small amounts of power to the devices connected to them.
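The write-then-erase behaviour of flash memory can be modelled with a toy block of cells. The `FlashBlock` class and its method names are hypothetical, and the model ignores real details such as block sizes and wear levelling; it only captures the two rules from the slides, namely that writing can only turn a 1 into a 0 and that erasing resets a whole block to 1.

```python
class FlashBlock:
    """Toy model of one flash block: writes can only turn 1s into 0s."""

    def __init__(self, size=8):
        self.cells = [1] * size   # an erased block holds all 1s
        self.erase_count = 0      # each erase wears the block a little

    def program(self, bits):
        # A cell ends up 1 only if it was 1 and the new bit is 1 (logical AND).
        self.cells = [old & new for old, new in zip(self.cells, bits)]

    def erase(self):
        self.cells = [1] * len(self.cells)
        self.erase_count += 1

    def write(self, bits):
        # An erase is needed only if some cell must go from 0 back to 1.
        if any(old == 0 and new == 1 for old, new in zip(self.cells, bits)):
            self.erase()
        self.program(bits)


block = FlashBlock(4)
block.write([1, 0, 1, 0])
print(block.cells, block.erase_count)  # [1, 0, 1, 0] 0
block.write([0, 1, 1, 0])              # needs a 0 -> 1 change, so erase first
print(block.cells, block.erase_count)  # [0, 1, 1, 0] 1
```

Counting erases in this way hints at why flash memory has a finite number of write cycles.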

61 Part 2: Data Storage. 6. Archiving Data

62 Archiving Data Governments now place requirements on many types of organisation to keep various types of data for minimum periods, which implies the need for reliable archiving. It’s important to understand that archiving is different from backing up: the two are not synonymous. A backup is a temporary copy of a record or data set for the purpose of data protection or disaster recovery. An archive is a long-term or permanent copy of a record or data set for the purpose of satisfying records management, IT cost control, regulatory compliance or litigation support requirements.

63 6.1 Device characteristics Some characteristics of secondary memory devices that would be desirable for the reliable long-term storage of data: (1) the material from which the devices are made should not degrade with time; (2) the data should not leak away slowly with time; (3) the device should be able to retain data when subjected to adverse physical conditions, such as extremes of temperature, mechanical shocks and vibration, and so on. For example, the longevity of data on optical discs is strongly affected by high temperatures, and the repeated erase processes needed to write new data on solid-state devices (such as USB flash drives) will affect their longevity.

64 6.2 Data characteristics Suppose that some digital data is stored on a suitable device under near-ideal conditions, and device readers remain available. Does that mean that people will be able to use the data in 50 years, in 100 years, in 5000 years? Characteristics of the data itself may prevent people from being able to use it. If the file formats used to store the data can no longer be decoded by the computers of the future then the data, although perfectly physically preserved and still physically accessible, may be meaningless.

65 Part 2: Data Storage. 7. Online Data Storage

66 Online Data Storage [1] Some commercial companies, in particular ISPs (internet service providers), are now offering an online storage service for individuals’ files. That is, you upload your files to the company’s storage facility, which you then regard as your primary storage area, and the company undertakes to store the files safely for you. These companies need large numbers of networked computers, known as servers, which are often housed together in a single secure area. A group of servers is known as a server farm (or data centre).

67 Online Data Storage [2] Storing one’s data online is an example of what is coming to be known as cloud computing. The use of ‘cloud’ in ‘cloud computing’ derives from the fact that networks are often depicted with a cloud shape in diagrams: it is a graphic way of naming what is in fact network computing.

68 Online Data Storage [3] Issues to be considered for online storage: How long does the facility undertake to store your data for? Where will the data be stored? If it will be stored in a different country from the one you live in, there may be different data protection and privacy laws: are they adequate? What compensation will you have if a fault develops and some of your data is lost? What happens if you and the service supplier are in dispute, including which country’s laws would prevail if you are not both in the same country? What happens if you wish to cancel your account and move your data elsewhere?

