4J, K, L, M, N, O Component 2.


Assessment Outcomes
4J - Explain the purpose of, and be able to use, a hashing algorithm.
4K - Compare different hashing algorithms.
4L - Explain the use of multi-level indexes.
4M - Explain the techniques used to manage overflow and the need for file reorganisation.
4N - Explain the need for file security, including file backup, generations of files and transaction logs.
4O - Describe the need for archiving files.

Hash file structures In the three methods of file organisation we have looked at so far (serial, sequential and index sequential), you have to search through other records, to varying degrees, to get to the one you want. With a hash file structure, you go straight to the record you want! In a hash file, some search data is entered for the system to use to get back the whole record. A maths algorithm (or formula) is then applied to this search data. This transforms the search data into an address or 'hash code'. The computer can then go and get the record!

Hash File Structure: Finding Data Example
A database of pupils is going to be organised using a hash file. This file structure has been chosen because the database will be used to deal with pupil/parent enquiries and so needs to retrieve each record quickly. The secretary dealing with enquiries will type in a pupil's surname to get their data back. The database designer has selected the following maths formula, known as a ‘hash algorithm’: when a surname is typed in, convert each letter of the surname into a number, and then add the numbers together to give an address. Convert letters to numbers using A=1, B=2, C=3 ... X=24, Y=25, Z=26. For example, when the secretary types Jones, this is converted into 10 + 15 + 14 + 5 + 19 = 63. The computer goes to memory address 63 and there's the start of the record!
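The surname formula described above can be sketched in a few lines of Python. This is only an illustration: the function name `letter_sum_hash` is made up for this example, and ignoring characters that are not letters (hyphens, spaces) is an assumption the slides do not cover.

```python
def letter_sum_hash(surname: str) -> int:
    """Add up the letter values (A=1, B=2 ... Z=26) of a surname."""
    total = 0
    for ch in surname.upper():
        if "A" <= ch <= "Z":  # assumption: skip anything that is not a letter
            total += ord(ch) - ord("A") + 1
    return total


print(letter_sum_hash("Jones"))  # J=10, O=15, N=14, E=5, S=19, so prints 63
```

Because the surname is converted to upper case first, "jones" and "JONES" give the same address.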

Hash File Structure: Storing Data Example
If you wanted to store ‘Jones’ instead, you would work out the memory address to store the data in by applying the same hashing algorithm to the data: convert each letter of the surname into a number, and then add the numbers together to give an address. Convert letters to numbers using A=1, B=2, C=3 ... X=24, Y=25, Z=26. Store the data in the location that equals the total! This is a much-simplified example but does illustrate the basic idea. It is a way of converting a request for a record into an address, so the record can be retrieved immediately, without having to go through other records.
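Storing and retrieving with the same algorithm might look like the sketch below. The list `storage` stands in for the file's storage area; its size of 400 slots and the `form` field are assumptions made up for the example, and nothing here deals with two surnames landing on the same address.

```python
def letter_sum_hash(surname):
    """A=1 ... Z=26, added together (the algorithm from the slides)."""
    return sum(ord(ch) - ord("A") + 1 for ch in surname.upper() if "A" <= ch <= "Z")


STORAGE_SIZE = 400               # assumption: enough slots for this demo
storage = [None] * STORAGE_SIZE  # every address starts empty


def store(record):
    """Work out the address from the surname and put the record there."""
    address = letter_sum_hash(record["surname"])
    storage[address] = record
    return address


def retrieve(surname):
    """Apply the same algorithm to go straight to the record."""
    return storage[letter_sum_hash(surname)]


address = store({"surname": "Jones", "form": "10B"})
print(address)            # 63
print(retrieve("Jones"))  # {'surname': 'Jones', 'form': '10B'}
```

Notice that `retrieve` never loops over other records: it calculates the address and looks in exactly one place.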

Problem! Multiple Surnames
The hashing algorithm above is very poor indeed because there will be lots of different surnames that give the same memory address! This is called a 'clash' or 'collision'. You need to design a hashing algorithm that minimises clashes because they slow down access times. On the other hand, an algorithm might spread out the data so much that large areas of storage are left empty! Having large areas of storage that aren't used efficiently is known as ‘redundancy’.
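A short sketch makes the clash problem visible. It groups a few surnames by the address the letter-sum algorithm gives them and reports any address shared by more than one surname. The surnames and the helper name `find_clashes` are chosen just for this illustration.

```python
from collections import defaultdict


def letter_sum_hash(surname):
    """A=1 ... Z=26, added together (the algorithm from the slides)."""
    return sum(ord(ch) - ord("A") + 1 for ch in surname.upper() if "A" <= ch <= "Z")


def find_clashes(surnames):
    """Return {address: [surnames]} for every address used more than once."""
    by_address = defaultdict(list)
    for name in surnames:
        by_address[letter_sum_hash(name)].append(name)
    return {addr: names for addr, names in by_address.items() if len(names) > 1}


clashes = find_clashes(["Mason", "Osman", "Jones", "Smith"])
print(clashes)  # {62: ['Mason', 'Osman']}
```

Mason and Osman contain the same letters in a different order, so adding the letter values always gives the same total. Any two surnames that are anagrams of each other will clash under this algorithm.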

Good Hashing Algorithms
A good hashing algorithm will:
Minimise clashes.
Ensure that the hash codes of data aren't spread too far apart, wasting memory.
Be quick to calculate.
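To compare two hashing algorithms against these points, the sketch below puts the letter-sum algorithm next to an alternative that also takes the position of each letter into account and then wraps the result into a fixed table size. The alternative (`positional_hash`, the multiplier 31 and the table size 101) is an assumption chosen for illustration, not an algorithm from the slides.

```python
def letter_sum_hash(surname):
    """A=1 ... Z=26, added together: quick, but anagrams always clash."""
    return sum(ord(ch) - ord("A") + 1 for ch in surname.upper() if "A" <= ch <= "Z")


def positional_hash(surname, table_size=101):
    """Assumed alternative: letter order matters, result stays below table_size."""
    h = 0
    for ch in surname.upper():
        if "A" <= ch <= "Z":
            h = (h * 31 + ord(ch) - ord("A") + 1) % table_size
    return h


for name in ("Mason", "Osman"):
    print(name, letter_sum_hash(name), positional_hash(name))
```

The two surnames share an address under the letter-sum algorithm but get different addresses under the positional one, and the remainder operation keeps every address inside a known range so storage is not wasted. Both are quick to calculate. A larger set of real surnames would be needed before concluding which is better in practice.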

Hash Tables We can store data in a number of different types of data structure. One kind of data structure is the 'hash table'. A hash table uses a maths formula known as a 'hashing algorithm' to turn the data that you want to store or find (the 'hash key') into a number (the 'hash code'). What we end up with is a table in which each hash key is held at the position given by its hash code. The reason that this is done is that working out the hash code and going straight to that position is a lot quicker than searching through the hash keys themselves.
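One common way to build a hash table is sketched below. Each slot in the table holds a small list, so keys whose hash codes clash can still share a slot. The class name, the table size of 11 and the character-code hashing algorithm are all assumptions for the example.

```python
class HashTable:
    """Minimal hash table: each slot is a list of (key, value) pairs."""

    def __init__(self, size=11):
        self.slots = [[] for _ in range(size)]

    def _hash_code(self, key):
        # Assumed hashing algorithm: add the character codes, wrap to table size.
        return sum(ord(ch) for ch in key) % len(self.slots)

    def put(self, key, value):
        slot = self.slots[self._hash_code(key)]
        for i, (existing_key, _) in enumerate(slot):
            if existing_key == key:      # key already stored: overwrite its value
                slot[i] = (key, value)
                return
        slot.append((key, value))

    def get(self, key):
        # Only the one slot picked by the hash code is searched.
        for existing_key, value in self.slots[self._hash_code(key)]:
            if existing_key == key:
                return value
        return None


table = HashTable()
table.put("Jones", "Form 10B")
table.put("Smith", "Form 9A")
print(table.get("Jones"))  # Form 10B
```

Python's built-in `dict` is itself a hash table, so in real programs you would normally use that rather than writing your own.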


Multi-Level Indexes A multi-level index is an indexed sequential file structure where there is more than one level of index: a top-level index points to lower-level indexes, which in turn point to the data. Think about a library. If the library has 10 books on computing, the index could point to a single shelf, e.g. Shelf 1A. If the library has 10,000 books on computing, the index would point to a range of shelves. These shelves could then have their own index, where Computing is split into different sections: ‘Programming’, ‘Networks’, etc.

Multi-Level Index: Bank Example
Consider a bank with 100 million customers. They all have their own account number, which is an 8-digit number. The records are organised by putting the account numbers in sequential order. An index is then designed.
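A two-level version of the bank index could be sketched as follows. The account numbers, the block size of 3 and the grouping of 2 index entries per top-level entry are tiny made-up values so the structure is easy to follow; a real file would use far larger blocks.

```python
from bisect import bisect_right

BLOCK_SIZE = 3    # assumption: records per data block
INDEX_FANOUT = 2  # assumption: lower-level entries per top-level entry

# Made-up 8-digit account numbers, held in sequential order.
accounts = sorted([10000007, 10000042, 23500011, 23500500, 47000001,
                   58123456, 58123999, 76000010, 99999999])

# Data blocks, then a lower-level index holding the first key of each block.
blocks = [accounts[i:i + BLOCK_SIZE] for i in range(0, len(accounts), BLOCK_SIZE)]
lower_index = [block[0] for block in blocks]

# Split the lower-level index into groups; the top-level index holds
# the first key of each group.
groups = [lower_index[i:i + INDEX_FANOUT]
          for i in range(0, len(lower_index), INDEX_FANOUT)]
top_index = [group[0] for group in groups]


def find_block(account):
    """Return the number of the block holding the account, or None."""
    g = bisect_right(top_index, account) - 1   # step 1: top-level index
    if g < 0:
        return None
    b = bisect_right(groups[g], account) - 1   # step 2: lower-level index
    block_number = g * INDEX_FANOUT + b
    return block_number if account in blocks[block_number] else None  # step 3


print(find_block(47000001))  # 1
print(find_block(12345678))  # None
```

Each lookup reads one top-level entry, one lower-level group and one data block, rather than scanning the whole sequential file.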

Indexes [Figures: a 1-level index, a 2-level index and a 3-level index]

Overflows The overflow area is used when a record cannot be stored in its correct place:
When new records have to be stored in a sequential file, it may not be possible to store them at that moment in time in their correct place. There may not be the space available at the right place in the file. Some records will need to be deleted but can't be deleted straightaway so they are 'marked' to be deleted in the future. Some records are deleted straightaway and leave gaps in the file structure. You may also have the situation where a record is changed, and the change is such that the newly modified record cannot be written back to its original location.

Managing the Overflow area
Managing the overflow area is an important consideration if file operations are not to slow down quickly, and to prevent too many file reorganisations being necessary. When files are created in the first place, they can include some storage redundancy (extra storage space) in anticipation of the need to store more records in the future. This is a technique used to reduce the early need for an overflow area. When used, overflow areas are typically split into blocks. One block is filled up and then, once full, another overflow block is linked to it and used. Cellular splitting: another technique, which is more complex to manage, is that once a block is filled, it is split up into two separate blocks to create more distributed space. This is called cellular splitting and is more flexible and dynamic than chaining larger blocks together as described before.
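The linked overflow blocks described above can be sketched as a simple chain. The class names and the block capacity of 2 are assumptions for the example, and only chaining is shown, not cellular splitting.

```python
class OverflowBlock:
    """One fixed-size block that can be linked to a following block."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []
        self.next_block = None


class OverflowArea:
    def __init__(self, block_capacity=2):  # assumption: tiny blocks for the demo
        self.block_capacity = block_capacity
        self.first = OverflowBlock(block_capacity)

    def add(self, record):
        """Fill the current block; once full, link a new block and use that."""
        block = self.first
        while len(block.records) >= block.capacity:
            if block.next_block is None:
                block.next_block = OverflowBlock(self.block_capacity)
            block = block.next_block
        block.records.append(record)

    def blocks_read_to_find(self, record):
        """Follow the chain and count how many blocks had to be read."""
        block, blocks_read = self.first, 0
        while block is not None:
            blocks_read += 1
            if record in block.records:
                return blocks_read
            block = block.next_block
        return None


area = OverflowArea()
for key in [105, 230, 231, 400, 402]:
    area.add(key)
print(area.blocks_read_to_find(105))  # 1
print(area.blocks_read_to_find(402))  # 3
```

The second result shows why a growing overflow area slows things down: the more blocks in the chain, the more blocks must be read to reach a record near the end.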

Why do we need to reorganise files?
Over time, the overflow area will grow as more records are placed in it. This will eventually become unsatisfactory because it will slow down the system. Searching through data and sorting it will take longer as the records won't be in their most efficient order for fast searches and sorts. The records will be in two different places, and each of them will need to be looked at. In addition, the gaps created when records have been removed will also slow down file operations and clearly, over time, the number of gaps in the file structure is likely to increase. Periodically, a sequential file of data will need to be reorganised. This will involve reorganising the file and writing it to a new area.

How to reorganise files?
All of the records from both the original area and the overflow area will have to be transferred to the new file and combined into the correct order. Any records that have been marked for removal will have to be removed. All indexes associated with the data will have to be rewritten. The old data file will need to be deleted. The overflow area will have to be cleared out.
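The reorganisation steps above can be sketched as one function. The record layout (a `key` and a `deleted` flag) and the index that stores the position of every second record are assumptions made for the example.

```python
def reorganise(main_area, overflow_area, block_size=2):
    """Build a new, ordered file and a fresh index from both areas."""
    # Combine both areas and drop anything marked for removal.
    live = [r for r in main_area + overflow_area if not r["deleted"]]
    # Put the records back into the correct (key) order.
    new_file = sorted(live, key=lambda r: r["key"])
    # Rewrite the index: first key of each block -> position in the new file.
    new_index = {new_file[i]["key"]: i for i in range(0, len(new_file), block_size)}
    return new_file, new_index


main_area = [{"key": 10, "deleted": False},
             {"key": 20, "deleted": True},   # marked for removal
             {"key": 40, "deleted": False}]
overflow_area = [{"key": 30, "deleted": False},
                 {"key": 15, "deleted": False}]

new_file, new_index = reorganise(main_area, overflow_area)
main_area, overflow_area = new_file, []      # old file replaced, overflow cleared

print([r["key"] for r in new_file])  # [10, 15, 30, 40]
print(new_index)                     # {10: 0, 30: 2}
```

After reorganising, every record is back in one ordered area, so searches no longer have to look in two places.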

File Security This term relates to protecting files from unauthorised users. It covers protecting access to data, picking up accidental or malicious changes to data that compromise data integrity, and the steps taken to protect the systems that data is held on. Why is it important that files are kept secure? To safeguard against malicious attacks. To prevent data being stolen – remember, data is valuable! To prevent data being used to commit crimes such as fraud and false identity.

File Backups Have you ever lost work you have done on the computer?
Do you back up your work onto a pen drive every time you do some work on the computer? If you don't, you should! Companies must also back up their work. If you lose your work, you can start again - not much fun but possible. If a company loses files, it could go out of business and people could lose their livelihoods! Data is valuable to an organisation. It takes companies years to build up a customer base. Getting data into a computer takes time. Businesses today are run on computers.

File Backups Operating systems usually come with tools that allow the user to back up their data. Examples of these are:
System images: taking an entire ‘snapshot’ of all of the data on the computer system. The system can then be rolled back to this snapshot using a recovery device (live USB or CD, or recovery tools).
Scheduled backups: automatic backups of data. There are different types. One type only backs up files that have changed since the last backup. Another type backs up all files every time.
The last point to make about backups is that an untested backup should not be depended upon. You should periodically test the files being backed up to see if the process has been working correctly.
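The difference between the two types of scheduled backup can be sketched with a small function that decides which files to copy. The file names and the plain numbers used as timestamps are made up; a real tool would read modification times from the file system.

```python
def files_to_back_up(files, last_backup_time=None):
    """files maps a file name to the time it was last modified.

    With no last_backup_time, every file is backed up.
    Otherwise only files changed since the last backup are chosen.
    """
    if last_backup_time is None:
        return sorted(files)
    return sorted(name for name, modified in files.items()
                  if modified > last_backup_time)


# Hypothetical files with made-up timestamps.
files = {"report.docx": 105, "marks.xlsx": 90, "notes.txt": 120}

print(files_to_back_up(files))       # ['marks.xlsx', 'notes.txt', 'report.docx']
print(files_to_back_up(files, 100))  # ['notes.txt', 'report.docx']
```

Backing up only the changed files is quicker and uses less storage, but restoring needs the earlier backups as well; backing up everything each time is slower but gives one complete copy.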

Backups Why? How? Test.

Transaction Logs In computer science, a transaction log or audit trail is a history of the actions executed by a system (such as a Database Management System or Operating System). Why is it needed? To see who did what, and when. To see when the last backup was. To see when the system was last working.
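A transaction log can be pictured as a list that only ever has entries added to it. In the sketch below, the user names, actions and dates are invented for the example, and the log is kept in memory rather than written to disk as a real system would.

```python
from datetime import datetime

transaction_log = []  # entries are only ever appended, never changed


def log_action(user, action, when):
    """Record who did what, and when."""
    transaction_log.append({"when": when, "user": user, "action": action})


def last_entry(action):
    """Return the most recent log entry for an action, or None."""
    matches = [entry for entry in transaction_log if entry["action"] == action]
    return max(matches, key=lambda entry: entry["when"]) if matches else None


log_action("admin", "backup", datetime(2024, 1, 8, 2, 0))
log_action("jsmith", "update pupil record", datetime(2024, 1, 8, 9, 30))
log_action("admin", "backup", datetime(2024, 1, 9, 2, 0))

print(last_entry("backup")["when"])  # 2024-01-09 02:00:00
```

If something goes wrong, the entries made after the last backup show which actions need to be repeated once the backup has been restored.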

Archiving Files ARCHIVING FILES IS NOT THE SAME AS BACKING UP FILES
DO NOT THINK THIS, OR YOU WILL FAIL!

Archiving files Consider a school:
Each year, new pupils join a school and some pupils leave. If all a school ever did were to input the details of pupils into the system but never remove details, the system's resources, such as hard disk space, would soon run low. Therefore, the school should remove pupils who leave. There is a problem, though. You cannot just delete pupils' details from the school's system once they have left. Neither can you destroy invoices. A school is under a legal obligation to keep the records of pupils for many years after they have left.

Archiving files What would happen if a pupil needed a reference?
The school would need to know their data – their results, their behaviour, attendance report, etc. This is where archiving comes in.

How to archive files There should be a written archiving policy in an organisation stating who is responsible for ensuring data is archived, intervals between archiving and so on. Archiving should take place at appropriate time intervals, e.g. every 6 months. Compare this to backing-up files. Redundant data could be copied onto a magnetic tape or other mass storage device because you can store more bytes per unit cost compared to other storage devices. The data may also be compressed so you can fit more of it onto the tape. The archived data is meant to be kept for a long time and doesn't generally need to be easily accessible. The original data can be deleted from the system once it is archived. The magnetic tape should be labelled up and put in a fire safe or somewhere else secure.
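The archiving steps above (copy the old data, compress it, then delete the original) can be sketched like this. The pupil records and the `left` field are invented for the example, and the compressed bytes are simply held in a variable instead of being written to tape.

```python
import gzip
import json


def archive_leavers(pupils, leaving_year):
    """Split off pupils who left in leaving_year and compress their records."""
    leavers = [p for p in pupils if p["left"] == leaving_year]
    remaining = [p for p in pupils if p["left"] != leaving_year]
    archive = gzip.compress(json.dumps(leavers).encode("utf-8"))
    return remaining, archive


def restore_archive(archive):
    """Decompress an archive to get the original records back."""
    return json.loads(gzip.decompress(archive).decode("utf-8"))


# Hypothetical pupil records; "left" is the year the pupil left, or None.
pupils = [{"surname": "Jones", "left": 2023},
          {"surname": "Smith", "left": None}]

pupils, archive_2023 = archive_leavers(pupils, 2023)
print(pupils)                         # [{'surname': 'Smith', 'left': None}]
print(restore_archive(archive_2023))  # [{'surname': 'Jones', 'left': 2023}]
```

Unlike a backup, the archived records are removed from the live system; the archive is the only copy, which is why it must be stored somewhere secure.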

Archiving: Summary In summary, archived data is generally data that is unlikely to be needed again but that you can't get rid of for legal or other reasons. You want to remove old data from a system because it ties up system resources. Should you need to get back data that has been archived, it will be possible but may take a little bit of effort. You will have to get the correct archived tape, decompress the data if it has been compressed and then find the data on a serial file, which can take time.
