1 Information Management DIG 3563 – Lecture 17 File Structures and Cloud Computing J. Michael Moshell University of Central Florida Original image* by Moshell et al. Imagery is from Wikimedia except where marked with *.
-2 - File System Organization recovermyfiles.com * Disks have sectors; each sector has an address (integer) * A file is a collection of sectors. They can be contiguous or fragmented. * To find the sectors comprising a file, we need a directory. * The directory system records which sectors belong to each file. * The Operating System has software to manage directories & files. planetoftunes.com
-3 - Formatting a Disk * factory (low level) format: - timing tracks, etc. "marks in the parking lot" - usually not re-doable * local reformatting: * Check for read/write errors * Mark good sectors and bad ones * Create a list of available sectors * Set up file structure: - directory - boot sector (for bootable drives) stripespls.com
-4 - File System Organization recovermyfiles.com planetoftunes.com
* A simple (conceptual) architecture:
Directory: * at sectors 22010, we have records:
Directory record fields: Dirnum, Filename, Filesize, Headsector. Example entries: 1 addresses.doc, employees.doc, payroll.xls, etc.
Sector block fields: block, dirnum, nextblock, data. Example data: Adams, John \t 222 West; Wilson, Steve \t 333 East...
So the file is a linked list (like a treasure hunt) through the disk's sectors. (Not all disks are organized this way.)
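A minimal Python sketch of the "treasure hunt" on this slide. The sector addresses (40, 97), the dictionary layout, and the function name `read_file` are assumptions made for illustration; only the field names and the sample records come from the slide.

```python
# Hypothetical in-memory model of the linked-sector layout.
# Sector addresses are invented; a real disk stores these records in binary.

# Directory: one record per file, pointing at the file's first (head) sector.
directory = {
    "addresses.doc": {"dirnum": 1, "headsector": 40},
}

# Disk: sector address -> block record. The file is fragmented on purpose:
# its two sectors are not next to each other.
disk = {
    40: {"dirnum": 1, "nextblock": 97, "data": "Adams, John\t222 West\n"},
    97: {"dirnum": 1, "nextblock": None, "data": "Wilson, Steve\t333 East\n"},
}


def read_file(name):
    """Follow the nextblock links from the head sector and join the data."""
    sector = directory[name]["headsector"]
    visited = set()
    chunks = []
    while sector is not None:
        if sector in visited:
            # A corrupted link could point backwards; stop instead of looping.
            raise ValueError(f"link loop at sector {sector}")
        visited.add(sector)
        block = disk[sector]
        chunks.append(block["data"])
        sector = block["nextblock"]
    return "".join(chunks)


print(read_file("addresses.doc"))
```

The directory only knows where the file starts; everything else is found by following links, which is why a single damaged link can hide the rest of a file.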
-5 - File System Errors * Disk drive hardware checks parity when reading sectors * If a parity error occurs, data may have been lost * Usually this just reports a failure to the OS and you're stuck. However – the actual disk drive hardware can probably still read the data; it just doesn't LIKE it. So, specialized software can sometimes get this "bad checksum" data and display it... we discuss this shortly.
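A small Python sketch of the parity idea, assuming a single even-parity bit stored alongside each sector. The function names are hypothetical, and real drives typically use stronger error-detecting and error-correcting codes than one parity bit; the point here is only that the data can still be read after the check fails.

```python
def parity_bit(data: bytes) -> int:
    """Return 1 if the data contains an odd number of 1 bits, else 0."""
    ones = sum(bin(byte).count("1") for byte in data)
    return ones % 2


def write_sector(data: bytes):
    """Store the data together with its parity bit."""
    return data, parity_bit(data)


def check_sector(data: bytes, stored_parity: int) -> bool:
    """Recompute parity on read and compare it with the stored bit."""
    return parity_bit(data) == stored_parity


data, parity = write_sector(b"Adams, John")

# Simulate damage: flip the lowest bit of the first byte.
damaged = bytes([data[0] ^ 0x01]) + data[1:]

print(check_sector(data, parity))     # True: sector reads cleanly
print(check_sector(damaged, parity))  # False: parity error reported
print(damaged)                        # the bytes are still readable
```

A single parity bit cannot detect two flipped bits in the same sector, which is one reason practical systems use longer checksums.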
-6 - File System Organization recovermyfiles.com * Deleting a file: The OS keeps an available sector list of sectors that can be reused. To DELETE a file, the system just changes its first and last links. (Think of out-of-service boxcars). The data is not gone, it's just unlinked. It will be overwritten, when (and if) the OS needs more space. tdc.ca
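The relinking step can be sketched in Python as follows. The `Volume` class, its field names, and the sector addresses are assumptions for illustration, not any real file system's layout. Deleting changes only links: the directory entry goes away and the file's chain is spliced onto the front of the available sector list.

```python
class Volume:
    """Hypothetical disk model: linked sectors plus an available-sector list."""

    def __init__(self, sectors, directory, free_head):
        self.sectors = sectors      # address -> {"next": address or None, "data": str}
        self.directory = directory  # filename -> head sector address
        self.free_head = free_head  # first sector of the available list

    def delete(self, name):
        """Unlink a file without touching the data in its sectors."""
        head = self.directory.pop(name)  # first link: directory forgets the file
        last = head
        while self.sectors[last]["next"] is not None:
            last = self.sectors[last]["next"]
        # Last link: hook the old available list onto the end of the file's chain.
        self.sectors[last]["next"] = self.free_head
        self.free_head = head


vol = Volume(
    sectors={
        40: {"next": 97, "data": "Adams, John"},
        97: {"next": None, "data": "Wilson, Steve"},
        5: {"next": None, "data": ""},
    },
    directory={"addresses.doc": 40},
    free_head=5,
)
vol.delete("addresses.doc")

print(vol.directory)            # {}
print(vol.free_head)            # 40
print(vol.sectors[40]["data"])  # Adams, John
```

The file's text is still sitting in sectors 40 and 97 after the delete, which is what makes later recovery possible until those sectors are reused.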
-7 - Losing and Recovering Data recovermyfiles.com Now what if the directory or a sector gets screwed up? a) software error: erase the pointer or link to a file. or b) hardware error: part of directory or sector gets corrupted The data is still out there, but OS can't find it. If you can directly READ THE SECTORS, you will find broken strands of spaghetti... with clues in 'em. restaurantwidow.com
-8 - Recovering Data What clues exist?
* Links (obviously) if it's a linked system: try to reconstruct the files, or fragments of them
* Directory item numbers, if these exist: try to "work backwards" and reconstruct the directory
* The data itself (e. g. search for "Adams"): use syntactic knowledge to match up partial sentences in blocks.
Which block might match that one? Block fragments: "... and we re" | "nguins live in Antarc..." | "spect the opinions of" | "\t 333.9e14..."
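Two of these clues can be sketched in Python, assuming the raw sectors have already been read into a dictionary and the directory is lost. The sector addresses, the "payroll figures" text, and both function names are hypothetical; real recovery tools work on binary sector images and are far more involved.

```python
# Hypothetical raw sectors read straight off the disk; the directory is gone.
raw_sectors = {
    12: {"dirnum": 1, "next": None, "data": "Wilson, Steve\t333 East"},
    40: {"dirnum": 1, "next": 12, "data": "Adams, John\t222 West"},
    77: {"dirnum": 2, "next": None, "data": "payroll figures"},
}


def find_sectors_containing(sectors, text):
    """Clue: the data itself. Return addresses of sectors holding the text."""
    return sorted(addr for addr, s in sectors.items() if text in s["data"])


def rebuild_chains(sectors):
    """Clue: the links. A sector nobody points to is probably a file's head."""
    pointed_to = {s["next"] for s in sectors.values() if s["next"] is not None}
    heads = sorted(addr for addr in sectors if addr not in pointed_to)
    chains = []
    for head in heads:
        chain = []
        addr = head
        # Stop on a missing sector or a loop instead of trusting damaged links.
        while addr is not None and addr in sectors and addr not in chain:
            chain.append(addr)
            addr = sectors[addr]["next"]
        chains.append(chain)
    return chains


print(find_sectors_containing(raw_sectors, "Adams"))  # [40]
print(rebuild_chains(raw_sectors))                    # [[40, 12], [77]]
```

Each rebuilt chain is a candidate file; the dirnum stored in its sectors hints at which lost directory entry it belonged to.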
-9 - Recovering Data If you have 'bad sectors' (i. e. bad checksums): Read the data and override the parity error messages. Humans are normally required to look at the data and piece it back together. Success is not guaranteed. A full format writes 0 in all the sectors (a quick format only rebuilds the file structure). SOME claim they can recover what was there before (maybe NSA can?) But it is not a high-percentage bet.
-10 - Forensics: Finding Hidden Stuff * simplest cases: just "erased" your files? - straightforward disk recovery may work. * the famous photocopier story. - copiers have hard drives and remember what was copied * RAM sticks are just like hard drives; "delete" does not empty them. (Nonvolatile RAM versus volatile RAM. Why isn't it ALL nonvolatile?) macforensiclabs.com
-11 - Forensics: Finding Hidden Stuff * virtual memory: copies part of your RAM onto the computer's hard drive. * those images may include print queues and other information that can be recovered. * backup systems may not have been reformatted even if the main hard drive was reformatted. * offsite backup probably was NOT reformatted; old sectors may have copies of data you wanted to make disappear. macforensiclabs.com
-12 - File Structures: Summary * vocabulary terms throughout lecture * backup/archive/redundant storage * criteria for choice of offsite backup * understand and explain disk organization * understand how disk errors occur * analyze what data could be recovered from a particular accident * discuss forensic issues concerning disk data erasure and recovery motifake.com
-13 - Cloud Computing and Digital Asset Management First let's look at the Cloud - Where did it come from? - What is it? - How can it help me? - What new skills will I need to use it? - What effect does Cloud have on DAM?
-14 - As of the Year Most Internet Service Providers sold (... rented...) * dedicated hosting: One website, delivered by 1 computer (mystore.com) * shared virtual hosting: N websites each got 1/Nth of a computer (yourstore.com, histore.com, herstore.com)
And a few giants (Yahoo, Google, Amazon) Built giant 'ad hoc' systems with thousands of CPUs and petabytes of storage. Amazon noticed... less than 10% of their capacity was being used most of the time phaseoneenterprises.com
-17 - en.wikimedia.org... and in 2006 launched Amazon Web Services The 'utility model': power plants have capacity to meet AVERAGE demand and so can deliver UNLIMITED* power to some customers when needed. (*"Unlimited" as long as << total capacity)
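A back-of-the-envelope Python sketch of why pooling works, assuming three customers whose busy hours do not coincide. The demand figures are invented for illustration; the store names are borrowed from the hosting slide.

```python
# Hypothetical hourly demand (in servers) for three customers over four hours.
hourly_demand = {
    "mystore": [1, 1, 8, 1],
    "yourstore": [1, 7, 1, 1],
    "histore": [6, 1, 1, 1],
}

# Dedicated model: every customer buys enough capacity for its own peak.
dedicated_capacity = sum(max(demand) for demand in hourly_demand.values())

# Utility model: one shared pool sized for the peak of the combined demand.
combined = [sum(hour) for hour in zip(*hourly_demand.values())]
shared_capacity = max(combined)

print(dedicated_capacity)  # 21
print(combined)            # [8, 9, 10, 3]
print(shared_capacity)     # 10
```

Each customer can still burst to its own peak, yet the pool needs fewer than half the servers, as long as the peaks do not all arrive at once.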
The Shared Telescope Model: Astronomers worldwide now schedule time on big telescopes through the Internet and don't have to go to a cold mountaintop and stay up all night to capture imagery. as.utexas.edu
The Shared Computing Model: NASA released NEBULA in 2008, to share research computers instead of building additional data centers. NEBULA is an open source cloud management system.
... resembles the old Mainframe Timeshare model: Before PCs, we programmed on punch-cards and thought it was a great INNOVATION when time-sharing became possible. as.utexas.edu
But with one fundamental difference: In 1965 this was SCARCE and we were NUMEROUS (relatively) (skilled specialists who wanted to use computers). as.utexas.edu redlinecs.com.au
But with one fundamental difference: In 2012 this is ABUNDANT and we are EVERYONE. allthingsdistributed.com reuters.com
... may reduce your company's IT costs * software is expensive – so RENT it * hardware is expensive to update – so RENT it * buildings are expensive – so share them * land is expensive – build in rural areas relies on fast, reliable networks
Key Cloud Concepts: 1. Agility through dynamic provisioning - Order up "supercomputer for an hour" 2. API Accessibility - Your program can specify the needed QOS* *QOS: Quality of Service: - Maximum guaranteed latency (e. g. <1ms) - Minimum guaranteed CPU (e. g. >1 petaflop)
What's a flop? "Floating Point Operations" (like x=239.44*...) per second. Math models (physics, stock market, statistics) may need teraflops (million*million flops) or more. giga = 10^9, tera = 10^12, peta = 10^15, exa = 10^18
Key Cloud Concepts: 1. Agility through dynamic provisioning - Order up "supercomputer for an hour" 2. API Accessibility - Your program can specify the needed QOS* 3. Virtualization - You "THINK" you have your own machine - Protection models don't need to be reinvented
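To make "your program can specify the needed QOS" concrete, here is a minimal Python sketch. The `Offer` and `QosRequest` classes, the catalog entries, and `matching_offers` are all hypothetical and do not correspond to any real provider's API. The SI prefixes are standard values (giga = 10^9, tera = 10^12, peta = 10^15, exa = 10^18).

```python
from dataclasses import dataclass

# Standard SI prefixes, used here to express floating point operations per second.
SI_PREFIX = {"giga": 10**9, "tera": 10**12, "peta": 10**15, "exa": 10**18}


@dataclass(frozen=True)
class Offer:
    """A hypothetical machine configuration a provider could rent out."""
    name: str
    latency_ms: float
    flops: float


@dataclass(frozen=True)
class QosRequest:
    """The guarantees the calling program asks for."""
    max_latency_ms: float
    min_flops: float


def matching_offers(request, catalog):
    """Keep only offers with latency below and CPU above the requested bounds."""
    return [
        offer
        for offer in catalog
        if offer.latency_ms < request.max_latency_ms
        and offer.flops > request.min_flops
    ]


# Invented catalog for illustration.
CATALOG = [
    Offer("small", latency_ms=5.0, flops=2 * SI_PREFIX["tera"]),
    Offer("large", latency_ms=0.5, flops=2 * SI_PREFIX["peta"]),
]

# The slide's example: latency < 1 ms and more than 1 petaflop.
request = QosRequest(max_latency_ms=1.0, min_flops=1 * SI_PREFIX["peta"])
chosen = matching_offers(request, CATALOG)
print([offer.name for offer in chosen])  # ['large']
```

In a real cloud the request would go over the network to the provider, which would also provision the machine for the requested time.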
One Key Cloud Concern: SECURITY. (I know this guy) One solution (for larger firms): Build your own Cloud
Quickly, web-hosts realized that they could virtualize their service: bigbird.com cookie.com elmo.com kermit.com piggie.com
-30 - Software as a Service (SaaS) The 800 pound anthropoid: Salesforce.com sales cloud (CRM systems) force.com – build your own pin.primate.wisc.edu
-31 - Digital Asset Management in the Cloud pin.primate.wisc.edu 1. Simple: Dropbox 2. Specialized for software: Github 3. Rich metadata -> DAM (e. g. AlienBrain) Media Valet - Widen Fordela
-32 - Digital Asset Management in the Cloud pin.primate.wisc.edu 1. Simple: Dropbox 2. Specialized for software: Github 3. Rich metadata -> DAM (e. g. AlienBrain) Media Valet - "CMIS compliant?"
-33 - Content Management Interoperability Services (CMIS) roperability_Services CMIS is an open standard that defines how DAM systems can manage metadata ("generic properties") for files and folders. Adobe, HP, IBM, Microsoft, Oracle + + +
-35 - Digital Asset Management in the Cloud pin.primate.wisc.edu 1. Simple: Dropbox 2. Specialized for software: Github 3. Rich metadata -> DAM (e. g. AlienBrain) Media Valet Widen Fordela - VIDEO focus (started by LucasArts veterans)
-37 - Choosing a DAM System pin.primate.wisc.edu Here's a logically organized Buyer's Guide management-buying-guide-1.html End of lecture... End of lectureS. When we return... Project Show-and-tell!