National Center for Supercomputing Applications University of Illinois at Urbana-Champaign The ISDA Tools Computationally Scalable File Migration Services.

1 National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign. The ISDA Tools: Computationally Scalable File Migration Services to Keep Your Files Current. Kenton McHenry, Rob Kooper, Luigi Marini, Michal Ondrejcek

2 The Problem: The abundance of file formats is a problem when preserving electronic records. Why? Will there be software to load the file in the future? If not, will the specification for the format still exist? Was the specification ever available to begin with (closed/proprietary formats)?

3 *.dwg; *.max, *.3ds; *.blend; *.k3d; *.w3d; *.ma, *.mb, *.mp; *.iam; *.lwo; *.c4d; *.pdf (*.prc, *.u3d); *.vtk, *.vtp; *.skp

4 Available 3D File Formats…

5 Converting Formats: To preserve content for future use, one option is to convert the file to an open/standardized format that is likely to be supported for some time. Store both the converted file and the original for provenance. Ideally, with one file format per content type, it will be easy for users to view and use the data.

6 NCSA Polyglot (2009): Conversion service based on utilizing any and all available 3rd-party software. Imposed Code Reuse: re-attaching a programmable interface to compiled software (scripted operations within software; GUI scripting, e.g. AutoHotKey). Created a simple workflow referred to as an Input/Output Graph. Compared files before and after conversion to measure information loss. Distributed across multiple machines. Web access.

7 ISDA File Migration Tools: Conversion Software Registry, Software Servers, Polyglot, Versus

8 Software that can Convert between Formats: There is a lot of software available, each with its own unique capabilities. Much of it is not free, and it would be expensive to buy a package just to check whether it truly can convert between a desired pair of formats. How can someone know what software to get for their needs? http://isda.ncsa.illinois.edu/NARA/CSR

9 The Conversion Software Registry (Tool #1) Adobe 3D Reviewer

10 The Conversion Software Registry (Tool #1)

12 Input/Output Graphs Adobe 3D Reviewer

13 Input/Output Graphs: 3DS Max, Adobe 3D Reviewer, AutoCAD, Blender, Cinema 4D, K-3D, LightWave 3D, Maya, Wings 3D

14 Input/Output Graphs: Shortest conversion path
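
One way to picture the shortest conversion path is as a breadth-first search over a directed graph whose nodes are file formats and whose edges are labeled with the software that performs the conversion. The following is only a minimal sketch under that assumption, not the registry's actual implementation: the edge list is hypothetical sample data and the name `shortest_conversion_path` is made up for illustration.

```python
from collections import deque

# HYPOTHETICAL I/O-graph edges: (software, input format, output format).
# Illustrative sample data only, not taken from the registry.
EDGES = [
    ("Adobe 3D Reviewer", "stp", "igs"),
    ("Adobe 3D Reviewer", "stp", "wrl"),
    ("Blender", "wrl", "obj"),
    ("Blender", "obj", "stl"),
    ("Adobe 3D Reviewer", "obj", "pdf"),
]


def shortest_conversion_path(edges, source, target):
    """Return the fewest-hop list of (software, input, output) steps, or None."""
    # Index edges by their input format.
    outgoing = {}
    for software, fmt_in, fmt_out in edges:
        outgoing.setdefault(fmt_in, []).append((software, fmt_in, fmt_out))

    # Breadth-first search: the first time the target is dequeued,
    # the accumulated path has the minimum number of conversions.
    queue = deque([(source, [])])
    visited = {source}
    while queue:
        fmt, path = queue.popleft()
        if fmt == target:
            return path
        for step in outgoing.get(fmt, []):
            next_fmt = step[2]
            if next_fmt not in visited:
                visited.add(next_fmt)
                queue.append((next_fmt, path + [step]))
    return None


path = shortest_conversion_path(EDGES, "stp", "pdf")
for software, fmt_in, fmt_out in path:
    print(f"{fmt_in} -> {fmt_out} via {software}")
```

Fewest hops is used here as the cost because every conversion is a chance to lose information; a weighted search would be the natural variant if per-conversion loss scores were available.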

15 The CSR: I/O-Graphs

18 The CSR: Searching for Software

19 The CSR: File Formats

20 CSR: Adding Software

21 Software Servers (Tool #2) Imposed Code Reuse: The process of attaching an API-like interface to software so that its functionality can be called from within new code.

22 Software Servers (Tool #2): Shares the functionality of software over the web. In contrast to services which share data: ftpd, nfsd, sambad, httpd. Similar to services such as: telnetd, sshd, VNC, rdesktop. The main difference is in the interface: uniform across all software (http://host:8182/software/ / / /), simple, widely accessible, capable of being programmed against. Allows any desktop application to become a cloud-based web service*
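
The URL template on this slide lost its path segments in the transcript. The script on slide 36 builds its endpoint as `$host/software/$application/$task/$output`, so the sketch below follows that layout. The function name `software_server_url` is hypothetical, and whether every task uses the same segment order is an assumption.

```python
from urllib.parse import quote


def software_server_url(host, application, task, output_format):
    """Build host/software/application/task/output_format.

    ASSUMPTION: the segment order mirrors the slide 36 script.
    """
    # Percent-encode each segment so names with spaces stay valid in a URL.
    parts = [quote(part, safe="") for part in (application, task, output_format)]
    return host.rstrip("/") + "/software/" + "/".join(parts)


url = software_server_url(
    "http://141.142.224.231:8182", "A3DReviewer", "convert", "igs"
)
print(url)
```

Because the interface is the same for every application, a client only needs to vary these three segments to drive a different program.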

23 Software Functionality Sharing

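36
#!/bin/bash
# Convert every *.stp file in the current directory to *.igs
# using Adobe 3D Reviewer on a Software Server.
host="http://141.142.224.231:8182"
application="A3DReviewer"
task="convert"
output="igs"
input="stp"
url="$host/software/$application/$task/$output"

for input_file in *."$input" ; do
  # Skip the literal pattern when no files match.
  [ -e "$input_file" ] || continue

  # Upload the file; the response is the URL where the result will appear.
  output_url=$(curl -s -H "Accept:text/plain" -F "file=@$input_file" "$url")
  output_file="${input_file%.*}.$output"
  echo "Converting: $input_file to $output_file"

  # Poll once per second until the converted file can be downloaded.
  while : ; do
    wget -q -O "$output_file" "$output_url"
    if [ $? -eq 0 ] ; then
      break
    fi
    sleep 1
  done
done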

37 Software Server Robustness Software: 3D Studio Max, Adobe 3D Reviewer, Blender, Google Sketchup, ImageMagick, IrfanView, Microsoft Paint, Microsoft Word, ParaView, VTK. Measure the throughput of software on a software server. TRY TO MAKE IT FAIL!!! Results: Ideal case: 1395 tasks/hour on a 1-core, 1 GB VM with an average wait of 4.42 s. Less-than-ideal case: 945 tasks/hour with an average wait of 11.17 s. Server did not crash!

38 Software Server Robustness We are using GUI-based software! Consider command-line software as a baseline: ImageMagick: 1871 tasks/hour; IrfanView: 3163 tasks/hour. Versus GUI software: 3DS Max: 355 tasks/hour; Microsoft Word: 756 tasks/hour. How many people would it take, using this software by hand, to reach the same throughput?
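
The closing question can be turned into rough arithmetic. The sketch below converts the quoted tasks/hour figures into seconds per task and estimates how many people would be needed to match them by hand. The slides give no manual timing, so the 60 seconds per manual conversion is purely an assumption for illustration, and the function names are hypothetical.

```python
import math

# Throughput figures quoted on the slide, in tasks per hour.
TASKS_PER_HOUR = {
    "ImageMagick": 1871,
    "3DS Max": 355,
    "Microsoft Word": 756,
}

# ASSUMPTION: a person needs 60 s per manual conversion (made-up figure).
MANUAL_SECONDS = 60


def seconds_per_task(tasks_per_hour):
    """Average time the server spends per task."""
    return 3600 / tasks_per_hour


def people_needed(tasks_per_hour, manual_seconds_per_task):
    """Whole number of people needed to match the server's hourly throughput."""
    return math.ceil(tasks_per_hour * manual_seconds_per_task / 3600)


for name, rate in TASKS_PER_HOUR.items():
    print(
        f"{name}: {seconds_per_task(rate):.1f} s/task, "
        f"{people_needed(rate, MANUAL_SECONDS)} people at {MANUAL_SECONDS} s each"
    )
```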

39 Polyglot (Tool #3) Listens for Software Server broadcasts on the network. Catalogues available input/output operations and constructs an I/O-graph. Identifies conversion paths between input and output formats. Carries out CHAINED conversions.
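
A chained conversion can be sketched as feeding the output of each step into the next one. This is an illustration of the idea only, not Polyglot's code: `run_chain` and `fake_convert` are hypothetical names, `part.stp` is a made-up file, and `fake_convert` merely renames the extension where a real client would call a Software Server. The two steps are the first half of the STP to X3D chain shown on slide 68.

```python
def run_chain(input_file, steps, convert):
    """Apply each (software, output_format) step in order.

    Returns every file in the chain, starting with the input file.
    """
    files = [input_file]
    for software, output_format in steps:
        # The most recent output becomes the input of the next conversion.
        files.append(convert(software, files[-1], output_format))
    return files


def fake_convert(software, path, output_format):
    # HYPOTHETICAL stand-in for a Software Server call: only swaps the extension.
    stem = path.rsplit(".", 1)[0]
    return f"{stem}.{output_format}"


steps = [("A3DReviewer", "wrl"), ("Vrml97ToX3d", "x3d")]
chain = run_chain("part.stp", steps, fake_convert)
print(chain)
```

Keeping the intermediate files makes it possible to compare each stage against the original and see where information was lost.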

40 Polyglot (Tool #3)

45 Versus (Tool #4) Java library/framework for comparing file content. Under development: Framework/API designed. Distributed architecture. RESTful Web Interface: http:// /versus/comparisons; dataset1, dataset2; adapter, extractor, measure. Adding extractors, measures.

46 Which conversion preserved the most? Using the light fields measure (emphasizes shape through silhouettes): Adobe 3D Reviewer between *.pdf and *.stp (61.67). Using the spin image measure (emphasizes shape through relative vertex positions): Adobe 3D Reviewer between *.obj and *.pdf (59.07).

47 Which is the best format? Within the context of preservation, we can define this as the format that retains, on average, the most information when other formats are converted to it. Using the light fields measure (emphasizes shape through silhouettes): *.stp (40.73). Using the spin image measure (emphasizes shape through relative vertex positions): *.stl (34.89). *.stp, being a CAD format, has more variability in vertex positions due to tessellation.
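
The definition above amounts to averaging, for each target format, the scores of all conversions that end in that format, then picking the highest average. A minimal sketch follows, assuming that a higher score means more information retained. The scores are made-up sample values, not the measurements behind the slide, and `mean_score_by_target` is a hypothetical name.

```python
from collections import defaultdict
from statistics import fmean

# HYPOTHETICAL scores: (source format, target format) -> similarity score.
# ASSUMPTION: higher means more information retained. Values are made up.
SCORES = {
    ("obj", "stp"): 50.0,
    ("pdf", "stp"): 30.0,
    ("obj", "stl"): 20.0,
    ("stp", "stl"): 40.0,
    ("stp", "pdf"): 10.0,
}


def mean_score_by_target(scores):
    """Average the scores of all conversions that end in each target format."""
    by_target = defaultdict(list)
    for (_source, target), score in scores.items():
        by_target[target].append(score)
    return {target: fmean(values) for target, values in by_target.items()}


means = mean_score_by_target(SCORES)
best = max(means, key=means.get)
print(best, means[best])
```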

48 ISDA Tools: Conversion Software Registry, Software Servers, Polyglot, Versus, 3D Utilities, Image Utilities, CyberIntegrator

49 Acknowledgements: This research was partially supported by a National Archives and Records Administration (NARA) supplement to NSF PACI cooperative agreement CA #SCI-9619019 and by NCSA Industrial Partners. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the National Science Foundation, the National Archives and Records Administration, or the U.S. government.

50 The ISDA Tools (Free and Open Source) Image, Spatial, and Data Analysis Group: http://isda.ncsa.illinois.edu Kenton McHenry, Rob Kooper, Michal Ondrejcek, Luigi Marini

51 End

52 Measuring 3D Information Loss: good (e.g. 1.0), not so good (e.g. 0.1). Adobe 3D Reviewer, Blender, Cyberware PlyTool, K-3D, NIST VRML/X3D, VTK

53 Statistics: Use the mean and standard deviation of the vertices to represent the model. Simple and fast to compute. Sensitive to size and orientation of the model.
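
The statistics measure can be sketched as a per-axis mean and standard deviation of the vertex coordinates. The slide does not say how two such signatures are compared, so the Euclidean distance used below is an assumption, and both function names are hypothetical. The example also shows the stated sensitivity to size: scaling the cube changes the signature.

```python
from statistics import fmean, pstdev


def vertex_statistics(vertices):
    """Return (means, standard deviations) per axis for (x, y, z) vertices."""
    axes = list(zip(*vertices))  # three tuples: all x, all y, all z
    return tuple(fmean(a) for a in axes), tuple(pstdev(a) for a in axes)


def signature_distance(a, b):
    """ASSUMPTION: compare two signatures by Euclidean distance."""
    flat_a = a[0] + a[1]
    flat_b = b[0] + b[1]
    return sum((x - y) ** 2 for x, y in zip(flat_a, flat_b)) ** 0.5


# Unit cube and the same cube scaled by 2.
cube = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
scaled = [(2 * x, 2 * y, 2 * z) for x, y, z in cube]

print(vertex_statistics(cube))
print(signature_distance(vertex_statistics(cube), vertex_statistics(scaled)))
```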

54 Surface Area: Use the sum of face areas to represent the model. Also simple and fast to compute. Sensitive to size, somewhat sensitive to shape. Will detect loss of faces.
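
A minimal sketch of the surface area measure, assuming the model is a triangle mesh given as a vertex list plus index triples (the slide does not specify the mesh representation, and the function names are hypothetical). Each triangle's area is half the length of the cross product of two of its edges. Dropping a face lowers the total, which is how the measure detects lost faces.

```python
import math


def triangle_area(a, b, c):
    """Area of a 3D triangle: half the norm of the cross product of two edges."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    cross = (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )
    return 0.5 * math.sqrt(sum(k * k for k in cross))


def surface_area(vertices, faces):
    """Sum of triangle areas; each face is a triple of indices into vertices."""
    return sum(triangle_area(*(vertices[i] for i in face)) for face in faces)


# A unit square split into two triangles.
vertices = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
faces = [(0, 1, 2), (0, 2, 3)]

full_area = surface_area(vertices, faces)
area_after_loss = surface_area(vertices, faces[:1])  # one face lost
print(full_area, area_after_loss)
```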

55 Light Fields [Chen, 2003] Compares silhouettes from various viewing angles around a model.

56 Light Fields

60 Light Fields: Fairly fast to compute. Sensitive to shape of convex hull, invariant to rigid transformations.

61 Spin Images [Johnson, 1999] 2D histograms of the in-plane and out-of-plane distances of vertices neighboring a given vertex. (Figure labels: p, N, q)
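
The two distances can be written out for a vertex p with unit normal N and a neighboring vertex q: the out-of-plane distance is the projection of q - p onto N, and the in-plane distance is what remains of |q - p| perpendicular to N. The sketch below assumes N has unit length and uses a plain dictionary-style histogram with a hypothetical `bin_size` parameter. It illustrates the idea and is not the implementation used for the results in this talk.

```python
import math
from collections import Counter


def spin_coordinates(p, n, q):
    """Return (in-plane, out-of-plane) distances of q relative to p and normal n.

    ASSUMPTION: n is a unit-length normal.
    """
    d = [qi - pi for qi, pi in zip(q, p)]
    out_of_plane = sum(di * ni for di, ni in zip(d, n))
    squared = sum(di * di for di in d) - out_of_plane * out_of_plane
    in_plane = math.sqrt(max(0.0, squared))  # clamp tiny negative round-off
    return in_plane, out_of_plane


def spin_image(p, n, neighbors, bin_size):
    """2D histogram of spin coordinates, keyed by (in-plane bin, out-of-plane bin)."""
    histogram = Counter()
    for q in neighbors:
        a, b = spin_coordinates(p, n, q)
        histogram[(math.floor(a / bin_size), math.floor(b / bin_size))] += 1
    return histogram


p, n = (0, 0, 0), (0, 0, 1)
neighbors = [(3, 4, 2), (1, 0, 0), (0, 1, 0)]
image = spin_image(p, n, neighbors, 1.0)

# Translating the whole model leaves the spin image unchanged.
shift = (10, -3, 7)
moved = [tuple(c + s for c, s in zip(q, shift)) for q in neighbors]
moved_image = spin_image(shift, n, moved, 1.0)
print(image == moved_image)
```

Only the relative positions of vertices enter the computation, which is why the measure is unaffected by where the model sits in space.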

62 Spin Images

67 Spin Images: Expensive to compute. Sensitive to relative vertex position, ignores surface, invariant to rotations and translations.

68 STP to X3D to STP: STP → (A3D Reviewer) → WRL → (Vrml97ToX3d) → X3D → (X3dToVrml97) → WRL → (A3D Reviewer) → STP

