Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 Peter Fox Xinformatics 4400/6400 Week 10, April 9, 2013 Information management, workflow and discovery /check-in for project definitions.

Similar presentations


Presentation on theme: "1 Peter Fox Xinformatics 4400/6400 Week 10, April 9, 2013 Information management, workflow and discovery /check-in for project definitions."— Presentation transcript:

1 1 Peter Fox Xinformatics 4400/6400 Week 10, April 9, 2013 Information management, workflow and discovery /check-in for project definitions

2 Review of reading Information Integration –Social issues in information discovery and sharing –Information integration in geo-informatics –http://cseweb.ucsd.edu/~goguen/projs/data.html –http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1839387/ Information Life Cycle –MSDN Information Life Cycle –Information Life Cycle definition and context –http://www.computerworld.com/s/article/79885/The_new_buzzwords_Information_lifecycle_management –http://www.databasejournal.com/sqletc/article.php/3340301/Database-Archiving-A-Critical-Component-of-Information- Lifecycle-Management.htm –http://en.wikipedia.org/wiki/Information_Lifecycle_Management –http://msdn.microsoft.com/en-us/library/bb288451.aspx Information Visualization –http://mastersofmedia.hum.uva.nl/2011/04/18/the-simple-ways-of-information-visualization/comment-page-1/ –http://www.siggraph.org/education/materials/HyperVis/domik/folien.html –http://www.visual-literacy.org/periodic_table/periodic_table.html Information model development and visualization –http://www.acm.org/crossroads/xrds7-3/smeva.html Outside the current box –Peter Fox and James Hendler, 2011, Changing the Equation on Scientific Data Visualization, Science, Vol. 331 no. 6018 pp. 705-708, DOI: 10.1126/science.1197654 online at http://www.sciencemag.org/content/331/6018/705.full or see: http://escience.rpi.edu/publications/visualization/fox_hendler_science2011.html 2

3 Logical Collections The primary goal of a Management system is to abstract the physical collection into logical collections. The resulting view is a uniform homogeneous collection. Note the analogy with logical models and information integration: so EARLY ON –Identifying naming conventions and organization –Aligning cataloguing and naming to facilitate search, access, use (who uses?) –Provision of **contextual** information 3

4 Physical Handling Map between physical and logical. Where and who does it come from? –Is there a transfer into a physical form? –Is it backed-up, archived, cached? … –What formats? –Naming conventions – do they change? Note analogy to physical models 4

5 Interoperability Support 5

6 Security Access authorization and change verification. This is the basis of trusting your information. 6

7 Ownership Who is responsible for quality and meaning 7

8 Metadata Recall metadata are data about data. Metainformation? 8

9 Persistence Deployment of mechanisms to counteract technology obsolescence. 9

10 Discovery Ability to identify useful relations and information inside the collection More on this later in this class 10

11 Dissemination 11 Mechanisms to make aware the interested parties of changes and additions to the collections. Do you rely on information retrieval? The Web?

12 Summary of Information Management Creation of logical collections Physical handling Interoperability support Security support Ownership Metadata collection, management and access. Persistence Knowledge and information discovery Dissemination and publication 12

13 Note for your project writeup! Information management! Cover the 9 areas. 13

14 Information Workflow What is a workflow? Why would you use it? Key considerations for information, cf. data Some pointers to workflow systems 14

15 15 What is a workflow? General definition: “series of tasks performed to produce a final outcome” (taxes?) Information workflow – involves people but potentially want to –Automate jobs that a person traditionally performed manually –Process large volumes of information faster than one could do by hand NB difference from data workflows – it reaches out to encompass the user (e.g. ‘unrecorded actions’)

16 16 Background: Business Workflows Example: planning a trip Need to perform a series of tasks: book a flight, reserve a hotel room, arrange for a rental car, etc. Each task may depend on outcome of previous task –Days you reserve the hotel depend on days of the flight –If hotel has shuttle service, may not need to rent a car Prior information, experience, preferences…

17 Tripit.com? 17

18 18 What about information workflows? Perform a set of transformations/ operations on information source(s) Examples –Generating images from raw data –Identifying areas of interest from a large information source (e.g. word cloud) –Classifying a set of objects –Querying a web service for more information on a set of objects –Many others…

19 19 More on Workflows Can process many information types: –Archives –Web pages –Streaming/ real time –Images –Semiotic systems Robust workflows depending on formal (concept and logical) models of the flow of information among components May be simple and linear or very complex

20 20 Challenges Questions: –What are some challenges for users in implementing workflows? –What are some challenges to executing these workflows? –What are limitations of writing a program? Mastering a programming language Visualizing workflow Sharing/exchanging workflow Formatting issues Locating datasets, services, or functions

21 21 Workflow Management Systems

22 22 Benefits of Workflows Documentation of aspects of analysis Visual communication of analytical steps Ease of testing/debugging Reproducibility Reuse of part or all of workflow in a different project

23 23 Additional Benefits Integration of and between multiple computing environments ‘Automated’ access to distributed resources via other architectural components, e.g. web services and Grid technologies System functionality to assist with information integration of heterogeneous components and source

24 Why not just use a script? Script does not specify low-level task scheduling and communication May be platform- dependent Can’t be easily reused May not have sufficient documentation to be adapted for another purpose 24

25 Why can a GUI be useful? No need to learn a programming language Visual representation of what workflow does Allows you to monitor workflow execution Enables user interaction (though not necessarily collaboration) Facilitates sharing of workflows 25

26 Some workflow systems Kepler SCIRun Sciflo Triana Taverna Pegasus Some commercial tools: –Windows Workflow Foundation –Mac OS X Automator http://www.isi.edu/~gil/AAAI08TutorialSlides/5-Survey.pdf http://www.isi.edu/~gil/AAAI08TutorialSlides/ See reading for this week 26

27 Discovery How does someone find your information? How would you provide discovery of –collections –files –‘bits’ How would you find -> 27

28 Discovery o Search (Federated Search) o Helped by o Folksonomies (user contributed) o Intelligent Agents o Search Engines o Taxonomies o Find photos of Kim o Boy or girl? 28

29 Use cases Find a sound recording of a swallow. Excuse me? 29

30 Use cases Find a sound recording of an African Swallow Find a sound recording of a bird that sounds like an African Swallow Media types – how can you discover them? 30

31 Use cases Find the movie that Jean Tripplehorn first starred in/ that was her most successful/ was lead actress? Has anyone gene sequenced a mouse? Find images of primary productivity in the North Atlantic Discovery can often involve information integration (or is it *almost always*?) 31

32 32 Three level ‘metadata’ solution for DATA Level 1: Data Registration at the Discovery Level, e.g. Volcano location and activity Level 2: Data Registration at the Inventory Level, e.g. list of datasets, times, products Level 3: Data Registration at the Item Detail Level, e.g. access to individual quantities Ontology based Data Integration Using scientific workflows Earth Sciences Virtual Database A Data Warehouse where Schema heterogeneity problem is Solved; schema based integration Data DiscoveryData Integration A.K.Sinha, Virginia Tech, 2006

33 33 Three level ‘metadata’ solution? Level 1: Registration at the Discovery Level, e.g. Find the upper level entry point to a source Level 2: Registration at the Inventory Level, e.g. list of datasets, using the logical organization Level 3: Registration at the Item Detail Level, i.e. annotation e.g. tagging Integration using mapping management Catalog/ Index Schema based integration Information Discovery Information Integration A.K.Sinha, Virginia Tech, 2006

34 Information discovery What makes discovery work? –Metadata –Logical organization –Attention to the fact that someone would want to discover it –It turns out that file types are a key enabler or inhibitor to discovery –Result ranking using *tuned* algorithm What does not work? –Result ranking algorithms that depend on unconventional information types (icon, index, symbol) 34

35 Federated search “is the simultaneous search of multiple online databases or web resources and is an emerging feature of automated, web-based library and information retrieval systems. It is also often referred to as a portal or a federated search engine.” wikipedia Libraries have been doing this for a long time (Z39.50, ISO23950) Key is consistent search metadata fields (keywords) E.g. Geospatial One Stop http://www.geodata.govhttp://www.geodata.gov 35

36 Smart search Semantically aware search, e.g. http://noesis.itsc.uah.edu, http://eie.cos.gmu.edu (Water -> Semantic Search) http://noesis.itsc.uah.edu http://eie.cos.gmu.edu Faceted search, e.g. mspace (http://mspace.fm ), exhibit (MIT), S2S (RPI; http://aquarius.tw.rpi.edu/s2s )http://mspace.fm http://aquarius.tw.rpi.edu/s2s 36

37 NOESIS 37

38 Faceted search 38 logd.tw.rpi.edu

39 Summary - discovery Useful to write a few discovery use cases to drive how your design is developed Evolution of your role in facilitating discovery and what/ how others implement access to your information 39

40 Reading for this week Is retrospective 40

41 Check in for Project Assignment Analysis of existing information system content and architecture, critique, redesign and prototype redeployment Or a new use case, development, etc. 41

42 What is next April 16 – Information Audit April 23 – April 30 – May 6 – final project presentations 42


Download ppt "1 Peter Fox Xinformatics 4400/6400 Week 10, April 9, 2013 Information management, workflow and discovery /check-in for project definitions."

Similar presentations


Ads by Google