1 Organizing and modelling data Gert Jan Hofstede Teacher Course Data Management, INF-21306 Sjoukje Osinga From Information Technology Group www.wageningenur.nl/inf.

2 Data Management
Course topics:
– Why manage data?
– What is a database?
– Database design (week 2)
– Advanced SQL (week 3)
– Architectures (week 4)
– Managing (week 5)
– Case presentations (week 6)
... and some tips & tricks
INF-21306

3 Why manage data?
The organization that loses its memory, loses its life
Data to manage are everywhere!
– Experimental data, model inputs, model outputs, ...
... but can all this be managed?
– most of it just grows unmanaged
– some of it is managed with spreadsheets or databases

4 Why manage data (2)?
Research:
– results are hidden in piles of paper
– data files lack documentation
– costly or impossible to use existing data
Organizations:
– redundancy leads to errors
– data structures are stable over time
– a good design saves programming cost

5 What is a database?
Theoretically:
– a coherent collection of data
– searchable as one whole
– by many people
In practice:
– a collection of related 2-dim tables
– rows are “things”
– columns are “attributes”
– special software “DBMS” is needed

6 A database table
(Figure labels: column, row)
Typically, a database (DB) has many tables. Each table has its own type of data. Knowing how to organize a DB into tables is an art: Database design.

7 Database: more than tables.
It starts with facts, e.g. you want to remember the fact that one employee can be another’s boss. That leads to a one-to-many relationship.

8 Another such fact: “one or more employees work for one department”
I see a third relationship...
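
The two facts on slides 7 and 8 can be sketched as tables. This is a minimal illustration, not the course's own schema: it uses Python's built-in sqlite3 module, and the table names, column names and sample rows are all assumptions. Each one-to-many relationship becomes a foreign key on the "many" side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
create table department (
    dept_id integer primary key,
    name    text not null
);
create table employee (
    emp_id  integer primary key,
    name    text not null,
    -- one employee can be boss of many others (hypothetical column name)
    boss_id integer references employee(emp_id),
    -- one department has one or more employees
    dept_id integer not null references department(dept_id)
);
""")
conn.execute("insert into department values (1, 'Research')")
conn.executemany(
    "insert into employee values (?, ?, ?, ?)",
    [(1, "Ann", None, 1), (2, "Bob", 1, 1), (3, "Cy", 1, 1)],  # made-up sample rows
)
# Self-join: pair every employee with his or her boss (None if there is none).
rows = conn.execute("""
    select e.name, b.name
    from employee e left join employee b on e.boss_id = b.emp_id
    order by e.emp_id
""").fetchall()
print(rows)
```

The boss relationship points back to the same table, which is why the query joins employee to itself.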

9 The Information Systems cycle and models
(Figure labels: process models, data models, big pictures)

10 Database design in research
– Have a research question
– Try out
– Think and rethink
– Design ‘hi-fi’ data model
– Collect data
– Query & Interpret data
– (Write article)

11 Database design means choosing
“A field has one or more facets”
– what counts as a field, or facet?
– who says so?
– Stakeholders must be involved!
Rice, Bhutan

12 Data modelling exercise
You are in charge of designing a database to find out which teachers give which lectures where and when in your course programme. This is the main ‘fact type’ you need to store.
– Find out which entities are important.
– Find key attributes.
– Draw an Entity-Relationship diagram to show the structure.

13 Possible course data model
E-R diagram (relationship labels: occurs as, includes, is scheduled in, delivers)
Legend: according to Hofstede
What if several rooms per course-instance?
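
The diagram itself did not survive in this transcript, so the following is only one possible reading of the exercise, not the model shown on the slide. The entity names (teacher, course, lecture, room), their columns and the sample rows are assumptions. The sketch also shows one common answer to the closing question: a separate link table allows several rooms per lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- All table and column names below are hypothetical.
create table teacher (teacher_id integer primary key, name  text not null);
create table course  (course_id  integer primary key, title text not null);
create table room    (room_id    integer primary key, label text not null);
create table lecture (
    lecture_id integer primary key,
    course_id  integer not null references course(course_id),
    teacher_id integer not null references teacher(teacher_id),
    starts_at  text not null
);
-- Link table: one row per (lecture, room) pair, so a lecture can use several rooms.
create table lecture_room (
    lecture_id integer not null references lecture(lecture_id),
    room_id    integer not null references room(room_id),
    primary key (lecture_id, room_id)
);
insert into teacher values (1, 'Teacher 1');
insert into course  values (1, 'Data Management');
insert into room    values (1, 'Room A'), (2, 'Room B');
insert into lecture values (1, 1, 1, '2024-01-08 10:30');
insert into lecture_room values (1, 1), (1, 2);
""")
# Who gives which lecture, where and when?
schedule = conn.execute("""
    select t.name, c.title, r.label, l.starts_at
    from lecture l
    join teacher t       on t.teacher_id = l.teacher_id
    join course c        on c.course_id = l.course_id
    join lecture_room lr on lr.lecture_id = l.lecture_id
    join room r          on r.room_id = lr.room_id
    order by r.label
""").fetchall()
print(schedule)
```

Had room been a plain column on lecture, a second room would force either a duplicate lecture row or a schema change.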

14 Cast, melt or create new data type?
e.g. Experimental study that collects measurement data.
(A) POINT (x, y, date1, a, b, date2, a, b, c)
or (B) POINT (x, y) MEASUREMENT (x, y, date, a, b, c)
or (C) POINT (x, y) MEASUREMENT (x, y, date, type, value)
Where are a, b, and c?

15 In a database:
A cast
B melted
C abstracted

16 Hofstede’s law of no escape from trouble
There is no escaping from choosing.
…and you could pay a high price for getting it wrong
garbage in, garbage out!
Data types (columns) vs data (rows): design issue!
(A) efficient but no measurements can be added
(B) measurements can be added but not new types
(C) flexible, extensible, allows grouping across types
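
The trade-off between designs (B) and (C) can be tried out in a few lines. This is a sketch under assumptions: it uses Python's sqlite3 module, the table names Bmeasurement and Cmeasurement, the column types and the sample values are made up for illustration, and design (A) is left out because its repeated column names would need renaming first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table point (x real, y real, primary key (x, y));
-- (B) one column per measurement type
create table Bmeasurement (x real, y real, date text, a real, b real, c real);
-- (C) one row per measured value
create table Cmeasurement (x real, y real, date text, type text, value real);
""")
conn.execute("insert into point values (1.0, 2.0)")
conn.executemany(
    "insert into Bmeasurement values (?, ?, ?, ?, ?, ?)",
    [(1.0, 2.0, "2024-01-01", 10.0, 20.0, None),   # c was not measured on this date
     (1.0, 2.0, "2024-02-01", 12.0, 22.0, 32.0)],
)
# Convert B into C: every non-null a/b/c cell becomes one (type, value) row.
for col in ("a", "b", "c"):
    conn.execute(
        f"insert into Cmeasurement "
        f"select x, y, date, '{col}', {col} from Bmeasurement where {col} is not null"
    )
# A new measurement type 'd' would need a new column in B, but only a new row in C.
conn.execute("insert into Cmeasurement values (1.0, 2.0, '2024-02-01', 'd', 5.0)")
counts = conn.execute(
    "select type, count(*) from Cmeasurement group by type order by type"
).fetchall()
print(counts)
```

In (C) the measurement types have moved from the column names into the data, which is what makes adding a type or grouping across types possible without changing the design.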

17 SQL, Structured Query Language
One language for all 3 levels of database architecture:
– regulate user level (grant, revoke)
– define data (create, drop, alter)
– regulate storage (create index, tablespace…)
– see data or metadata (select)
(Figure labels: User program, User program, User program, Data dictionary, storage)
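
A rough feel for these levels, sketched with Python's sqlite3 module. The table and index names are hypothetical. SQLite is a single-file database without user accounts, so it has no grant or revoke and no tablespaces; the sketch therefore covers only create/alter/drop, create index, and a select on metadata.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Define structure: create and alter.
conn.execute("create table sample (id integer primary key, label text)")
conn.execute("alter table sample add column taken_on text")
# Regulate storage: an index changes how rows are found, not what they mean.
conn.execute("create index sample_label_idx on sample (label)")
# See metadata: SQLite exposes its data dictionary as the sqlite_master table.
meta = conn.execute(
    "select type, name from sqlite_master order by type, name"
).fetchall()
print(meta)
# Dropping the table also removes the index that belongs to it.
conn.execute("drop table sample")
remaining = conn.execute("select count(*) from sqlite_master").fetchone()[0]
print(remaining)
```

The same select statement that reads data also reads the data dictionary, which is the point of having one language for all levels.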

18 SQL Select
format of a select statement (‘query’):
select <columns> from <tables> [ where <condition> ] [ group by <columns> ];
e.g. Display the average of all measurements per type.
select type, avg(value) from Cmeasurement group by type;
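
To see the example query run, here is a minimal sketch using Python's sqlite3 module. The column list of Cmeasurement follows design (C) from slide 14; the column types and the three sample rows are assumptions, and an order by clause is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table Cmeasurement (x real, y real, date text, type text, value real)"
)
conn.executemany(
    "insert into Cmeasurement values (?, ?, ?, ?, ?)",
    [(1.0, 2.0, "2024-01-01", "a", 10.0),   # made-up sample rows
     (1.0, 2.0, "2024-02-01", "a", 14.0),
     (1.0, 2.0, "2024-01-01", "b", 20.0)],
)
# The slide's query: one output row per type, with the average of its values.
averages = conn.execute(
    "select type, avg(value) from Cmeasurement group by type order by type"
).fetchall()
print(averages)
```

group by collapses all rows with the same type into one, and avg(value) is computed within each group.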

19 Tips & tricks / good practices
Beware of version changes!
– Always save structured data also in a raw format that can be read without the software. So not only .xls, but also .csv (comma-separated file)
– This is harder for data in databases – save all tables also as .txt when possible
Can another person use + understand your work?
– Do the test!
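
Saving every table of a database in a raw text format can be automated. The following is a sketch, assuming the data live in SQLite; the helper name export_tables and the sample table are hypothetical, and other database systems would need their own way of listing tables.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def export_tables(conn, out_dir):
    """Write every user table in the database to <out_dir>/<table>.csv."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    tables = [row[0] for row in conn.execute(
        "select name from sqlite_master "
        "where type = 'table' and name not like 'sqlite_%' order by name"
    )]
    written = []
    for table in tables:
        cursor = conn.execute(f'select * from "{table}"')
        path = out_dir / f"{table}.csv"
        with open(path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow([col[0] for col in cursor.description])  # header row
            writer.writerows(cursor)                                  # data rows
        written.append(path)
    return written


conn = sqlite3.connect(":memory:")
conn.execute("create table point (x real, y real)")
conn.execute("insert into point values (1.0, 2.0)")
paths = export_tables(conn, tempfile.mkdtemp())
print(paths[0].read_text(encoding="utf-8"))
```

Writing the column names as a header row keeps a little of the documentation with the data, but units and meanings still have to be written down somewhere else.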

20 A database may not be enough
When you run a simulation model:
– Simulation software changes
– Hardware changes (e.g. 16->32 bits)
– Especially relevant for random generators
Can you still reproduce your results?
– Short term: Always store results + model version
– Longer term: Save random seed (and algorithm)
– Forever: impossible.
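
Storing the result together with the model version, the seed and the generator can look like this. It is a toy sketch: run_model is a made-up stand-in for a real simulation, and MODEL_VERSION and the record layout are assumptions.

```python
import json
import platform
import random

MODEL_VERSION = "0.1-hypothetical"  # assumption: your own version label


def run_model(seed, steps=5):
    """Toy stand-in for a simulation: a seeded random walk."""
    rng = random.Random(seed)  # private generator, so the seed alone determines the run
    position = 0.0
    for _ in range(steps):
        position += rng.uniform(-1.0, 1.0)
    return position


seed = 12345
record = {
    "model_version": MODEL_VERSION,
    "seed": seed,
    "generator": "Python random.Random (Mersenne Twister)",
    "python_version": platform.python_version(),
    "result": run_model(seed),
}
print(json.dumps(record, indent=2))
```

Rerunning with the stored seed on the same software reproduces the result; after a change of generator or platform that is no longer guaranteed, which is why the record also names the algorithm and the software version.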

21 Moral: Communicate with the future!
(EU funded project called ‘Shaman’)

