A short summary about Theories and Practical work

1 A short summary about Theories and Practical work
Documenting PGR (plant genetic resources): a short summary of theories and practical work

2 How do you store your data?
In an MS Excel spreadsheet? On handwritten cards? In a database (like MS Access)? In your head? In another database format?

3 What is a database? Databases are designed to offer an organized mechanism for storing, managing and retrieving information. They do so through the use of tables. If you’re familiar with spreadsheets like Microsoft Excel, you’re probably already accustomed to storing data in tabular form. It’s not much of a stretch to make the leap from spreadsheets to databases. The ability to link tables is a very powerful resource

4 So why not use a spreadsheet?
Databases are actually much more powerful than spreadsheets in the way you’re able to manipulate data. Just a few of the actions you can perform on a database:
Retrieve all records that match certain criteria
Update records in bulk
Cross-reference records in different tables
Perform complex aggregate calculations
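The four actions listed above can be tried out in a few lines. This is only a minimal sketch using Python's built-in sqlite3 module; the table names (accession, germination_test), the column names and all the data are made up for the illustration and do not come from a real gene bank database.

```python
import sqlite3

# In-memory database; tables and rows are hypothetical illustration data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accession (accession_id INTEGER PRIMARY KEY, name TEXT, country TEXT);
CREATE TABLE germination_test (test_id INTEGER PRIMARY KEY, accession_id INTEGER, pct REAL);
INSERT INTO accession VALUES (1, 'Alpha', 'Sweden'), (2, 'Beta', 'Sweden'), (3, 'Gamma', 'Denmark');
INSERT INTO germination_test VALUES (1, 1, 90.0), (2, 1, 80.0), (3, 3, 70.0);
""")

# 1. Retrieve all records that match a criterion.
swedish = [row[0] for row in conn.execute(
    "SELECT name FROM accession WHERE country = ? ORDER BY name", ("Sweden",))]

# 2. Update records in bulk: one statement changes every matching row.
conn.execute("UPDATE accession SET country = 'SWE' WHERE country = 'Sweden'")
updated = conn.execute(
    "SELECT COUNT(*) FROM accession WHERE country = 'SWE'").fetchone()[0]

# 3 and 4. Cross-reference two tables and compute an aggregate per accession.
avg_by_name = dict(conn.execute("""
    SELECT a.name, AVG(t.pct)
    FROM accession a
    JOIN germination_test t ON t.accession_id = a.accession_id
    GROUP BY a.name
"""))

print(swedish, updated, avg_by_name)
```

Doing the same in a spreadsheet would mean filtering, editing cells one by one, and building lookup formulas by hand.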

5 What is a database? In other words:
A well-organized set of interrelated data held in one or more files that can be managed by software. Think of it as tables connected to each other. Instead of having one very big table, you use many different tables and link them together by creating relationships between them. That is a relational database. (Speech bubbles on the slide: "I have data!" "Me too!" "Me also!")

6 Step by step: Analysis, Design, Implementation
Analysis of gene bank activities to determine information and documentation needs.
Design of the manual and/or computerized system based on the documentation and information needs.
Implementation of the system that has been developed: getting the system started and putting the program or information system into operation.
Only a few years ago the documentation of PGR collections consisted of handwritten records on cards stored in drawers. Computers have greatly increased the quantity and quality of documentation, but different gene banks tend to use different systems to organize their databases.

7 Data: quantitative and qualitative
Qualitative: a description of the object being examined. Blue, horizontal...
Quantitative: dealing with numbers. 94 g, 34 mm, 67%
You can also store pictures, links (URL addresses), maps, scanned material...

8 A relation can be seen as a table that stores information
The columns are called attributes; the rows are called records.

Person
Name      Address  Telephone
Filip     Lund     12345
Peter     Malmö    23456
Victoria           34567

9 But not every table is a relation.
No two rows in the table can be exactly the same.
Every column in the table has a unique name.
There is only one value in every cell.
All values in one column belong to the same domain or are undefined (null).

10 Redundant data
Data kept in just one table repeats information… Example: students taking courses in a school.
Columns: Name, Address, Tel, Course
Filip Lund 12345 Math
Peter Malmö 23456
Victoria 34567 English History
You already know that Peter lives in Malmö and has that telephone number. What if Peter wants to take a course in English? Make one table of students and one of courses, and link them together.

11 Example: students taking courses at a university
Person
Person_id  Name      Address  Tel
1          Filip     Lund     12345
12         Peter     Malmö    23456
123        Victoria           34567

Linking table
Person_id  Course_id
1          C2
12         C4
123        C3

Course
Course_id  Study
C2         Math
C3         Engl
C4         History

Instead, store person data in one table and course data in another. The linking table only needs the two ids that identify the rows. How do I store the information if Peter wants to take a course in English?
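One possible answer to the question about Peter, sketched with Python's built-in sqlite3 module. The table names person, course and person_course are assumptions made for this sketch (the slide does not name the linking table), and Victoria's address is left as NULL because the slide shows none.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT, address TEXT, tel TEXT);
CREATE TABLE course (course_id TEXT PRIMARY KEY, study TEXT);
-- Hypothetical name for the linking table: it holds only the two ids.
CREATE TABLE person_course (
    person_id INTEGER REFERENCES person(person_id),
    course_id TEXT REFERENCES course(course_id),
    PRIMARY KEY (person_id, course_id)
);
INSERT INTO person VALUES (1, 'Filip', 'Lund', '12345'),
                          (12, 'Peter', 'Malmö', '23456'),
                          (123, 'Victoria', NULL, '34567');
INSERT INTO course VALUES ('C2', 'Math'), ('C3', 'Engl'), ('C4', 'History');
INSERT INTO person_course VALUES (1, 'C2'), (12, 'C4'), (123, 'C3');
""")

# Peter (person_id 12) takes English (C3): one new row in the linking table.
# His name, address and telephone number are not typed in again.
conn.execute("INSERT INTO person_course VALUES (12, 'C3')")

peters_courses = [row[0] for row in conn.execute("""
    SELECT c.study
    FROM person p
    JOIN person_course pc ON pc.person_id = p.person_id
    JOIN course c ON c.course_id = pc.course_id
    WHERE p.name = 'Peter'
    ORDER BY c.study
""")]
person_rows = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]

print(peters_courses, person_rows)
```

The person table still has three rows afterwards; only the linking table grew.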

12 Primary key
A primary key identifies the data row.
It must be unique.
It cannot be null.
It can be an ID.
The easiest way is to let the program keep track of the rows and automatically increment the key. Remember: the primary key must be unique!
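As a small illustration of letting the program keep track of the key, here is a sketch with Python's built-in sqlite3 module. In SQLite, a column declared INTEGER PRIMARY KEY is filled in automatically when no value is supplied; other database systems use their own mechanisms for this. The table and column names are chosen for the example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# No key is supplied: SQLite assigns the next integer itself.
first = conn.execute("INSERT INTO person (name) VALUES ('Filip')").lastrowid
second = conn.execute("INSERT INTO person (name) VALUES ('Peter')").lastrowid

# Reusing an existing key is rejected, because the primary key must be unique.
try:
    conn.execute(
        "INSERT INTO person (person_id, name) VALUES (?, 'Victoria')", (first,))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(first, second, duplicate_rejected)
```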

13 Table of accessions, data of seed samples
Columns: accession_number, Scientific name, accession_name, genebank_institute_name, origin_country, culton_type, acquisition_date
NGB1, Triticum aestivum ssp. aestivum, IDUNA, Nordic Gene Bank, Sweden, CV
NGB2, STANDARD
NGB3, JARL
NGB4, ANKAR, Denmark
NGB5, SAXO
The ids are the primary key, because each one identifies the rest of its row.

14 But how do you store data in a good and reliable way?
Just putting everything in one file, in one big table, is not a good idea.
Use a relational database (think of the database as several tables that are connected to each other).
Normalize it! Split the information up into different tables and give each record an identifier.
Save related information in one table (for instance, all personal data in a table called “person”).
Link the different tables together so you can retrieve all the information you need.

15 Normalization Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored.

16 This database is poorly normalized.
In fact, it is not normalized at all. Everything is stored in one big table. Imagine how many things you have to repeat if you test the same accession three times!

17 Person
Columns: id, Name, Address, Zipcode/city, countr, Title, Institute, Ins_street, Inst_City, Telephone, Cou
123 Magdalena Svärdh Bondev 21 Lund Swe IT staff Nordic Genetic Resource center Smedjev 3 Alnarp
124 Louise Bondo Danske vej Farum Den Section leader
125 Pia Ohlsson Blentarp 123 45 Lab
126 Johan Bäckman Dag H. vg Lund IT boss
127 Jonas Nordling S Hans gr Lund
128 Slatan Ibrahim Long Street 02 Copenhagen Apple expert Pometet LIFE Institute for Agricultural Science Short Street 2630 Taastrup DNK
129 Per Person A street 123 45 Malmö swe Seed store
130 Sven Svensson B street
131 Karl Krlssn C street
Information about the institute where people work is repeated several times! Imagine how tedious it would be to enter the same information for all staff members at NordGen!

18 Person and Institutes: much better!
Person
Columns: id, Name, Address, Zipcode/city, countr, Title, Inst_id
123 Magdalena Svärdh Bondev 21 Lund Swe IT staff 45
124 Louise Bondo Danske vej Farum Den Section leader
125 Pia Ohlsson Blentarp 123 45 Lab
126 Johan Bäckman Dag H. vg Lund IT boss
127 Jonas Nordling S Hans gr Lund
128 Slatan Ibrahim Long Street 02 Copenhagen Apple expert 46
129 Per Person A street 123 45 Malmö swe Seed store
130 Sven Svensson B street
131 Karl Krlssn C street

Institutes
Columns: Inst_id, Institute, Ins_street, Inst_City, Telephone, Cou
45 Nordic Genetic Resource center Smedjev 3 Alnarp Swe
46 Pometet LIFE Institute for Agricultural Science Short Street 2630 Taastrup DNK

Much better! If the institute changes its telephone number, you only have to change it once!
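The "change it once" point can be checked with a short sketch in Python's built-in sqlite3 module. Several things here are assumptions made for the illustration: the table is reduced to a few columns, persons 123 to 125 are all assumed to belong to institute 45, and the telephone numbers are placeholders, since the slide shows none.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE institute (inst_id INTEGER PRIMARY KEY, institute TEXT, telephone TEXT);
CREATE TABLE person (
    id INTEGER PRIMARY KEY,
    name TEXT,
    inst_id INTEGER REFERENCES institute(inst_id)
);
-- Placeholder telephone number, not taken from the slide.
INSERT INTO institute VALUES (45, 'Nordic Genetic Resource center', '000-111');
INSERT INTO person VALUES (123, 'Magdalena Svärdh', 45),
                          (124, 'Louise Bondo', 45),
                          (125, 'Pia Ohlsson', 45);
""")

# The institute changes its telephone number: exactly one row is updated.
cur = conn.execute("UPDATE institute SET telephone = '000-222' WHERE inst_id = 45")
rows_changed = cur.rowcount

# Every staff member now shows the new number through the link.
phones = {tel for (tel,) in conn.execute("""
    SELECT i.telephone
    FROM person p
    JOIN institute i ON i.inst_id = p.inst_id
""")}

print(rows_changed, phones)
```

In the single big table from the previous slide, the same change would have to be repeated on every staff row.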

19 Relational Database Managers
By linking files or information together, a relationship is produced between the tables. The shared field is stored only once; it is not duplicated in each file.

20 You often visualize a table like this:
person
Columns: id, name, address, zipcode/city, countr, title, inst_id
123 Magdalena Svärdh Bondev 21 Lund Swe IT staff 45
124 Louise Bondo Danske vej Farum Den Section leader
125 Pia Ohlsson Blentarp 123 45 Lab
126 Johan Bäckman Dag H. vg Lund IT boss
127 Jonas Nordling S Hans gr Lund
128 Slatan Ibrahim Long Street 02 Copenhagen Apple expert 46
129 Per Person A street 123 45 Malmö swe Seed store
130 Sven Svensson B street
131 Karl Krlssn C street

Visualized as the table name followed by its column names:
person: id, name, address, zipcode/city, country, title, inst_id

21 Another example…
Fields that have a logical relation to each other and to the identifying field should be grouped together in a relation.
Seed info: Accession_id, Accession name, Species, Date of acquisition, DonorInstituteName, Street Address, City Address, other fields…
This is not a good example! If the gene bank receives another accession from the same donor at a later date, the entire address would need to be entered again in a different record. It would be better to store information about the donors in a separate table.

22 … this would be better
Seed info: Accession_id, Accession name, Species, Date of acquisition, Amount of seeds, AccDon_id, other fields…
Donor: AccDon_id, donPer_id, donInst_id, donAccname, Don_info
The Donor table holds information about the donated material: which institute it came from, the person who donated it, what the accession was called at the donating institute, and other information about the donated material.

23 …yet another example
You might have the following separate files for a particular crop:
Registration: Accession name, Scientific name, Original country, other fields…
Passport: Accession name, Collecting institute, Collector, Date of collection, other fields…
Storage: Accession name, Freezer_number, Box_number, Storage_date, other fields…
Here the accession name is repeated in all three files for the same plants or seeds. What happens if you have to change the name? You have to change it in all three; otherwise data will be lost. The best thing is to store a link to the accession and keep all the information about the accession elsewhere.

24 Accession, Passport, Storage
Accession: Accession_id, Accession name, Scientific name, Original country, other fields…
Passport: Accession_id, Collecting institute, Collector, Date of collection, other fields…
Storage: Accession_id, Freezer_number, Box_number, Storage_date, other fields…
Each file now stores only a link to the accession (Accession_id), and the information about the accession itself is kept in one place. If you misspelled the accession name, you just have to change it once.

25 Visualized in an ER model (Entity-Relationship model), it looks like this:
Gene Bank 1:N Seed sending info
Our institute (the gene bank) can receive many seeds (each with seed sending information).

26 One species can have many accessions.
Visualized in an ER model (Entity-Relationship model), it looks like this:
species (1) to accessions (*)
teacher (1) to students (0..5)
One species can have many accessions. One accession belongs to just one species. One teacher can have up to five students to teach.
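In a relational database, a one-to-many relationship like species to accessions is usually stored as a foreign key on the "many" side. The sketch below uses Python's built-in sqlite3 module; the table layout and the accession SWZ2 are assumptions made for the illustration. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, and that an upper limit such as "0..5 students" cannot be expressed by a plain foreign key; it would need an extra rule in the application or a trigger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce them by default
conn.executescript("""
CREATE TABLE species (species_id INTEGER PRIMARY KEY, sci_name TEXT NOT NULL);
CREATE TABLE accession (
    accession_id INTEGER PRIMARY KEY,
    accession_name TEXT NOT NULL,
    -- NOT NULL plus the reference: every accession belongs to exactly one species.
    species_id INTEGER NOT NULL REFERENCES species(species_id)
);
INSERT INTO species VALUES (1, 'Zea mays');
INSERT INTO accession VALUES (1, 'SWZ1', 1), (2, 'SWZ2', 1);
""")

# One species, many accessions.
per_species = conn.execute(
    "SELECT COUNT(*) FROM accession WHERE species_id = 1").fetchone()[0]

# An accession pointing at a species that does not exist is rejected.
try:
    conn.execute("INSERT INTO accession VALUES (3, 'X1', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(per_species, orphan_rejected)
```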

27 Different categories for data modeling
Object-oriented model: object-oriented, with inheritance. The class car inherits car make, number of doors and so on. The class car has everything that all cars have, that is, what is common to all cars.
Relational model: the relational model links tables into relations to avoid storing data in unnecessarily many places and in several tables. Example: students taking courses at different universities.
Other data models exist, such as network models and Excel sheets…

28 Data modeling: how to get started
To start, find the object.
Let the object become a relation.
Each object or relation should have its own information and descriptive attributes.
The object becomes the table, the attributes become columns, and the identifying attribute becomes the primary key.
An object could be:
Persons
Places
Objects (like accessions, institutes…)
Events (like seed drying, germination tests…)
Things you want to store information about

29 Data modeling
A data model makes it easier to understand the meaning of the data.
A common and popular data model used in database design is based on the concepts of the Entity-Relationship (ER) model.
Conceptual (logical) structure of the database: designed by experts and information users, those who know and are familiar with the information. This is where the important entity and relationship types are identified, by the people who know what goes where, which information belongs together and how the data should be organized.
Physical structure: designed by experts and database developers. It describes how the logical structure is to be physically implemented (as tables) on the target DBMS.
Teamwork is important.

30 Data modeling:
Should represent reality and the information that needs to be stored
Helps us to visualize our task and clarify rules, restrictions and relations between objects
Gives the staff a chance to participate early in the development
Is an established way of developing information systems
Is a way to document and explain the IT system

31 Accession and collecting information
Start by gathering related information into tables and linking them together.
Accession: Accession number, Scientific name, Pedigree name, Donor name, Acquisition name, and more…
Collecting: Collecting organization, Collecting date, Country of collection, and more…
You could have everything in one file or table, but that makes for a large and slow system to work with. It is better to organize related information in different tables.

32 A model of accession descriptors and collecting descriptors:
Accession: Accession_id, Accession name, Scientific name, Cultivar name, Donor, Acquisition date, more…
Collecting: Accession_id, Collection organization, Collecting date, Country of collecting, Province/state, Location of collecting site, Type of sample…
The accession relation gathers all the information about the accession. The collecting relation describes all the information about the specific collecting event.

33 Example of relations
Collect: collect_id, accession_id, person_id, latitude, longitude, site_name, country_id
Accession: accession_id, taxon_id, accession_name, country_id, acc_mandate
Taxon: taxon_id, sci_name, eng_name, mandate, grin_no, thgw
Seedstore: batch_id, accession_id, batch_no, collection, box_no, harvest_year
germin_test: test_id, accession_id, batch_id, grm_pct, test_date, person_id
In the diagram the tables are linked by relationships marked 1 and N.
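To show how linked tables like these are read back together, here is a minimal sketch with Python's built-in sqlite3 module. It uses a cut-down version of three of the tables (only some of the columns named on the slide), and all the rows, including the test dates and germination percentages, are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY, sci_name TEXT);
CREATE TABLE accession (
    accession_id INTEGER PRIMARY KEY,
    taxon_id INTEGER REFERENCES taxon(taxon_id),
    accession_name TEXT
);
CREATE TABLE germin_test (
    test_id INTEGER PRIMARY KEY,
    accession_id INTEGER REFERENCES accession(accession_id),
    batch_id INTEGER,
    grm_pct REAL,
    test_date TEXT
);
-- Invented illustration data.
INSERT INTO taxon VALUES (1, 'Zea mays');
INSERT INTO accession VALUES (10, 1, 'SWZ1');
INSERT INTO germin_test VALUES (1, 10, 100, 92.0, '2009-01-15'),
                               (2, 10, 101, 88.0, '2009-02-01'),
                               (3, 10, 100, 95.0, '2010-01-15');
""")

# The accession and its taxon are stored once; three tests link to them.
report = conn.execute("""
    SELECT t.sci_name, a.accession_name, COUNT(g.test_id), MAX(g.grm_pct)
    FROM accession a
    JOIN taxon t ON t.taxon_id = a.taxon_id
    JOIN germin_test g ON g.accession_id = a.accession_id
    GROUP BY a.accession_id
""").fetchall()

print(report)
```

Testing the same accession three times adds three short rows to germin_test and nothing else.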

34 Wow! Instead of repeating a lot of information, you store the accession once and link to other tables with other information! By linking files or information together, a relationship is produced between the tables. Smart! And effective!

35 Part 2 Maybe a short break?

36 DBMS (database management system) A software system that enables users to define, create, maintain and control access to the database. The DBMS is the software that interacts with the users’ application programs and the database. Typically a DBMS provides the following facilities…

37 Facilities provided by a DBMS
It allows users to define the database, with the types, structures and constraints on the data to be stored in it, through a data definition language (DDL).
It allows users to insert, update, delete and retrieve data from the database through a data manipulation language (DML).
It provides controlled access to the database: a security system, an integrity system, a concurrency system (so that several people can use the database at once) and a recovery system in case of server failure.

38 Example of DML and DDL
Data manipulation language (DML):
    SELECT * FROM Person;
    INSERT INTO Person (Name, Address, tel) VALUES ('John Book', 'California', );
Data definition language (DDL), for changes in structure:
    CREATE TABLE Person (Name varchar(40) NOT NULL, Address char(10), tel int);
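The statements on this slide can be run as they stand in most SQL databases once a telephone value is supplied. The sketch below runs them through Python's built-in sqlite3 module. The telephone number 5550100 is a placeholder invented for the example, because the value is missing on the slide. SQLite accepts the varchar(40) and char(10) declarations but does not enforce the lengths; other systems do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: defines the structure of the table.
conn.execute("""
    CREATE TABLE Person (
        Name    varchar(40) NOT NULL,
        Address char(10),
        tel     int
    )
""")

# DML: inserts and retrieves data. 5550100 is a made-up placeholder number.
conn.execute(
    "INSERT INTO Person (Name, Address, tel) VALUES (?, ?, ?)",
    ("John Book", "California", 5550100),
)
rows = conn.execute("SELECT * FROM Person").fetchall()

# The NOT NULL constraint from the DDL is enforced on later inserts.
try:
    conn.execute("INSERT INTO Person (Name) VALUES (NULL)")
    null_name_rejected = False
except sqlite3.IntegrityError:
    null_name_rejected = True

print(rows, null_name_rejected)
```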

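39 DBMS environment
Components: hardware, software, data, procedures, people.
Hardware and software are the machine side, procedures and people are the human side, and data is the bridge between them.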

40 Users with software and questions
Users (User1, User2) work through software, for instance a web site such as SESTO, and ask questions such as: list all accessions.
The DBMS interprets the question using the database language (SQL), with its DDL and DML.
All communication goes through the DBMS.
Behind the DBMS are the databases with information and metadata.

41 Features in SESTO e.g. project archive and pictures archive
Entry levels with seed store data: Genus, Taxons, Cultivars and Accessions. To list all Taxons in your gene bank, click the button and information about the taxons and their accessions will be presented. The SADC entrance

42 To see the accession list of this taxon (Arachis hypogaea) click the [select]-link

43 To see detailed information about this accession click [select]

44 The accession name SWZ1 from Zea mays gives this information
To see all the information from stored material click this link

45 To edit accession information and storage information, click the [edit] links and a pop up window will be presented Two batches with 10 distribution bags are stored in the Active Storage in freezer 15 in boxes 10 and 8.

46 To add and edit information about an accession.
The information goes to the database and will be stored there

47 A documentation system should be…
Reliable
Fast at retrieving information
User-friendly
Flexible: it should anticipate changes

48 Things you will not regret!
Be organized and structured from the start!
Be sure you have a good data model.
Spend some time on normalization; it will be worth it later.
Try to cooperate and work together with the documentation working group. (Documentation is far too important to be left to the IT staff alone.)

49 Web services
Web services are standardized programs that send your data to databases automatically.
Your data will be published elsewhere.
This makes your work recognized, gives your gene bank credit, and means your material will be asked for.
Examples: BioCASE (the Biological Collection Access Service for Europe) and GBIF. The Global Biodiversity Information Facility (GBIF) is a mega-science project with the aim "to make the world's primary data on biodiversity freely and universally available via the Internet".
Have you heard about the new dataset…

50 Last but not least… Data is not information.
Information must be retrieved and understood. Then knowledge is created. "Where is the wisdom we have lost in knowledge? Where is the knowledge we have lost in information?" T.S.Eliot (Where is the information we have lost in data?)

51 Thank you for listening!

