13 Other Types of Databases Oracle, M / MUMPS, X.500/LDAP and Search Engines, + Lessons from GM “Architects Day” MIS 304 Winter 2006
13 2 Goals for this class Identify tools to help evaluate database products Understand the role of other data management architectures. Understand the features of the MUMPS data structure. Understand the structure of the X.500/LDAP directory standards. Understand the Linear Associative Model
13 3 Database Evaluation Requirements, Requirements, Requirements Do the evaluation! Make it as realistic as possible Use outside tools
13 6 There are 2 basic ways to organize data The tree The table
13 7 “M” a.k.a. MUMPS Massachusetts General Hospital Utility Multi-Programming System The ANSI Standard version is now called simply “M” “Multidimensional” database. –http://www.cache.com
13 8 The MUMPS Data Structure In traditional programming languages SOMETHING(X,Y) or SOMETHING(1,2,3,4…) or SALES (1,1,1,1) = 42 In MUMPS Sales(region,salesman,product,time) example: TotalSales = Sales(east,Fred,clocks,Q1) + Sales(east,Ed,clocks,Q1) Note that the indexes on the “array” are word valued and not number valued.
13 9 So What Does the Query Look Like?
FOR region = EAST to WEST
  FOR salesman = Adams to Smith
    FOR product = 1111 to 9999
      FOR time = Jan to Dec
        TOTALSALES = TOTALSALES + SALES(region,salesman,product,time)
      NEXT time
    NEXT product
  NEXT salesman
NEXT region
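A minimal Python sketch of the same idea, assuming the multidimensional "array" is stored sparsely as a dictionary keyed by string subscripts. The sales figures, the extra names, and the helper total_sales are made up for illustration; this is not MUMPS syntax.

```python
# Sparse "array" keyed by string subscripts, loosely modelled on the slide's
# Sales(region,salesman,product,time) example. All figures are invented.
sales = {
    ("east", "Fred", "clocks", "Q1"): 42,
    ("east", "Ed", "clocks", "Q1"): 17,
    ("west", "Ann", "clocks", "Q1"): 30,
    ("east", "Fred", "lamps", "Q2"): 8,
}


def total_sales(data, region=None, salesman=None, product=None, time=None):
    """Sum every stored cell whose subscripts match the given values.

    A subscript left as None acts as a wildcard, which plays the role of
    looping over that whole dimension in the nested FOR query.
    """
    wanted = (region, salesman, product, time)
    return sum(
        value
        for key, value in data.items()
        if all(w is None or w == k for w, k in zip(wanted, key))
    )


east_clocks_q1 = total_sales(sales, region="east", product="clocks", time="Q1")
grand_total = total_sales(sales)
```

Only cells that actually hold a value are stored, so the sum visits four entries rather than every possible combination of subscripts.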
13 Directory Services A Special Database Case MIS 304 Fall 2005
13 12 Class Goal Understand the application of Naming to network management. Understand the idea of a classification hierarchy. Understand Lightweight Directory Access Protocol (LDAP) and its application.
13 The Case for Directories The “Net” has become increasingly complex. More need than ever to work across organizational boundaries. Wouldn’t it be great if everything had a unique and understandable name?
13 14 What’s in a Name People Buildings Computers Printers Locations Objects (computer) Roads Vehicles Rooms Stock locations Truck wells Servers
13 15 The Goal If you can name it and locate it you can manage it.
13 16 What’s in a Name A name draws a distinction between two things. G. Spencer-Brown, Laws of Form, Dutton, 1979. To take advantage of human processing capabilities names should be “friendly”.
13 17 Taxonomy The study of the general principles of scientific classification. A way to organize anything into hierarchical categories based on characteristics. Used widely in Biological Sciences.
13 21 X.500 Originally part of the Open Systems Interconnection (OSI) network suite. Defined directory structure on an OSI network. Modified to run over TCP/IP networks (Internet).
13 22 Tags C = Country O = Organization OU = Organizational Unit L = Locality G = Given Name S = Surname
13 23 Person Identifier C= CA O= BCE Emergis OU= Automotive S= Morin G= Gary
13 24 Person Identifier Because of object “inheritance” each level inherits the attributes of the preceding level.
13 25 Database Structure Can be either hierarchical or relational. If it’s relational, what’s the key?
O                 OU          S
BCE Emergis       Automotive  Morin
Daimler Chrysler  Chrysler    Smith
EDS               US          Cutler
Ford              Assembly    Jones
Ford              Visteon     Morin
13 26 Distinguished Name A string of globally unique characters. Almost everything has problems. –Mohamed Chang? –SSN? –An E-Mail address? You almost always have a “messy” key.
13 27 Lookup in SQL SELECT * FROM DIRECTORY WHERE c = 'us' AND o = 'Ford' AND s = 'Morin' Where is DIRECTORY? SQL may not be the ideal answer.
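A small sketch of this lookup using Python's built-in sqlite3 module and the rows from the "Database Structure" slide. The country values are an assumption (only the BCE Emergis entry is shown with C = CA in the slides), and the table layout is illustrative only.

```python
import sqlite3

# In-memory table holding the rows from the "Database Structure" slide.
# The c (country) column is an assumption added so that the slide's WHERE
# clause has something to match; the slide's table does not show it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE directory (c TEXT, o TEXT, ou TEXT, s TEXT)")
conn.executemany(
    "INSERT INTO directory VALUES (?, ?, ?, ?)",
    [
        ("ca", "BCE Emergis", "Automotive", "Morin"),
        ("us", "Daimler Chrysler", "Chrysler", "Smith"),
        ("us", "EDS", "US", "Cutler"),
        ("us", "Ford", "Assembly", "Jones"),
        ("us", "Ford", "Visteon", "Morin"),
    ],
)

# Surname alone is not a key: two different people are called Morin.
by_surname = conn.execute(
    "SELECT o, ou FROM directory WHERE s = ? ORDER BY o", ("Morin",)
).fetchall()

# The slide's query narrows the match by walking down the hierarchy.
by_full_path = conn.execute(
    "SELECT o, ou FROM directory WHERE c = ? AND o = ? AND s = ?",
    ("us", "Ford", "Morin"),
).fetchall()
conn.close()
```

The query works, but only against this one local table, which is the point of the slide's question "Where is DIRECTORY?".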
13 28 LDAP X.500 was getting really messy. Most organizations did not need all of the features. Some University of Michigan students wrote the Lightweight Directory Access Protocol. Defines how to connect to and query an X.500-style database with far less overhead.
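For comparison with the SQL lookup, here is a hedged sketch of how the same question might be phrased as an LDAP search filter string. The helper names escape_filter_value and and_filter are hypothetical, and sn is the usual LDAP attribute name for surname where the slides write S. No directory server or LDAP library is involved; the code only builds the filter text.

```python
def escape_filter_value(value):
    """Escape the characters that are special inside an LDAP filter value."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\0": r"\00",
    }
    return "".join(replacements.get(ch, ch) for ch in value)


def and_filter(**attributes):
    """Build an AND filter such as (&(c=us)(o=Ford)) from keyword arguments."""
    parts = "".join(
        f"({name}={escape_filter_value(value)})"
        for name, value in attributes.items()
    )
    return f"(&{parts})"


# Same question as the SQL lookup: country us, organization Ford, surname Morin.
ldap_filter = and_filter(c="us", o="Ford", sn="Morin")
```

A client would send a filter like this together with a search base, so the query does not need to know which table, or even which server, holds the entry.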
13 29 LDAP Examples Microsoft Exchange/Outlook Lotus Notes Novell NDS Netscape browser OpenLDAP http://www.openldap.org WAX500, MAX500, XAX500
13 30 Logical Extensions Once you can name it, locate it and have a way of querying it just extend the idea to any object.
13 31 Communities of Interest Internet Engineering Task Force X.521 describes a “person” object. AIAG has a guideline to describe Companies and Locations.
13 32 Example 1
cn = ITM Centerline
ou = locations
o = arius.com
street = 25999 Lawrence Ave
l = Centerline
st = Michigan
c = us
postalCode = 48015-0303
buildingNumberOfFloors = 1
13 33 Example 2
cn = Detroit Medical Center Helipad
ou = locations
landingStripType = concrete
landingStripElevation = 630 ft
landingStripAirportID = 5MI0
l = Detroit
st = Michigan
street = 420 St. Antoine
c = us
13 34 Naming Objects Computer objects are somewhat different from physical things. Human readability is not so much of an issue, but lookup speed is.
13 35 OSI ASN.1 A notation for describing data structures. Uses an Object Identifier (OID) and a short text description to identify levels of the tree. If a labeled node is a leaf in a tree then it is an object and contains a value.
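As an illustration, the sketch below models a tiny fragment of the OID tree as nested Python dictionaries and walks it one arc at a time. The arcs shown are the X.500 attribute types under 2.5.4; the nested-dictionary layout and the describe helper are assumptions made for this example, not part of ASN.1.

```python
# Each node is (short text label, children keyed by numeric arc).
oid_tree = {
    2: ("joint-iso-itu-t", {
        5: ("ds", {
            4: ("attributeType", {
                3: ("commonName", {}),
                6: ("countryName", {}),
                10: ("organizationName", {}),
                11: ("organizationalUnitName", {}),
            }),
        }),
    }),
}


def describe(oid, tree=oid_tree):
    """Return the text label at each level of a dotted OID such as 2.5.4.3."""
    labels = []
    children = tree
    for arc in (int(part) for part in oid.split(".")):
        label, children = children[arc]
        labels.append(label)
    return labels


common_name_path = describe("2.5.4.3")
```

Lookup follows integers rather than words, which matches the earlier point that computer objects favor lookup speed over readability.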
13 37 So What? You can build a cross company directory. –Names are agreed on by a common standards body (AIAG) –Common Query Language (LDAP) Each organization keeps its own information current. Extensions are easy to add.
13 Search Engines and The Associative Retrieval Model a new kind of Database? MIS 304 Fall 2004
13 39 Goals for this class Understand what a linear associative retrieval model is.
13 40 There are 2 basic ways to organize data? The tree The table And… A Matrix of Associations?
13 41 The Problem to be Solved The Internet has a large number of documents linked together with the documents spread out physically across many web servers. How do you find anything?
13 42 One solution Build a data structure that indexes the pages. The structure is populated by searching individual pages with a “bot”, a program that surfs the web returning the text of the many pages there. The pages returned by the bot are processed into a special kind of database.
13 43 A Simple Document Index Structure Create a matrix with the index terms on one axis and the documents containing them on the other. –Leave out words like a, the, and, it… –Assign a number to each term and document. –Call this matrix C
        Doc 1  Doc 2  Doc 3  Doc 4  Doc 5
Term 1    1      1      1      0      1
Term 2    0      1      1      0      0
Term 3    1      0      0      1      1
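A rough sketch of how such a matrix could be built. The five documents and the stop-word list are invented, chosen so that the result happens to match the matrix on this slide; a real indexer would need proper tokenizing and stemming.

```python
# Words to leave out of the index (a hypothetical, very short list).
STOP_WORDS = {"a", "an", "and", "is", "it", "of", "the"}

# Five made-up documents standing in for pages returned by the bot.
documents = [
    "the database is a tree",
    "a database and an index",
    "the index of the database",
    "it is a tree",
    "the tree and the database",
]


def build_index(docs):
    """Return (terms, C) where C[t][d] is 1 if term t occurs in document d."""
    tokenized = [set(doc.lower().split()) - STOP_WORDS for doc in docs]
    terms = sorted(set().union(*tokenized))
    matrix = [[1 if term in words else 0 for words in tokenized] for term in terms]
    return terms, matrix


terms, C = build_index(documents)
```

Here the row order is simply alphabetical, so Term 1 is "database", Term 2 is "index", and Term 3 is "tree".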
13 44 Coordinate Retrieval Now suppose we want to take all of the documents we have retrieved from the web and query our C matrix for where a term occurs in a document. We can do this by creating a 1×t matrix of the terms (t) we want to search for and calling it Q. Then, if we normalize so that each row in C sums to 1, we get a 1×d matrix with a score for every document (d) by: R = QC
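A minimal sketch of R = QC in plain Python, using the three-term, five-document matrix from the index slide and an assumed query for Term 1 and Term 3. The helper names are made up for the example.

```python
# Term-by-document matrix C from the index slide (rows: terms, columns: docs).
C = [
    [1, 1, 1, 0, 1],
    [0, 1, 1, 0, 0],
    [1, 0, 0, 1, 1],
]


def normalize_rows(matrix):
    """Scale each row so it sums to 1 (all-zero rows are left as zeros)."""
    result = []
    for row in matrix:
        total = sum(row)
        result.append([value / total if total else 0.0 for value in row])
    return result


def vector_times_matrix(vector, matrix):
    """Multiply a 1 x t row vector by a t x d matrix, giving a 1 x d vector."""
    columns = len(matrix[0])
    return [
        sum(vector[i] * matrix[i][j] for i in range(len(vector)))
        for j in range(columns)
    ]


Q = [1, 0, 1]  # assumed query: search for Term 1 and Term 3
R = vector_times_matrix(Q, normalize_rows(C))
```

Documents 1 and 5 contain both query terms and so receive the highest scores; documents 2 and 3 match only Term 1.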
13 45 Discussion This is good as far as it goes, but… It does nothing to help when there are more complex relationships between the terms. Synonyms are a good example. Suppose you are writing a document and you don’t want to use the same word to describe something over and over again, so you use a synonym. The probability that both words occur in the same document is greatly increased.
13 46 Inter-term Relationships Suppose we want to include these inter-term relationships in our search. We need a Thesaurus.
13 47 Transform Now look through the table and create a matrix of the number of times terms occur together in a document.
        Term1  Term2  Term3
Term1     4      2      2
Term2     2      2      0
Term3     2      0      3
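One way to obtain these counts, sketched under the assumption that "occur together" means "appear in the same document": multiply the index matrix C by its own transpose. The function name is made up for the example.

```python
# Term-by-document matrix C from the index slide.
C = [
    [1, 1, 1, 0, 1],
    [0, 1, 1, 0, 0],
    [1, 0, 0, 1, 1],
]


def cooccurrence(matrix):
    """Count, for each pair of terms, the documents that contain both.

    This is the matrix product of C with its transpose: the entry for
    (term a, term b) is the dot product of row a and row b.
    """
    return [
        [sum(a * b for a, b in zip(row_a, row_b)) for row_b in matrix]
        for row_a in matrix
    ]


term_by_term = cooccurrence(C)
```

The diagonal holds the number of documents each term appears in, and the off-diagonal cells reproduce the 2, 2, and 0 shown on the slide.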
13 48 Normalization Matrix Normalize the transform table so that each cell is the “cost” of two terms occurring together. Call that matrix L. Term1Term2Term3 Term1.125.33 Term2.125.500 Term3.330
13 49 Query Vector Now create a vector of the terms you want to search for.
Term 1  Term 2  Term 3
  1       0       1
13 50 Now the Math Multiply the index term table C by the normalized transform table L and the query vector Q, and you get a vector R that contains a ranking of the documents from 0 to 1. R = QLC
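A sketch of the full product R = QLC, where C is the term-by-document index, L the normalized term-by-term matrix, and Q the query vector. The slides do not spell out how L is normalized, so this example assumes each row of the co-occurrence matrix is scaled to sum to 1, and likewise each row of C. The scores it produces are therefore illustrative and are not expected to match the figures on the results slide.

```python
# Term-by-document matrix C from the index slide.
C = [
    [1, 1, 1, 0, 1],
    [0, 1, 1, 0, 0],
    [1, 0, 0, 1, 1],
]


def normalize_rows(matrix):
    """Scale each row so it sums to 1 (all-zero rows are left as zeros)."""
    result = []
    for row in matrix:
        total = sum(row)
        result.append([value / total if total else 0.0 for value in row])
    return result


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def matmul(a, b):
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Assumption: L is the row-normalized term-by-term co-occurrence matrix.
L = normalize_rows(matmul(C, transpose(C)))

Q = [[1, 0, 1]]  # query for Term 1 and Term 3, as a 1 x t matrix
R = matmul(matmul(Q, L), normalize_rows(C))[0]
```

Because L spreads some of the query weight onto Term 2, documents 2 and 3 now score higher than document 4, even though each of the three contains just one of the two query terms.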
13 51 Results The result vector. The documents with the highest values are the most likely to be relevant to our search.
      Doc 1  Doc 2  Doc 3  Doc 4  Doc 5
Rank    0     .5     .3     .6     .1
13 52 Discussion The matrix created by multiplying L and C becomes a new kind of structure: a matrix of “associations” between documents and terms, and among the terms themselves. This may be the only other way of organizing data besides the table and the tree. You can extend this by creating a new structure, a normalized document-by-document (d×d) matrix, that takes into account associations between documents, e.g. chapters or authors. This falls into the new category called “Connectionist” models, which includes Neural Networks.
13 53 A Model of Consciousness Some have even gone so far as to say this may be one of the structures in a conscious brain. (Kanerva, 1988) Do some thought experiments on your own “associative” brain by trying some stream of consciousness exercises.
13 54 Linear Associative Retrieval Model
Giuliano and Jones, Linear Associative Retrieval, Vistas in Information Handling, Spartan Press, 1962.
Hough, The Control of Complex Systems, Progress in Cybernetics and Systems Research, Halstead Press, 1975.
Kanerva, Sparse Distributed Memory, MIT Press, 1988.
13 55 The Future More of the same –There is a lot of pent up inertia –SQL is a pretty good programming language More XML –There is no stopping this train. More AI/Connectionist/Associative tools Bigger and bigger databases