1
Database Extensibility
Jacques Roy
2
Agenda
- What is Database Extensibility?
- Database and extensibility background
- IDS features
- Extensibility examples
- DataBlade Modules
- Bladelets
- Considerations for building your own
3
What is Database Extensibility?
- Ability to add business components in the database
- Tailor the database to the business environment
- Reduce application complexity
- Put the processing where it makes the most sense
  - Set processing provided by the database
- Higher performance
  - Less data movement, better data representation, better indexing
- Faster development, lower maintenance cost
4
Extensibility Background
- Historic approach: user exits, device drivers
- Relational databases: triggers, constraints, stored procedures
- New development platforms
  - Web server: CGI, NSAPI/ISAPI
  - App servers: J2EE
  - Other: Eclipse and MS Visual Studio plugins
- Object-relational capabilities
- Relational framework
5
IDS Extensibility Features
- User-defined types: distinct, opaque
- Complex types: row, set, multiset, list
- Table/type inheritance
- Polymorphism
- User-defined functions (also known as user-defined routines)
- User-defined aggregates
- Functional indexes
- R-tree index
- Primary/secondary access methods
- Smart large objects
- More...
- Extensions can be written in SPL, C, or Java
6
When to Use UDFs/UDRs
- Eliminate large data transfer
  - Transfer time is much larger than processing time
- Simplify or improve processing
  - Eliminate table scans
  - Eliminate application set processing
  - Define a new sorting order
  - Replace stored procedures with aggregates
- Provide business processing to applications
  - Consistency of results
  - Eliminate the need for custom interfaces
7
Better Grouping
Quarter() function:
SELECT Quarter(date), SUM(income)
FROM orders
WHERE Quarter(date) LIKE '2008Q%'
GROUP BY 1
ORDER BY 1;

AgeGroup() function:
SELECT AgeGroup(birthdate, 20, 10) AgeGrouping, SUM(total_price) Total
FROM customer c, orders o, items i
WHERE c.customer_num = o.customer_num
  AND o.order_num = i.order_num
GROUP BY 1
ORDER BY 1 DESC;

[Result table with columns AgeGrouping and Total ($); values not preserved]
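The slide does not show how AgeGroup() works. The C sketch below assumes that AgeGroup(birthdate, 20, 10) means "buckets of 10 years starting at age 20", with younger ages placed in a separate group. The helper names age_on() and age_group_lo() are hypothetical, and a real UDR would receive an mi_date instead of separate year, month, and day values.

```c
/* Hypothetical helpers behind AgeGroup(birthdate, start, width); the
 * real UDR's parameter meaning is an assumption. */

/* Age in whole years on the reference date (y, m, d). */
static int age_on(int birth_y, int birth_m, int birth_d, int y, int m, int d)
{
    int age = y - birth_y;
    /* Birthday not reached yet in the reference year: one year less. */
    if (m < birth_m || (m == birth_m && d < birth_d))
        age--;
    return age;
}

/* Lower bound of the bucket [lo, lo + width - 1] that holds age, where
 * lo = start + k * width; returns -1 for ages below start. */
static int age_group_lo(int age, int start, int width)
{
    if (age < start || width <= 0)
        return -1;
    return start + ((age - start) / width) * width;
}
```

A label such as "30-39" can then be built with snprintf(buf, sizeof buf, "%d-%d", lo, lo + width - 1).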
8
Quarter Function in “C”
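#include <mi.h>      /* DataBlade API */
#include <stdio.h>   /* snprintf() */

/* Returns the quarter of a date as a string such as "2008Q3". */
mi_lvarchar *quarter(mi_date date, MI_FPARAM *fparam)
{
    mi_lvarchar *retVal;   /* The return value. */
    short mdy[3];          /* month, day, year */
    mi_integer ret, qt;
    char buffer[10];

    /* Extract month, day, and year from the date */
    ret = rjulmdy(date, mdy);
    if (ret < 0)
        mi_db_error_raise(NULL, MI_EXCEPTION, "quarter: invalid date");

    qt = 1 + ((mdy[0] - 1) / 3);   /* calculate the quarter */
    snprintf(buffer, sizeof(buffer), "%4dQ%d", mdy[2], qt);

    retVal = mi_string_to_lvarchar(buffer);
    return retVal;   /* Return the value. */
}

rjulmdy() is an ESQL/C function that is also available in the DataBlade API.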
9
Compiling and Linking
# Compilation
COMPILE=-I$(INFORMIXDIR)/incl/public -O -c
cc -DMI_SERVBUILD $(COMPILE) quarter.c

# Creating the shared library
ld -G -o qlib.bld quarter.o

# Change permission on file to 775
# (read and execute for others)
chmod 775 qlib.bld
10
Creating the function
CREATE FUNCTION quarter(date)
RETURNS varchar(10)
WITH (NOT VARIANT, PARALLELIZABLE)
EXTERNAL NAME "$INFORMIXDIR/extend/class/qlib.bld(quarter)"
LANGUAGE C;

GRANT EXECUTE ON FUNCTION quarter(date) TO public;
11
UDR Processing vs. Stored Procedures
[Diagram: SQL group, order, and scan processing compared for stored procedures and UDRs]
12
Using SPL for Extensions
- Better date manipulation: day of the year, week of the year, week of the month, quarter
- Unit conversion: feet/meters, gallons/liters, Fahrenheit/Celsius
- Functional indexes

CREATE FUNCTION quarter(dt date)
RETURNS integer
WITH (NOT VARIANT);
    RETURN (YEAR(dt) * 100) + 1 + TRUNC((MONTH(dt) - 1) / 3);
END FUNCTION;

Example:
EXECUTE FUNCTION quarter('9/2/2008');

(expression)
200803

For example, “a large retailer” created its own DataBlade for date processing. See the developerWorks Informix zone:
13
The Node Type
- "Hierarchically" aware type: Node
- Pre-processes the hierarchical relationships, e.g. Chapter 11, section 7, paragraph 3
- Adds a new way to relate objects to one another: IsAncestor(), IsChild(), IsDescendant(), IsParent()
- Can change processing from exponential to linear
- Examples: policies, product classification, bill-of-material, LDAP, XML, etc.
[Diagram: node tree with root 1.0 and children 1.1, 1.2, 1.3; 1.2 has children 1.2.1, 1.2.2, 1.2.3]
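One way to see why pre-processed paths help: if each node stores its full path (for example 1.2.3), the ancestor and parent tests become string-prefix checks instead of recursive queries. The sketch below is not the Node DataBlade's code. It assumes dotted paths and writes the diagram's root 1.0 simply as 1; the function names are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Number of components in a dotted node, e.g. "1.2.3" -> 3. */
static int node_depth(const char *n)
{
    int depth = 1;
    for (; *n != '\0'; n++)
        if (*n == '.')
            depth++;
    return depth;
}

/* True if a is a proper ancestor of b: b starts with a's components and
 * continues with a '.' right after them. */
static bool is_ancestor(const char *a, const char *b)
{
    size_t la = strlen(a);
    return strlen(b) > la && strncmp(a, b, la) == 0 && b[la] == '.';
}

/* True if a is the direct parent of b (exactly one level up). */
static bool is_parent(const char *a, const char *b)
{
    return is_ancestor(a, b) && node_depth(b) == node_depth(a) + 1;
}
```

IsChild() and IsDescendant() are the same tests with the arguments swapped, for example is_parent(b, a) and is_ancestor(b, a). The b[la] == '.' check is what keeps 1.2 from being treated as an ancestor of 1.20.1.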
14
Node Application Example
- Geo hierarchy: country > state > metro > city
[Diagram: GeoNode, Policy, and Resource]
- Q: What policy applies to the Hyatt in Denver?
- A: A Colorado policy
15
Bill-of-Material Example
CREATE TABLE Components (
    ComponentId Node,
    Sequence    integer,
    Quantity    integer,
    Name        varchar(30),
    PartNumber  integer
);

CREATE TABLE Parts (
    PartNumber integer,
    Name       varchar(30),
    ProviderId integer,
    Price      Money(10,2)
);

- A component can be made up of multiple components
- A component is made up of multiple parts
[Diagram: component and Parts]
16
Fine-Grained Auditing
- Use triggers and user-defined routines
- Register event processing
- Commit or rollback
- Send events to a file or an outside process
- Use a generic function for any table
- See the developerWorks article: "Event-driven fine-grained auditing with Informix Dynamic Server"
17
New Trigger Use
CREATE TRIGGER tab1instrig INSERT ON tab1
FOR EACH ROW (
    EXECUTE PROCEDURE do_auditing2()
);

The API includes functions to find the context of the trigger.
18
Event-Driven Architecture
[Diagram: event-driven auditing flow with numbered steps 1 to 8 linking Statement, Table, Trigger, Callback registration, EventTable, Commit/Rollback, and MonitorProgram]
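The key property of this architecture is that audit events become visible only if the transaction commits. The sketch below simulates that in plain C. A trigger-side function buffers events, and an end-of-transaction callback publishes them on commit or drops them on rollback. All names are hypothetical. In the server, this role is played by a callback registered through the DataBlade API, which this sketch does not use.

```c
#include <stdio.h>
#include <string.h>

#define MAX_EVENTS 16
#define MAX_TEXT   64

/* Events buffered by the row trigger for the current transaction. */
struct txn_events {
    int  count;
    char text[MAX_EVENTS][MAX_TEXT];
};

/* Stand-in for the event table / monitor program: what was published. */
static int  published_count = 0;
static char published[MAX_EVENTS][MAX_TEXT];

/* Trigger side: record the event, but do not publish it yet. */
static int record_event(struct txn_events *t, const char *text)
{
    if (t->count >= MAX_EVENTS)
        return -1;
    snprintf(t->text[t->count++], MAX_TEXT, "%s", text);
    return 0;
}

/* End-of-transaction callback: publish on commit, discard on rollback. */
static void end_of_transaction(struct txn_events *t, int committed)
{
    if (committed)
        for (int i = 0; i < t->count && published_count < MAX_EVENTS; i++)
            snprintf(published[published_count++], MAX_TEXT, "%s", t->text[i]);
    memset(t, 0, sizeof *t);   /* either way, the next transaction starts empty */
}
```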
19
Inheritance and Polymorphism
[Diagram: table hierarchy with loans at the root; subtypes Manufacturing, Telco, Retail, Healthcare, and Financial Services; Retail further specialized into Clothing, Food, and Novelties]
Now, let's imagine you are running a bank. You have thousands of branches that give out loans to customers. Your goal is to monitor the level of risk that the branches are taking. The diagram above represents a table hierarchy for loans. A manufacturing loan inherits from loan. There can be additional specialization, as in the case of Food under Retail. The risk calculation is different for each industry. If you lend to a manufacturing company, you can repossess its assets to recover your loan if the company defaults. This affects the risk factor of the loan. If a novelty store goes under, you are unlikely to recover anything if it was in the business of selling pet rocks! Obviously, the risk involved is much higher. With IDS 9.x, you can create one risk function for each type of customer. Each risk function only needs to know how to manipulate one type of loan. An operation on the table hierarchy calls the appropriate risk function for each customer selected. In a standard relational environment, you would have to develop a custom application that retrieves each row, figures out the type of customer, and calculates the risk factor. In addition to being more complex than the IDS 9.x solution, the data transfer between the database server and the application would represent most of the processing time. It also impacts network traffic and bandwidth. Let's see how we would calculate the risk in an IDS 9.x environment.
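In C terms, the polymorphic behavior described above resembles a dispatch table keyed by the row's type. Each risk function only understands one kind of loan, and the caller never needs to know which one it is invoking. The loan types, the collateral field, and the risk formulas below are made up for illustration.

```c
/* A subset of the loan subtypes in the hierarchy above. */
enum loan_type { LOAN_MANUFACTURING, LOAN_RETAIL_FOOD, LOAN_RETAIL_NOVELTIES };

struct loan {
    enum loan_type type;
    double amount;
    double collateral;   /* hypothetical field: assets that can be repossessed */
};

/* One risk function per subtype; the formulas are made up for illustration. */
static double risk_manufacturing(const struct loan *l)
{
    if (l->amount <= 0.0)
        return 0.0;
    double uncovered = l->amount - l->collateral;   /* collateral lowers risk */
    return uncovered > 0.0 ? uncovered / l->amount : 0.0;
}

static double risk_food(const struct loan *l)      { (void)l; return 0.8; }
static double risk_novelties(const struct loan *l) { (void)l; return 1.5; }

/* Dispatch on the row's type, as the server does when a query runs over
 * the whole table hierarchy. */
static double (*const risk_fn[])(const struct loan *) = {
    [LOAN_MANUFACTURING]    = risk_manufacturing,
    [LOAN_RETAIL_FOOD]      = risk_food,
    [LOAN_RETAIL_NOVELTIES] = risk_novelties,
};

static double risk(const struct loan *l)
{
    return risk_fn[l->type](l);
}
```

With this layout, adding a new industry means adding one function and one table entry. The calling code does not change, which is the point the slide makes about the table hierarchy.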
20
Inheritance and Polymorphism (cont.)
SELECT branch_id, AVG(risk(loans))
FROM loans
GROUP BY 1
HAVING AVG(risk(loans)) > 1
ORDER BY 2 DESC;

SELECT branch_id, AVGRISK(loans)
FROM loans
GROUP BY 1
HAVING AVGRISK(loans) > 1
ORDER BY 2 DESC;

Now that we have added the risk functions to the server, we can easily calculate the average risk factor that each branch is taking, as shown in the first SQL statement. As a result, we get a list of the branches that are above the risk threshold of 1, sorted in decreasing order.
Getting the average risk factor is not really what we want. We need to weight the risk of each loan by the amount of the loan: lending $1M to a manufacturing company should be riskier than lending $1000 to a novelty store. We already have the risk functions; we only need to add a few functions: one that multiplies the risk factor by the loan amount and returns the result along with the loan amount, another that knows how to combine two result-amount pairs, and a last one that finalizes the calculation and returns the average risk. With this, we now have a user-defined aggregate that can be used as shown in the second SQL statement to alert you when branches are taking too much risk. You then know where to send your internal auditors.
This example is given in the context of a banking problem, but the concept applies to other problems. Its implementation reduces application complexity, reduces the processing required, and limits network utilization. This translates into lower costs, faster time to market, and therefore business advantage.
With this basic understanding of the Informix object-relational technology, we can now move on to our case study and, hopefully, understand the relative simplicity of the implementation.
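The weighted aggregate described above maps onto the support functions that CREATE AGGREGATE accepts (INIT, ITER, COMBINE, FINAL). Here is a minimal C sketch of the arithmetic only, assuming each ITER call receives a loan's risk and amount. The struct and function names are hypothetical, and the DataBlade API plumbing is omitted.

```c
/* Aggregate state carried between ITER and COMBINE calls. */
struct avgrisk_state {
    double weighted_risk;   /* sum of risk * amount */
    double amount;          /* sum of amount */
};

/* INIT: empty state for a new group. */
static struct avgrisk_state avgrisk_init(void)
{
    struct avgrisk_state s = { 0.0, 0.0 };
    return s;
}

/* ITER: fold one loan (its risk and amount) into the state. */
static struct avgrisk_state avgrisk_iter(struct avgrisk_state s,
                                         double risk, double amount)
{
    s.weighted_risk += risk * amount;
    s.amount += amount;
    return s;
}

/* COMBINE: merge two partial states, e.g. from parallel scans. */
static struct avgrisk_state avgrisk_combine(struct avgrisk_state a,
                                            struct avgrisk_state b)
{
    a.weighted_risk += b.weighted_risk;
    a.amount += b.amount;
    return a;
}

/* FINAL: amount-weighted average risk (0 for an empty group). */
static double avgrisk_final(struct avgrisk_state s)
{
    return s.amount > 0.0 ? s.weighted_risk / s.amount : 0.0;
}
```

COMBINE exists because the server may aggregate partitions in parallel and merge the partial states afterwards, so the result must not depend on how the rows were split. For a $1M loan at a hypothetical risk of 0.25 and a $1000 loan at 1.5, the weighted result stays close to 0.25, while a plain AVG would report 0.875.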
21
Replacing Stored Procedures with a UDA
- Business problem: merge multi-polygon types (ESRI)
- Original solution: SPL stored procedure (82 lines, 3 SELECT statements, 1 INSERT statement, 2 nested FOREACH loops)
- New solution: user-defined aggregate (23 lines of SPL, no SQL, no FOREACH)

SELECT a.user_id, a.case_id, a.case_type, a.serial_nr_full, a.spt_disp_theme_cd,
       do_union(a.shp)
FROM case_shp a
WHERE user_id = 'user0001'
GROUP BY 1, 2, 3, 4, 5
INTO TEMP my_temp_table WITH NO LOG;
22
Original Code
CREATE PROCEDURE union_caseid()
    DEFINE GLOBAL theUserId CHAR(8) DEFAULT USER;
    . . .
    BEGIN
        SELECT MAX(a.se_row_id) INTO max_serowid
        FROM case_shp a
        WHERE user_id = theUserId;

        FOREACH -- case_id in current township
            SELECT unique case_id INTO p_caseid
            FROM case_shp
            WHERE user_id = theUserId

            LET init = 'true';
            LET newSeRowId = max_serowid;

            FOREACH -- shape and its display attributes
                SELECT a.case_type, a.serial_nr_full, a.spt_disp_theme_cd, a.shp, a.se_row_id
                INTO p_casetype, p_serialnrfull, p_sptdispthemecd, p_shp, p_serowid
                FROM case_shp a
                WHERE user_id = theUserId AND case_id = p_caseid

                IF ( p_shp IS NOT NULL ) THEN
                    IF ( init = 'true' ) THEN
                        LET init = 'false';
                        LET unionResult = p_shp;
                    ELSE
                        LET unionResult = (union(p_shp::MULTIPOLYGON, unionResult::MULTIPOLYGON)::MULTIPOLYGON);
                    END IF -- init = 'true'
                END IF -- p_shp IS NOT NULL
            END FOREACH -- shape and its display attributes

            INSERT INTO case_shp VALUES (theUserId, p_caseid, p_casetype, p_serialnrfull,
                p_sptdispthemecd, unionResult, newSeRowId);
        END FOREACH -- case_id in current township
    END -- BEGIN
END PROCEDURE;
23
User-Defined Aggregate Code
CREATE FUNCTION do_union_iter(state lvarchar, arg lvarchar)
RETURNING lvarchar;
    DEFINE retval lvarchar;

    IF (state IS NULL) THEN
        RETURN arg;
    END IF

    IF (arg IS NULL) THEN
        RETURN state;
    END IF

    LET retval = state || arg;
    RETURN retval;
END FUNCTION;

CREATE AGGREGATE do_union
WITH (ITER = do_union_iter, COMBINE = do_union_iter);
24
User-Defined Aggregates
- A lot more flexible than standard aggregate functions
- Can take any type as input, e.g. a row type
- Can take an initialization parameter of any type, e.g. a row type
- Can return complex types, e.g. a list
25
Fabric Classification
- Business problem: provide an efficient way to select fabrics
- Indexing colors: CIELAB coding (3 dimensions) and other attributes (a total of 5 dimensions)
- Fabric type hierarchy: requires a hierarchy-aware type, e.g. we want a “natural” fabric
- Fabric style and patterns: e.g. what does “Victorian” mean?
26
Fabric Classification Queries
- Find the fabrics that have “blue” in them
  - “Blue” is not a specific value but a range of values in 3 dimensions
- Find “natural” fabrics that have “blue” in them
  - What does “natural” mean?
- Find “silk” fabrics that have “blue” in them
  - What does “silk” mean? There could be many types of silk
- An application wants the answer to the questions, not a list of potential candidates with “false positives”
27
Problem with Indexing Colors
- A B-tree index indexes only one dimension
  - Selects too many rows
  - Rule of thumb: the index is selected if it returns less than 20% of the rows
  - Result: likely to do a table scan (read all rows)
- Solution: use an R-tree index
  - Multi-dimensional index
  - Requires the creation of a 5-dimension type (UDT)
- R-tree index
  - Preferred indexing method for spatial data (spatial, geodetic)
  - Data types: point, line, polygon, etc.
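An R-tree can answer "blue" queries because both the query and each index entry are boxes in all five dimensions, and the search only descends into entries whose box overlaps the query box. The sketch below shows that overlap test. The numeric ranges standing in for "blue" (negative b* in CIELAB) and the two extra attributes on a 0 to 100 scale are assumptions for illustration.

```c
#include <stdbool.h>

#define DIMS 5   /* L*, a*, b* plus two other fabric attributes (hypothetical) */

/* Axis-aligned box in 5 dimensions: what an R-tree entry or a query stores.
 * A single fabric is a degenerate box with lo == hi. */
struct box5 {
    double lo[DIMS];
    double hi[DIMS];
};

/* True if the boxes intersect in every dimension (the R-tree search test). */
static bool box_overlaps(const struct box5 *a, const struct box5 *b)
{
    for (int d = 0; d < DIMS; d++)
        if (a->hi[d] < b->lo[d] || b->hi[d] < a->lo[d])
            return false;
    return true;
}
```

A B-tree on L* alone would return every fabric whose lightness falls in the range, blue or not; the 5-dimensional test rejects the yellow fabric below on the b* axis.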
28
Handling the Fabric Type Hierarchy
- Fabric types can be seen as a hierarchy
- Hierarchies are difficult to deal with in an RDBMS
- The hierarchical processing is only a part of the total required processing
- We can use the Node type to solve our problem
[Diagram: type hierarchy with nodes Fabric, Natural, Synthetic, Wood, Silk, Cotton, Pine, Hard, Soft, ...]
29
Handling Fabric Types and Patterns
- This was not addressed in the PoC
- Create a pattern hierarchy
- Define “Victorian” in terms of colors, fabric types, and patterns
30
Solution Benefits
- Scalability due to indexing
  - Colors
  - Hierarchy
- Application simplification
  - No manipulation of extra data
  - No special algorithms to handle color matching, etc.
- Performance
  - Less data movement
  - Less data through the network
31
Other examples
- Other multi-dimensional problems: 3D CAD drawings
- Support for Globally Unique Identifiers (GUID)
- Soundex/Phonex
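Soundex maps similar-sounding names to the same 4-character code, so a phonetic match becomes an equality test that can use an index, for example a functional index on the code. The sketch below implements standard American Soundex, including the rule that H and W do not separate letters with the same code. It is not the code of any specific bladelet, and Phonex is not shown.

```c
#include <ctype.h>

/* Soundex digit for a letter; '0' marks vowels (and Y), which separate
 * repeated codes, and '-' marks H and W, which do not. */
static char soundex_code(char c)
{
    switch (toupper((unsigned char)c)) {
    case 'B': case 'F': case 'P': case 'V':
        return '1';
    case 'C': case 'G': case 'J': case 'K':
    case 'Q': case 'S': case 'X': case 'Z':
        return '2';
    case 'D': case 'T':
        return '3';
    case 'L':
        return '4';
    case 'M': case 'N':
        return '5';
    case 'R':
        return '6';
    case 'H': case 'W':
        return '-';
    default:
        return '0';
    }
}

/* Writes the 4-character code for name into out (5 bytes including '\0').
 * Returns 0, or -1 if name does not start with a letter. */
static int soundex(const char *name, char out[5])
{
    if (!isalpha((unsigned char)name[0]))
        return -1;
    out[0] = (char)toupper((unsigned char)name[0]);
    char last = soundex_code(name[0]);   /* first letter's code suppresses a repeat */
    int n = 1;
    for (const char *p = name + 1; *p != '\0' && n < 4; p++) {
        if (!isalpha((unsigned char)*p))
            continue;                    /* ignore spaces, apostrophes, ... */
        char code = soundex_code(*p);
        if (code == '-')
            continue;                    /* H/W: keep `last`, so repeats collapse */
        if (code != '0' && code != last)
            out[n++] = code;
        last = code;                     /* a vowel resets `last` */
    }
    while (n < 4)
        out[n++] = '0';
    out[4] = '\0';
    return 0;
}
```

Robert and Rupert both produce R163, and Ashcraft produces A261 because the S and C on either side of the H share one code.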
32
Advanced Feature: Named Memory
- Usage: allocate memory of a chosen duration that can be retrieved by name
- Durations range from function to server
- Example: BeCentric uses named memory to keep formatted information in memory for quick sanctions testing (financial entity validation)
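The idea behind named memory is allocate once, then look up by name. The DataBlade API provides this through functions such as mi_named_alloc() and mi_named_get(). The plain-C registry below only illustrates the pattern: its names are hypothetical, and it omits memory durations and the locking that a real server-side cache would need.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_NAMED 8
#define MAX_NAME  32

/* Tiny registry standing in for server named memory. No locking: a real
 * server-side version must synchronize concurrent access. */
static struct {
    char  name[MAX_NAME];
    void *ptr;
} registry[MAX_NAMED];
static int registry_count = 0;

/* Returns the block registered under name, or NULL. */
static void *named_get(const char *name)
{
    for (int i = 0; i < registry_count; i++)
        if (strcmp(registry[i].name, name) == 0)
            return registry[i].ptr;
    return NULL;
}

/* Allocates zeroed memory under a new name; returns NULL if the name is
 * taken or too long, or the registry is full. */
static void *named_alloc(const char *name, size_t size)
{
    if (named_get(name) != NULL || registry_count == MAX_NAMED ||
        strlen(name) >= MAX_NAME)
        return NULL;
    void *p = calloc(1, size);
    if (p == NULL)
        return NULL;
    strcpy(registry[registry_count].name, name);
    registry[registry_count].ptr = p;
    registry_count++;
    return p;
}
```

The first caller builds the data (for example, a formatted lookup list); later calls find it with named_get() instead of rebuilding it.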
33
Advanced Feature: Virtual Table/Index Interface
- Think of these as similar to device drivers
- Purpose functions: am_create, am_drop, am_open, am_close, am_insert, am_delete, am_update, am_stats, am_scancost, am_beginscan, am_getnext, am_rescan, am_endscan, am_getbyid, am_check
- Usage: make “something” look like a table or an index
- Examples:
  - Excalibur Text (Virtual Index Interface)
  - TimeSeries: make a time series look like a relational table (Virtual Table Interface)
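The device-driver analogy can be made concrete with a table of function pointers. The server calls purpose functions in a fixed order (begin a scan, fetch rows until none are left, end the scan) without knowing where the rows come from. The sketch below uses simplified, hypothetical signatures. The real purpose functions receive DataBlade API descriptors instead of these arguments.

```c
#include <stddef.h>

/* Scan state for a "table" that is really a C array. */
struct scan {
    const int *rows;
    size_t nrows;
    size_t pos;
};

/* Simplified purpose-function table (hypothetical signatures). */
struct access_method {
    int (*am_beginscan)(struct scan *s);
    int (*am_getnext)(struct scan *s, int *row);   /* 1 = row returned, 0 = end */
    int (*am_endscan)(struct scan *s);
};

static int array_beginscan(struct scan *s) { s->pos = 0; return 0; }

static int array_getnext(struct scan *s, int *row)
{
    if (s->pos >= s->nrows)
        return 0;
    *row = s->rows[s->pos++];
    return 1;
}

static int array_endscan(struct scan *s) { (void)s; return 0; }

/* The "driver" that makes an array look like a table. */
static const struct access_method array_am = {
    array_beginscan, array_getnext, array_endscan
};

/* Roughly what a SUM over the virtual table needs: drive the scan through
 * the purpose functions without knowing how the rows are stored. */
static long sum_all(const struct access_method *am, struct scan *s)
{
    long total = 0;
    int row;
    am->am_beginscan(s);
    while (am->am_getnext(s, &row))
        total += row;
    am->am_endscan(s);
    return total;
}
```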
34
Keys to a Successful Extensibility Project
- Start small, develop your expertise
  - Remember the first OO projects
- Use pre-built extensions
- Study examples
- Approaches:
  - Use DataBlade Modules
  - Use Bladelets
  - Build your own
35
What are DataBlade Modules?
- Business components: a set of functionality that solves a specific business problem
- Building blocks that can include:
  - User-defined types
  - User-defined functions
  - Tables, views
  - Client component
- A DataBlade can come from IBM or a third party, or be built in-house
36
IDS DataBlade Modules
- Large Object Locator (LLD)
- Binary data type
- Node
- Basic Text Search (BTS)
- MQSeries
- Spatial
- XML and XSLT
- Web feature services
- Web
- Geodetic
- TimeSeries
- Real-Time Loader
- C-ISAM
- Image Foundation
- Video Foundation
- Excalibur Text
- Excalibur Image
Note: Node, the indexable binary type, and Basic Text Search are DataBlade modules that come with IDS 11.10.
37
Spatial is Everywhere
- Where are my stores located relative to my distributors?
- How can I efficiently route my delivery trucks?
- How can I micro-market to customers fitting a particular profile near my worst-performing store?
- How can I set insurance rates near flood plains?
- Where are the parcels in the city that are impacted by a zoning change?
- Which bank branches do I keep after the merger, based on my customers' locations (among other things)?
38
Complex Spatial Example
CREATE TABLE e_Yellow_Pages (
    Name        VARCHAR(128) NOT NULL,
    Business    Business_Type NOT NULL,
    Description Document NOT NULL,
    Location    GeoPoint NOT NULL,
    Occupied    SET( Period NOT NULL )
);

--
-- "Show me available service stations
-- specializing in Porsche brakes
-- within 30 miles of where I am?"
SELECT Y.Name
FROM e_Yellow_Pages Y
WHERE Contains( Y.Location, Circle( :GPS_Loc, '30 Miles' ) )
  AND Y.Business MATCH 'Automotive Service'
  AND DocContains( Y.Description, 'Porsche AND brakes' )
  AND NOT Booked( Y.Occupied, Period(TODAY, TODAY + 5) );

Raise the level of abstraction at the database level. Buy components to solve common problems, and build components to achieve a competitive edge.
IDS allows you to solve business problems that include different types of data. For example, a consumer would like to find particular businesses in a certain geographical area that have open appointments within the next week. This involves searching using:
- Geographic terms (a business located within a 30-mile radius of where I am now)
- Textual information (the business knows about brakes and Porsches)
- Temporal data (there are open bookings in the next 5 days)
Note: the user-defined functions shown here are generic and do not apply to any specific DataBlade; they are merely used as examples.
39
Geodetic DataBlade
[Figure: spatial coordinate systems on a globe, showing longitude and latitude, labels -90 (90° W) and +90 (90° N), and radius R]
- GeoSpatial objects consist of data that is referenced to a precise location on the surface of the earth
- Includes altitude and time range dimensions
- Applications: global, polar, trans-Pacific, high accuracy
- Basic computations: complex, expensive
- Example: find all the atmospheric elements that were present in the Denver area during the last thunderstorm
In applications where it is important to correctly account for the Earth's shape, it is best to use the Geodetic DataBlade. This avoids problems as you go around the world (for example, crossing the dateline).
40
Spatial vs. Geodetic: Distance Issue
What is the distance from Anchorage to Tokyo? In a flat plane, there is no doubt as to what is the shortest path between two points: a straight line. But when that flat plane is supposed to represent a round surface without edges, that straight line becomes meaningless. The shortest path is the shorter of the two possible geodesic paths: the thick part of the great circle
41
Spatial vs. Geodetic: straddling the 180th Meridian
- Split the flat-plane representation into 2 or more pieces:
  MULTIPOLYGON( ( , , , , ), (180 30,180 40,165 40, 165 30, ) )
  POLYGON( (165 30, , , ) )
Another view of a similar case, where the shape crosses over the boundary between the Eastern and Western hemispheres. The figure on the right uses a different projection from the previous slide, giving a better impression of a round earth.
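On a flat plane, a shape that crosses the 180th meridian has to be cut into pieces, as the MULTIPOLYGON above shows. The sketch below performs that cut for a single longitude range. Only the 165 to 180 piece is visible on the slide, so the other bound used in the usage example (-170) is an assumed value.

```c
/* Longitude range walked eastward from `west` to `east`, in degrees. */
struct lon_range {
    double west;
    double east;
};

/* Splits a range that crosses the 180th meridian (west > east, e.g.
 * 165 -> -170) into two flat-plane pieces. Returns the number of pieces. */
static int split_at_antimeridian(struct lon_range r, struct lon_range out[2])
{
    if (r.west <= r.east) {   /* does not cross: one piece */
        out[0] = r;
        return 1;
    }
    out[0].west = r.west;
    out[0].east = 180.0;
    out[1].west = -180.0;
    out[1].east = r.east;
    return 2;
}
```

For example, the range 165 to -170 becomes 165 to 180 plus -180 to -170, while 10 to 20 stays a single piece.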
42
Flat Map vs. Round Earth
43
XML Capabilities
- Self-describing data
- Useful for information exchange
- Useful for flexible data definition
IDS provides:
- Functions to generate XML (UDA)
- Functions to extract part of an XML document
- Functions to test for the existence of an element in an XML document
- XSLT transformation
- The extract, exist, and validate functions use Xerces-C (open source, provided by IBM)

<customer>
  <row>
    <customer_num>101</customer_num>
    <fname>Ludwig </fname>
    <lname>Pauli </lname>
    <company>All Sports Supplies </company>
    <address1>213 Erstwild Court</address1>
    . . .
  </row>
</customer>
44
Basic Text Search (BTS) DataBlade
- Text search engine built into IDS
- Provides a variety of word and phrase searching on an unstructured document repository
- The search engine is provided by the CLucene text search package (open source)
Examples:
SELECT pid FROM t WHERE bts_contains(text_data, 'foo AND bar');

SELECT id FROM product
WHERE bts_contains(brands, 'standard', score # real)
  AND score > 70.0;

score is a statement local variable (SLV).
45
Basic Text Search (BTS) DataBlade
- Supports wildcards and fuzzy search:
  - “?” single-character wildcard
  - “*” multiple-character wildcard
  - “~” fuzzy search
SELECT cat_advert, score
FROM catalog
WHERE bts_contains(cat_advert, 'soarness~ AND classic', score # real);
- soarness~ matches soreness
- Proximity and range searches
- Stop words
- Can index XML documents
  - Index all or selected XML elements
  - Index all or selected attributes
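Fuzzy matching of this kind is usually based on edit distance. The sketch below computes the Levenshtein distance and applies a similarity threshold of the form 1 - distance / (length of the shorter term), which is how classic Lucene fuzzy queries scored terms. Whether BTS uses exactly this rule, and the 0.5 threshold, are assumptions.

```c
#include <string.h>

#define MAX_TERM 64

/* Levenshtein distance (insert, delete, substitute), two-row DP.
 * Returns -1 for terms too long for this allocation-free sketch. */
static int edit_distance(const char *s, const char *t)
{
    size_t m = strlen(s), n = strlen(t);
    if (m >= MAX_TERM || n >= MAX_TERM)
        return -1;
    int prev[MAX_TERM], cur[MAX_TERM];
    for (size_t j = 0; j <= n; j++)
        prev[j] = (int)j;
    for (size_t i = 1; i <= m; i++) {
        cur[0] = (int)i;
        for (size_t j = 1; j <= n; j++) {
            int sub = prev[j - 1] + (s[i - 1] != t[j - 1]);
            int del = prev[j] + 1;
            int ins = cur[j - 1] + 1;
            int best = sub < del ? sub : del;
            cur[j] = best < ins ? best : ins;
        }
        memcpy(prev, cur, (n + 1) * sizeof prev[0]);
    }
    return prev[n];
}

/* Assumed similarity rule: 1 - distance / min(length) >= min_similarity. */
static int fuzzy_match(const char *query, const char *term, double min_similarity)
{
    int d = edit_distance(query, term);
    size_t lq = strlen(query), lt = strlen(term);
    size_t shorter = lq < lt ? lq : lt;
    if (d < 0 || shorter == 0)
        return 0;
    return 1.0 - (double)d / (double)shorter >= min_similarity;
}
```

soarness and soreness differ in two positions, giving a similarity of 1 - 2/8 = 0.75, above the threshold.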
46
TimeSeries DataBlade Module
- A time series is a set of data as it varies over time
- The TimeSeries DataBlade optimizes storage usage
  - 50% savings not uncommon
- Optimized access time
  - 10 times performance improvement typical
- Calendars
  - Define periods of time in which data may or may not be collected
- SQL, Java, and C interfaces
- VTI interface
  - Makes a time series look like a regular table
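One plausible source of the storage savings is that a regular time series does not need to store a timestamp with each element, because the time follows from the origin, the interval, and the element's position. The sketch below shows that lookup. The layout is an assumption, not the DataBlade's on-disk format, and calendars (which would skip, for example, non-trading days) are not modeled.

```c
#include <stddef.h>

/* Regular series: element i is at time origin + i * interval, so only the
 * values are stored (an assumed, simplified layout). */
struct regular_series {
    long origin;           /* time of element 0, e.g. seconds since an epoch */
    long interval;         /* seconds between elements */
    size_t count;
    const double *values;
};

/* Stores the value at time t in *out and returns 1; returns 0 if t is not
 * on the series' grid or falls outside it. */
static int ts_value_at(const struct regular_series *ts, long t, double *out)
{
    if (ts->interval <= 0 || t < ts->origin ||
        (t - ts->origin) % ts->interval != 0)
        return 0;
    size_t idx = (size_t)((t - ts->origin) / ts->interval);
    if (idx >= ts->count)
        return 0;
    *out = ts->values[idx];
    return 1;
}
```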
47
Who’s Interested in TimeSeries
- Capital markets: arbitrage opportunities, breakout signals, risk/return optimization, portfolio management, VaR calculations, simulations, backtesting...
- Telecommunications: network monitoring, load prediction, blocked calls (lost revenue) from load, phone usage, fraud detection and analysis...
- Manufacturing: machinery going out of spec; process sampling and analysis
- Logistics: location of a fleet (e.g. GPS); route analysis
- Scientific research: temperature over time...
- Other: live sports statistics
48
TimeSeries Performance Example
- Financial brokerage firm: online stock collection
  - < 1 sec lag requirement
  - > 100,000 messages/sec
  - > 5 TB of history kept online
- Runs on a 4-CPU Linux box!
- Extreme scalability!
49
MQ DataBlade
- Message queues are useful for exchanging information asynchronously
- Communication mechanism in an SOA environment
- Used as an Enterprise Service Bus
- Access WebSphere MQ (WMQ):
  - Function interface
  - Table interface
- Transactional
  - Must be used within a transaction
  - A message is sent only if the transaction commits
50
IDS Bladelets
- Located at: and
- Bladelets: mrLvarchar, Node, regexp, shape, exec, period, etc.
- For detailed information, see: "Open-Source Components for Informix Dynamic Server 9.x", Jacques Roy, William W. White, Jean T. Anderson, Paul G. Brown, ISBN
  - Includes: Node, Period, ffvti, exec, shape, sqllib/IUtil, regexp, mrLvarchar, JPGImage
51
mrLvarchar and regexp
- mrLvarchar: store data efficiently based on its length
  - Document data vary in length
  - Many business data sets include such documents: web pages, XML documents, product descriptions, articles, etc.
  - mrLvarchar stays in the row when shorter than 2KB, in a BLOB otherwise
  - Includes useful functions: Snip, clip, concat, concatAll, Upper, Lower, Length, Instr, regexp functions
- regexp: regexp_match, regexp_replace, regexp_extract, regexp_split
  - Search on mrLvarchar
  - More complete search capabilities than LIKE and MATCHES
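Regular expressions go beyond LIKE and MATCHES because a pattern such as ^(ERR|WARN)-[0-9]+$ (either prefix, followed by one or more digits and nothing else) cannot be written with those operators' wildcards. The sketch below shows what a regexp_match()-style function might do, assuming POSIX extended regular expressions through regcomp() and regexec(). The bladelet's actual pattern syntax and return values are not described here.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical regexp_match(text, pattern): 1 on a match, 0 on no match,
 * -1 if the POSIX extended pattern does not compile. */
static int regexp_match(const char *text, const char *pattern)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

A real UDR would compile the pattern once per statement and cache it rather than calling regcomp() for every row.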
52
Period
- Time period comparisons based on dates or datetimes
- Manages information about fixed intervals in a timeline
- Rich comparison functions: Equal(), NotEqual(), WithinNotTouches(), Within(), ContainsNotTouch(), Contains(), AfterTouches(), BeforeTouches(), Overlap(), Before(), After()
- R-tree indexing
- Can be used in any type of scheduling: hotel room reservations, consultant scheduling, equipment scheduling
- Advantages: simpler SQL, range indexing (R-tree), proper behavior of a range
- Example: "Are there any situations where two different trains are scheduled on the same track over the next week?"
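The train question reduces to an overlap test between periods on the same track. The sketch below uses half-open periods [start, end), so when one train leaves a segment exactly as the next enters it, the two periods "touch" without overlapping. These semantics and the function names are assumptions and may not match the bladelet's Overlap() and BeforeTouches().

```c
#include <stdbool.h>

/* Half-open period [start, end), e.g. hours on a timeline (assumed semantics). */
struct period {
    long start;
    long end;
};

/* The periods share some instant; merely touching does not count. */
static bool overlap(struct period a, struct period b)
{
    return a.start < b.end && b.start < a.end;
}

/* a ends exactly where b starts. */
static bool before_touches(struct period a, struct period b)
{
    return a.end == b.start;
}

/* a lies entirely inside b. */
static bool within(struct period a, struct period b)
{
    return b.start <= a.start && a.end <= b.end;
}
```

A conflict check then asks, for each pair of trains on the same track, whether overlap() is true; with an R-tree on the periods, the server can avoid comparing every pair.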
53
Other Bladelets
- ffvti: make an external data file look like a table (read only)
- Exec: dynamic execution of stored procedures
  - Capability now available in IDS stored procedures
- Sqllib/IUtil: compatibility functions with other vendors
- Shape: implements point, box, circle with indexing capabilities
- JPGImage: simple functions to manipulate JPEG images
54
New Applications
- Tracking systems (TimeSeries, Spatial): RTD buses, U-Haul, cell phones
- Customer services: closest location, GPS applications
- Security systems: face recognition, fingerprints, etc.
- Handling hierarchical problems: material types, policies, XML, bill-of-material, etc.
55
Building Your Own Extensions
- Use SPL before going to C or Java
- Use the DataBlade Development Kit (DBDK)
  - At least at first, to generate your function headers and packaging
- Learn the DataBlade API
  - Many ESQL/C functions are also part of the DAPI
- Review the latest documentation in the release directory
56
Development Environment
- SPL (Stored Procedure Language): standard SPL environment
- C
  - Include directory: $INFORMIXDIR/incl/public
  - Makefile generated by DBDK
- DBDK
  - Include file for Makefile: $INFORMIXDIR/incl/dbdk/makeinc.<platform>
- Java
  - Server configuration: onconfig parameters (see the release notes and machine notes)
  - $(INFORMIXDIR)/extend/krakatoa/krakatoa.jar
57
DBDK Tools
[Diagram: BladeSmith, BladePack, install procedure, BladeManager, DataBlade DB, and App database; artifacts: C source files, script files, README, functional tests]
- BladeSmith
  - Defines/builds DataBlade modules
  - Generates functional tests
  - Generates script info for install
- BladePack
  - Organizes distribution
  - Generates InstallShield scripts
- BladeManager
  - Stores info in datablades_database
  - Registers/unregisters DataBlade modules in an application database
58
BladeSmith
- Creates/manages DataBlade module projects
- Wizard-based interface supports the creation of DataBlade module components
- Automatically generates source code
- Automatically generates the SQL registration files for BladeManager
- Generates functional tests
- Generates packaging files for BladePack
59
BladePack
- Reads BladeSmith's packaging files
- Lets the user add/remove files from the manifest
- Groups files into logical units for easier management at the installation site
- Produces a clickable installer (on Windows) or a directory tree (on UNIX) that can be copied to tape, floppies, or CD
- The installation bundle is used by BladeManager to register the DataBlade module in user databases
60
BladeManager
- The user copies the DataBlade module package to hard disk (the installation step)
- BladeManager understands the resulting directory hierarchy and file contents
- The DBA may register or unregister any available DataBlade module in any database (the registration step)
61
Programming Environment
- Proprietary thread implementation
- Non-preemptible threads
- Multiple processes on UNIX
- Multiple threads on NT
62
Threading Restrictions
- Function libraries: must be re-entrant
- Signals: must not be used in UDRs
- Memory allocation: must use DAPI functions, since the function can move from one CPU VP to another
- Blocking calls: calls such as file access (I/O) block the CPU VP they run on, and every thread scheduled on it (with a single CPU VP, effectively the entire server)
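The re-entrancy rule excludes library functions that keep hidden static state, such as strtok(), whose position is shared by every thread that calls it. The sketch below shows the re-entrant alternative, POSIX strtok_r(). It keeps its position in a caller-owned pointer and therefore also supports nested tokenizing. The helper name and the input format are hypothetical, and a real UDR would allocate any heap memory with DataBlade API functions such as mi_alloc() instead of malloc().

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Sums the values in "key=value;key=value;..." with two interleaved
 * tokenizers; plain strtok() would lose the outer loop's position here. */
static int sum_pairs(const char *input)
{
    char buf[256];
    if (strlen(input) >= sizeof buf)
        return -1;
    strcpy(buf, input);                          /* strtok_r modifies its input */

    int total = 0;
    char *outer_save = NULL;
    for (char *pair = strtok_r(buf, ";", &outer_save); pair != NULL;
         pair = strtok_r(NULL, ";", &outer_save)) {
        char *inner_save = NULL;
        strtok_r(pair, "=", &inner_save);         /* key (unused) */
        char *value = strtok_r(NULL, "=", &inner_save);
        if (value != NULL)
            total += atoi(value);
    }
    return total;
}
```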
63
Dynamic Libraries Used for Server-Side Functions
- UNIX: dlopen(), dlsym(), dlclose()
- NT: LoadLibrary(), GetProcAddress(), FreeLibrary()
- Symbol visibility
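Here is a minimal sketch of the UNIX calls listed above: open a library, resolve a function by name, call it, and close the library. To stay self-contained, it opens the running program itself (dlopen(NULL, ...)) and looks up abs() from the already-loaded C library. On older glibc versions, link with -ldl.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Resolves abs() by name at run time and calls it.
 * Returns 0 and stores the result, or -1 if the lookup fails. */
static int call_abs_by_name(int x, int *result)
{
    void *handle = dlopen(NULL, RTLD_NOW);   /* the program and its loaded libraries */
    if (handle == NULL)
        return -1;

    int (*fn)(int);
    *(void **)(&fn) = dlsym(handle, "abs");  /* POSIX-recommended cast idiom */
    if (fn == NULL) {
        dlclose(handle);
        return -1;
    }
    *result = fn(x);
    dlclose(handle);
    return 0;
}
```

For a server-side function, the handle would presumably come from the shared library named in EXTERNAL NAME (such as qlib.bld) and the symbol from the name in parentheses; that mapping is an assumption about the server's internals. Symbol visibility matters here too, because dlsym() can only find functions that the library exports.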
64
Summary
- Databases today are more than plain SQL
- Framework for business solutions
- You can use building blocks to speed up the creation of solutions
- The database can adapt to the business environment
- Informix is a full-fledged partner in your business processing
- This can result in:
  - Faster time-to-market
  - Higher performance
  - Lower maintenance costs
  - = Business advantage!