Emerging Research Directions in DBs/ISs


1 Emerging Research Directions in DBs/ISs

2 Outline
- Mobile Databases
- Multimedia Databases
- Geographic Information Systems
- Bioinformatics
- XML
- Data Mining
- Data Warehousing
- Introduction to ASIS Lab

3 Mobile Databases
Recent advances in portable and wireless technology have led to mobile computing, a new dimension in data communication and processing. Portable computing devices coupled with wireless communications allow clients to access data from virtually anywhere and at any time. A number of hardware and software problems must be resolved before the capabilities of mobile computing can be fully utilized. Some of the software problems, which may involve data management, transaction management, and database recovery, have their origins in distributed database systems.

4 Mobile Databases (2)
In mobile computing, these problems become more difficult, mainly because of:
- The limited and intermittent connectivity afforded by wireless communications.
- The limited life of the power supply (battery).
- The changing topology of the network.
In addition, mobile computing introduces new architectural possibilities and challenges.
Intermittent: not continuous, interrupted.

5 Mobile Computing Architecture
- Fixed hosts: perform the transaction and data management functions with the help of database servers.
- Mobile units: portable computers that move around a geographical region made up of a collection of mobile cells.

6 Mobile Computing Architecture (2)
It is a distributed architecture in which a number of computers, generally referred to as fixed hosts and base stations, are interconnected through a high-speed wired network. Fixed hosts are general-purpose computers configured to manage mobile units. Base stations function as gateways to the fixed network for the mobile units.

7 Data Management Issues
From a data management standpoint, mobile computing may be considered a variation of distributed computing. Mobile databases can be distributed under two possible scenarios:
- The entire database is distributed mainly among the wired components, possibly with full or partial replication. A base station or fixed host manages its own database with DBMS-like functionality, plus additional functionality for locating mobile units and additional query and transaction management features to meet the requirements of mobile environments.
- The database is distributed among wired and wireless components. Data management responsibility is shared among base stations or fixed hosts and mobile units.
A mobile database is a database that a mobile computing device can connect to over a mobile network. The client and server have wireless connections. A cache is maintained to hold frequently used data and transactions so that they are not lost when the connection fails.
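
The caching idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `MobileCache`, its methods, and the dictionary standing in for the fixed-host database are all hypothetical, and the least-recently-used eviction policy is one possible choice rather than something the slides prescribe.

```python
from collections import OrderedDict


class MobileCache:
    """Hypothetical client-side cache for a mobile unit (illustrative only)."""

    def __init__(self, capacity=3):
        self.capacity = capacity
        self.data = OrderedDict()   # cached items, least recently used first
        self.pending = []           # updates made while disconnected
        self.connected = True

    def read(self, key, server):
        # Serve from the cache when possible; otherwise fetch if connected.
        if key in self.data:
            self.data.move_to_end(key)
            return self.data[key]
        if not self.connected:
            raise LookupError(f"{key!r} is not cached and the link is down")
        value = server[key]
        self._remember(key, value)
        return value

    def write(self, key, value, server):
        # Apply locally first; queue the update if the link is down.
        self._remember(key, value)
        if self.connected:
            server[key] = value
        else:
            self.pending.append((key, value))

    def reconnect(self, server):
        # Replay queued updates in order once the connection returns.
        self.connected = True
        for key, value in self.pending:
            server[key] = value
        self.pending.clear()

    def _remember(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used item


server = {"stock:A": 10, "stock:B": 7}   # stands in for a fixed-host database
cache = MobileCache()
cache.read("stock:A", server)            # fetched and cached while connected
cache.connected = False                  # the wireless link drops
cache.write("stock:A", 9, server)        # update is queued locally
offline_value = cache.read("stock:A", server)
cache.reconnect(server)                  # queued update reaches the server
```

A real mobile DBMS would also have to resolve conflicts when the server copy changed during the disconnection; the sketch ignores that and simply replays the queue.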

8 Data Management Issues (2)
Data management issues as they apply to mobile databases:
- Data distribution and replication
- Transaction models
- Query processing
- Recovery and fault tolerance
- Mobile database design
- Location-based service
- Division of labor
- Security
- M-Commerce


10 Multimedia Databases
In the years ahead, multimedia information systems are expected to dominate our daily lives. Our houses will be wired for bandwidth to handle interactive multimedia applications. Our high-definition TV/computer workstations will have access to a large number of databases, including digital libraries and image and video databases, that will distribute vast amounts of multisource multimedia content.
Dominate: to prevail over, to rule. Vast: very large in extent.

11 Multimedia Databases (2)
DBMSs have been constantly adding to the types of data they support. Many types of multimedia data are now available in current systems.

12 Multimedia Databases (3)
Types of multimedia data available in current systems:
- Text: May be formatted or unformatted. For ease of parsing structured documents, standards like SGML and variations such as HTML are being used.
- Graphics: Examples include drawings and illustrations that are encoded using descriptive standards (e.g., CGM, PICT, PostScript).

13 Multimedia Databases (4)
Types of multimedia data available in current systems (contd.):
- Images: Include drawings, photographs, and so forth, encoded in standard formats such as bitmap and JPEG. Compression is built into JPEG (and into MPEG, which is used for video). These images are not subdivided into components, so querying them by content (e.g., find all images containing circles) is nontrivial.
- Animations: Temporal sequences of image or graphic data.
Temporal: displayed over time.

14 Multimedia Databases (5)
Types of multimedia data available in current systems (contd.):
- Video: A set of temporally sequenced photographic data for presentation at specified rates, for example, 30 frames per second.
- Structured audio: A sequence of audio components comprising note, tone, duration, and so forth. MPEG-4 Structured Audio consists of several components, most notably an audio programming language called SAOL. SAOL is historically related to Csound and other so-called Music-N languages.

15 Multimedia Databases (6)
Types of multimedia data available in current systems (contd.):
- Audio: Sample data generated from aural recordings, stored as a string of bits in digitized form. Analog recordings are typically converted into digital form before storage.
Aural: perceivable by ear.
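
To make "a string of bits in digitized form" concrete, here is a minimal sketch of sampling and quantizing a tone. The function name, the 440 Hz tone, the 8 kHz sample rate, and the 8-bit depth are all assumptions chosen for illustration; real audio formats add headers, multiple channels, and usually compression.

```python
import math


def sample_and_quantize(freq_hz, sample_rate_hz, n_samples, bits=8):
    """Sample a sine tone and map each sample to an unsigned integer code."""
    levels = 2 ** bits
    codes = []
    for n in range(n_samples):
        t = n / sample_rate_hz                            # time of this sample
        amplitude = math.sin(2 * math.pi * freq_hz * t)   # value in [-1, 1]
        # Shift to [0, 1], then scale to the available quantization levels.
        codes.append(round((amplitude + 1) / 2 * (levels - 1)))
    return codes


codes = sample_and_quantize(freq_hz=440, sample_rate_hz=8000, n_samples=8)
# Concatenate the 8-bit codes into the bit string that would be stored.
bitstring = "".join(format(code, "08b") for code in codes)
```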

16 Multimedia Databases (7)
Types of multimedia data available in current systems (contd.):
- Composite or mixed multimedia data: A combination of multimedia data types such as audio and video, which may be physically mixed to yield a new storage format or logically mixed while retaining the original types and formats. Composite data also contains additional control information describing how the information should be rendered.

17 Data Management Issues
Multimedia applications dealing with thousands of images, documents, audio and video segments, and free text data depend critically on:
- Appropriate modeling of the structure and content of the data.
- The design of appropriate database schemas for storing and retrieving multimedia information.


19 Geographic Information Systems
Geographic information systems (GIS) are used to collect, model, and analyze information describing physical properties of the geographical world.

20 Geographic Information Systems (2)
The scope of GIS broadly encompasses two types of data:
- Spatial data, originating from maps, digital images, administrative and political boundaries, roads, and transportation networks, as well as physical data such as rivers, soil characteristics, climatic regions, and land elevations.
- Non-spatial data, such as socio-economic data (like census counts), economic data, and sales or marketing information.
GIS is a rapidly developing domain that offers highly innovative approaches to meet some challenging technical demands.

21 Geographic Information Systems (3)

22 Spatial data

23 GIS Applications
It is possible to divide GIS applications into three categories:
- Cartographic applications
- Digital terrain modeling applications
- Geographic objects applications
Cartographic: relating to map making. Terrain: the physical features of the land.

24 GIS Applications (2)
GIS applications:
- Cartographic: irrigation; crop yield analysis; land evaluation; planning and facilities management; landscape studies; traffic pattern analysis
- Digital terrain modeling applications: earth science; civil engineering and military evaluation; soil surveys; air and water pollution studies; flood control; water resource management
- Geographic objects applications: car navigation systems; geographic market analysis; utility distribution and consumption; consumer product and services (economic analysis)

25 Data Management Requirements of GIS
The functional requirements of the GIS applications above translate into the following database requirements.

26 Data Management Requirements of GIS (2)
Data Modeling and Representation
GIS data can be broadly represented in two formats:
- Vector data represents geometric objects such as points, lines, and polygons.

27 Data Management Requirements of GIS (3)
Data Modeling and Representation (contd.):
- Raster data is characterized as an array of points, where each point represents the value of an attribute for a real-world location. Informally, raster images are n-dimensional arrays in which each entry is a unit of the image and represents an attribute. Two-dimensional units are called pixels, while three-dimensional units are called voxels. Three-dimensional elevation data is stored in a raster-based digital elevation model (DEM) format.
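
The two representations can be contrasted with a small Python sketch. Everything here is assumed for illustration: the class names `Polygon` and `Raster`, the field layout, and the sample numbers are not taken from any GIS product. The polygon area uses the standard shoelace formula; the raster lookup maps a world coordinate to a grid cell.

```python
from dataclasses import dataclass


@dataclass
class Polygon:
    """Vector data: a geometric object described by its vertices."""
    vertices: list  # (x, y) tuples listed in order around the boundary

    def area(self):
        # Shoelace formula over consecutive vertex pairs.
        total = 0.0
        n = len(self.vertices)
        for i in range(n):
            x1, y1 = self.vertices[i]
            x2, y2 = self.vertices[(i + 1) % n]
            total += x1 * y2 - x2 * y1
        return abs(total) / 2


@dataclass
class Raster:
    """Raster data: a grid where each cell holds one attribute value."""
    origin_x: float   # world coordinates of the lower-left corner
    origin_y: float
    cell_size: float  # width and height of one cell in world units
    cells: list       # list of rows; row 0 is the southernmost row

    def value_at(self, x, y):
        col = int((x - self.origin_x) // self.cell_size)
        row = int((y - self.origin_y) // self.cell_size)
        if not (0 <= row < len(self.cells) and 0 <= col < len(self.cells[row])):
            raise ValueError("location lies outside the raster")
        return self.cells[row][col]


parcel = Polygon([(0, 0), (4, 0), (4, 3), (0, 3)])          # a 4 x 3 rectangle
dem = Raster(0.0, 0.0, 10.0, [[100, 102, 105],               # toy elevation grid
                              [101, 104, 108]])
parcel_area = parcel.area()
elevation = dem.value_at(25, 12)
```

The vector object stores exact boundaries compactly, while the raster answers "what is the value here?" with a simple index computation, which is one reason a GIS needs both.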

28 Data Management Requirements of GIS (4)
Data Integration
GISs must integrate both vector and raster data from a variety of sources. Sometimes edges and regions are inferred from a raster image to form a vector model, or conversely, raster images such as aerial photographs are used to update vector models. Several coordinate systems, such as Universal Transverse Mercator (UTM), latitude/longitude, and local cadastral systems, are used to identify locations. Data originating from different coordinate systems requires appropriate transformations.
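
A full latitude/longitude to UTM conversion involves an ellipsoid model and series expansions, so it is normally left to a projection library. As a small, hedged illustration of the first step only, the sketch below computes which UTM zone a longitude falls in and that zone's central meridian. The function names are made up for this example, and the special zone boundaries around Norway and Svalbard are deliberately ignored.

```python
import math


def utm_zone(longitude_deg):
    """Return the standard 6-degree UTM zone number (1 to 60) for a longitude."""
    if not -180 <= longitude_deg <= 180:
        raise ValueError("longitude must be between -180 and 180 degrees")
    zone = math.floor((longitude_deg + 180) / 6) + 1
    return min(zone, 60)   # longitude 180 belongs to zone 60, not a zone 61


def central_meridian(zone):
    """Return the central meridian (degrees east) of a UTM zone."""
    return zone * 6 - 183


zone = utm_zone(106.66)            # a longitude in southern Vietnam
meridian = central_meridian(zone)
```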

29 Specific GIS Data Operations
GIS applications are conducted through the use of special operators such as the following:
- Interpolation
- Interpretation
- Proximity analysis
- Raster image processing
- Analysis of networks
Interpretation: representation. Proximity: spatial nearness.
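
Two of these operators are easy to sketch. The code below uses inverse distance weighting as one common interpolation technique (the slides do not say which method a GIS uses) and a plain radius search as a stand-in for proximity analysis. The function names and sample data are hypothetical.

```python
import math


def idw(samples, x, y, power=2):
    """Estimate a value at (x, y) from (sx, sy, value) samples by
    inverse distance weighting: nearer samples count for more."""
    numerator = denominator = 0.0
    for sx, sy, value in samples:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0:
            return value            # exactly on a sample point
        weight = 1 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


def within(points, x, y, radius):
    """Proximity query: names of all points no farther than radius from (x, y)."""
    return [name for name, px, py in points
            if math.hypot(px - x, py - y) <= radius]


rainfall = [(0, 0, 10.0), (10, 0, 20.0)]        # (x, y, measured value)
estimate = idw(rainfall, 5, 0)                  # halfway between the samples
stations = [("A", 3, 4), ("B", 6, 8)]
nearby = within(stations, 0, 0, 5)
```

A production system would answer the proximity query through a spatial index rather than a linear scan; the scan is used here only to keep the sketch short.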

30 Specific GIS Data Operations (2)
The functionality of a GIS database is also subject to other considerations:
- Extensibility
- Data quality control
- Visualization
Such requirements clearly illustrate that standard RDBMSs or ODBMSs do not meet the special needs of GIS. Therefore it is necessary to design systems that support the vector and raster representations and the spatial functionality as well as the required DBMS features.


32 Bioinformatics
The study of genetics can be divided into three branches:
- Mendelian genetics is the study of the transmission of traits between generations.
- Molecular genetics is the study of the chemical structure and function of genes at the molecular level.
- Population genetics is the study of how genetic information varies across populations of organisms.
Bioinformatics addresses the management of genetic information, with special emphasis on DNA sequence analysis. It is an interdisciplinary research field.
Trait: characteristic. Mendelian: following Mendel's theory of heredity. Molecular: relating to molecules. Organism: living thing. Interdisciplinary: involving several fields.
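
As a taste of what DNA sequence analysis means at the data level, here is a minimal sketch of three elementary operations on a sequence stored as a string. The function names and the sample sequence are invented for illustration; real bioinformatics work relies on specialized libraries and on databases holding far longer sequences.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def gc_content(seq):
    """Fraction of bases in the sequence that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq):
    """Sequence of the opposite DNA strand, read in its own direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_motif(seq, motif):
    """Start positions (0-based) of every occurrence of motif, overlaps included."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


sequence = "ATGCATGC"
gc = gc_content(sequence)
opposite = reverse_complement("ATGC")
hits = find_motif(sequence, "ATG")
```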


34 XML: Extensible Markup Language
Although HTML is widely used for formatting and structuring Web documents, it is not suitable for specifying structured data that is extracted from databases. A new language, XML (eXtensible Markup Language), has emerged as the standard for structuring and exchanging data over the Web. XML can be used to provide more information about the structure and meaning of the data in Web pages, rather than just specifying how the pages are formatted for display on the screen. The formatting aspects are specified separately, for example, by using a formatting language such as XSL (eXtensible Stylesheet Language).

35 XML (2)
Example 1:
Example 2:

36 XML (3)
The basic object in XML is the XML document.
Two main structuring concepts are used to construct an XML document:
- Elements
- Attributes
Attributes in XML provide additional information that describes elements.

37 XML (4)
As in HTML, elements are identified in a document by their start tag and end tag. The tag names are enclosed between angle brackets <…>, and end tags are further identified by a slash </…>. Complex elements are constructed from other elements hierarchically, whereas simple elements contain data values. It is straightforward to see the correspondence between the XML textual representation and the tree structure. In the tree representation, internal nodes represent complex elements, whereas leaf nodes represent simple elements. That is why the XML model is called a tree model or a hierarchical model.
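
The correspondence between XML text and the tree model can be seen by parsing a small document with Python's standard `xml.etree.ElementTree` module. The document below, including the element names `projects`, `project`, `name`, `location` and the `id` attribute, is invented for this sketch and is not one of the examples from the original slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one complex element containing two simple elements.
DOC = """<projects>
  <project id="p1">
    <name>ProductX</name>
    <location>Bellaire</location>
  </project>
</projects>"""

root = ET.fromstring(DOC)


def describe(element, depth=0):
    """Walk the tree and label each node as complex (internal) or simple (leaf)."""
    kind = "complex" if len(element) else "simple"   # len() counts child elements
    rows = [(depth, element.tag, kind)]
    for child in element:
        rows.extend(describe(child, depth + 1))
    return rows


tree_rows = describe(root)
project_id = root[0].get("id")                 # attribute of the <project> element
name_text = root.find("project/name").text     # data value of a simple element
```

Printing `tree_rows` with indentation proportional to `depth` reproduces the hierarchy: complex elements appear as internal nodes and simple elements as leaves.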


39 Definitions of Data Mining
- The discovery of new information in terms of patterns or rules from vast amounts of data.
- The process of finding interesting structure in data.
- The process of employing one or more computer learning techniques to automatically analyze and extract knowledge from data.

40 Knowledge Discovery in Databases (KDD)
Data mining is actually one step of a larger process known as knowledge discovery in databases (KDD). The KDD process model comprises six phases:
- Data selection
- Data cleansing
- Enrichment
- Data transformation or encoding
- Data mining
- Reporting and displaying discovered knowledge
Knowledge discovery and data mining (KDD) steps:
- Data cleaning: remove noise and irrelevant data.
- Data integration: combine data from different sources such as databases, data warehouses, and text files.
- Data selection: gather the data directly relevant to the task from the original data sources.
- Data transformation: convert the data into a form suitable for mining by performing grouping or aggregation operations.
- Data mining: the essential stage, in which intelligent methods are applied to extract data patterns.
- Pattern evaluation: assess how useful the patterns representing knowledge are, based on some measures.
- Knowledge presentation: use presentation and visualization techniques to show the mined knowledge to the user.
Enrichment: making the data richer with additional information.
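
The six phases can be walked through on a toy data set. The sketch below is only an illustration: the purchase records, the `REGION` lookup table, and the choice of counting frequent item pairs as the mining step are all assumptions made for this example, not a method given in the slides.

```python
from collections import Counter
from itertools import combinations

RAW = [
    {"customer": "c1", "items": "milk, bread", "zip": "70000"},
    {"customer": "c2", "items": "milk,bread,juice", "zip": "70000"},
    {"customer": "c3", "items": "", "zip": "71000"},            # no items recorded
    {"customer": "c4", "items": "Bread, juice", "zip": "71000"},
]
REGION = {"70000": "north", "71000": "south"}   # hypothetical external data


def select(records):        # 1. data selection: keep only the relevant fields
    return [{"items": r["items"], "zip": r["zip"]} for r in records]


def cleanse(records):       # 2. data cleansing: drop records with no items
    return [r for r in records if r["items"].strip()]


def enrich(records):        # 3. enrichment: add region from another source
    return [{**r, "region": REGION.get(r["zip"], "unknown")} for r in records]


def transform(records):     # 4. transformation: encode each record as an item set
    baskets = []
    for r in records:
        items = {i.strip().lower() for i in r["items"].split(",") if i.strip()}
        items.add("region=" + r["region"])
        baskets.append(frozenset(items))
    return baskets


def mine(baskets, min_support=2):   # 5. data mining: frequent item pairs
    counts = Counter(pair for b in baskets
                     for pair in combinations(sorted(b), 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}


def report(patterns):       # 6. reporting: present the discovered patterns
    return [f"{a} + {b}: {n} baskets" for (a, b), n in sorted(patterns.items())]


patterns = mine(transform(enrich(cleanse(select(RAW)))))
lines = report(patterns)
```

Because the region was added during enrichment, the mining step can surface pairs such as an item together with `region=north`, which would be invisible in the raw purchase data alone.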


42 Data Warehousing
The data warehouse is a historical database designed for decision support. Data mining can be applied to the data in a warehouse to help with certain types of decisions. Proper construction of a data warehouse is fundamental to the successful use of data mining. W. H. Inmon characterized a data warehouse as: “A subject-oriented, integrated, nonvolatile, time-variant collection of data in support of management’s decisions.”

43 Data Warehousing (2)
Purpose of Data Warehousing
- Traditional databases are not optimized for data access alone; they have to balance the requirement of data access with the need to ensure the integrity of data.
- Most of the time, data warehouse users need only read access, but they need that access to be fast over a large volume of data.
- Most of the data required for data warehouse analysis comes from multiple databases, and these analyses are recurrent and predictable enough that specific software can be designed to meet the requirements.
- There is a great need for tools that provide decision makers with information to make decisions quickly and reliably based on historical data.
This functionality is achieved by data warehousing and online analytical processing (OLAP).
Online Analytical Processing, or OLAP (IPA: /ˈoʊlæp/), is an approach to quickly provide answers to analytical queries that are multidimensional in nature. OLAP is part of the broader category of business intelligence, which also encompasses relational reporting and data mining. The typical applications of OLAP are in business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting, and similar areas. The term OLAP was created as a slight modification of the traditional database term OLTP (Online Transaction Processing). Databases configured for OLAP employ a multidimensional data model, allowing for complex analytical and ad hoc queries with a rapid execution time. They borrow aspects of navigational databases and hierarchical databases that are speedier than their relational kin.
The OLAP (On-Line Analytical Processing) language is well suited to data warehouses. It resembles the SQL query language and centers on the following operations:
- Roll-up: for example, group the data by year instead of by quarter.
- Drill-down: for example, expand the data and view it by month instead of by quarter.
- Slice: view one layer at a time. For example, from the sales listing for Q1, Q2, Q3, and Q4, view only Q1.
- Dice: drop part of the data (equivalent to adding a condition to the WHERE clause in SQL).
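
The roll-up, drill-down, slice, and dice operations can be sketched over a tiny fact table held in memory. The sales figures, the dimension names, and the helper `aggregate` are all made up for this illustration; an OLAP server would run the same operations over a multidimensional store rather than a Python list.

```python
from collections import defaultdict

# Hypothetical fact table: one row per sale, with time and region dimensions.
SALES = [
    {"year": 2023, "quarter": "Q1", "month": "Jan", "region": "north", "amount": 100},
    {"year": 2023, "quarter": "Q1", "month": "Feb", "region": "south", "amount": 150},
    {"year": 2023, "quarter": "Q2", "month": "Apr", "region": "north", "amount": 200},
    {"year": 2024, "quarter": "Q1", "month": "Jan", "region": "north", "amount": 120},
]


def aggregate(facts, *dims):
    """Sum the amount for every combination of the given dimensions."""
    totals = defaultdict(int)
    for fact in facts:
        totals[tuple(fact[d] for d in dims)] += fact["amount"]
    return dict(totals)


by_quarter = aggregate(SALES, "year", "quarter")            # starting view
by_year = aggregate(SALES, "year")                          # roll-up: coarser
by_month = aggregate(SALES, "year", "quarter", "month")     # drill-down: finer
q1_only = [f for f in SALES if f["quarter"] == "Q1"]        # slice: fix one value
north_h1 = [f for f in SALES                                # dice: restrict
            if f["quarter"] in ("Q1", "Q2")                 # several dimensions,
            and f["region"] == "north"]                     # like a WHERE clause
```

Roll-up and drill-down change only the list of grouping dimensions, while slice and dice change only which facts are included, which is why the two pairs compose freely.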

44 Data Warehousing (3)
Applications that a data warehouse supports are:
- OLAP (Online Analytical Processing), a term used to describe the analysis of complex data from the data warehouse.
- DSS (Decision Support Systems), also known as EIS (Executive Information Systems), which support an organization’s leading decision makers in making complex and important decisions.
- Data mining, which is used for knowledge discovery, the process of searching data for unanticipated new knowledge.
The executive information system (EIS) is a by-product of DSS. An EIS supports executive decision making and the ability to scan environments automatically. An EIS must cope with information that is incomplete, inaccurate, or ambiguous and that concerns future matters. A DSS identifies and solves the problem and analyzes data; the data comes from transaction-based applications.

45 Conceptual Structure of Data Warehouse
Data warehouse processing involves:
- Cleaning and reformatting of data
- OLAP
- Data mining

46 Comparison with Traditional Databases
- Data warehouses are mainly optimized for appropriate data access. Traditional databases are transactional and are optimized for both access mechanisms and integrity assurance measures.
- Data warehouses place more emphasis on historical data, as their main purpose is to support time-series and trend analysis.
- Compared with transactional databases, data warehouses are nonvolatile.
- In transactional databases, the transaction is the mechanism of change to the database. By contrast, information in a data warehouse is relatively coarse grained, and the refresh policy is carefully chosen, usually incremental.
Coarse-grained: kept at a low level of detail.
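
An incremental refresh policy can be illustrated with a few lines of Python. This is a sketch under simple assumptions: source rows carry an increasing `id`, the warehouse is an in-memory list, and the function name `incremental_refresh` is hypothetical. Real warehouses typically detect changes through timestamps, logs, or change data capture.

```python
def incremental_refresh(warehouse, source_rows, last_loaded_id):
    """Append only the source rows added since the previous refresh.

    Existing warehouse rows are never modified, reflecting the nonvolatile
    nature of the warehouse. Returns the new high-water mark.
    """
    new_rows = [row for row in source_rows if row["id"] > last_loaded_id]
    warehouse.extend(new_rows)
    return max((row["id"] for row in new_rows), default=last_loaded_id)


warehouse = []
source = [{"id": 1, "amount": 50}, {"id": 2, "amount": 75}]
mark = incremental_refresh(warehouse, source, last_loaded_id=0)     # loads 2 rows

source.append({"id": 3, "amount": 20})                              # new transaction
mark = incremental_refresh(warehouse, source, last_loaded_id=mark)  # loads 1 row
```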


48 Introduction to ASIS Lab
Advances in Security & Information Systems Lab ( )
Research Directions ( )
Information Systems Security:
- Database Security
- Security Issues in E-/M-Commerce
- Security and Privacy in Location-Based Applications
- Security Issues in Outsourced Database Services
- DBs/ISs Security Visualization
- E-Learning Systems Security
- Digital Watermarking and Steganography
- Privacy and Identity Management

49 Introduction to ASIS Lab (2)
Research Directions ( ) (cont.)
Advanced Information Systems:
- E-/M-Commerce
- SOA-Based Modern Information Systems
- Large Database Systems
- Web Information Systems
- Modern Information Retrieval Systems
- Stream Data Management*
- Bioinformatics*


51 Q&A
Questions?

