Presentation is loading. Please wait.

Presentation is loading. Please wait.

Relational Data Mining in Finance Haonan Zhang CFWin03-37 03/04/2003.

Similar presentations


Presentation on theme: "Relational Data Mining in Finance Haonan Zhang CFWin03-37 03/04/2003."— Presentation transcript:

1 Relational Data Mining in Finance Haonan Zhang CFWin03-37 hnzhang@cs.umanitoba.ca 03/04/2003

2 Organization Motivation & Introduction Background Problem statement Solution Outcome Conclusion and future work

3 Motivation & Introduction Motivation: Analyze finance data, find and extract hidden patterns and relations between data. Thus the result will make good support for decision making in finance. To predict financial markets is a complex and challenging task because several reasons:  The dimensionality of the problem is high.  The relationships among independent and dependent variable are weak and non-linear. Data mining is a process that analyze data from different perspectives and explicitly show the interaction between data with a given confidence. It is suitable for predict financial markets

4 Motivation & Introduction cont.. Traditional data mining methods have their limitations, such as lack of knowledge representation and limited forms of background knowledge. The Inductive Logic Programming (ILP) and Relational Data mining (RDM) can overcome these limitations.  The ILP can naturally incorporate background knowledge and relations between objects into the leaning process;  The RDM can discover hidden relations (general first order relations) in numerical and symbolic data using background knowledge (domain theory)

5 Background A predicate is a binary function. A predicate can be defined extensionally as a set of tuples for which the predicate is true, or intentionally as a set of (Horn) clauses for computing whether the predicate is true. A literal is a predicate or its negation A horn clause consists of two parts: a clause head and a clause body. A clause head is defined as a single predicate. A clause body is defined as a conjunction of literals.

6 Background cont.. FOIL (First Order Inductive Learning) algorithm  Input: a target relation to be learned, a set of positive, a set of negative examples of the relation, and a set of background relations.  Learning approach: separate-and-conquer approach: It learns a clause at a time, then it remove training examples covered by the clause, then begin to learn the subsequent clauses. FOIL tries to cover as much as positive training examples and cover no negative training examples. The algorithm will terminate when all positive training examples are covered.  Output: a set of clauses that describe the target relation.

7 Background cont.. The FOCL (First Order Combined Learner) algorithm extends the FOIL algorithm in several ways.  The FOCL constrains the search by using variable typing, and inter-argument constraints.  The FOCL uses background knowledge to improve the learning process, such as rules which are defined by a collection of examples, and a partial possible incorrect rule which is an initial approximation of the predicate to be learned.

8 Background cont.. The MMDR (Machine Method for Discovering Regularities) algorithm focuses on generating probabilistic first-order rules and measurement issues for numerical relational  MMDR permits various forms of background knowledge, such as constraints, predefined predicates and partial (may be incorrect) rules.  MMDR uses the statistical significance of hypotheses and the strength of data types scales, to limit the search space.

9 Problem statement The FOIL input a target relation to be learned, a set of positive and a set of negative examples, and a set of background knowledge. The output would be a set of clauses that describe the target relation using background knowledge

10 Solution separate-and-conquer approach

11 Solution cont.. Two kinds of literals can be appended to develop a clause.  gainful literals: literals may increase the covering of positive examples. Gainful literals are evaluated using information heuristic. The average information provided by discovery that literal of the bindings is When new literal m is added, suppose that some of the bindings are excluded, and k of then + bindings are not excluded. The total gain is

12 Solution cont..  determinate literals: A determinate literal introduces new variable. The new partial clause has the same binding for each positive binding of current clause, and at most one binding for each negative binding of current clause. Therefore, sometimes determinate literal has zero gain.

13 Solution cont.. Four forms of literals are considered can appear in a clause:

14 Solution cont.. In order to learn recursive theories without leading to infinite recursion, FOIL uses three approaches to assure that the recursive literals are risk free.  Ordering constants: the algorithm can discover an ordering of constant and order constants.  Ordering pairs of variables: when a type’s constant is ordered, the ordering of a pair of variable V i and V j of same type in a partial clause may also exist.  Ordering recursive literals: the ordering among variables can be extended to an ordering of literals involving a predicate and variables.

15 Flow Diagram

16 Outcome

17 Conclusion and future work The FOIL algorithm can effectively find the hidden relations between target relation and background knowledge and represent the target relation using background knowledge. The FOIL algorithm uses a very complex recursive control and backup scheme, which increase the complexity of the algorithm. Further implementation needs a better understanding of these schemes. Future work: implement the FOIL algorithm in parallel computers.


Download ppt "Relational Data Mining in Finance Haonan Zhang CFWin03-37 03/04/2003."

Similar presentations


Ads by Google