A method of successive elimination of spurious arguments for effective solution of search-based modelling tasks. Oleksandr Samoilenko, Volodymyr Stepashko.

Presentation transcript:

1 A method of successive elimination of spurious arguments for effective solution of search-based modelling tasks. Oleksandr Samoilenko, Volodymyr Stepashko

2 Introduction: The paper considers the problem of solving modelling tasks with a large number of arguments using the combinatorial GMDH algorithm. Previously we considered GMDH algorithms for solving problems with a large number of arguments based on the successive selection of the most informative arguments. These algorithms build models very quickly, but the accuracy of the models is not always sufficient: the quality of models built in such a way depends on the quality of the argument selection. Improving the method of successive argument selection, so as to raise the quality and effectiveness of selecting informative arguments, is therefore the main goal of this paper.

3 Contents:
1. Problem statement
2. Solving of the problem
   - Algorithm with successive complication
   - Algorithm with successive selection of arguments
   - Algorithm of successive elimination of spurious arguments with inverted structures
3. Results of experiments comparing the algorithms' effectiveness
4. Conclusion

4 1. Problem statement
1. Suppose we are given a data sample W = (X y), dim W = n × (m + 1).
2. The relationship between an output y and s_0 < m relevant inputs holds:

   y = X_{s_0} θ_0 + ε,   (1)

where X_{s_0} θ_0 is the exact or true output of the object (signal), θ_0 is a vector of true parameters, ε is a vector of stochastic disturbances (noise), and X_{s_0} is a submatrix of the matrix X formed by the s_0 column vectors influencing the output value y; both the number s_0 and the composition of these vectors are unknown.

5 1. Problem statement
3. It is necessary to search for the optimal model in the form:

   ŷ(s) = X_s θ(s),   (2)

where θ(s) is a vector of unknown parameters being estimated. The vector of estimated parameters θ(s) determines a model of complexity s for the sample W.

6 1. Problem statement
The quality of a model is measured by the regularity criterion AR(s), which supposes division of the sample X into two subsamples X_A and X_B. We estimate the model parameters on the training subsample X_A and calculate the error on the testing subsample X_B:

   AR(s) = ||y_B − X_{Bs} θ_{As}||²,   (3)

where θ_{As} is the vector of parameters estimated on the subsample X_A. The model of optimum quality is the one minimizing the criterion:

   s* = arg min_s AR(s).   (4)
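As an illustration only (not the authors' code), the following Python sketch shows one way to compute the regularity criterion (3): parameters are estimated by least squares on subsample A and the squared error is measured on subsample B. All function and variable names here are assumptions made for this sketch.

```python
import numpy as np

def regularity_criterion(X_A, y_A, X_B, y_B, cols):
    """AR(s) for the model built on the argument subset `cols`:
    fit parameters on training subsample A, score on testing subsample B."""
    theta_As, *_ = np.linalg.lstsq(X_A[:, cols], y_A, rcond=None)  # estimate on A
    resid = y_B - X_B[:, cols] @ theta_As                          # error on B
    return float(resid @ resid)
```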

7 1. Problem statement
When using the combinatorial algorithm, a search over all possible models is carried out, with selection of the best model by the criterion (3). When the number of arguments is not very large, the exhaustive search can be performed. The total amount P_m of all possible models containing 1, …, m arguments is calculated by the formula:

   P_m = Σ_{s=1}^{m} C_m^s = 2^m − 1.   (5)

When the number of arguments is greater than 20, an exhaustive search within an acceptable time limit is often impossible.
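For small m, the exhaustive combinatorial search can be written directly. The hypothetical sketch below, reusing the regularity_criterion helper from the previous snippet, enumerates all 2^m − 1 non-empty argument subsets of formula (5):

```python
from itertools import combinations

def exhaustive_search(X_A, y_A, X_B, y_B):
    """Enumerate all 2**m - 1 non-empty argument subsets (formula (5))
    and return the one minimizing the regularity criterion AR."""
    m = X_A.shape[1]
    best_ar, best_cols = float("inf"), None
    for s in range(1, m + 1):                   # model complexity 1..m
        for cols in combinations(range(m), s):  # all C(m, s) structures
            ar = regularity_criterion(X_A, y_A, X_B, y_B, list(cols))
            if ar < best_ar:
                best_ar, best_cols = ar, cols
    return best_ar, best_cols
```

The loop count grows as 2^m, which is exactly why the slides that follow develop algorithms that avoid the full enumeration.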

8 2. Solving of the problem
Let us start with an example: m = 20, n = 50, s_0 = 10, and analyze the dependence of the criterion AR on the model complexity s. The model quality for complexities greater than s_0 becomes lower, so there is no sense in analyzing such models. Hence it is better to use an algorithm which does not consider all models but sorts through models of complexity 1, then 2, and so forth, until the criterion for the next complexity begins to increase.

9 Algorithm with successive complication (Alg. 1)
step 1: the structures of complexity s are generated and the matrices X_A, y are built;
step 2: the model coefficients of every structure are estimated by the Gauss method on the training subsample X_A;
step 3: the quality criterion AR(s) of each model is calculated on the testing subsample X_B and the best model of complexity s is selected;
step 4: if the model quality for s is better than that for s − 1, the complexity of the models is increased and we return to step 1; otherwise the cycle is finished.
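A minimal sketch of Alg. 1, under the same assumptions as the previous snippets (least-squares estimation standing in for an explicit Gauss elimination, and the regularity_criterion helper defined earlier): complexities are tried in increasing order and the loop stops as soon as the best criterion value worsens.

```python
from itertools import combinations

def successive_complication(X_A, y_A, X_B, y_B):
    """Alg. 1: evaluate all models of complexity s = 1, 2, ... and stop
    once the best AR at complexity s is no better than at s - 1."""
    m = X_A.shape[1]
    best_ar, best_cols = float("inf"), None
    for s in range(1, m + 1):
        ar_s, cols_s = min(
            (regularity_criterion(X_A, y_A, X_B, y_B, list(c)), c)
            for c in combinations(range(m), s)
        )
        if ar_s >= best_ar:        # criterion began to increase: finish
            break
        best_ar, best_cols = ar_s, cols_s
    return best_ar, best_cols
```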

10 Algorithm with successive complication
Fig. 1. Comparison of the algorithms' effectiveness depending on s_0.
Figure 1 illustrates the computational effectiveness of this algorithm for different s_0 compared to the exhaustive search algorithm. As the figure shows, for s_0 = 10 the algorithm spends almost half of the time needed by the exhaustive search algorithm.

11 As is evident from (5), removing an argument from the set of arguments halves the number of models to be searched. Consequently, to accelerate finding the best model, one needs to find arguments which do not substantially influence the model and remove them from the set, leaving the most informative ones. Such an approach was offered by Aleksey Ivakhnenko and Eugenia Savchenko, who suggested estimating the level of an argument's informativeness by the absolute value of its correlation coefficient with the output variable. Another approach was suggested by Volodymyr Stepashko and Yuri Koppa: the level of an argument's informativeness can be estimated by counting how many of the best models contain this argument.

12 Algorithm with successive elimination of spurious arguments (Alg. 2)
On the basis of these results, let us consider the algorithm with successive elimination of spurious arguments. Models are selected using the algorithm of successive complication, increasing the complexity while the calculation time remains permissible. Only the best models, and the arguments included in these models, are then examined. A new set is formed consisting only of those arguments taking an active part in forming the best models. Further models are built on this new set, and the sequence of such operations is repeated until the set contains few enough arguments that an exhaustive search becomes possible.

13 Algorithm with successive elimination of spurious arguments (Alg. 2)
step 1: build the models of the complexities allowing to fit the given time limit using Alg. 1;
step 2: select a subset of the F best models by an external criterion;
step 3: rank all the arguments contained in these F models by the coefficient q_i, i = 1…m, specifying the frequency of occurrence of the i-th argument in the best models;
step 4: form a new sample by removing the arguments with the least values of q_i;
step 5: perform the exhaustive search of models if the number of arguments in the new sample is acceptable, or return to step 1 otherwise.
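Steps 3–4 can be illustrated with a short sketch (names hypothetical): given the F best models as subsets of argument indices, it computes the occurrence frequencies q_i and keeps only the most frequent arguments.

```python
from collections import Counter

def eliminate_spurious(best_models, m, n_drop):
    """Rank arguments by their frequency q_i of occurrence in the F best
    models (each model is a collection of argument indices) and drop the
    n_drop least frequent ones, returning the indices of the kept arguments."""
    q = Counter(i for cols in best_models for i in cols)  # q_i frequencies
    ranked = sorted(range(m), key=lambda i: q[i], reverse=True)
    return sorted(ranked[:m - n_drop])
```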

14 Algorithm with successive elimination of spurious arguments
Let us first investigate the effectiveness of this algorithm theoretically. When the task with m = 20, s_0 = 10 is solved by Alg. 1, the amount P_m of all possible models containing no more than s_0 arguments is calculated using formula (5):

   P_m = Σ_{s=1}^{10} C_{20}^s = 616 665.   (6)

The number of models built by Alg. 2 to reach the result of the exhaustive search with 20 arguments is considerably smaller than the count for Alg. 1 (616 665, see (6)). As for the computing time, the figures are as follows: 3 sec for Alg. 2 and 24 sec for Alg. 1. The combinatorial algorithm with the exhaustive search finds the same model in 48 sec.

15 Algorithm with successive elimination of spurious arguments
Let us consider the same task for m = 200 using Alg. 2. This algorithm takes 4 sec to solve the problem, which is very fast for this number of arguments, but the accuracy of the built models is not sufficient.
Tab. 1. Amount of models at each stage of the algorithm with successive elimination of spurious arguments, m = 200 (columns: stage, m, s_max, P_m; rows: stages 1–4, exhaustive search, and the total amount of models for Alg. 2).

16 Algorithm with successive elimination of spurious arguments with inverted structures (Alg. 3)
To raise the quality of argument extraction we propose to use the algorithm with inverted structures (Alg. 3). This algorithm is based on the algorithm with successive elimination of spurious arguments, but with an addition: at every stage we consider the models of complexity s together with the "conjugate" models of complexity m − s, namely the models with inverted structures. For example, the inverted structure of the binary structure vector (1, 0, 0, 1, 0) is (0, 1, 1, 0, 1), and so on.
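In the binary structure-vector notation of combinatorial GMDH (1 = argument included, 0 = excluded), the conjugate model is simply the complement of the structure; a one-line sketch (illustrative only):

```python
def invert_structure(structure):
    """Inverted ("conjugate") structure: complement the binary vector,
    turning a model of complexity s into one of complexity m - s."""
    return [1 - bit for bit in structure]

# e.g. invert_structure([1, 0, 0, 1, 0]) == [0, 1, 1, 0, 1]
```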

17 Algorithm with successive elimination of spurious arguments with inverted structures
Tab. 2. Amount of models at each stage of the algorithm with successive elimination of spurious arguments with inverted structures, m = 200.

18 3. Results of experiments comparing the algorithms' effectiveness
Tab. 3. Comparison of the effectiveness of algorithms 2 and 3, m = 200 (columns: algorithm, freedom of choice, amount of models, computing time in sec, and AR; the AR values of the four compared algorithm variants were 29.98, 31.97, 10.64, and 1.74).

19 3. Results of experiments comparing the algorithms' effectiveness
Fig. 5. Comparison of the algorithms' effectiveness on the exam subsample, m = 200. Training subsample: p1–p161; testing subsample: p161–p230; exam subsample: p231–p240.

20 3. Results of experiments comparing the algorithms' effectiveness
Fig. 6. The models built in the experiments.

21 Conclusion
Using the algorithm of successive elimination of spurious arguments with inverted structures makes it possible to substantially accelerate the search for the best subset of regressors and to solve tasks with a considerably larger number of regressors than the ordinary combinatorial GMDH algorithm based on the exhaustive search of arguments.

22 Thank you!