Basic Steps of QSAR/QSPR Investigations


1 Basic Steps of QSAR/QSPR Investigations
In the name of GOD
M.H. FATEMI, Mazandaran University

2 QSAR: Quantitative Structure-Activity Relationships
Can one predict activity (or properties, in QSPR) simply from knowledge of the structure of the molecule? In other words, if one systematically changes a component, will it have a systematic effect on the activity?

3 What is QSAR?
A QSAR is a mathematical relationship between a biological activity of a molecular system and its geometric and chemical characteristics. QSAR attempts to find a consistent relationship between biological activity and molecular properties, so that these “rules” can be used to evaluate the activity of new compounds.

4 Why QSAR?
Placing 10 different groups in 4 positions of a benzene ring requires synthesizing $10^4$ (10,000) compounds.
Solution: synthesize a small number of compounds and, from their data, derive rules to predict the biological activity of the other compounds.
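The count on this slide can be checked by brute-force enumeration. The sketch below is only an illustration: the group labels `R1`..`R10` are hypothetical, and, like the slide, it treats every assignment of one group per position as a distinct compound (symmetry-equivalent structures are not merged).

```python
from itertools import product

# Hypothetical labels for the 10 substituent groups (not from the slides).
groups = [f"R{i}" for i in range(1, 11)]
positions = 4  # number of substitution positions considered on the ring

# One group per position, repeats allowed: 10 * 10 * 10 * 10 assignments.
analogs = list(product(groups, repeat=positions))
n_analogs = len(analogs)
print(n_analogs)
```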

6 QSXR
X = A: Activity; X = P: Property; X = R: Retention
$X = b_0 + b_1 D_1 + b_2 D_2 + \dots + b_n D_n$
where $b_i$ are the regression coefficients, $D_i$ are the descriptors, and $n$ is the number of descriptors.
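As a rough sketch of how the coefficients b0..bn of such a linear model can be estimated, the example below fits them by ordinary least squares with NumPy. The descriptor matrix and the "true" coefficients are synthetic values made up for the illustration; they are not data from the presentation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_molecules, n_descriptors = 30, 3

# Synthetic descriptor matrix D (one row per molecule) and synthetic coefficients.
D = rng.normal(size=(n_molecules, n_descriptors))
true_b = np.array([1.5, -2.0, 0.5, 3.0])  # b0 (intercept), b1, b2, b3

# Response X = b0 + b1*D1 + b2*D2 + b3*D3 plus a little noise.
x_response = true_b[0] + D @ true_b[1:] + rng.normal(scale=0.01, size=n_molecules)

# Prepend a column of ones so the first fitted coefficient is the intercept b0.
A = np.column_stack([np.ones(n_molecules), D])
b, *_ = np.linalg.lstsq(A, x_response, rcond=None)
print(b)
```

With real data, `D` would hold the calculated descriptors and `x_response` the measured activity, property, or retention.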

7 History

14 Early Examples Hammett (1930s-1940s)

15 Hammett (cont.)
Now suppose we have a related series.
σ reflects sensitivity to the substituent; ρ reflects sensitivity to a different system (reaction series).
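For reference, the Hammett relation that these two constants belong to is usually written as below. The transcript does not preserve the slide's own formula, so this is the standard textbook form rather than a reconstruction of it: $K_X$ and $K_H$ are the equilibrium (or rate) constants of the substituted and unsubstituted compound, $\sigma_X$ is the substituent constant, and $\rho$ is the reaction constant.

$$\log_{10}\frac{K_X}{K_H} = \rho\,\sigma_X$$

By convention $\rho = 1$ for the ionization of substituted benzoic acids in water at 25 °C, which is the reference reaction used to define $\sigma_X$.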

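16 Free-Wilson Analysis
$\log(1/C) = \sum_i a_i + \mu$
where $\log(1/C)$ is the predicted activity (C being the concentration that produces the effect), $a_i$ is the contribution per group, and $\mu$ is the activity of the reference compound.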

17 Free-Wilson example
Log 1/C = -0.30 [m-F] + 0.21 [m-Cl] + 0.43 [m-Br] + …
Activity of analogs: Log 1/C is a sum with one term for each of [m-F], [m-Cl], [m-Br], [m-I], [m-Me], [p-F], [p-Cl], [p-Br], [p-I], [p-Me].
Problems: at least two substituent positions are necessary, and the analysis can only predict new combinations of the substituents already used in it.

18 Hansch Analysis
$\log(1/C) = a\,\pi + b\,\sigma + c$
where $\pi_X = \log P_{RX} - \log P_{RH}$ and $\log P$ is the logarithm of the octanol/water partition coefficient.
This is also a linear free energy relationship.

19 Applications of QSAR
1- Drug design
2- Prediction of chemical toxicity
3- Prediction of environmental activity
4- Prediction of molecular properties
5- Investigation of retention mechanisms

21 Steps in QSPR/QSAR
- Structure Entry & Molecular Modeling
- Descriptor Generation
- Feature Selection
- Construct Model (MLRA or CNN)
- Model Validation

22 Data set selection
1- The studied molecules should be structurally similar
2- The data should be collected under the same conditions
3- The data set should be as large as possible

25 Introduction to Molecular Descriptors
- Molecular descriptors are numerical values that characterize properties of molecules.
- They encode structural features of molecules as numbers.
- They vary in the complexity of the encoded information and in compute time.
- Examples: physicochemical properties (empirical); values from algorithms, such as 2D fingerprints.

27 Classical Classification of Molecular Descriptors
- Constitutional, topological: 2-D structural formula
- Geometrical: 3-D shape and structure
- Quantum chemical
- Physicochemical
- Hybrid descriptors

30 Topological Indexes: Examples
- Wiener index: counts the number of bonds between each pair of atoms and sums these distances over all pairs.
- Molecular connectivity indexes:
  - Randić branching index: defines the “degree” of an atom as the number of adjacent non-hydrogen atoms. The bond connectivity value is the reciprocal of the square root of the product of the degrees of the two atoms in the bond. The branching index is the sum of the bond connectivities over all bonds in the molecule.
  - Chi indexes: introduce valence values to encode sigma, pi, and lone-pair electrons.
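The two definitions above can be turned into a few lines of code. This is a minimal sketch that assumes the molecule is given as a hydrogen-suppressed graph in adjacency-list form; the function names and the two example graphs (carbon skeletons of n-butane and isobutane) are chosen for the illustration only.

```python
from collections import deque
from math import sqrt

def wiener_index(adj):
    """Sum of shortest-path bond counts over all unordered atom pairs."""
    total = 0
    for start in adj:
        # Breadth-first search gives the bond-count distance from `start`.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total // 2  # every pair was counted from both ends

def randic_index(adj):
    """Sum over bonds of 1 / sqrt(degree(u) * degree(v))."""
    total = 0.0
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each bond once
                total += 1.0 / sqrt(len(adj[u]) * len(adj[v]))
    return total

# Hydrogen-suppressed carbon skeletons (atom index -> bonded atom indices).
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

print(wiener_index(n_butane), wiener_index(isobutane))
print(randic_index(n_butane), randic_index(isobutane))
```

Both indices come out lower for the branched isomer, which is the kind of structural difference these descriptors are meant to encode.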

32 Electronic descriptors
Electronic interactions play a very important role in controlling molecular properties. Electronic descriptors are calculated to encode the aspects of a structure that are related to its electrons. Electronic interaction is a function of the charge distribution on a molecule.

35 Physicochemical Properties Used in this QSAR
- Liquid solubility, Sw,L, in mg/L and mmol/m³
- Octanol-water partition coefficient, Kow
- Liquid vapor pressure, Pv,L, in Pa
- Henry’s law constant, Hc, in Pa·m³/mol
- Boiling point

37 Feature Selection
The second step of comparing items involves the selection of features. Comparing faces, for example, first requires the identification of key features. How do we identify these? The same applies to molecules.
Many of our methods in molecular similarity are taken from psychology or computer science. In this example of face recognition, comparing every pixel (the number of which runs into the tens of thousands) would introduce much noise. Instead, 20 characteristic points are selected, which retain much of the information while discarding much of the noise. The same step can be employed in the comparison of molecules.

39 Objective feature selection
After descriptors have been calculated for each compound, the set must be reduced to one that is as information-rich but as small as possible:
1- Deleting constant or near-constant descriptors
2- Pair correlation cut-off selection
3- Cluster analysis
4- Principal component analysis
5- K correlation analysis
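The first two steps of this list are easy to sketch in code. The function below is an illustration only: the name `objective_filter`, the variance tolerance, and the 0.95 correlation cut-off are assumptions made for the example, and the descriptor matrix is synthetic.

```python
import numpy as np

def objective_filter(D, var_tol=1e-8, corr_cutoff=0.95):
    """Return the indices of descriptor columns kept after two simple filters."""
    # Step 1: drop constant or near-constant descriptors.
    varying = [j for j in range(D.shape[1]) if np.var(D[:, j]) > var_tol]
    # Step 2: pair-correlation cut-off. A descriptor is kept only if it is not
    # highly correlated with any descriptor that has already been kept.
    kept = []
    for j in varying:
        if all(abs(np.corrcoef(D[:, j], D[:, k])[0, 1]) < corr_cutoff for k in kept):
            kept.append(j)
    return kept

rng = np.random.default_rng(1)
a = rng.normal(size=50)
b = rng.normal(size=50)
D = np.column_stack([
    a,                                    # column 0: informative
    np.full(50, 2.0),                     # column 1: constant
    2 * a + 0.001 * rng.normal(size=50),  # column 2: near-duplicate of column 0
    b,                                    # column 3: independent of column 0
])
kept = objective_filter(D)
print(kept)
```

Which member of a correlated pair to keep is a design choice; this sketch simply keeps the one that appears first.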

47 Variable reduction Principal Component Analysis

48 Principal Component
$PC_1 = a_{1,1} x_1 + a_{1,2} x_2 + \dots + a_{1,n} x_n$
- Keep only those components that possess the largest variation.
- PCs are orthogonal to each other.
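A minimal sketch of how such components can be computed, here through a singular value decomposition of the mean-centered descriptor matrix. The function name `pca` and the synthetic data are assumptions for the illustration; in practice descriptors are often also scaled to unit variance first, which this sketch does not do.

```python
import numpy as np

def pca(D, n_components):
    """Return scores, loadings, and explained-variance fractions of the leading PCs."""
    Dc = D - D.mean(axis=0)                    # center each descriptor
    U, s, Vt = np.linalg.svd(Dc, full_matrices=False)
    loadings = Vt[:n_components]               # row k holds the coefficients a_{k,1..n}
    scores = Dc @ loadings.T                   # PC values for each molecule
    explained = s ** 2 / np.sum(s ** 2)        # fraction of total variance per PC
    return scores, loadings, explained[:n_components]

rng = np.random.default_rng(3)
D = rng.normal(size=(40, 5))
D[:, 1] = D[:, 0] + 0.1 * rng.normal(size=40)  # make two descriptors nearly redundant

scores, loadings, explained = pca(D, n_components=2)
print(explained)
```

The singular values come back in decreasing order, so keeping the first rows of `Vt` keeps the components with the largest variation.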

49 Subjective Feature Selection
The aim is to reach the optimal model:
1- Search of all possible models (best MLR)
2- Forward, backward & stepwise methods
3- Genetic algorithm
4- Mutation and selection uncover models (MUSEUM)
5- Cluster significance analysis
6- Leaps & bounds regression

50 Feature Selection: ACS
Most existing feature selection algorithms consist of:
- A starting point in the feature space
- A search procedure
- An evaluation function
- A criterion for stopping the search

51 Feature Selection: ACS
Starting point in the feature space:
- no features
- all features
- random subset of features

52 Forward Selection
1- Variables are sequentially entered into the model. The first variable considered for entry into the equation is the one with the largest positive or negative correlation with the dependent variable. This variable is entered into the equation only if it satisfies the criterion for entry.
2- If the first variable is entered, the independent variable not in the equation that has the largest partial correlation is considered next.
3- The procedure stops when there are no variables that meet the entry criterion.
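The procedure can be sketched as follows. Two assumptions are made here that the slide does not specify: the entry criterion is a partial F statistic compared against a fixed threshold (`f_to_enter`, set to 4.0 for the example), and the candidate with the largest partial correlation is found as the one that lowers the residual sum of squares the most, which is equivalent. The data are synthetic.

```python
import numpy as np

def rss(D, y, cols):
    """Residual sum of squares of an MLR model (with intercept) on the given columns."""
    A = np.column_stack([np.ones(len(y))] + [D[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)

def forward_selection(D, y, f_to_enter=4.0):
    n, p = D.shape
    selected = []
    current = rss(D, y, selected)               # intercept-only model
    while len(selected) < min(p, n - 2):
        candidates = [j for j in range(p) if j not in selected]
        # Best candidate: the one giving the lowest RSS when added.
        best = min(candidates, key=lambda j: rss(D, y, selected + [j]))
        new = rss(D, y, selected + [best])
        dof = n - len(selected) - 2              # residual d.o.f. after adding `best`
        f_stat = (current - new) / (new / dof)   # partial F for the candidate
        if f_stat < f_to_enter:
            break                                # entry criterion not met: stop
        selected.append(best)
        current = new
    return selected

rng = np.random.default_rng(7)
D = rng.normal(size=(60, 5))
y = 3.0 * D[:, 0] - 2.0 * D[:, 2] + rng.normal(scale=0.1, size=60)
selected = forward_selection(D, y)
print(selected)
```

In this synthetic case only columns 0 and 2 carry signal, so they are entered first; whether a noise column also passes the threshold depends on the random draw.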

53 Forward Selection example

54 Backward Elimination
1- All variables are entered into the equation and then sequentially removed.
2- The variable with the smallest partial correlation with the dependent variable is considered first for removal. If it meets the criterion for elimination, it is removed.
3- After the first variable is removed, the variable remaining in the equation with the smallest partial correlation is considered next.
4- The procedure stops when there are no variables in the equation that satisfy the removal criteria.
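A matching sketch for backward elimination is given below. As an assumption for the example, the removal criterion is a partial F statistic below a fixed threshold (`f_to_remove`, 4.0 here); the variable with the smallest partial F is also the one with the smallest partial correlation. The sketch needs more molecules than descriptors plus one, and the data are synthetic.

```python
import numpy as np

def rss(D, y, cols):
    """Residual sum of squares of an MLR model (with intercept) on the given columns."""
    A = np.column_stack([np.ones(len(y))] + [D[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)

def backward_elimination(D, y, f_to_remove=4.0):
    n, p = D.shape
    selected = list(range(p))                   # start with all variables
    while selected:
        full = rss(D, y, selected)
        dof = n - len(selected) - 1             # residual d.o.f. of the current model

        def partial_f(j):
            # Increase in RSS when variable j is dropped, scaled by residual variance.
            reduced = [k for k in selected if k != j]
            return (rss(D, y, reduced) - full) / (full / dof)

        weakest = min(selected, key=partial_f)
        if partial_f(weakest) >= f_to_remove:
            break                               # nothing satisfies the removal criterion
        selected.remove(weakest)
    return selected

rng = np.random.default_rng(7)
D = rng.normal(size=(60, 5))
y = 3.0 * D[:, 0] - 2.0 * D[:, 2] + rng.normal(scale=0.1, size=60)
remaining = backward_elimination(D, y)
print(remaining)
```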

55 Stepwise
At each step, the independent variable not in the equation that has the smallest probability of F is entered, if that probability is sufficiently small. Variables already in the regression equation are removed if their probability of F becomes sufficiently large. The method terminates when no more variables are eligible for inclusion or removal.

56 Stepwise Example

57 Forward, Backward & Stepwise variable selection methods
Advantages:
- Fast and simple
- Can be done with most software packages
Limitation:
- Risk of local minima

58 Genetic Algorithm

59 Search Space

60 Definition
A genetic algorithm is a general-purpose search and optimization method, based on genetic principles and Darwin’s law, that is applicable to a wide variety of problems.

61 Darwin’s rules
- Survival of the fittest individuals
- Recombination
- Mutation

62 Biological background
- Chromosome
- Gene
- Reproduction
- Mutation
- Fitness

63 GA basic operation
1- Population generation (chromosomes)
2- Selection (according to fitness)
3- Recombination and mutation (offspring)
4- Repetition

64 GA flow chart
- Initialize: population generation
- Evaluate: compute fitness for each chromosome
- Exploit: perform natural selection
- Explore: recombination & mutation operations

65 Binary Encoding
Every chromosome is a string of bits, 0 or 1.
Examples: Chromosome A, Chromosome B

66 Selection
The best chromosomes should survive and create new offspring.
- Roulette wheel selection
- Rank selection
- Steady state selection

67 Roulette wheel selection
Fitness: 1 > 2 > 3 > 4
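Roulette wheel selection gives each chromosome a slice of the wheel proportional to its fitness, so fitter chromosomes are picked more often. The sketch below assumes non-negative fitness values; the four chromosome names and their fitness values (4, 3, 2, 1) are made up to match the ordering 1 > 2 > 3 > 4.

```python
import random

def roulette_wheel_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    pick = rng.uniform(0, total)       # where the wheel stops
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit                 # end of this individual's slice
        if pick <= running:
            return individual
    return population[-1]              # guard against floating-point round-off

rng = random.Random(0)
population = ["chrom1", "chrom2", "chrom3", "chrom4"]
fitnesses = [4.0, 3.0, 2.0, 1.0]       # illustrative values only

counts = {name: 0 for name in population}
for _ in range(10000):
    counts[roulette_wheel_select(population, fitnesses, rng)] += 1
print(counts)
```

Over many spins the selection counts follow the fitness ordering, while the weakest chromosome still keeps a small chance of being selected.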

68 Crossover (binary encoding)
- Single-point crossover
- Two-point crossover

69 Mutation
- Bit inversion (binary encoding)
- Ordering change (permutation encoding)
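The bit strings on the crossover and mutation slides are not preserved in the transcript, so the sketch below uses its own small examples. The function names and the chosen cut points and positions are illustrative assumptions.

```python
def single_point_crossover(a, b, point):
    """Swap the tails of two parents after one cut point."""
    return a[:point] + b[point:], b[:point] + a[point:]

def two_point_crossover(a, b, p1, p2):
    """Swap the segment between two cut points."""
    return a[:p1] + b[p1:p2] + a[p2:], b[:p1] + a[p1:p2] + b[p2:]

def bit_inversion(chrom, index):
    """Binary encoding: flip the bit at one position."""
    flipped = "1" if chrom[index] == "0" else "0"
    return chrom[:index] + flipped + chrom[index + 1:]

def order_change(perm, i, j):
    """Permutation encoding: exchange the elements at two positions."""
    out = list(perm)
    out[i], out[j] = out[j], out[i]
    return out

parent_a, parent_b = "11111111", "00000000"
child_single = single_point_crossover(parent_a, parent_b, 3)
child_double = two_point_crossover(parent_a, parent_b, 2, 5)
mutated_bits = bit_inversion("1100", 1)
mutated_perm = order_change([1, 2, 3, 4, 5], 1, 3)
print(child_single, child_double, mutated_bits, mutated_perm)
```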

70 GA flow chart
Start; Population generation; Fitness; Selection; Replace; Crossover; Mutation; Test; End
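Put together, the flow chart can be sketched as a small genetic algorithm for descriptor selection, where each chromosome is a bit mask over the descriptors. Several choices below are assumptions made for the illustration and are not taken from the slides: fitness is the adjusted R² of an MLR model, selection is a binary tournament (swapped in for the roulette wheel so that negative fitness values cause no trouble), the best chromosome is always carried over (elitism), and the stopping test is a fixed number of generations. The data are synthetic.

```python
import random
import numpy as np

def fitness(mask, D, y):
    """Adjusted R^2 of an MLR model built from the descriptors flagged in mask."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return -1.0                                 # empty model: worst score
    n = len(y)
    A = np.column_stack([np.ones(n), D[:, cols]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - float(resid @ resid) / float(np.sum((y - y.mean()) ** 2))
    return 1.0 - (1.0 - r2) * (n - 1) / (n - len(cols) - 1)

def run_ga(D, y, pop_size=20, generations=30,
           crossover_rate=0.8, mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    n_bits = D.shape[1]
    # Population generation: random bit masks.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=lambda m: fitness(m, D, y))
    for _ in range(generations):                    # test: fixed generation count
        scores = [fitness(m, D, y) for m in pop]    # fitness evaluation

        def pick():                                 # selection: binary tournament
            i, j = rng.randrange(pop_size), rng.randrange(pop_size)
            return pop[i] if scores[i] >= scores[j] else pop[j]

        new_pop = [best[:]]                         # elitism: keep the best so far
        while len(new_pop) < pop_size:
            a, b = pick()[:], pick()[:]
            if rng.random() < crossover_rate:       # single-point crossover
                cut = rng.randrange(1, n_bits)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for child in (a, b):
                for k in range(n_bits):             # bit-inversion mutation
                    if rng.random() < mutation_rate:
                        child[k] = 1 - child[k]
                new_pop.append(child)
        pop = new_pop[:pop_size]                    # replace the old population
        best = max(pop + [best], key=lambda m: fitness(m, D, y))
    return best

rng = np.random.default_rng(5)
D = rng.normal(size=(40, 8))
y = 2.0 * D[:, 1] - 3.0 * D[:, 4] + rng.normal(scale=0.1, size=40)
best = run_ga(D, y)
print(best, fitness(best, D, y))
```

With eight descriptors an exhaustive search would be just as easy; the GA becomes useful when the number of descriptors makes that impossible.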

71 Parameters of GA
- Crossover rate
- Mutation rate
- Population size
- Selection type
- Encoding
- Crossover and mutation type

72 Advantages of GA
- Parallelism
- Provides a group of potential solutions
- Easy to implement
- Can find global optima (although this is not guaranteed)

73 How many descriptors can be used in a QSAR model?
Rule of thumb: there must be at least 5 data points (molecules) per descriptor in the model. Otherwise, the possibility of finding a coincidental correlation is too high.
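The risk of coincidental correlation is easy to demonstrate with purely random numbers. In the sketch below both the "activity" and the "descriptors" are random noise, so any fit is coincidental by construction; the sizes (20 molecules, up to 19 descriptors) are arbitrary choices for the illustration.

```python
import numpy as np

def r_squared(D, y):
    """R^2 of an MLR model (with intercept) fitted to all columns of D."""
    A = np.column_stack([np.ones(len(y)), D])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - float(resid @ resid) / float(np.sum((y - y.mean()) ** 2))

rng = np.random.default_rng(42)
n_molecules = 20
y = rng.normal(size=n_molecules)           # random "activity": no real signal
D = rng.normal(size=(n_molecules, 19))     # random "descriptors"

# Rule of thumb: at least 5 molecules per descriptor.
max_descriptors = n_molecules // 5

# R^2 on the training data as more random descriptors are added.
sizes = (1, 4, 10, 19)
r2_by_size = [r_squared(D[:, :k], y) for k in sizes]
print(max_descriptors, r2_by_size)
```

The training R² can only go up as descriptors are added, and with 19 descriptors plus an intercept the 20 random data points are fitted exactly, even though there is nothing to model.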

77 Questions?

