X Y The significance of the structure of data on PLS predictions of protein involving both natural and human experimental design Åsmund Rinnan Lars Munck.



2 Three data sets of barley: A Natural, B Simulated, C DoE (figure labels: 31, 54). B + C: the major substances protein, starch, cellulose, beta-glucan, fat and water are weighted to represent biological composition. All samples were measured on an NIR 6500 from 1100 to 2498 nm at 2 nm intervals. Legend: Normal barley, Protein mutants, Carbohydrate mutants.
Rinnan Dataset Preprocessing PCA iPCA PLS Biology PLS - again Summary Munck Permutation Mutants Diff spec Data structure Genetics Conclusion

3 Pre-processing of spectra: moving-window SNV with a 130 nm window. The 1580-2498 nm spectral region shows the smallest differences between the three data sets.
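A minimal NumPy sketch of moving-window SNV, assuming the window is centered on each wavelength and simply truncated at the spectrum ends (the slide gives only the 130 nm width, so those details are assumptions). Each value is standardized with the mean and standard deviation of its own spectrum inside the window, which removes local offset and scale effects:

```python
import numpy as np

def moving_window_snv(X, wavelengths, window_nm=130.0):
    """Moving-window SNV (sketch).

    For every wavelength, each spectrum is centered and scaled with the mean
    and standard deviation of its own values inside a window centered on that
    wavelength. Assumption: the window is truncated at the spectrum ends.

    X           -- (n_samples, n_wavelengths) absorbance matrix
    wavelengths -- (n_wavelengths,) wavelengths in nm
    """
    X = np.asarray(X, dtype=float)
    wl = np.asarray(wavelengths, dtype=float)
    out = np.empty_like(X)
    for j, center in enumerate(wl):
        in_window = np.abs(wl - center) <= window_nm / 2.0
        segment = X[:, in_window]
        mean = segment.mean(axis=1)
        std = segment.std(axis=1, ddof=1)
        out[:, j] = (X[:, j] - mean) / std
    return out

# Hypothetical toy spectra on the instrument grid: 1100-2498 nm, 2 nm steps
wavelengths = np.arange(1100, 2500, 2)
rng = np.random.default_rng(0)
X = rng.normal(size=(5, wavelengths.size)).cumsum(axis=1)
X_snv = moving_window_snv(X, wavelengths)
```

Because every value is centered and scaled within its own window, adding a constant to a spectrum or multiplying it by a positive factor leaves the corrected spectrum unchanged.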

4 PCA, 1100-2500 nm

5 Interval PCA selects 1804-2060 nm as the interval giving the smallest differences between the data sets.
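The slides do not define how interval PCA measured the "differences between data sets". The sketch below assumes a fixed-width interval (256 nm, the width of 1804-2060 nm) slid along the spectrum in 16 nm steps, a PCA on each interval, and a hypothetical separation index: the mean distance of each data set's score centroid from the overall mean, relative to the RMS score norm. The interval with the lowest index is where the data sets overlap most. The toy data are hypothetical:

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Scores of a PCA computed by SVD of the column-centered matrix."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components]

def interval_pca(X, groups, wavelengths, width_nm=256, step_nm=16, n_components=2):
    """Slide a fixed-width interval along the spectrum, run PCA on each
    interval and measure how strongly the groups separate in score space.

    Hypothetical separation index: mean distance of each group's score
    centroid from the origin (the overall mean), divided by the RMS score norm.
    Returns a list of (start_nm, end_nm, separation), one per interval.
    """
    results = []
    for start in range(int(wavelengths[0]), int(wavelengths[-1]) - width_nm + 1, step_nm):
        mask = (wavelengths >= start) & (wavelengths < start + width_nm)
        T = pca_scores(X[:, mask], n_components)
        spread = np.sqrt((T ** 2).sum(axis=1).mean())
        centroids = [T[groups == g].mean(axis=0) for g in np.unique(groups)]
        separation = np.mean([np.linalg.norm(c) for c in centroids]) / spread
        results.append((start, start + width_nm, separation))
    return results

# Toy data: three "data sets" that differ everywhere except 1800-2100 nm
rng = np.random.default_rng(1)
wavelengths = np.arange(1100, 2500, 2)
groups = np.repeat([0, 1, 2], 10)
differs = ~((wavelengths >= 1800) & (wavelengths <= 2100))
X = rng.normal(size=(30, wavelengths.size)) + 10.0 * groups[:, None] * differs
best = min(interval_pca(X, groups, wavelengths), key=lambda r: r[2])
print(best[:2])
```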

6 Predicting protein using the three data sets

                Nat     Sim     DoE
  RMSE          0.71    1.08    0.69
  r²            0.9     0.84    0.96
  nLV           5       2       5
  intercept     1.09    2.12    0.48
  slope         0.93    0.86    0.97

Figure: regression coefficients
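For readers who want to compute statistics of this kind, here is a minimal PLS1 (NIPALS) sketch in NumPy together with the figures reported in the table. It assumes slope and intercept come from a straight line fitted to predicted versus reference values, and it evaluates calibration predictions; the slides do not say whether the reported values come from cross-validation or a test set:

```python
import numpy as np

def pls1_fit(X, y, n_lv):
    """PLS1 regression via NIPALS.

    Returns (coef, intercept) so that y_hat = X @ coef + intercept.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean          # centered, then deflated
    W, P, Q = [], [], []
    for _ in range(n_lv):
        w = Xr.T @ yr
        w = w / np.linalg.norm(w)            # weight vector
        t = Xr @ w                           # scores
        p = Xr.T @ t / (t @ t)               # X loadings
        q = (yr @ t) / (t @ t)               # y loading
        Xr = Xr - np.outer(t, p)             # deflate X
        yr = yr - q * t                      # deflate y
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)   # b = W (P'W)^-1 q
    return coef, y_mean - x_mean @ coef

def prediction_stats(y_ref, y_pred):
    """RMSE, r^2 and the slope/intercept of predicted vs. reference values."""
    rmse = np.sqrt(np.mean((y_pred - y_ref) ** 2))
    r2 = np.corrcoef(y_ref, y_pred)[0, 1] ** 2
    slope, intercept = np.polyfit(y_ref, y_pred, 1)
    return {"RMSE": rmse, "r2": r2, "slope": slope, "intercept": intercept}

# Hypothetical noise-free toy data: with as many LVs as variables, PLS1
# reproduces ordinary least squares and fits exactly.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 10.0
coef, intercept = pls1_fit(X, y, n_lv=5)
stats = prediction_stats(y, X @ coef + intercept)
```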

7 PLS diagnostics (to protein): A. simple correlation coefficients of wavelength absorption to protein content; B. PLS regression coefficients. Panels: Natural, Simulated, DoE.
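Panel A's correlation spectrum is the Pearson correlation between absorbance at each wavelength and protein content. A short NumPy sketch, with hypothetical toy data in which only column 3 carries the signal:

```python
import numpy as np

def correlation_spectrum(X, y):
    """Pearson correlation between each column of X (one wavelength) and y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 8))
y = 2.0 * X[:, 3] + 0.1 * rng.normal(size=50)
r = correlation_spectrum(X, y)
```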

8 Isolating the chemical and biological components of the data sets. Figure: data sets A (Natural), B (Simulated) and C (DoE) (labels 31, 54), split into the components Chemistry, SimBiology and RestBiology, where
  SimBiology = B - C
  RestBiology = (A - C) - (B - C)
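Expanding the RestBiology definition shows that the DoE data cancel, so RestBiology reduces to the difference between the natural and simulated data. This assumes the subtraction is linear and applied to comparable quantities, which the slide does not specify. If, as the figure labels suggest but the slide does not state, Chemistry is C itself, the three components add back up to A:

```latex
\begin{aligned}
\text{RestBiology} &= (A - C) - (B - C) = A - B \\
\text{Chemistry} + \text{SimBiology} + \text{RestBiology} &= C + (B - C) + (A - B) = A
\end{aligned}
```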

9 Predicting protein by PLS: chemistry and non-simulated (rest) biology show high contributions, while that of simulated biology is low.

                Chemistry   SimBio   RestBio
  RMSE          0.94        2.53     1.31
  R²            0.87        0.13     0.76
  nLV           3           1        3
  intercept     1.58        12.9     3.15
  slope         0.90        0.17     0.80
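The intercept and slope rows can be cross-checked with a little arithmetic. If the line fitted to predicted versus reference protein passes through the point where both equal the mean protein content (true for a least-squares line when the predictions are unbiased on average, which is an assumption here), then intercept = mean × (1 - slope). Solving for the mean gives every model an implied mean protein content between roughly 15 and 16, which suggests the intercept and slope values in both tables are mutually consistent:

```python
# Reported (intercept, slope) pairs from the two protein tables
reported = {
    "Natural": (1.09, 0.93),
    "Simulated": (2.12, 0.86),
    "DoE": (0.48, 0.97),
    "Chemistry": (1.58, 0.90),
    "SimBio": (12.9, 0.17),
    "RestBio": (3.15, 0.80),
}

# Assumption: the fitted line passes through (mean, mean), so that
# intercept = mean * (1 - slope) and hence mean = intercept / (1 - slope).
implied_mean = {name: a / (1.0 - b) for name, (a, b) in reported.items()}

for name, mean in implied_mean.items():
    print(f"{name:<10} {mean:6.2f}")
```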

10 Normalized regression coefficients
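The slide does not say how the regression coefficients were normalized. One common choice, assumed here, is to scale each model's coefficient vector to a maximum absolute value of 1, so that shapes from models with very different magnitudes can be overlaid:

```python
import numpy as np

def normalize_coefficients(B):
    """Scale each column of B (one regression vector per model) so its
    largest absolute value is 1; signs and shapes are preserved."""
    B = np.asarray(B, dtype=float)
    return B / np.abs(B).max(axis=0, keepdims=True)

# Hypothetical coefficient vectors: the second is the first scaled by 10
b = np.array([0.5, -2.0, 1.0, 0.25])
B_norm = normalize_coefficients(np.column_stack([b, 10.0 * b]))
```

Scaling to unit Euclidean norm instead would serve the same purpose; the max-abs choice simply keeps the largest peak at ±1.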

11 Back to the data: selected wavelengths. Table labels (values not recoverable): Full PLS, Correlation-PLS, Wavelengths, abs. to protein, Assignment, PLS, Phil Williams.

12 Quick comparison

13 Results: summary

14 Interpretation: we are working by "Permutation science": 1. by mathematical validation of models → permutation of data in chemometrics, i.e. cross-validation.
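Mathematical permutation in the form of cross-validation can be sketched as segmented (k-fold) cross-validation of a PLS1 model, used to pick the number of latent variables. This is a generic NumPy sketch, not the authors' software; the five random segments, the compact NIPALS PLS1 and the toy data are assumptions:

```python
import numpy as np

def pls1_predictor(X, y, n_lv):
    """Fit PLS1 (NIPALS) and return a function that predicts y for new X."""
    xm, ym = X.mean(axis=0), y.mean()
    Xr, yr = X - xm, y - ym
    W, P, Q = [], [], []
    for _ in range(n_lv):
        w = Xr.T @ yr
        w = w / np.linalg.norm(w)
        t = Xr @ w
        p, q = Xr.T @ t / (t @ t), (yr @ t) / (t @ t)
        Xr, yr = Xr - np.outer(t, p), yr - q * t
        W.append(w); P.append(p); Q.append(q)
    W, P = np.array(W).T, np.array(P).T
    coef = W @ np.linalg.solve(P.T @ W, np.array(Q))
    return lambda X_new: (X_new - xm) @ coef + ym

def rmsecv(X, y, max_lv, n_segments=5, seed=0):
    """Segmented cross-validation: every sample is predicted once by a model
    calibrated without its segment. Returns RMSECV for 1..max_lv LVs."""
    # Random, balanced assignment of samples to segments
    segment = np.random.default_rng(seed).permutation(len(y)) % n_segments
    sq_err = np.zeros(max_lv)
    for k in range(n_segments):
        train, test = segment != k, segment == k
        for a in range(1, max_lv + 1):
            predict = pls1_predictor(X[train], y[train], a)
            sq_err[a - 1] += np.sum((predict(X[test]) - y[test]) ** 2)
    return np.sqrt(sq_err / len(y))

# Hypothetical toy data with unequal column scales
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10)) * np.linspace(0.2, 3.0, 10)
y = X @ rng.normal(size=10) + 0.01 * rng.normal(size=60)
errors = rmsecv(X, y, max_lv=10)
best_n_lv = int(np.argmin(errors)) + 1
```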

15 "Permutation science": 2. Design of Experiments (DoE) → permutation of data through experiments by human design.
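The slides do not describe the DoE that was used. As an illustration only, a {q, m} simplex-lattice is a standard mixture design in which all blend proportions are multiples of 1/m and sum to 1; here it is applied to the six substances listed on slide 2. A real design would normally constrain each substance to a realistic range, which this sketch does not:

```python
from fractions import Fraction
from itertools import combinations_with_replacement

# The six substances listed for the simulated/DoE sets (slide 2)
COMPONENTS = ["protein", "starch", "cellulose", "beta-glucan", "fat", "water"]

def simplex_lattice(n_components, m):
    """{q, m} simplex-lattice mixture design: every blend whose proportions
    are multiples of 1/m and sum to 1. Returns a list of tuples of Fractions."""
    design = []
    for combo in combinations_with_replacement(range(n_components), m):
        counts = [combo.count(i) for i in range(n_components)]
        design.append(tuple(Fraction(c, m) for c in counts))
    return design

design = simplex_lattice(len(COMPONENTS), m=2)
for blend in design[:3]:
    print({name: float(x) for name, x in zip(COMPONENTS, blend) if x})
```

For six components and m = 2 the design has C(7, 2) = 21 blends: six pure substances and fifteen 50:50 pairs.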

16 "Permutation science":
1. By mathematical validation of models → permutation of data in chemometrics, i.e. cross-validation.
2. Design of Experiments (DoE) → permutation of data through experiments by human design.
3. Natural design → permutation by selection of unique natural states where nature reveals its principles in data.
Question: in chemometrics, why not combine them all rather than focusing on mathematical permutation alone? All three permutation approaches are at the heart of chemometric validation of models. Why not use them together, as we have done here? They are complementary.

17 Principles of natural processes are reflected in data: the solar eclipse reveals solar eruptions*. The NIR barley endosperm mutant model, developed since 1965, with expression control of genetics and environment. Two types of mutants: regulatory protein mutants (P) and carbohydrate (starch) mutants (C); normal barley (N).
*) http://science.nationalgeographic.com/science/enlarge/solar-eclipse-moon.html
J. Chemometrics 24: 481-495 (2010)

18 How were the mutants found? By a bivariate plot of % protein against mmol DBC (dye-binding capacity with acilan orange). The dye-binding capacity (DBC) instrument measures basic amino acids (lysine). Background: development of screening methods for improving lysine content and nutritional quality in barley by LM at the nutritional laboratory of the Swedish Seed Association, Svalöf, in 1967. Figure: DBC vs. % protein, with groups High lysine, Mutation, Mutation recombinants and Normal recombinants.
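The bivariate screen can be sketched as flagging lines whose dye-binding capacity is unusually high for their protein content. The reference set of known normal lines, the straight-line fit, the threshold k = 5 and all data below are hypothetical choices for illustration, not the 1967 procedure:

```python
import numpy as np

def flag_high_dbc(protein, dbc, reference, k=5.0):
    """Flag samples whose dye-binding capacity is unusually high for their
    protein content.

    A straight line DBC = a + b * protein is fitted to the reference
    (known normal) samples only; a sample is flagged when its residual
    exceeds k residual standard deviations of the reference set.
    """
    protein = np.asarray(protein, dtype=float)
    dbc = np.asarray(dbc, dtype=float)
    b, a = np.polyfit(protein[reference], dbc[reference], 1)
    residual = dbc - (a + b * protein)
    sd = residual[reference].std(ddof=2)   # two fitted parameters
    return residual > k * sd

# Hypothetical toy data: 40 normal checks followed by 12 screened lines,
# two of which (indices 45 and 50) bind more dye than their protein predicts.
rng = np.random.default_rng(3)
protein = rng.uniform(9.0, 15.0, 52)
dbc = 0.2 * protein + rng.normal(0.0, 0.05, 52)
dbc[[45, 50]] += 0.6
is_check = np.arange(52) < 40
flags = flag_high_dbc(protein, dbc, is_check)
print(np.flatnonzero(flags))
```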

19 Selecting endosperm mutants (J. Chemometrics 24: 481-495 (2010)). Panels: No data; Vitamin E profile; A/P vs. β-glucan. Conclusion: each mutant produces a unique chemical fingerprint for each individual gene in a controlled genetic background (Bomi). The fingerprint is summarized at the level of chemical bonds by NIR spectroscopy. Cellular computation is soft, like a PCA. Any chemical (bi)plot can select any mutant.

20 Each mutant has a deterministic differential NIR spectrum relative to the genetic background Bomi, revealing a spectral absorption reproducibility as high as 10^-5 (MSC log 1/R) for the P mutant lys3.a (blue) and the C mutant lys5.g (brown).
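A differential spectrum of this kind can be sketched as MSC-correcting the mutant and Bomi spectra against a common reference and subtracting the mean Bomi spectrum. The choice of the Bomi mean as MSC reference and the toy spectra are assumptions for illustration:

```python
import numpy as np

def msc(X, reference):
    """Multiplicative scatter correction against a reference spectrum.

    Each spectrum x is regressed on the reference, x ~ a + b * ref, and
    replaced by (x - a) / b, which removes additive and multiplicative
    scatter effects relative to the reference.
    """
    X = np.asarray(X, dtype=float)
    corrected = np.empty_like(X)
    for i, x in enumerate(X):
        b, a = np.polyfit(reference, x, 1)
        corrected[i] = (x - a) / b
    return corrected

# Hypothetical toy spectra (log 1/R): three Bomi replicates that differ only
# by scatter, and one mutant with a small extra band near 2300 nm.
wl = np.arange(1100, 2500, 2)
base = np.exp(-((wl - 1940) / 60.0) ** 2) + 0.3 * np.exp(-((wl - 2100) / 80.0) ** 2)
band = 0.01 * np.exp(-((wl - 2300) / 30.0) ** 2)
bomi = np.array([0.05 + 1.10 * base, -0.02 + 0.90 * base, 0.01 + 1.00 * base])
mutant = 0.03 + 1.05 * (base + band)

reference = bomi.mean(axis=0)
bomi_msc = msc(bomi, reference)
mutant_msc = msc(mutant[None, :], reference)[0]
difference = mutant_msc - bomi_msc.mean(axis=0)   # differential spectrum
print(wl[np.argmax(np.abs(difference))])
```

Because the toy Bomi replicates differ only by offset and scale, MSC maps each of them exactly onto the reference, and the differential spectrum isolates the mutant's extra band.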

21 Data structure is superordinate to chemometric analysis. Figure: PCA with labels 3.2, 3c and 3a. The P mutants 3a and 3c are differentiated in this PCA. However, spectral differences in the 2450-2500 nm region represent a much more finely tuned and informative change: β-glucan increases from 3.1% in 3a to 6.4% in 3c.

22 How is the chemical composition of the cell decided? Through soft modeling of the intercellular dynamics of the whole cell by quantum and chemical cross-talk, as revealed by the movements of chromosomes at mitosis (left figure). Cell emergence is like music, directed by the whole chemical orchestra of the cell.

23 Biological macro data are basically deterministic, calculated in situ by "set probability" controlled by the whole cell. Holistic analysis is limited by uncertainty, specified as irreducibility ("top down") and indeterminacy ("bottom up"). The structure of data is the king that rules mathematical modeling through data inspection. Because of the determinism demonstrated here, the development of gentle data models (such as MSC) and of data-inspection software is essential to avoid a reduction of information. Chemometrics is excellent for overviews, but the results have to be checked by data inspection.

