The significance of the structure of data on PLS predictions of protein involving both natural and human experimental design
Åsmund Rinnan, Lars Munck

Three data sets of barley
A: Natural. B: Simulated. C: DoE (design of experiments).
B + C: The major substances protein, starch, cellulose, beta-glucan, fat and water are weighted to represent biological composition.
All measured on NIR 6500 from 1100 to 2498 nm at 2 nm intervals.
[Figure: data sets A, B and C; numbers shown: 31, 54; legend: normal barley, protein mutants, carbohydrate mutants]

Pre-processing of spectra
Moving-window SNV with a 130 nm window.
The 1580-2498 nm spectral area visualizes the least differences between the three data sets.
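A minimal sketch of a moving-window SNV, assuming the 130 nm window corresponds to 65 channels at 2 nm spacing, is centred on each wavelength, and is simply truncated at the ends of the spectrum. The function name and the edge handling are illustrative assumptions; the slides do not state how the window was implemented.

```python
import numpy as np

def moving_window_snv(spectra, window_nm=130.0, step_nm=2.0):
    """Centre and scale every point by the mean and std of its local window."""
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    half = int(round(window_nm / step_nm)) // 2    # channels on each side
    n_channels = spectra.shape[1]
    out = np.empty_like(spectra)
    for j in range(n_channels):
        lo = max(0, j - half)                      # assumption: truncate at edges
        hi = min(n_channels, j + half + 1)
        local = spectra[:, lo:hi]
        mean = local.mean(axis=1)
        std = local.std(axis=1, ddof=1)
        std = np.where(std == 0, 1.0, std)         # guard against flat windows
        out[:, j] = (spectra[:, j] - mean) / std
    return out

# Wavelength axis as described on the slide: 1100 to 2498 nm in 2 nm steps.
wavelengths = np.arange(1100, 2500, 2)
```

Because each point is scaled by its own local statistics, an additive offset or a multiplicative scaling of a whole spectrum leaves the result unchanged.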

PCA, 1100-2500 nm

Interval PCA selects 1804-2060 nm, giving the least differences between data sets.
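The idea of interval PCA can be sketched as a separate PCA on each block of adjacent wavelengths. The version below assumes equal-width intervals and mean-centred data; the number of intervals, the function name and the returned fields are hypothetical. The criterion used on the slide to pick 1804-2060 nm is not given, so no selection step is included.

```python
import numpy as np

def interval_pca(X, wavelengths, n_intervals=5, n_components=2):
    """Run a separate PCA (via SVD) on each contiguous wavelength interval."""
    X = np.asarray(X, dtype=float)
    results = []
    for idx in np.array_split(np.arange(X.shape[1]), n_intervals):
        block = X[:, idx]
        block = block - block.mean(axis=0)          # mean-centre the interval
        U, s, Vt = np.linalg.svd(block, full_matrices=False)
        k = min(n_components, len(s))
        results.append({
            "range_nm": (int(wavelengths[idx[0]]), int(wavelengths[idx[-1]])),
            "scores": U[:, :k] * s[:k],             # sample scores per component
            "explained": s[:k] ** 2 / np.sum(s ** 2),
        })
    return results
```

The score plots of the intervals can then be compared side by side to see where the data sets overlap most.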

Predicting protein using the three data sets

            Nat    Sim    DoE
RMSE        0.71   1.08   0.69
r2          0.9    0.84   0.96
nLV         5      2      5
intercept   1.09   2.12   0.48
slope       0.93   0.86   0.97

[Figure: regression coefficients]
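The summary statistics in this kind of table can be computed from measured and predicted values as sketched below. This assumes r2 is the squared Pearson correlation and that slope and intercept come from regressing predicted on measured values; the slides do not say which definitions were used, and the function name is illustrative.

```python
import numpy as np

def prediction_summary(y_measured, y_predicted):
    """RMSE, squared correlation, and the predicted-vs-measured regression line."""
    y = np.asarray(y_measured, dtype=float)
    yhat = np.asarray(y_predicted, dtype=float)
    rmse = float(np.sqrt(np.mean((yhat - y) ** 2)))
    r = np.corrcoef(y, yhat)[0, 1]
    slope, intercept = np.polyfit(y, yhat, 1)       # yhat ~ slope * y + intercept
    return {"RMSE": rmse, "r2": float(r ** 2),
            "slope": float(slope), "intercept": float(intercept)}
```

A slope below 1 together with a positive intercept, as in all three columns, indicates that predictions are compressed towards the mean.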

PLS diagnostics (to protein)
A. Simple correlation coefficients: wavelength absorption to protein content.
B. PLS regression coefficients.
[Figure panels: Natural, Simulated, DoE]
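Diagnostic A, the simple correlation of absorption at each wavelength with protein content, can be sketched as a column-wise Pearson correlation. The function name is illustrative, and channels with zero variance are assumed not to occur.

```python
import numpy as np

def correlation_spectrum(X, y):
    """Pearson correlation between each wavelength column of X and y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return (Xc.T @ yc) / denom                      # one coefficient per wavelength
```

Plotting this vector against wavelength gives a curve that can be laid over the PLS regression coefficients of diagnostic B.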

Isolating the chemical and biological components of the data sets
SimBiology = B – C
RestBiology = (A – C) – (B – C)
[Figure: data sets A (Natural), B (Simulated) and C (DoE); numbers shown: 31, 54; components: Chemistry, SimBiology, RestBiology]
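Taken purely as algebra, and assuming the differences are formed between comparable quantities, the two definitions simplify as follows: the DoE term C cancels, so RestBiology is the part of the natural data not captured by the simulated data.

```latex
\begin{aligned}
\text{SimBiology}  &= B - C \\
\text{RestBiology} &= (A - C) - (B - C) = A - B
\end{aligned}
```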

Predicting protein by PLS: Chemistry and non-simulated (rest) biology show high contributions, while that of simulated biology is low.

            Chemistry   SimBio   RestBio
RMSE        0.94        2.53     1.31
R2          0.87        0.13     0.76
nLV         3           1        3
intercept   1.58        12.9     3.15
slope       0.90        0.17     0.80

Normalized regression coefficients

Back to data, selected wavelengths
[Table labels: Full PLS; Correlation-PLS; Wavelengths abs to protein; Assignment PLS; Phil Williams]

Quick comparison

Results: Summary

Interpretation: We are working by "permutation science":
1. By mathematical validation of models: permutation of data in chemometrics, i.e. cross-validation.

"Permutation science":
2. Design of Experiments (DoE): permutation of data through experiments by human design.

"Permutation science":
1. By mathematical validation of models: permutation of data in chemometrics, i.e. cross-validation.
2. Design of Experiments (DoE): permutation of data through experiments by human design.
3. Natural design: permutation by selection of unique natural states where nature reveals its principles in data.
Question: In chemometrics, why not combine them all rather than focusing on mathematical permutation alone?
All three permutation approaches are at the heart of chemometric validation of models. Why not use them together, as we have done here? They are complementary.
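The first, mathematical kind of permutation can be sketched as leave-one-out cross-validation of a PLS model. The sketch below uses a textbook PLS1 (NIPALS) fit written in NumPy; it is an illustration under that assumption, not the software or the validation scheme behind the results on these slides, and all function names are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Textbook PLS1 (NIPALS); returns regression vector and intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean                   # centred residual blocks
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)                      # weight vector
        t = E @ w                                   # scores
        tt = t @ t
        p = E.T @ t / tt                            # X loadings
        c = f @ t / tt                              # y loading
        E = E - np.outer(t, p)                      # deflate X
        f = f - c * t                               # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)
    return b, y_mean - x_mean @ b

def rmsecv(X, y, n_components):
    """Leave-one-out cross-validated RMSE: each sample is predicted once."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                    # leave sample i out
        b, b0 = pls1_fit(X[keep], y[keep], n_components)
        preds[i] = X[i] @ b + b0
    return float(np.sqrt(np.mean((preds - y) ** 2)))
```

Running `rmsecv` for an increasing number of components and picking the minimum is one common way to choose nLV.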

Principles of natural processes are reflected in data
The solar eclipse reveals solar eruptions. *)
The NIR barley endosperm mutant model, developed since 1965 with expression control of genetics and environment.
Two types of mutants: regulative protein mutants (P) and carbohydrate (starch) mutants (C); normal barley is N.
*) http://science.nationalgeographic.com/science/enlarge/solar-eclipse-moon.html
J. Chemometrics 24: 481-495 (2010)

How were the mutants found?
By a bivariate plot of % protein against mmol DBC (dye-binding capacity by acilane orange).
The dye-binding capacity (DBC) instrument for basic amino acids (lysine).
Background: Development of screening methods for improving lysine and nutritional quality in barley by LM at the nutritional laboratory of the Swedish Seed Association, Svalöf, in 1967.
[Figure: DBC vs. % protein; labels: high lysine, mutation, mutation recombinants, normal recombinants]

Selecting endosperm mutants
Any chemical (bi-)plot can select any mutant.
[Figure panels: No data; Vitamin E profile; A/P vs. β-glucan]
Conclusion: Each mutant produces a unique chemical fingerprint for each individual gene in a controlled genetic background (Bomi). The fingerprint is summarized at the level of chemical bonds by NIR spectroscopy. Cellular computation is soft, like a PCA.
J. Chemometrics 24: 481-495 (2010)

There are deterministic differential NIR spectra for each mutant relative to the gene background Bomi, revealing a spectral absorption reproducibility as high as 10^-5 MSC log 1/R for the P mutant lys3.a (blue) and the C mutant lys5.g (brown).
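A minimal sketch of how such a differential spectrum could be formed, assuming standard MSC (each spectrum regressed on a reference spectrum, then offset and slope removed) followed by subtraction of the mean background spectrum from the mean mutant spectrum. The function names and the use of means are assumptions; the slides do not describe the exact procedure.

```python
import numpy as np

def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference spectrum."""
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    if reference is None:
        ref = spectra.mean(axis=0)                  # default: mean spectrum
    else:
        ref = np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, x in enumerate(spectra):
        slope, offset = np.polyfit(ref, x, 1)       # x ~ slope * ref + offset
        corrected[i] = (x - offset) / slope
    return corrected

def difference_spectrum(mutant_spectra, background_spectra):
    """Mean mutant spectrum minus mean background (e.g. Bomi) spectrum."""
    mutant = np.atleast_2d(np.asarray(mutant_spectra, dtype=float))
    background = np.atleast_2d(np.asarray(background_spectra, dtype=float))
    return mutant.mean(axis=0) - background.mean(axis=0)
```

Both groups would be corrected against the same reference before subtraction, so that the difference reflects composition rather than scatter.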

Data structure is super-ordinate to chemometric analysis
The 3a and 3c P mutants are differentiated in this PCA. However, spectral differences in the area 2450-2500 nm represent a much more finely tuned and informative change in β-glucan, from 3.1% in 3a to 6.4% in 3c.
[Figure labels: 3.2, 3c, 3a]

How is the chemical composition of the cell decided?
Through soft modeling of intercellular dynamics of the whole cell by quantum and chemical cross-talk, as revealed by the movements of chromosomes at mitosis.
Cell emergence is like music, as directed by the whole chemical orchestra of the cell.

Biological macro data are basically deterministic, calculated in situ by "set probability" controlled by the whole cell.
Holistic analysis is limited by uncertainty, specified as irreducibility "top down" and indeterminacy "bottom up".
The structure of data is the king that rules mathematical modeling by data inspection.
Because of the determinism demonstrated here, the development of gentle data models (such as MSC) and of data inspection software is of essential importance in avoiding a reduction of information.
Chemometrics is excellent for overviews, but the results have to be checked by data inspection.
