Published by Lesley York. Modified over 9 years ago.
1
Hard versus Soft Science: Studies in Biometrics and Psychometrics
Peter H. Westfall, Horn Professor of Statistics, Dept. of ISQS
2
Goals of this Talk
Characterize “hard” and “soft” science
–Biometrics
–Psychometrics
–Medicine
Differences concern
–Measurement
–Models
–Action orientation
Describe pitfalls
Recommendations
3
Hard and Soft Measurements
Hard(er) endpoints
–Patient genotype
–Patient bilirubin level
Soft(er) endpoints
–Patient-reported pain level
–Patient-reported quality of life
4
Characterizations
Hard endpoints
–Meaningful units (e.g., g/L)
–Reliable
–Accurate
Soft endpoints
–Units not as meaningful (e.g., 1-5 Likert scale)
–Less reliable
–Accurate?
5
Measurement Scales
Hard Science: Measurement: 23.2 grams
Soft Science: What do you think? Disapprove 1 2 3 4 5 Approve. Measurement: “I dunno, …, uh, 4?”
6
A “Hard Science” Model
[Diagram: Genotype, with arrows to Phenotype 1, Phenotype 2, Phenotype 3]
7
Data for “Hard” Science Model
8
A “Soft Science” Model
[Diagram: “Intelligence”, with arrows to Test 1, Test 2, Test 3]
9
Data for “Soft” Science Model
10
What is “Intelligence”?
“An intelligent person is one who scores high on tests.”
–Circular: defined in terms of test scores, and yet also used to predict test scores.
–The usual psychometric model simply assumes that there is a number, “intelligence”, existing in each individual person (like a genotype).
–It assumes all people in the universe are perfectly ordered by their “intelligence.”
–This is hogwash.
11
Assumed Psychometric Data These numbers are assumed to exist! People are perfectly ordered by them. This is hogwash!
12
SEM (Structural Equations Model)
13
The Utility of Better Models To bring the data into sharper focus:
14
Clearer focus with SEM model:
15
When is a Model Good?
Property 1: A good model is one whose predictions (what comes out of the black box) match reality well.
Property 2: A good model is one whose constructs (what is inside the black box) match reality well.
Property 3: A good model is one that has prescriptive utility.
16
Property 1: Outputs
Both models predict data that “looks like” the data we see:
–The SEM model predicts generally high test scores for a person with “high intelligence.”
–The genotype/phenotype model predicts certain physical characteristics for people sharing a common genotype.
17
Property 2: Model Constructs
The SEM model's latent constructs are not real, so that model fails on this count.
The genotype/phenotype constructs are real, and the directional arrows have clear biological justification (genes code for proteins that perform biological functions).
18
Property 3: Prescription
Prescriptive use of the SEM model:
–Since latent factors do not exist, we cannot use the model prescriptively.
–But the model is often used for scoring, and scores might be used prescriptively.
Prescriptive uses of the genotype/phenotype model:
–Counseling
–Saving lives
19
Is Psychometric Score Construction Helpful?
[Flow diagram, three steps in order:]
Many variables (multiple variables X1, X2, …, X20)
Psychometric score construction (Cronbach’s alpha, SEM, discriminant and convergent validity; S = X1 + X3 + X17)
Use score in future analysis (classification, prediction)
20
Example 1: Arthritis Pain Measurement
Ask subjects to rate pain in feet, knees, shoulders, and hands, each in the morning, at midday, and at night.
Psychometric score: “Advancement of Arthritic Condition” (essentially a sum of all measures).
If used to evaluate a knee therapy, this score dilutes the knee-specific effect with measures the therapy does not target; it will waste the company’s money and delay the progress of science.
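A back-of-envelope sketch of the dilution argument, not part of the original talk. It assumes 12 items (4 body sites at 3 times of day), independent item scores with a common standard deviation, and a therapy that shifts only the 3 knee items. The function name and the values 0.5 and 1.0 are illustrative assumptions; real pain items are correlated, so the exact factor will differ.

```python
import math


def standardized_effect(n_items, n_affected, delta, sigma):
    """Therapy effect on a summed score, in standard-deviation units.

    Assumes independent items with common SD `sigma`, and a therapy that
    shifts `n_affected` of the `n_items` summed items by `delta` each.
    """
    mean_shift = n_affected * delta
    sd_of_sum = sigma * math.sqrt(n_items)
    return mean_shift / sd_of_sum


# Illustrative numbers only: delta = 0.5 scale points, sigma = 1.0.
effect_all_items = standardized_effect(12, 3, 0.5, 1.0)  # sum of all 12 measures
effect_knee_only = standardized_effect(3, 3, 0.5, 1.0)   # sum of the 3 knee measures

# For a two-group comparison, the required sample size is roughly
# proportional to 1 / effect**2, so this ratio is the sample-size inflation.
inflation = (effect_knee_only / effect_all_items) ** 2
print(effect_all_items, effect_knee_only, inflation)
```

Under these assumptions the all-items score needs about four times as many subjects as the knee-only score to detect the same therapy effect.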
21
Example 2: The Essence of Turtle
Measurements: Log(Length), Log(Width), Log(Height)
Reliability of T = Log(Length) + Log(Width) + Log(Height) as a measure of the “essence of turtle”:
–Males: Cronbach’s Alpha = 0.97
–Females: Cronbach’s Alpha = 0.98
Exceptional! Alpha > 0.70 is often considered “acceptable”.
T is the score we should use in further analysis!
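For readers who have not computed it, Cronbach's alpha for k items is k/(k-1) times (1 minus the sum of the item variances divided by the variance of the summed score). The sketch below is a minimal pure-Python version run on synthetic data. The data are an assumption standing in for three log-size measurements that share an overall "size" term; they are not the turtle data from the talk.

```python
import random
import statistics


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of subjects, each a list of k item scores."""
    k = len(rows[0])
    # Sample variance of each item (column) across subjects.
    item_vars = [statistics.variance([r[j] for r in rows]) for j in range(k)]
    # Sample variance of the summed score across subjects.
    total_var = statistics.variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Synthetic stand-in (NOT the turtle data): each of 3 items is a shared
# "overall size" term plus a small amount of item-specific noise.
rng = random.Random(0)
rows = []
for _ in range(200):
    size = rng.gauss(0.0, 1.0)
    rows.append([size + rng.gauss(0.0, 0.2) for _ in range(3)])

alpha = cronbach_alpha(rows)
print(round(alpha, 3))
```

Any set of measurements dominated by one shared source of variation, such as overall body size, will produce a high alpha; the statistic says nothing about whether the sum is the right score for a given scientific question.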
22
Example 2 Continued:
Despite its high reliability, T is improper for characterizing Female vs. Male turtles.
The best classifier is W = -2.42Log(Length) - 0.48Log(Width) + 3.74Log(Height). (Female turtles are shaped differently from Males.)
The psychometric scale impedes science.
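The point can be reproduced on synthetic data. In the sketch below, two hypothetical groups have the same overall size distribution but differ in relative height. The sum score barely separates them, while a shape contrast does. The data, the shift of 0.3, and the contrast weights are all assumptions for illustration; they are not the turtle data or the classifier W from the slide.

```python
import random
import statistics


def std_mean_diff(a, b):
    """Absolute difference in means divided by the pooled standard deviation."""
    pooled_var = (statistics.variance(a) + statistics.variance(b)) / 2
    return abs(statistics.mean(a) - statistics.mean(b)) / pooled_var ** 0.5


def simulate(n, height_shift, rng):
    """Return n synthetic subjects as [length, width, height] on a log scale."""
    rows = []
    for _ in range(n):
        size = rng.gauss(0.0, 1.0)  # shared overall size
        length = size + rng.gauss(0.0, 0.2)
        width = size + rng.gauss(0.0, 0.2)
        height = size + height_shift + rng.gauss(0.0, 0.2)
        rows.append([length, width, height])
    return rows


def sum_score(r):
    """Psychometric-style summed score."""
    return r[0] + r[1] + r[2]


def shape_score(r):
    """Hypothetical contrast: height relative to the mean of length and width."""
    return r[2] - (r[0] + r[1]) / 2


rng = random.Random(1)
group_a = simulate(300, 0.0, rng)  # hypothetical group A
group_b = simulate(300, 0.3, rng)  # hypothetical group B: relatively taller

d_sum = std_mean_diff([sum_score(r) for r in group_a],
                      [sum_score(r) for r in group_b])
d_shape = std_mean_diff([shape_score(r) for r in group_a],
                        [shape_score(r) for r in group_b])
print(f"sum score separation:   {d_sum:.2f}")
print(f"shape score separation: {d_shape:.2f}")
```

The contrast cancels the shared size term, which is exactly the variation that makes the sum score look reliable and that carries no information about group membership here.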
23
Example 3: Patient Condition
Measurements (Likert scale): Xi = condition at week i after start of treatment, i = 1, 2, 3, 4.
Psychometric scale: “Condition” = X1 + X2 + X3 + X4.
But this is inappropriate: “Improvement” = -1.5X1 - 0.5X2 + 0.5X3 + 1.5X4 is better.
The psychometric scale will cost the drug company more, delay approval, and possibly result in lives lost.
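The “Improvement” weights are the week numbers 1 to 4 centered at their mean of 2.5, so the score is proportional to the least-squares slope of condition on week (slope = Improvement / 5, since the centered weeks have sum of squares 5). The sketch below checks this on two hypothetical patients. The patient values are illustrative, and treating higher scores as better condition is an assumption.

```python
WEEKS = [1, 2, 3, 4]
IMPROVEMENT_WEIGHTS = [-1.5, -0.5, 0.5, 1.5]  # weights from the slide


def condition(x):
    """Summed score: X1 + X2 + X3 + X4."""
    return sum(x)


def improvement(x):
    """Contrast score: -1.5*X1 - 0.5*X2 + 0.5*X3 + 1.5*X4."""
    return sum(w * xi for w, xi in zip(IMPROVEMENT_WEIGHTS, x))


def ls_slope(x):
    """Least-squares slope of the weekly scores regressed on week number."""
    mean_week = sum(WEEKS) / len(WEEKS)
    mean_x = sum(x) / len(x)
    sxy = sum((t - mean_week) * (xi - mean_x) for t, xi in zip(WEEKS, x))
    sxx = sum((t - mean_week) ** 2 for t in WEEKS)
    return sxy / sxx


# Two hypothetical patients on a 1-5 Likert scale (illustrative values only,
# assuming higher = better condition).
improving = [1, 2, 4, 5]
worsening = [5, 4, 2, 1]

for name, x in [("improving", improving), ("worsening", worsening)]:
    print(name, condition(x), improvement(x), ls_slope(x))
```

Both patients get the same “Condition” score even though one is getting better and the other worse; only the contrast distinguishes the trajectories.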
24
Revised Score Construction Model
[Flow diagram, three steps in order:]
Many variables (multiple variables X1, X2, …, X20)
Pilot study or training sample (construct score using scientific relevance and statistical predictive ability; S = (X2 + X5) – (X7 + X9))
Use score in future analysis (classification, prediction)
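One possible reading of this workflow, sketched on synthetic data: choose signed variables from a pilot sample by their ability to separate two groups, then check the resulting score on a separate sample. The thresholding rule, the threshold of 0.5, the group labels, and the simulated effects are all assumptions made for illustration; the talk does not specify a selection procedure, and scientific relevance (which the slide also calls for) is not modeled here.

```python
import random
import statistics


def signed_effect(a, b):
    """Mean difference (b minus a) in pooled-SD units."""
    pooled_var = (statistics.variance(a) + statistics.variance(b)) / 2
    return (statistics.mean(b) - statistics.mean(a)) / pooled_var ** 0.5


def simulate(n, shifts, rng):
    """n synthetic subjects; variable j is drawn from N(shifts[j], 1)."""
    return [[rng.gauss(s, 1.0) for s in shifts] for _ in range(n)]


def build_weights(group_a, group_b, threshold=0.5):
    """Pick +1 / -1 / 0 weights by thresholding each variable's pilot effect."""
    weights = []
    for j in range(len(group_a[0])):
        d = signed_effect([r[j] for r in group_a], [r[j] for r in group_b])
        weights.append(0 if abs(d) < threshold else (1 if d > 0 else -1))
    return weights


def score(row, weights):
    """Weighted sum of a subject's variables."""
    return sum(w * x for w, x in zip(weights, row))


rng = random.Random(2)
n_vars = 20
# Hypothetical truth: group B is higher on X2 and X5, lower on X7 and X9.
shifts_b = [0.0] * n_vars
for j in (1, 4):       # zero-based indices of X2, X5
    shifts_b[j] = 1.0
for j in (6, 8):       # zero-based indices of X7, X9
    shifts_b[j] = -1.0
no_shift = [0.0] * n_vars

pilot_a, pilot_b = simulate(100, no_shift, rng), simulate(100, shifts_b, rng)
valid_a, valid_b = simulate(100, no_shift, rng), simulate(100, shifts_b, rng)

weights = build_weights(pilot_a, pilot_b)  # constructed on the pilot sample only
d_built = signed_effect([score(r, weights) for r in valid_a],
                        [score(r, weights) for r in valid_b])
d_sum = signed_effect([sum(r) for r in valid_a], [sum(r) for r in valid_b])
print(weights)
print(round(d_built, 2), round(d_sum, 2))
```

The simulated effects were chosen to cancel in the plain sum, so the comparison is deliberately unfavorable to summing; the design point is that the score is built on one sample and judged on another.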
25
Follow the Money
Money talks: “Hard science” approaches receive the money:
–Data mining in business
  Expensive customer scoring data
  Analyze money spent, not intention to spend
–Pharmaceutical company exploration: genes, chemistry experimentation; hundreds of millions of dollars change hands on a single clinical trial
26
Then why do we do so much soft science?
Inertia, inbreeding
–Journals
–Universities, “research methods”
Money:
–Drug trials: $10,000 per subject
–Undergraduate students: $0 per subject
27
Inbreeding: The Exponential BS (bogus science) Theory
[Figure: branching diagram along a time axis (Time 0 to 4): BS 0 published, then BS 1 published, then BS 2, then multiple nodes labeled 3.]
28
Comparison
Hard Science: Spend a winter collecting and analyzing fungus from caves in Northern Alaska
Soft Science: Ask students to pretend they are fungus in caves in Northern Alaska
29
Survey data on undergraduate students
30
analyzed via complex statistical model
31
Conclusions
Let’s move towards harder science:
–Work harder to get relevant data
–Use more real measures, fewer fictional ones
–Use more models that
  Predict reality
  Have real constructs
  Are prescriptive
  Are falsifiable
–Use more external validation