Introduction to elementary quantitative concepts and methods Guest lecture Carl Henrik Knutsen, 14/5-2008
Motivation Social sciences, and science in general: We are generally interested in: – “How” questions – “Why” questions. Social scientists seek descriptions of empirical phenomena and try to come up with causal explanations. Both quantitative and qualitative methodologies try to answer such questions. The nature of the research question is important for the choice of methodology, even if in the real world of social science researchers often choose a method according to their knowledge and “taste”. Knowledge of different methodologies allows researchers and students to fit the methodology to the research question → improved analysis. Triangulation can often be a good idea: using different methodologies to illuminate a problem more comprehensively. Knowledge of elementary quantitative methods enables you to read different types of research.
Causality and the control problem Independent of choice of methodology. Theory and clever design needed. Three causal structures that might lead to correlation between X and Y: 1) X → Y, 2) Y → X, 3) Z → X and Z → Y (a third variable Z drives both).
Generalization The big advantage of quantitative methods: they provide stringent criteria for when we can be relatively certain that our generalizations hold true and are not driven by coincidences. Remember that in the social sciences, we do not face deterministic relationships between factors. Quantitative methods take into account the stochastic structure of social life.
Data There exists a vast number of data sources constructed by different agencies or researchers: for many purposes you do not need to construct your own data. But: know the data you use in order to avoid pitfalls. Sources on the web: World Development Indicators, Penn World Tables, World Governance Indicators, Polity, Freedom House, OECD, UNESCO, UNCTAD etc!
Descriptive statistics Descriptive vs inferential statistics Descriptive statistics: Draw out comprehensible information about the structure of your data 1) Central tendencies, 2) variation, 3) correlation
Variation Range Variance (S^2 = (Σ(X-M)^2)/(N-1)) Standard deviation
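The three measures of variation on this slide can be sketched in a few lines of Python. This is only an illustration: the function names and the `incomes` values are made up for the example, and the variance follows the slide's formula with N-1 in the denominator.

```python
import math


def sample_variance(xs):
    """Sample variance: S^2 = sum((X - M)^2) / (N - 1)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    m = sum(xs) / n  # M, the sample mean
    return sum((x - m) ** 2 for x in xs) / (n - 1)


def sample_sd(xs):
    """Standard deviation: the square root of the sample variance."""
    return math.sqrt(sample_variance(xs))


# Made-up observations, purely for illustration.
incomes = [100, 150, 250, 300]

value_range = max(incomes) - min(incomes)  # range: largest minus smallest
variance = sample_variance(incomes)
sd = sample_sd(incomes)

print(value_range, variance, sd)
```

The standard deviation is often easier to read than the variance because it is expressed in the same unit as the data.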
Correlation Covariance: cov(x,y) = (Σ(X-Xm)(Y-Ym))/(N-1). Correlation coefficients: Pearson’s r = cov(x,y)/(S(x)*S(y)), always between -1 and 1. NB: Gives only the degree of linear relationship.
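A small sketch of the covariance and Pearson's r formulas, assuming made-up data and hypothetical function names. The second data set illustrates the NB on the slide: Y depends exactly on X (Y = X^2), yet r is zero because the relationship is not linear.

```python
import math


def covariance(xs, ys):
    """Sample covariance: sum((X - Xm)(Y - Ym)) / (N - 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with N >= 2")
    xm = sum(xs) / len(xs)
    ym = sum(ys) / len(ys)
    return sum((x - xm) * (y - ym) for x, y in zip(xs, ys)) / (len(xs) - 1)


def pearson_r(xs, ys):
    """Pearson's r: covariance divided by the product of the standard deviations."""
    sx = math.sqrt(covariance(xs, xs))  # cov(x, x) is the variance of x
    sy = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sx * sy)


# Made-up data, purely for illustration.
x = [-2, -1, 0, 1, 2]
y_linear = [2 * v + 1 for v in x]  # exact linear relationship
y_curved = [v ** 2 for v in x]     # exact, but non-linear, relationship

r_linear = pearson_r(x, y_linear)
r_curved = pearson_r(x, y_curved)

print(r_linear, r_curved)
```

A scatter plot is therefore a useful complement to the coefficient: a value of r near zero does not rule out a strong curved relationship.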
Presentation of data Tables Histogram Bar- and pie-charts Scatter plots Important to think about the reader: Comprehensible and informative. Need to strike a balance on the amount of information presented in a chart. Label charts.
Table
                   Male                               Female
                   No higher education   University   No higher education   University
Mean income (N)    150 (2000)            300 (1000)   100 (2500)            250 (700)
Inferential statistics The aim is solid inference from an observed sample to a larger (unobserved) universe. Generalization about populations or about effects. For effects: Can we say that trajectories we observe are due to “real” effects or are they likely only a product of chance?
Law of large numbers... – Population, samples, – Estimates and underlying mean. Random selection? Selection bias ALWAYS a possibility. Sampling techniques: – Experiment – Random draws – Stratification
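The law of large numbers can be illustrated with a short simulation. The sketch below assumes a die roll as the "population" (underlying mean 3.5) and uses random draws. The seed and the sample sizes are arbitrary choices made for this example.

```python
import random


def sample_mean(rng, n):
    """Mean of n random draws from a fair die (underlying mean 3.5)."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n


rng = random.Random(2008)  # fixed seed so the run is reproducible

# The estimate tends to move closer to 3.5 as the sample grows.
means_by_n = {n: sample_mean(rng, n) for n in (10, 1000, 100000)}

for n, m in means_by_n.items():
    print(n, m)
```

Note that this only works because the draws are random. With a biased selection rule, a larger sample would converge on the wrong value.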
Hypothesis test Democracy and economic growth as example. – H0: Democracy has no effect on growth – Halt: Democracy has an effect on growth. In general, H0 is often a hypothesis which claims that there is no effect. We often want to investigate whether we can claim with relative certainty that Halt is valid. The burden of proof is on the alternative hypothesis. Conservative bias: we need relatively strong results to claim that a relationship is not due to pure chance. The central limit theorem is the underlying result. How do we know the distribution given H0? Use the given distribution to find out what one is likely to arrive at by pure chance. The normal distribution.
Central limit theorem “The central limit theorem is one of the most remarkable results of the theory of probability. In its simplest form, the theorem states that the sum of a large number of independent observations from the same distribution has, under certain general conditions, an approximate normal distribution. Moreover, the approximation steadily improves as the number of observations increases. The theorem is considered the heart of probability theory, although a better name would be normal convergence theorem.” http://davidmlane.com/hyperstat/normal_distribution.html (Berrie Zielman)
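A simulation gives a feel for the theorem. The sketch below assumes observations drawn from a uniform distribution on (0, 1), which is clearly not normal, and checks that standardized sample means behave roughly like a standard normal variable: about 95% should fall within ±1.96. The seed, sample size and number of samples are arbitrary choices for this example.

```python
import math
import random


def standardized_mean(rng, n):
    """Mean of n uniform(0, 1) draws, standardized to mean 0 and variance 1.

    A uniform(0, 1) variable has mean 0.5 and variance 1/12, so the mean
    of n draws has mean 0.5 and variance 1 / (12 * n).
    """
    m = sum(rng.random() for _ in range(n)) / n
    return (m - 0.5) / math.sqrt(1 / (12 * n))


rng = random.Random(14)  # fixed seed so the run is reproducible
n_obs, n_samples = 30, 20000

z_values = [standardized_mean(rng, n_obs) for _ in range(n_samples)]

# Under a standard normal distribution, about 95% of values lie within +/-1.96.
share_within = sum(abs(z) < 1.96 for z in z_values) / n_samples
print(share_within)
```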
Significance levels and p-values Significance level: If we take H0 as true, we want a critical level beyond which results are unlikely to occur, for example 5%. There is then only a 5% probability of seeing a relationship this strong if H0 is true. It is important to have a large sample. P-value: The lowest significance level that would lead to rejection of H0. If H0 is true: what is the probability that we will see a result this extreme?
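As a rough sketch of how a p-value is obtained under the normal distribution, the code below computes a two-sided p-value from a z statistic using `statistics.NormalDist` from the Python standard library. The numbers (an estimated growth difference of 1.0 with standard deviation 4 and N = 100) are invented for illustration and are not results from any study.

```python
import math
from statistics import NormalDist


def z_statistic(estimate, h0_value, sd, n):
    """Distance between the estimate and the H0 value, in standard errors."""
    return (estimate - h0_value) / (sd / math.sqrt(n))


def two_sided_p_value(z):
    """Probability, if H0 is true, of a result at least this extreme in either direction."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Invented numbers: estimated difference 1.0, H0 says 0, sd 4, N = 100.
z = z_statistic(estimate=1.0, h0_value=0.0, sd=4.0, n=100)
p = two_sided_p_value(z)

significance_level = 0.05
reject_h0 = p < significance_level

print(z, p, reject_h0)
```

With the same estimate and standard deviation but N = 25, z would be 1.25 and H0 would not be rejected at the 5% level, which is one way to see why sample size matters.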
Models Stockburger: “A model is a representation containing the essential structure of some object or event in the real world.” – 1. Models are necessarily incomplete – (2. The model may be changed or manipulated with relative ease.)
Regression analysis How to fit a straight line through a scatterplot! Best fit: one criterion is to minimize the sum of squared residuals, Ordinary Least Squares (OLS). Bivariate regression equation: Y = a + bX + ε. Regression analysis recognizes that the world is not deterministic. The role of the error term: ε. Large error terms in general imply large uncertainty. Interpretation of a: Mean value of Y when X is equal to zero. Often no substantive interpretation, and not so interesting. Interpretation of b: Increase in the mean of Y when X increases by one unit. Effect of X on Y?
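A minimal sketch of bivariate OLS, using the standard closed-form solution b = Σ(X-Xm)(Y-Ym)/Σ(X-Xm)^2 and a = Ym - b*Xm. The data points and function names are made up for the example.

```python
def ols_fit(xs, ys):
    """Return (a, b) minimizing the sum of squared residuals of Y = a + bX."""
    n = len(xs)
    xm = sum(xs) / n
    ym = sum(ys) / n
    sxx = sum((x - xm) ** 2 for x in xs)
    sxy = sum((x - xm) * (y - ym) for x, y in zip(xs, ys))
    b = sxy / sxx    # slope: change in mean Y per unit increase in X
    a = ym - b * xm  # intercept: mean of Y when X equals zero
    return a, b


def sum_squared_residuals(xs, ys, a, b):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Made-up scatter of points, purely for illustration.
xs = [0, 1, 2, 3, 4]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]

a, b = ols_fit(xs, ys)
ssr = sum_squared_residuals(xs, ys, a, b)

print(a, b, ssr)
```

Any other line, for example one with a slightly different slope, gives a larger sum of squared residuals for the same points. That is what "best fit" means under this criterion.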
Assumptions about the distribution of the error term when using OLS: 1) Homoskedastic (constant variance), 2) No autocorrelation, 3) Normally distributed
Multivariate regression Y = a + b1X1 + b2X2 + b3X3 + ε. New interpretation of b: The mean increase in Y when the relevant X increases by one unit, given that all other variables are held constant. R-squared: How much of the variation in the data is “explained by the model” (a very imprecise interpretation). Goes from 0 to 1. “Control variables”. Extensions of regression analysis: Generalized Least Squares, Systems of equations, Instrumental Variables, Logit and Probit models and many more.
Extensions Dummy variable Squared X Logarithmic specifications Splitting the sample
Problems 1) “Simultaneity bias”: Reverse causation. Exogeneity vs endogeneity of X-variables. 2) “Omitted variable bias” 3) Measurement error. – Reliability. Where does the data come from? GDP in developing countries. – Validity (TFP and technological change). Operationalization of variables: they have to be observable, quantifiable and measurable.