1 Simple Regression. Social Research Methods 2109 & 6507, Spring 2006, March 8, 9, 13, 2006.


2 From Correlation to Regression
Correlation (correlation analysis, correlation coefficient): measures the strength of the linear association between two quantitative variables.
Regression (regression analysis) has two uses:
1. Description: summarize the relationship between the two variables with a straight line. What does that line look like?
2. Prediction: how can we predict one variable from the value of the other?
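As a reminder of what correlation measures, here is a minimal sketch of the Pearson correlation coefficient r computed from its definition. The function name `pearson_r` and the five data points are hypothetical, made up for illustration; they are not from the course data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Cross-products and squared deviations from the means
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data, made up for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 4))  # 0.7746
```

r only says how strong the linear association is; it does not give the line itself, which is what regression adds.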

3 Example: summarize the relationship with a straight line

4 Draw a straight line, but how?

5 Notice that some predictions are not completely accurate

6 How to draw the line?
Purpose: draw the regression line that gives the most accurate predictions of y given x.
Criterion for "accurate": the sum of (observed y - predicted y)², i.e. the sum of the squared prediction errors, should be as small as possible. This quantity is called the sum of squared errors, or sum of squared residuals (SSE):
$$\mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
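The "accuracy" criterion can be made concrete by scoring candidate lines. A minimal sketch, assuming a hypothetical helper `sse` and made-up data: each candidate line (α, β) gets its SSE, and the line with the smaller SSE is the more accurate one by this criterion.

```python
def sse(x, y, alpha, beta):
    """Sum of squared errors for the candidate line y_hat = alpha + beta * x."""
    return sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data, made up for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Two hand-picked candidate lines; the one with the smaller SSE fits better
print(sse(x, y, alpha=2.0, beta=0.5))
print(sse(x, y, alpha=2.2, beta=0.6))
```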

7 Ordinary Least Squares (OLS) Regression
The regression line is drawn so as to minimize the sum of the squared vertical distances from the points to the line, i.e. to make SSE as small as possible.
This line minimizes the squared prediction error.
The line passes through the middle of the point cloud (in fact, through the point $(\bar{x}, \bar{y})$), which makes it a sensible way to describe the relationship.
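The two properties above can be checked numerically. This sketch uses the standard closed-form OLS estimates (the helper names `ols_fit` and `sse` and the data are hypothetical) and confirms that nudging either coefficient raises SSE and that the fitted line passes through the point of means.

```python
def ols_fit(x, y):
    """Closed-form OLS estimates (alpha, beta) for y_hat = alpha + beta * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta = s_xy / s_xx
    alpha = y_bar - beta * x_bar
    return alpha, beta

def sse(x, y, alpha, beta):
    """Sum of squared vertical distances from the points to the line."""
    return sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data, made up for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
alpha, beta = ols_fit(x, y)
best = sse(x, y, alpha, beta)

# Nudging either coefficient away from the OLS values can only increase SSE
for d_alpha, d_beta in [(0.1, 0), (-0.1, 0), (0, 0.05), (0, -0.05)]:
    assert sse(x, y, alpha + d_alpha, beta + d_beta) > best

# The fitted line passes through the point of means (x_bar, y_bar)
x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
print(alpha, beta, abs((alpha + beta * x_bar) - y_bar) < 1e-12)
```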

8 To describe a regression line (equation):
Algebraically, a line is described by its intercept and slope.
Notation:
y = the dependent variable
x = the independent variable
$\hat{y}$ (y_hat) = predicted y based on the regression line
β = slope of the regression line
α = intercept of the regression line

9 The meaning of slope and intercept:
slope = change in $\hat{y}$ for a one-unit change in x
intercept = value of $\hat{y}$ when x = 0

10 General equation of a regression line:
$$\hat{y} = \alpha + \beta x$$
where α and β are chosen to minimize $\sum (\text{observed } y - \text{predicted } y)^2$. The formulas for the α and β that minimize this sum are built into statistical programs and calculators.
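For readers curious where those built-in formulas come from, one standard route (ordinary calculus on SSE, not necessarily the derivation used in the course) is to set both partial derivatives of SSE to zero:

```latex
\begin{aligned}
\mathrm{SSE}(\alpha,\beta) &= \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2 \\
\frac{\partial\,\mathrm{SSE}}{\partial \alpha} &= -2\sum_{i=1}^{n} (y_i - \alpha - \beta x_i) = 0
  \;\Rightarrow\; \alpha = \bar{y} - \beta\bar{x} \\
\frac{\partial\,\mathrm{SSE}}{\partial \beta} &= -2\sum_{i=1}^{n} x_i (y_i - \alpha - \beta x_i) = 0
  \;\Rightarrow\; \beta = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2}
\end{aligned}
```

The first condition also says that the residuals of an OLS line with an intercept sum to zero.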

11 An example of a regression line

12 Residuals
Residual = difference between the observed y and the predicted y for an observation:
$$\text{residual}_i = y_i - \hat{y}_i$$
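A short sketch of computing residuals for a fitted line. The data are hypothetical, and α and β are computed here with the standard OLS formulas (an assumption about how the line was fitted).

```python
# Hypothetical data, made up for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Standard OLS estimates for y_hat = alpha + beta * x
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
beta = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum((a - x_bar) ** 2 for a in x)
alpha = y_bar - beta * x_bar

y_hat = [alpha + beta * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, y_hat)]   # residual_i = y_i - y_hat_i
print([round(e, 2) for e in residuals])   # [-0.8, 0.6, 1.0, -0.6, -0.2]
print(abs(sum(residuals)) < 1e-9)         # OLS residuals sum to zero
```

Positive residuals mark points above the line, negative ones points below it.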

13 Interpreting regression coefficients
Slope = change in predicted y for a one-unit change in x
– Slope = 0: no linear relationship between x and y (r = 0)
Intercept = predicted value of y when x = 0
– Often we are not interested in the intercept (for example, when x = 0 lies far outside the range of the data)
Note: interpreting the slope and intercept requires thinking in the units of x and y.

14 Regression and Correlation
Distinct but related measures.
Correlation: measures the strength of the relationship, a major aspect of which is how closely the points fall along a line.
Regression slope: how steep is the line?

15 To get slope and intercept for a regression:

16 How slope and correlation are mathematically related:
$$\beta = r\,\frac{s_y}{s_x}, \qquad \alpha = \bar{y} - \beta\bar{x}$$
where $s_x$ and $s_y$ are the standard deviations of x and y.
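A quick numerical check of these relations on hypothetical data, using Python's standard `statistics` module for means and sample standard deviations. The ratio s_y / s_x is the same whether both use n or n - 1 in the denominator, so the choice does not affect the slope.

```python
import math
import statistics

# Hypothetical data, made up for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
x_bar, y_bar = statistics.mean(x), statistics.mean(y)

s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
r = s_xy / math.sqrt(s_xx * s_yy)

# slope = r * (s_y / s_x), intercept = y_bar - slope * x_bar
s_x, s_y = statistics.stdev(x), statistics.stdev(y)   # sample SDs (n - 1)
beta_from_r = r * s_y / s_x
alpha_from_r = y_bar - beta_from_r * x_bar

# Direct least-squares slope for comparison
beta_ls = s_xy / s_xx
print(round(beta_from_r, 6), round(alpha_from_r, 6))    # 0.6 2.2
print(math.isclose(beta_from_r, beta_ls))                # True
```

Because s_y / s_x is positive, the slope always has the same sign as r, and the slope is 0 exactly when r is 0.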

17 Fit: how much of the variation in y can regression explain?
Look at the regression equation again:
$$\hat{y} = \alpha + \beta x, \qquad y = \alpha + \beta x + \varepsilon$$
Data = what we explain + what we don't explain
Data = predicted + residual
(The data contain a part we can explain, i.e. the part we can predict, and a part we cannot, i.e. the error.)

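18 In regression, we can think about "fit" in this way:
Total variation = sum of squares of y around its mean, $\sum (y_i - \bar{y})^2$
Explained variation = the part of the total variation explained by our predictions, $\sum (\hat{y}_i - \bar{y})^2$
Unexplained variation = sum of squares of the residuals, $\sum (y_i - \hat{y}_i)^2$
$$R^2 = \frac{\text{explained variation}}{\text{total variation}}$$
R² is called the coefficient of determination: the share of the total variation in y that the regression explains.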
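The decomposition can be verified on a small example. This sketch (hypothetical data, standard OLS fit) computes the three sums of squares, checks that total = explained + unexplained, and forms R².

```python
# Hypothetical data, made up for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
beta = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
alpha = y_bar - beta * x_bar
y_hat = [alpha + beta * xi for xi in x]

total = sum((yi - y_bar) ** 2 for yi in y)                      # total variation
explained = sum((yh - y_bar) ** 2 for yh in y_hat)              # explained variation
unexplained = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # residual sum of squares

r_squared = explained / total
print(round(total, 6), round(explained, 6), round(unexplained, 6))  # 6.0 3.6 2.4
print(round(r_squared, 6))                                          # 0.6
```

Here 60% of the variation in y is explained by the line and 40% is left in the residuals.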

19 $R^2 = r^2$
NOTE: this is a special feature of simple (OLS) regression; it does not hold for multiple regression or for other regression methods.
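A sketch that checks R² = r² for simple OLS regression on two hypothetical data sets. The helper `r_and_r_squared` is made up for this example; it computes R² as 1 - SSE / total variation, which equals explained / total for an OLS line with an intercept.

```python
import math

def r_and_r_squared(x, y):
    """Return (Pearson r, R^2 of the OLS line) for simple regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    r = s_xy / math.sqrt(s_xx * s_yy)
    beta = s_xy / s_xx
    alpha = y_bar - beta * x_bar
    sse = sum((b - (alpha + beta * a)) ** 2 for a, b in zip(x, y))
    return r, 1 - sse / s_yy

# Hypothetical data sets, made up for illustration only
for x, y in [([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]),
             ([0, 1, 2, 3, 4, 5], [9, 7, 8, 4, 3, 1])]:
    r, r2 = r_and_r_squared(x, y)
    print(round(r ** 2, 6), round(r2, 6), math.isclose(r ** 2, r2))
```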

20 Some cautions about regression and R²
It is dangerous to use R² to judge how "good" a regression is.
– Whether regression is appropriate is not a function of R².
When should we use regression?
– It is not suitable for non-linear patterns [although a non-linear pattern can sometimes be straightened by transforming the variables].
– Regression is appropriate when r (correlation) is an appropriate measure.

21 Residuals and residual plots
$\text{residual}_i = y_i - \hat{y}_i$
We can use residual plots to help us assess how well a regression line fits.
A residual plot is a scatterplot of the regression residuals against the explanatory variable (residuals on the y-axis, explanatory variable on the x-axis).

22 Example of a residual plot

23 Look at a residual plot
Are the residuals scattered evenly above and below 0? Across the whole range of the explanatory variable, is the vertical spread of the residuals roughly the same?
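A high R² does not guarantee that a straight line is appropriate, and a residual check can reveal this. In this sketch the data are hypothetical and deliberately curved (y = x²); the straight-line fit still has R² of about 0.95, but the signs of the residuals, printed in x order as a crude text stand-in for a residual plot, show a systematic curved pattern rather than an even scatter around 0.

```python
x = list(range(1, 11))
y = [xi ** 2 for xi in x]          # hypothetical curved (quadratic) relationship

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
beta = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum((a - x_bar) ** 2 for a in x)
alpha = y_bar - beta * x_bar
residuals = [b - (alpha + beta * a) for a, b in zip(x, y)]
r_squared = 1 - sum(e ** 2 for e in residuals) / sum((b - y_bar) ** 2 for b in y)

print(round(alpha, 6), round(beta, 6), round(r_squared, 3))   # -22.0 11.0 0.95
# Crude text "residual plot": sign of each residual in x order
print("".join("+" if e > 0 else "-" for e in residuals))      # ++------++
```

Residuals that are positive at both ends and negative in the middle signal curvature that the straight line misses.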

24 Types of residual plots

25 Outliers and influences
Outlier: a point that falls outside the overall pattern of the graph.
Influential observation: a point which, if removed, would markedly change the position of the regression line.
NOTE: Outliers are not necessarily influential.

26 The differences between outliers and influential outliers

27 Outliers and influential observations
Outliers at the extremes of x are more likely to be influential than outliers at the extremes of y.
It is often a good idea to remove any influential outliers and recompute the regression without them.
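A small sketch of why extreme-x outliers tend to be more influential, on hypothetical data: adding an outlier in y at the mean of x leaves the slope unchanged (it only shifts the intercept), while adding one far out in x flips the sign of the slope. The helper `ols_slope_intercept` is made up for this example.

```python
def ols_slope_intercept(x, y):
    """Standard OLS estimates, returned as (alpha, beta)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    beta = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum((a - x_bar) ** 2 for a in x)
    return y_bar - beta * x_bar, beta

# Hypothetical base data, made up for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
_, beta_base = ols_slope_intercept(x, y)

# Outlier in y, placed at the mean of x: slope is unchanged
_, beta_y_outlier = ols_slope_intercept(x + [3], y + [15])

# Outlier in x (far to the right): slope flips sign
_, beta_x_outlier = ols_slope_intercept(x + [15], y + [2])

print(round(beta_base, 4), round(beta_y_outlier, 4), round(beta_x_outlier, 4))
# 0.6 0.6 -0.1077
```

Refitting with and without a suspect point, as done here, is a simple way to see whether that point is influential.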

28 Cautions about correlation and regression:
Extrapolation (predicting y for values of x outside the range of the data) is not appropriate.
Regression: pay attention to lurking or omitted variables.
– Lurking (omitted) variables: variables that influence the relationship between the two variables but are not included among the variables studied.
– They are a problem in establishing causation.
Association does not imply causation.
– Association alone is weak evidence of causation.
– Experiments with random assignment are the best way to establish causation.