NPSAS DAS Training December 2006 Training Shefali V. Mehta Minnesota Office of Higher Education.


1 NPSAS DAS Training December 2006 Training Shefali V. Mehta Minnesota Office of Higher Education

2 NPSAS Background: NPSAS 2004 Data. Based on training by Lutz Berkner of MPR Associates, Inc., and Tracy Hunt-White and James Griffith of the National Center for Education Statistics

3 November 2006, Minnesota Office of Higher Education. What is the National Postsecondary Student Aid Study? The National Postsecondary Student Aid Study, or NPSAS, is a nationally representative stratified random sample of undergraduate, graduate and first-professional students attending postsecondary institutions. Today's presentation will focus on the undergraduate sample data: how it was collected, what it contains and how to access and use it.

4 NCES Recent Surveys: Higher Education Longitudinal and Cross-Sectional Studies

5 Data Sources for the NPSAS 2004
- Central Processing System (CPS) Match
- Institutional Records (CADE)
- Student Interviews
- NSLDS Loan Match
- NSLDS Pell Grant File Match
- ETS File Match
- ACT File Match

6 NPSAS 2004 Data Collection Timeline

7 Products related to NPSAS
- Public Use Data Systems (DASs)
- Methodology Reports describing study design, procedures, and outcomes
- Restricted use research files
- ED Tabs and Descriptive Reports based on analyses of merged data

8 Using the DAS online: Accessing the NPSAS 2004 Data

9 What is the DAS? The Data Analysis System, or DAS, is a software application that produces tables and correlation matrices for NCES datasets. The DAS, which is available for each NCES dataset, includes over 1,000 variables with full descriptions, along with statistical information such as standard errors and the distribution of the data. It is available online through the NCES website: http://nces.ed.gov/dasol/

10 DAS Home Page: http://nces.ed.gov/das/

11 DAS Online: http://nces.ed.gov/dasol/

12 DAS Online: Select a dataset

13 DAS Online

14 NCES Data Usage Agreement. Select "I agree" to continue to the DAS for NPSAS 2004. Note: To use DAS online, you need to enable pop-up windows from this website. The application relies heavily on pop-up windows, such as this usage agreement.

15 DAS Online Window: Toolbar

16 DAS Online Window: Subject, Category, Topic, Subtopic

17 DAS Online Window: Variable list. Blue = continuous variable; Green = categorical variable; Red = weight

18 Available variables. Click on the "view/download list of variables" link to see all available variables.

19 Locating variables in the NPSAS
There are two ways to select variables. The first is through the drop-down menus available on the main page. The menus are organized in the following categories:
- Frequently Used: Variables
- Aid: Application, Federal, Grants, Institutional, Net Price, Outside, Package, Ratio, State, Total
- Background: Demographics, Family, Residence
- Education: Attendance, Program
- Employment: Description, Employer, Future, Licensure, Status, While Enrolled
- Finances: Income
- Institution: Other, Price, Type
- Parent: Education, Family
- Public Service: Participation
- Survey: Sample, Weights

20 Locating variables in the NPSAS. The second way to locate a variable is by clicking on the "Search for variable" link on the toolbar. A pop-up search window will appear.

21 Using the DAS online: Using the Variable Tags

22 What kind of estimates can the DAS produce?
- Means (including observations = 0)
- Averages (of observations > 0)
- Percent distributions
- Percent positive (or greater than a selected value)
- Percentiles (10th, 25th, 50th, 75th, and 90th), with or without observations = 0
- Medians (the 50th centile)
- Correlation matrices

23 Variable Description Window. Each variable window contains the following: a description of the variable; the sources for the variable

24 Variable Description Window. And the distribution of the variable. In this case, 63.2 percent of the data has a value for the total amount received. The range for this variable is $50 to $56,740. Remember: this information is for the national level; each state has its own distribution.

25 Select a Tag for the Variable. Click on the "Select a tag" tab to show the tag options available for the variable. These tags tell you the various ways this variable can be represented in your table.

26 Using the DAS online: Practice exercises to illustrate the tags

27 NPSAS - Exercise 1. What is the percent distribution of full-time, full-year undergraduates according to degree program and gender, by dependency status, institution sector, aid status, and age? Find the percentage of full-time, full-year independent male students who attended a public 4-year institution.

28 Exercise 1: Breakdown
- Run 1: What is the percent distribution of undergraduates according to degree program, by dependency status and institution sector?
- Run 2: What is the percent distribution of undergraduates according to degree program, by dependency status, institution sector, aid status, and age?
- Run 3: What is the percent distribution of full-time, full-year undergraduates according to degree program and gender, by dependency status, institution sector, aid status, and age?

29 Tags: Column_Cat
- Creates percentages for each category of a variable
- Missing values and legitimate skips are not included in any of the categories
- Responses coded as 0 are not included
- Pertains to categorical variables only
- Also applies to: Row_Cat, Span_Cat, By_Cat

30 Tags: Row_Cat
- Similar to Column_Cat
- Creates a row of estimates for each category
- Responses coded as 0 are not included
- Pertains to categorical variables only
- Also applies to: Column_Cat, Span_Cat, By_Cat

31 Tags: Row_Lump
- Creates customized categories by grouping existing variable categories
- Responses coded as 0 can be included
- Legitimate skips can be excluded or included in the new categorization
- Allows reordering of existing categories
- Pertains to categorical variables only
- Also applies to: Column_Lump, Span_Lump, By_Lump

32 Tags: Row_Cut
- Divides a continuous variable into categories by specifying ranges
- Creates a row of estimates for each category
- Specify the beginning cut-point value in each range
- Each cut-point must be a number with a decimal (e.g., 10.5)
- Also applies to: Column_Cut, Span_Cut, By_Cut

33 Tags: Row_Cut (example ranges)
First example, four ranges:
- Range 1: (>= 0.5 and < 18.5)
- Range 2: (>= 18.5 and < 23.5)
- Range 3: (>= 23.5 and < 29.5)
- Range 4: (>= 29.5 up to infinity/max value)
Second example, two ranges:
- Range 1: (>= -0.5 and < 0.5), includes 0
- Range 2: (>= 0.5 up to infinity), at least $1 in aid
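The range rule on this slide (each range starts at its cut-point and runs up to, but not including, the next one) can be sketched in a few lines of Python. This is only an illustration of the rule, not how the DAS is implemented; the function name `cut_category` and the variable names are hypothetical.

```python
from bisect import bisect_right

def cut_category(value, cut_points):
    """Return the 1-based range number for value, or None if it falls
    below the first cut-point. Each range includes its starting
    cut-point and excludes the next one; the last range is open-ended."""
    index = bisect_right(cut_points, value)
    return index if index > 0 else None

# Cut-points taken from the two examples on the slide.
FIRST_EXAMPLE_CUTS = [0.5, 18.5, 23.5, 29.5]
AID_CUTS = [-0.5, 0.5]  # range 1 holds the zeros, range 2 holds $1 or more

print(cut_category(18, FIRST_EXAMPLE_CUTS))  # 1
print(cut_category(19, FIRST_EXAMPLE_CUTS))  # 2
print(cut_category(0, AID_CUTS))             # 1
print(cut_category(250, AID_CUTS))           # 2
```

Because the cut-points carry a decimal (18.5 rather than 18 or 19), a whole-number value never lands exactly on a boundary, which is presumably why the DAS asks for them in that form.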

34 Tags: Filter
- And_Filter: Subsets (focuses on) the population of interest. All conditions have to be met (filters selected) in order for a case to be included.
- Or_Filter: Subsets (focuses on) the population of interest. If any condition is met (filter is selected), the case will be included.

35 Tags: Filter
- Integer filter: Limit the population to the categories selected.
- Cut-point filter: Limit the population to those with values greater than or less than a specific point, or between two points.
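The difference between And_Filter and Or_Filter can be pictured with a short Python sketch. The records and field names below are hypothetical and exist only for illustration; they are not NPSAS variables, and this is not the DAS's own logic.

```python
def and_filter(case, conditions):
    """Keep the case only if every condition is met."""
    return all(condition(case) for condition in conditions)

def or_filter(case, conditions):
    """Keep the case if at least one condition is met."""
    return any(condition(case) for condition in conditions)

# Hypothetical cases and field names, for illustration only.
cases = [
    {"id": 1, "full_time": True, "dependent": True},
    {"id": 2, "full_time": True, "dependent": False},
    {"id": 3, "full_time": False, "dependent": False},
]
conditions = [lambda c: c["full_time"], lambda c: c["dependent"]]

and_ids = [c["id"] for c in cases if and_filter(c, conditions)]
or_ids = [c["id"] for c in cases if or_filter(c, conditions)]
print(and_ids)  # [1]
print(or_ids)   # [1, 2]
```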

36 Tags: Span_Cat
- Uses all of a variable's categories to group sets of rows in the table
- Creates a subtable of estimates for each variable category
- Does not provide an overall summary table
- Warning: Drastically increases the number of estimates in the table
- See also: Span_Cut, Span_Lump

37 NPSAS - Exercise 2. What percentage of full-time, full-year undergraduates received financial aid by dependency status, institution sector, and age? What was the average amount they received?
Steps:
- Import exercise 1
- Delete the Column_Cat and Span_Cat tags
- Delete the Row_Cut tag for Total Aid
- Add the Percent and Average tags

38 Tags: Percent>
- Defines a column of percentages based on values greater than a specified cut point
- Can be used with the Mean and Average>0 tags

39 Tags: Mean versus Average
- Mean will include zeros in the denominator
- Average will not include zeros in the denominator

40 Mean vs. Average
- Mean: all respondents, including those with no aid
- Average: only respondents who have aid

41 Tags: By_Cat
- Creates a column of Average, Mean, or Percent> estimates for each category of a variable
- Can be used with only ONE Mean, Average>0, or Percent> variable
- Provides an overall summary column
- Will increase the size of your table
- See also: By_Cut, By_Lump

42 Example of By_Cat with Percent>. Percent> yields the percent of FT, full-year UG with aid. By_Cat generates the percent of FT, full-year UG with aid by degree program. Ex: 77.4% of FT, full-year UG in a certificate degree program received aid.

43 Representative Sample States. NPSAS:04 is not designed to be representative at the state level except for undergraduates attending public 2-year, public 4-year, and private not-for-profit 4-year institutions in 12 specific states. Use these variables to look at the representative sample states:
- INSTSAST (NPSAS institution representative sample states)
- INSTSTSE (NPSAS institution representative state sample by sector)
Do not use: INSTSTAT (NPSAS institution state)

44 Tags: Centile vs. Centile>0
- Generates percentile columns from continuous variables
- Produces the cut points for the following percentiles: 10th, 25th, 50th, 75th, 90th
- Median = the 50th centile, the value above and below which half of the observations lie
- Centile includes zero values
- Centile>0 excludes zero values
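To see why the choice between Centile and Centile>0 matters, here is a small unweighted Python sketch using made-up aid amounts. The DAS applies case weights, which this illustration deliberately ignores, so treat it as a picture of the zero-handling only.

```python
from statistics import median

# Made-up aid amounts for seven students; three received no aid.
aid = [0, 0, 0, 500, 1000, 1500, 2000]

median_with_zeros = median(aid)                            # like Centile
median_without_zeros = median([a for a in aid if a > 0])   # like Centile>0

print(median_with_zeros)     # 500
print(median_without_zeros)  # 1250.0
```

Including the zeros pulls every percentile down, so the two tags answer different questions: the typical amount among all students versus the typical amount among aid recipients.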

45 Example of Centile>0. Note: Last column shows the percentage of FT, FY undergraduates who received no aid.

46 Using the DAS online: Saving, modifying and loading files (.tpf files)

47 Saving tables you created
- You can save the parameter file for re-use and modification
- Files containing the specifications for tables are called .tpf files, or table parameter files
- After creating a file in the DAS window, click on Save in the toolbar
- The .tpf file will be saved to the location you specify

48 Uploading tables to the DAS application
- Click on Import in the toolbar
- Locate the .tpf file to be uploaded and upload it
- Note: for the DAS online application to read the file, it must be saved with the extension .tpf
- Once the file is uploaded, it can be altered and run as usual

49 Reproducing or modifying tables created by others
- You can download and use any parameter file used to create a report or ED Tab from our web site: http://nces.ed.gov/das
- .tpf files can be edited in a text editor (such as Notepad or Wordpad), but they must be saved with the .tpf extension (not the default .txt extension)

50 Using the batch processor
- The batch processor allows you to run several .tpf files at once
- You must create an account and log in by clicking on "Batch processor" on the left-hand side of http://nces.ed.gov/dasol/
- The files must be added to a .zip file and then uploaded
- After uploading the file, COPY down your batch number to retrieve your files

51 Using the batch processor: rules for naming files
There is one catch with the batch processor: it will not run files unless they have specific names (while the DAS has no such rules). All file names (.ZIP/.TPF/.CPF) must fulfill the following requirements:
- Begin with a letter (for example, A, B, C, ... X, Y, Z)
- Contain at least 2 but no more than 8 characters
- Not contain spaces between characters
- Not include symbols or special characters (underscore is allowed)
These guidelines are available on the DAS website: http://nces.ed.gov/das/das_windows/run_1.asp
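If you prepare many files for the batch processor, a small check like the following Python sketch can catch bad names before uploading. It rests on two assumptions that the slide does not state: that the 2 to 8 character limit applies to the name without its extension, and that digits are allowed after the first letter. The function name `is_valid_batch_name` is hypothetical.

```python
import re
from pathlib import Path

ALLOWED_EXTENSIONS = {".zip", ".tpf", ".cpf"}
# Assumption: a letter first, then 1 to 7 letters, digits, or underscores,
# for 2 to 8 characters in total, not counting the extension.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]{1,7}")

def is_valid_batch_name(filename):
    """Check a file name against the naming rules listed on the slide."""
    path = Path(filename)
    if path.suffix.lower() not in ALLOWED_EXTENSIONS:
        return False
    return NAME_PATTERN.fullmatch(path.stem) is not None

for name in ["TABLE1.tpf", "run_1.zip", "1table.tpf", "my table.tpf", "A.tpf"]:
    print(name, is_valid_batch_name(name))
```

The first two names pass; the other three fail because they start with a digit, contain a space, or are too short.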

52 Using the DAS online: Sampling and Data Issues

53 Data sources by percentage
Which sources did NCES use to collect the student data?

Primary sources
  Institution records (CADE)              95%
  Student interviews (CATI)               70%
  Federal aid applications (CPS)          60%
Combinations of primary sources
  All three sources                       40%
  Two sources                             50%
  One source                              10%
Additional sources
  Federal loans and Pell Grants (NSLDS)   50%

54 Data issues: data collection problems
Data collection problems arose, such as:
- Missing data: no source or incomplete sources; data did not exist (EFC, student budgets)
- Discrepancies among sources: timing issues; reporting or data entry errors; students make guesses during the interview
- Mismatches: student social security numbers; institution identification numbers

55 Data issues: addressing the collection problems
- Imputation: used to complete missing data or to check inconsistencies. NCES used two types of statistical imputation methods: logical and stochastic (hot deck).
- Perturbation: used to protect the privacy of individuals. Social security numbers were switched around for individuals.
- Reconciliation: used to confirm that the data are okay after imputation and perturbation.

56 Sample size and weights
- 15 million undergraduates enrolled in Fall 2003
- 19 million undergraduates enrolled anytime during the 2003-04 academic year
- 80,000 undergraduate cases in the NPSAS sample: they represent about 1 out of 240 undergraduates
- Therefore, the average weight for each respondent = about 240
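The "about 240" figure appears to follow from dividing the academic-year enrollment by the number of sample cases. Assuming the 19 million total is the intended numerator (the slide does not say so explicitly), the arithmetic is:

\[
\text{average weight} \approx \frac{19{,}000{,}000 \text{ undergraduates}}{80{,}000 \text{ sample cases}} = 237.5 \approx 240
\]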

57 Sample size and weights (cont.)
- Each NPSAS sample case has one record containing about 600 derived variables
- Each case has been assigned a weight
- The average weight for each case is 240, but there is a wide range of weight values
- There is only one weight for each case
- In general, the weights are lower for the 12 state cases

58 Why Do The Weights Vary?
- Initial sampling rates differ (for the type of institution, type of student, 12 states, etc.)
- Non-response weight adjustments: need to adjust for those who did not respond to certain questions
- Poststratification to known totals: the samples are adjusted using poststratification to match known population totals
- Smaller sample sizes result in larger weights
- Lower institutional/student response rates result in larger weights
- Larger weights mean less precision in estimates

59 An example to illustrate weights

Case          Case weight   Grant     Weighted total grant
Case 1        100           $2,000    $200,000
Case 2        200           $500      $100,000
Case 3        300           $500      $150,000
Case 4        400           $0        $0
Case 5        500           $300      $150,000
Case 6        500           $2,000    $1,000,000
Total         2,000                   $1,600,000
With grants   1,600

- % with grants: 80% (1600/2000)
- Average grant: $1,000 ($1.6 million/1600)
- Mean grant: $800 ($1.6 million/2000)
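The three summary figures in this example can be reproduced with a short Python sketch. Two of the inputs (the $500 grant for Case 3 and the weight of 500 for Case 6) are not legible in the transcript and are inferred here from the weighted totals, so treat them as assumptions.

```python
# Case weights and grant amounts for the six example cases.
# Case 3's grant and Case 6's weight are inferred from the weighted totals.
weights = [100, 200, 300, 400, 500, 500]
grants = [2000, 500, 500, 0, 300, 2000]

weighted_total_grant = sum(w * g for w, g in zip(weights, grants))
weighted_n = sum(weights)
weighted_n_with_grants = sum(w for w, g in zip(weights, grants) if g > 0)

percent_with_grants = 100 * weighted_n_with_grants / weighted_n
average_grant = weighted_total_grant / weighted_n_with_grants  # zeros excluded
mean_grant = weighted_total_grant / weighted_n                 # zeros included

print(weighted_total_grant)  # 1600000
print(percent_with_grants)   # 80.0
print(average_grant)         # 1000.0
print(mean_grant)            # 800.0
```

The only difference between the average and the mean is the denominator: 1,600 weighted students with grants versus all 2,000 weighted students.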

60 DAS Output Example: Total Grant Amount (TOTGRT)
The weighted N shown in cells is the denominator.

Function        Variable   Weighted N in cells (denominator)
Percentage>0    TOTGRT     Total students
Average>0       TOTGRT     Students with grants
Mean            TOTGRT     Total students

61 Small Sample Sizes: Low N in DAS Output
DAS will produce "low N" instead of an estimate. When does this occur? If the denominator has fewer than 30 cases (meaning the sample size is less than 30), the result is suppressed and shown as low N. The rule of thumb in statistics: if the sample size is less than 30, you cannot produce meaningful estimates of the population.
- Percentages: the row (denominator) must have 30+ cases
- Average>0: the number in the cell (denominator) must have 30+ cases
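The suppression rule can be sketched in Python as follows. This only illustrates the 30-case threshold for an Average>0 estimate; it is not the DAS's actual code, and the function name and the made-up cases are hypothetical.

```python
MIN_CASES = 30  # threshold stated on the slide

def average_positive(values, weights, min_cases=MIN_CASES):
    """Weighted average of the positive values, or "Low N" when fewer
    than min_cases unweighted cases are in the denominator."""
    pairs = [(v, w) for v, w in zip(values, weights) if v > 0]
    if len(pairs) < min_cases:
        return "Low N"
    total_weight = sum(w for _, w in pairs)
    return sum(v * w for v, w in pairs) / total_weight

# Made-up groups: 100 cases of which 20 have a grant, and 50 cases of which 40 do.
dependents = average_positive([400] * 20 + [0] * 80, [50] * 100)
independents = average_positive([400] * 40 + [0] * 10, [100] * 50)

print(dependents)    # Low N
print(independents)  # 400.0
```

Note that the check counts unweighted cases, not weighted Ns: both groups here represent 5,000 weighted students, yet only one of them has enough cases behind the average.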

62 Small Sample Size Example

                                  Dependents   Independents
% Grants                          20%          80%
  Weighted N shown                5,000        5,000
  Cases in denom. [not shown]     [100]        [50]
Average grant                     Low N        $400
  Weighted N shown                Low N        4,000
  Cases in denom. [not shown]     [20]         [40]
Mean grant                        $80          $320
  Weighted N shown                5,000        5,000
  Cases in denom. [not shown]     [100]        [50]

Note: The weighted Ns do not give an indication of the size of the samples. The number of cases in each category is not shown in the DAS output. Only those with access to the raw data know this information.

63 Poststratification to known totals
Primary weights were adjusted in computer models using 75 control totals to reflect:
- National enrollment totals for sectors (9 totals)
- National total Pell Grant dollars by sector (9 totals)
- National total Stafford loan dollars by sector (9 totals)
- 12 state Pell dollars by sector (36 totals)
- 12 state Stafford loan dollar totals (12 totals)

64 Statistical Analysis: Standard errors and analyzing estimates

65 Reliability of NPSAS data
- Representative data: at the national level; for the three major sectors at the state level
- Unlike the Census, NPSAS does not provide data for the whole population, only for a sample of institutions and students
- When analyzing data, the uncertainty and errors related to sample data must be kept in mind

66 Standard errors
- Standard errors accompany certain statistical estimates, such as percents, averages, and means
- They specify the expected uncertainty in study results
- They reflect the extent to which a study result represents the true value in the population
- They are calculated from two general sources of error

67 Errors in data
Sampling error occurs due to:
- Random-chance selection of too many of a particular type of student or institution
Measurement error occurs due to:
- Refusal of some students or institutions to participate
- Not all students and institutions providing data for each item
- Respondents answering items differently
- Mistakes in recording and coding responses

68 Analyzing estimates by assessing their errors
All estimates have some measure of error accompanying them. There are 2 ways of analyzing the errors in NPSAS data:
- One-sample case: For any given statistic, how representative is the statistic of the population (parameter)?
- Two-sample case (comparing 2 statistics): Do the sample statistics differ enough to conclude that the populations actually differ on the measured characteristic (or parameter)?

69 One-sample case: confidence intervals
- Confidence intervals provide a range for the estimate: a range that, at the chosen confidence level, is expected to contain the population's true value
- The wider the confidence interval, the less precise the estimate and the wider the range of possible population values
- This will be easier to illustrate with an example

70 One-sample case: confidence intervals (CI) (cont.)
Constructing a CI for the percent of all dependent students in Minnesota who applied for federal aid.

DAS output (NPSAS institution representative sample states = Minnesota; Dependency status = Dependent):

                                   Applied for any aid   Applied for federal aid (%>0.5)
Estimates
  Total                            88.1                  77.6
  Race-ethnicity (with multiple)
    White                          88.4                  77.6
    Minority/non-white             85.5                  77.5
Standard Errors
  Total                            1.20                  1.27
  Race-ethnicity (with multiple)
    White                          1.33                  1.53
    Minority/non-white             3.84                  2.97

To construct a confidence interval with a 95 percent confidence level (which means that intervals built this way contain the true population value 95 percent of the time):
- Find the estimate and its standard error: 77.6 and 1.27
- Multiply the standard error by 1.96: 1.96 * 1.27 = 2.489
- Subtract this number from, and add it to, the estimate: 77.6 -/+ 2.489 = (75.111, 80.089)
This is the 95 percent CI for this estimate: if the sample were repeated, about 95 percent of the time the interval would contain the actual percent of dependent students in MN who applied for federal aid. Here that interval is roughly 75% to 80%.
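The steps above can be written as a short Python function. This is a minimal sketch of the slide's arithmetic (estimate plus or minus 1.96 standard errors); the function name `confidence_interval` is hypothetical.

```python
def confidence_interval(estimate, standard_error, z=1.96):
    """Return the (lower, upper) bounds of a CI; z=1.96 gives 95 percent."""
    margin = z * standard_error
    return estimate - margin, estimate + margin

# Dependent students in Minnesota who applied for federal aid (Total row).
lower, upper = confidence_interval(77.6, 1.27)
print(round(lower, 3), round(upper, 3))  # 75.111 80.089
```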

71 One-sample case: confidence intervals (CI) (cont.)
The CI for the percent of all dependent students in Minnesota who applied for federal aid: this interval gives the lower and upper values between which, with 95% confidence, we would expect to find the true population characteristic (or parameter), i.e. the actual percent of dependent students in MN who applied for federal aid is between 75% and 80%.
[Figure: number line marked 75.1% (lower bound), 77.6% (estimate), 80.1% (upper bound); axis label: % receiving aid]

72 Two-sample case: comparing two estimates
Construct CIs to compare the percent of white and of minority/non-white dependent students in MN who applied for any aid.

DAS output (NPSAS institution representative sample states = Minnesota; Dependency status = Dependent):

                                   Applied for any aid   Applied for federal aid (%>0.5)
Estimates
  Total                            88.1                  77.6
  Race-ethnicity (with multiple)
    White                          88.4                  77.6
    Minority/non-white             85.5                  77.5
Standard Errors
  Total                            1.20                  1.27
  Race-ethnicity (with multiple)
    White                          1.33                  1.53
    Minority/non-white             3.84                  2.97

Construct a CI with a 95 percent confidence level for each estimate (the figures used here come from the "Applied for any aid" column):
- The CI for the % of white students who applied for any aid: 88.4 -/+ (1.33*1.96) = (85.8, 91.0)
- The CI for the % of minority/non-white students who applied for any aid: 85.5 -/+ (3.84*1.96) = (78, 93)
Now compare these two CIs: do they overlap? In this case they overlap, which means that the difference is NOT statistically significant. Under this approach, two estimates are treated as significantly different only when their CIs do not overlap.
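The overlap comparison can be sketched in Python using the two estimates worked on this slide, 88.4 (standard error 1.33) and 85.5 (standard error 3.84). The function names are hypothetical, and the overlap rule is the slide's own rule of thumb rather than a formal test.

```python
def confidence_interval(estimate, standard_error, z=1.96):
    """Return the (lower, upper) bounds of a CI; z=1.96 gives 95 percent."""
    margin = z * standard_error
    return estimate - margin, estimate + margin

def intervals_overlap(a, b):
    """True when the two (lower, upper) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

white = confidence_interval(88.4, 1.33)
minority = confidence_interval(85.5, 3.84)

print([round(x, 1) for x in white])        # [85.8, 91.0]
print([round(x, 1) for x in minority])     # [78.0, 93.0]
print(intervals_overlap(white, minority))  # True
```

Because the intervals overlap, this approach does not flag the difference as statistically significant.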

73 Two-sample case: comparing two estimates
Not only is the difference between these estimates not statistically significant, but we can learn something else from this sample. The large standard error for the minority/non-white estimate indicates that this estimate is less precise. In this case the sample is small, which reflects the fact that this population in Minnesota is small (thus a larger standard error is to be expected).
[Figure: the two confidence intervals. White students: 85.8% to 91%, estimate 88.4%. Minority/non-white students: 78% to 93%, estimate 85.5%.]

74 Two-sample case: another approach for comparing two estimates
Besides constructing CIs, you can use the two-sample t-test. You can do this either by hand using the equation below or by going to the DAS help center and selecting T-tests. The two-sample t-test uses the estimates and the standard errors:

\[ t = \frac{\text{Estimate}_1 - \text{Estimate}_2}{\sqrt{(\text{Std Error}_1)^2 + (\text{Std Error}_2)^2}} \]

The result of this calculation is compared to 1.96; if it is larger than 1.96, then the difference between the estimates is statistically significant.

DAS output (NPSAS institution representative sample states = Minnesota):

                                        State grants total (>0.5%)
Estimates
  Total                                 18.71
  Income of dependent student's parents
    < $40,000                           45.82
    $40,000 +                           13.47
Standard Errors
  Total                                 1.08
  Income of dependent student's parents
    < $40,000                           3.2
    $40,000 +                           1.42

In this case,

\[ t = \frac{45.82 - 13.47}{\sqrt{(3.2)^2 + (1.42)^2}} \approx 9.24 \]

Since this is larger than 1.96, the difference between these two estimates is statistically significant.
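The by-hand calculation can be checked with a few lines of Python. This is a minimal sketch of the formula on this slide; the function name `two_sample_t` is hypothetical.

```python
import math

def two_sample_t(estimate_1, se_1, estimate_2, se_2):
    """Difference between two independent estimates divided by the
    square root of the sum of their squared standard errors."""
    return (estimate_1 - estimate_2) / math.sqrt(se_1 ** 2 + se_2 ** 2)

# State grant recipients: parents' income under $40,000 vs. $40,000 or more.
t = two_sample_t(45.82, 3.2, 13.47, 1.42)
print(round(t, 2))  # 9.24
print(t > 1.96)     # True
```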

75 Two-sample case: another approach for comparing two estimates
The two-sample tests (both the CI comparisons and the two-sample t-test) are meant for comparing two distinct populations (i.e. no overlap). If the populations overlap, such as when one is a subset of the other (like Minnesota and the U.S.), then the two-sample t-test has a correction factor and the following test statistic is used:

\[ t = \frac{\text{Estimate}_a - \text{Estimate}_b}{\sqrt{SE_a^2 + SE_b^2 - 2\, r_{ab}\, SE_a\, SE_b}} \]

Since the correlation r_ab needed for the third term, 2 * r_ab * SE_a * SE_b, is not available, we can set this up without that term. Then it looks like the regular two-sample t-test. Note that this test statistic is more conservative than it would be if we had used the correct formulation.
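A short Python sketch shows why dropping the correlation term is the conservative choice. The cross term is written here as 2 * r_ab * SE_a * SE_b. The estimates, standard errors, and the correlation value 0.3 are hypothetical numbers chosen purely for the arithmetic, since the real r_ab is not available from the DAS.

```python
import math

def t_overlapping(est_a, se_a, est_b, se_b, r_ab=0.0):
    """Test statistic for two estimates whose populations overlap.
    With r_ab=0.0 this reduces to the regular two-sample t-test."""
    variance = se_a ** 2 + se_b ** 2 - 2 * r_ab * se_a * se_b
    return (est_a - est_b) / math.sqrt(variance)

# Hypothetical inputs for illustration only.
t_without_term = t_overlapping(45.82, 3.2, 13.47, 1.42)
t_with_term = t_overlapping(45.82, 3.2, 13.47, 1.42, r_ab=0.3)

print(round(t_without_term, 2))      # 9.24
print(t_with_term > t_without_term)  # True
```

When the two estimates are positively correlated, leaving the term out makes the denominator larger and the statistic smaller, so a difference has to be bigger before it is called significant.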

76 The End. Thank you!
For more information, contact:
- Tricia Grimes: Tricia.Grimes@state.mn.us
- Shefali Mehta: Shefali.Mehta@state.mn.us
For technical support:
- Aurora D'Amico (NCES): Aurora.DAmico@ed.gov
For questions about the NPSAS 2004:
- Tracy Hunt-White (NCES): Tracy.Hunt-White@ed.gov

