2 Probability and Statistics: Representation of Data; Measures of Center for Data; Simple Analysis of Data
3 Overview: In this module you’ll learn the basics of statistics. Statistical Displays – Data can be displayed graphically in different ways. You will learn how to choose a display based on the type of data and the message to be delivered to the audience. Some “do not” examples will also be covered. Measures of Center – A single number is commonly used to describe an entire set of data. You will explore the different types of “averages” and learn why you might choose one over another. Analysis – This module covers simple analysis of data. You will look at what information can be obtained from the data and how to compare different data sets.
4 Topics: An introduction to statistics; Types of data; Displaying data; How NOT to display data; Arithmetic mean; Median; Mode; Weighted mean; Types of distributions; Measures of center vs. variation
5 Introduction to Statistics – Statistics: An Introduction. The first page at this site explains how statistics is used. Clicking the “Continue” link at the bottom right of the page will take you to the section “Revealing Patterns Using Descriptive Statistics”. It may be worthwhile to read the first page of this section to review some of the common terms and vocabulary used to describe data. Return to this presentation when you are ready. (On the right of the web page, you will notice additional information that is beyond the scope of this module. Feel free to come back later to explore further.) Statistics is the set of mathematical tools for collecting, organizing, and analyzing data, and then interpreting the information to make decisions.
7 Types of Data – What is Data? Read the text on the web site. Answer the nine questions at the bottom of the page to check your understanding of the topic. Return to this presentation when finished. Data can be qualitative, describing distinct categories, or quantitative, describing numerical counts or measurements. Qualitative data can be nominal, where no natural order exists between the categories, or ordinal, where an order does exist. Quantitative data can be divided into continuous data, which can take any value within a range, and discrete data, which take separate, countable values (such as whole-number counts).
8 Types of Data (cont.) – Another explanation of Types of Data. If you are still unsure about recognizing qualitative and quantitative data, click on the link above to review how to distinguish between these two types of data. When you have completed the “Progress check” at the bottom of the web page, return to this presentation. You should now be able to classify data as qualitative nominal, qualitative ordinal, quantitative discrete, or quantitative continuous, and you are ready to explore how to display data. Continue to the next slide!
9 Types of Graphs – Common Graphs. Visit the above website for a brief description and an example of ten of the most common graphs. You should now have a basic idea of the types of graphs that can be created to display data. As you move through the slides, you will learn how to create these graphs and how to determine which graph best represents the data you want to display.
10 Types of Graphs (cont.) – Math Dictionary. This web site is home to a mathematics dictionary. It has examples of the graphs listed below. Return to this presentation when finished. There are various ways to display your data. The differences arise from the type of data and the information or message you want to deliver. Following is a list of the more traditional types of graphs: Bar graph; Histogram; Pie graph; Line graph; Box plot; Scatter plot; Line plot/Dot plot; Pictograph; Stem & Leaf plot.
11 Bar Graphs. Click on the link above to learn how to create a bar graph. After reading the information on bar graphs, answer the ten questions in the “Your Turn” section at the bottom of the page. One way to graphically represent data is with a bar graph (bar chart). What type of data is best represented by a bar graph? What information about a data set should you be able to interpret from a bar graph?
12 Histograms. Read about histograms by following the link above. Check your understanding by answering the ten questions in the “Your turn” section at the bottom of the web page. Histograms can be used to represent continuous data. You should be able to identify data that is continuous and create a histogram to represent that data.
13 Histograms – Create a Histogram (video). This video demonstrates how to take a data set and create a histogram. Histograms are best used when the variable on the x-axis is quantitative. Each bar most often represents a range of values, but a bar could also represent an individual value; in that case, the histogram would more accurately be called a frequency distribution graph.
14 Histograms (cont.) – A Histogram is NOT a Bar Chart. It is important to understand the difference between a histogram and a bar chart. This is the first of several sites that will help you decide when to display data as one or the other. Read the information on the first page and then return to this presentation. Histograms and bar charts can look similar even though they represent data in very different ways. After reading the information on the web page linked above, you should be able to identify three differences between histograms and bar charts.
15 Histograms (cont.) – Bar Charts and Histograms (includes a video). This webpage has additional information on when to use a bar chart or a histogram to display your data. You can also view the video, which shows how to create a bar chart and a histogram.
16 Histograms (cont.) – Histograms vs. Bar Graphs. Click on the link above to read more about the differences between histograms and bar charts. The information is set up as a conversation between a teacher and a student reasoning through how each graph can be used to display specific types of data. When you have finished reading the discussion, please return to this presentation. Can you answer the following questions? What type of data would be best represented by a histogram? What information should you be able to identify when data is represented in a bar graph?
17 Pie Charts. Read about how to create a pie chart and what type of data displays best in this format. Be sure to complete the questions at the bottom of the web page in the “Your turn” section and then return to this presentation! Pie charts represent data as a part-to-whole relationship.
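To see the part-to-whole idea in numbers, here is a minimal Python sketch that converts category counts into slice angles: each slice gets the same share of 360 degrees as its category's share of the total. The function name `pie_angles` and the fruit survey counts are hypothetical, made up for illustration.

```python
def pie_angles(counts):
    """Convert category counts into pie-slice angles in degrees."""
    total = sum(counts.values())
    # A slice's share of 360 degrees equals its category's share of the whole.
    return {label: 360 * count / total for label, count in counts.items()}


# Hypothetical survey: favorite fruit of 40 students
favorite_fruit = {"apple": 10, "banana": 20, "cherry": 10}
angles = pie_angles(favorite_fruit)
print(angles)  # {'apple': 90.0, 'banana': 180.0, 'cherry': 90.0}
```

Because the angles always add up to 360 degrees, a pie chart only makes sense when the categories together form one whole.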
18 Pie Charts (cont.) – Pie Charts. This site looks at how NOT to use pie charts, with many examples found in the news, in business reports, and in other media. You should be able to answer the following questions regarding pie charts: What is the best type of data to represent graphically in a pie chart? When interpreting information from a pie chart, what are three areas you should pay attention to in the representation?
19 Scatterplots – What is a Scatterplot? This site will introduce you to scatterplots. Click the blue “View Video” button to see how to make and read scatterplots. Once you have watched the video and read through the information on this webpage, return to this presentation. Main points: A scatterplot is used to graph the relationship between two quantitative variables (bivariate data); Scatterplots may show patterns – weak or strong, positive or negative correlations; Correlation does not indicate cause and effect.
20 Scatterplots (cont.) – Scatterplots and Correlation. This site presents another view of scatterplots and correlation. After reading this information, answer the nine questions in the “Your turn” section at the bottom of the page. Explore further… At the above website, under the correlation graphs, is a link, “More About Correlation”. Here you will see how correlation is calculated. In most cases, you will use a calculator or software function for this; however, it’s beneficial to know how the correlation coefficient is derived.
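For readers who want to see how the correlation coefficient is derived, here is a minimal Python sketch of Pearson's r: the sum of the products of paired deviations from the two means, divided by the square root of the product of the two sums of squared deviations. The function name `pearson_r` and the study-hours data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of paired deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable has no spread")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: hours studied vs. test score for five students
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 75]
print(round(pearson_r(hours, scores), 2))  # 0.98
```

A value close to +1 signals a strong positive linear pattern, close to −1 a strong negative one, and close to 0 little linear pattern; even a strong correlation says nothing about cause and effect. Python 3.10 and later also provide `statistics.correlation`, which computes the same coefficient.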
21 Line Graph. This website gives many examples of line graphs and explains what makes a line graph different from a scatter plot. Read through this information and then return to this presentation. Main ideas: Line graphs help to show the relationship between two sets of values; The value sets represent an independent variable and a dependent variable; Line graphs are useful for showing trends and making predictions.
22 Line Graph (cont.) – Line Graphs. Check your understanding of interpreting line graphs by answering the ten “Your turn” questions at the bottom of this webpage. You should now be able to answer the following questions: What are the main differences between scatterplots and line graphs? What type of data is best represented in a line graph?
23 Box Plots (YouTube video). This video introduces box plots: it demonstrates how to create a box plot and defines the vocabulary terms listed below. When you have finished viewing the video, return to this presentation. Vocabulary to understand box plots: Distribution; Median; Average (Mean); Extremes; Quartiles; Interquartile Range
24 Box Plot – Quartiles / Interquartile Range / Box and Whisker Plot. This webpage gives another look at the breakdown of box plots. Once you have read through the information, try answering the ten “Your turn” questions at the end of the page. (Tip: It will be helpful to have scrap paper available.) At this point, you should be able to: determine the lower, middle, and upper quartiles of a data set; calculate the interquartile range; construct a box and whisker plot to represent the data; compare box plots from two data sets and make observations about the distributions.
25 Boxplot (aka, Box and Whisker Plot) If you need additional information to understand boxplots, click on the link above and “View Video”, which gives more details on how to read a boxplot. When you have finished reading through Boxplots Basics and How to Interpret a Boxplot, return to this presentation.
26 Stem & Leaf Plot – Stemplots (aka, Stem and Leaf Plots). Click the blue button to view the video and then read the information on stem and leaf plots. For additional explanation about this type of graph, proceed to the next slide. Key points: Used to display quantitative data; Best used with small data sets; Shows the shape of the distribution; Stem values can have any number of digits; Each leaf is a single digit; Limited in displaying decimals.
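The stem and leaf rules above (a single-digit leaf, a stem of any length) can be sketched in Python for non-negative whole numbers: the leaf is the last digit and the stem is everything before it. This is a minimal illustration under that assumption; decimal data would first need to be scaled to whole numbers. The function name `stem_and_leaf` and the scores are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the lines of a stem-and-leaf plot for non-negative integers.

    The leaf is the last digit and the stem is everything before it, so a
    stem can have any number of digits while each leaf is a single digit.
    """
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    lines = []
    # Include empty stems so that gaps in the data stay visible
    for stem in range(min(plot), max(plot) + 1):
        leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines


# Hypothetical test scores
for line in stem_and_leaf([12, 15, 21, 27, 27, 34, 48, 49, 52]):
    print(line)
```

Reading down the stems shows the shape of the distribution at a glance, while every original value can still be recovered from its stem and leaf.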
27 Stem & Leaf Plot – Stem and Leaf Plots. This site provides additional details on “splitting the stems” and “splitting stems using decimal values”. You should now know: • under what circumstances stems should be split; • how to organize decimal data in a stem and leaf plot; • how to interpret data by looking at a stem plot.
28 Line Plot / Dot Plot – Line Plot (YouTube Video). View this video on how to make a line plot, then return to this presentation. Vocabulary: Clusters; Gaps; Outliers
29 Line Plot / Dot Plot – Dot Plot vs. Line Plot (YouTube Video). This YouTube video does a good job of describing the similarities and differences between a line plot and a dot plot. After watching it, return to this slide and click here to reinforce what you have learned about line and dot plots.
30 Pictographs – Picture Graph / Pictograph. Read the information on pictographs and then answer the nine “Your turn” questions at the bottom of this webpage. In a pictograph, symbols are used to display statistical data. Symbols can be misleading if they are not accurately proportioned or if they cannot be divided evenly to represent fractional parts.
31 Types of Graphs – Comparing Graphs. Test your understanding of the graphs covered in this unit. At this website, read through the problems and decide which graph most clearly represents the data and the information to be conveyed to the reader. Also, work through the five questions at the bottom of the page. Most data can be represented using multiple graphs. The most appropriate display should be chosen based on what information you want the reader to draw from the graph.
32 An Advanced Display of Data – Hans Rosling. Probably one of the most informative and modern displays of data can be seen in the work of Hans Rosling. The link above shows a video of his TED talk in …; it is a 20-minute video, and it gets very interesting about 4 minutes in. Watch it all if you have time, but we recommend at least 10 minutes. The point of this experience is not that we expect you to duplicate this extraordinary presentation, but that you appreciate the power of displaying data in a clear and understandable way. Any enhancement of a display should be for the purpose of clarity, not just distracting visuals.
33 How NOT to Display Data – Misleading Line Graphs by Khan Academy. The above link is by Salman Khan, founder of the Khan Academy. In his video (5 min) he highlights misleading visual displays in line graphs. Return to this presentation when finished. Often a data display can mislead the reader. Sometimes this is intentional, when the creator wants to persuade the reader in some way. Other times it is unintentional, when the creator tries to make the display more visually appealing and causes the reader to misinterpret the results.
34 How NOT to Display Data – Misleading Graphs by Wikipedia. The above link, by Wikipedia, shows various ways a graphic display can mislead its intended audience. Return to this presentation when finished. Typically, the displays we see are technically accurate, but they use visual “tricks” to mislead readers who may not pay close attention to the details of the display.
36 Measures of Center – Definitions of these terms. If you are not familiar with the terms listed below, follow the link above to familiarize yourself with them. (The above site includes other measures of center that are beyond the scope of this presentation.) Different ways to measure the center of data: Arithmetic mean (commonly called average or just mean); Median; Mode; Weighted mean
37 Measures of Center – Central Values. The link above gives some simple examples of measures of center and compares the mean, median, and mode. Check your understanding with the ten questions at the bottom of the page before returning. What is meant by “Measure of Center”? Sometimes we want to describe a group of data (numbers, values) by a single number. The advantage of this is that it makes different groups of data easier to compare. The disadvantage is that when you describe a data set by a single number, you lose the details and could mislead someone.
38 Arithmetic Mean. The mean of a set of data is found by adding all the data values and dividing the sum by the number of values (often referred to as “n”). Strengths: Its calculation includes all the data; It is common and more likely to be understood by others; It is often used in other statistical formulas.
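The definition above translates directly into a few lines of Python. This is a minimal sketch; the function name `arithmetic_mean` and the quiz scores are hypothetical, and Python's standard library already offers `statistics.mean` for everyday use.

```python
def arithmetic_mean(values):
    """Add all the data values, then divide by n, the number of values."""
    if not values:
        raise ValueError("the mean requires at least one data value")
    return sum(values) / len(values)


# Hypothetical quiz scores for five students
quiz_scores = [7, 8, 10, 6, 9]
print(arithmetic_mean(quiz_scores))  # 8.0
```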
39 Arithmetic Mean (cont.). Weaknesses: Sometimes you don’t know all the data values needed to calculate the mean (the data may be available only in a graph); For an extremely large data set, the mean may be tedious to calculate by hand; It can be influenced by outliers, values much larger or smaller than the rest of the data; It is often a value that is different from all of the data values. When best to use: The mean is best used when your data is continuous and symmetrical; It is often necessary for use in other statistical measures.
40 Lessons on Arithmetic Mean – How to Find the Mean. Visit the web site above to learn more about the arithmetic mean. After reading the lesson, be sure to check your understanding by answering the ten questions at the end. In case you missed it, be sure to check out the “mean machine”. Run this virtual machine to see the relationship between the data points and the mean value.
41 Wikipedia defines median. The web site above gives a very detailed definition of median. (Many of the examples are beyond the scope of this presentation.) The median of a set of data is found by arranging all the data in numerical order and then selecting the data point in the middle. If the data has an even number of values, the median is the mean of the two central values. Strengths: Requires little if any mathematical calculation; It is largely unaffected by outliers (extremely large or small data points); It can be approximated from a frequency distribution or a distribution graph.
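Here is a minimal Python sketch of the median rule above, including the even-count case. The function name `median` and the data are hypothetical (the standard library's `statistics.median` follows the same rule). The second example also previews the median's resistance to an outlier: adding 100 moves the median only from 7 to 8.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("the median requires at least one data value")
    mid = n // 2
    if n % 2 == 1:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2


print(median([9, 3, 7]))       # 7
print(median([9, 3, 7, 100]))  # 8.0
```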
42 Median (cont.). Weaknesses: Arranging a large set of data in order can be very tedious by hand. When best to use: The median is usually preferred when the data distribution is skewed; It can be used with ordinal data when the mean cannot be used.
43 Lessons on Median – How to Find the Median Value. Visit the web site above to learn more about the median. After reading the lesson, be sure to check your understanding by answering the ten questions at the end.
44 Comparing Mean & Median – Mean / Median Applet. The link above lets you see how the mean and median change as the data points change. The applet allows you to place and move data points on the line. Take some time to play with this applet and see how the mean and median change and compare. You can also check the “box plot” box to see how a boxplot would look for the data shown on the line. When you have finished, jot down the patterns you have observed and then return to this presentation.
45 Comparing Mean & Median – Seeing Statistics. Use the link above for a more comprehensive lesson on the attributes of, and differences between, the mean and median. The link will take you to an introduction to the web interface. When you think you are familiar with how to navigate the system, click on the icon in the left column. When the table of contents shows, click on lesson #3, “Describing the Center”. You can advance from one page to the next by clicking on the icon in the top right corner of the page. Return to this presentation when you finish.
46 Wikipedia defines mode. The web site above gives a very detailed definition of mode. (Many of the examples are beyond the scope of this presentation.) The mode of a set of data is found by identifying the data value that occurs most often. Many people remember this by associating the word “most” with mode. Strengths: Depending on the display of the data or the size of the data set, it is often easy to identify; It is the ONLY measure of center you can use for non-numeric data (nominal data). Example: What is the best measure of center for the eye color of this group of people?
47 Mode (cont.). Weaknesses: A data set can have two modes (bimodal) or even more (multimodal); Sometimes no value occurs more often than any other, so the data set has no mode; Sometimes the mode is nowhere near the center of the data. When best to use: It is the only measure of center valid with nominal data (Example: data on students’ eye colors); It can support the validity of the mean and median if it has a similar value. If the data is perfectly normal, mean = median = mode.
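A minimal Python sketch of the mode, under one assumed convention: return every value tied for the highest count, and return an empty list when all values tie (no value is more numerous than another). The function name `modes` and the data are hypothetical. Because it only counts occurrences, it works for nominal data such as eye colors.

```python
from collections import Counter


def modes(values):
    """Return all values that occur most often; [] if every value ties."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    winners = [value for value, count in counts.items() if count == top]
    # If every distinct value ties, none is more numerous: treat as "no mode".
    if len(winners) == len(counts) and len(counts) > 1:
        return []
    return winners


print(modes(["brown", "blue", "brown", "green", "brown"]))  # ['brown']
print(modes([1, 1, 2, 2, 3]))                               # [1, 2] (bimodal)
print(modes([1, 2, 3]))                                     # [] (no mode)
```

Python's built-in `statistics.multimode` follows a different convention: when every value ties, it returns all of them rather than an empty list.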
48 How to Find the Mode Value. Visit the web site above to learn more about the mode. After reading the lesson, be sure to check your understanding by answering the ten questions at the end.
49 Wikipedia defines weighted mean. The web site above gives a very detailed definition of weighted mean. (Many of the examples are beyond the scope of this presentation.) Sometimes certain values in a data set contribute more to a measure of center than other values. In this situation, we calculate a weighted mean: each value is multiplied by its weight, the products are added, and the sum is divided by the total of the weights.
50 Dr. Math explains weighted mean (weighted average). The web site above gives examples of calculating a weighted mean or weighted average. A simple example: Consider a university that teaches two classes. One class has 10 students, the other has 100 students. If you ask the university for the average (mean) class size, they respond with 55: (100 + 10) / 2. However, if you ask every student the size of their class and average those 110 answers, you get [(100 * 100) + (10 * 10)] / 110 = 10,100 / 110 ≈ 91.8. The 100 students in the larger class carry more WEIGHT than the 10 students in the smaller class.
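The class-size example can be checked with a short Python sketch of the weighted mean (the sum of each value times its weight, divided by the sum of the weights). The function name `weighted_mean` is hypothetical; the class sizes come from the example above.

```python
def weighted_mean(values, weights):
    """Sum of value * weight, divided by the total weight."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


class_sizes = [10, 100]
# University's view: each class counts once
print(weighted_mean(class_sizes, [1, 1]))                 # 55.0
# Students' view: each size is weighted by how many students are in that class
print(round(weighted_mean(class_sizes, [10, 100]), 1))    # 91.8
```

With equal weights the weighted mean reduces to the ordinary arithmetic mean, which is why the university's answer is simply 55.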
51 Weighted Mean. Visit the web site above to learn more about the weighted mean. After reading the lesson, be sure to check your understanding by answering the ten questions at the end.
53 Simple Exploration and Analysis of Data – Exploring Data. Read the first page in the link above to learn about the importance of thinking carefully about how to interpret what a data set can reveal. Measures of center, like the mean, median, and mode, give useful information about a data set, but much of the information is hidden by such single-number summaries. To understand the information in a complex or large data set, it is important to examine the integrity of the data, to look out for interesting and useful patterns, and to summarize the data skillfully. Patterns are likely to be found more easily in visual, rather than numerical, representations of the data. A single number is unlikely to summarize the data effectively.
54 Data Integrity – Outliers. Outliers are numbers in a data set that are very different from all the others. Read the information in the link above to learn more about outliers. Then work the problems at the end to test your understanding. Why do some data sets have outliers? Have these numbers been recorded incorrectly? Do they correspond to serious mistakes, e.g. in measurements? You should have found from the problems you worked that: Outliers don’t have much effect on the median; Outliers can have a big effect on the mean. How can we tell when a “suspicious” number is a “genuine” outlier rather than just being at the limits of what is “normal”? What, if anything, should we do about outliers? If we ignore them, will they cause problems for how we interpret the data? We’ll discuss some of these issues in the pages ahead.
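A quick Python check with the standard `statistics` module shows the pattern from the problems: adding one extreme value to a hypothetical data set pulls the mean from 15 up to about 25.7, while the median stays at 15. The data and the "mis-recorded" value are made up for illustration.

```python
import statistics

# Hypothetical measurements, plus one mis-recorded value
data = [12, 14, 15, 15, 16, 18]
with_outlier = data + [90]

mean_before = statistics.mean(data)
median_before = statistics.median(data)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print(f"mean:   {mean_before} -> {mean_after:.1f}")
print(f"median: {median_before} -> {median_after}")
```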
55 Patterns of Data. Open the link above to read about various ways to describe and identify patterns in data sets. To begin with, we will concentrate on finding ways to measure the spread of a data set. This will give us two numbers (a measure of center and a measure of spread) to use when we summarize a data set. Two is better than one. The interaction of center and spread is important. If a data set has a small spread, the values will be clustered closely around the center, so the center will represent the data values well.
56 Range. Discover what is meant by the range of a quantitative data set by reading the explanation in the link above and working through the activities. The range of a data set is the difference between the largest and smallest values. It is the simplest measure of the spread of the data. Strength: Very simple to compute. Weaknesses: Very sensitive to unreliable data or outliers; it is easy for an inaccurate measurement to be much bigger or smaller than the others, and this could have a big and misleading impact on the range; Only uses two data values, so a lot of information about the data set is lost.
57 Quartiles – Quartile Definition and Computation. Click the link above to read about quartiles and how to compute them. Then check your understanding by working the problem at the end. Half the values in a data set are at least as big as the median, and half the values are no bigger than the median. The first (or lower) quartile Q1 is essentially the median of the lower half of the data set; the third (or upper) quartile Q3 is essentially the median of the upper half of the data set. Controversy: There is no generally accepted convention for precisely how to compute the upper and lower quartiles. In fact, different calculators or software packages may give different results for the quartiles of the same data set.
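The following minimal Python sketch illustrates the controversy. It computes Q1 and Q3 with the median-of-halves description above (leaving out the middle value when n is odd, one common textbook choice), then compares the results with the two methods offered by the standard library's `statistics.quantiles`. The helper names `median_of` and `quartiles_by_halves` are hypothetical.

```python
import statistics


def median_of(sorted_vals):
    """Median of a list that is already sorted."""
    mid = len(sorted_vals) // 2
    if len(sorted_vals) % 2:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2


def quartiles_by_halves(values):
    """Q1 and Q3 as medians of the lower and upper halves (median excluded if n is odd)."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two data values")
    half = len(data) // 2
    lower = data[:half]
    upper = data[half + len(data) % 2:]
    return median_of(lower), median_of(upper)


data = [1, 2, 3, 4, 5, 6, 7, 8]
print(quartiles_by_halves(data))                            # (2.5, 6.5)
print(statistics.quantiles(data, n=4, method="exclusive"))  # [2.25, 4.5, 6.75]
print(statistics.quantiles(data, n=4, method="inclusive"))  # [2.75, 4.5, 6.25]
```

All three methods agree on the median (4.5) but give three different values for Q1 and Q3, which is exactly why calculators and software packages can disagree.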
58 Interquartile Range. Open the link above to learn what interquartile range (IQR) means and how to compute it. Don’t miss the embedded YouTube video. Then read more here and check your understanding by answering the ten questions at the end. The IQR, Q3 − Q1, tells you about the spread of a data set by focusing on the middle 50% of the data. Its value is the length of the box in a box and whisker plot. Advantages of the IQR as a measure of spread: Easy to compute; Much less likely than the range to be affected by outliers. When best to use: When you use the median as a measure of center, the IQR is a good measure of the spread of a distribution.
59 IQR and Outliers – Interquartile Range and Outliers. Visit the link above to find out how to use the interquartile range to identify outliers in a data set. Then try your hand at the problems in this set. A standard convention is that a number in a data set is an outlier if it is more than 1.5 IQRs below the first quartile Q1 or more than 1.5 IQRs above the third quartile Q3. It is important to understand that the choice of 1.5 IQRs is not specified by any theory. It is an arbitrary convention, but it has worked well for many years. An outlier can easily be spotted in a box and whisker plot whose whiskers extend to the smallest and largest values: look for a whisker that is more than one and a half times as long as the box.
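Here is a minimal Python sketch of the 1.5 × IQR rule (often called Tukey's fences). It assumes the median-of-halves convention for the quartiles, so results may differ slightly from a calculator that uses another convention; the function names and data are hypothetical.

```python
def median_of(sorted_vals):
    """Median of a list that is already sorted."""
    mid = len(sorted_vals) // 2
    if len(sorted_vals) % 2:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2


def iqr_outliers(values, k=1.5):
    """Values more than k * IQR below Q1 or above Q3."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two data values")
    half = len(data) // 2
    q1 = median_of(data[:half])
    q3 = median_of(data[half + len(data) % 2:])  # skip the middle value if n is odd
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    return [v for v in data if v < low_fence or v > high_fence]


# Q1 = 4.5, Q3 = 8.5, IQR = 4, so the fences are -1.5 and 14.5
print(iqr_outliers([2, 4, 5, 6, 7, 8, 9, 30]))  # [30]
```

Keeping `k` as a parameter makes the arbitrariness of the 1.5 convention explicit: a larger `k` flags fewer values.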
60 Five Number Summary. The link above explains how a box and whisker plot provides five numbers that conveniently summarize a data set. Five summary numbers allow a much richer analysis of a data set than a single measure of center. Practice problems here. The five number summary of a data set is based on a box and whisker plot. It consists of: The largest value (maximum); The third quartile; The median; The first quartile; The smallest value (minimum). It provides representative information about a data set that easily leads to a measure of center (the median) and measures of spread (the range and the IQR), while making outlier detection a simple matter of checking for over-long whiskers.
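As a minimal Python sketch, the five numbers can be computed directly, and the range and IQR fall out of them. Quartiles here use the median-of-halves convention (the middle value is left out of both halves when n is odd), which is only one of several conventions in use; the function names and data are hypothetical.

```python
def median_of(sorted_vals):
    """Median of a list that is already sorted."""
    mid = len(sorted_vals) // 2
    if len(sorted_vals) % 2:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2


def five_number_summary(values):
    """Min, Q1, median, Q3, max, with Q1/Q3 as medians of the lower/upper halves."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two data values")
    half = len(data) // 2
    return {
        "min": data[0],
        "Q1": median_of(data[:half]),
        "median": median_of(data),
        # Skip the middle value when n is odd
        "Q3": median_of(data[half + len(data) % 2:]),
        "max": data[-1],
    }


summary = five_number_summary([3, 7, 8, 5, 12, 14, 21, 13, 18])
print(summary)  # {'min': 3, 'Q1': 6.0, 'median': 12, 'Q3': 16.0, 'max': 21}
data_range = summary["max"] - summary["min"]  # 18
iqr = summary["Q3"] - summary["Q1"]           # 10.0
```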
61 Standard Deviation. So far we have measured the spread of a data set in ways (range, IQR) that go well with the median as a measure of center. Read the link above to see how to measure spread in a way (standard deviation) that is compatible with the mean as a measure of center. Then work the problems at the end. The deviation of a data value from the mean is the difference between the two. The variance of a data set is the average of the squares of the deviations, with a slight adjustment if the data are a sample drawn from a larger population (the sum of squared deviations is then divided by n − 1 instead of n). The standard deviation of a data set is the square root of its variance. It is usually better to avoid computing variance or standard deviation by hand. Many calculators have these computations pre-programmed, so it is easy to get the information once the data is entered.
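The steps above (deviations, squared deviations, average, square root) can be sketched in Python as follows. The `sample` flag switches between dividing by n − 1 (a sample) and by n (a whole population); the function names and data are hypothetical, and the result is compared with the standard library's `statistics.stdev`.

```python
import math
import statistics


def variance(values, sample=True):
    """Average squared deviation from the mean.

    sample=True divides by n - 1 (data are a sample from a larger population);
    sample=False divides by n (data are the entire population).
    """
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - 1 if sample else n)


def std_dev(values, sample=True):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(variance(values, sample))


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(std_dev(data, sample=False))       # 2.0
print(round(std_dev(data), 3))           # 2.138
print(round(statistics.stdev(data), 3))  # 2.138 (library agrees)
```

For everyday work, `statistics.stdev` (sample) and `statistics.pstdev` (population) do the same job without any hand-written code.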
62 Standard Deviation and Outliers – Standard Deviation Video. Here’s a video to help you understand the concept of standard deviation. Standard deviation behaves like an average distance of the data values from their mean. It is a measure of the spread of the data set: When this distance is small, most of the data values will be clustered around the mean, and the spread will be small; When this distance is large, many of the data values will be far from the mean, and the spread will be large. We viewed a data value as an outlier when it was “far away” in IQR terms. We shall also label a data value as an outlier when it is far away from the mean, as measured in standard deviations. One convention is that an outlier is at least 3 standard deviations from the mean, but this is a matter of debate, as is what you should do about outliers.
63 Distinguishing Between Data Sets – Anscombe's Quartet. It is important to realize that very different data sets can have identical (or nearly identical) means and standard deviations. In other words, they can have the same center and spread but look very different. The link above leads to Anscombe’s famous examples. Our objective is to understand and analyze our data sets. Because of Anscombe’s and similar examples, we know that simple number summaries cannot give complete answers to our questions. It is necessary to look for patterns in other ways.
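A minimal Python sketch with the standard `statistics` module makes the point concrete, using Anscombe's four data sets as they are commonly reproduced (sets I, II and III share the same x-values). Treat the hard-coded numbers as transcribed illustrative data; the link above remains the authoritative source. Up to rounding, every set reports the same mean of x, mean of y and standard deviation of y, yet plotting them gives four very different pictures.

```python
import statistics

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

for name, (xs, ys) in quartet.items():
    # Same summary numbers (to about two decimals) for all four sets
    print(name,
          "mean x:", statistics.mean(xs),
          "mean y:", round(statistics.mean(ys), 2),
          "SD y:", round(statistics.stdev(ys), 2))
```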
64 Challenging Problems. Before changing direction, here are some problems that you will need to think about carefully. If you get stuck, there are solutions posted.
65 Frequency Distributions. Consult the link above as well as this one to find out about frequency distributions. Make sure you confirm your understanding by working the problems. Sometimes a value occurs more than once in a data set. Its frequency is the number of times it appears in the list of values. By grouping the values into bins, if appropriate, and counting the total frequency in each bin, it is not difficult to create a frequency table that can be represented as a histogram.
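Here is a minimal Python sketch of the binning step for whole-number data: each value goes into a bin of equal width, the counts form a frequency table, and a row of # symbols per bin gives a rough text histogram. The function name `binned_frequencies`, the bin width, and the ages are hypothetical, and the sketch assumes every value is at least `start`.

```python
from collections import Counter


def binned_frequencies(values, bin_width, start):
    """Count the values falling in each bin [lo, lo + bin_width).

    Assumes every value is >= start; bins with no values get a count of 0.
    """
    bin_counts = Counter((v - start) // bin_width for v in values)
    table = {}
    for i in range(max(bin_counts) + 1):
        lo = start + i * bin_width
        table[(lo, lo + bin_width)] = bin_counts.get(i, 0)
    return table


# Hypothetical ages of ten survey respondents
ages = [21, 23, 25, 31, 34, 35, 38, 42, 47, 58]
table = binned_frequencies(ages, bin_width=10, start=20)
for (lo, hi), freq in table.items():
    # A text "histogram": one # per data value in the bin (labels assume whole numbers)
    print(f"{lo}-{hi - 1}: {'#' * freq} ({freq})")
```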
66 Relative Frequency Distributions. To create a relative frequency distribution, we proceed as for frequency distributions, but scale each frequency by dividing it by the total count of data values. This scaled frequency is the relative frequency. Read the link above and here, working the problems. Relative frequency distributions can be interpreted as probability distributions: the relative frequencies are all between 0 and 1 and add up to 1. Using relative frequency distributions allows us to compare data sets of different sizes on an equal basis. If we selected one data set of 100 measurements and another of 1000 measurements by taking samples from a massive collection, there shouldn’t be much difference in how each value compares with the others, regardless of which data set we examine. However, we should expect the frequency (e.g. 49) of one value in the second data set to be roughly 10 times its frequency (e.g. 5) in the first set. On the other hand, the relative frequencies should be similar (e.g. 0.049, or 4.9%, and 0.050, or 5.0%).
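Here is a minimal Python sketch of the scaling step, using counts that mirror the 100-versus-1000 example above (the split between “A” and “other” values is hypothetical). The raw frequencies differ by a factor of about 10, but the relative frequencies are nearly equal.

```python
def relative_frequencies(freqs):
    """Divide each frequency by the total count so the results sum to 1."""
    total = sum(freqs.values())
    return {value: count / total for value, count in freqs.items()}


# Hypothetical samples of 100 and 1000 measurements; "A" is the value of interest
small_counts = {"A": 5, "other": 95}
large_counts = {"A": 49, "other": 951}

small_rel = relative_frequencies(small_counts)
large_rel = relative_frequencies(large_counts)
print(small_rel["A"], large_rel["A"])  # 0.05 0.049
```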
67 Describing Data Patterns I – Describing Data Patterns. The link above describes some basic patterns that can occur in frequency distributions of data. These descriptions go beyond measuring center and spread and also focus on shape and unusual features. Some distributions are symmetric around the mean, which must then coincide with the median. (Why is this the case?) Some distributions are skewed, with a tail to the right or the left. Some distributions have more than one mode.
68 Describing Data Patterns II – Data Pattern Video. See whether you can use your knowledge from the previous slide to answer the questions in the YouTube video linked above. It turns out that distributions that are symmetric around the mean/median and that have a single mode that also coincides with the mean/median are the most important of all.
69 Characteristics of a Normal Distribution Consult the link above for basic facts about the normal distribution and how it arises. As always, work the problems at the end. Check out the YouTube video here for ways to test carefully and systematically whether or not your data really is normal.The continuous normal distribution has a distinctive bell shape, with the mode, mean and median all at the same place. The bell is wider when the standard deviation is greater, but the basic form is always the same.Approximately 68% of all values are within 1 standard deviation of the mean, 95% are within 2 standard deviations of the mean, and 99.7% are within 3 standard deviations of the mean.Normal distributions are continuous distributions that tend to arise in measuring heights, sizes, pressures, temperatures, and so on.
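The 68-95-99.7 percentages can be checked with Python's standard `statistics.NormalDist` (available since Python 3.8). In this minimal sketch the mean of 170 and standard deviation of 8 are hypothetical; any other values give the same shares, because only the distance from the mean in standard deviations matters.

```python
from statistics import NormalDist

heights = NormalDist(mu=170, sigma=8)  # hypothetical heights in cm

shares = []
for k in (1, 2, 3):
    low = heights.mean - k * heights.stdev
    high = heights.mean + k * heights.stdev
    # Area under the bell curve between mean - k*SD and mean + k*SD
    share = heights.cdf(high) - heights.cdf(low)
    shares.append(share)
    print(f"within {k} SD: {share:.4f}")
```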
70 Standard Normal Distribution. Consult the link above for basic facts about the standard normal distribution and work the problems at the end. A normal distribution is standard if it has mean 0 and standard deviation 1. All normally distributed data sets can be converted simply to a standard normal distribution. If the original data set has mean μ and standard deviation σ, the new data set created by replacing each old data value x with the new value z = (x − μ)/σ will have a standard normal distribution. It is common and useful to convert normal distributions to standard form; this makes comparisons much easier.
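The conversion z = (x − μ)/σ is easy to sketch in Python. This minimal example (hypothetical data and function name) uses the data set's own mean and population standard deviation, and shows that the resulting z-values have mean 0 and standard deviation 1. Standardizing only shifts and rescales the data without changing its shape, which is why normal data become standard normal.

```python
import statistics


def standardize(values):
    """Replace each x with z = (x - mu) / sigma, using the data set's own mean and SD."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(x - mu) / sigma for x in values]


data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, population SD 2
z = standardize(data)
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(statistics.mean(z), statistics.pstdev(z))
```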
71 Poisson Distribution. Consult the Wikipedia link above to fish for basic facts about the Poisson distribution. It is a discrete probability distribution that “expresses the probability of a given number of events occurring in a fixed interval of time and/or space if these events occur with a known average rate and independently of the time since the last event.” A Poisson distribution is a discrete distribution that takes on values that are 0 or a positive integer. The mean and variance of a Poisson distribution are always the same. When the mean/variance is small, the Poisson distribution is skewed with a long tail to the right. When the mean/variance is large, the Poisson distribution closely resembles the normal distribution with the same mean and variance.
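To see the equal-mean-and-variance property numerically, here is a minimal Python sketch that builds the Poisson probabilities P(k) = e^(−λ) λ^k / k! and sums them. The function names are hypothetical, and cutting the sum off at k_max = 100 is an assumption that is accurate only when λ is well below that cutoff.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k events when the average rate is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def poisson_mean_and_variance(lam, k_max=100):
    """Approximate the mean and variance by summing the pmf for k = 0..k_max.

    Truncating at k_max is an assumption; it is accurate when lam is much
    smaller than k_max, because the remaining tail is negligible.
    """
    probs = [poisson_pmf(k, lam) for k in range(k_max + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var


mean, var = poisson_mean_and_variance(3)
print(round(mean, 6), round(var, 6))  # 3.0 3.0
# Small mean: right-skewed, with P(0) = P(1) followed by a long tail
print([round(poisson_pmf(k, 1), 3) for k in range(6)])
```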
72 Binomial Distribution – Characteristics of a Binomial Distribution. Consult the Wikipedia link above to seek out basic facts about the binomial distribution. It is a discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p. A binomial distribution B(n, p) is a discrete distribution that takes on values that are 0 or a positive integer no greater than n. The mean is np and the variance is np(1 − p). A binomial distribution is not symmetric, except when p = ½. When n is large and both np and n(1 − p) are not too small, the binomial distribution closely resembles the normal distribution with the same mean and variance.
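Similarly, a minimal Python sketch can compute the binomial probabilities P(k) = C(n, k) p^k (1 − p)^(n − k) and confirm the formulas for the mean and variance. The function names and the choice n = 10, p = 0.3 are hypothetical; `math.comb` requires Python 3.8 or later.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def binomial_mean_and_variance(n, p):
    """Mean and variance computed directly from the pmf over k = 0..n."""
    probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
    mean = sum(k * pr for k, pr in enumerate(probs))
    var = sum((k - mean) ** 2 * pr for k, pr in enumerate(probs))
    return mean, var


n, p = 10, 0.3
mean, var = binomial_mean_and_variance(n, p)
print(round(mean, 6), round(n * p, 6))           # 3.0 3.0
print(round(var, 6), round(n * p * (1 - p), 6))  # 2.1 2.1
# Symmetric only when p = 1/2: P(2) equals P(8) for n = 10
print(math.isclose(binomial_pmf(2, 10, 0.5), binomial_pmf(8, 10, 0.5)))  # True
```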