Histograms and Probability Distribution
In mathematics, a probability distribution assigns to every interval of the real numbers a probability, so that the probability axioms are satisfied. A probability distribution is a special case of the more general notion of a probability measure. Every random variable gives rise to a probability distribution, and this distribution contains most of the important information about the variable. If X is a random variable, the corresponding probability distribution assigns to the interval [a, b] the probability P [a ≤ X ≤ b], i.e. the probability that the variable X will take a value in the interval [a, b].
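To make the interval probability P [a ≤ X ≤ b] concrete, here is a minimal sketch. It assumes, purely for illustration, that X is the result of one roll of a fair six-sided die; the names pmf and interval_probability are made up for this example.

```python
from fractions import Fraction

# Assumed example: X is the result of one roll of a fair six-sided die,
# so each face has probability 1/6.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def interval_probability(pmf, a, b):
    """Return P[a <= X <= b] by summing the probabilities of values in [a, b]."""
    return sum((p for x, p in pmf.items() if a <= x <= b), Fraction(0))

p_2_to_4 = interval_probability(pmf, 2, 4)
print(p_2_to_4)                         # 1/2
print(interval_probability(pmf, 1, 6))  # 1, the whole range has probability 1
```

The second line of output reflects one of the probability axioms: the probabilities of all possible values add up to 1.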
Probability distributions are basically of two types:
A discrete probability distribution is one in which the variable can take only a countable set of values, which can be listed. The month in which a person is born is a discrete variable because there are only 12 possible values (the 12 months of the year). The binomial and Poisson distributions are examples of discrete distributions.
In a continuous probability distribution, the variable is allowed to take on any value within a given range. The normal distribution is an example of a continuous distribution.
Histogram and Probability Distribution Curve
Histograms are bar charts in which the area of the bar is proportional to the number of observations having values in the range defining the bar. Just as we can construct histograms of samples, we can construct histograms of populations. The population histogram describes the proportion of the population that lies between various limits. It also describes the behavior of individual observations drawn at random from the population, that is, it gives the probability that an individual selected at random from the population will have a value between specified limits. When the area of a histogram is standardized to 1, the histogram becomes a probability density function. The area of any portion of the histogram (the area under any part of the curve) is the proportion of the population in the designated region.
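The standardization to an area of 1 can be sketched in a few lines. The sample below is an assumption (1000 simulated values, mean 170, standard deviation 10), and density_histogram is a hypothetical helper written for this illustration, not a library function.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable
# Assumed sample: 1000 draws from a normal distribution (mean 170, sd 10).
sample = [random.gauss(170, 10) for _ in range(1000)]

def density_histogram(values, n_bins):
    """Return (edges, heights) for equal-width bars whose areas sum to 1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The largest value is placed in the last bin.
        i = min(int((v - lo) / width), n_bins - 1)
        counts[i] += 1
    # Dividing by (number of values * bar width) makes each bar's AREA
    # equal to the proportion of values that fall in that bar.
    heights = [c / (len(values) * width) for c in counts]
    edges = [lo + k * width for k in range(n_bins + 1)]
    return edges, heights

edges, heights = density_histogram(sample, 20)
width = edges[1] - edges[0]
total_area = sum(h * width for h in heights)
print(round(total_area, 6))  # 1.0
```

Summing the areas of only some of the bars gives the proportion of the sample that lies in that region, which is the sense in which the standardized histogram behaves like a density.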
Strictly speaking, the histogram is properly a density, which states the proportion that lies between specified values. A (cumulative) distribution function is something else. It is a probability distribution curve whose value is the proportion with values less than or equal to the value on the horizontal axis, as the example to the left illustrates. Densities have the same name as their distribution functions. For example, a bell-shaped curve is a normal density. Observations that can be described by a normal density are said to follow a normal distribution.
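The difference between a density and a cumulative distribution function can be shown with a small example. This sketch assumes four tosses of a fair coin (an illustrative choice); the cumulative values are just running totals of the density values.

```python
from fractions import Fraction
from itertools import accumulate
from math import comb

n = 4  # assumed number of fair coin tosses, chosen for illustration

# Density: probability of exactly k heads, for k = 0..4.
density = [Fraction(comb(n, k), 2**n) for k in range(n + 1)]

# Cumulative distribution: probability of k heads OR FEWER (running total).
distribution = list(accumulate(density))

for k in range(n + 1):
    print(k, density[k], distribution[k])
```

The last cumulative value is 1, because every possible result is less than or equal to the largest one.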
Here are a few examples of histograms whose shape approaches the normal distribution. These histograms illustrate the probability distribution for the number of heads in various numbers of coin tosses (strictly, a binomial distribution). The x-axis here indicates the number of heads in a particular sequence of coin tosses; the y-axis represents the theoretical frequency of that result in the given number of fair tosses.
The histograms will have different sizes and shapes, because the frequency distribution changes with the number of tosses. All the histograms are perfectly symmetrical around the centre, where the tallest and therefore most frequent values lie. These diagrams represent probability distributions, or the frequency of results theoretically calculated. And since the probabilities of all the possible results add up to 1, the shaded area contained in all the columns equals 1.
This diagram indicates that in a three-coin-toss sequence (or three coins tossed simultaneously) there are four possible results: 0 heads, 1 head, 2 heads, and 3 heads (the values on the X-Axis). The percentage frequency of these four possibilities we read off the Y-Axis. We read the following diagrams in the same way: the number of heads on the X-Axis, and the percent probability on the Y-Axis. Notice the perfect symmetry in these distributions.
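The three-coin-toss figures can be checked by listing every possible sequence. The sketch below assumes fair coins, so that all eight sequences are equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 2**3 = 8 equally likely sequences of three fair tosses.
sequences = list(product("HT", repeat=3))

# Count how many sequences contain 0, 1, 2 and 3 heads.
heads_counts = Counter(seq.count("H") for seq in sequences)

# Probability of each result = matching sequences / all sequences.
distribution = {k: Fraction(heads_counts[k], len(sequences)) for k in range(4)}

for k, p in distribution.items():
    print(f"{k} heads: {float(p) * 100:.1f}%")
# 0 heads: 12.5%
# 1 heads: 37.5%
# 2 heads: 37.5%
# 3 heads: 12.5%
```

The symmetry is visible in the numbers: 0 and 3 heads are equally likely, as are 1 and 2 heads.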
Notice in the above histogram (for 20 coin tosses) how at the extremes (0, 1, 2, 18, 19, 20) the percent probability is so small that the value does not show on the graph. Virtually all the results in a 20 coin-toss sequence will fall between 3 and 17, with the most frequent value in the centre (at 10). The frequencies on either side of 10 are perfectly symmetrical (we can see that by the equal heights of 9 and 11, of 8 and 12, of 7 and 13, of 6 and 14, of 5 and 15, of 4 and 16, and of 3 and 17).
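The claims about the 20-toss histogram can be verified numerically. This is a sketch assuming fair tosses, using the standard binomial count (the number of ways to choose which k of the 20 tosses are heads) divided by the 2**20 equally likely sequences.

```python
from math import comb

n = 20  # number of fair coin tosses

# Probability of exactly k heads: C(n, k) sequences out of 2**n.
pmf = [comb(n, k) / 2**n for k in range(n + 1)]

extremes = [0, 1, 2, 18, 19, 20]
p_extremes = sum(pmf[k] for k in extremes)   # roughly 0.04% in total
p_3_to_17 = sum(pmf[3:18])                   # everything else

is_symmetric = all(pmf[k] == pmf[n - k] for k in range(n + 1))
most_likely = max(range(n + 1), key=lambda k: pmf[k])

print(f"extremes: {p_extremes:.4%}, 3 to 17: {p_3_to_17:.4%}")
print("symmetric:", is_symmetric, "most likely:", most_likely)
```

The six extreme results together account for well under a tenth of one percent, which is why their bars are too short to see.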
The Probability Distribution Curve
As the number of columns increases, the entire shape of the histogram begins to approximate a curve, with the shaded areas all under the top line. And, in fact, we can readily convert these histograms (using rectangles) to a curve by joining up the central points on the top of each column.
When we join up the columns in the histogram in this way, the result comes closer and closer, as the number of tosses grows, to a particularly useful statistical shape, the normal curve, which is a probability distribution curve.
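How closely the joined-up columns follow a normal curve can be checked directly. The sketch below assumes 20 fair tosses and compares each column height with a normal density that has the same mean (n/2) and standard deviation (the square root of n, divided by 2) as the number of heads.

```python
from math import comb, sqrt
from statistics import NormalDist

n = 20  # number of fair coin tosses (an assumed example)

# Column heights of the coin-toss histogram.
binomial = [comb(n, k) / 2**n for k in range(n + 1)]

# Normal curve with the same mean and standard deviation.
curve = NormalDist(mu=n / 2, sigma=sqrt(n) / 2)
normal = [curve.pdf(k) for k in range(n + 1)]

# Largest vertical gap between a column top and the curve.
max_gap = max(abs(b - c) for b, c in zip(binomial, normal))

for k in range(6, 15):
    print(k, round(binomial[k], 4), round(normal[k], 4))
print("largest gap:", round(max_gap, 4))
```

The two sets of heights agree to within a fraction of a percentage point, and the agreement generally improves as the number of tosses increases.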
The normal curve is a theoretical depiction of the distribution of frequencies of the values. It does not tell us that in any particular series of measurements of a normally distributed item exactly half must lie above and half below the mean. It indicates that any particular score has a 0.5 probability of lying above the mean and a 0.5 probability of lying below it, and that the average will fall in the centre of the distribution. Or, put another way, for any measured characteristic that is normally distributed (height, intelligence, weight, and so on are often treated as approximately normal), 50 percent of the population will be below the arithmetical average (the mean). It is not the case that in any distribution exactly 50 percent of the population will fall below the mean, but that must be the case if the frequency distribution is a normal curve.
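These points can be illustrated with a short sketch. The mean of 100 and standard deviation of 15 are assumed values chosen for the example, and the exponential distribution is brought in only as a contrasting, non-symmetrical case.

```python
from math import exp
from statistics import NormalDist, mean

# Assumed normally distributed scores: mean 100, standard deviation 15.
scores = NormalDist(mu=100, sigma=15)

# In theory, exactly half of a normal distribution lies below its mean.
p_below_mean = scores.cdf(100)
print(p_below_mean)  # 0.5

# A particular series of measurements need not split exactly in half.
sample = scores.samples(1001, seed=1)
sample_mean = mean(sample)
share_below = sum(x < sample_mean for x in sample) / len(sample)
print(round(share_below, 3))  # close to 0.5, but not exactly 0.5

# Contrast: for an exponential distribution with mean 1, the proportion
# below the mean is 1 - e**(-1), about 63 percent rather than 50.
p_below_mean_exponential = 1 - exp(-1)
print(round(p_below_mean_exponential, 3))  # 0.632
```

With 1001 values the sample cannot split exactly in half, which makes the gap between the theoretical curve and any actual series of measurements easy to see.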