Q.4 Explain the following terms with examples.
Course: Introduction to Educational Statistics
Course Code 8614
Topics
- Degree of Freedom
- Spread of Scores
- Sample
- When can we measure spread?
- Why do we measure spread?
- Confidence Interval
- Z Score
a) Degree of Freedom
In statistics, the number of degrees of freedom is the number of values in the final calculation of a statistic that are free to vary. The term comes from mechanics, where the number of independent ways in which a dynamic system can move, without violating any constraint imposed on it, is called the number of degrees of freedom. In other words, the number of degrees of freedom can be defined as the minimum number of independent coordinates that can specify the position of the system completely.
Estimates of statistical parameters can be based on different amounts of information or data. The number of independent pieces of information that go into the estimate of a parameter is called the degrees of freedom. In general, the degrees of freedom of an estimate of a parameter are equal to the number of independent scores that go into the estimate minus the number of parameters used as intermediate steps in the estimation of the parameter itself (e.g. the sample variance has N − 1 degrees of freedom, since it is computed from N random scores minus the 1 parameter estimated as an intermediate step, which is the sample mean).
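The sample variance example can be checked with a short Python sketch. The scores are made up for illustration and the function name sample_variance is only a label chosen here. The sketch shows that the deviations from the sample mean always add up to zero, so once N − 1 of them are known the last one is fixed.

```python
import statistics


def sample_variance(scores):
    """Sample variance using N - 1 degrees of freedom."""
    n = len(scores)
    mean = sum(scores) / n  # the one parameter estimated as an intermediate step
    deviations = [x - mean for x in scores]
    # The deviations sum to zero, so only n - 1 of them are free to vary.
    return sum(d * d for d in deviations) / (n - 1)


scores = [4, 8, 6, 5, 7]  # hypothetical test scores
mean = sum(scores) / len(scores)
deviation_total = sum(x - mean for x in scores)
variance = sample_variance(scores)
library_variance = statistics.variance(scores)  # also divides by N - 1
```

Here variance and library_variance should agree, because statistics.variance in the Python standard library also divides by N − 1.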
Mathematically, degrees of freedom is the number of dimensions of the domain of a random vector, or essentially the number of "free" components (how many components need to be known before the vector is fully determined). The term is most often used in the context of linear models (linear regression, analysis of variance), where certain random vectors are constrained to lie in linear subspaces, and the number of degrees of freedom is the dimension of the subspace. The degrees of freedom are also commonly associated with the squared lengths (or "sum of squares" of the coordinates) of such vectors, and the parameters of chi-squared and other distributions that arise in associated statistical testing problems.
While introductory textbooks may
introduce degrees of freedom as distribution parameters or through hypothesis
testing, it is the underlying geometry that defines degrees of freedom and is
critical to a proper understanding of the concept. Walker (1940) has stated
this succinctly as "the number of observations minus the number of
necessary relations among these observations."
b) Spread of Scores
Measures of spread describe how similar or how varied the set of observed values is for a particular variable (data item). Measures of spread include the range, quartiles, interquartile range, variance, and standard deviation.
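As a rough illustration, the measures listed above can be computed with Python's standard statistics module. The scores are hypothetical, and quartile values depend on the calculation method; this sketch simply uses the module's default ("exclusive") method, which is one of several conventions in use.

```python
import statistics

scores = [2, 4, 4, 5, 7, 9, 11]  # hypothetical scores

# Range: distance between the highest and the lowest value.
value_range = max(scores) - min(scores)

# Quartiles: the three cut points that split the ordered data into four parts.
q1, q2, q3 = statistics.quantiles(scores, n=4)

# Interquartile range: spread of the middle half of the data.
iqr = q3 - q1

# Sample variance and sample standard deviation (both use N - 1).
variance = statistics.variance(scores)
std_dev = statistics.stdev(scores)
```

The standard deviation is the square root of the variance, so it is expressed in the same units as the original scores.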
When can we measure spread?
The spread of the values can be
measured for quantitative data, as the variables are numeric and can be
arranged into a logical order with a low-end value and a high-end value.
Why do we measure spread?
Summarizing the dataset can help us understand the data, especially when the dataset is large. As discussed in the Measures of Central Tendency page, the mode, median, and mean summarize the data into a single value that is typical or representative of all the values in the dataset, but this is only part of the 'picture' that summarizes a dataset. Measures of spread summarize the data in a way that shows how scattered the values are and how much they differ from the mean value.
c) Sample
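A sample is a set of data collected or selected from a statistical population. For example, 50 students chosen from a school of 1,000 students form a sample of that school's students. The data sample may be drawn from a population without replacement (i.e. no element can be selected more than once in the same sample), in which case it is a subset of the population; or with replacement (i.e. an element may appear multiple times in the one sample), in which case it is a multisubset.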
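The two ways of drawing a sample can be sketched in Python with the standard random module. The population of student labels is invented for illustration, and the seed is fixed only so that the run is repeatable.

```python
import random

# Hypothetical population of six students.
population = ["S1", "S2", "S3", "S4", "S5", "S6"]

rng = random.Random(42)  # fixed seed so the example is repeatable

# Without replacement: no student can be picked twice (a subset).
without_replacement = rng.sample(population, k=4)

# With replacement: the same student may be picked more than once (a multisubset).
with_replacement = rng.choices(population, k=4)
```

random.sample never repeats an element, while random.choices may return the same element several times.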
d) Confidence Interval
In statistics, a confidence interval (CI) is a type of interval estimate (of a population parameter) that is computed from the observed data. The confidence level is the frequency (i.e., the proportion) of possible confidence intervals that contain the true value of their corresponding parameter.
In other words, if confidence intervals are constructed using a given confidence level in an infinite number of independent experiments, the proportion of those intervals that contain the true value of the parameter will match the confidence level. Confidence intervals consist of a range of values (interval) that act as good estimates of the unknown population parameter.
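The idea of repeated experiments can be approximated with a small simulation. This is only a sketch under stated assumptions: the population is taken to be normal with a known standard deviation, so a z-based interval for the mean is used, and the function names mean_ci and coverage as well as all numbers are chosen here for illustration.

```python
import random
import statistics


def mean_ci(sample, sigma, confidence=0.95):
    """z-based interval for a mean, assuming the population SD (sigma) is known."""
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    centre = statistics.fmean(sample)
    margin = z * sigma / len(sample) ** 0.5
    return centre - margin, centre + margin


def coverage(true_mean=50.0, sigma=10.0, n=30, trials=2000,
             confidence=0.95, seed=1):
    """Proportion of simulated intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        low, high = mean_ci(sample, sigma, confidence)
        if low <= true_mean <= high:
            hits += 1
    return hits / trials


observed = coverage()
```

With many trials, observed should come out close to the chosen confidence level of 0.95, although it will not match it exactly in a finite simulation.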
However, the interval computed from a particular sample does not necessarily include the true value of the parameter. Since the observed data are random samples from the true population, the confidence interval obtained from the data is also random. If a corresponding hypothesis test is performed, the confidence level is the complement of the level of significance; for example, a 95% confidence interval reflects a significance level of 0.05. If it is hypothesized that a true parameter value is 0 but the 95% confidence interval does not contain 0, then the estimate is significantly different from zero at the 5% significance level.
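The link between a 95% confidence interval and a test at the 5% significance level can be illustrated as follows. The data are hypothetical, and the sketch assumes a known population standard deviation so that a z-based interval and a two-sided z-test can be used.

```python
from statistics import NormalDist, fmean

differences = [1.2, 0.8, 1.5, 0.3, 1.1, 0.9, 1.4, 0.6]  # hypothetical data
sigma = 0.5  # assumed known population standard deviation

n = len(differences)
estimate = fmean(differences)
standard_error = sigma / n ** 0.5

# 95% confidence interval for the true mean.
z_crit = NormalDist().inv_cdf(0.975)
low = estimate - z_crit * standard_error
high = estimate + z_crit * standard_error
interval_excludes_zero = not (low <= 0 <= high)

# Two-sided z-test of the hypothesis that the true mean is 0.
z_stat = estimate / standard_error
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))
significant = p_value < 0.05
```

For this kind of interval the two decisions agree: the interval excludes 0 exactly when the test is significant at the 5% level.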
The desired level of confidence is set by the researcher (it is not determined by the data). Most commonly, the 95% confidence level is used, although other levels such as 90% and 99% are also used. Factors affecting the width of the confidence interval include the size of the sample, the confidence level, and the variability in the sample. A larger sample size will normally give a narrower interval and therefore a more precise estimate of the population parameter. Confidence intervals were introduced to statistics by Jerzy Neyman in a paper published in 1937.
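How these three factors change the width of an interval can be seen in a small sketch. It assumes a z-based interval for a mean with a known standard deviation; the function name ci_width and the numbers are illustrative only.

```python
from statistics import NormalDist


def ci_width(sigma, n, confidence=0.95):
    """Full width of a z-based confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return 2 * z * sigma / n ** 0.5


base = ci_width(sigma=10, n=25)                           # reference case
larger_sample = ci_width(sigma=10, n=100)                 # four times the sample size
higher_confidence = ci_width(sigma=10, n=25, confidence=0.99)
more_variability = ci_width(sigma=20, n=25)               # twice the spread
```

Under these assumptions, four times the sample size halves the width, while a higher confidence level or more variability makes the interval wider.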
e) Z Score
Simply put, a z-score is the number of standard deviations a data point lies from the mean. More technically, it is a measure of how many standard deviations below or above the population mean a raw score is. A z-score is also known as a standard score, and it can be placed on a normal distribution curve. In a normal distribution almost all z-scores fall between -3 standard deviations (the far left of the curve) and +3 standard deviations (the far right of the curve), although more extreme values are possible. To compute a z-score, you need to know the population mean μ and the population standard deviation σ.
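In symbols, the standard formula is z = (x − μ) / σ, where x is the raw score. A minimal Python sketch, with a hypothetical test whose mean is 70 and whose standard deviation is 10:

```python
def z_score(raw_score, mu, sigma):
    """Number of standard deviations that raw_score lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (raw_score - mu) / sigma


# Hypothetical test: population mean 70, population standard deviation 10.
z_above = z_score(85, mu=70, sigma=10)  # 1.5 standard deviations above the mean
z_below = z_score(55, mu=70, sigma=10)  # 1.5 standard deviations below the mean
```

A positive z-score means the raw score is above the mean, and a negative z-score means it is below the mean.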
Z-scores are a way to compare results from a test to a “normal” population. Results from tests or surveys can take thousands of possible values and units, and on their own those values can seem meaningless. For example, knowing that someone’s weight is 150 pounds might be good information, but if you want to compare it to the “average” person’s weight, looking at a vast table of data can be overwhelming (especially if some weights are recorded in kilograms). A z-score can tell you where that person’s weight lies relative to the population’s mean weight.
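The weight example can be sketched in Python. The population mean of 170 pounds and standard deviation of 20 pounds are invented for illustration; only the 150 pound weight comes from the text. The sketch also shows that the z-score is the same whether the weight is recorded in pounds or in kilograms.

```python
LB_PER_KG = 2.20462  # pounds in one kilogram (approximate)


def z_score(raw_score, mu, sigma):
    """Number of standard deviations that raw_score lies from the mean mu."""
    return (raw_score - mu) / sigma


# Hypothetical population figures, in pounds.
mean_lb, sd_lb = 170.0, 20.0

# The same population figures expressed in kilograms.
mean_kg, sd_kg = mean_lb / LB_PER_KG, sd_lb / LB_PER_KG

weight_lb = 150.0
weight_kg = weight_lb / LB_PER_KG

z_from_pounds = z_score(weight_lb, mean_lb, sd_lb)
z_from_kg = z_score(weight_kg, mean_kg, sd_kg)
```

With these assumed figures the person is one standard deviation below the mean weight, and the result does not depend on the unit used.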