This guide is intended to support the data analysis work that is an integral part of the Measurement and Evaluation Course. It is essential to acquire a firm grasp of the basics (descriptive statistics) since they will be used throughout the course for a wide array of analytical purposes.
Following the presentation of ways to modify data, information specific to various statistics is provided. Each section gives both context (when to use a statistic) and the menu paths to follow within PASW to execute the analysis.
"Through and through the world is infested with quantity: To talk sense is to talk quantities. It is no use saying the nation is large - how large? It is no use saying that radium is scarce - how scarce? You cannot evade quantity. You may fly to poetry and music, and quantity and number will face you in your rhythms and your octaves."
When you first open SPSS you will notice two tabs at the bottom of the screen. One is the data view; the other is the variable view.
Data View: From the data view you enter your data. Each horizontal row holds the data pertaining to one individual. Each vertical column holds the data pertaining to one variable.
Variable View: From the variable view, you provide information pertaining to each variable in your data set. This will include providing:
Variable name: Should be a short descriptive name - no spaces permitted
Variable label: A longer descriptive phrase to describe the variable.
Value labels: For categorical data (e.g. gender) where the numbers represent categories, the values column is where you specify which category each number represents. For example Male = 0; Female = 1.
Often, because of the nature of the group from which measures were obtained, analyses of a subset of the entire group are of interest. When this is the case, you first identify the subset (select cases) and then proceed with the analysis.
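Outside PASW, the same two-step logic (identify the subset, then analyze it) can be sketched in a few lines of Python. This is only an illustration: the records, the variable names, and the 0 = Male / 1 = Female coding are assumptions made up for the example, not data from the course.

```python
# Hypothetical records; variable names and values are illustrative only.
cases = [
    {"id": 1, "gender": 0, "exercise_days": 3, "cholesterol": 190},
    {"id": 2, "gender": 1, "exercise_days": 5, "cholesterol": 172},
    {"id": 3, "gender": 0, "exercise_days": 1, "cholesterol": 221},
    {"id": 4, "gender": 1, "exercise_days": 2, "cholesterol": 204},
]

# Step 1: identify the subset (here, cases coded 1 = Female).
females = [case for case in cases if case["gender"] == 1]

# Step 2: run the analysis on the selected cases only.
mean_cholesterol = sum(c["cholesterol"] for c in females) / len(females)
print(len(females), mean_cholesterol)
```

The full list `cases` is left untouched, which mirrors the need to reselect all cases before any analysis that requires the whole group.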
Some of the analyses to be conducted may need to be repeated on all groups that make up a variable (e.g. gender: males/females). For example you may want to look at the correlation between exercise frequency and cholesterol level for men then for women. You could of course use the procedure above first for the males then repeat for females. However, the split file feature lets you do the two analyses at the same time.
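The idea behind split file, running one analysis separately for each group in a single pass, can be sketched as follows. The data are invented for illustration, and `pearson_r` is a small helper written for this example rather than anything PASW provides.

```python
from collections import defaultdict
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation computed from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical rows: (gender, exercise days per week, cholesterol level).
rows = [
    ("male", 1, 230), ("male", 3, 210), ("male", 5, 190),
    ("female", 2, 200), ("female", 4, 195), ("female", 6, 170),
]

# Split the file: collect the (x, y) pairs for each gender.
groups = defaultdict(list)
for gender, exercise, cholesterol in rows:
    groups[gender].append((exercise, cholesterol))

# Run the same analysis once per group.
results = {}
for gender, pairs in groups.items():
    xs, ys = zip(*pairs)
    results[gender] = pearson_r(xs, ys)

for gender, r in results.items():
    print(gender, round(r, 3))
```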
Regardless of the nature of the variable, it is often useful to condense information before reporting it. For example, assume you collected information on years of education in 5 categories (< High School, High School, some college, Bachelor's degree, > Master's degree) but only wanted to report the proportion of people with no college work and the proportion with at least some college work. You would not want to alter the original variable, so you would first create a new variable and then recode the new variable.
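A minimal sketch of recoding into a new variable, assuming the five education categories were entered as the codes 1 through 5 in the order listed (that coding is an assumption for this example). The original variable is left untouched.

```python
# Hypothetical codes: 1 = < High School, 2 = High School, 3 = some college,
# 4 = Bachelor's degree, 5 = highest category.
education = [1, 2, 3, 5, 4, 2, 1, 3]

# Recode map for the new variable: 0 = no college work, 1 = at least some.
recode_map = {1: 0, 2: 0, 3: 1, 4: 1, 5: 1}

# Create the new variable; `education` itself is not modified.
any_college = [recode_map[code] for code in education]

proportion_some_college = sum(any_college) / len(any_college)
proportion_no_college = 1 - proportion_some_college
print(proportion_no_college, proportion_some_college)
```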
In situations where you have component information and need, for example, a total for each individual, a new variable needs to be created. This is easily done within the Transform menu.
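Computing a new variable from components amounts to applying the same arithmetic to every case. A short sketch, with hypothetical component names and scores:

```python
# Hypothetical component scores for three individuals.
people = [
    {"id": 1, "pushups": 20, "situps": 35, "pullups": 5},
    {"id": 2, "pushups": 15, "situps": 40, "pullups": 3},
    {"id": 3, "pushups": 30, "situps": 50, "pullups": 8},
]

# Create the new variable "total" for each individual.
for person in people:
    person["total"] = person["pushups"] + person["situps"] + person["pullups"]

totals = [person["total"] for person in people]
print(totals)
```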
To obtain a listing of all variable information (e.g. labels, names) contained in the variable view:
Notice that the information produced in the output file is essentially the same as that in the variable view. The information will be displayed in two parts: the Variable Information and the Variable Values.
Summarizing group information is typically the first step in the search for patterns, highlights, and meaning in a data set. Summary information can be presented both visually with the use of graphs and in the form of summary statistics. This section will focus on:
| Level of Measurement | Applicable Statistics |
| Nominal/Categorical | Percentages, Mode |
| Ordinal | Percentages, Mode, Median* |
| Interval | Mean, Median, Mode, Standard Deviation, Range |
| Ratio | Mean, Median, Mode, Standard Deviation, Range |
It is very important to conduct analyses appropriate to the level of measurement of your data.
For categorical and ordinal data the construction of frequency distribution tables is an excellent way to summarize group information.
If you were to make a frequency distribution table by hand you would simply list each category/value observed followed by a count (also called absolute frequency) of the number of individuals in that category. An additional column called the relative frequency is often useful since it notes the percentage of the group in a particular category. For example:
| Gender | f | rf |
| Male | 28 | 48% |
| Female | 30 | 52% |
f: absolute frequency - count
rf: relative frequency - (count / N) x 100 - record as %
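The hand calculation above can be checked with a short sketch that reproduces the gender table: each category is counted (f) and then expressed as a percentage of N (rf). The list of 58 entries is built only to match the counts in the example table.

```python
from collections import Counter

# Rebuild the example group: 28 males and 30 females (N = 58).
gender = ["Male"] * 28 + ["Female"] * 30

counts = Counter(gender)   # absolute frequency (f) per category
n = len(gender)            # N, the total number of cases

# Relative frequency (rf) = count / N * 100, rounded to a whole percent.
table = {category: (f, round(100 * f / n)) for category, f in counts.items()}

for category, (f, rf) in table.items():
    print(f"{category}: f = {f}, rf = {rf}%")
```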
To get a frequency distribution table for all cases in the data file:
To get a frequency distribution table for a subset of cases in the data file:
With subgroup now selected:
Remember to go back through the Data menu to reselect all cases before starting analyses where all cases are needed.
| Note: You would not construct frequency distribution tables for continuous data when the intent is to summarize information. The reason is that such data can take on a great number of values, and since each value is listed in a frequency distribution table, little summary may be accomplished. Measures of Central Tendency and Variability are much more useful in summarizing group information for continuous data. |
Following entry of data into the PASW spreadsheet it is important to check for errors. For example, consider the variable GENDER with value labels of 1 for male and 2 for female. It is reasonable to assume that a typing error could result in entries other than 1 or 2. One way to detect this error is to have PASW produce a frequency distribution table for this variable. It might look like this:
| Gender | frequency |
| Male | 35 |
| Female | 41 |
| 3 | 6 |
| 6 | 2 |
This table makes it clear that 8 of the entries are erroneous. For six subjects the value 3 was entered for gender and for another two subjects the value 6 was entered. With the errors detected, you would use the search feature in PASW to find these data entry errors and correct them.
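The same check can be sketched in Python: count every value that was entered, then flag anything outside the set of valid codes. The entries below are constructed to match the example table; the row positions returned play the role of the search feature for locating the bad entries.

```python
from collections import Counter

valid_codes = {1, 2}  # 1 = male, 2 = female

# Entries constructed to match the example table above.
gender = [1] * 35 + [2] * 41 + [3] * 6 + [6] * 2

counts = Counter(gender)

# Any value that is not a valid code is a data entry error.
invalid = {code: f for code, f in counts.items() if code not in valid_codes}

# Positions of the erroneous entries, so they can be found and corrected.
bad_rows = [i for i, code in enumerate(gender) if code not in valid_codes]

print(invalid, len(bad_rows))
```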
To get a frequency distribution table for all variables and all cases in the data file:
When data entry errors are located but you cannot correct them, identify that number as a missing value in the variable view so PASW does not use it in any analyses. If you identify values that appear incorrect but only for select cases, then enter a blank in place of the value you deem inappropriate in the spreadsheet view of the data. For example, consider the situation where you have obtained two heart rates: one resting and the other one minute after jogging in place. If for one of the cases the two values were 128 and 128, that seems likely to be an error, since the resting heart rate is quite high and the exercise heart rate is unlikely to be the same as the resting heart rate. If you don't have access to the original data to re-enter the correct values, then you need to make these values missing. But since 128 may be a legitimate value for some other cases, you can't just define it as a missing value. You need to go into the spreadsheet, find this case, and delete each 128, leaving blank cells for these two variables for this particular case.
| Note: Constructing frequency distribution tables for every variable for the purpose of error checking is important to complete prior to initiating any analytical work. |
For categorical and ordinal data the construction of crosstabulation tables is an excellent way to cross-reference summary information for two or more variables.
If you were to make a crosstabulation table by hand you would in rows list each category/value of one variable and in columns list each category/value of a second variable. The table then would contain a count of the number of individuals in cells representing the various combinations of values for the two variables. For example, you might want to combine in one table gender (categorical) and age group (ordinal).
| Gender | Age Group 20-25 | Age Group 26-30 | Age Group 31-35 |
| Male | 28 | 20 | 15 |
| Female | 30 | 18 | 20 |
From this table you can see that 28 of the subjects were male and in the youngest age group, and 18 of the subjects were female and in the middle age group.
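A crosstabulation is a count of cases for every combination of two variables. The sketch below rebuilds the example table from individual (gender, age group) pairs; the pairs are generated only to match the cell counts shown above.

```python
from collections import Counter

# One (gender, age group) pair per subject, matching the example table.
observations = (
    [("Male", "20-25")] * 28 + [("Male", "26-30")] * 20 + [("Male", "31-35")] * 15
    + [("Female", "20-25")] * 30 + [("Female", "26-30")] * 18 + [("Female", "31-35")] * 20
)

# Each cell of the crosstabulation is the count of one combination.
cells = Counter(observations)

age_groups = ["20-25", "26-30", "31-35"]
for gender in ("Male", "Female"):
    row = [cells[(gender, age)] for age in age_groups]
    print(gender, row, "row total:", sum(row))
```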
| Note: You would not construct crosstabulation tables for continuous data when the intent is to summarize information. The reason is that such data can take on a great number of values and each value would be listed in a crosstabulation table. Therefore little summary may be accomplished. Measures of Central Tendency and Variability are much more useful in summarizing group information for continuous variables. |
Measures of central tendency summarize data by identifying where the center of a distribution of scores is. Measures of variability summarize data by quantifying the spread or dispersion of scores around the center.
For categorical and ordinal data with few categories, the Mode (though not an optimal measure) is an acceptable measure of central tendency and the range is an appropriate measure of variability. Frequently however, such data is best summarized with a frequency distribution table.
For data at least interval scaled, the Median and Mean are appropriate measures of central tendency. If the distribution of scores is skewed the Median is the best measure of central tendency. The most common measure of variability is the standard deviation and is appropriate for use with data at least interval scaled.
In addition to being used to summarize a data set, measures of central tendency and variability are critical components of other statistical procedures.
Using the frequencies option in PASW:
If working with interval or ratio data and the data is normally distributed you can obtain the mean and standard deviation from the descriptives option in PASW:
REMEMBER, you must check the shape (obtain histogram under graphs option) of the distribution of scores to decide what measure of central tendency is appropriate. If the shape is skewed then you need to obtain a median.
To get measures of central tendency and variability for continuous measures on subgroups of your sample,
REMEMBER, you must check the shape (obtain histograms under graphs option) of the distribution of scores for each group to decide what measure of central tendency is appropriate. If the shape is skewed for either group then you need to obtain medians.
Z scores are a type of standardized score. Their particular feature is that they have a mean of zero and standard deviation of one. Standard scores tell you how many standard deviation units above or below the mean a value falls.
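As a worked illustration, a z score is the raw score minus the group mean, divided by the standard deviation. The scores below are invented, and the sketch uses the population standard deviation (dividing by N) so the numbers come out evenly; that choice is an assumption of this example, and software that divides by N - 1 will give slightly different values.

```python
from statistics import mean, pstdev

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical raw scores

m = mean(scores)    # group mean
s = pstdev(scores)  # standard deviation (population form, divides by N)

# z = (raw score - mean) / standard deviation
z_scores = [(x - m) / s for x in scores]

for x, z in zip(scores, z_scores):
    print(x, z)
```

A raw score of 9 here has z = 2.0, meaning it falls two standard deviation units above the mean.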
Percentiles (the raw score below which a given percentage of the group falls) and percentile ranks (the percentage of a group scoring below a particular value) are useful for conveying relative information about an individual. Approximate percentile ranks are readily available from PASW in the frequency distribution table output (in the cumulative frequency column). Specific percentiles can be requested under the statistics option under frequency distribution tables.
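Using the definition given here (percentage of the group scoring below a particular value), a percentile rank can be sketched as follows. The scores and the function name `percentile_rank` are made up for the example; note that a cumulative percentage column typically also counts cases at the value itself, which is one reason the table values are only approximate.

```python
def percentile_rank(scores, value):
    """Percentage of the group scoring strictly below `value`."""
    below = sum(1 for s in scores if s < value)
    return 100 * below / len(scores)


scores = [55, 60, 65, 70, 70, 75, 80, 85, 90, 95]  # hypothetical test scores

rank_of_80 = percentile_rank(scores, 80)
print(rank_of_80)
```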
There are several types of correlation coefficients to choose from. The choice is based on the nature of the data being correlated.
| Pearson Product Moment Correlation | Use when both variables have continuous data |
| Phi | Use when both variables have dichotomous data |
| Kendall's Tau | Use when both variables have ordinal data |
| Point Biserial Correlation | Use when one variable has continuous data and the other a true dichotomy |
The PPMC can be used to describe the strength and direction of the linear relationship between two continuous variables. When two variables are not linearly related, the PPMC is likely to underestimate the true strength of the relationship. A graph of the x and y values can show whether or not the relationship is linear.
Kendall's Tau can be used to describe the strength and direction of the relationship between two ordinal variables. It is a rank-order correlation coefficient (as is Spearman's rho) and can convey the extent to which pairs of values (x,y) are in the same rank order.
The Point Biserial Correlation can be used to describe the strength of the relationship between one continuous variable and one dichotomous variable. The point biserial correlation coefficient is useful in detecting a pattern in group measures (e.g., one group's scores tending to be higher than another group's). The sign carries little meaning; it only indicates which group tended to have higher scores. The point biserial coefficient is a signed number between -1 and 1 where zero represents no relationship.
The computational formula for the point biserial coefficient is
Where:
X0 = mean of x values for those in category 0
X1 = mean of the x values for those in category 1
Sx = standard deviation of all x values
P0 = proportion of people in category 0
P1 = proportion of people in category 1
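One common way to combine the components defined above is r_pb = ((X1 - X0) / Sx) x sqrt(P0 x P1). The sketch below computes that form on invented data and cross-checks it against an ordinary Pearson correlation between the 0/1 variable and x. Two assumptions are made here and should be checked against the course formula: the order of subtraction (X1 - X0, which only affects the sign), and that Sx is the standard deviation computed with N in the denominator. If Sx is taken from output that divides by N - 1, multiply it by sqrt((N - 1) / N) first.

```python
from math import sqrt
from statistics import mean, pstdev

# Hypothetical data: group membership (0/1) and a continuous score x.
group = [0, 0, 0, 0, 1, 1, 1, 1]
x = [10, 12, 14, 16, 15, 17, 19, 21]

x0 = [xi for xi, g in zip(x, group) if g == 0]
x1 = [xi for xi, g in zip(x, group) if g == 1]

mean0, mean1 = mean(x0), mean(x1)  # X0 and X1
s_x = pstdev(x)                    # Sx, assumed to divide by N
p0 = len(x0) / len(x)              # P0
p1 = len(x1) / len(x)              # P1

r_pb = (mean1 - mean0) / s_x * sqrt(p0 * p1)

# Cross-check: Pearson correlation between the 0/1 codes and x.
mg, mx = mean(group), mean(x)
sxy = sum((g - mg) * (xi - mx) for g, xi in zip(group, x))
sgg = sum((g - mg) ** 2 for g in group)
sxx = sum((xi - mx) ** 2 for xi in x)
pearson = sxy / sqrt(sgg * sxx)

print(round(r_pb, 4), round(pearson, 4))
```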
To obtain the components you need from PASW so you can do Point Biserial by hand, you would:
Graphs are the visual counterparts to descriptive statistics and are very powerful mechanisms for revealing patterns in a data set. In addition, when used appropriately in a report they can highlight trends and summarize pertinent information in a way no amount of text could.
When summarizing categorical data, pie or bar charts are the most efficient and easy to interpret though line graphs may be more helpful particularly at times when trying to draw attention to trends in the data. For continuous data, histograms are a good choice, easily constructed and simple to interpret. When attempting to represent visually the relationship between two continuous variables a scattergram can be used.
To create simple bar or pie charts for categorical and ordinal (with few categories) data:
To create a scattergram (two continuous variables):
To create a histogram (continuous variable) you can work from the frequencies option
To create histograms (continuous variable) for subsets of a group:
Following administration of an exam comprised of multiple choice items, statistical examination of the quality of the items with respect to difficulty and ability to discriminate among ability levels can be done using the correlation statistic.
Of interest is what proportion of the group got the item correct. While PASW does not provide this information directly, the proportion can be easily obtained provided you have coded correct as one and incorrect as zero.
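With correct coded as 1 and incorrect as 0, the proportion answering correctly is simply the mean of the item. A minimal sketch with invented responses:

```python
# Hypothetical responses to one item: 1 = correct, 0 = incorrect.
item_responses = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]

# The mean of a 0/1 variable is the proportion of ones.
proportion_correct = sum(item_responses) / len(item_responses)
print(proportion_correct)
```

This is why requesting the mean of each item gives the proportion of the group that got the item correct.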
Depending on the type and purpose of a test, the criterion-related validity of scores can be examined from one or more of several perspectives. The two situations covered in this class are:
Concurrent validity is examined when you are interested in the extent to which a particular measure is as good as an already established criterion known to provide valid and reliable data. You determine this by correlating your scores (x is continuous) with scores or classifications from a criterion measure (y).
The process would entail:
Predictive validity is examined when you are interested in the extent to which a particular measure is a good predictor of another variable. You determine this by correlating your scores (x is continuous) with scores or classifications from the measure you are trying to predict (y).
Depending on the type and purpose of a test, the criterion-related validity of classifications can be examined from one or more of several perspectives. The two situations covered in this class are:
The concurrent validity of classifications is examined when you are interested in the extent to which classifications (master/non master) are correct. You determine this by correlating your classifications (x) with classifications or scores from a criterion measure (y).
The predictive validity of classifications is examined when you are interested in the extent to which classifications are good predictors of another set of classifications or scores. You determine this by correlating your classifications (x) with classifications or scores from a criterion measure (y).
The primary concern here is the accuracy of measures. Reducing sources of measurement error is the key to enhancing the reliability of the data.
Reliability is typically assessed in one of two ways:
To estimate reliability you need 2 or more scores (or classifications) per person.
| Note: When interpreting coefficient alpha or the intraclass R, a value > .70 reflects good reliability. |
If cognitive and motor skills/physiological measures are collected at one time only, the most common way of getting 2 scores per person is to split the measures in half, usually by odd/even or first half/second half by time or trials.
You now have the reliability of scores for the half-length test. To get reliability for the full-length test, use the Spearman-Brown prophecy formula:
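The Spearman-Brown prophecy formula in its general form is r_new = (k x r) / (1 + (k - 1) x r), where r is the reliability you have and k is the factor by which test length changes. Stepping up from a half-length to a full-length test means k = 2, which gives 2r / (1 + r). A small sketch follows; the function name and the example value of .60 are illustrative only.

```python
def spearman_brown(r, k=2):
    """Estimated reliability when test length is multiplied by k.

    r: reliability of the current (e.g., half-length) test
    k: length factor; k = 2 steps a half-length test up to full length
    """
    return (k * r) / (1 + (k - 1) * r)


half_test_reliability = 0.60  # hypothetical value from the split halves
full_test_reliability = spearman_brown(half_test_reliability)
print(round(full_test_reliability, 2))
```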

If every individual can be measured twice on the variable you're interested in then you readily have data from which reliability can be examined.
Once you have 2 scores per person the question is how consistent overall were the scores.
In many situations reliability has been estimated incorrectly using the Pearson correlation coefficient. This is not appropriate since (1) the PPMC is meant to show the relationship between two different variables - not two measures of the same variable, and (2) the PPMC is not sensitive to fluctuations in test scores. The PPMC is an interclass coefficient; what is needed is an intraclass coefficient. The most commonly used reliability coefficients are the intraclass R calculated from values in an analysis of variance table and coefficient alpha.
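For reference, coefficient alpha can be written as alpha = (k / (k - 1)) x (1 - sum of the variances of the k measures / variance of the total scores). The sketch below applies that standard form to two invented trials per person; it is an illustration of the arithmetic, not a reproduction of PASW output, and the function name is made up for the example.

```python
from statistics import variance


def coefficient_alpha(*measures):
    """Coefficient alpha for k measures, each a list with one score per person."""
    k = len(measures)
    totals = [sum(person) for person in zip(*measures)]  # total score per person
    sum_of_variances = sum(variance(m) for m in measures)
    return (k / (k - 1)) * (1 - sum_of_variances / variance(totals))


# Hypothetical data: two scores per person for five people.
trial_1 = [10, 12, 14, 16, 18]
trial_2 = [11, 12, 15, 15, 19]

alpha = coefficient_alpha(trial_1, trial_2)
print(round(alpha, 2))
```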
In this instance you are interested in the consistency of classifications from a mastery test. The two statistics of interest are the proportion of agreement (compute by hand from values in a crosstabulation table) and Kappa.
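Both statistics come from the four cells of a 2 x 2 crosstabulation of the two sets of classifications. The proportion of agreement is the share of people classified the same way both times; Kappa (Cohen's kappa is assumed here) adjusts that proportion for the agreement expected by chance. The cell counts below are invented for illustration.

```python
# Hypothetical 2 x 2 crosstabulation of two sets of classifications:
#                       second: master   second: non-master
# first: master               a = 20           b = 5
# first: non-master           c = 10           d = 15
a, b, c, d = 20, 5, 10, 15
n = a + b + c + d

# Proportion of agreement: cases classified the same way both times.
p_agree = (a + d) / n

# Agreement expected by chance, from the row and column totals.
p_chance = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2

# Kappa: agreement beyond chance as a share of the maximum possible.
kappa = (p_agree - p_chance) / (1 - p_chance)

print(p_agree, p_chance, round(kappa, 2))
```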
In motor skill performance settings it is often necessary to collect measures through observation. To examine the objectivity of these measures you look at the consistency of measures across observers (inter-rater consistency). Note: you may also videotape a group and have one person record measures on two occasions (intra-rater consistency).
Because the measures come from observations, assessing objectivity means examining the consistency of the measures produced by observers using a rating scale. To do this, have two people observe one group of examinees and evaluate their performance using a rating scale. The measures from the two observers (you could also videotape the group and have one person evaluate the group twice) give you two scores per person to use in the coefficient alpha or intraclass R formulas. The Spearman-Brown formula is not needed in this situation since test length is not manipulated.
| Note: When interpreting coefficient alpha or the intraclass R, a value > .70 reflects good objectivity. |
In this instance you are interested in the consistency of classifications from a mastery test. The two statistics of interest are the proportion of agreement (compute by hand from values in a crosstabulation table) and Kappa.
The data you work with can be either scores that are converted to classifications based on a cut score or direct classifications from observers. Because the classifications come from observations, assessing objectivity means examining the consistency of the classifications produced by observers using a rating scale or checklist. To do this, have two people observe one group of examinees and evaluate their performance using a rating scale or checklist. The classifications from the two observers (you could also videotape the group and have one person evaluate the group twice) give you two classifications per person to use in the proportion of agreement and Kappa statistics. The Spearman-Brown formula is not needed in this situation since test length is not manipulated.