STATISTICAL
CONSULTING PROGRAM
The
goal of this website is to provide you with many useful facts on
various kinds of DATA TYPES known in statistics.
Here are four major
data types whose more detailed descriptions can be found below:
Quantitative
Discrete
Continuous
Categorical
The
data types are described in the same order as they are listed above, starting
with the quantitative and ending with the categorical data type.
Quantitative data (also called metric or numerical data) is often referred
to as measurable data. This type of data allows statisticians to perform
various arithmetic operations, such as addition and multiplication, to
find parameters of a population such as the mean or variance. The observations
represent counts or measurements, and thus all values are numerical.
Each observation represents a characteristic of an individual in a population
or a sample. Quantitative data can be either discrete or continuous.
Example: A set
containing annual salaries of all your family members, measured to
the nearest thousand, contains quantitative data. Take, for instance,
family X. Here is a possible data set for this family: mother $25,000,
father $30,000, myself $35,000, my wife $32,000, uncle Joe $20,000,
etc.
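The arithmetic that quantitative data permits can be sketched in a few lines of Python. This is only an illustration: it uses the five salaries named in the example (the original list ends with "etc.", so any further family members are left out), and it assumes those five values are treated as the whole population.

```python
import statistics

# Annual salaries (in dollars) of the five family X members named in the text.
# The original list ends with "etc."; further members are not included here.
salaries = {
    "mother": 25_000,
    "father": 30_000,
    "myself": 35_000,
    "my wife": 32_000,
    "uncle Joe": 20_000,
}

values = list(salaries.values())

# Because the data are quantitative, arithmetic summaries are meaningful.
mean_salary = statistics.mean(values)

# pvariance treats the five values as the entire population (an assumption).
variance_salary = statistics.pvariance(values)

print(f"mean = {mean_salary}, population variance = {variance_salary}")
```

If the five salaries were instead regarded as a sample from a larger family, `statistics.variance` (which divides by n - 1) would be the more usual choice.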
According
to the New World Dictionary of the American Language, the definition
of "discrete" is the following: separate and distinct; not attached to
others; unrelated; made up of distinct parts;
discontinuous.
Statistically speaking, discrete data arise when the set of possible
values is either finite or countably infinite. The values of this data
type form a sequence of isolated or separated points on the real number
line. Each observation of this data type therefore takes on a value from
a discrete list of options.
The discrete data type usually represents a count of
something. Some examples of this type include the number of cars
per family, the number of times a person yawns during a day, the number
of defective light bulbs on a production line, and the number of tosses
of a coin before a head appears (a count that has no upper limit). A
measurement such as a student's height becomes discrete only once it is
rounded, for example to the nearest inch.
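The coin-toss count mentioned above can be simulated as a rough sketch. The function name, the seed, and the number of repetitions are illustrative choices, and the count here includes the toss on which the head appears. Every result is a whole number, yet no fixed upper bound applies, which is what "countably infinite" means in practice.

```python
import random


def tosses_until_head(rng):
    """Count fair-coin tosses up to and including the first head."""
    count = 0
    while True:
        count += 1
        if rng.random() < 0.5:  # treat a draw below 0.5 as "heads"
            return count


rng = random.Random(2024)  # arbitrary fixed seed so the run is repeatable
counts = [tosses_until_head(rng) for _ in range(1000)]

# The distinct observed values are isolated points: 1, 2, 3, ...
distinct_values = sorted(set(counts))
print(distinct_values)
```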
Here are three kinds of discrete data:
1. Discrete numerical data is discrete data that consists of
numerical measurements or counts. A set that consists
of discrete numerical data contains numbers. It is all quantitative data,
which allows us to find population or sample summaries such as the mean,
the variance, and others. Discrete numerical data sets need not consist of
whole numbers (integers) only; the values may also be fractions
or decimals, as long as the set of possible values remains countable.
Example: A set containing
the heights of students in your high school graduating class, rounded to
the nearest inch, represents this data type. It is very important
to remember to round the numbers. If we do not, and we accept
measurements such as 62.896 in., 63.277... in., 67.8435... in., we will
be dealing with the continuous data type, described below. With the
discrete data type, there is a countable number of possible values involved.
For example, a set containing possible students' heights will consist of
integers starting at 0 in. and ending at perhaps 84 in., unless there are
students over 7 feet tall, which is highly unlikely. Integers are countable,
and that is what makes this set discrete.
Also, the number of Farmland Dairies ultra-pasteurized
whole milk bottles in different stores can result in values like 0, or
1, or 2, or 3, and so on, and that would also be considered discrete numerical
data. We can count all possible values. A variation of this example
is used further down to describe the continuous data type.
2. Discrete ordinal data is
discrete data that may be arranged in some order or succession, but differences
between values either cannot be determined or are meaningless.
Only relative comparisons can be made between
the ordinal levels.
Example: Consider the following
statement: "In a group of twenty workers, five are 'best,' ten are 'good,'
and five 'need improvement.'" Although there are obvious differences
between the categories (best, good, and need improvement), and we can arrange
them in order from worst to best or vice versa, there is not much more we
can do to compare them. We do not know how much better "best" is
than "good" or "good" is than "need improvement." In this case, we could have
also used numbers instead of words, e.g. 1 for best, 2 for good, and 3 for
need improvement, and the data type would still be ordinal. The numbers
still lack any computational significance.
3. Discrete qualitative
(nominal) data is discrete data that cannot
be arranged in any order. It can be represented by numbers, letters, words,
and other forms of notation or symbolism, but there are no ranking differences
to be determined. Each category or group is different
from the others, but none ranks above another. This data type
consists only of names, labels, or categories.
Example: Gender, political
parties, and religions are just some of the many qualitative sets that exist
around us. Take, for example, these statements: 1. "In a group of
twenty workers, there are ten women and ten men," or 2. "In a group of
twenty workers, there are five Republicans, four Democrats, and one Independent."
Categories such as women and men, or Republicans, Democrats,
and Independents, can be talked about, described, and even criticized,
but not officially ranked. There are no accepted schemes to put these
categories in any meaningful order.
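The three kinds of discrete data described above differ in which summaries make sense. The following is a minimal sketch using the numbers from the examples; the variable names are illustrative, and the measured heights are the truncated decimals quoted in the text. Means are computed only for the numerical data, a median category for the ordinal data, and plain counts for the nominal data.

```python
import statistics
from collections import Counter

# 1. Discrete numerical: measured heights rounded to the nearest inch.
measured_heights = [62.896, 63.277, 67.8435]
rounded_heights = [round(h) for h in measured_heights]  # [63, 63, 68]
mean_height = statistics.mean(rounded_heights)  # arithmetic is meaningful

# 2. Discrete ordinal: categories have an order, but differences do not.
# The coding 1/2/3 follows the text; the numbers only express rank.
rank = {"best": 1, "good": 2, "need improvement": 3}
label = {code: name for name, code in rank.items()}
ratings = ["best"] * 5 + ["good"] * 10 + ["need improvement"] * 5
ranks = [rank[r] for r in ratings]
median_rating = label[statistics.median_low(ranks)]  # an order-based summary

# 3. Discrete qualitative (nominal): no order, so we can only count.
parties = ["Republican"] * 5 + ["Democrat"] * 4 + ["Independent"] * 1
party_counts = Counter(parties)
most_common_party = party_counts.most_common(1)[0][0]

print(rounded_heights, median_rating, dict(party_counts))
```

Averaging the codes 1, 2, and 3 of the ordinal data would run without error, but the result would have no clear interpretation, which is why the sketch uses the median category instead.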
According
to the New World Dictionary of the American Language, the definition
of "continuous" is the following:
going on or extending without interruption or break; unbroken; connected;
points whose value at each point is approached
by its values at neighboring points.
Continuous quantitative data result from infinitely
many possible values that the observations in a set can take on.
Here, however, "infinitely many" does not mean the countable infinity
we have seen with discrete data types. Continuous data types
involve the uncountable or non-denumerable kind of infinity, the kind
that describes the number of points on a number line (or on an interval of
a number line). In other words, the observations of this data type can
be associated with points on a number line, where any observation can take
on any real-number value within a certain range or interval.
Example: Temperature
readings are one example of such a data set. Each reading can take
on any real number value on a thermometer. If we agree that during
a particular day the temperatures between 10am and 6pm will be somewhere
between 32 and 100 degrees Fahrenheit, the truth is that these temperatures
could take on any value in that range. For example, consider the
following possible temperature readings given in degrees Fahrenheit: 90.333...,
75.324, 40.23..., 85, or 20 multiplied by Pi (20 multiplied by 3.1415..., or about 62.83).
Another example is a different approach to the
Farmland Dairies ultra-pasteurized whole milk bottle example used in
the description of discrete numerical data. If, instead of counting
the number of bottles in different stores, we measure the amount of milk
in each half-gallon bottle in different stores, those values could,
for instance, be 0.498 gallon, or 0.5025 gallon, or any value in between.
The observed values are points on the real line, and there
is an uncountable number of possible values.
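One way to see "any value in between" is to keep taking midpoints between the two milk readings. The sketch below is only an illustration, and the helper name is hypothetical. Note that Python floats are themselves a finite approximation of the real line, so the code can suggest, but not prove, that the in-between values never run out.

```python
low, high = 0.498, 0.5025  # gallons, the two readings from the example


def midpoints(a, b, steps):
    """Halve the gap towards a repeatedly, collecting each in-between value."""
    found = []
    for _ in range(steps):
        b = (a + b) / 2  # a new value strictly between a and the old b
        found.append(b)
    return found


between = midpoints(low, high, 10)

# Every generated value is a possible reading that lies inside the interval.
for value in between:
    print(f"{value:.7f} gallon")
```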
Categorical data, sometimes also called qualitative data,
result from placing individuals into groups or categories.
The values of a categorical variable are labels for the categories.
Both categorical data types are described above:
1. Discrete ordinal data
2. Discrete qualitative (nominal) data
If you have a comment or would like to contact
us for statistical consulting, you may e-mail us at scp@stat.montclair.edu
or go to our SCP
main page.