Frequency table construction
3. Frequency table with R
To prepare the frequency table, we will use the "fdth" package. If
it is not installed, you can install it either from the menu Tools
-> Install Packages or by entering the following command:
> install.packages("fdth")
Once it is installed, we enable the package:
> library(fdth)
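The install step can also be made conditional, so that the package is only downloaded when it is missing. A minimal sketch, assuming a CRAN mirror is already configured; the helper name ensure_package is hypothetical and not part of fdth. It is demonstrated here on "stats", which ships with R, so nothing is downloaded:

```r
# Hypothetical helper: install a package only if it is not already available
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # needs a configured CRAN mirror and network access
  }
  # TRUE if the package can now be loaded, FALSE otherwise
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" is always installed with R, so no download is triggered here
ok <- ensure_package("stats")
print(ok)
```

Calling ensure_package("fdth") before library(fdth) would follow the same pattern.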
We can now obtain a frequency table with 5 classes with:
> print(fdt(df$Age,5))
where the second argument (5) is the number of intervals that we
want. The result is the following:
Class limits   f    rf  rf(%)  cf  cf(%)
[9.9,14)       4  0.08      8   4      8
[14,19)       15  0.30     30  19     38
[19,23)       19  0.38     38  38     76
[23,28)        9  0.18     18  47     94
[28,32)        3  0.06      6  50    100
The first column shows the class intervals and the second gives the
absolute frequency (f). The third column gives the relative frequency
(rf), and the fourth the relative frequency as a percentage (rf(%)).
Finally, the last two columns show the cumulative absolute frequency
(cf) and the cumulative relative frequency as a percentage (cf(%)).
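The derived columns can be reproduced by hand from the absolute frequencies. A minimal base R sketch, taking the f column of the table above as given; the variable names are only illustrative and this is not how fdth computes the table internally:

```r
# Absolute frequencies copied from the f column of the table above
f <- c(4, 15, 19, 9, 3)

n <- sum(f)              # total number of observations
rf <- f / n              # relative frequency
rf_pct <- 100 * rf       # relative frequency in percent
cf <- cumsum(f)          # cumulative absolute frequency
cf_pct <- 100 * cf / n   # cumulative relative frequency in percent

freq_table <- data.frame(f, rf, rf_pct, cf, cf_pct)
print(freq_table)
```

The last value of cf always equals the number of observations, and the last value of cf_pct is always 100.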
You can also specify the lower limit of the first interval (with
"start="), the upper limit of the last interval (with "end="), and
the interval width (with "h="). For instance, to get 5 intervals of
width 5, the command is the following:
> print(fdt(df$Age,start=10,end=35,h=5))
This is the result:
Class limits   f    rf  rf(%)  cf  cf(%)
[10,15)        4  0.08      8   4      8
[15,20)       19  0.38     38  23     46
[20,25)       19  0.38     38  42     84
[25,30)        7  0.14     14  49     98
[30,35)        1  0.02      2  50    100
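How a start, an end and a width define the classes can be sketched in base R without fdth: seq() builds the class limits, and cut() with right = FALSE assigns each value to an interval that is closed on the left and open on the right. The ages vector below is made up for illustration and is not the df$Age data:

```r
# Hypothetical ages, for illustration only (not the data set used above)
ages <- c(12, 15, 19, 20, 24, 25, 31, 34.9, 10, 17)

# Class limits from 10 to 35 in steps of 5 (same start, end and h as above)
limits <- seq(from = 10, to = 35, by = 5)

# right = FALSE gives intervals of the form [10,15), [15,20), ...
classes <- cut(ages, breaks = limits, right = FALSE)

# Absolute frequency of each class
f <- table(classes)
print(f)
```

Because the intervals are open on the right, a value of exactly 15 is counted in [15,20), not in [10,15).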
A similar command gives a frequency table for a categorical
variable. Suppose that we have data on the preferred colors of 30
people. We can find the data in the following Excel file:
Preferred colors
Import the data as explained on the previous page.
To get a frequency table for the categorical variable of this data
set, we enter:
> print(fdt_cat(color_eng$Color))
This produces the following frequency table:
Category   f    rf  rf(%)  cf   cf(%)
Purple     9  0.30  30.00   9   30.00
Red        9  0.30  30.00  18   60.00
Yellow     8  0.27  26.67  26   86.67
Black      3  0.10  10.00  29   96.67
Blue       1  0.03   3.33  30  100.00
As you can see, we get the absolute and relative frequencies, as well
as the cumulative values, for the categorical variable. By default, the
categories are ordered from highest to lowest absolute frequency. If
you want the categories in alphabetical order, you can add "sort=FALSE":
> print(fdt_cat(color_eng$Color, sort=FALSE))
and you get:
Category   f    rf  rf(%)  cf   cf(%)
Black      3  0.10  10.00   3   10.00
Blue       1  0.03   3.33   4   13.33
Purple     9  0.30  30.00  13   43.33
Red        9  0.30  30.00  22   73.33
Yellow     8  0.27  26.67  30  100.00
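The two orderings can be reproduced in base R with table() and sort(). A minimal sketch, in which the vector of answers is rebuilt from the counts in the tables above rather than read from the Excel file; the variable names are illustrative:

```r
# Rebuild a vector of 30 answers from the counts shown in the tables above
colors <- rep(c("Purple", "Red", "Yellow", "Black", "Blue"),
              times = c(9, 9, 8, 3, 1))

# table() counts each category; for a character vector the categories
# come out in alphabetical order
f_alpha <- table(colors)

# Reorder the counts from highest to lowest absolute frequency
f_sorted <- sort(f_alpha, decreasing = TRUE)

# Derived columns for the sorted table
rf_pct <- round(100 * as.vector(f_sorted) / length(colors), 2)
cf <- cumsum(as.vector(f_sorted))

print(f_alpha)
print(f_sorted)
```

The cumulative columns depend on the order of the rows, which is why cf and cf(%) differ between the two tables above while f, rf and rf(%) do not.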