This post describes how to use the R *dplyr* package to calculate percentages, using a data set from the U.S. Census Bureau. Three tests check that the *dplyr* calculation is accurate. The code uses the pipe operator `%>%`, which was introduced to R in 2014 by the *magrittr* package.
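For readers new to the pipe, `x %>% f(y)` is evaluated as `f(x, y)`, so a chain of calls reads left to right instead of inside out. A minimal illustration, assuming only that *magrittr* is installed (it is installed along with *dplyr*):

```r
library(magrittr)

x <- c(1, 4, 9)

# Nested call: read from the innermost function outward.
nested <- sum(sqrt(x))

# Piped call: each left-hand side becomes the first argument of the next call.
piped <- x %>% sqrt() %>% sum()

print(identical(nested, piped))  # both are 6
```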

The general question is: *What percentage of the total of a measure, summed over all instances in a subgroup, does a single instance's value represent?* This statement is accurate but hard to parse. An SQL example makes it clear. The following query computes each employee's salary as a percentage of the total salary in that employee's department:

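`SELECT depname, empno, salary, 100.0 * salary / sum(salary) OVER (PARTITION BY depname) AS pct_of_dept FROM empsalary`

Multiplying by `100.0` turns the fraction into a percentage and, in databases such as PostgreSQL, avoids integer division when `salary` is an integer column (which would otherwise truncate every result to 0).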

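In SQL terms, this is a window function: the `OVER (PARTITION BY depname)` clause computes `sum(salary)` separately for each department while still returning one row per employee.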
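For comparison, here is a minimal *dplyr* sketch of the same window-style calculation. The rows of `empsalary` are made up for illustration, and `pct_of_dept` is a hypothetical column name.

```r
library(dplyr)

# Hypothetical rows in the shape of the empsalary table used in the SQL query.
empsalary <- data.frame(
  depname = c("develop", "develop", "sales", "sales", "sales"),
  empno   = c(1, 2, 3, 4, 5),
  salary  = c(6000, 4000, 3000, 3000, 4000)
)

dept_pct <- empsalary %>%
  group_by(depname) %>%                                 # like PARTITION BY depname
  mutate(pct_of_dept = 100 * salary / sum(salary)) %>%  # sum() runs within each group
  ungroup() %>%
  select(depname, empno, salary, pct_of_dept)

print(dept_pct)
```

`mutate()` keeps one row per employee, which is what makes this behave like a window function; `summarise()` would instead collapse each department to a single row, like a `GROUP BY` aggregate.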

### Calculation in dplyr

Here’s the code:
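The gist itself is not reproduced here. As a stand-in, the following is a minimal sketch of the kind of pipeline the post describes: drop *Puerto Rico*, group by region, and compute each state's share. It assumes hypothetical column names (`region`, `name`, `pop18plus`), uses made-up numbers rather than Census figures, skips the `read.table()` step, and assumes `pct18Plus` is each state's 18+ population as a percentage of its region's 18+ total.

```r
library(dplyr)

# Hypothetical stand-in for the Census file; the real column names and values differ.
census <- data.frame(
  region    = c(4, 4, 4, 3, 3, NA),
  name      = c("State A", "State B", "State C", "State D", "State E", "Puerto Rico"),
  pop18plus = c(500, 300, 200, 750, 250, 100)
)

result <- census %>%
  filter(name != "Puerto Rico") %>%   # remove Puerto Rico before computing shares
  group_by(region) %>%                # one group per region
  mutate(pct18Plus = 100 * pop18plus / sum(pop18plus)) %>%
  ungroup()

print(result)
```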

### Census data set

The data set is publicly available; its URL is passed to `read.table()` on line 10 of the gist. I chose this data set for two reasons: its rows can be grouped, and it already contains a precomputed value that I could compare my results against. I also didn't want *Puerto Rico* in the results, so I used `filter()` to remove it before running the calculation. The aim was to make sure I understood how to use *dplyr*.
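The post does not list its three tests. One simple check of this kind, not necessarily one of the author's, is that within each region the computed percentages add up to 100 and that *Puerto Rico* is gone. A sketch, assuming a result data frame with hypothetical columns `region`, `name`, and `pct18Plus` holding made-up values:

```r
library(dplyr)

# Hypothetical pipeline output: one row per state with its region and computed share.
result <- data.frame(
  region    = c(4, 4, 4, 3, 3),
  name      = c("State A", "State B", "State C", "State D", "State E"),
  pct18Plus = c(50, 30, 20, 75, 25)
)

check <- result %>%
  group_by(region) %>%
  summarise(total_pct = sum(pct18Plus)) %>%
  mutate(ok = abs(total_pct - 100) < 1e-8)   # tolerate floating-point rounding

stopifnot(all(check$ok))                      # every region's shares add up to 100
stopifnot(!("Puerto Rico" %in% result$name))  # the filter removed Puerto Rico
```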

### Results

Here's a screenshot of the results for region 4, the West:

The field I calculated was *pct18Plus*, the right-most column in the table above. (The screenshot shows only the result rows, so the column header isn't included.)

The result for *Wyoming* is interesting. Its percentage is small because Wyoming's population is small compared with California's, not because the 18+ population is a small share of Wyoming's own population.
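The distinction comes down to the denominator. The sketch below uses made-up numbers (not Census figures) for a small and a large state in one region, and computes both a share of the region's 18+ total and a share of the state's own population; the column names are hypothetical.

```r
library(dplyr)

# Hypothetical numbers: a small state and a large state in the same region.
states <- data.frame(
  region    = c(4, 4),
  name      = c("Small state", "Large state"),
  pop_total = c(500, 10000),
  pop18plus = c(400, 7500)
)

shares <- states %>%
  group_by(region) %>%
  mutate(
    pct_of_region = 100 * pop18plus / sum(pop18plus),  # share of the region's 18+ total
    pct_of_state  = 100 * pop18plus / pop_total        # share of the state's own population
  ) %>%
  ungroup()

print(shares)
```

Here the small state has the higher 18+ fraction of its own population (80% versus 75%), yet only about 5% of the region's 18+ total, which is the same pattern the post describes for Wyoming.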

### Key takeaway

The R *dplyr* package just works. It's easier than base R for the same tasks, it isn't arcane, it's elegant to read and write, and it has reduced my development time.
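For comparison, the same per-region percentage can be computed in base R with `ave()`, which applies `sum()` within each group and returns a vector aligned with the original rows. The data and column names are hypothetical, as in a minimal sketch:

```r
# Base R only: no packages needed.
census <- data.frame(
  region    = c(4, 4, 4, 3, 3, NA),
  name      = c("State A", "State B", "State C", "State D", "State E", "Puerto Rico"),
  pop18plus = c(500, 300, 200, 750, 250, 100)
)

# Drop Puerto Rico, then divide each value by its region's total.
base_result <- census[census$name != "Puerto Rico", ]
base_result$pct18Plus <- 100 * base_result$pop18plus /
  ave(base_result$pop18plus, base_result$region, FUN = sum)

print(base_result)
```

It works, but the repeated `base_result$` prefixes and the separate subsetting step illustrate why the piped *dplyr* version can be easier to read.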