dplyr Package in R Programming - GeeksforGeeks (2024)

Last Updated : 20 Dec, 2023

In this article, we will discuss aggregating and analyzing data with the dplyr package in the R programming language.

dplyr Package in R

The dplyr package in the R programming language is a grammar of data manipulation that provides a consistent set of verbs for solving the most common data manipulation problems.

By constraining your options, it lets you focus on the data manipulation problem itself.
It provides simple “verbs”, functions that correspond to the most common data manipulation tasks, so thoughts can be translated into code faster.
It uses efficient backends, so you spend less time waiting for the computer.

Here are some key functions and concepts within the dplyr package in R.

Data Frame and Tibble

A data frame in R is a table in which each column stores one type of information, such as names, ages, or scores. Creating a data frame involves specifying the column names and their respective values.

R

df <- data.frame(
  Name = c("vipul", "jayesh", "anurag"),
  Age = c(25, 23, 22),
  Score = c(95, 89, 78)
)

df

Output:

    Name Age Score
1  vipul  25    95
2 jayesh  23    89
3 anurag  22    78

Tibbles, introduced through the tibble package, share the functionality of data frames but are friendlier to work with: for example, printing a tibble shows only the first rows together with the type of each column. The syntax for creating a tibble is comparable to that of a data frame.
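
As a minimal sketch of that comparison, the snippet below builds the same table as a tibble and converts an existing data frame. It assumes the tibble package is installed; the object names tb, df2, and tb2 are made up for illustration.

```r
library(tibble)

# Same columns as the data frame example, built with tibble()
tb <- tibble(
  Name  = c("vipul", "jayesh", "anurag"),
  Age   = c(25, 23, 22),
  Score = c(95, 89, 78)
)

# Printing a tibble shows its dimensions and the type of each column
print(tb)

# An existing data frame can be converted with as_tibble()
df2 <- data.frame(Name = c("vipul", "jayesh"), Age = c(25, 23))
tb2 <- as_tibble(df2)
class(tb2)
```

Because a tibble is still a data frame underneath, the dplyr verbs described in this article work on both.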

Pipes (`%>%`)

The pipe operator (%>%), which dplyr re-exports from the magrittr package, passes the result of one operation as the first argument of the next. This allows us to chain multiple operations together, improving code readability.

R

# Load necessary libraries
library(dplyr)

# Example: Chain operations using the pipe operator
result <- mtcars %>%
  filter(mpg > 20) %>%            # Filter rows where mpg is greater than 20
  select(mpg, cyl, hp) %>%        # Select specific columns
  group_by(cyl) %>%               # Group the data by the 'cyl' variable
  summarise(mean_hp = mean(hp))   # Calculate the mean horsepower for each group

# Display the result
print(result)

Output:

    cyl mean_hp
  <dbl>   <dbl>
1     4    82.6
2     6   110
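
To see what the pipe buys in readability, it can help to write the same computation as nested calls. The sketch below is illustrative only; the names nested and piped are not from the original example.

```r
library(dplyr)

# Nested form: has to be read from the innermost call outward
nested <- summarise(
  group_by(filter(mtcars, mpg > 20), cyl),
  mean_hp = mean(hp)
)

# Piped form: reads top to bottom in the order the steps are applied
piped <- mtcars %>%
  filter(mpg > 20) %>%
  group_by(cyl) %>%
  summarise(mean_hp = mean(hp))

# Both forms compute the same table
all.equal(nested, piped)
```

Each `x %>% f(y)` is evaluated as `f(x, y)`, which is why every dplyr verb takes the data as its first argument.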

Verb Functions

dplyr provides several core functions (verbs) for data manipulation. These are:

filter():

filter() selects rows (cases) based on the values in their columns.

R

# Create a data frame with missing data
d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))

# Display the data frame
print(d)

# Finding rows with NA value
rows_with_na <- d %>% filter(is.na(ht))
print(rows_with_na)

# Finding rows with no NA value
rows_without_na <- d %>% filter(!is.na(ht))
print(rows_without_na)
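
filter() also accepts several conditions at once. The following sketch reuses the same data frame; the result names (young_in_school, either, tall) are hypothetical and chosen only for this illustration.

```r
library(dplyr)

d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))

# Conditions separated by commas are combined with AND
young_in_school <- d %>% filter(age < 10, school == "yes")

# Use | for OR
either <- d %>% filter(age > 10 | school == "yes")

# A comparison against NA evaluates to NA, and filter() drops such rows
tall <- d %>% filter(ht > 40)

print(young_in_school)
print(either)
print(tall)
```

Note the last case: rows whose ht is missing are silently excluded, so an explicit is.na() check is needed whenever those rows should be kept.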

arrange():

arrange() reorders the rows (cases) by the values of one or more columns.

R

# Create a data frame with missing data
d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))

d

# Arranging rows according to the age (ascending)
d.name <- arrange(d, age)
print(d.name)

Output:

     name age ht school
1    Abhi   7 46    yes
2 Bhavesh   5 NA    yes
3  Chaman   9 NA     no
4   Dimri  16 69     no

Arranging rows according to the age
     name age ht school
1 Bhavesh   5 NA    yes
2    Abhi   7 46    yes
3  Chaman   9 NA     no
4   Dimri  16 69     no
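
arrange() can also sort in descending order and by more than one column. A small sketch on the same data follows; the result names are illustrative, not part of the original article.

```r
library(dplyr)

d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))

# desc() reverses the sort order for a column
by_age_desc <- arrange(d, desc(age))

# Additional columns break ties: sort by school first, then by age
by_school_then_age <- arrange(d, school, age)

# Missing values are always placed at the end
by_ht <- arrange(d, ht)

print(by_age_desc)
print(by_school_then_age)
print(by_ht)
```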

select() and rename():

select() chooses columns (variables) by name. rename() changes column names while keeping every column.

R

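# Create a data frame with missing data
d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))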
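
The example above stops before any select() or rename() call, so the following is an illustrative sketch rather than the article's own code. It assumes the same four-row data frame used in the other sections; the result names and the new column names height and student are made up.

```r
library(dplyr)

d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))

# Keep only the listed columns
name_age <- select(d, name, age)

# Drop a column with a minus sign
no_school <- select(d, -school)

# Helper functions such as starts_with() match columns by pattern
s_cols <- select(d, starts_with("s"))

# rename() uses new_name = old_name and keeps all other columns
renamed <- rename(d, height = ht)

# select() can rename while selecting
sel_renamed <- select(d, student = name, age)

print(renamed)
```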

mutate() and transmute():

mutate() adds new variables that are functions of existing variables and keeps the existing ones. transmute() computes the same new variables but keeps only those.

R

# Create a data frame with missing data
d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))

# Add x3, the sum of height and age, keeping the existing columns
mutate(d, x3 = ht + age)

# Compute x3, the sum of height and age, dropping all other columns
transmute(d, x3 = ht + age)

Output:

Original data frame
     name age ht school
1    Abhi   7 46    yes
2 Bhavesh   5 NA    yes
3  Chaman   9 NA     no
4   Dimri  16 69     no

mutate(): x3 is added alongside the existing columns
     name age ht school x3
1    Abhi   7 46    yes 53
2 Bhavesh   5 NA    yes NA
3  Chaman   9 NA     no NA
4   Dimri  16 69     no 85

transmute(): only x3 is kept
  x3
1 53
2 NA
3 NA
4 85
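
A single mutate() call can create several columns, and later expressions may refer to columns created earlier in the same call. The sketch below illustrates this; x3_doubled and ht_known are hypothetical column names introduced only for this example.

```r
library(dplyr)

d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))

d2 <- d %>%
  mutate(
    x3 = ht + age,          # NA wherever ht is missing
    x3_doubled = x3 * 2,    # uses x3, defined one line above
    ht_known = !is.na(ht)   # flags rows with a recorded height
  )

print(d2)
```

As the NA rows show, arithmetic on a missing value stays missing, so a flag column like ht_known is one simple way to keep track of which results are usable.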

summarise():

summarise() condenses multiple values into a single value.

R

# Create a data frame with missing data
d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))

# Calculating mean of age
summarise(d, mean = mean(age))

# Calculating min of age
summarise(d, min = min(age))

# Calculating max of age
summarise(d, max = max(age))

# Calculating median of age
summarise(d, med = median(age))

Output:

Calculating mean of age
  mean
1 9.25
Calculating min of age
  min
1   5
Calculating max of age
  max
1  16
Calculating median of age
  med
1   8
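
summarise() is most useful together with group_by(), which produces one summary row per group. The sketch below groups the same data by school and is illustrative only; the summary column names n, mean_age, and mean_ht are made up.

```r
library(dplyr)

d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))

by_school <- d %>%
  group_by(school) %>%
  summarise(
    n = n(),                           # number of rows in each group
    mean_age = mean(age),
    mean_ht = mean(ht, na.rm = TRUE)   # ignore missing heights
  )

print(by_school)

# Without na.rm = TRUE, a single NA makes the whole summary NA
summarise(d, mean_ht = mean(ht))
```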

sample_n() and sample_frac():

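These functions draw random samples of rows: sample_n() takes a fixed number of rows, and sample_frac() takes a fixed fraction of them.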

R

# Create a data frame with missing data
d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))

# Printing three rows
sample_n(d, 3)

# Printing 50 % of the rows
sample_frac(d, 0.50)

Output (rows are sampled at random, so results vary between runs):

Printing three rows
    name age ht school
1 Chaman   9 NA     no
2  Dimri  16 69     no
3   Abhi   7 46    yes
Printing 50 % of the rows
   name age ht school
1  Abhi   7 46    yes
2 Dimri  16 69     no
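
In dplyr 1.0.0 and later, sample_n() and sample_frac() are superseded by slice_sample(), which covers both cases through its n and prop arguments. The sketch below assumes such a dplyr version is installed; the seed value 42 is arbitrary and only makes the draw repeatable.

```r
library(dplyr)

d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))

# Fix the random number generator so the sample is reproducible
set.seed(42)

# Equivalent of sample_n(d, 3)
three_rows <- slice_sample(d, n = 3)

# Equivalent of sample_frac(d, 0.50)
half_rows <- slice_sample(d, prop = 0.5)

print(three_rows)
print(half_rows)
```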

FAQs

What is the purpose of the dplyr package in R?

The dplyr package makes data manipulation fast and easy. By constraining your options, it helps you think about your data manipulation challenges. It provides simple “verbs”, functions that correspond to the most common data manipulation tasks, to help you translate your thoughts into code.

Why is the dplyr package very useful in big data analysis?

The dplyr package provides a concise set of operations for managing data frames. With these functions we can do a number of complex operations in just a few lines of code. In particular, we can often conduct the beginnings of an exploratory analysis with the powerful combination of group_by() and summarize().

Which are 5 of the most commonly used dplyr functions?

Five of the most common dplyr functions are select(), filter(), mutate(), group_by(), and summarize().

What are R packages?

Packages in the R programming language are collections of R functions, compiled code, and sample data. They are stored under a directory called “library” within the R environment. R installs a set of packages during installation, and only these default packages are available when the R console starts; other packages must be installed and loaded explicitly.

What does %>% mean in dplyr?

The pipe operator (%>%) lets you write a sequence of function calls from left to right instead of nesting them inside one another. It pipes, or transfers, the output of the first function to the input of the second function. In the following code, select() is invoked first, then arrange():

mtcars %>% select(cyl, mpg) %>% arrange(cyl, mpg)

What is the difference between dplyr and tidyverse?

The tidyverse is a collection of R packages that share a common design, and dplyr is one of them. dplyr: A package for data manipulation that uses a consistent and intuitive syntax that makes data manipulation tasks more straightforward. tidyr: Another tidyverse package, for data tidying, that helps you transform data between different formats, such as converting wide data to long format or vice versa.

Is data.table better than dplyr?

While dplyr has very flexible and intuitive syntax, data.table can be orders of magnitude faster in some scenarios. One of those scenarios is when performing operations over a very large number of groups.

What is the difference between SQL and dplyr?

SQL and dplyr are both widely used in industry and academia. In SQL, the SELECT clause chooses a subset of columns, and dplyr has the select(dataset, col01, col02, ...) verb for the same task. Similarly, the WHERE clause corresponds to filter(dataset, col01 > val1, ...).
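
As a rough illustration of that correspondence, the sketch below shows a SQL query as a comment next to a dplyr pipeline that expresses the same request on the built-in mtcars data. The query is hypothetical and is only there for comparison; it is not executed.

```r
library(dplyr)

# SQL, for comparison only:
#   SELECT mpg, cyl
#   FROM mtcars
#   WHERE mpg > 25
#   ORDER BY mpg DESC;

result <- mtcars %>%
  filter(mpg > 25) %>%      # WHERE mpg > 25
  select(mpg, cyl) %>%      # SELECT mpg, cyl
  arrange(desc(mpg))        # ORDER BY mpg DESC

print(result)
```

One visible difference is ordering: SQL lists the selected columns before the filter, while a dplyr pipeline is written in the order the steps are applied.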

What is the dplyr function select used for?

The select() function of the dplyr package is used to choose which columns of a data frame you would like to work with. It takes column names as arguments and creates a new data frame using the selected columns. select() can be combined with other functions such as filter().

Why is it called dplyr?

The d is for data.frame, and plyr is as in a set of pliers to manipulate things with. dplyr is a data.frame-specific set of tools in the style of the plyr package.

What is the use of arrange() with the dplyr package?

The arrange() function is used to reorder the rows of a data frame according to one of its variables. Reordering the rows of a data frame while keeping the values in the other columns aligned is cumbersome to do in base R.