This document is a work in progress; please give us feedback.
Functions allow you to perform a specific task in a more powerful and general way. When you need to run the same code repeatedly, the most efficient and reproducible approach is to write a function. Functions also help you avoid the incidental mistakes that creep in when you copy and paste code into different parts of your script.
Let’s look at the following code, which rescales each column of a data frame to the range 0 to 1:
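# A data frame with three columns of random numbers
df <- data.frame(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10)
)
# Rescale each column by hand: (value - min) / (max - min)
df$Rescale_a <- (df$a - min(df$a, na.rm = TRUE)) / (max(df$a, na.rm = TRUE) - min(df$a, na.rm = TRUE))
df$Rescale_b <- (df$b - min(df$b, na.rm = TRUE)) / (max(df$b, na.rm = TRUE) - min(df$b, na.rm = TRUE))
df$Rescale_c <- (df$c - min(df$c, na.rm = TRUE)) / (max(df$c, na.rm = TRUE) - min(df$c, na.rm = TRUE))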
df$Rescale_a
## [1] 0.5266659 0.1690370 0.3059742 0.3205623 0.2578741 0.2164492 1.0000000
## [8] 0.2427424 0.5413472 0.0000000
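Every function in R follows the same basic structure:
function_name <- function(inputs) {
  # do_something() is a placeholder for the code that does the work
  output_value <- do_something(inputs)
  return(output_value)
}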
Let’s write our first function, called rescale01():
rescale01 <- function(x) {
  # Subtract the minimum and divide by the range, ignoring missing values
  rescale_value <- (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
  return(rescale_value)
}
rescale01(x = df$a)  # argument supplied by name
## [1] 0.5266659 0.1690370 0.3059742 0.3205623 0.2578741 0.2164492 1.0000000
## [8] 0.2427424 0.5413472 0.0000000
rescale01(df$a)      # argument supplied by position
## [1] 0.5266659 0.1690370 0.3059742 0.3205623 0.2578741 0.2164492 1.0000000
## [8] 0.2427424 0.5413472 0.0000000
df$Rescale_a<-rescale01(df$a)
We can also combine functions to build larger tasks. Here we use the built-in function range(), which returns the minimum and maximum values in a single call, so each is computed only once:
rescale01<-function(x){
ran<-range(x, na.rm=TRUE)
rescale_value<-(x-ran[1])/(ran[2]-ran[1])
return(rescale_value)
}
rescale01(df$a)
## [1] 0.5266659 0.1690370 0.3059742 0.3205623 0.2578741 0.2164492 1.0000000
## [8] 0.2427424 0.5413472 0.0000000
df$Rescale_a<-rescale01(df$a)
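In the call rescale01(df$a) above:
- rescale01 is the name of the function;
- df$a is passed to the x argument (or input);
- inside the function, the result is stored in the rescale_value object;
- the function returns the rescale_value object back as output, which we assign to df$Rescale_a.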
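To see what range() returns and why na.rm = TRUE matters, here is a small sketch on a vector with a missing value. The vector v is made up purely for illustration:

```r
rescale01 <- function(x) {
  ran <- range(x, na.rm = TRUE)   # c(minimum, maximum), ignoring NA
  (x - ran[1]) / (ran[2] - ran[1])
}

v <- c(2, NA, 4, 6)
range(v, na.rm = TRUE)  # returns c(2, 6)
rescale01(v)            # returns c(0, NA, 0.5, 1)
```

Without na.rm = TRUE, range(v) would return c(NA, NA) and every rescaled value would be NA. With it, the missing value stays NA while the other values are rescaled normally.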
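You can also write the function more concisely. R automatically returns the value of the last expression evaluated in the function body, so return() is optional: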
rescale01<-function(x){
(x-min(x, na.rm=TRUE))/(max(x,na.rm=TRUE)-min(x,na.rm=TRUE))
}
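With a function in place, the three copy-pasted lines from the beginning can be replaced by one call that works on every column. A minimal sketch, assuming a data frame with only numeric columns; it uses base R's lapply() (one option among several), and the seed and column names are illustrative:

```r
rescale01 <- function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}

set.seed(1)  # only to make this example reproducible
df <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10))

# Apply rescale01() to every column, then add the results as new columns
rescaled <- as.data.frame(lapply(df, rescale01))
names(rescaled) <- paste0("Rescale_", names(df))
df <- cbind(df, rescaled)

range(df$Rescale_b)  # returns c(0, 1)
```

Because the formula lives in one place, a typo can no longer affect just one column.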
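In the following exercise, the volume of a plant is calculated as volume = length * width * height, using the dimensions in this table: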
| Plant | length (in) | width (in) | height (in) |
|---|---|---|---|
| A | 20.32 | 40.64 | 50.80 |
| B | 25.40 | 38.10 | 60.96 |
| C | 25.40 | 50.80 | 127.00 |
Write a function that converts inches to centimeters (1 in = 2.54 cm). Use that function to transform the length, width, and height values of the plants.
Create a new function that calculates the volume of a plant in cm^3 using a combination of the functions in 1 and 3.
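If you get stuck, here is one possible approach, sketched under the assumption that the table values are in inches, as the column headers state. The function names inches_to_cm() and plant_volume_cm3() are hypothetical, chosen only for this sketch:

```r
# Hypothetical helper: convert inches to centimeters
inches_to_cm <- function(x) {
  x * 2.54  # 1 in = 2.54 cm
}

# Hypothetical helper: volume in cm^3 from dimensions given in inches
plant_volume_cm3 <- function(l, w, h) {
  # Convert each dimension to cm first, then multiply
  inches_to_cm(l) * inches_to_cm(w) * inches_to_cm(h)
}

plants <- data.frame(
  plant  = c("A", "B", "C"),
  length = c(20.32, 25.40, 25.40),
  width  = c(40.64, 38.10, 50.80),
  height = c(50.80, 60.96, 127.00)
)

# Arithmetic in R is vectorized, so all three plants are handled at once
plants$volume_cm3 <- plant_volume_cm3(plants$length, plants$width, plants$height)
plants
```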
Conditionals evaluate whether a logical statement is either TRUE or FALSE. Conditional statements use comparison operators such as ==, !=, <, >, <= and >=. You can combine these with & (and) or | (or). You can also use predefined functions such as identical().
20>7
## [1] TRUE
"A"=="B"
## [1] FALSE
"A"!="B"
## [1] TRUE
identical("C","D")
## [1] FALSE
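Comparisons also work element by element on vectors, and & and | combine the results. A short sketch with made-up tree heights (the vector height is illustrative). Inside if(), which needs a single TRUE or FALSE, the scalar forms && and || are usually preferred:

```r
height <- c(2, 6, 12)  # illustrative tree heights

height > 5                # FALSE  TRUE  TRUE
height > 5 & height < 10  # FALSE  TRUE FALSE
height < 3 | height > 10  #  TRUE FALSE  TRUE

# && and || compare single values
a <- 7
a > 5 && a < 10           # TRUE
```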
The following exercise is taken from Data Carpentry (http://www.datacarpentry.org/semester-biology/exercises/Making-choices-choice-operators-R/):
w <- 10.2
x <- 1.3
y <- 2.8
z <- 17.5
dna1 <- "attattaggaccaca"
dna2 <- "attattaggaacaca"
Use these variables to write conditional statements and check whether each one evaluates to TRUE or FALSE.
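Be careful when testing decimal numbers for equality with ==. For example, squaring the square root of 2 should give exactly 2:
x <- sqrt(2)^2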
x
## [1] 2
x==2
## [1] FALSE
This is a floating-point issue: computers store decimal numbers with finite precision, so sqrt(2)^2 is not exactly 2 but a number with a tiny error in its trailing digits.
# Print x with 20 decimal places (C-style formatting) to reveal the hidden digits
formatC( x, format='f', digits=20)
## [1] "2.00000000000000044409"
To compare decimal numbers safely, use dplyr::near(), which tests equality within a small tolerance, or round the values before comparing:
library(dplyr)
near(x, 2)
## [1] TRUE
round(x)==2
## [1] TRUE
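If you prefer not to load dplyr, base R offers similar checks. In this sketch, all.equal() compares with a default tolerance of about 1.5e-8 and isTRUE() turns its result into a single TRUE or FALSE; the explicit tolerance tol is an arbitrary illustrative choice. round() also accepts a number of decimal places, which is gentler than rounding to a whole number:

```r
x <- sqrt(2)^2

x - 2                       # about 4.4e-16, the tiny rounding error
isTRUE(all.equal(x, 2))     # TRUE

tol <- 1e-8                 # illustrative tolerance
abs(x - 2) < tol            # TRUE

round(x, digits = 10) == 2  # TRUE
```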
if statements allow you to execute code conditionally. Their general structure is:
if (condition) {
  # code executed when condition is TRUE
} else {
  # code executed when condition is FALSE
}
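For example, the following function labels a tree as large when x is greater than 5 and as small otherwise:
Size_tree<-function(x){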
if(x>5){
print("Large tree")
}else{
print("Small tree")
}
}
Size_tree(100)
## [1] "Large tree"
Size_tree(3)
## [1] "Small tree"
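if() expects a single TRUE or FALSE, so Size_tree() handles one value at a time; since R 4.2.0, if() stops with an error when the condition has length greater than one. For a whole vector of sizes, base R's vectorized ifelse() is one alternative. The name size_tree_vec() is hypothetical:

```r
# Hypothetical vectorized version of Size_tree()
size_tree_vec <- function(x) {
  # ifelse() tests every element and picks the matching label for each
  ifelse(x > 5, "Large tree", "Small tree")
}

size_tree_vec(c(100, 3, 6))  # "Large tree" "Small tree" "Large tree"
```

Unlike Size_tree(), this returns the labels instead of printing them, so the result can be stored in a new column.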
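You can test several conditions in sequence with else if:
Size_tree <- function(x) {
  if (x > 5) {
    print("Large tree")
  } else if (x <= 5 && x > 3) {
    # && compares single values, which is what if() expects
    print("Medium tree")
  } else {
    print("Small tree")
  }
}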
Size_tree(100)
## [1] "Large tree"
Size_tree(5)
## [1] "Medium tree"
Size_tree(2)
## [1] "Small tree"
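For three or more categories on a whole vector, base R's cut() can replace a chain of else if branches. A sketch, assuming the same cut-offs as Size_tree(); size_tree_cut() is a hypothetical name. With cut()'s default right = TRUE the intervals are (-Inf, 3], (3, 5] and (5, Inf], so a value of exactly 5 is "Medium tree" and exactly 3 is "Small tree", as in Size_tree():

```r
# Hypothetical vectorized three-category version of Size_tree()
size_tree_cut <- function(x) {
  size <- cut(x,
              breaks = c(-Inf, 3, 5, Inf),
              labels = c("Small tree", "Medium tree", "Large tree"))
  as.character(size)  # cut() returns a factor; convert to plain text
}

size_tree_cut(c(100, 5, 2))  # "Large tree" "Medium tree" "Small tree"
```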
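Putting these ideas together, the following functions convert inches to centimeters and calculate the volume of a plant in cubic meters from dimensions given in m, cm or in:
in_to_cm <- function(x) {
  x * 2.54  # 1 in = 2.54 cm
}
PlantVolume_in_m <- function(l, w, h, units = "cm") {
  if (units == "in") {
    # inches -> centimeters -> meters
    new_l <- in_to_cm(l) / 100
    new_h <- in_to_cm(h) / 100
    new_w <- in_to_cm(w) / 100
    volume <- new_l * new_h * new_w
  } else if (units == "cm") {
    # centimeters -> meters
    new_l <- l / 100
    new_h <- h / 100
    new_w <- w / 100
    volume <- new_l * new_h * new_w
  } else if (units == "m") {
    # already in meters
    volume <- l * h * w
  } else {
    # unsupported unit: print a message and return NULL
    message("Use either m, cm or in")
    volume <- NULL
  }
  return(volume)  # volume in cubic meters
}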
PlantVolume_in_m(0.254,0.2794,0.3048,units="m")
## [1] 0.02163092
PlantVolume_in_m(10,11,12,units="in")
## [1] 0.02163092
PlantVolume_in_m(25.4,27.94,30.48,units="cm")
## [1] 0.02163092
PlantVolume_in_m(25.4,27.94,30.48,units="km")
## Use either m, cm or in
## NULL
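An alternative design stops with an error on an unsupported unit instead of returning NULL, so the mistake cannot pass silently into later calculations. This sketch, with the hypothetical name plant_volume_m3(), uses base R's match.arg() to validate units and a named lookup vector of meters per unit in place of separate if branches:

```r
# Hypothetical alternative to PlantVolume_in_m()
plant_volume_m3 <- function(l, w, h, units = c("cm", "in", "m")) {
  # match.arg() accepts only one of the listed units and errors otherwise;
  # when units is not supplied, it uses the first choice, "cm"
  units <- match.arg(units)
  to_m <- c(cm = 0.01, "in" = 0.0254, m = 1)  # meters per unit
  f <- to_m[[units]]
  (l * f) * (w * f) * (h * f)  # volume in cubic meters
}

plant_volume_m3(10, 11, 12, units = "in")  # about 0.02163
plant_volume_m3(25.4, 27.94, 30.48)        # default units = "cm", about 0.02163
```

Adding a new unit then only requires one more entry in to_m and in the units choices.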
Go to this link and complete the assignment from the Data Carpentry for Biologists website.
Hadley Wickham and Garrett Grolemund. 2017. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data (1st ed.). O’Reilly Media, Inc.
Data Carpentry for Biologists. Functions in R.