Biomedical Engineering Theory And Practice/R Language

Data Types

R has various objects for holding data, including scalars, vectors, matrices, arrays, data frames, and lists.

Scalar and Constant

"Scalar" generally means "one-dimensional"vector. Constants only have one value ever. You can constants is similar to zero-dimensional values (a single point).

Scalar

> x<-3
> y<-6
> z<-x+y
> z
[1] 9

Constant

> 2+3
[1] 5
> 5-4
[1] 1
> 6*4
[1] 24

Vector

Vectors are one-dimensional arrays that can hold numeric data, character data, or logical data. The combine function c() is used to form the vector. Here are examples of each type of vector:

> a<-c(1,2,5,-3,-6,5) #numeric vector
> b<-c("one","two","three") #character vector
> d<-c(TRUE,FALSE,TRUE,FALSE,TRUE,TRUE) #logical vector
> a[c(2,4)]
[1]  2 -3
> a[4]
[1] -3
> a[2:4]
[1]  2  5 -3

Matrix

A matrix is a two-dimensional array in which every element has the same mode (numeric, character, or logical). Matrices are created with the matrix() function. By default the values fill the matrix column by column; byrow=TRUE fills it row by row. The general format is as follows:

> mymatrix <- matrix(vector, nrow=number of rows, ncol=number of columns, byrow=logical value, dimnames=list(vector-of-rownames,vector-of-colnames))

> A<-matrix(1:9,nrow=3)
> A
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> A[2,1]
[1] 2

> A<-matrix(1:9,nrow=3,byrow=T)
> A
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
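
The dimnames argument in the general format is not exercised by the two examples above. A minimal sketch, with row and column labels chosen purely for illustration (they are assumptions, not part of the original text):

```r
# Labels below are illustrative only
rnames <- c("R1", "R2", "R3")
cnames <- c("C1", "C2", "C3")
B <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE,
            dimnames = list(rnames, cnames))

elem <- B["R2", "C3"]   # select by name, same element as B[2, 3]
col1 <- B[, "C1"]       # a whole column, returned as a named vector
dims <- dim(B)          # number of rows and columns
```

Selecting by name tends to be more readable than selecting by position when the rows and columns have a real-world meaning.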

Array

> myarray <-array(vector, dimensions, dimnames)

> dim1 <- c("A1","A2","A3")
> dim2 <- c("B1","B2","B3","B4")
> dim3 <- c("C1","C2")
> x<-array(1:24,c(3,4,2),dimnames=list(dim1,dim2,dim3))
> x
, , C1

   B1 B2 B3 B4
A1  1  4  7 10
A2  2  5  8 11
A3  3  6  9 12

, , C2

   B1 B2 B3 B4
A1 13 16 19 22
A2 14 17 20 23
A3 15 18 21 24
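
Elements of an array are selected with one index per dimension, by position or by name. The following sketch rebuilds the array above so that it runs on its own; the variable names elem, same and slice are introduced here for illustration:

```r
dim1 <- c("A1", "A2", "A3")
dim2 <- c("B1", "B2", "B3", "B4")
dim3 <- c("C1", "C2")
x <- array(1:24, c(3, 4, 2), dimnames = list(dim1, dim2, dim3))

elem  <- x[2, 3, 1]            # row 2, column 3 of the first layer
same  <- x["A2", "B3", "C1"]   # the same element selected by name
slice <- x[, , "C2"]           # the whole second layer, a 3 x 4 matrix
```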

Data Frame

>mydata <-data.frame(col1,col2,col3....)

> patientID<-LETTERS[1:4]
> age<-c(24,35,28,52)
> diabetes<-c("Type1","Type2","Type1","Type2")
> stats<-c("Poor","Improved","Poor","Excellent")
> patientDATA<-data.frame(patientID,age,diabetes,stats,row.names=letters[1:4])
> patientDATA
  patientID age diabetes     stats
a         A  24    Type1      Poor
b         B  35    Type2  Improved
c         C  28    Type1      Poor
d         D  52    Type2 Excellent
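
Once a data frame exists, its columns and rows can be selected in several ways. A short sketch using the same patient data (the variable names ages, older and one are illustrative):

```r
patientDATA <- data.frame(
  patientID = LETTERS[1:4],
  age       = c(24, 35, 28, 52),
  diabetes  = c("Type1", "Type2", "Type1", "Type2"),
  stats     = c("Poor", "Improved", "Poor", "Excellent"),
  row.names = letters[1:4]
)

ages  <- patientDATA$age                      # one column as a vector
older <- patientDATA[patientDATA$age > 30, ]  # rows that satisfy a condition
one   <- patientDATA["c", "stats"]            # one cell by row and column name
```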

Factors

> patientID<-LETTERS[1:4]
> age<-c(24,35,28,52)
> diabetes<-c("Type1","Type2","Type1","Type2")
> stats<-c("Poor","Improved","Poor","Excellent")
> status <- factor(stats, ordered=TRUE)
> patientdata <- data.frame(patientID, age, diabetes, status)
> str(patientdata)
'data.frame':	4 obs. of  4 variables:
 $ patientID: Factor w/ 4 levels "A","B","C","D": 1 2 3 4
 $ age      : num  24 35 28 52
 $ diabetes : Factor w/ 2 levels "Type1","Type2": 1 2 1 2
 $ status   : Ord.factor w/ 3 levels "Excellent"<"Improved"<..: 3 2 3 1
> summary(patientdata)
 patientID      age         diabetes       status 
 A:1       Min.   :24.00   Type1:2   Excellent:1  
 B:1       1st Qu.:27.00   Type2:2   Improved :1  
 C:1       Median :31.50             Poor     :2  
 D:1       Mean   :34.75                          
           3rd Qu.:39.25                          
           Max.   :52.00
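
By default factor() sorts the levels alphabetically, which is why the output above reports "Excellent" < "Improved" < "Poor". Assuming the intended clinical order is Poor < Improved < Excellent (an assumption, since the text does not say so), the levels argument can state the order explicitly:

```r
stats <- c("Poor", "Improved", "Poor", "Excellent")

# Assumed ordering: Poor < Improved < Excellent
status <- factor(stats,
                 levels  = c("Poor", "Improved", "Excellent"),
                 ordered = TRUE)

codes <- as.integer(status)          # integer codes follow the stated order
below <- status < "Excellent"        # ordered factors support comparisons
counts <- table(status)              # counts are listed in level order
```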

Lists

>mylist <- list(name1=object1,name2=object2,...)

> x<-"TheList"
> y<-c(25,19,20)
> z<-matrix(1:10,nrow=2,byrow=TRUE)
> theta<-LETTERS[1:10]
> delta<-c(2+3i,4-6i)
> mylist<-list(title=x,components=y,z,theta,delta)
> mylist
$title
[1] "TheList"

$components
[1] 25 19 20

[[3]]
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    6    7    8    9   10

[[4]]
 [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J"

[[5]]
[1] 2+3i 4-6i

> mylist[[2]]
[1] 25 19 20
> mylist[["components"]]
[1] 25 19 20

> lapply(mylist,length)
$title
[1] 1

$components
[1] 3

[[3]]
[1] 10

[[4]]
[1] 10

[[5]]
[1] 2

> lapply(mylist,class)
$title
[1] "character"

$components
[1] "numeric"

[[3]]
[1] "matrix"

[[4]]
[1] "character"

[[5]]
[1] "complex"

> lapply(mylist,mean)
$title
[1] NA

$components
[1] 21.33333

[[3]]
[1] 5.5

[[4]]
[1] NA

[[5]]
[1] 3-1.5i

Warning messages:
1: In mean.default(X[[1L]], ...) :
  argument is not numeric or logical: returning NA
2: In mean.default(X[[4L]], ...) :
  argument is not numeric or logical: returning NA
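
The warnings above arise because mean() is applied to the character elements of the list. One possible way to avoid them, sketched here with illustrative variable names, is to select the numeric elements first. The sketch also uses sapply(), which simplifies the result to a vector where lapply() returns a list:

```r
mylist <- list(title = "TheList",
               components = c(25, 19, 20),
               matrix(1:10, nrow = 2, byrow = TRUE),
               LETTERS[1:10],
               c(2+3i, 4-6i))

lens   <- sapply(mylist, length)       # a vector of lengths rather than a list
is_num <- sapply(mylist, is.numeric)   # TRUE only for real-valued numeric elements
means  <- lapply(mylist[is_num], mean) # no warnings: only numeric elements remain
```

Note that is.numeric() is FALSE for complex vectors, so the fifth element is skipped here even though mean() can handle it.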

Basic Functions

Arithmetic Operators

The arithmetic operators and their examples which are used in R programming are listed in the table below.

Function	R Command	Example
Exponentiation, $a^{b}$	> a^b	> 3^2 [1] 9
Multiplication, $a\times b$	> a*b	> 22*5 [1] 110
Division, $a\div b$	> a/b	> 30/3 [1] 10
Addition, $a+b$	> a+b	> 10+9 [1] 19
Subtraction, $a-b$	> a-b	> 10-3 [1] 7
Integer quotient	> a%/%b	> 20%/%3 [1] 6
Modulo (remainder)	> a%%b	> 20%%3 [1] 2
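
The integer quotient and the remainder are linked: multiplying the quotient by the divisor and adding the remainder gives back the original number. A brief sketch, including a negative operand, where R rounds the quotient down rather than toward zero (variable names are illustrative):

```r
a <- 20
b <- 3
q <- a %/% b                  # integer quotient: 6
r <- a %% b                   # remainder: 2
check <- (q * b + r == a)     # TRUE

qn <- (-20) %/% 3             # -7, because the quotient is rounded down
rn <- (-20) %% 3              # 1, so that qn * 3 + rn is still -20
```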

Complex Number

> x<-5.2-3i

Function	R command	Function	R command
Complex number	> x [1] 5.2-3i	Real part	> Re(x) [1] 5.2
Imaginary part	> Im(x) [1] -3	Modulus	> Mod(x) [1] 6.003332
Argument	> Arg(x) [1] -0.5232783	Conjugate	> Conj(x) [1] 5.2+3i
Membership	> is.complex(x) [1] TRUE	Coercion	> as.complex(19.6) [1] 19.6+0i

Rounding

Function	R Command	Function	R Command
Greatest integer less than or equal to x	> floor(9.9) [1] 9 > floor(-9.9) [1] -10	Smallest integer greater than or equal to x	> ceiling(9.9) [1] 10 > ceiling(-9.9) [1] -9
Round to the nearest integer	> round(9.9) [1] 10 > round(9.2) [1] 9	Strip off the decimal part	> trunc(8.6) [1] 8 > trunc(-8.6) [1] -8
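
round() also accepts a number of decimal places, and signif() rounds to a number of significant digits. The sketch below additionally shows that R rounds a value lying exactly halfway between two integers to the even neighbour (variable names are illustrative):

```r
two_dp  <- round(3.14159, 2)   # 3.14
two_sig <- signif(123456, 2)   # 120000
half_a  <- round(2.5)          # 2: an exact half goes to the even integer
half_b  <- round(1.5)          # 2
```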

Trigonometric Functions

Function	Trigonometric Function	Inverse Trigonometric Function	Hyperbolic Function	Inverse Hyperbolic Function
sine	sin(x)	asin(x)	sinh(x)	asinh(x)
cosine	cos(x)	acos(x)	cosh(x)	acosh(x)
tangent	tan(x)	atan(x)	tanh(x)	atanh(x)
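
The trigonometric functions in the table work in radians, so angles given in degrees need converting first. A small sketch with an illustrative angle of 30 degrees:

```r
deg <- 30
rad <- deg * pi / 180          # degrees to radians
s <- sin(rad)                  # approximately 0.5
back <- asin(s) * 180 / pi     # inverse function, converted back to degrees
```

Because of floating-point arithmetic, results such as s are best compared with all.equal() rather than ==.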

Log and Exponential Functions

Function	R command	R Example
Absolute value, $|x|$	abs(x)	> abs(-7.4) [1] 7.4
Log to the base e, $\log _{e}(x)$	log(x)	> log(10) [1] 2.302585
Log to the base 10, $\log _{10}(x)$	log10(x)	> log10(100) [1] 2
Log to the base n of x	log(x,n)	> log(64,4) [1] 3
$e^{x}$	exp(x)	> exp(3) [1] 20.08554
${\sqrt {x}}$	sqrt(x)	> sqrt(25) [1] 5
$n!$	factorial(n)	> factorial(10) [1] 3628800
${\frac {n!}{r!(n-r)!}}$	choose(n,r)	> choose(5,4) [1] 5

Relational Operators and Logical Variables

Relational Operators

Relational Operator
Equal	==
Not equal	!=
Less than	<
Greater than	>
Less than or equal	<=
Greater than or equal	>=

In arithmetic, TRUE is treated as 1 and FALSE as 0, so the result of a comparison can be used in a calculation:

> x<-c(6,3,4)
> y<-c(5,15,9)
> z<-(x<y)
> z
[1] FALSE  TRUE  TRUE
> z<-(x<y)+5
> z
[1] 5 6 6
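
Because TRUE counts as 1, summing a comparison counts how many elements satisfy it, and which() reports their positions. A short sketch with the same vectors (the names n_less, frac and where are illustrative):

```r
x <- c(6, 3, 4)
y <- c(5, 15, 9)

n_less <- sum(x < y)     # number of positions where x is less than y: 2
frac   <- mean(x < y)    # proportion of such positions: 2/3
where  <- which(x < y)   # their indices: 2 3
```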

Logical Operators

$A$	$B$	$!A$	$A$ & $B$	$A|B$	$xor(A,B)$
False(0)	False(0)	True(1)	False(0)	False(0)	False(0)
False(0)	True(1)	True(1)	False(0)	True(1)	True(1)
True(1)	False(0)	False(0)	False(0)	True(1)	True(1)
True(1)	True(1)	False(0)	True(1)	True(1)	False(0)

> x<-c(6,2,8)
> y<-c(14,6,7)
> z<-c(4,5,11)
> z1<-x>y
> z1
[1] FALSE FALSE  TRUE
> z2<-y>z
> z2
[1]  TRUE  TRUE FALSE
> z3<-(x>y) & (y>z)
> z3
[1] FALSE FALSE FALSE

> # non-zero numbers are coerced to TRUE, so xor(x,y) is FALSE for every pair
> z1<-xor(x,y)
> z1
[1] FALSE FALSE FALSE
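
The remaining operators from the truth table can be applied to the same comparisons, and any() and all() reduce a logical vector to a single value. A minimal sketch (result names are illustrative):

```r
x <- c(6, 2, 8)
y <- c(14, 6, 7)
z <- c(4, 5, 11)

either  <- (x > y) | (y > z)      # TRUE TRUE TRUE
only1   <- xor(x > y, y > z)      # TRUE TRUE TRUE: exactly one holds each time
some_gt <- any(x > y)             # TRUE: at least one element of x exceeds y
all_gt  <- all(y > z)             # FALSE: the last element of y does not exceed z
```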

Sequence Generation and Repeats

> x1 <- seq(0,3,by=0.5)
> x1
[1] 0.0 0.5 1.0 1.5 2.0 2.5 3.0
> x2 <- seq(from=0.4,by=0.01,length=15)
> x2
 [1] 0.40 0.41 0.42 0.43 0.44 0.45 0.46 0.47 0.48 0.49 0.50 0.51 0.52 0.53 0.54
> x3<-seq(1.4,2.1,0.3)
> x3
[1] 1.4 1.7 2.0
> x4<-rep(15,7)
> x4
[1] 15 15 15 15 15 15 15
> x5<-rep(1:4,3)
> x5
 [1] 1 2 3 4 1 2 3 4 1 2 3 4
> x6<-rep(1:3,each=2,times=3)
> x6
 [1] 1 1 2 2 3 3 1 1 2 2 3 3 1 1 2 2 3 3
> x7<-rep(c("a","b","c"),c(1,2,3))
> x7
[1] "a" "b" "b" "c" "c" "c"
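
Two related helpers are often useful alongside seq() and rep(): seq_len() and seq_along() generate index sequences, and the length.out argument of seq() fixes the number of points instead of the step. A brief sketch with illustrative values:

```r
grid <- seq(0, 1, length.out = 5)      # 0.00 0.25 0.50 0.75 1.00
idx  <- seq_len(4)                     # 1 2 3 4
pos  <- seq_along(c("a", "b", "c"))    # 1 2 3
down <- seq(10, 2, by = -2)            # 10 8 6 4 2
```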

Random Number Generation

> set.seed(100)
> runif(5)
[1] 0.5465586 0.1702621 0.6249965 0.8821655 0.2803538
> runif(5)
[1] 0.3984879 0.7625511 0.6690217 0.2046122 0.3575249

> x<-c(5,10,8,6,9,11,14,16,18)
> sample(x)
[1]  6 11 18  9  8 16 10  5 14
> sample(x)
[1] 16  9 10  8 18 14 11  6  5
> sample(x,4)
[1] 10 11 14  5
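
set.seed() makes random results reproducible: resetting the same seed replays the same random numbers. The sketch below checks this and also draws a sample with replacement. No output values are shown because they depend on the R version and its random number generator; the variable names are illustrative:

```r
set.seed(100)
first <- runif(5)
set.seed(100)
second <- runif(5)
same <- identical(first, second)   # TRUE: the seed fixes the sequence

x <- c(5, 10, 8, 6, 9, 11, 14, 16, 18)
# Sampling with replacement: values may repeat
boot <- sample(x, length(x), replace = TRUE)
```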

Vector Functions

Length and Statistics

> x<-c(6,9,11,14,12,2,33,76,0,90)

Function	R command	Function	R command
Length	> length(x) [1] 10	Mean	> mean(x) [1] 25.3
Max	> max(x) [1] 90	Min	> min(x) [1] 0
Distribution	> quantile(x) 0% 25% 50% 75% 100% 0.00 6.75 11.50 28.25 90.00	Sort	> sort(x) [1] 0 2 6 9 11 12 14 33 76 90

Function	R command
Reference the 5th element	> x[5] [1] 12
Delete the 3rd element	> x1<-x[-3] > x1 [1] 6 9 14 12 2 33 76 0 90
Delete the last element	> x2<-x[-length(x)] > x2 [1] 6 9 11 14 12 2 33 76 0
Delete the first and last elements	> x3<-x[c(-1,-length(x))] > x3 [1] 9 11 14 12 2 33 76 0
Remove the 2 smallest and 3 largest elements	> trim <- function(x) sort(x)[-c(1,2,length(x)-2,length(x)-1,length(x))] > trim(x) [1] 6 9 11 12 14

	R code
Sum	> sum(x) [1] 253
Mean, median	> mean(x) [1] 25.3	> median(x) [1] 11.5
Range	> range(x) [1] 0 90
Standard deviation, variance	> sd(x) [1] 31.87841	> var(x) [1] 1016.233
Index of the largest and smallest elements	> which(x==max(x)) [1] 10	> which(x==min(x)) [1] 9
Sort and reverse sort	> sort(x) [1] 0 2 6 9 11 12 14 33 76 90	> rev(sort(x)) [1] 90 76 33 14 12 11 9 6 2 0

> x<-matrix(rpois(15,1.2),nrow=3)
> x
     [,1] [,2] [,3] [,4] [,5]
[1,]    2    1    0    3    3
[2,]    0    2    3    1    2
[3,]    2    2    0    1    1
> mean(x[,5])
[1] 2
> var(x[3,])
[1] 0.7
> rowSums(x)
[1] 9 8 6
> colSums(x)
[1] 4 5 3 5 6
> rowMeans(x)
[1] 1.8 1.6 1.2
> colMeans(x)
[1] 1.333333 1.666667 1.000000 1.666667 2.000000

Parallel min and max

> x<-c(2,5,10,-6,29,45)
> y<-c(5,9,15,-22,38,88)
> z<-c(9,10,2,7,55,24)
> q<-c(22,3,5,6,-23,88)
> pmin(x,y,z,q)
[1]   2   3   2 -22 -23  24
> pmax(x,y,z,q)
[1] 22 10 15  7 55 88
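
A common use of pmin() and pmax() is to limit ("clamp") every element of a vector to a range. A minimal sketch, with the bounds 0 and 30 chosen purely for illustration:

```r
x  <- c(2, 5, 10, -6, 29, 45)
lo <- 0    # illustrative lower bound
hi <- 30   # illustrative upper bound

# pmin caps each value at hi, then pmax raises each value to at least lo
clamped <- pmax(pmin(x, hi), lo)   # 2 5 10 0 29 30
```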

'table' and 'tapply'

> data(ChickWeight)
> ChickWeight
    weight Time Chick Diet
1       42    0     1    1
2       51    2     1    1
3       59    4     1    1
4       64    6     1    1
5       76    8     1    1
6       93   10     1    1
.......................
576    234   18    50    4
577    264   20    50    4
578    264   21    50    4
> tapply(ChickWeight$weight,ChickWeight$Time,mean)
        0         2         4         6         8        10        12        14 
 41.06000  49.22000  59.95918  74.30612  91.24490 107.83673 129.24490 143.81250 
       16        18        20        21 
168.08511 190.19149 209.71739 218.68889 
> tapply(ChickWeight$weight,ChickWeight$Diet,median)
    1     2     3     4 
 88.0 104.5 125.5 129.5

> codon1=c("UUU","UUC","UUA","UUG","UUA","UUG","UUC")
> table(codon1)
codon1
UUA UUC UUG UUU 
  2   2   2   1 
> aminoacid=list(Phe=c("UUU","UUC"),Leu=c("UUA","UUG"))
> codon=as.factor(codon1)
> levels(codon)=aminoacid
> codon
[1] Phe Phe Leu Leu Leu Leu Phe
Levels: Phe Leu
> table(codon)
codon
Phe Leu 
  3   4
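
The same pattern works on small hand-made vectors. The sketch below applies tapply() and table() to the patient ages and diabetes types used earlier on this page, grouping the ages by type:

```r
age      <- c(24, 35, 28, 52)
diabetes <- c("Type1", "Type2", "Type1", "Type2")

mean_age <- tapply(age, diabetes, mean)   # Type1: 26, Type2: 43.5
counts   <- table(diabetes)               # two patients of each type
```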

'apply'

> x<-matrix(1:15,nrow=3,byrow=T)
> x
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    6    7    8    9   10
[3,]   11   12   13   14   15
> apply(x,1,sum)
[1] 15 40 65
> apply(x,2,sum)
[1] 18 21 24 27 30
> apply(x,1,sqrt)
         [,1]     [,2]     [,3]
[1,] 1.000000 2.449490 3.316625
[2,] 1.414214 2.645751 3.464102
[3,] 1.732051 2.828427 3.605551
[4,] 2.000000 3.000000 3.741657
[5,] 2.236068 3.162278 3.872983
> apply(x,2,sqrt)
         [,1]     [,2]     [,3]     [,4]     [,5]
[1,] 1.000000 1.414214 1.732051 2.000000 2.236068
[2,] 2.449490 2.645751 2.828427 3.000000 3.162278
[3,] 3.316625 3.464102 3.605551 3.741657 3.872983
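
apply() also accepts a user-defined function, which is called once per row (margin 1) or once per column (margin 2). A short sketch on the same matrix, using an anonymous function chosen for illustration:

```r
x <- matrix(1:15, nrow = 3, byrow = TRUE)

# Spread (max minus min) of each row
row_span <- apply(x, 1, function(r) max(r) - min(r))   # 4 4 4
# Largest value in each column
col_max  <- apply(x, 2, max)                           # 11 12 13 14 15
```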

Closest Value

> x<-c(3,22,15,11,50,85)
> x-10
[1] -7 12  5  1 40 75
> abs(x-10)
[1]  7 12  5  1 40 75
> min(abs(x-10))
[1] 1
> which(abs(x-10)==min(abs(x-10)))
[1] 4
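
The steps above can be wrapped in a small helper. The function name closest() is hypothetical, and which.min() is used as a shorter alternative to which(... == min(...)); unlike that form, it returns only the first position when there are ties:

```r
# Hypothetical helper: the element of x nearest to target
closest <- function(x, target) {
  x[which.min(abs(x - target))]
}

x <- c(3, 22, 15, 11, 50, 85)
near10 <- closest(x, 10)   # 11
near60 <- closest(x, 60)   # 50
```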

Sort,Rank,Order

> x<-c(2,5,10,-6,29,45)
> # rank: the position each element would take if the vector were sorted
> rank(x)
[1] 2 3 4 1 5 6
> # order: the indices of the elements, listed in sorted order
> order(x)
[1] 4 1 2 3 5 6
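
Indexing a vector by its own order() gives the sorted vector, and the same idea sorts the rows of a data frame by one of its columns. A minimal sketch; the small data frame is illustrative and reuses the ages from earlier on this page:

```r
x <- c(2, 5, 10, -6, 29, 45)
sorted <- x[order(x)]                      # same result as sort(x)
desc   <- order(x, decreasing = TRUE)      # 6 5 3 2 1 4

df <- data.frame(id = LETTERS[1:4], age = c(24, 35, 28, 52))
by_age <- df[order(df$age), ]              # rows rearranged by increasing age
```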

Unique and Duplicated

> x<-c("a","b","c","a","a","a","b","c")
> table(x)
x
a b c 
4 2 2 
> unique(x)
[1] "a" "b" "c"
> duplicated(x)
[1] FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
> x[!duplicated(x)]
[1] "a" "b" "c"

Run length

> x<-rpois(20,0.5)
> x
 [1] 2 0 0 1 0 1 0 0 1 1 0 0 2 0 0 0 0 0 0 0
> rle(x)
Run Length Encoding
  lengths: int [1:10] 1 2 1 1 1 2 2 2 1 7
  values : int [1:10] 2 0 1 0 1 0 1 0 2 0
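
The result of rle() is a list with components lengths and values, so it can be queried directly, and inverse.rle() rebuilds the original vector. The sketch below types in the vector shown above instead of calling rpois(), so that it is reproducible:

```r
x <- c(2, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0)
r <- rle(x)

longest     <- max(r$lengths)                  # length of the longest run: 7
longest_val <- r$values[which.max(r$lengths)]  # the value repeated in that run: 0
rebuilt     <- inverse.rle(r)                  # decodes back to the original x
```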

Set functions

> setA <-c("I","II","III","IV","V")
> setB <-c("III","IV","V","VI")
> union(setA,setB)
[1] "I"   "II"  "III" "IV"  "V"   "VI" 
> intersect(setA,setB)
[1] "III" "IV"  "V"  
> setdiff(setA,setB)
[1] "I"  "II"
> setdiff(setB,setA)
[1] "VI"
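
Membership can be tested with the %in% operator, and setequal() checks whether two vectors contain the same elements regardless of order. A brief sketch using the same two sets (result names are illustrative):

```r
setA <- c("I", "II", "III", "IV", "V")
setB <- c("III", "IV", "V", "VI")

has_III <- "III" %in% setA      # TRUE
in_B    <- setA %in% setB       # FALSE FALSE TRUE TRUE TRUE
same    <- setequal(union(setA, setB), union(setB, setA))   # TRUE
```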