Introduction to R
R is a free software environment for data analysis and graphical visualization. It is convenient for interactive work, but it is not particularly efficient at handling large data sets. In those cases C/C++ or Python with GPU support may be more suitable.
Basic Data Types
R has several basic data types including numeric, character, and logical.
To assign a value to a variable, use the assignment operator <- (Note that = can also be used, but is discouraged).
a <- 3
b <- sqrt(a*a + 3)
> b
[1] 3.464102
To make a vector we can use the c (combine) function. Indices start at 1.
a <- c(1, 2, 3, 4, 5)
> a[1]
[1] 1
> a[5]
[1] 5
> a[0]
numeric(0)
We also have some special values,
> a[6]
[1] NA
a[2] <- NA
> a
[1]  1 NA  3  4  5
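Missing values propagate through most arithmetic, so it helps to know how to detect and skip them. The following is a minimal sketch using the base functions is.na, sum and mean; the variable names are only illustrative.

```r
# Vector with one missing value, mirroring the example above
v <- c(1, NA, 3, 4, 5)

# NA propagates: the sum of a vector containing NA is itself NA
total_with_na <- sum(v)

# is.na() flags which entries are missing
missing_mask <- is.na(v)

# na.rm = TRUE drops the missing entries before computing
total_without_na <- sum(v, na.rm = TRUE)   # 1 + 3 + 4 + 5 = 13
mean_without_na <- mean(v, na.rm = TRUE)   # 13 / 4 = 3.25

print(missing_mask)
print(total_without_na)
print(mean_without_na)
```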
> 1/0
[1] Inf
> 0/0
[1] NaN
For logical values we have TRUE and FALSE.
a <- c(1, 2, 3, 4, 5)
> a > 3
[1] FALSE FALSE FALSE  TRUE  TRUE
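A logical vector like the one above can also be used as an index, which is a common way to filter or overwrite elements. A short sketch, with variable names chosen only for illustration:

```r
values <- c(1, 2, 3, 4, 5)

# Comparison yields one TRUE/FALSE per element
is_large <- values > 3

# Indexing with a logical vector keeps the TRUE positions
large_values <- values[is_large]      # 4 5

# which() converts the logical vector into integer positions
large_positions <- which(is_large)    # 4 5

# Logical indexing also works on the left-hand side of an assignment
capped <- values
capped[capped > 3] <- 3               # 1 2 3 3 3

print(large_values)
print(large_positions)
print(capped)
```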
> sum(a > 3)
[1] 2
Characters and strings are also supported,
a <- "Hello"
> a
[1] "Hello"
b <- c("Hello", "World")
> b[1]
[1] "Hello"
> b[2]
[1] "World"
> b
[1] "Hello" "World"
Basic Operations and Functions
a <- c(1, 2, 3, 4, 5)
> a
[1] 1 2 3 4 5
> a + 5
[1]  6  7  8  9 10
> a * 4
[1]  4  8 12 16 20
> a / 5
[1] 0.2 0.4 0.6 0.8 1.0
> a^2
[1]  1  4  9 16 25
In R we have something called the recycling rule: when two vectors differ in length, the shorter one is repeated to match the longer one.
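To make recycling concrete, the sketch below compares an implicit recycled sum with one where the shorter vector is expanded by hand using rep_len. The variable names are illustrative. When the longer length is not a multiple of the shorter one, R still recycles but emits a warning.

```r
long_vec <- c(1, 2, 3, 4, 5, 6)
short_vec <- c(10, 20)

# Implicit recycling: short_vec is reused as 10 20 10 20 10 20
recycled_sum <- long_vec + short_vec

# The same computation with the repetition written out explicitly
explicit_sum <- long_vec + rep_len(short_vec, length(long_vec))

print(recycled_sum)                           # 11 22 13 24 15 26
print(identical(recycled_sum, explicit_sum))  # TRUE
```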
> b <- c(1, 2)
> a + b
[1] 2 4 4 6 6
Warning message:
In a + b :
  longer object length is not a multiple of shorter object length
We of course have basic mathematical functions,
> sqrt(a)
[1] 1.000000 1.414214 1.732051 2.000000 2.236068
> exp(a)
[1]   2.718282   7.389056  20.085537  54.598150 148.413159
> log(a)
[1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379
> exp(log(a))
[1] 1 2 3 4 5
a <- c(1, -2, 3, -4)
b <- c(-1, 2, -3, 4)
> min(a)
[1] -4
> min(a, b)
[1] -4
> pmin(a, b)
[1] -1 -2 -3 -4
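The difference between min and pmin in the transcript above is easy to miss: min pools all of its arguments and returns a single value, while pmin (and its counterpart pmax) compares the vectors element by element. A small sketch, with illustrative variable names:

```r
u <- c(1, -2, 3, -4)
w <- c(-1, 2, -3, 4)

# min() looks at every value in both vectors and returns one number
overall_min <- min(u, w)              # -4

# pmin()/pmax() return one value per position
elementwise_min <- pmin(u, w)         # -1 -2 -3 -4
elementwise_max <- pmax(u, w)         #  1  2  3  4

# pmin() matches a hand-written element-wise comparison
manual_min <- ifelse(u < w, u, w)

# A common use: clamp negative entries to zero (0 is recycled)
clamped <- pmax(u, 0)                 # 1 0 3 0

print(elementwise_min)
print(elementwise_max)
print(clamped)
```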
x <- c(1, 3, 2, 10, 5)
> sum(x)
[1] 21
> cumsum(x)
[1]  1  4  6 16 21
> diff(x)
[1]  2 -1  8 -5
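cumsum and diff undo each other, which is a handy way to check results or to move between increments and running totals. A minimal sketch, with illustrative variable names:

```r
steps <- c(1, 3, 2, 10, 5)

# Running total of the steps
running_total <- cumsum(steps)               # 1 4 6 16 21

# Differencing the running total (with a leading 0) recovers the steps
recovered <- diff(c(0, running_total))       # 1 3 2 10 5

# Conversely, the first value plus the cumulative differences rebuilds the vector
rebuilt <- cumsum(c(steps[1], diff(steps)))  # 1 3 2 10 5

# The lag argument compares elements two positions apart
lag_two <- diff(steps, lag = 2)              # 1 7 3

print(recovered)
print(rebuilt)
print(lag_two)
```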
> x
[1]  1  3  2 10  5
> sort(x)
[1]  1  2  3  5 10
> sort(x, decreasing = TRUE)
[1] 10  5  3  2  1
Matrices
The easiest way to create a matrix is to combine vectors of equal length using cbind(),
x <- c(1, 3, 2, 10, 5)
y <- 1:5
m1 <- cbind(x, y)
> m1
      x y
[1,]  1 1
[2,]  3 2
[3,]  2 3
[4,] 10 4
[5,]  5 5
> t(m1)
  [,1] [,2] [,3] [,4] [,5]
x    1    3    2   10    5
y    1    2    3    4    5
> dim(m1)
[1] 5 2
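cbind stacks vectors as columns; its counterpart rbind stacks them as rows. The sketch below, with illustrative variable names, shows that the row-wise result holds the same values as the transpose of the column-wise one.

```r
p <- c(1, 3, 2, 10, 5)
q <- 1:5

# Vectors become columns: 5 rows, 2 columns
by_columns <- cbind(p, q)

# Vectors become rows: 2 rows, 5 columns
by_rows <- rbind(p, q)

print(dim(by_columns))                 # 5 2
print(dim(by_rows))                    # 2 5
print(nrow(by_rows))                   # 2
print(ncol(by_rows))                   # 5

# Same values as transposing the column-wise matrix
print(all(by_rows == t(by_columns)))   # TRUE
```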
m2 <- matrix(c(1,3,2,5,-1,2,2,3,9), ncol=3, byrow=TRUE)
> m2
     [,1] [,2] [,3]
[1,]    1    3    2
[2,]    5   -1    2
[3,]    2    3    9
> m2[2,3]
[1] 2
> m2[2, ]
[1]  5 -1  2
> m2[, 3]
[1] 2 2 9
> m2[, -1]
     [,1] [,2]
[1,]    3    2
[2,]   -1    2
[3,]    3    9
> m2[-1, 2:3]
     [,1] [,2]
[1,]   -1    2
[2,]    3    9
> sum(m2)
[1] 26
> apply(m2, 2, sum)
[1]  8  5 13
> apply(m2, 1, mean)
[1] 2.000000 2.000000 4.666667
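In apply, the second argument selects the margin: 1 applies the function to each row, 2 to each column. For sums and means, base R also provides the dedicated functions colSums and rowMeans. The sketch below rebuilds the same matrix under an illustrative name and also passes a custom function to apply.

```r
m <- matrix(c(1, 3, 2, 5, -1, 2, 2, 3, 9), ncol = 3, byrow = TRUE)

# Margin 2 = columns, margin 1 = rows
col_totals <- apply(m, 2, sum)    # 8 5 13
row_means <- apply(m, 1, mean)

# Dedicated equivalents for these two common cases
col_totals_fast <- colSums(m)
row_means_fast <- rowMeans(m)

# apply() accepts any function, e.g. the spread (max - min) of each column
col_spread <- apply(m, 2, function(column) max(column) - min(column))  # 4 4 7

print(all.equal(col_totals, col_totals_fast))  # TRUE
print(all.equal(row_means, row_means_fast))    # TRUE
print(col_spread)
```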
m1 <- matrix(1:4, ncol=2)
m2 <- matrix(c(10,20,30,40), ncol=2)
> 2*m1
     [,1] [,2]
[1,]    2    6
[2,]    4    8
> m1+m2
     [,1] [,2]
[1,]   11   33
[2,]   22   44
> m1*m2
     [,1] [,2]
[1,]   10   90
[2,]   40  160
# Note that this is not the usual matrix multiplication; for that we use %*%
> m1 %*% t(m2)
     [,1] [,2]
[1,]  100  140
[2,]  140  200
> solve(m1) # Inverse of m1
     [,1] [,2]
[1,]   -2  1.5
[2,]    1 -0.5
> solve(m1) %*% m1 # Check if the inverse is correct
     [,1] [,2]
[1,]    1    0
[2,]    0    1
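solve can also take a right-hand side, in which case it solves the linear system directly instead of returning the inverse. A minimal sketch: the matrix A has the same entries as m1 above, and the vector rhs is an arbitrary value chosen for illustration.

```r
# Filled column by column: rows are (1, 3) and (2, 4)
A <- matrix(c(1, 2, 3, 4), ncol = 2)
rhs <- c(5, 6)

# Solve A %*% x = rhs in one step
x_direct <- solve(A, rhs)             # -1 2

# Same answer via the explicit inverse (returned as a 2 x 1 matrix)
x_via_inverse <- solve(A) %*% rhs

# Check: multiplying back should reproduce the right-hand side
residual <- as.vector(A %*% x_direct) - rhs

print(x_direct)
print(as.vector(x_via_inverse))
print(residual)
```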
> diag(3)
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1
> diag(c(2,2,3))
     [,1] [,2] [,3]
[1,]    2    0    0
[2,]    0    2    0
[3,]    0    0    3
> diag(m1)
[1] 1 4
Basic Probability
In R we have many built-in distributions, such as the normal distribution.
> dnorm(0)
[1] 0.3989423
> dnorm(0)*sqrt(2*pi)
[1] 1
> dnorm(0, mean=4)
[1] 0.0001338302
> dnorm(0, mean=4, sd=10)
[1] 0.03682701
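The values above follow from the normal density formula, exp(-(x - mean)^2 / (2 sd^2)) / (sd sqrt(2 pi)), which is also why dnorm(0)*sqrt(2*pi) equals 1. As a sketch, the hypothetical helper normal_density below (not part of R) writes the formula out and compares it with dnorm.

```r
# Hand-written normal density; normal_density is an illustrative name
normal_density <- function(x, mean = 0, sd = 1) {
  exp(-(x - mean)^2 / (2 * sd^2)) / (sd * sqrt(2 * pi))
}

# Compare against the built-in dnorm for the calls shown above
match_standard <- all.equal(normal_density(0), dnorm(0))
match_shifted <- all.equal(normal_density(0, mean = 4), dnorm(0, mean = 4))
match_scaled <- all.equal(normal_density(0, mean = 4, sd = 10),
                          dnorm(0, mean = 4, sd = 10))

print(match_standard)  # TRUE
print(match_shifted)   # TRUE
print(match_scaled)    # TRUE
```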
v <- c(0, 1, 2)
> dnorm(v)
[1] 0.39894228 0.24197072 0.05399097
We can also use the CDF,
> pnorm(0)
[1] 0.5
> pnorm(1)
[1] 0.8413447
> pnorm(0, mean=2)
[1] 0.02275013
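Since pnorm returns P(X <= x), interval and upper-tail probabilities follow by subtraction or by setting lower.tail = FALSE. A short sketch, with illustrative variable names:

```r
# P(-1 < X < 1) for a standard normal: difference of two CDF values
within_one_sd <- pnorm(1) - pnorm(-1)           # about 0.6827

# P(X > 1): the upper tail, requested directly
upper_tail <- pnorm(1, lower.tail = FALSE)

# Symmetry of the standard normal: P(X <= -1) equals P(X > 1)
lower_tail_mirror <- pnorm(-1)

print(within_one_sd)
print(all.equal(upper_tail, 1 - pnorm(1)))      # TRUE
print(all.equal(upper_tail, lower_tail_mirror)) # TRUE
```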
> pnorm(0, mean=2, sd=3)
[1] 0.2524925
And the quantile function,
> qnorm(0.5)
[1] 0
We can also sample (the output differs on every run),
> rnorm(4)
[1]  1.2387271 -0.2323259 -1.2003081 -1.6718483
And some basic statistics (computed on a random sample, so your numbers will differ),
> x <- rnorm(100)
> mean(x)
[1] 0.07336607
> median(x)
[1] 0.07336607
> var(x)
[1] 0.978366
> sd(x)
[1] 0.9891386
> quantile(x)
       0%       25%       50%       75%      100% 
-2.603073 -0.573073  0.073366  0.678927  2.496073 
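Because rnorm draws random numbers, the statistics above change on every run. Fixing the seed with set.seed makes a run reproducible. The sketch below uses an arbitrary seed (42, chosen only for illustration) and checks two relationships between the summary functions. No concrete values are shown, since they depend on the random number generator.

```r
# Same seed, same draws
set.seed(42)
first_draw <- rnorm(100)
set.seed(42)
second_draw <- rnorm(100)
same_sample <- identical(first_draw, second_draw)

# The standard deviation is the square root of the variance
sd_matches_var <- all.equal(sd(first_draw), sqrt(var(first_draw)))

# The median is the 50% quantile (unname drops the "50%" label)
median_matches_quantile <- all.equal(median(first_draw),
                                     unname(quantile(first_draw, 0.5)))

print(same_sample)              # TRUE
print(sd_matches_var)           # TRUE
print(median_matches_quantile)  # TRUE
```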
> summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-2.6031 -0.5731  0.0734  0.0734  0.6789  2.4961 
And for other distributions, for example the Binomial distribution,
> dbinom(0:5, size=5, prob=0.5)
[1] 0.03125 0.15625 0.31250 0.31250 0.15625 0.03125
Okay, that’s it for now. Go and actually do something to learn.