Part 1 - Introduction to R

MA3518
Date: September 5, 2024
Last modified: July 10, 2025

Introduction to R

R is a free software environment suited to data analysis and graphical visualization. It is not particularly efficient at handling very large data sets, though; in those cases C/C++ or Python with GPU support are often more suitable.

Basic Data Types

R has several basic data types, including numeric, character, and logical. To assign a value to a variable, use the assignment operator <- (note that = also works, but is discouraged).

a <- 3
b <- sqrt(a*a + 3)
> b
[1] 3.464102
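As a small sketch of the three basic types mentioned above, class() reports the type of a value. The variable names here are only illustrative.

```r
# Inspect the basic type of a few values with class()
x <- 3.5        # numeric
s <- "three"    # character
flag <- x > 3   # logical (result of a comparison)

types <- c(class(x), class(s), class(flag))
print(types)
```

Checking class() is a quick way to see what R has actually stored when a calculation behaves unexpectedly.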

To make a vector we can use the c (combine) function. Indexing starts at 1; index 0 returns an empty vector.

a <- c(1, 2, 3, 4, 5)
> a[1]
[1] 1
> a[5]
[1] 5
> a[0]
numeric(0)
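Beyond single positions, a vector can be indexed with ranges, several positions at once, or negative numbers. The following is a minimal sketch using the same vector as above.

```r
a <- c(1, 2, 3, 4, 5)

first_three   <- a[1:3]      # a range of positions
picked        <- a[c(1, 5)]  # several positions at once
without_first <- a[-1]       # a negative index drops that position
n             <- length(a)   # number of elements

print(first_three)
print(picked)
print(without_first)
print(n)
```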

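R also has some special values: NA marks a missing value (for example, an index past the end of a vector), Inf is infinity, and NaN means "not a number".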

> a[6]
[1] NA
a[2] <- NA
> a
[1] 1 NA 3 4 5
> 1/0
[1] Inf
> 0/0
[1] NaN
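Missing values propagate through most calculations, so it helps to know how to detect and skip them. A short sketch with base R functions:

```r
a <- c(1, NA, 3, 4, 5)

missing     <- is.na(a)             # TRUE where a value is missing
total_raw   <- sum(a)               # NA, because one element is missing
total_clean <- sum(a, na.rm = TRUE) # drop NA before summing

nan_check <- is.nan(0 / 0)          # NaN can be detected with is.nan()
inf_check <- is.finite(1 / 0)       # Inf is not finite

print(missing)
print(total_raw)
print(total_clean)
```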

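For logical values we have TRUE and FALSE. Comparisons are applied element-wise, and since TRUE counts as 1, sum() counts how many elements satisfy a condition.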

a <- c(1, 2, 3, 4, 5)
> a > 3
[1] FALSE FALSE FALSE TRUE TRUE
> sum(a > 3)
[1] 2
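A logical vector can also be used as an index to keep only the elements that satisfy a condition. A minimal sketch:

```r
a <- c(1, 2, 3, 4, 5)

big       <- a[a > 3]     # keep elements where the condition is TRUE
positions <- which(a > 3) # positions of the TRUE entries
share     <- mean(a > 3)  # proportion of elements above 3

print(big)
print(positions)
print(share)
```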

Characters and strings are also supported:

a <- "Hello"
> a
[1] "Hello"
b <- c("Hello", "World")
> b[1]
[1] "Hello"
> b[2]
[1] "World"
> b
[1] "Hello" "World"
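A few base R functions cover common string tasks. This sketch reuses the vector b from above:

```r
b <- c("Hello", "World")

joined    <- paste(b[1], b[2])            # join with a space by default
collapsed <- paste(b, collapse = ", ")    # collapse a vector into one string
lengths   <- nchar(b)                     # number of characters per element
shout     <- toupper(b[1])                # convert to upper case

print(joined)
print(collapsed)
print(lengths)
print(shout)
```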

Basic Operations and Functions

a <- c(1, 2, 3, 4, 5)
> a
[1] 1 2 3 4 5
> a + 5
[1] 6 7 8 9 10
> a * 4
[1] 4 8 12 16 20
> a / 5
[1] 0.2 0.4 0.6 0.8 1.0
> a^2
[1] 1 4 9 16 25

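R follows the recycling rule: when two vectors of different lengths are combined, the shorter one is repeated until it matches the longer one. Here b is reused as 1 2 1 2 1, and R also warns that the longer length is not a multiple of the shorter.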

> b <- c(1, 2)
> a + b
[1] 2 4 4 6 6
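To see recycling without the warning, the sketch below uses lengths where one is a multiple of the other, and compares the result with an explicit repetition built by rep().

```r
a <- c(1, 2, 3, 4, 5, 6)
b <- c(10, 20)

recycled <- a + b                               # b is reused as 10 20 10 20 10 20
explicit <- a + rep(b, length.out = length(a))  # same thing, written out

print(recycled)
print(identical(recycled, explicit))
```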

We of course have the basic mathematical functions, which are applied element-wise:

> sqrt(a)
[1] 1.000000 1.414214 1.732051 2.000000 2.236068
> exp(a)
[1] 2.718282 7.389056 20.085537 54.598150 148.413159
> log(a)
[1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379
> exp(log(a))
[1] 1 2 3 4 5
a <- c(1, -2, 3, -4)
b <- c(-1, 2, -3, 4)
> min(a)
[1] -4
> min(a, b)
[1] -4
> pmin(a, b)
[1] -1 -2 -3 -4
x <- c(1, 3, 2, 10, 5)
> sum(x)
[1] 21
> cumsum(x)
[1] 1 4 6 16 21
> diff(x)
[1] 2 -1 8 -5
> x
[1] 1 3 2 10 5
> sort(x)
[1] 1 2 3 5 10
> sort(x, decreasing = TRUE)
[1] 10 5 3 2 1
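Two relationships between the functions above are worth a quick check: order() returns the positions that sort() would pick, and diff() undoes cumsum() apart from the first element. A minimal sketch:

```r
x <- c(1, 3, 2, 10, 5)

idx              <- order(x)        # positions of elements in increasing order
sorted_via_order <- x[idx]          # same result as sort(x)
recovered        <- diff(cumsum(x)) # gives back x without its first element

print(idx)
print(sorted_via_order)
print(recovered)
```

order() is handy when one vector should be rearranged according to the values of another.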

Matrices

The easiest way to create a matrix is to combine vectors of equal length as columns using cbind():

x <- c(1, 3, 2, 10, 5)
y <- 1:5
m1 <- cbind(x, y)
> m1
x y
[1,] 1 1
[2,] 3 2
[3,] 2 3
[4,] 10 4
[5,] 5 5
> t(m1)
[,1] [,2] [,3] [,4] [,5]
x 1 3 2 10 5
y 1 2 3 4 5
> dim(m1)
[1] 5 2
m2 <- matrix(c(1, 3, 2, 5, -1, 2, 2, 3, 9), ncol = 3, byrow = TRUE)
> m2
[,1] [,2] [,3]
[1,] 1 3 2
[2,] 5 -1 2
[3,] 2 3 9
> m2[2,3]
[1] 2
> m2[2, ]
[1] 5 -1 2
> m2[, 3]
[1] 2 2 9
> m2[, -1]
[,1] [,2]
[1,] 3 2
[2,] -1 2
[3,] 3 9
> m2[-1, 2:3]
[,1] [,2]
[1,] -1 2
[2,] 3 9
> sum(m2)
[1] 26
> apply(m2, 2, sum)
[1] 8 5 13
> apply(m2, 1, mean)
[1] 2.000000 2.000000 4.666667
m1<-matrix(1:4, ncol=2)
m2<-matrix(c(10,20,30,40),ncol=2)
> 2*m1
[,1] [,2]
[1,] 2 6
[2,] 4 8
> m1+m2
[,1] [,2]
[1,] 11 33
[2,] 22 44
> m1*m2
[,1] [,2]
[1,] 10 90
[2,] 40 160
# Note that * multiplies element by element; for the usual matrix multiplication, use %*%
> m1 %*% t(m2)
[,1] [,2]
[1,] 100 140
[2,] 140 200
> solve(m1) # Inverse of m1
[,1] [,2]
[1,] -2 1.5
[2,] 1 -0.5
> solve(m1)%*% m1 # Check if the inverse is correct
[,1] [,2]
[1,] 1 0
[2,] 0 1
> diag(3)
[,1] [,2] [,3]
[1,] 1 0 0
[2,] 0 1 0
[3,] 0 0 1
> diag(c(2,2,3))
[,1] [,2] [,3]
[1,] 2 0 0
[2,] 0 2 0
[3,] 0 0 3
> diag(m1)
[1] 1 4
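Two details from the matrix examples are easy to get wrong: matrix() fills column by column unless byrow = TRUE is given, and solve() can solve a linear system directly when it is given a right-hand side. The sketch below illustrates both; the vector rhs is just an example value.

```r
# Fill order: column-wise by default, row-wise with byrow = TRUE
by_col <- matrix(1:4, ncol = 2)               # rows are (1, 3) and (2, 4)
by_row <- matrix(1:4, ncol = 2, byrow = TRUE) # rows are (1, 2) and (3, 4)
print(identical(by_row, t(by_col)))

# Solve A %*% sol = rhs without forming the inverse explicitly
A   <- by_col
rhs <- c(5, 6)
sol <- solve(A, rhs)
print(sol)

# Check the solution by multiplying back
print(as.vector(A %*% sol))
```

Calling solve(A, rhs) is generally preferable to computing solve(A) %*% rhs, since it avoids forming the inverse.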

Basic Probability

R has many built-in distributions, such as the normal distribution. dnorm() evaluates its density, by default with mean 0 and standard deviation 1:

> dnorm(0)
[1] 0.3989423
> dnorm(0)*sqrt(2*pi)
[1] 1
> dnorm(0, mean=4)
[1] 0.0001338302
> dnorm(0, mean=4, sd=10)
[1] 0.03682701
v <- c(0, 1, 2)
> dnorm(v)
[1] 0.39894228 0.24197072 0.05399097
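As a check on what dnorm() computes, the value can be reproduced from the normal density formula exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi)). The names mu and sigma below are only illustrative.

```r
x     <- 0
mu    <- 4
sigma <- 10

# Normal density written out by hand
manual  <- exp(-(x - mu)^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi))
builtin <- dnorm(x, mean = mu, sd = sigma)

print(manual)
print(all.equal(manual, builtin))
```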

pnorm() gives the cumulative distribution function (CDF), that is, the probability of a value at or below its argument:

> pnorm(0)
[1] 0.5
> pnorm(1)
[1] 0.8413447
> pnorm(0, mean=2)
[1] 0.02275013
> pnorm(0, mean=2, sd=3)
[1] 0.2524925

qnorm() is the quantile function, the inverse of the CDF:

> qnorm(0.5)
[1] 0
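Combining the CDF and the quantile function answers the usual probability questions. The sketch below computes the probability of falling within one standard deviation of the mean, finds the 97.5% quantile, and checks that qnorm() inverts pnorm().

```r
# P(-1 < Z < 1) for a standard normal variable
p_within_one_sd <- pnorm(1) - pnorm(-1)

# The 97.5% quantile, familiar from 95% confidence intervals
z_975 <- qnorm(0.975)

# qnorm() and pnorm() are inverses of each other
round_trip <- pnorm(qnorm(0.3))

print(p_within_one_sd)
print(z_975)
print(round_trip)
```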

We can also draw random samples with rnorm() (the values differ on every run):

> rnorm(4)
[1] 1.2387271 -0.2323259 -1.2003081 -1.6718483
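Because the samples are random, results change between runs unless the random number generator is seeded. A minimal sketch, where the seed value 3518 is an arbitrary choice:

```r
set.seed(3518)      # arbitrary seed, chosen only for illustration
first <- rnorm(4)

set.seed(3518)      # resetting the same seed reproduces the same draws
second <- rnorm(4)

print(identical(first, second))
```

Setting a seed at the top of a script makes simulation results reproducible.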

And some basic statistics. Since the sample is random, the output below is only illustrative and will differ from run to run:

> x <- rnorm(100)
> mean(x)
[1] 0.07336607
> median(x)
[1] 0.07336607
> var(x)
[1] 0.978366
> sd(x)
[1] 0.9891386
> quantile(x)
0% 25% 50% 75% 100%
-2.603073 -0.573073 0.073366 0.678927 2.496073
> summary(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-2.6031 -0.5731 0.0734 0.0734 0.6789 2.4961
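Note that var() and sd() use the sample versions, dividing by n - 1 rather than n. The following sketch verifies this by hand on a seeded sample; the seed is an arbitrary choice.

```r
set.seed(1)         # arbitrary seed so the check is reproducible
x <- rnorm(100)

n          <- length(x)
manual_var <- sum((x - mean(x))^2) / (n - 1)  # sample variance by hand
manual_sd  <- sqrt(manual_var)                # sample standard deviation

print(all.equal(manual_var, var(x)))
print(all.equal(manual_sd, sd(x)))
```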

Other distributions follow the same d/p/q/r naming pattern, for example the binomial distribution:

> dbinom(0:5, size=5, prob=0.5)
[1] 0.03125 0.15625 0.31250 0.31250 0.15625 0.03125
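The binomial probabilities above can be checked in a few ways: they sum to 1, pbinom() accumulates them, and each one matches the formula choose(n, k) p^k (1 - p)^(n - k). A minimal sketch:

```r
probs <- dbinom(0:5, size = 5, prob = 0.5)

total       <- sum(probs)                        # all probabilities add up to 1
at_most_two <- pbinom(2, size = 5, prob = 0.5)   # P(X <= 2)
manual      <- choose(5, 2) * 0.5^2 * 0.5^3      # P(X = 2) from the formula

print(total)
print(at_most_two)
print(manual)
```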

Okay, that’s it for now. Go and actually do something to learn.