Basic Introduction to R

Hans W Borchers
ABB Corporate Research, Ladenburg
January 19, 2016

Computing With R

The basic data types in R are numeric (double precision by default): 10.0, 10, 1e01,
complex: 1 + 1i,
logical: TRUE, FALSE,
and character, enclosed in ' or ".

Variable names can include alphanumeric characters, _, or . (sometimes with special meaning); they must not start with a digit or _.
The assignment operator is <-, but = can also be used in most situations.
# starts a comment that runs to the end of the line.
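
As a small illustration of the types and naming rules above, the following sketch asks R how it classifies a few literals. The variable names are chosen only for this example.

```r
# Numeric literals are stored as double-precision values by default
x <- 10
typeof(x)        # "double"

# An explicit integer needs the L suffix
n <- 10L
typeof(n)        # "integer"

z <- 1 + 1i
class(z)         # "complex"

flag <- TRUE
class(flag)      # "logical"

# Single and double quotes create the same character value
identical('text', "text")   # TRUE

# A dot is an ordinary character in a name; = also assigns at top level
my.value = 3
```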

Arithmetic Operators: +, -, *, /, ^

Comparison Operators: ==, !=, >, >=, <, <=

Logical Operators: !, &, | (elementwise); && and || are the short-circuit forms for single values, e.g. in if conditions
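
The operators listed above can be tried out as in the following minimal sketch. The vectors a and b and the variable x are made up for illustration.

```r
7 / 2      # 3.5
2 ^ 10     # 1024

# Comparing floating point numbers with == can surprise
0.1 + 0.2 == 0.3                     # FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))    # TRUE, compares with a tolerance

# & and | work element by element on logical vectors
a <- c(TRUE, FALSE, TRUE)
b <- c(TRUE, TRUE, FALSE)
a & b      # TRUE FALSE FALSE
a | b      # TRUE TRUE TRUE
!a         # FALSE TRUE FALSE

# && stops as soon as the result is known, so x > 0 is never evaluated here
x <- NULL
ok <- !is.null(x) && x > 0
ok         # FALSE
```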

Mathematical Functions:
All common mathematical functions are available in R: abs, sqrt, sin, ..., exp, log, ..., round, etc.
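
A few of these functions in use, as a short sketch; the input values are arbitrary.

```r
abs(-3)               # 3
sqrt(16)              # 4
log(100, base = 10)   # 2, log() is the natural logarithm unless a base is given
log10(1000)           # 3
round(3.14159, 2)     # 3.14
round(2.5)            # 2, a half is rounded to the even digit
floor(-1.5)           # -2
ceiling(-1.5)         # -1
sin(pi / 2)           # 1
```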

In [1]:
0/0
1 + NA
sqrt(-1+0i)
Out[1]:
NaN
Out[1]:
[1] NA
Out[1]:
[1] 0+1i

NaN means "Not a Number". More important in R is NA for "Not Available", used for missing data.
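
Since missing values propagate through most computations, they usually have to be detected or removed explicitly. A minimal sketch, with a made-up vector v:

```r
v <- c(1, NA, 3)

is.na(v)                 # FALSE TRUE FALSE
mean(v)                  # NA, the missing value propagates
mean(v, na.rm = TRUE)    # 2, NA is dropped before averaging

# NA == NA is itself NA, so always test with is.na()
NA == NA                 # NA

# NaN counts as missing, but NA is not NaN
is.na(0/0)               # TRUE
is.nan(0/0)              # TRUE
is.nan(NA)               # FALSE
```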

Using the Workspace

In [2]:
e = exp(1)
ls()
ls.str()

getwd()
#setwd()
Out[2]:
'e'
Out[2]:
e :  num 2.72
Out[2]:
'/Users/hwb/Documents/meetup/RinLadenburg/notebooks'

Characters

In [3]:
cat("The answer is:", 42, "\n")
paste("Ich", "bin", "ein", "Berliner")
c('z', 'Z') %in% letters  # or LETTERS
The answer is: 42 
Out[3]:
'Ich bin ein Berliner'
Out[3]:
  1. TRUE
  2. FALSE
In [4]:
substr("Ich bin ein Berliner", 9, 11)
strsplit("Ich bin ein Berliner", ' ')
Out[4]:
'ein'
Out[4]:
    1. 'Ich'
    2. 'bin'
    3. 'ein'
    4. 'Berliner'
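
Some further string helpers from base R, shown as a short sketch on the same sentence. Note that strsplit returns a list (one component per input string), which is why unlist is applied here.

```r
s <- "Ich bin ein Berliner"

nchar(s)                      # 20
toupper(s)                    # "ICH BIN EIN BERLINER"
paste("a", "b", sep = "-")    # "a-b"
paste0("a", 1:3)              # "a1" "a2" "a3"
sprintf("%.2f", pi)           # "3.14"

# strsplit() returns a list; unlist() turns it into a character vector
words <- unlist(strsplit(s, " "))
length(words)                 # 4
```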

R integrates the PCRE library for regular expressions (selected with perl = TRUE; the default engine handles POSIX extended regular expressions).

In [5]:
res <- gregexpr("\\w+", "Eine Rose ist eine Rose ist eine Rose.")
str(res)
List of 1
 $ : atomic [1:8] 1 6 11 15 20 25 29 34
  ..- attr(*, "match.length")= int [1:8] 4 4 3 4 4 3 4 4
  ..- attr(*, "useBytes")= logi TRUE
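
The result of gregexpr only holds match positions and lengths. To obtain the matched substrings themselves, it can be passed to regmatches, as in this sketch (the variable names txt and words are chosen for illustration):

```r
txt <- "Eine Rose ist eine Rose ist eine Rose."
res <- gregexpr("\\w+", txt)

# regmatches() returns a list with one component per input string
words <- regmatches(txt, res)[[1]]
words            # "Eine" "Rose" "ist" "eine" "Rose" "ist" "eine" "Rose"
length(words)    # 8

# perl = TRUE requests the PCRE engine explicitly
grepl("^[A-Z]", words, perl = TRUE)
# TRUE TRUE FALSE FALSE TRUE FALSE FALSE TRUE

# fixed = TRUE treats the pattern as a literal string
gsub("Rose", "Tulpe", txt, fixed = TRUE)
# "Eine Tulpe ist eine Tulpe ist eine Tulpe."
```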

Vectors and Matrices

Generating vectors and matrices
Vectors are defined in R with the 'concatenate' function c(),
with the 'colon' operator, e.g. 1:5, or with the seq function.
Matrices are generated with, e.g., the matrix function.
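
The constructors named above side by side, as a minimal sketch with arbitrary values. By default matrix fills column by column; byrow=TRUE switches to row-wise filling.

```r
v1 <- c(1, 2, 3)                     # concatenate
v2 <- 1:5                            # 1 2 3 4 5 (integers)
v3 <- seq(0, 1, by = 0.25)           # 0.00 0.25 0.50 0.75 1.00
v4 <- seq(1, 10, length.out = 4)     # 1 4 7 10

M1 <- matrix(1:6, nrow = 2)                  # filled column by column
M2 <- matrix(1:6, nrow = 2, byrow = TRUE)    # filled row by row
dim(M1)      # 2 3
M1[1, ]      # 1 3 5
M2[1, ]      # 1 2 3
```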

In [6]:
b = c(1, 2, 3, 4, 5, 6)
A = matrix(runif(36), nrow=6, byrow=TRUE)
A
(x = qr.solve(A, b))
A %*% x
Out[6]:
0.6940234    0.7228398    0.2423202    0.3766033    0.6095528    0.7730926
0.803969404  0.390726755  0.005829482  0.123837501  0.688995443  0.927701381
0.6264596    0.1171935    0.4070221    0.5008644    0.2762211    0.5603193
0.71603676   0.05542268   0.34226408   0.01408316   0.02551473   0.39958570
0.3479528    0.6012284    0.3703129    0.4143563    0.6644754    0.6812745
0.0172455    0.6718268    0.9503481    0.9650054    0.4444700    0.4217130
Out[6]:
  1. 12.9030124608126
  2. -11.0184779518448
  3. 20.3019166563725
  4. -10.0486491125022
  5. 37.4855482348609
  6. -31.0118489278822
Out[6]:
1
2
3
4
5
6

Operations and Functions
Vectors or matrices (of equal length or size, respectively) can be added, subtracted, multiplied, divided, or raised to a power: + - * / ^.
These operations work elementwise (not as matrix operations). When one operand is shorter, its elements are recycled cyclically until the lengths match!

%*% is matrix multiplication.
Functions on arrays: length, sum, prod, sort, mean, median, ...
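
The difference between elementwise and matrix operations, and the recycling of shorter operands, can be seen in this small sketch (all values are made up):

```r
x <- c(1, 2, 3, 4)
y <- c(10, 20)

x * x      # 1 4 9 16, elementwise
x + y      # 11 22 13 24, y is recycled to 10 20 10 20

A <- matrix(1:4, nrow = 2)   # columns (1, 2) and (3, 4)
A * A      # elementwise:     rows (1, 9) and (4, 16)
A %*% A    # matrix product:  rows (7, 15) and (10, 22)

sum(x)     # 10
prod(x)    # 24
mean(x)    # 2.5
```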

'Yoga' of Indexing
x[i] is the i-th element of a vector x (indexing starts at 1, not 0!).
A[i,j] is the element in row i and column j of matrix A. A[i,] is the i-th row, A[,j] the j-th column of matrix A.
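
A short sketch of these indexing forms, together with negative and logical indices, which base R also supports. The vector x and matrix A are made up for illustration.

```r
x <- c(10, 20, 30, 40, 50)
x[1]         # 10, the first element
x[2:3]       # 20 30
x[-1]        # 20 30 40 50, a negative index drops elements
x[x > 25]    # 30 40 50, logical indexing

A <- matrix(1:6, nrow = 2)   # rows (1, 3, 5) and (2, 4, 6)
A[2, 3]      # 6
A[1, ]       # 1 3 5, first row as a vector
A[, 2]       # 3 4, second column as a vector

# drop = FALSE keeps the result a 1 x 3 matrix
dim(A[1, , drop = FALSE])    # 1 3
```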

Lists

Lists are collections of objects of possibly different data types, known as their components.

In [7]:
L <- list(a=c(1,2,3,4,5), b=TRUE, c=c('a','b','c'), d=diag(1,3))
L
names(L)
L[1:2]
L[[1]][1]
Out[7]:
$a
  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
$b
TRUE
$c
  1. 'a'
  2. 'b'
  3. 'c'
$d
1 0 0
0 1 0
0 0 1
Out[7]:
  1. 'a'
  2. 'b'
  3. 'c'
  4. 'd'
Out[7]:
$a
  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
$b
TRUE
Out[7]:
1
In [8]:
L$d
L[["d"]]
Out[8]:
1 0 0
0 1 0
0 0 1
Out[8]:
1 0 0
0 1 0
0 0 1
In [9]:
L[[1]][2]
Out[9]:
2
In [10]:
lapply(L, length)
Out[10]:
$a
5
$b
1
$c
3
$d
9
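
The cells above use both single and double brackets. The following sketch, with a smaller hypothetical list, shows the difference and how components are added or removed. sapply is a variant of lapply that simplifies the result to a vector where possible.

```r
L <- list(a = 1:5, b = TRUE)

class(L["a"])      # "list", single brackets return a sub-list
class(L[["a"]])    # "integer", double brackets return the component itself

L$e <- "new"       # add a component
L$b <- NULL        # remove a component
names(L)           # "a" "e"

sapply(L, length)  # named vector: a = 5, e = 1
```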

Data frames are lists whose components (the columns) all have the same length.

In [11]:
dframe <- data.frame(a=0, b=1:26, c=letters, d=LETTERS)
In [16]:
str(dframe)
'data.frame':	26 obs. of  4 variables:
 $ a: num  0 0 0 0 0 0 0 0 0 0 ...
 $ b: int  1 2 3 4 5 6 7 8 9 10 ...
 $ c: Factor w/ 26 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ d: Factor w/ 26 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
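
Because a data frame is a list of columns, the list accessors work on it, and rows can be selected as for a matrix. The sketch below uses a hypothetical data frame df. It sets stringsAsFactors explicitly, since the default changed in R 4.0.0: older versions convert character columns to factors (as in the output above), newer ones keep them as character.

```r
df <- data.frame(a = 0, b = 1:26, c = letters, stringsAsFactors = FALSE)

is.list(df)          # TRUE
nrow(df)             # 26
ncol(df)             # 3

df$b[1:3]            # 1 2 3, column access as for a list
df[["c"]][26]        # "z"

# Row selection with a logical condition, as for a matrix
tail_rows <- df[df$b > 24, ]
nrow(tail_rows)      # 2

sapply(df, class)    # a "numeric", b "integer", c "character"
```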

Factors

Factors are categorical variables with a finite number of different values, called levels.

In [13]:
gender <- factor(c('m', 'f', 'm', 'm', 'f', 'm', 'f', 'f'))
gender
Out[13]:
  1. m
  2. f
  3. m
  4. m
  5. f
  6. m
  7. f
  8. f
In [14]:
levels(gender)
nlevels(gender)
Out[14]:
  1. 'f'
  2. 'm'
Out[14]:
2
In [15]:
as.numeric(gender)
Out[15]:
  1. 2
  2. 1
  3. 2
  4. 2
  5. 1
  6. 2
  7. 1
  8. 1
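
The numbers returned by as.numeric are the internal level codes (here f = 1, m = 2, since levels are sorted alphabetically by default). The following sketch shows how to count levels, set the level order explicitly, and avoid a common pitfall with factors that look numeric. The names gender2 and f are chosen for illustration.

```r
gender <- factor(c('m', 'f', 'm', 'm', 'f', 'm', 'f', 'f'))

table(gender)        # f: 4, m: 4

# An explicit levels argument fixes the order and thus the codes
gender2 <- factor(gender, levels = c('m', 'f'))
as.integer(gender2)  # 1 2 1 1 2 1 2 2

# Pitfall: as.numeric() on a factor gives codes, not the labels
f <- factor(c("10", "20", "10"))
as.numeric(f)                  # 1 2 1
as.numeric(as.character(f))    # 10 20 10
```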