Date: 2020-11-09 11:28


SEMESTER 2, 2018

Campus: City


Statistical Computing

(Time allowed: THREE Hours)


• Attempt ALL questions.

• Total marks are 100.

• Calculators are permitted.

• The R Quick Reference is available in the Attachment.

Part I: Programming

For questions in Part I, avoid using explicit loops or anything equivalent as much as

possible, unless the question asks to use them.

1. Write down the evaluation results of the following R expressions.

(a) 1:3 + c(T, F, T, T, F, T)

[2 marks]

(b) 2^1:5/10

[2 marks]

(c) matrix(1:10, 2, 5, b = TRUE)

[2 marks]

(d) {x = c(-0.5, 1, 0.8, 1.5); pmax(0, pmin(1, x))}

[2 marks]

(e) {x = 7; repeat {print(x); x = x + 1; if (x > 9) break}}

[2 marks]

(f) levels(factor(c("foo", "boo", "loo")))[2]

[2 marks]

(g) substring("Statistical computing", 9:12)

[2 marks]

(h) {x = c(5, 1, 4, -3, 2); ifelse(x > 3, mean(x), median(x))}

[2 marks]

[16 marks]

2. Use :, seq(), rep() and some other commonly used arithmetic operators/functions

to create the sequences given below.

Note: Do not use c() or any loop to create the sequences.

(a) 3 6 9 12 15

[2 marks]

(b) 10.000 5.000 2.500 1.250 0.625

[2 marks]

(c) 1 2 3 4 2 3 4 5 3 4 5 6

[2 marks]

[6 marks]
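One possible approach to question 2 is sketched below. It is not the official solution. It assumes sequence (c) reads 1 2 3 4 2 3 4 5 3 4 5 6, and the variable names a, b and s are illustrative only.

```r
# (a) multiples of 3 from 3 to 15
a <- seq(3, 15, by = 3)

# (b) start at 10 and halve repeatedly: 10 / 2^0, 10 / 2^1, ..., 10 / 2^4
b <- 10 / 2^(0:4)

# (c) assumed reading: 1 2 3 4, 2 3 4 5, 3 4 5 6
#     a repeated 1:4 block plus an offset that grows by 1 per block
s <- rep(1:4, times = 3) + rep(0:2, each = 4)

print(a)
print(b)
print(s)
```

The sketch avoids c() and loops, as the question requires, and relies only on vectorised arithmetic and recycling.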

5. (a) Figure 1 shows 7 squares filled with 7 distinct colours.

Note: For black and white printing purposes of this paper 7 shades of gray

are used instead.

1 2 3 4 5 6 7

Figure 1: Colored squares.

Write R code to reproduce Figure 1. You need to

• display the number at the center of each square,

• generate the colors using hcl(), ranging from purple-ish for the leftmost

square to red-ish for the rightmost square.

Hint: You may have to set the aspect ratio of y- and x-axis to visualize the

rectangles as squares. The coordinates used in your code only need to be

roughly similar to Figure 1.

[8 marks]
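A minimal sketch of one way to approach question 5(a) follows. The hue range (280 down to 0), the chroma and luminance values, and the coordinates are assumptions chosen for illustration, not values taken from Figure 1.

```r
# Seven hues running from purple-ish (about 280) to red-ish (0).
# The chroma (c) and luminance (l) values are illustrative guesses.
cols <- hcl(h = seq(280, 0, length.out = 7), c = 80, l = 60)

# Low-level setup: asp = 1 gives equal x and y scales, so unit
# rectangles are drawn as squares.
plot.new()
plot.window(xlim = c(0, 7), ylim = c(0, 1), asp = 1)

# rect() is vectorised: square i spans x from i - 1 to i, y from 0 to 1.
rect(0:6, 0, 1:7, 1, col = cols)

# Put each number at the centre of its square.
text(1:7 - 0.5, 0.5, labels = 1:7)
```

Using the vectorised forms of rect() and text() keeps the sketch free of explicit loops, in line with the Part I instructions.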

Figure 2: A layout to display 5 plots.

[4 marks]

[12 marks]

Part II: Data Technology

6. Write down the evaluation results of the following R expressions:

(a) > text = "good and bad"

> gregexpr("d", text)

[5 marks]

(b) > df1 = data.frame(n = c("a", "b", "c"), x = 1:3)

> df2 = data.frame(n = c("a", "c"), y = c(6, 9))

> merge(df1, df2)

[5 marks]

(c) > breakText = function(t) {

strsplit(t, " ")[[1]]

}

> text = c("roses are red", "and so are you")

> lapply(text, breakText)

[5 marks]

(d) > text1 = "John (fishing, hunting), Paul (hiking, biking),"

> text2 = "Carol, Smith (fishing, swimming)"

> text = paste(text1, text2)

> newtext = gsub("(\\(.[^)]*?,) ", "\\1]", text)

> newtext = strsplit(newtext, ", ")[[1]]

> regmatches(newtext, regexpr(",]", newtext)) = ", "

> newtext

[5 marks]

(e) > text = c("Smith,(919)319-1677", "Ali, 800-899-2164",

"Richard, 7042982145")

> patt1 = "\\([2-9]\\d\\d\\) ?[2-9]\\d\\d-\\d\\d\\d\\d"

> patt2 = "[2-9]\\d\\d-[2-9]\\d\\d-\\d\\d\\d\\d"

> patt = paste0(c("(", patt1, ")|", "(", patt2, ")"), collapse = "")

> data = do.call("rbind", strsplit(text, ","))

> data = as.data.frame(data)

> colnames(data) = c("name", "phone")

> pos = grepl(patt, data[, 2])

> data$valid = ifelse(pos, "valid", "invalid")

> data

[5 marks]

[25 marks]

7. Suppose data is a data frame with 3 columns. It contains information on the

population of administrative divisions of 198 countries. Column “population”

contains the number of people in the given region. The first 6 rows of this data set

are shown below. Write R code that, for each country, extracts regions with the

population size greater than the average population size of all regions within the

country. To do so you may follow these steps:

• For each country calculate the average population size of the administrative

regions.

• Merge the results to data.

• Select the appropriate subset.

[10 marks]

> head(data)

admin_region country population

1 Badakhshan Afghanistan 805500

2 Badghis Afghanistan 420400

3 Berat Albania 193855

4 Adrar Algeria 311615

5 Dibre Albania 191035

6 Baghlan Afghanistan 762500
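The three steps suggested in question 7 can be sketched as follows. This is one possible approach, not the official solution. The toy data frame copies only the six rows shown above, and the names avg, avg_pop, merged and above are hypothetical.

```r
# Toy data: the six rows printed by head(data) in the question.
data <- data.frame(
  admin_region = c("Badakhshan", "Badghis", "Berat", "Adrar", "Dibre", "Baghlan"),
  country      = c("Afghanistan", "Afghanistan", "Albania", "Algeria",
                   "Albania", "Afghanistan"),
  population   = c(805500, 420400, 193855, 311615, 191035, 762500)
)

# Step 1: average population of the regions within each country.
avg <- aggregate(population ~ country, data = data, FUN = mean)
names(avg)[2] <- "avg_pop"   # hypothetical column name for the country mean

# Step 2: merge the country averages back onto the region-level data.
merged <- merge(data, avg, by = "country")

# Step 3: keep regions whose population exceeds their country's average.
above <- merged[merged$population > merged$avg_pop, ]
print(above)
```

On the six toy rows this keeps Badakhshan, Baghlan and Berat. Adrar is dropped because it is the only Algerian region listed, so its population equals the country average rather than exceeding it.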

8. Suppose Sales is a data frame in R, which stores the number of total sales for a

company in different months and days of the week during 1990 – 2017. The first

column is the year, the second column indicates the month of sale, and columns

Mon to Fri contain the number of total sales in each day of the week.

(a) Write R code which uses the function melt() in the reshape2 package to reshape

Sales to a long form. Assign the result to the symbol SalesLong. The first 6

rows of SalesLong are shown on next page.

[5 marks]

(b) Write R code which uses SalesLong to create a data frame which contains

the total number of Sales per month. Assign the result to the symbol result.

The first 6 rows of result are shown on next page.

[5 marks]

(c) Write an R expression which extracts the row from result with the minimum

value.

[5 marks]

[15 marks]

> head(Sales)

Year Month Mon Tue Wed Thu Fri

1 2017 January 23 20 14 29 25

2 2017 February 20 15 13 28 29

3 2017 March 26 21 15 30 30

4 2017 April 24 23 14 32 33

5 2017 May 23 25 16 27 27

6 2017 June 26 19 13 26 23

> head(SalesLong)

Year Month variable value

1 2017 January Mon 23

2 2017 February Mon 20

3 2017 March Mon 26

4 2017 April Mon 24

5 2017 May Mon 23

6 2017 June Mon 26

> head(result)

Month value

1 January 2953

2 February 3050

3 March 2883

4 April 3062

5 May 3147

6 June 2995
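For question 8, a hedged sketch of one possible approach is given below. It assumes the reshape2 package is installed, rebuilds only the six rows of Sales printed above as toy data, and assumes Month is stored as a factor in calendar order (which would explain the row order of result). The name lowest is hypothetical.

```r
library(reshape2)

# Toy data: the six rows printed by head(Sales).
Sales <- data.frame(
  Year  = 2017,
  Month = factor(month.name[1:6], levels = month.name),  # assumed factor
  Mon = c(23, 20, 26, 24, 23, 26),
  Tue = c(20, 15, 21, 23, 25, 19),
  Wed = c(14, 13, 15, 14, 16, 13),
  Thu = c(29, 28, 30, 32, 27, 26),
  Fri = c(25, 29, 30, 33, 27, 23)
)

# (a) Long form: Year and Month identify a row, Mon..Fri are stacked
#     into the default columns "variable" and "value".
SalesLong <- melt(Sales, id.vars = c("Year", "Month"))

# (b) Total sales per month, summed over years and weekdays.
result <- aggregate(value ~ Month, data = SalesLong, FUN = sum)

# (c) The row of result with the smallest total.
lowest <- result[which.min(result$value), ]
print(lowest)
```

The totals here come from the six toy rows only, so they differ from the full-data totals shown in head(result).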

Basic Data Representation

TRUE, FALSE logical true and false

1, 2.5, 117.333 simple numbers

1.23e20 scientific notation, 1.23 × 10^20.

3+4i complex numbers

"hello, world" a character string

NA missing value (in any type of vector)

NULL missing value indicator in lists

NaN not a number

Inf positive infinity

-Inf negative infinity

"var" quotation for special variable name (e.g. +, %*%, etc.)

Creating Vectors

c(a1,...,an) combine into a vector

logical(n) logical vector of length n (containing falses)

numeric(n) numeric vector of length n (containing zeros)

complex(n) complex vector of length n (containing zeros)

character(n) character vector of length n (containing empty strings)

Creating Lists

list(e1,...,ek) combine as a list

vector("list", k) create a list of length k (the elements are all NULL)

Basic Vector and List Properties

length(x) the number of elements in x

mode(x) the mode or type of x

Tests for Types

is.logical(x) true for logical vectors

is.numeric(x) true for numeric vectors

is.complex(x) true for complex vectors

is.character(x) true for character vectors

is.list(x) true for lists

is.vector(x) true for both lists and vectors

Tests for Special Values

is.na(x) true for elements which are NA or NaN

is.nan(x) true for elements which are NaN

is.null(x) tests whether x is NULL

is.finite(x) true for finite elements (i.e. not NA, NaN, Inf or -Inf)

is.infinite(x) true for elements equal to Inf or -Inf

Explicit Type Coercion

as.logical(x) coerces to a logical vector

as.numeric(x) coerces to a numeric vector

as.complex(x) coerces to a complex vector

as.character(x) coerces to a character vector

as.list(x) coerces to a list

as.vector(x) coerces to a vector (lists remain lists)

unlist(x) converts a list to a vector

Vector and List Names

c(n1=e1,...,nk=ek) combine as a named vector

list(n1=e1,...,nk=ek) combine as a named list

names(x) extract the names of x

names(x) = v (re)set the names of x to v

names(x) = NULL remove the names from x

Vector Subsetting

x[1:5] select elements by index

x[-(1:5)] exclude elements by index

x[c(TRUE, FALSE)] select elements corresponding to TRUE

x[c("a", "b")] select elements by name

List Subsetting

x[1:5] extract a sublist of the list x

x[-(1:5)] extract a sublist by excluding elements

x[c(TRUE, FALSE)] extract a sublist with logical subscripts

x[c("a", "b")] extract a sublist by name

Extracting Elements from Lists

x[[2]] extract an element of the list x

x[["a"]] extract the element with name "a" from x

x$a extract the element with name "a" from x

Logical Selection

ifelse(cond, yes, no) conditionally select elements from yes and no

which(v) returns the indices of TRUE values in v

List Manipulation

lapply(X, FUN, ...) apply FUN to the elements of X

split(x, f) split x using the factor f

Sequences and Repetition

a:b sequence from a to b in steps of size 1

seq(n) same as 1:n

seq(a,b) same as a:b

seq(a,b,by=s) a to b in steps of size s

seq(a,b,length=n) sequence of length n from a to b

seq(along=x) like 1:length(x), but works when x has zero length

rep(x,n) x, repeated n times

rep(x,v) elements of x with x[i] repeated v[i] times

rep(x,each=n) elements of x, each repeated n times
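The entries above can be illustrated with a short sketch. The variable names are illustrative only, and the comments show the values each expression produces.

```r
evens  <- seq(2, 10, by = 2)       # 2 4 6 8 10
grid   <- seq(0, 1, length = 5)    # 0.00 0.25 0.50 0.75 1.00
whole  <- rep(1:3, 2)              # 1 2 3 1 2 3
varied <- rep(1:3, c(3, 2, 1))     # 1 1 1 2 2 3
paired <- rep(1:3, each = 2)       # 1 1 2 2 3 3

x <- numeric(0)                    # a zero-length vector
safe  <- seq(along = x)            # integer(0): a loop over it runs zero times
risky <- 1:length(x)               # 1 0: a loop over it would run twice
```

The last two lines show why seq(along=x) is the safer way to index a vector that may be empty.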

Sorting and Ordering

sort(x) sort into ascending order

sort(x, decreasing=TRUE) sort into descending order

rev(x) reverse the elements in x

order(x) get the ordering permutation for x

Basic Arithmetic Operations

x+y addition, “x plus y”

x-y subtraction, “x minus y”

x*y multiplication, “x times y”

x/y division, “x divided by y”

x^y exponentiation, “x raised to power y”

x %% y remainder, “x modulo y”

x %/% y integer division, “x divided by y, discard fractional part”

Rounding

round(x) round to nearest integer

round(x,d) round x to d decimal places

signif(x,d) round x to d significant digits

floor(x) round down to next lowest integer

ceiling(x) round up to next highest integer

Common Mathematical Functions

abs(x) absolute values

sqrt(x) square root

exp(x) exponential function

log(x) natural logarithms (base e)

log10(x) common logarithms (base 10)

log2(x) base 2 logarithms

log(x,base=b) base b logarithms

Trigonometric and Hyperbolic Functions

sin(x), cos(x), tan(x) trigonometric functions

asin(x), acos(x), atan(x) inverse trigonometric functions

atan2(x,y) arc tangent with two arguments

sinh(x), cosh(x), tanh(x) hyperbolic functions

asinh(x), acosh(x), atanh(x) inverse hyperbolic functions

Combinatorics

choose(n, k) binomial coefficients

lchoose(n, k) log binomial coefficients

factorial(x) factorials

lfactorial(x) log factorials

Special Mathematical Functions

beta(x,y) the beta function

lbeta(x,y) the log beta function

gamma(x) the gamma function

lgamma(x) the log gamma function

psigamma(x,deriv=0) the psigamma function

digamma(x) the digamma function

trigamma(x) the trigamma function

Bessel Functions

besselI(x,nu) modified Bessel Functions of the first kind

besselK(x,nu) modified Bessel Functions of the third kind

besselJ(x,nu) Bessel Functions of the first kind

besselY(x,nu) Bessel Functions of the second kind

Special Floating-Point Values

.Machine$double.xmax largest floating point value (1.797693 × 10^308)

.Machine$double.xmin smallest positive floating point value (2.225074 × 10^-308)

.Machine$double.eps machine epsilon (2.220446 × 10^-16)

Basic Summaries

sum(x1,x2,...) sum of values in arguments

prod(x1,x2,...) product of values in arguments

min(x1,x2,...) minimum of values in arguments

max(x1,x2,...) maximum of values in arguments

range(x1,x2,...) range (minimum and maximum)

Cumulative Summaries

cumsum(x) cumulative sum

cumprod(x) cumulative product

cummin(x) cumulative minimum

cummax(x) cumulative maximum

Parallel Summaries

pmin(x1,x2,...) parallel minimum

pmax(x1,x2,...) parallel maximum
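A brief sketch contrasting the basic, cumulative and parallel summaries listed above. The vectors and variable names are made up for illustration.

```r
x <- c(3, 1, 4, 1, 5)
y <- c(2, 7, 1, 8, 2)

overall_min  <- min(x, y)    # 1: a single value over all arguments
parallel_min <- pmin(x, y)   # 2 1 1 1 2: element-by-element minimum
parallel_max <- pmax(x, y)   # 3 7 4 8 5: element-by-element maximum
running_max  <- cummax(x)    # 3 3 4 4 5: largest value seen so far
```

min() and max() collapse all their arguments to one number, whereas pmin() and pmax() return a vector as long as their longest argument.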

Statistical Summaries

mean(x) mean of elements

sd(x) standard deviation of elements

var(x) variance of elements

median(x) median of elements

quantile(x) median, quartiles and extremes

quantile(x, p) specified quantiles

Uniform Distribution

runif(n) vector of n Uniform[0,1] random numbers

runif(n,a,b) vector of n Uniform[a,b] random numbers

punif(x,a,b) distribution function of Uniform[a,b]

qunif(x,a,b) inverse distribution function of Uniform[a,b]

dunif(x,a,b) density function of Uniform[a,b]

Binomial Distribution

rbinom(n,size,prob) a vector of n Bin(size,prob) random numbers

pbinom(x,size,prob) Bin(size,prob) distribution function

qbinom(x,size,prob) Bin(size,prob) inverse distribution function

dbinom(x,size,prob) Bin(size,prob) density function

Normal Distribution

rnorm(n) a vector of n N(0, 1) random numbers

pnorm(x) N(0, 1) distribution function

qnorm(x) N(0, 1) inverse distribution function

dnorm(x) N(0, 1) density function

rnorm(n,mean,sd) a vector of n normal random numbers with given mean and s.d.

pnorm(x,mean,sd) normal distribution function with given mean and s.d.

qnorm(x,mean,sd) normal inverse distribution function with given mean and s.d.

dnorm(x,mean,sd) normal density function with given mean and s.d.

Chi-Squared Distribution

rchisq(n,df) a vector of n χ² random numbers with degrees of freedom df

pchisq(x,df) χ² distribution function with degrees of freedom df

qchisq(x,df) χ² inverse distribution function with degrees of freedom df

dchisq(x,df) χ² density function with degrees of freedom df

t Distribution

rt(n,df) a vector of n t random numbers with degrees of freedom df

pt(x,df) t distribution function with degrees of freedom df

qt(x,df) t inverse distribution function with degrees of freedom df

dt(x,df) t density function with degrees of freedom df

F Distribution

rf(n,df1,df2) a vector of n F random numbers with degrees of freedom df1 & df2

pf(x,df1,df2) F distribution function with degrees of freedom df1 & df2

qf(x,df1,df2) F inverse distribution function with degrees of freedom df1 & df2

df(x,df1,df2) F density function with degrees of freedom df1 & df2

Creating Matrices

matrix(x, nr=r, nc=c) create a matrix from x (column major order)

matrix(x, nr=r, nc=c, byrow=TRUE) create a matrix from x (row major order)


Matrix Dimensions

nrow(x) number of rows in x

ncol(x) number of columns in x

dim(x) vector containing nrow(x) and ncol(x)

Row and Column Indices

row(x) matrix of row indices for matrix x

col(x) matrix of column indices for matrix x

Naming Rows and Columns

rownames(x) get the row names of x

rownames(x) = v set the row names of x to v

colnames(x) get the column names of x

colnames(x) = v set the column names of x to v

dimnames(x) get both row and column names (in a list)

dimnames(x) = list(rn,cn) set both row and column names

Binding Rows and Columns

rbind(v1,v2,...) assemble a matrix from rows

cbind(v1,v2,...) assemble a matrix from columns

rbind(n1=v1,n2=v2,...) assemble by rows, specifying row names

cbind(n1=v1,n2=v2,...) assemble by columns, specifying column names

Matrix Subsets

x[i,j] submatrix, rows and columns specified by i and j

x[i,j] = v reset a submatrix, rows and columns specified by i and j

x[i,] submatrix, contains just the rows specified by i

x[i,] = v reset specified rows of a matrix

x[,j] submatrix, contains just the columns specified by j

x[,j] = v reset specified columns of a matrix

x[i] subset as a vector

x[i] = v reset elements (treated as a vector operation)

Matrix Diagonals

diag(A) extract the diagonal of the matrix A

diag(v) diagonal matrix with elements in the vector v

diag(n) the n × n identity matrix

Applying Summaries over Rows and Columns

apply(X,1,fun) apply fun to the rows of X

apply(X,2,fun) apply fun to the columns of X

Basic Matrix Manipulation

t(A) matrix transpose

A %*% B matrix product

outer(u, v) outer product of vectors

outer(u, v, f) generalised outer product

Linear Equations

solve(A, b) solve a system of linear equations

solve(A, B) same, with multiple right-hand sides

solve(A) invert the square matrix A

Matrix Decompositions

chol(A) the Choleski decomposition

qr(A) the QR decomposition

svd(A) the singular-value decomposition

eigen(A) eigenvalues and eigenvectors

Least-Squares Fitting

lsfit(X,y) least-squares fit with carriers X and response y

Factors and Ordered Factors

factor(x) create a factor from the values in x

factor(x,levels=l) create a factor with the given level set

ordered(x,levels=l) create an ordered factor with the given level set

is.factor(x) true for factors and ordered factors

is.ordered(x) true for ordered factors

levels(x) the levels of a factor or ordered factor

levels(x) = v reset the levels of a factor or ordered factor

Tabulation and Cross-Tabulation

table(x) tabulate the values in x

table(f1,f2,...) cross tabulation of factors

Summary over Factor Levels

tapply(x,f,fun) apply summary fun to x broken down by f

tapply(x,list(f1,f2,...),fun) apply summary fun to x broken down by several factors

Data Frames

data.frame(n1=x1,n2=x2,...) create a data frame

row.names(df) extract the observation names from a data frame

row.names(df) = v (re)set the observation names of a data frame

names(df) extract the variable names from a data frame

names(df) = v (re)set the variable names of a data frame

Subsetting and Transforming Data Frames

df[i,j] matrix subsetting of a data frame

df[i,j] = dfv reset a subset of a data frame

subset(df,subset=i) subset of the cases of a data frame

subset(df,select=i) subset of the variables of a data frame

subset(df,subset=i,select=j) subset of the cases and variables of a data frame

transform(df,n1=e1,n2=e2,...) transform variables in a data frame

merge(df1,df2,...) merge data frames based on common variables

Reading Lines

readline(prompt="") read a line of input

readLines(file, n) read n lines from the specified file

readLines(file) read all lines from the specified file

Reading Vectors and Lists

scan(file, what = numeric()) read a vector or list from a file

Formatting and Printing

format(x) format a vector in a common format

sprintf(fmt, ...) formatted printing of R objects

cat(...) concatenate and print vectors

print(x) print an R object

Reading Data Frames

read.table(file, header=FALSE) read a data frame from a file

read.csv(file, header=TRUE) read a data frame from a csv file

Options for read.table and read.csv

header=TRUE/FALSE does first line contain variable names?

row.names=··· row names specification

col.names=··· variable names specification

na.strings="NA" entries indicating NA values

colClasses=NA the types associated with columns

nrows=··· the number of rows to be read

Writing Data Frames

write.table(x, file) write a data frame to a file

write.csv(x, file) write a data frame to a csv file

String Handling

paste(..., sep = " ", collapse = NULL) paste strings together

strsplit(x, split) split x on pattern split (returns a list)

grep(pattern, x) return subscripts of matching elements

grep(pattern, x, value = TRUE) return matching elements

sub(pattern, replacement, x) replace pattern with given replacement

gsub(pattern, replacement, x) globally replace

High-Level Graphics

plot(x, y) scatter plot

plot(x, y, type = "l") line plot

plot(x, y, type = "n") empty plot

Adding to Plots

abline(a, b) line in intercept/slope form

abline(h = yvals) horizontal lines

abline(v = xvals) vertical lines

points(x, y) add points

lines(x, y) add connected polyline

segments(x0, y0, x1, y1) add disconnected line segments

arrows(x0, y0, x1, y1, code) add arrows

rect(x0, y0, x1, y1, col) add rectangles filled with colours

polygon(x, y) add one or more polygons

Low-Level Graphics

plot.new() start a new plot/figure/panel

plot.window(xlim, ylim, ...) set up plot coordinates

Options to plot.window

xaxs="i" don’t expand x range by 8%

yaxs="i" don’t expand y range by 8%

asp=1 equal-scale x and y axes

Graphical Parameters

par(... ) set/get graphical parameters

Useful Graphical Parameters

mfrow = c(m,n) set up an m by n array of figures, filled by row

mfcol = c(m,n) set up an m by n array of figures, filled by column

mar=c(m1,m2,m3,m4) set the plot margins (in lines)

mai=c(m1,m2,m3,m4) set the plot margins (in inches)

cex=m set the basic font magnification to m

bg=col set the device background to col

Measuring Text Size

strwidth(x, "inches", font, cex) widths of text strings in inches

strheight(x, "inches", font, cex) heights of text strings in inches

Layouts

layout(mat,heights,widths) set up a layout

layout.show(n) show layout elements (up to n)

lcm(x) size specification in cm

Compound Expressions

{ expr1; ... ; exprn } compound expressions

Conditional Execution

if (cond) expr1 else expr2 conditional execution

if (cond) expr conditional execution, no alternative

Iteration

for (var in vector) expr for loops

while (cond) expr while loops

repeat expr infinite repetition

next jump to the next iteration of the enclosing loop

break break out of enclosing loop

Function Definition

function(args) expr function definition

var function argument with no default

var=expr function argument with default value

return(expr) return the given value from a function

missing(a) true if argument a was not supplied

Error Handling

stop(message) terminate a computation with an error message

warning(message) issue a warning message

on.exit(expr) save an expression for execution on function return

Language Computation

quote(expr) returns the expression expr unevaluated

substitute(arg) returns the expression passed as argument arg

substitute(expr,subs) make the specified substitutions in the given expression

Interpolation

approx(x, y, xout) linear interpolation at xout using x and y

spline(x, y, xout) spline interpolation at xout using x and y

approxfun(x, y) interpolating linear function for x and y

splinefun(x, y) interpolating spline for x and y

Root-Finding and Optimisation

polyroot(coef) roots of polynomial with coefficients in coef

uniroot(f,interval) find a root of the function f in the given interval

optimize(f,interval) find an extremum of the function f in the given interval

optim(x,f) find an extremum of the function f starting at the point x

nlm(f,x) an alternative to optim

nlminb(x,f) optimization subject to constraints

Integration

integrate(f,lower,upper) integrate the function f from lower to upper

