STAT 341: Assignment 5

58 Marks

NOTES

Your assignment must be submitted by the due date listed at the top of this document, and it must be

submitted electronically in .pdf format via Crowdmark/LEARN. This means that your responses for different

questions should be in separate .pdf files. Your .pdf solution files must have been generated by R Markdown

unless otherwise specified. Additionally:

• For mathematical questions: your solutions must be produced by LaTeX (from within R Markdown).

Handwritten and scanned/photographed solutions will not be accepted and you will receive zero points.

• For computational questions: R code should always be included in your solution (via code chunks in R

Markdown). If code is required and you provide none, you will receive zero points.

– Exception: any functions from the function glossary can be loaded using echo=FALSE, but any

other code chunks should have echo=TRUE, e.g. the code chunk loading gradientDescent can use

echo=FALSE, but chunks that call gradientDescent should have echo=TRUE.

• For interpretation questions: plain text (within R Markdown) is fine.

Organization and comprehensibility are part of a full solution. Consequently, points will be deducted for

solutions that are disorganized or incomprehensible.

• You will submit your solutions in the form of one pdf file per question through LEARN. For example,

for Q1 you should submit one pdf file containing the solution to the first question only. Failing to follow

the formatting instructions may result in your whole paper or individual questions receiving a grade of

0%.

Question 1 – 32 Marks

For this question you will need the digit data from file “digitData.csv”. Use the sample below from the Digit

data to answer part c).

digitSample <- c(294,133,95,265,154,1,289,232,121,99,129,83,30,56,249,134,46,68,165,279,105,91,248,285,21

You will also need R code for Pearson’s second skewness coefficient (median skewness), given by

$3 \times [\bar{y} - \mathrm{median}_P(y)] / SD_P(y)$

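# Population standard deviation: rescales sd(), which divides by N - 1, so that it divides by N
sdn <- function( z ) {

N = length(z)

sd(z)*sqrt( (N-1)/N )

}

# Pearson's second skewness coefficient (median skewness)
skew <- function(z) { 3*(mean(z) - median(z))/sdn(z) }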
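As a quick sanity check of the two functions above, the sketch below evaluates them on a small made-up vector (the values in z are illustrative only and are not taken from digitData.csv). For this vector the mean is 4, the median is 3, and the population standard deviation is sqrt(10), so the coefficient should equal 3/sqrt(10).

```r
# Population-style standard deviation (divides by N rather than N - 1).
sdn <- function(z) {
  N <- length(z)
  sd(z) * sqrt((N - 1) / N)
}

# Pearson's second skewness coefficient (median skewness).
skew <- function(z) 3 * (mean(z) - median(z)) / sdn(z)

z <- c(1, 2, 3, 4, 10)   # illustrative values only
skew(z)                  # positive: the long tail is on the right
skew(-z)                 # same magnitude, opposite sign
```

Reflecting the data about zero flips the sign of the coefficient, which is a convenient check that the numerator is written as mean minus median.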

• A commonly used transformation when y > 0 is the family of power transformations which is indexed

by a power α. Define this transformed variable to be

$$T_\alpha(y) = \begin{cases} y^\alpha & \alpha > 0 \\ \log(y) & \alpha = 0 \\ -(y^\alpha) & \alpha < 0 \end{cases}$$

powerfun <- function(x, alpha) {

# the transformation is only defined for strictly positive values
if (any(x <= 0)) stop("x must be positive")

if (alpha == 0)

log(x)

else if (alpha > 0) {

x^alpha

} else -x^alpha

}
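To see how the three branches behave, the sketch below applies the transformation to a small made-up vector (the values in y are illustrative). This copy of powerfun checks positivity with any(x <= 0), which is an assumption about the intended check. The last line suggests why the sign is flipped for negative powers: every member of the family is then increasing in y, so the ordering of the data is preserved.

```r
powerfun <- function(x, alpha) {
  if (any(x <= 0)) stop("x must be positive")
  if (alpha == 0) {
    log(x)
  } else if (alpha > 0) {
    x^alpha
  } else {
    -x^alpha   # unary minus applies after the power, i.e. -(x^alpha)
  }
}

y <- c(0.5, 1, 2, 4)   # illustrative positive values
powerfun(y, 2)         # squares of y
powerfun(y, 0)         # log(y)
powerfun(y, -1)        # negated reciprocals of y

# TRUE for each alpha when the transformed values are still in increasing order.
sapply(c(2, 0, -1), function(a) !is.unsorted(powerfun(y, a)))
```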

• We can define the attribute α implicitly such that

$3 \times [\bar{t}_\alpha - \mathrm{median}_P(t_\alpha)] / SD_P(t_\alpha) = 0$

i.e. the value of the power for which the transformed variable has zero skewness.

• Note

– This question is related to sample exercises question 1.12, An Implicitly defined Skewness Attribute;

it might be helpful to review that question.
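The implicit definition above can be explored numerically with uniroot. The following is a minimal sketch on simulated data, not a definitive solution: the gamma-distributed y is a stand-in for a right-skewed variable (it is not the digit data), skewAtAlpha is a hypothetical helper name, and the search interval c(-2, 2) is an assumption that happens to bracket a sign change for this simulated data.

```r
sdn <- function(z) { N <- length(z); sd(z) * sqrt((N - 1) / N) }
skew <- function(z) 3 * (mean(z) - median(z)) / sdn(z)
powerfun <- function(x, alpha) {
  if (any(x <= 0)) stop("x must be positive")
  if (alpha == 0) log(x) else if (alpha > 0) x^alpha else -x^alpha
}

set.seed(341)
y <- rgamma(2000, shape = 2)   # simulated right-skewed stand-in, not the digit data

# Skewness of the transformed variable, viewed as a function of alpha.
skewAtAlpha <- function(alpha) skew(powerfun(y, alpha))

# uniroot needs the function to change sign over the search interval;
# c(-2, 2) is an assumed interval that suits this simulated data.
fit <- uniroot(skewAtAlpha, interval = c(-2, 2), tol = 1e-8)
alphaHat <- fit$root
c(alpha = alphaHat, skewness = skewAtAlpha(alphaHat))
```

If the function has the same sign at both ends of the interval, uniroot stops with an error, so it is worth evaluating the skewness at the endpoints before choosing the interval for a new data set.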

a) Using the brightness variable:

i) [2 Marks] Construct a histogram.

ii) [1 Mark] Calculate the mean and Pearson’s second skewness coefficient.

iii) [2 Marks] If we apply the power transformation using α as the power we can change the skewness.

Using the uniroot function find the value of α which makes the skewness of the power-transformed

variable equal to zero.

iv) [3 Marks] Using the value of α from part (iii), calculate the skewness on the power-transformed

variable and construct a histogram of the power-transformed variable.

v) [2 Marks] Write a function named attr3 that takes in a population or sample of variates and

outputs the mean, skewness and the value of α which makes the skewness of the power-transformed

variable equal to zero. Apply this function to the brightness variable.

b) [5 Marks] Sampling Distribution of the attributes

• Select M = 1000 samples of size n = 50 without replacement, i.e. construct $S_1, S_2, \ldots, S_{1000}$.

• For each sample apply the attr3 function. Then construct three histograms (in a single row) of

the sample error for each attribute.
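The mechanics of a sampling distribution of sample errors can be sketched as follows. This is an illustration under stated assumptions rather than a model answer: pop is a simulated population (not the digit data), attr3Sketch is a hypothetical helper, and the uniroot interval c(-2, 2) with extendInt = "yes" is an assumed choice. The sample error of an attribute is taken to be its value on the sample minus its value on the population.

```r
sdn <- function(z) { N <- length(z); sd(z) * sqrt((N - 1) / N) }
skew <- function(z) 3 * (mean(z) - median(z)) / sdn(z)
powerfun <- function(x, alpha) {
  if (any(x <= 0)) stop("x must be positive")
  if (alpha == 0) log(x) else if (alpha > 0) x^alpha else -x^alpha
}

# Hypothetical helper: mean, median skewness, and the alpha that zeroes the skewness.
attr3Sketch <- function(y) {
  alpha <- uniroot(function(a) skew(powerfun(y, a)),
                   interval = c(-2, 2), extendInt = "yes")$root
  c(mean = mean(y), skew = skew(y), alpha = alpha)
}

set.seed(341)
pop <- rgamma(300, shape = 2)   # simulated population, not the digit data
M <- 1000
n <- 50

# Each column holds the three attributes for one sample drawn without replacement.
sampleAttrs <- replicate(M, attr3Sketch(sample(pop, n, replace = FALSE)))

# Sample error = attribute on the sample minus attribute on the population
# (the length-3 population vector is recycled down each column).
sampleErrors <- sampleAttrs - attr3Sketch(pop)

par(mfrow = c(1, 3))            # three panels in a single row
for (k in rownames(sampleErrors)) {
  hist(sampleErrors[k, ], main = k, xlab = "sample error")
}
```

Storing the attributes as a 3 x M matrix keeps the subtraction and the plotting loop short; a data frame with one column per attribute would work equally well.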

c) A Sample and the Bootstrap. Use the given sample (obtained by sampling without replacement) and

the variable brightness.

i) [1 Mark] Calculate the three attributes of interest using the given sample.

ii) [4 Marks] Construct two histograms: one of the raw values and another of the power-transformed

brightness variable, using the value of α from part c) i).

iii) [5 Marks] Bootstrap: By resampling the sample S with replacement, construct B = 1000 bootstrap

samples $S_1^\star, S_2^\star, \ldots, S_{1000}^\star$ and calculate the three attributes of interest on each bootstrap sample.

Then construct three histograms (in a single row) of the bootstrap sample error for each attribute.

iv) [3 Marks] Calculate standard errors for each sample estimate and then construct a 95% confidence

interval for the population quantity using the percentile method.
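The bootstrap standard error and the percentile interval can be sketched as below. This is a minimal illustration on simulated data: S is a simulated stand-in for the given sample, attrs is a hypothetical helper, and only two attributes (mean and skewness) are shown to keep the example short. The standard error is taken as the standard deviation of the bootstrap replicates, and the percentile interval as their 2.5% and 97.5% quantiles.

```r
sdn <- function(z) { N <- length(z); sd(z) * sqrt((N - 1) / N) }
skew <- function(z) 3 * (mean(z) - median(z)) / sdn(z)

set.seed(341)
S <- rgamma(50, shape = 2)   # simulated stand-in for the given sample
B <- 1000

# Hypothetical helper returning the attributes of interest as a named vector.
attrs <- function(y) c(mean = mean(y), skew = skew(y))

# Resample S with replacement and evaluate the attributes on each bootstrap sample.
bootAttrs <- replicate(B, attrs(sample(S, length(S), replace = TRUE)))

# Standard error: standard deviation of the bootstrap replicates of each attribute.
bootSE <- apply(bootAttrs, 1, sd)

# Percentile method: the 2.5% and 97.5% quantiles of the bootstrap replicates.
percentileCI <- apply(bootAttrs, 1, quantile, probs = c(0.025, 0.975))

bootSE
percentileCI
```

Each column of percentileCI is the interval for one attribute, so a further attribute only needs one more entry in the vector returned by attrs.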

d) [4 Marks] Sampling Properties of the Bootstrap: For each of the three attributes of interest, estimate the

coverage probability when using the percentile method and give a standard error. Give a conclusion

about the procedure.
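One way to estimate a coverage probability is to repeat the whole procedure on many samples and count how often the interval contains the population value. The sketch below does this for the mean only, on a simulated population (not the digit data). The values of B and M are deliberately small assumed values so that it runs quickly, and the standard error uses the usual binomial formula for a proportion.

```r
set.seed(341)
pop <- rgamma(300, shape = 2)   # simulated population, not the digit data
n <- 50                         # sample size
B <- 200                        # bootstrap samples per interval (assumed, kept small)
M <- 200                        # number of samples used to estimate coverage (assumed)

# Percentile-method 95% interval for the mean, computed from one sample S.
percentileCI <- function(S) {
  bootMeans <- replicate(B, mean(sample(S, length(S), replace = TRUE)))
  quantile(bootMeans, probs = c(0.025, 0.975))
}

# For each of M samples, record whether the interval covers the population mean.
covers <- replicate(M, {
  ci <- percentileCI(sample(pop, n, replace = FALSE))
  ci[1] <= mean(pop) && mean(pop) <= ci[2]
})

coverage <- mean(covers)
coverageSE <- sqrt(coverage * (1 - coverage) / M)   # binomial standard error
c(coverage = coverage, se = coverageSE)
```

Comparing the estimate, plus or minus about two standard errors, with the nominal 95% gives a basis for a conclusion about the procedure.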

Question 2 – 16 Marks

Compare two sub-populations. Your comparison should include:

• a description of the context and the two sub-populations,

• a comparison of the sub-populations using at least two attributes (you are not required to consider multiple

testing),

• numerical and graphical summaries,

• a conclusion.

• Your comparison should be limited to 1 to 2 pages.

Your solution should be in your own words, but as motivating examples, see from the Inference exercises:

• 1.4 Comparing Sub-populations in Fire Emblem Heroes

• 1.8 Comparing male and female final grades

• 1.9 Comparing Midterm to final grades

• 1.7 City of Baltimore, Crime & Safety Rates for (2010-2014)

Rubric

Criteria | Descriptor | Marks

Population/Attributes | Description and Difficulty | /4

Format | Clarity, Organization and LaTeX | /4

Comparison | Description, Results and Graphic | /4

Discussion/Summary | Justification and Relevant Terminology used | /4

Question 3 – 10 Marks

In your own words, summarize the subsection 4.4.2b-Bootstrap_t_Confidence_Interval.

• We recommend using a combination of formulas, full sentences, and an example.

• You may incorporate subsection 4.4.2c-The_Double_Bootstrap, but this is not required.

• You are limited to 1 to 2 pages.

Rubric

Criteria | Descriptor | Marks

Format | Organization | /3

Writing | Clarity & Grammar | /2

Content | Coverage, Depth, Relevant Terminology used and Example | /5
