Statistical Methods for Data Science

Mini Project 6

Instructions:

• Total points = 20.

• Submit a typed report.

• Do a good job.

• You must use the following template for your report:

Mini Project #

Name

Names of group members (if applicable)

Contribution of each group member

Section 1. Answers to the specific questions asked

Section 2: R code. Your code must be annotated. No points may be given if a brief

look at the code does not tell us what it is doing.

1. Consider the prostate cancer dataset available on eLearning as prostate cancer.csv.

It consists of data on 97 men with advanced prostate cancer. A description of the

variables is given in Figure 1. We would like to understand how PSA level is related

to the other predictors in the dataset. Note that vesinv is a qualitative variable.

You can treat gleason as a quantitative variable.

Build a “reasonably good” linear model for these data by taking PSA level as the

response variable. Carefully justify all the choices you make in building the model.

Be sure to verify the model assumptions. In case a transformation of response is

necessary, try the natural log transformation. Use the final model to predict the

PSA level for a patient whose quantitative predictors are at the sample means of the

variables and qualitative predictors are at the most frequent category.

1

Statistical Methods for Data Science

Mini Project 5

Consider the prostate cancer dataset available on eLearning. It consists of data on 97 men

with advanced prostate cancer. Following is a description of the variables:

Build a “reasonably good” linear model for these data by taking PSA level as the response variable. Carefully justify all the choices you make in building the model. Be

sure to verify the model assumptions and also to distinguish between quantitative and

qualitative variables. Use the final model to predict the PSA level for a patient whose

predictors are at the sample means of the variables.

Instructions:

• Due date: Thursday, April 20.

• Total points = 25.

• Submit a typed report.

• You can work on the project either individually or in a group of no more than two

students. In case of the latter, submit only one report for the group, and include a description of the contribution of each member.

• Do a good job.

• You must use the following template for your report:

header name description

subject ID 1 to 97

psa PSA level Serum prostate-specific antigen level (mg/ml)

cancervol Cancer Volume Estimate of prostate cancer volume (cc)

weight Weight prostate weight (gm)

age Age Age of patient (years)

benpros Benign prostatic

hyperplasia

Amount of benign prostatic hyperplasia (cm2

)

vesinv Seminal vesicle

invasion

Presence (1) or absence (0) of seminal vesicle

invasion

capspen Capsular

penetration

Degree of capsular penetration (cm)

gleason Gleason score Pathologically determined grade of disease (6, 7 or 8)

Figure 1: List of variables in the prostate cancer data

.

2