Multivariate Data Analysis
Computer Practical 4
This computer practical can be accessed via the course web page:
http://www.staff.ncl.ac.uk/d.j.wilkinson/teaching/mas3325/
It may be helpful to have the course web page, the course notes, and
this practical page all open in different tabs of your web browser
during this practical session. In particular, it may save time to
copy-and-paste R commands rather than re-typing them.
- Work through all of the R code in the notes from p.72 to the end of
Chapter 2, paying particular attention to the material relating to the
construction and analysis of Principal Components. When you construct
the 3d plot of the first 3 principal components of the galaxy data, make
sure that you interact with the plot, by clicking and dragging corners
of the enclosing cube in order to look at the data from different
angles.
- For the nci microarray data, try repeating the PCA given in the notes, but using the princomp() function instead of prcomp(). What goes wrong?
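As a hint for the item above, the following minimal sketch contrasts the two functions on simulated "wide" data (fewer rows than columns). The matrix X and its dimensions are made up for illustration and are not the nci data; whether the same thing happens with nci depends on the shape of the matrix you pass in.

```r
# Simulated "wide" data: 20 observations on 50 variables (illustrative only,
# not the nci data from the notes)
set.seed(1)
X <- matrix(rnorm(20 * 50), nrow = 20, ncol = 50)

# prcomp() works via the SVD of the centred data matrix, so it copes with
# fewer observations than variables
pca <- prcomp(X)
dim(pca$x)

# princomp() works via an eigen-decomposition of the covariance matrix and
# refuses data with fewer rows than columns; try() keeps the script running
res <- try(princomp(X), silent = TRUE)
inherits(res, "try-error")   # TRUE if princomp() stopped with an error
if (inherits(res, "try-error")) cat(res)
```

Reading the printed error message, and checking dim() of your own data matrix, should suggest what to look for in the nci case.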
- For the zip.train data set, do a 3d interactive plot of the
first 3 principal components. By interacting with the plot, see that the
first 3 principal components do provide enough information to allow
classification of most images.
- For the zip.train data set, form a subset of the data
corresponding to the digit 3.
- Form the principal components for these data using the prcomp()
function.
- Produce a scatterplot of the second principal component against the
first.
- Plot images representing the loadings for the first 4 principal
components of those images.
- What proportion of variation is explained by the first 1, 2, 3, and 4
principal components?
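One possible way to organise the digit 3 steps is sketched below. It assumes that zip.train comes from the ElemStatLearn package, with the digit label in column 1 and the 256 pixel values (a 16 x 16 image) in the remaining columns. If the package is not installed, the sketch falls back to random numbers in the same layout so that the code still runs; the fallback data are purely a stand-in and the resulting plots mean nothing. The object names (zip, threes, pca3, prop) and the image orientation are choices made here, not taken from the notes.

```r
# Use zip.train if ElemStatLearn is available; otherwise build a random
# stand-in with the same layout (label in column 1, then 256 pixels)
if (requireNamespace("ElemStatLearn", quietly = TRUE)) {
  data(zip.train, package = "ElemStatLearn")
  zip <- zip.train
} else {
  set.seed(1)
  zip <- cbind(sample(0:9, 500, replace = TRUE),
               matrix(runif(500 * 256, -1, 1), ncol = 256))
}

# Rows for the digit 3, with the label column dropped
threes <- zip[zip[, 1] == 3, -1]
pca3 <- prcomp(threes)

# Second principal component against the first
plot(pca3$x[, 1], pca3$x[, 2], pch = 19, xlab = "PC1", ylab = "PC2")

# Loadings of the first 4 components, each reshaped to a 16 x 16 image
# (the column reversal is an assumed orientation; adjust if images look flipped)
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  img <- matrix(pca3$rotation[, i], ncol = 16)
  image(img[, 16:1], col = grey(seq(0, 1, length.out = 64)),
        axes = FALSE, main = paste("PC", i))
}
par(op)

# Cumulative proportion of variance explained by the first 1 to 4 components
prop <- cumsum(pca3$sdev^2) / sum(pca3$sdev^2)
prop[1:4]
```

The same cumulative proportions also appear in the output of summary(pca3), which is a useful cross-check.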
- Work through all of the R code in the notes from
Chapter 4 (Discrimination and Classification), starting on p.123.
- R contains a famous (old!) dataset called iris containing
measurements on 4 quantitative variables together with a fifth
qualitative variable containing a species classification. We will use
this dataset to try out some classification techniques.
- Start by using columns 3 and 4 to predict the classification in column 5. Use LDA and compute the misclassification rate.
- Produce a scatterplot of columns 3 and 4, with points coloured according to the true species, and then highlight the misclassified points.
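A minimal sketch of the two iris steps above, assuming the lda() function from the MASS package (the notes may set this up differently). The rate computed here is the training (resubstitution) error, since the model is assessed on the same data used to fit it; the names fit, pred and wrong are chosen here for illustration.

```r
library(MASS)  # provides lda()

# Fit LDA using columns 3 and 4 (Petal.Length, Petal.Width) to predict
# column 5 (Species)
fit <- lda(Species ~ Petal.Length + Petal.Width, data = iris)
pred <- predict(fit, iris)$class

# Confusion table and training misclassification rate
table(predicted = pred, true = iris$Species)
wrong <- pred != iris$Species
mean(wrong)

# Scatterplot coloured by true species, with misclassified points circled
plot(iris[, 3], iris[, 4], col = as.integer(iris$Species), pch = 19,
     xlab = "Petal.Length", ylab = "Petal.Width")
points(iris[wrong, 3], iris[wrong, 4], cex = 2, lwd = 2)
```

Because the error is measured on the training data, it is likely to be an optimistic estimate of how well the classifier would do on new flowers.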
- You may also use this practical session to work on Project 2 and
get help with Project 2.