Professor of Stochastic Modelling, School of Mathematics & Statistics, Newcastle University

# Multivariate Data Analysis

## Computer Practical 1

This computer practical can be accessed via the course web page:

It may be helpful to have the course web page, the course notes, and this practical page all open in different tabs of your web browser during this practical session. In particular, it may save time to copy-and-paste R commands rather than re-typing them.

• First log in and start R in the usual way: Start -> All Programs -> Statistical Software -> R -> R-2.15.1
• This practical is concerned with analysing data from the R package ElemStatLearn. First check that the package is installed by typing require(ElemStatLearn), which also loads it. If this produces no warning, everything is fine. In the unlikely event that the command reports that the package cannot be found, install it using install.packages("ElemStatLearn") and then try again. Do not re-install the package unless there is a problem. In particular, the package should work fine in the cluster room where the practical takes place.
• Work through all of the R code relating to the ElemStatLearn examples, from the course notes, starting on page 6, all the way through to page 22 (inclusive). Do not just blindly enter commands without thinking about what is going on. Make sure you understand exactly what each command is doing. Use ?command to get help on any command you are unsure about. If it still doesn't make sense, ask.
• For the zip.train data, produce separate variance matrix image plots for each of the digits "0" through "9". Can you explain the differences between the images, in relation to the digits they represent?
• Consider the galaxy data set.
    • What is the sample mean vector for the subset of the data corresponding to an angle of 111?
    • What is the covariance between radial.position and velocity for the data subset referred to above?
• Consider the first 3 genes in the nci data set.
    • What is the mean vector for the 3 genes?
    • What is the sample variance matrix for the 3 genes?
• Use R to construct the centering matrix H_6 (the 6 × 6 centering matrix). Use R to verify that it is idempotent, that is, that multiplying H_6 by itself gives H_6 again.
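
The matrix operations behind the last few exercises can be sketched on a small example. The code below is only an illustration under stated assumptions: the data values are made up (they are not taken from galaxy or nci), and the helper name centering_matrix is hypothetical rather than something defined in the course notes. It computes a sample mean vector and sample variance matrix, builds an n × n centering matrix as the identity minus a matrix with every entry 1/n, and checks idempotency with all.equal, which allows for floating-point rounding.

```r
# Toy data matrix standing in for a data subset (values are made up).
X <- cbind(radial.position = c(1.5, -0.3, 2.1, 0.7),
           velocity        = c(1520, 1498, 1541, 1507))
n <- nrow(X)

# Sample mean vector and sample variance matrix.
xbar <- colMeans(X)
S <- var(X)

# Hypothetical helper: n x n centering matrix, I_n - (1/n) * (matrix of ones).
centering_matrix <- function(n) diag(n) - matrix(1 / n, n, n)
H <- centering_matrix(n)

# Idempotency check: H %*% H should equal H up to rounding error.
is_idempotent <- isTRUE(all.equal(H %*% H, H))

# H %*% X subtracts the column means from X, so
# t(X) %*% H %*% X / (n - 1) should reproduce the sample variance matrix.
S_H <- t(X) %*% H %*% X / (n - 1)
matches_var <- isTRUE(all.equal(S_H, S))

print(xbar)
print(S)
print(is_idempotent)
print(matches_var)
```

Comparing with all.equal rather than == is a deliberate choice here, since exact equality of floating-point matrix products can fail even when the identity holds mathematically. For the real data, the same calls would be applied to a subset selected with logical indexing, for example rows where a column equals a given value.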

There is nothing to hand in for this practical session. However, the material covered in the first two practicals is necessary for completion of the first project to be handed out in Week 5.

 darren.wilkinson@ncl.ac.uk http://www.staff.ncl.ac.uk/d.j.wilkinson/