Course Title: COMPUTATIONAL STATISTICS II
Course Code: STAT 6182
Credits: 3
Level: Graduate
Course Type: Elective
Prerequisites: Multivariate statistics, mathematical statistics, and familiarity with basic matrix algebra.
Rationale
Many of the problems that statisticians face today cannot be handled analytically, so computational methods must be used to find approximate solutions to real-world problems. This area has burgeoned over the last two decades as the cost of computing power has decreased and the speed of computers has increased. It is therefore necessary for the MSc and PhD programs in Statistics to offer courses that develop the computational skills needed to solve some of the more demanding problems for which there are no analytic solutions. This is the second course on computational statistics in the Master’s in Statistics program. Data mining in particular is a rapidly growing area, since much of the data collected today is classified as big data. In addition, the modelling of spatial and temporal data is necessary in areas such as disease mapping and climate modelling.
Course Description
This course covers statistical techniques that are computational in nature and that would not ordinarily be covered by the other courses in the statistics master’s program. Topics include spatial statistics, advanced Bayesian models and some data mining techniques. Both the theoretical underpinnings of the material and its application through computation are covered. The course will be hands-on and practical and will rely heavily on the statistical software R. Matlab will be used where there is a need for numerical computations. Both real and simulated data, drawn from different subject areas, will be used to illustrate the main concepts. The course is the second in a sequence of two computational statistics courses.
Content
Computational statistics is a branch of the mathematical sciences concerned with efficient methods for obtaining numerical solutions to statistically formulated problems. This course will introduce students to a variety of computationally intensive statistical techniques and to the role of computation as a tool of discovery. Topics include spatial and temporal modeling, Bayesian networks and selected data mining techniques such as neural networks and support vector machines.
Aims and Goals
The main goals of the course are:
 Introduce students to some common data mining techniques with both theoretical and practical coverage.
 Illustrate the use of computers in solving statistical problems such as classification.
 Model both spatial and temporal data using appropriate statistical software.
 Understand the role of computation as a tool of discovery in data analysis.
Objectives
Upon successful completion of this course, students MUST be able to:
 Use appropriate methods for density estimation in both the univariate and multivariate setting.
 Write computer code in R to perform density estimation and use suitable methods such as cross-validation to determine the effectiveness of the model.
 Use R and WinBUGS to model data in the Bayesian framework.
 Model spatial and temporal data using appropriate methods in R.
 Discuss complex topics such as data mining techniques, both verbally and in writing.
 Use appropriate data mining methods for specific problems such as classification.
Mode of Delivery
Lectures are delivered face-to-face. All lectures, assignments, handouts, and review materials are available online to all students. Lectures are supplemented with laboratory work and tutorials.
Course content and structure
Week | Material | Notes
1 | Review of R: Review of course outline and expectations. Downloading R and related software for the course. | Course introduction, format of delivery. Lab.
2 | Spatial Statistics: Geostatistics: variogram/covariance function; kriging. Introduction to Spatial Statistics R package. | Lectures, with lecture notes made available. *Project topics to be investigated and finalised. Lab.
3 | Spatial Statistics: Spatiotemporal modeling: disease mapping. | Lectures, with lecture notes made available. *Students start research on project and report on progress to Supervisor on a weekly basis. Lab.
4 | Spatial Statistics: Spatiotemporal modeling: climate modeling. | Lectures, with lecture notes made available. Lab.
5 | Spatial Statistics: Spatiotemporal modeling: review of ArcGIS. | Lectures, with lecture notes made available. Lab.
6 | Data Mining Techniques: Classification using Support Vector Machines. | Lectures, with lecture notes made available. Lab.
7 | Data Mining Techniques: Classification using Support Vector Machines (continued). | Lectures, with lecture notes made available. Lab.
8 | Data Mining Techniques: Neural Networks Part I. | Lectures, with lecture notes made available. Lab.
9 | Data Mining Techniques: Neural Networks Part II. | Lectures, with lecture notes made available. Lab.
10 | Data Mining Techniques: Introduction to Bayesian Networks. | Lectures, with lecture notes made available.
11 | Data Mining Techniques: Application of Bayesian Networks: risk assessment and decision. | Lectures, with lecture notes made available.
12 | Data Mining Techniques: Application of Bayesian Networks: risk assessment and decision, some more examples. | Lectures, with lecture notes made available.
13 | Revision and Group Presentations |


Assessment
Coursework: 100%
This course will be assessed entirely through four individual assignments and one group project. Each assignment and the project will involve both theoretical and computer-based problems.
Individual Assignments (4) – 60%
Four homework assignments will be given, collected and graded throughout the semester.
While discussion of the homework is allowed, you must prepare your solutions separately. Direct copying of written work or computer code is considered cheating and will result in a zero on the assignment. Assignments are worth 60% of the course grade.
Group Project (1) – 40%
Each student will be required to take part in a group project during the second half of the semester. The minimum group size is 3; however, larger groups are encouraged. The topics will vary and can be discussed with the instructor. Each group will be required to present its project in class in the last week of classes. Full details will be given around class session four. The project is worth 40% of the course grade.
Resource requirements
This course is on computational methods, and many of the assignments will require the use of a computer. An introduction to the statistical programming language R will be presented as part of the course, and students are strongly encouraged to complete their assignments in R. Other programming languages will be allowed upon approval of the instructor. Students are expected to document and hand in all code used to complete their homework assignments. R can be downloaded for free from: http://www.r-project.org/
The statistical computing lab already has Stata and Matlab. Open-source statistical software such as R, WinBUGS and OpenBUGS will be used as far as possible.
PRESCRIBED TEXTS AND READING MATERIALS
Required reading
Computational Statistics, by G. H. Givens and J. A. Hoeting (Wiley, 2005).
Statistical Computing with R, by M. Rizzo (Chapman and Hall).
Statistics for Spatial Data, by N. Cressie.
Recommended reading
Hastie, T., Tibshirani, R. and Friedman, J. 2009. The Elements of Statistical Learning. Springer.