
Course Title

COMPUTATIONAL STATISTICS I

Course code

STAT 6181

Level:

Graduate

Course Type

Elective

Credits

3

   

Pre-requisites

Multivariate calculus; familiarity with basic matrix algebra; a graduate course in probability and statistics (MATH 2140).

 

Rationale:

Many of the problems that statisticians face today cannot be solved analytically, so approximate solutions to these real-world problems must be found computationally. This area has burgeoned over the last two decades as the cost of computing power has decreased and the speed of computers has increased. The MSc and PhD programs in Statistics therefore need courses that develop the computational skills required to solve such statistical problems.

 

Course Description

This course covers the basic methods of computational statistics. Techniques such as the bootstrap, the jackknife, and Markov chain Monte Carlo (MCMC), with particular reference to both hierarchical Bayes and empirical Bayes methods, will be covered. The theoretical underpinnings of each method will be taught alongside its computational aspects. The course is hands-on and practical and relies heavily on the statistical software R; Matlab will be used where numerical computations require it. Both real and simulated data will be used to illustrate the main concepts, drawing on datasets from different subject areas. The course is the first in a sequence of two computational statistics courses.
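As a minimal sketch of the two resampling methods named above, the following estimates the standard error of a sample mean by the non-parametric bootstrap and by the jackknife. It is written in Python purely for illustration (the course itself uses R); the function names, the number of resamples and the data are hypothetical choices.

```python
import math
import random
import statistics


def bootstrap_se(data, statistic, n_boot=2000, seed=1):
    """Non-parametric bootstrap SE: resample with replacement, recompute the
    statistic on each resample, and return the SD of the replicates."""
    rng = random.Random(seed)
    n = len(data)
    replicates = [statistic(rng.choices(data, k=n)) for _ in range(n_boot)]
    return statistics.stdev(replicates)


def jackknife_se(data, statistic):
    """Jackknife SE built from the n leave-one-out estimates."""
    n = len(data)
    loo = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    loo_mean = sum(loo) / n
    return math.sqrt((n - 1) / n * sum((t - loo_mean) ** 2 for t in loo))


if __name__ == "__main__":
    # Hypothetical illustrative sample, not course data.
    x = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5]
    print("bootstrap SE of the mean:", round(bootstrap_se(x, statistics.mean), 3))
    print("jackknife SE of the mean:", round(jackknife_se(x, statistics.mean), 3))
    print("formula s / sqrt(n):     ", round(statistics.stdev(x) / math.sqrt(len(x)), 3))
```

For the sample mean the jackknife estimate equals the familiar s/√n exactly, while the bootstrap estimate is a Monte Carlo approximation to the plug-in standard error and so varies slightly with the seed and the number of resamples. For non-smooth statistics such as the median the jackknife standard error is known to be unreliable, which is one of its practical disadvantages.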

Content

Computational statistics is a branch of the mathematical sciences concerned with efficient methods for obtaining numerical solutions to statistically formulated problems. This course will introduce students to a variety of computationally intensive statistical techniques and to the role of computation as a tool of discovery. Topics include numerical optimization in statistical inference (the expectation-maximization (EM) algorithm, Fisher scoring, etc.), random number generation, Monte Carlo methods, randomization methods, jackknife methods, parametric and non-parametric bootstrap methods, and MCMC methods for Bayesian hierarchical models.
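To illustrate the difference between Newton-Raphson and Fisher scoring, the sketch below finds the maximum-likelihood estimate of the location θ of a Cauchy distribution with scale fixed at 1. Newton-Raphson divides the score by the observed second derivative of the log-likelihood, while Fisher scoring uses the expected information, which for this model is n/2. This is a Python illustration under those assumptions, with hypothetical function names and data, not course material.

```python
import statistics


def score(theta, xs):
    """l'(theta) for the Cauchy(theta, 1) log-likelihood."""
    return sum(2 * (x - theta) / (1 + (x - theta) ** 2) for x in xs)


def hessian(theta, xs):
    """l''(theta) for the Cauchy(theta, 1) log-likelihood."""
    return sum(-2 * (1 - (x - theta) ** 2) / (1 + (x - theta) ** 2) ** 2 for x in xs)


def newton_raphson(xs, theta, tol=1e-10, max_iter=100):
    """theta <- theta - l'(theta) / l''(theta), i.e. uses the observed information."""
    for _ in range(max_iter):
        step = score(theta, xs) / hessian(theta, xs)
        theta -= step
        if abs(step) < tol:
            return theta
    raise RuntimeError("Newton-Raphson did not converge")


def fisher_scoring(xs, theta, tol=1e-10, max_iter=1000):
    """theta <- theta + l'(theta) / I(theta), with I(theta) = n/2 for Cauchy(theta, 1)."""
    info = len(xs) / 2.0
    for _ in range(max_iter):
        step = score(theta, xs) / info
        theta += step
        if abs(step) < tol:
            return theta
    raise RuntimeError("Fisher scoring did not converge")


if __name__ == "__main__":
    xs = [1.2, 2.8, 3.1, 3.6, 4.0, 9.5]  # hypothetical sample
    start = statistics.median(xs)       # a robust starting value
    print("Newton-Raphson:", newton_raphson(xs, start))
    print("Fisher scoring:", fisher_scoring(xs, start))
```

Starting from the sample median matters here: the Cauchy log-likelihood can have several local maxima, and Newton-Raphson started far from the mode may diverge.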

 

Aims and Goals

The main goals of the course are:

  • Introduce and explore modern computational methods used in statistics.
  • Solve numerical problems that arise in statistical routines, using methods such as Newton-Raphson.
  • Review methods for simulation, estimation and visualization of statistical data.
  • Understand the role of computation as a tool of discovery in data analysis.
  • Perform simple Bayesian hierarchical modeling.
  • Write the full conditional distributions for the parameters of a model for hierarchical data.
  • Use appropriate software in Bayesian model estimation.
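The following sketch shows what writing the full conditionals and sampling from them in turn (Gibbs sampling) looks like for a simple two-level normal model: observations y_j with known standard errors σ_j, group means θ_j ~ N(μ, τ²), a flat prior on μ and an inverse-gamma(a0, b0) prior on τ². The model, prior settings and data are illustrative assumptions, and the code is Python rather than the R or WinBUGS used in class.

```python
import math
import random
import statistics


def gibbs_hierarchical_normal(y, sigma, n_iter=5000, burn_in=1000,
                              a0=1.0, b0=1.0, seed=42):
    """Gibbs sampler for the (illustrative) two-level model
         y_j | theta_j ~ N(theta_j, sigma_j^2)   (sigma_j known)
         theta_j | mu, tau2 ~ N(mu, tau2)
         p(mu) flat,  tau2 ~ InvGamma(a0, b0).
    Returns a list of (mu, tau2, [theta_1, ..., theta_J]) draws after burn-in."""
    rng = random.Random(seed)
    J = len(y)
    theta = list(y)                 # start the group means at the observed values
    mu = statistics.mean(y)
    tau2 = statistics.variance(y)
    draws = []
    for it in range(n_iter):
        # Full conditional of theta_j: precision-weighted combination of y_j and mu
        for j in range(J):
            prec = 1.0 / sigma[j] ** 2 + 1.0 / tau2
            mean = (y[j] / sigma[j] ** 2 + mu / tau2) / prec
            theta[j] = rng.gauss(mean, math.sqrt(1.0 / prec))
        # Full conditional of mu under a flat prior: N(mean(theta), tau2 / J)
        mu = rng.gauss(statistics.mean(theta), math.sqrt(tau2 / J))
        # Full conditional of tau2: InvGamma(a0 + J/2, b0 + sum((theta_j - mu)^2)/2),
        # drawn as the reciprocal of a Gamma draw (gammavariate takes a scale = 1/rate)
        rate = b0 + 0.5 * sum((t - mu) ** 2 for t in theta)
        tau2 = 1.0 / rng.gammavariate(a0 + J / 2.0, 1.0 / rate)
        if it >= burn_in:
            draws.append((mu, tau2, list(theta)))
    return draws


if __name__ == "__main__":
    # Hypothetical group summaries, not course data.
    y = [2.0, 5.5, 3.1, 4.4, 6.0]
    sigma = [1.0, 1.0, 1.5, 1.0, 2.0]
    draws = gibbs_hierarchical_normal(y, sigma)
    print("posterior mean of mu:  ", statistics.mean(d[0] for d in draws))
    print("posterior mean of tau2:", statistics.mean(d[1] for d in draws))
```

Each update draws from a standard distribution only because the priors were chosen to be conditionally conjugate; with other priors a Metropolis-Hastings step would replace the corresponding draw.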

 

Objectives

Upon successful completion of this course, students MUST be able to:

  • State and use appropriate optimization techniques for single- and multi-parameter problems.
  • Apply appropriate bootstrap techniques in both parametric and non-parametric settings.
  • Use the jackknife in parameter estimation.
  • Derive the expectation (E) and maximization (M) steps of the EM algorithm for a given statistical scenario.
  • Generate random numbers from various distributions.
  • Use R and Matlab to solve problems involving numerical approximations.
  • Apply Bayesian hierarchical models to real-world data.
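One standard way to generate random numbers from a given distribution is the inverse-transform method: if U ~ Uniform(0, 1) and F is a CDF, then F⁻¹(U) has distribution F. The sketch below applies it to the exponential distribution, where F⁻¹(u) = −log(1 − u)/λ, and to a finite discrete distribution. It is a Python illustration with hypothetical function names; the course itself uses R and Matlab.

```python
import bisect
import itertools
import math
import random


def rexp_inverse(n, rate, rng):
    """Exponential(rate) draws: if U ~ Uniform(0, 1), then -log(1 - U) / rate
    has CDF F(x) = 1 - exp(-rate * x)."""
    return [-math.log(1.0 - rng.random()) / rate for _ in range(n)]


def rdiscrete_inverse(n, values, probs, rng):
    """Discrete draws: return the first value whose cumulative probability exceeds U."""
    cdf = list(itertools.accumulate(probs))
    last = len(values) - 1
    # min() guards against a final cumulative sum of 0.999... caused by rounding
    return [values[min(bisect.bisect_right(cdf, rng.random()), last)] for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(2024)
    x = rexp_inverse(100_000, rate=2.0, rng=rng)
    print("sample mean (theory 0.5):", sum(x) / len(x))
    d = rdiscrete_inverse(100_000, ["a", "b", "c"], [0.2, 0.5, 0.3], rng)
    print({v: d.count(v) / len(d) for v in "abc"})
```

Both samplers rely only on a uniform generator; for distributions whose CDF has no closed-form inverse, methods such as rejection sampling are usually used instead.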

 

Mode of Delivery

Lectures are delivered face-to-face. All lectures, assignments, handouts, and review materials are available online to all students. Each face-to-face session includes a practical computational component, so students are required to bring their laptops, with the appropriate software installed, to every class.

Course content and structure

Week

Material

Notes

1

Introduction to R and Matlab

Review of the main numerical recipes in R. An introduction to matrix algebra using R. Writing short routines in R.

Course introduction, format of delivery.

Lab

2

Optimization Theory

Single-variable and multivariate methods

Lectures, with lecture notes made available

*Project topics to be investigated and finalised

Lab

3

The Basic Bootstraps

Parametric and Non-parametric simulation.

Lectures, with lecture notes made available

*Students begin research on their projects and report on progress to their supervisor weekly.

Lab

4

Bootstrap

Further ideas related to semi-parametric models and censoring

Lectures, with lecture notes made available.

Lab

5

Bootstrap

Applications of the bootstrap to hypothesis testing. Permutation tests.

 

Lectures, with lecture notes made available

Lab

6

Jackknife

An introduction to the jackknife. Computational illustrations of the disadvantages of the jackknife.

Lectures, with lecture notes made available

Lab

7

EM Algorithm

Missing data; marginalization and notation. Simple examples, for instance using Fisher’s data.

 

Lectures, with lecture notes made available

Lab

8

EM Algorithm

Applications of EM and its variants to problems such as finding the number of components in a mixture of normal distributions.

Lectures, with lecture notes made available

Lab

9

EM Algorithm

Applications to multivariate data

 

Lectures, with lecture notes made available

Lab

10

Bayesian Modeling

MCMC; simple hierarchical models; Gibbs sampling; implementation in R.

 

Lectures, with lecture notes made available

Lab

11

Bayesian Modelling

Applications of Bayesian models; some advanced MCMC; simple hierarchical models; Gibbs sampling; implementation in R and WinBUGS.

Lectures, with lecture notes made available

Lab

12

Bayesian Modelling

Examples of further hierarchical models using a combination of R, WinBUGS, OpenBUGS and RBugs.

Lectures, with lecture notes made available

Lab

13

Revision and Group Presentations
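As a sketch of the EM material in Weeks 7 to 9, the code below fits a two-component univariate normal mixture. The E-step computes each observation's posterior probability of belonging to the first component, and the M-step re-estimates the mixing weight, means and variances as weighted averages. It is a Python illustration with hypothetical names, starting values and simulated data; note that it fixes the number of components at two rather than choosing it.

```python
import math
import random


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_normals(x, p, mu1, mu2, var1, var2, max_iter=500, tol=1e-8):
    """EM for the mixture p*N(mu1, var1) + (1 - p)*N(mu2, var2).
    Returns the final estimates and the log-likelihood at the start of each iteration."""
    n = len(x)
    loglik = []
    for _ in range(max_iter):
        # E-step: responsibility r_i = P(component 1 | x_i, current parameters)
        d1 = [p * normal_pdf(xi, mu1, var1) for xi in x]
        d2 = [(1.0 - p) * normal_pdf(xi, mu2, var2) for xi in x]
        r = [a / (a + b) for a, b in zip(d1, d2)]
        loglik.append(sum(math.log(a + b) for a, b in zip(d1, d2)))
        # M-step: weighted maximum-likelihood updates given the responsibilities
        s1 = sum(r)
        s2 = n - s1
        p = s1 / n
        mu1 = sum(ri * xi for ri, xi in zip(r, x)) / s1
        mu2 = sum((1.0 - ri) * xi for ri, xi in zip(r, x)) / s2
        var1 = sum(ri * (xi - mu1) ** 2 for ri, xi in zip(r, x)) / s1
        var2 = sum((1.0 - ri) * (xi - mu2) ** 2 for ri, xi in zip(r, x)) / s2
        if len(loglik) > 1 and loglik[-1] - loglik[-2] < tol:
            break
    return (p, mu1, mu2, var1, var2), loglik


if __name__ == "__main__":
    # Simulated data from a hypothetical mixture 0.3*N(0, 1) + 0.7*N(5, 1).
    rng = random.Random(0)
    x = [rng.gauss(0, 1) if rng.random() < 0.3 else rng.gauss(5, 1) for _ in range(2000)]
    est, path = em_two_normals(x, p=0.5, mu1=-1.0, mu2=6.0, var1=1.0, var2=1.0)
    print("p, mu1, mu2, var1, var2 =", [round(v, 3) for v in est])
    print("iterations:", len(path))
```

The recorded log-likelihood never decreases from one iteration to the next, which is the defining property of EM and a useful check when debugging an implementation.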

 

 

 

Assessment

Coursework: 100%

This course is assessed entirely through four individual assignments and one group project. Each assignment, and the project, will involve both theoretical and computer-based problems.

Individual Assignments (4) – 60%

Four homework assignments will be given, collected and graded throughout the semester.

While discussion of the homework is allowed, you must prepare your solutions separately. Direct copying of written work or computer code is considered cheating and will result in a zero on the assignment. Assignments are worth 60% of the course grade.

 

Group Project (1) – 40%

Each student will be required to complete a group project during the second half of the semester. The minimum group size is three; larger groups are encouraged. Topics will vary and can be discussed with the instructor. Each group will present its project in class during the last week of classes. Full details will be given around the fourth class session. The project is worth 40% of the course grade.

 

Resource requirements

The statistical computing lab already has Stata and Matlab. Free and open-source statistical software such as R, WinBUGS and OpenBUGS will be used as far as possible.

PRESCRIBED TEXTS AND READING MATERIALS

Required reading

Computational Statistics, by G. H. Givens and J. A. Hoeting (Wiley 2005).

Statistical Computing with R, by M. Rizzo (Chapman and Hall).

Recommended reading

The Elements of Statistical Learning, by T. Hastie, R. Tibshirani and J. Friedman (Springer 2009).