http://www.lancaster.ac.uk/data-science/
Contents
Section 1
Appendix A: Module Descriptions
Appendix B: Term Dates
The Lancaster programme combines interdisciplinary teaching from three world-leading departments: the
School of Computing and Communications (SCC), the Department of Mathematics and Statistics, and the
Lancaster Environment Centre (LEC).
Building upon a common core set of modules, the Data Science Masters schemes allow for subject specialism:
MSc: Data Science
- Statistical Inference specialism
- Computing specialism
MSc: Data Science for the Environment
The Computing specialism of the Data Science MSc is aimed at students with a background in computer
science who want to develop strong quantitative, data handling and analytical skills, along with skills in
engineering the systems that support data science. The Statistical Inference specialism of the
Data Science MSc is aimed at students with a background in mathematics and statistics who want to
develop their statistical, computing and analytical skills for the extraction, synthesis, processing and
analysis of large and complex data. Students may also choose modules across specialisms to fit
their background and career aspirations. Course details, by specialism, are provided in Section 3 and
Section 4.
The production of data relevant to environmental issues is increasing at a phenomenal rate.
Technological advances in fields such as remote sensing, imaging and forecasting are producing
data from more sources, and at higher resolution, than we have been able to handle until recently. There is a
need for data scientists who are able to organise, analyse and disseminate such data in order to answer questions
that will help us manage a range of scenarios, from the management of natural resources to the prediction of population
change. Meeting this need is the aim of the MSc: Data Science for the Environment.
Overseas students will require a visa to be able to study with us in the UK.
2.3 Making an Application
You should apply online using the My Applications website.
Supporting documentation includes:
Your degree transcripts and certificates, including certified English translations if applicable
Two references
If English is not your first language, you should also enclose copies of your English language test
results
You also need to complete a personal statement to help us understand why you wish to study your
chosen degree.
If you are a current Lancaster University student, you will not need to provide your Lancaster degree
transcript and only need to provide one reference.
The postgraduate section of Lancaster University's website has plenty of information about applying to
Lancaster.
All module descriptions are contained in Appendix A, in alphanumeric order. Modes of assessment and
credit weightings for all modules are given in Table 1, Section 3.2.
The dissertation component consists of a substantial project applying data scientific methods to address a
substantive research question. This will be undertaken during the summer term and will typically
incorporate a two-month industrial or research organisation placement.
Code     Title                              Weeks scheduled  Coursework %  Exam %  Credit weighting
SCC460   Data Science Fundamentals          6 - 20           100           NA      15
SCC403   Data Mining                        1 - 10           100           NA      15
SCC461   Programming for Data Scientists    1 - 10           100           NA      15
CFAS440  Statistical Methods and Modelling  1, 2, 4          100           NA      15
CFAS406  Likelihood Inference               5                100           NA      15

Code     Title                                 Weeks scheduled  Coursework %  Exam %  Credit weighting
SCC401   Elements of Distributed Systems       11 - 20          100           NA      15
SCC413                                         14 - 20          100           NA      15
SCC411   Systems Architecture and Integration  11 - 20          100           NA      15

SCC modules: taught by the School of Computing and Communications, except for SCC460 and SCC461, which are
cross-disciplinary; CFAS modules: taught by the Department of Mathematics and Statistics.
SCC460
SCC403
SCC461
MATH552
MATH551
A compulsory specialist module (15 credits) in Bayesian Inference for Data Science (MATH555);
A set of three optional modules (30 credits), chosen from 10 available modules spanning a range of
specialist/advanced statistical methods relevant to the design, analysis and interpretation of
observational and experimental data:
MATH562
MATH563
MATH564
CHIC565
MATH566
MATH572
MATH573
MSCI523
MSCI526
MSCI524
All module descriptions are contained in Appendix A, in alphanumeric order. Modes of assessment and
credit weightings are given in Table 2, Section 4.2.
The dissertation component consists of a substantial project applying data scientific methods to address a
substantive research question. This will be undertaken during the summer term and will typically
incorporate a two-month industrial or research organisation placement.
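Code     Title                                Coursework %  Exam %  Credit weighting
SCC460   Data Science Fundamentals            100           NA      15
SCC403   Data Mining                          100           NA      15
SCC461   Programming for Data Scientists      100           NA      15
MATH551  Likelihood Inference                 50            50      15
MATH552  Generalised Linear Models            50            50      15

Compulsory specialist module:
MATH555  Bayesian Inference for Data Science  50            50      15

Optional modules:
MATH562  Extreme Value Theory                 50            50      10
MATH563  Clinical Trials                      50            50      10
MATH564  Principles of Epidemiology           50            50      10
CHIC565  Environmental Epidemiology           50            50      10
MATH566                                       50            50      10
MATH572                                       100           NA      10
MATH573                                       50            50      10
MSCI523  Forecasting                          100           NA      10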
MSCI526
100
NA
30
70
Title
MSCI534
Title
Environmental epidemiology
10
10
SCC modules: taught by the School of Computing and Communications; MATH modules: taught by the
Department of Mathematics and Statistics; MSCI modules: taught by Lancaster University Management
School; CHIC modules: taught by the CHICAS research group.
4. A dissertation (60 credits) with associated industrial placement
SCCXXX modules: School of Computing and Communications Data Science submission box
(contact Charlotte Griffiths, c.griffiths@lancaster.ac.uk)
MATHXXX/CFASXXX modules: Mathematics and Statistics Dept. Data Science submission box
(contacts MATHXXX: Jane Hall, j.hall2@lancaster.ac.uk; CFASXXX: Angela Mercer,
a.j.mercer@lancaster.ac.uk)
LECXXX modules: Lancaster Environment Centre Masters submission box (contact Stacey Read,
s.read@lancaster.ac.uk)
Precise details will be provided by the module tutor. A signed assignment cover sheet must be included
at the front of your coursework to confirm that the work is your own.
Submitting your assignments online
Each module you take will have its own Moodle page. You can access all of your Moodle pages at
https://modules.lancs.ac.uk/ or by logging into your student portal at https://portal.lancs.ac.uk/.
Your assignments should be submitted online via Moodle.
Penalties for late submission:
Unsanctioned late work submitted up to 3 days after the deadline will incur a penalty of
10% of the available marks for the work.
Work submitted more than 3 days after the deadline, without an agreed extension, will be marked at 0%.
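The two late-submission rules can be expressed as a small calculation. This is a minimal sketch, assuming that marks are percentages of the available marks, that the 10% penalty is a single deduction of 10 percentage points (the handbook does not say whether it applies per day), that a mark cannot fall below zero, and that `days_late` is counted against the deadline in force, including any agreed extension. The function name is hypothetical.

```python
def apply_late_penalty(mark, days_late):
    """Return the recorded mark (percent) after the late-submission rules.

    Assumptions (not stated in the handbook): the 10% penalty is one
    deduction of 10 percentage points, marks cannot go below zero, and
    days_late is measured against the deadline in force (including any
    agreed extension).
    """
    if days_late <= 0:
        return mark                  # submitted on time
    if days_late <= 3:
        return max(mark - 10, 0)     # up to 3 days late: lose 10% of available marks
    return 0                         # more than 3 days late: marked at 0%


print(apply_late_penalty(65, 2))     # 55
```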
Plagiarism
The University has established an institutional framework for dealing with plagiarism (the Plagiarism
Framework Secretariat) and is a member of the JISC Plagiarism Detection Service (Turnitin), which
searches for matching text between a paper and material available on the Internet. The LEC PG Office
reserves the right to assess some or all of your work submitted electronically using this service.
You commit plagiarism if you try to pass off the work of any other person, whether a published author, an
internet source or a fellow-student, as your own. To copy passages, illustrations or even sentences from a
book without acknowledgement is gross plagiarism; to copy from another student's essay or dissertation
is a particularly grave offence. Never use another's actual words (or a close paraphrase of them) without
putting them in quotation marks and giving an exact reference to your source.
Serious plagiarism as described above may result in an essay being given a zero mark and, if repeated or
particularly grave, may result in your exclusion from further courses within the University (a
matter decided by the Standing Academic Committee). These strict rules are in place to ensure
that the work you undertake as part of your course has real value to you, both educationally and in giving
you a sense of personal achievement.
Feedback and Notification of Marks
Feedback on assessed work will be provided within four weeks of submission (excluding vacations and
unforeseen staff absences). Once coursework has been marked and the marks recorded by the PG Office,
students will be informed and can then collect coursework from the administrating PG Office.
Exam papers are marked in line with the above timescales for coursework. It is University policy not to
return examination scripts to students.
Students may also view their coursework marks via the Student Portal once they have been processed
by the administrating PG Office.
It should be remembered that, until the External Exam Board has met (in October each year), any marks
given to students are provisional and may be subject to change. The External Exam Board does not
usually meet until 6 weeks after the end of the programme.
Under University regulations, there is no appeal against academic judgement.
Awards
The overall mark awarded for the taught courses is calculated from the average of the marks gained in the
taught modules, weighted for the credit rating of each module. The overall mark awarded for the
dissertation is taken as the average mark of the first and second markers. The following criteria will be
applied for the awards of Masters degrees:
Award of MSc in Data Science
To qualify for the Masters degree in Data Science you must achieve a total of 180 credits: 120 credits
from the taught course component and 60 credits from the Dissertation. Credit for a module is given if
the overall module mark is 50% or more.
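The credit-weighted mean and the credit rule above can be written as a short calculation. This Python sketch is illustrative only: the function names are hypothetical, and each module is assumed to be given as a (mark, credits) pair with the mark as a percentage.

```python
def credit_weighted_mean(modules):
    """Mean of module marks weighted by each module's credit rating.

    modules: list of (mark, credits) pairs, marks in percent.
    """
    total_credits = sum(credits for _, credits in modules)
    return sum(mark * credits for mark, credits in modules) / total_credits


def credits_awarded(modules, pass_mark=50):
    """Credits earned: a module's credits count only if its mark is 50% or more."""
    return sum(credits for mark, credits in modules if mark >= pass_mark)


def dissertation_mark(first_marker, second_marker):
    """The dissertation mark is the average of the first and second markers' marks."""
    return (first_marker + second_marker) / 2


print(credit_weighted_mean([(60, 15), (70, 15), (45, 10)]))  # 60.0
```

In this example the 10-credit module marked at 45% does not earn credit, so `credits_awarded` returns 30 of the 40 credits.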
Condonation
Notwithstanding this requirement, candidates shall be eligible for an award by compensation/
condonation in respect of up to a maximum of 45 credits of a taught Masters programme, provided that:
a) no single module mark falls below 40%;
b) the candidate's weighted mean or modal mark is 50% or greater.
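The condonation conditions can be checked mechanically. This is a minimal sketch under stated assumptions: a module "needs condoning" when its mark is below 50%, only the credit-weighted mean is used (the modal-mark route in condition b is not modelled), and the function name is hypothetical.

```python
def eligible_by_condonation(modules, max_condoned_credits=45):
    """Check the condonation conditions for a list of (mark, credits) pairs.

    Assumptions: marks below 50% are the ones needing condonation, and
    condition (b) is tested with the credit-weighted mean only.
    """
    condoned_credits = sum(c for m, c in modules if m < 50)
    if condoned_credits > max_condoned_credits:
        return False                      # more than 45 credits would need condoning
    if any(m < 40 for m, _ in modules):
        return False                      # condition (a): no mark below 40%
    total = sum(c for _, c in modules)
    weighted_mean = sum(m * c for m, c in modules) / total
    return weighted_mean >= 50            # condition (b): weighted mean of 50% or more


print(eligible_by_condonation([(45, 15), (60, 15)]))  # True
```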
Higher awards: MSc in Data Science with Merit or Distinction
The MSc in Data Science may be awarded with Merit provided that the weighted mean mark is 60%
or greater.
The MSc in Data Science may be awarded with Distinction provided that the weighted mean mark
is 70% or greater.
Subject to the condonation/compensation guidelines and the criteria for higher awards, only
students achieving at least a condonable mark for modules at the first attempt are eligible for the
classes of Merit and Distinction.
Borderline cases
Where the overall average falls within 2 percentage points below a class boundary (i.e. from 68%, 58% or 48%
upwards for the 70%, 60% and 50% boundaries respectively), or where most credits are in the class above the mean,
the Exam Board will have discretion to decide which of the alternative awards to recommend.
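The Merit and Distinction thresholds and the borderline zones can be combined into one function. This sketch looks only at the weighted mean mark. It assumes the credit and condonation requirements have already been checked, uses the label "Pass" for an MSc without Merit or Distinction (an assumption), models the first-attempt rule as a simple flag, and flags only the 2-point borderline zones; the alternative criterion (most credits in the class above) and the Exam Board's final decision are not modelled. The names are hypothetical.

```python
BOUNDARIES = (("Distinction", 70), ("Merit", 60), ("Pass", 50))


def classify(weighted_mean, first_attempt=True):
    """Return (award, borderline) for a weighted mean mark in percent."""
    award = "Fail"
    for name, threshold in BOUNDARIES:
        if weighted_mean >= threshold:
            award = name
            break
    # Merit and Distinction require at least condonable marks at the first attempt
    if award in ("Distinction", "Merit") and not first_attempt:
        award = "Pass"
    # within 2 points below a boundary (68-70, 58-60, 48-50): Exam Board discretion
    borderline = any(t - 2 <= weighted_mean < t for _, t in BOUNDARIES)
    return award, borderline


print(classify(69))   # ('Merit', True)
```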
Re-sits
A student who fails to achieve a mark of 50% for a module/element in the MSc programme is entitled to
one opportunity for reassessment in each failed module/element. The mark for a re-taken module is
capped at 50%.
The form of the reassessment is at the absolute discretion of the Examination Board, save that the form of
reassessment must allow the student a realistic chance of achieving 50% in the re-sit.
Role
Telephone
No. (01524)
510310
510515
510328
594666
593964
593064
592516
593765
510600
592525
Moodle
Each postgraduate programme is supported by its own online page, which uses Moodle as its base. This
system is used by students from departments across the University, so you are not alone. You should
familiarise yourself with it as much as possible, as it is a key resource for students on the course. To
find the page for your course, go to the main Moodle site from your Student Portal at
https://portal.lancs.ac.uk/. You will need your University username and password to access the
site.
Moodle is used for:
Gaining access to your module pages
Viewing your timetable for the year and receiving notifications about changes to the timetable
Email
All students will be given a Lancaster University email address, in the form yourname@lancaster.ac.uk.
Please note that any contact we make with you will be through your Lancaster email address;
therefore, you must check this email account daily. Failure to check your Lancaster account
regularly is not an excuse for missing important information, dates, etc.
Computing Facilities
There are numerous open access PC Labs located around campus. The PC labs provide a wide range of
software, printers (colour and monochrome) and scanning facilities. All lab PCs are connected to the
campus network and internet.
Information Systems Services (ISS) also provides other IT services to students, including IT workshops
and courses.
It is also possible to access University services remotely, e.g. from home or via a smartphone.
The ISS Service Desk can be contacted if you require any general computing-related assistance.
Learning Zone
The Learning Zone is located centrally on Alexandra Square and is accessible 24/7. It provides relaxed
surroundings for students to work within and bookable pods for meetings, presentations and group
work.
University Library facilities
Lancaster University Library is a valuable reference resource. Many of the main texts for the modules on
your programme are available there. Your registration with the library should have been completed
when you registered with the University.
Using the Sports Facilities on Campus
As a student of the University, you are entitled to student membership of the Sports Centre for the duration
of your course. Please note, however, that membership runs annually according to the academic year
(October to September) so, depending on when you are attending your course, it may or may not be cost-effective to apply for membership. Membership details are available from the Sports Centre itself at
http://sportscentre.lancs.ac.uk/. Details of opening times, classes etc. can be found in the leaflet in your
induction pack (or ring the Sports Centre on 01524 510600).
The Students Union & Student Support Office
LUSU is a body that represents students' views to the University and provides professional, academic and
other advice for students. Students registering at Lancaster automatically become members of the
Students' Union. There are no financial obligations associated with membership, and you can
withdraw from the union if you wish by completing an opt-out form.
You can also apply online for your NUS Purple Card at http://card.lusu.co.uk/members and then collect
this from LUSU in Bowland College, next to Alexandra Square.
Academic Support for Students in the Faculty of Science and Technology
Academic Support offers a number of drop-in sessions open to all students, plus bookable courses for
international students, as well as one-to-one consultations. More information about this service can be
found at: http://www.lancs.ac.uk/sci-tech/academic_support/
Academic support for students in FST is provided via a faculty-based service, and the Student Learning
Advisor for the Faculty of Science & Technology is Robert Blake: fststudyadvice@lancaster.ac.uk. Support
is provided for any student seeking advice on effective scientific writing and study practices. Additional
support is available for students who are non-native speakers of English or those with learning
difficulties such as dyslexia. If you have a previously identified learning difficulty, or you suspect your
performance is being hampered by a problem that has previously never been diagnosed, you are
encouraged to contact Academic Support.
Careers Advice
The University's Careers Service offers an extensive service tailored to your needs. Its professional
staff include specialists in careers information, employer liaison, event management and careers
guidance. They work closely with other staff within the University, the Students' Union, professional
bodies and a broad range of national and international employers to provide a variety of opportunities to
help you progress your career goals. Careers is located in the Base, just off Alexandra Square.
TARGETconnect is an online system administered by Careers and provides students with access to
student and graduate vacancies, details of careers events, an appointment booking system to see a
careers adviser and the online careers query system. Careers information, including online psychometric
testing and video resources, is available online.
Student Based Services
We hope you have an enjoyable and productive time at Lancaster, but we recognise that sometimes
problems can affect your ability to study. Please do not forget that it is your responsibility to seek help if
you are experiencing difficulties. The University will do whatever is possible to assist you, provided that
we are aware of your problems. These may be personal, financial or academic. If you find yourself
getting into difficulties we strongly urge you to consult the PG Administrator in the first instance.
In addition, Student Based Services provide information, advice and guidance covering different areas of
student welfare.
The Base is situated on A-Floor of University House in Alexandra Square and is a one-stop enquiry desk for
all Student Based Services. Staff there will be able to make appointments with specialist staff where
needed. Details of the various student based services are given below.
Student Registry: responsible for all regulations, policies and procedures governing your award. The
Student Registry is also responsible for managing your official record, including personal details, and
can provide information on many aspects of student administration.
Counselling and Mental Health Service: staff provide confidential and professional support on issues
ranging from short-term personal, family, social or academic matters to more complex or difficult
longer-term problems. The service offers both appointment and drop-in sessions.
Module Mnemonic: CFAS406
Module Title: Statistical Inference
Module Convenor: Dr Steffen Grunewalder
Assessment: Coursework (100%)
Duration: 16 hours
Credits: 15
Term: M1, M2
This module aims to provide an in-depth understanding of statistics as a general approach to the
problem of making valid inferences about relationships from observational and experimental
studies. Examples from social science and environmental science are used to illustrate this approach.
The emphasis will be on the principle of Maximum Likelihood as a unifying theory for estimating
parameters. The module is delivered as a combination of lectures and practicals over two days.
Topics covered will include:
Revision of probability theory and parametric statistical models.
The properties of statistical hypothesis tests, statistical estimation and sampling
distributions.
Maximum Likelihood Estimation of model parameters.
Asymptotic distributions of the maximum likelihood estimator and associated statistics for
use in hypothesis testing.
Application of likelihood inference to simple statistical analyses including linear regression
and contingency tables.
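As one example of likelihood inference applied to a contingency table, the generalised likelihood ratio (deviance) statistic for independence in a two-way table is G² = 2 Σ O log(O/E), where E = (row total × column total)/n are the fitted counts under independence; it is compared with a χ² distribution on (r - 1)(c - 1) degrees of freedom. This is a minimal Python sketch of that standard calculation, not course material; the function name is hypothetical.

```python
import math


def g_squared(table):
    """Likelihood ratio (G^2) statistic for independence in a two-way table.

    table: list of rows of observed counts. Returns (G^2, degrees of freedom).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # fitted count under independence
            if observed > 0:                               # 0 * log(0) is taken as 0
                g2 += observed * math.log(observed / expected)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return 2 * g2, df
```

For a 2 × 2 table (1 degree of freedom) the p-value is `math.erfc(math.sqrt(g2 / 2))`, since a χ² variable on 1 degree of freedom is the square of a standard normal.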
Learning: Students will learn by applying the concepts and techniques covered in the
module to real data sets. Students will be encouraged to examine issues of
substantive interest in these studies. Students will acquire knowledge of:
Module Mnemonic: CFAS440
Module Title: Statistical Methods and Modelling
Module Convenor: Prof Brian Francis, Dr David Lucy, Dr Emma Eastoe
Assessment: Coursework (100%)
Duration: 45 hours
Credits: 15
Term: M1, M2
The aim of this module will be to address the fundamentals of statistics for those who do not have a
mathematics and statistics background. The module is delivered over three intensive two-day
sessions of lectures and practicals. Students will develop an understanding of the theory behind
core statistical topics; sampling, hypothesis testing, and modelling. They will also be putting this
knowledge into practice, by applying it to real data to address research questions.
The module is an amalgamation of three two-day short courses taught in weeks 1 - 4:
Mathematics for Statistics; the nature of variables, scientific notation, logarithms,
combinations, algebra, matrices and an introduction to differentiation and integration.
Statistical Methods; commonly used probability distributions, parameter estimation,
sampling variability, hypothesis testing, basic measures of bivariate relationships.
Generalised Linear Models; the general linear model and the least-squares method, logistic
regression for binary responses, Poisson regression for count data. More broadly, how to
build a flexible linear predictor to capture relationships of interest.
These short courses will be followed by supported tutorial sessions in weeks 3-10.
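With a single explanatory variable, the least-squares estimates have a closed form: slope b = Sxy/Sxx and intercept a = ȳ - b·x̄, where Sxy and Sxx are the centred cross-product and sum of squares. A minimal Python sketch of this textbook calculation (the function name is hypothetical):

```python
def least_squares_line(x, y):
    """Least-squares estimates (intercept, slope) for y = a + b*x + error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx                  # minimises the residual sum of squares
    intercept = y_bar - slope * x_bar  # the fitted line passes through (x_bar, y_bar)
    return intercept, slope


print(least_squares_line([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0)
```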
Learning: On completion of this module a student will be able to:
Comprehend the mathematical notation used in explaining probability and statistics.
Demonstrate knowledge of basic principles in probability, statistical distributions, sampling
and estimation.
Make decisions on the appropriate way to test hypotheses, carry out the test and
interpret the results.
Demonstrate knowledge of the general linear model, the least-squares method of
estimation and the linear predictor, as well as the extensions to generalised linear models for
discrete data.
Decide on the appropriate way to statistically address a research question. Carry out all
aspects of a statistical analysis, assessing model results and performance.
Report their findings.
Assessment: There will be three pieces of coursework:
One assessment for Statistical Methods, assessing understanding and application of statistical
concepts, and interpretation of results from hypothesis testing.
Two independently produced reports for Generalized Linear Models; centred on in-depth
statistical analyses.
Recommended texts and other learning resources:
Upton, G., & Cook, I. (1996). Understanding statistics. Oxford University Press.
Rice, J. (2006). Mathematical statistics and data analysis. Cengage Learning.
Dobson, A. J., & Barnett, A. G. (2008). An Introduction to Generalized Linear Models. CRC
Press.
Fox, J. (2008). Applied regression analysis and generalized linear models. Sage Publications.
Module Mnemonic: MATH551
Module Title: Likelihood Inference
Module Convenor: Dr Juhyun Park
Assessment: Coursework (35%), module test (15%), and end of year exam (50%)
Duration: 25 hours
Credits: 15
Term: M1, M2
Prerequisites: UG Mathematics/Statistics (probability theory; calculus; matrices etc.)
This course considers the idea of statistical models, and how the likelihood function, the probability
of the observed data viewed as a function of unknown parameters, can be used to make inference
about those parameters. This inference includes both estimates of the values of these parameters,
and measures of the uncertainty surrounding these estimates. We consider single and multiparameter models, and models which do not assume the data are independent and identically
distributed. We also cover basic computational aspects of likelihood inference that are required in
many practical applications.
Topics covered will include:
Definition of the likelihood function for single and multi-parameter models, and how it is
used to calculate point estimates (maximum likelihood estimates).
Asymptotic distribution of the maximum likelihood estimator, and the profile deviance, and
how these are used to quantify uncertainty in estimates.
Inter-relationships between parameters, and the definition and use of orthogonality.
Generalised Likelihood Ratio Statistics, and their use for hypothesis tests.
Calculating likelihood functions for non-IID data.
Simple use of computational methods to calculate maximum likelihood estimates and
confidence intervals and to perform hypothesis tests.
Model criticism through residual analysis.
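As an illustration of using a computational method when the likelihood equation has no closed-form solution, the sketch below finds the maximum likelihood estimate for a zero-truncated Poisson model (counts recorded only when at least 1) by Newton's method, and reports a 95% Wald interval from the observed information. The choice of model and the function name are assumptions made for illustration. The likelihood equation is x̄ = λ/(1 - e^(-λ)), and the observed information is S/λ² - n·e^(-λ)/(1 - e^(-λ))², with S the sum of the counts.

```python
import math


def truncated_poisson_mle(counts, tol=1e-10, max_iter=100):
    """MLE of lambda for zero-truncated Poisson counts (all >= 1), with a 95% Wald interval.

    Solves mean(counts) = lam / (1 - exp(-lam)) by Newton's method.
    """
    n = len(counts)
    x_bar = sum(counts) / n
    if x_bar <= 1:
        raise ValueError("the MLE exists only when the sample mean exceeds 1")
    lam = x_bar                                   # start to the right of the root
    for _ in range(max_iter):
        q = 1 - math.exp(-lam)
        g = lam / q - x_bar                       # g(lam) = 0 at the MLE
        dg = (q - lam * math.exp(-lam)) / q ** 2  # derivative of lam / q
        step = g / dg
        lam -= step                               # Newton update
        if abs(step) < tol:
            break
    q = 1 - math.exp(-lam)
    # observed information: S / lam^2 - n * exp(-lam) / q^2, with S = n * x_bar
    info = n * x_bar / lam ** 2 - n * math.exp(-lam) / q ** 2
    se = 1 / math.sqrt(info)
    return lam, (lam - 1.96 * se, lam + 1.96 * se)
```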
On successful completion students will be able to:
Understand how to construct statistical models for simple applications.
Appreciate how information about the unknown parameters is obtained and summarized via
the likelihood function.
Calculate the likelihood function for IID data, and for some statistical models which do not
assume independent identically distributed data.
Evaluate point estimates and make statements about the variability of these estimates.
Understand the inter-relationships between parameters, and the concept of orthogonality.
Perform hypothesis tests using the generalised likelihood ratio statistic.
Use computational methods to calculate maximum likelihood estimates.
Use computational methods to construct confidence intervals, and perform hypothesis tests.
Look at residuals to judge how appropriate a model is.
Bibliography:
Azzalini. Statistical Inference: Based on the Likelihood. Chapman and Hall. 1996.
D R Cox and D V Hinkley. Theoretical Statistics. Chapman and Hall. 1974.
Y Pawitan. In All Likelihood: Statistical Modeling and Inference Using Likelihood. OUP. 2001.
Module Mnemonic: MATH552
Module Title: Generalised Linear Models
Module Convenor: Dr Michalis Kolossiatis
Assessment: Coursework (35%), module test (15%), and end of year exam (50%)
Duration: 25 hours
Credits: 15
Term: M1, M2
Co-requisites: MATH551
Generalised linear models are now one of the most frequently used statistical tools of the applied
statistician. They extend the ideas of regression analysis to a wider class of problems that involves
exploring the relationship between a response and one or more explanatory variables. In this course
we aim to discuss applications of generalised linear models to a diverse range of practical
problems involving data from areas such as biology, the social sciences and time series, and
to explore the theoretical basis of these models.
Topics covered:
We introduce a large family of models, called generalised linear models (GLMs), that
includes the standard linear regression model as a special case, and discuss the theoretical
properties of these models.
We learn a common algorithm, the iteratively reweighted least squares (IRLS) algorithm, for the
estimation of parameters.
We fit and check these models with the statistical package R; produce confidence intervals
and tests corresponding to questions of interest; and state conclusions in everyday
language.
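For a Poisson model with log link, each IRLS step solves a weighted least-squares problem with working response z = η + (y - μ)/μ and working weights w = μ, where η is the linear predictor and μ = exp(η). The sketch below does this for a single covariate in plain Python; it is an illustration of the algorithm (the function name is hypothetical), not a replacement for fitting with R's `glm()`.

```python
import math


def poisson_irls(x, y, tol=1e-10, max_iter=50):
    """Fit log(E[y]) = b0 + b1 * x for Poisson counts y by IRLS."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0    # start from the intercept-only fit
    for _ in range(max_iter):
        # accumulate the weighted sums for the normal equations X'WX b = X'Wz
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi                 # linear predictor
            mu = math.exp(eta)                 # fitted mean (inverse log link)
            z = eta + (yi - mu) / mu           # working response
            w = mu                             # working weight for Poisson, log link
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += w * z
            s_wxz += w * xi * z
        det = s_w * s_wxx - s_wx ** 2
        new_b0 = (s_wxx * s_wz - s_wx * s_wxz) / det   # solve the 2x2 system (Cramer's rule)
        new_b1 = (s_w * s_wxz - s_wx * s_wz) / det
        change = abs(new_b0 - b0) + abs(new_b1 - b1)
        b0, b1 = new_b0, new_b1
        if change < tol:
            break
    return b0, b1
```

For data lying exactly on y = 2·2^x, e.g. `poisson_irls([0, 1, 2, 3], [2, 4, 8, 16])`, both estimates converge to log 2.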
On successful completion students will be able to demonstrate knowledge and practice in:
Model specification: choosing a suitable GLM; equivalent models; aliasing.
Fitting models: maximum likelihood estimation using R.
Effects of covariates: confidence intervals and tests of quantities of interest, interaction.
Variable selection: backwards stepwise selection of covariates.
Assessing model fit: deviance and Pearson residuals; leverage; residual deviance test of
model fit; over-dispersion.
State conclusions in everyday language.
Bibliography:
P. McCullagh and J. Nelder. Generalized Linear Models, Chapman and Hall, 1999.
A.J. Dobson, An Introduction to Generalised Linear Models, Chapman and Hall, 1990.
Module Mnemonic: MATH555
Module Title: Bayesian Inference for Data Science
Module Convenor: TBC
Assessment: Coursework (50%) and end of year exam (50%)
Duration: 25 hours
Credits: 15
Term: L1, L2
Prerequisites: MATH551, MATH552
This module aims to introduce the Bayesian view of statistics, stressing its philosophical contrasts
with classical statistics, its facility for including information other than the data into the analysis and
its coherent approach towards inference and model selection. The module will also introduce
students to MCMC (Markov chain Monte Carlo), a computationally intensive method for efficiently
applying Bayesian methods to complex models. By the end of the course the students should be able
to formulate an appropriate prior for a variety of problems, calculate, simulate from and interpret
the posterior and the predictive distribution, with or without MCMC as appropriate and to carry out
Bayesian model selection using the marginal likelihood. Students should be able to carry out all of
this using the programming language R.
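To give a feel for MCMC, here is a minimal random-walk Metropolis sampler, written in Python rather than R, for a deliberately simple case: the mean μ of normal data with known standard deviation σ = 1 and a N(0, 10²) prior. That model is conjugate, so the sampler's output can be checked against the exact posterior mean (Σy/σ²)/(1/10² + n/σ²). The model, step size, burn-in length and function names are assumptions for illustration, not course material.

```python
import math
import random


def log_posterior(mu, data, prior_sd=10.0, sigma=1.0):
    """Unnormalised log posterior for a normal mean: N(0, prior_sd^2) prior, known sigma."""
    log_prior = -0.5 * (mu / prior_sd) ** 2
    log_lik = -0.5 * sum(((y - mu) / sigma) ** 2 for y in data)
    return log_prior + log_lik


def metropolis(data, n_iter=20000, step=0.5, burn_in=2000, seed=1):
    """Random-walk Metropolis sampler; returns the draws kept after burn-in."""
    rng = random.Random(seed)
    current = 0.0
    current_lp = log_posterior(current, data)
    draws = []
    for i in range(n_iter):
        proposal = current + rng.gauss(0.0, step)        # symmetric proposal
        proposal_lp = log_posterior(proposal, data)
        # accept with probability min(1, posterior ratio); 1 - random() lies in (0, 1]
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        if i >= burn_in:
            draws.append(current)
    return draws
```

With illustrative data `[4.8, 5.1, 5.6, 4.9, 5.3]`, the mean of the draws should be close to the exact posterior mean 25.7 / 5.01, roughly 5.13.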
Topics covered:
Bibliography:
Module Mnemonic: MATH562
Module Title: Extreme Value Theory
Module Convenor: Dr Jennifer Wadsworth
Assessment: Coursework (50%) and end of year exam (50%)
Duration: 20 hours (intensive teaching mode in week 11)
Credits: 10
Term: L1 (weeks 11/12)
Prerequisites: MATH551, MATH552
This module aims to develop the asymptotic theory, and associated techniques for modelling and
inference, associated with the analysis of extreme values of random processes. The course will focus
on the mathematical basis of the models, the statistical principles for implementation and the
computational aspects of data modelling. Students are expected to acquire the following: an
appreciation of, and facility in, the various asymptotic arguments and models; an ability to fit
appropriate models to data using specially developed R software; the ability to understand and
interpret fitted models.
For many physical processes, especially environmental processes, it is extremes of the process that
are of greatest concern; the highest sea-levels cause floods; the fastest wind-speeds destroy
buildings, etc. Most statistical theory is concerned with modelling typical behaviour; in
contrast, the analysis of extremes requires us to model the unusual. This means that we have very
little data with which to either develop or estimate models. In the absence of alternatives,
asymptotic theory is used as the basis for model development, but data scarcity leads to
interesting challenges in creating models that make the best use of the data available.
Topics covered:
Asymptotic theory for maxima of univariate independent and identically distributed (iid)
random variables: limit distributions, GEV distribution, and domains of attraction.
Extension of asymptotic theory for univariate iid variables to cover top order statistics and
threshold exceedances: GP distribution.
Statistical modelling and inference using maxima and threshold methods.
Statistical modelling of extremes of non-identically distributed random variables.
Asymptotic theory and statistical methods for extreme values of stationary sequences:
clustering, extremal index.
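The GEV distribution function is G(z) = exp{-[1 + ξ(z - μ)/σ]^(-1/ξ)} on the set where 1 + ξ(z - μ)/σ > 0, with the Gumbel form exp{-exp[-(z - μ)/σ]} as ξ → 0. Inverting it gives the return level z_p that a block maximum (for example an annual maximum) exceeds with probability p, as in Coles (2001). The sketch below computes both; the function names are hypothetical, and parameter estimation (done in the module with specially developed R software) is not shown.

```python
import math


def gev_return_level(p, mu, sigma, xi):
    """Level z_p exceeded by a block maximum with probability p (the 1/p return level)."""
    y = -math.log(1 - p)
    if abs(xi) < 1e-12:                          # Gumbel limit as xi -> 0
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1 - y ** (-xi))


def gev_cdf(z, mu, sigma, xi):
    """GEV distribution function G(z)."""
    if abs(xi) < 1e-12:
        return math.exp(-math.exp(-(z - mu) / sigma))
    t = 1 + xi * (z - mu) / sigma
    if t <= 0:
        # outside the support: below the lower end point (xi > 0) or above the upper (xi < 0)
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1 / xi))
```

As a check, `gev_cdf(gev_return_level(p, mu, sigma, xi), mu, sigma, xi)` should equal 1 - p up to rounding.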
Bibliography:
S G Coles, An Introduction to the Statistical Modelling of Extreme Values, Springer-Verlag,
London, 2001.
Module Mnemonic: MATH563
Module Title: Clinical Trials
Module Convenor: Dr Deborah Costain
Assessment: Coursework (50%) and written exam (50%)
Duration: 20 hours (intensive teaching in week 11)
Credits: 10
Term: L1 (weeks 11/12)
This course aims to introduce students to aspects of statistics that are important in the design and
analysis of clinical trials. On completion of the module students should understand the basic
elements of clinical trials, be able to recognise and use principles of good study design, and be able
to analyse and interpret study results to make correct scientific inferences.
Clinical trials are planned experiments on human beings designed to assess the relative benefits of
one or more forms of treatment. For instance, we might be interested in studying whether aspirin
reduces the incidence of pregnancy-induced hypertension; or we may wish to assess whether a new
immunosuppressive drug improves the survival rate of transplant recipients.
Topics covered:
Clinical trials fundamentals: design issues, ethics, and defining and estimating treatment effects.
Cross-over trials.
Sample size determination.
Equivalence and non-inferiority trials.
Meta-analysis.
Discussion of more general methodological and ethical issues.
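A common sample size formula for a two-arm trial comparing means with a two-sided test is n = 2(z₁₋α/₂ + z₁₋β)²σ²/Δ² per arm, where Δ is the clinically relevant difference, σ the common standard deviation and 1 - β the power. A minimal sketch using only the Python standard library follows; it assumes equal allocation and an approximately normal outcome, and the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(delta, sd, alpha=0.05, power=0.8):
    """Patients per arm to detect a difference in means delta (two-sided test, equal arms)."""
    z = NormalDist()                       # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)     # critical value for the two-sided test
    z_beta = z.inv_cdf(power)              # quantile corresponding to the desired power
    return ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)
```

For example, `sample_size_per_arm(0.5, 1.0)` (a difference of half a standard deviation, 5% significance, 80% power) returns 63.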
Bibliography:
D.G. Altman, Practical Statistics for Medical Research, Chapman and Hall, 1991.
S. Senn, Cross-over trials in clinical research, Wiley, 1993.
S. Piantadosi, Clinical Trials: A Methodologic Perspective, John Wiley & Sons, 1997.
ICH Harmonised Tripartite Guidelines.
J.N.S. Matthews, Introduction to Randomised Controlled Clinical Trials, Arnold, 2000.
Module Mnemonic: MATH564
Module Title: Principles of Epidemiology
Module Convenor: Dr Gillian Lancaster
Assessment: Coursework (50%) and written exam (50%)
Duration: 20 hours (intensive teaching in week 13)
Credits: 10
Term: L1 (weeks 13/14)
Epidemiology is the study of the distribution and determinants of disease in human populations. This
course provides an introduction to the principles and statistical methods of epidemiology. Various
concepts and strategies used in epidemiological studies are examined. Most inference will be
likelihood-based, although the emphasis is on conceptual considerations.
Topics covered:
The history of epidemiology and the role of statistics
Measures of health and disease: incidence, prevalence and cumulative incidence risk
Types of epidemiological studies: Randomized controlled trials, cohort studies, case-control
studies, cross-sectional and ecological studies
Causation in epidemiology
Potential errors in epidemiological studies: selection bias, confounding
Remedies for confounding: Standardized rates, stratification and matching
Diagnostic test studies
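Standardised rates, listed above as a remedy for confounding, can be sketched in a few lines. The example below assumes direct standardisation: stratum-specific rates from the study population (for example by age band) are weighted by the stratum sizes of a standard population. The function name and the numbers in the example are hypothetical.

```python
from typing import Sequence


def directly_standardised_rate(stratum_rates: Sequence[float],
                               standard_population: Sequence[float]) -> float:
    """Directly standardised rate: the study population's stratum-specific
    rates applied to the stratum sizes of a standard population."""
    if len(stratum_rates) != len(standard_population):
        raise ValueError("one standard-population count is needed per stratum")
    # Cases that would be expected if the standard population had the study rates.
    expected_cases = sum(r * n for r, n in zip(stratum_rates, standard_population))
    return expected_cases / sum(standard_population)


# Hypothetical example: two age bands with rates of 1% (young) and 5% (old),
# applied to a standard population of 600 young and 400 old people.
print(directly_standardised_rate([0.01, 0.05], [600, 400]))
```

Comparing two populations on directly standardised rates removes differences that arise only from their different age (or other stratum) structures.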
Bibliography:
R. Beaglehole, R. Bonita and T. Kjellstroem (1993) Basic Epidemiology. Geneva: World Health
Organization.
D. Clayton and M. Hills (1993) Statistical Models in Epidemiology. Oxford: Oxford University
Press.
M. Woodward (1999) Epidemiology: Study Design and Data Analysis. Boca Raton: Chapman & Hall.
K.J. Rothman, S. Greenland and T.L. Lash (2008) Modern Epidemiology. Lippincott Williams &
Wilkins, US.
Module Mnemonic: CHIC565
Module Title: Environmental Epidemiology
Module Convenor: Dr Ben Taylor
Assessment: Coursework (50%) and written exam (50%)
Duration: 20 hours (intensively in week 19)
Credits: 10
Term: L2 (weeks 19/20)
Prerequisites: MATH551; MATH552; MATH564
This course aims to introduce students to the kinds of statistical methods commonly used by
epidemiologists and statisticians to investigate the relationship between risk of disease and
environmental factors. Specifically, the course concentrates on studies with a spatial component. A
number of published studies will be used to illustrate the methods described, and students will learn
how to perform similar analyses using the R statistical package. By the end of the course students
should have an awareness of the kinds of methods used in environmental epidemiology, including
an appreciation of their limitations. They should also be capable of conducting a number of these
methods themselves.
Topics covered:
Introduction: Motivating examples for methods in course.
Spatial Point Processes: Theory and methods for the analysis of distributions of points in
two-dimensional space; the Poisson process; univariate and bivariate K-functions.
Spatial variation in risk: Case-control point-based methods and methods based on counts;
spatial clustering.
Disease mapping: Graphical investigation of spatial variation in risk; constructing smooth
maps of disease risk from area-level count data.
Geographical correlation studies: Poisson regression; the ecological fallacy and its relation
to disease mapping.
Point source methods: Investigation of risk associated with distance from a point or line
source, for point and count data; practical implementation in epidemiological studies.
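A naive estimator of the univariate K-function mentioned above is sketched below, assuming a set of points observed in a window of known area. It deliberately omits the edge corrections that practical analyses (for instance in R) apply, so it illustrates the definition rather than a production method. Under complete spatial randomness (a homogeneous Poisson process) the theoretical value is K(r) = πr², and estimates above this benchmark suggest clustering. The function name and the toy points are hypothetical.

```python
import math
from itertools import permutations
from typing import Sequence, Tuple

Point = Tuple[float, float]


def k_function(points: Sequence[Point], r: float, area: float) -> float:
    """Naive estimate of Ripley's K-function at distance r.

    K(r) = area / (n (n - 1)) * #{ordered pairs (i, j), i != j, with d_ij <= r}.
    No edge correction is applied, so the estimate is biased downwards
    when r is large relative to the study window.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    close_pairs = sum(1 for p, q in permutations(points, 2) if math.dist(p, q) <= r)
    return area * close_pairs / (n * (n - 1))


# Compare the estimate with the CSR benchmark pi * r**2 at a few distances.
corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
for r in (0.5, 1.0, 1.5):
    print(r, k_function(corners, r, area=1.0), math.pi * r ** 2)
```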
Bibliography:
P.J. Diggle. Statistical Analysis of Spatial Point Patterns (2nd edition). London: Edward
Arnold. 2003.
P. Elliott, J. Cuzick, D. English and R. Stern (eds). Geographical and environmental
epidemiology: methods for small-area studies. Oxford: Oxford University Press, 1992.
O. Schabenberger and C.A. Gotway. Statistical Methods in Spatial Data Analysis. Boca Raton:
Chapman & Hall/CRC, 2005.
L. Waller and C.A. Gotway. Applied Spatial Statistics for Public Health Data. New York: Wiley,
2004.
Module Mnemonic: MATH566
Module Title: Longitudinal Data Analysis
Module Convenor: Dr Juhyun Park
Assessment: Coursework (50%) and written exam (50%)
Duration: 20 hours (intensive teaching in week 15)
Credits: 10
Term: L2 (weeks 15/16)
Prerequisites: MATH551; MATH552
Longitudinal data arise when a time-sequence of measurements is made on a response variable for
each of a number of subjects in an experiment or observational study. For example, a patient's blood
pressure may be measured daily following administration of one of several medical treatments for
hypertension. The practical objective of many longitudinal studies is to find out how the average
value of the response varies over time, and how this average response profile is affected by different
experimental treatments. This module presents an approach to the analysis of longitudinal data,
based on statistical modelling and likelihood methods of parameter estimation and hypothesis
testing.
The specific aim of this module is to teach students a modern approach to the analysis of
longitudinal data. Upon completion of this course the students should have acquired, from lectures
and practical classes, the ability to build statistical models for longitudinal data, and to draw valid
conclusions from their models.
Syllabus:
What are longitudinal data?
Exploratory and simple analysis strategies
The independence working assumption
Normal linear model with correlated errors
Linear mixed effects models
Generalised estimating equations
Dealing with dropout
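Among the exploratory and simple analysis strategies, a summary-statistic approach reduces each subject's sequence to a single number, such as the least-squares slope of response on time, and then compares these numbers between treatment groups. The sketch below assumes that approach; the data layout, function names and blood-pressure values are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence


def ols_slope(times: Sequence[float], values: Sequence[float]) -> float:
    """Least-squares slope of one subject's responses on measurement time
    (needs at least two distinct times)."""
    t_bar, y_bar = mean(times), mean(values)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    sxx = sum((t - t_bar) ** 2 for t in times)
    return sxy / sxx


def mean_slope_by_group(data: Dict[str, List[dict]]) -> Dict[str, float]:
    """Reduce each subject to one slope, then average the slopes per group."""
    return {group: mean(ols_slope(s["times"], s["values"]) for s in subjects)
            for group, subjects in data.items()}


# Hypothetical daily blood-pressure readings (mmHg) for four subjects.
data = {
    "treatment": [{"times": [0, 1, 2], "values": [150, 145, 140]},
                  {"times": [0, 1, 2], "values": [160, 154, 148]}],
    "control": [{"times": [0, 1, 2], "values": [150, 149, 148]},
                {"times": [0, 1, 2], "values": [155, 155, 155]}],
}
print(mean_slope_by_group(data))
```

Because each subject contributes one slope, within-subject correlation is sidestepped; the cost is that subjects with few or irregular measurements count as much as well-measured ones, which the model-based methods in the syllabus handle more carefully.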
Bibliography:
H. Brown and R. Prescott, Applied Mixed Models in Medicine, Wiley, 1999.
P.J. Diggle, P. Heagerty, K.Y. Liang and S.L. Zeger, Analysis of Longitudinal Data (second
edition), Oxford University Press, 2002.
G. Verbeke and G. Molenberghs, Linear Mixed Models for Longitudinal Data, Springer, 2000.
R. E. Weiss, Modelling longitudinal data, Springer, 2005.
Module Mnemonic: MATH572
Module Title: Genomics: technologies and data analyses
Module Convenor: Dr Thomas Jaki
Assessment: Coursework (100%)
Duration: Distance learning with workshops in Lent term (weeks 11-20)
Credits: 10
Term: L1, L2
This module aims to describe several modern genomics technologies in their biological context, the
types of data obtained from these technologies and the statistical methodologies used to analyse
such data, and to have students use statistical packages to perform such analyses on example data
sets and interpret the results.
Genomics is a large field dealing with everything from DNA to metabolites and from evolution to
microbiology. This course focuses on several genomics technologies, first placing them in their
biological and genetic context and then describing the types of biological questions that can be
answered with the data from these technologies. Examples of technologies to be discussed include
(i) DNA sequencing, (ii) SNPs, (iii) microarrays and (iv) blotting and other proteomics methods. The
most commonly used statistical analysis tools for each technology will be discussed. The biological
questions addressed relate to issues such as genomic variation, constant genome and changing
expression, human evolution and migration, and disease and normality.
Syllabus:
DNA sequencing, Single Nucleotide Polymorphisms (SNPs), transcriptional analysis via
microarrays and Proteomics.
Visualisation methods, hypothesis testing, multiple testing problems, multivariate methods,
regression analyses for high dimensional data.
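One widely used response to the multiple testing problem in high-dimensional genomic data is to control the false discovery rate. The sketch below implements the Benjamini-Hochberg step-up procedure; it is offered as an example of the technique rather than as the method taught on the module, and the function name and p-values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate.

    Returns booleans in the original order, True where the hypothesis is
    rejected at FDR level q.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    # Find the largest rank k (1-based) with p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject every hypothesis ranked 1..k_max, even if some of them
    # individually failed their own threshold (the "step-up" property).
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, e.g. from testing four genes for differential expression.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```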
Bibliography:
Malcolm Campbell and Laurie J. Heyer. Genomics, Proteomics and Bioinformatics. CSHL
Press, 2003.
Lange, K. Mathematical and Statistical Methods for Genetic Analysis, Springer, 2nd ed. 2002.
Wit, E. C. and McClure, J. D. Statistics for Microarrays: Design, Analysis and Inference, John
Wiley & Sons, 2004.
Module Mnemonic: MATH573
Module Title: Survival and Event History Analysis
Module Convenor: Dr Andrew Titman
Assessment: Coursework (50%) and written exam (50%)
Duration: 20 hours (intensive teaching in week 17)
Credits: 10
Term: L2 (weeks 17/18)
Prerequisites: MATH551; MATH552
To describe the theory and to develop the practical skills required for the design and analysis of
medical studies leading to the observation of survival times or multiple failure times. By the end of
the course students should be able to develop study designs and to carry out sophisticated analyses
of this type, should be aware of the variety of statistical models and methods now available, and
understand the nature and importance of the underlying model assumptions.
In many medical applications interest lies in times to or between events. Examples include time
from diagnosis of cancer to death, or times between epileptic seizures. This advanced course begins
with a review of standard approaches to the analysis of possibly censored survival data. Survival
models and estimation procedures are reviewed, and emphasis is placed on the underlying
assumptions, how these might be evaluated through diagnostic methods and how robust the
primary conclusions might be to their violation. Study design is considered, in particular how to
define failure and censoring and how to determine a suitable sample size and duration of follow-up.
The course closes with a description of models and methods for the treatment of multivariate
survival data, such as repeated failures, the lifetimes of family members or competing risks.
Stratified models, marginal models and frailty models are discussed.
Syllabus:
Survival data. Censoring. Survival, hazard and cumulative hazard functions. Kaplan-Meier
plots. Parametric models and likelihood construction. Cox's proportional hazards model,
partial likelihood.
Time-dependent covariates. Diagnostic methods. Residual analysis. Testing the
proportional hazards assumption.
Competing risks data, cause-specific hazard and cumulative incidence functions.
Stratified models, marginal models, frailty models.
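The Kaplan-Meier estimator listed in the syllabus can be sketched as follows, assuming right-censored data given as follow-up times with an event indicator. The product-limit estimate multiplies (1 - d_j / n_j) over the distinct event times t_j, where d_j is the number of events at t_j and n_j the number still at risk just before t_j. The function name and toy data are illustrative.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier (product-limit) estimate of the survival function.

    events[i] is 1 if the event was observed at times[i] and 0 if the
    subject was censored then. Returns (t, S(t)) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0  # everyone observed at t, whether event or censoring
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # Subjects censored at t are still in the risk set here (the usual
            # convention for ties) because they are removed only afterwards.
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


# Toy data: follow-up times in months, 1 = death observed, 0 = censored.
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```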
Bibliography:
Collett, D. Modelling Survival Data in Medical Research. Chapman and Hall/CRC, 2003.
Hougaard, P. Analysis of Multivariate Survival Data. Springer, 2000.
Therneau, TM and Grambsch, PM. Modelling Survival Data: Extending the Cox Model.
Springer, 2000.
Pintilie, M. Competing Risks: A Practical Perspective. Wiley, 2006.
Module Mnemonic: MSCI523
Module Title: Forecasting
Module Convenor: Dr Sven Crone
Assessment: Coursework (100%)
Credits: 10
Term: L1, L2
Prerequisites: (MATH551 and MATH552) OR (CFAS406 and CFAS440)
The module introduces time series and causal forecasting methods so that passing students will be
able to prepare methodologically competent, understandable and concisely presented reports for
clients. By the end of the course, students should be able to build causal and time series models,
assess their accuracy and robustness and apply them in a real world problem domain.
Syllabus:
Introduction to Forecasting in Organisations: Extrapolative vs. Causal Forecasting; History
& academic research in Forecasting; Forecasting case studies.
Data Exploration: Time Series Patterns; Univariate & Multivariate Visualisation; Naïve
Forecasting Methods & Averages.
Exponential Smoothing Methods: Single, Seasonal & Trended Exponential Smoothing;
Model Selection; Parameter Selection.
ARIMA Methods: AR-, MA-, ARMA and ARIMA Models; ARIMA Model specification &
estimation; Automatic selection.
Time Series Regression: Simple & multiple regression on time series; Hypothesis testing;
Model evaluation; Diagnostics
Time Series Regression: Model specification and constraints; Dummy Variables, Lag,
Non-linearities; Stationarity; Building regression models.
Applications in operations and marketing.
Judgmental Forecasting: Judgmental methods for forecasting; Biases and heuristics.
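Simple exponential smoothing, the first of the smoothing methods listed above, updates a level as a weighted average of the newest observation and the previous level: l_t = α y_t + (1 - α) l_{t-1}. The sketch below assumes the level is initialised to the first observation, which is only one of several common initialisation choices, and takes the smoothing parameter α as given rather than estimating it. The function name is hypothetical.

```python
from typing import Sequence


def simple_exponential_smoothing(series: Sequence[float], alpha: float) -> float:
    """Forecast from simple exponential smoothing.

    The level starts at the first observation and is updated as
    level = alpha * y + (1 - alpha) * level for each later observation.
    The forecast for every future period is the final level.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


print(simple_exponential_smoothing([10, 12, 14], alpha=0.5))  # prints 12.5
```

In practice α is usually chosen by minimising in-sample one-step-ahead errors, which corresponds to the parameter selection step in the syllabus.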
Bibliography:
Ord K. & Fildes R. (2013), Principles of Business Forecasting, South-Western Cengage
Learning.
Module Mnemonic: MSCI526
Module Title: Data Mining for Marketing, Sales and Finance
Module Convenor: Dr Nicos Pavlidis
Assessment: Coursework (100%)
Credits: 10
Term: L1, L2
Prerequisites: (MATH551 and MATH552) OR (CFAS406 and CFAS440)
The course extends the concepts of statistical model building and the models from the Introductory
Statistics module towards methods from machine learning and artificial intelligence.
By the end of the course you should be able to:
Understand general modelling concepts in relation to complex models
Use a wide range of data mining methods to handle data of different types & applications
Understand how these methods may be applied in practical management contexts
Use & apply SAS Enterprise Miner to deal with complexity and large datasets
Syllabus:
Introduction to Data Mining
Data Mining Process: Methods for data exploration & manipulation; Methods for data
reduction & feature selection; Evaluating Classification Accuracy.
Data Mining Methods for Classification: Logistic Regression; Decision Trees; Nearest
neighbour classification; Artificial Neural Networks.
Data Mining applications in Credit Scoring
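Nearest neighbour classification, one of the data mining methods listed above, assigns a new case the majority label among its k closest training cases. The sketch below assumes numeric features and Euclidean distance; the credit-scoring style labels and feature values are hypothetical, and in practice features would normally be standardised first.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

Example = Tuple[Sequence[float], Hashable]  # (feature vector, class label)


def knn_classify(training: List[Example], query: Sequence[float], k: int = 3) -> Hashable:
    """Majority vote among the k training cases nearest to `query`
    (Euclidean distance). A tied vote goes to the label encountered first
    among the sorted neighbours, because Counter.most_common keeps
    first-encountered order for equal counts."""
    neighbours = sorted(training, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


training = [((0.0, 0.0), "good"), ((0.0, 1.0), "good"),
            ((5.0, 5.0), "bad"), ((5.0, 6.0), "bad")]
print(knn_classify(training, (0.2, 0.3)))  # prints good
```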
Bibliography:
Tan, P. N., Steinbach, M., et al. (2005). Introduction to Data Mining. Boston: Pearson Addison
Wesley.
Berry, M. J. A. and Linoff, G. (2000). Mastering Data Mining: The Art and Science of Customer
Relationship Management. New York: Wiley Computer Publishing.
Berry, M. J. A. and Linoff, G. (2004). Data Mining Techniques: For Marketing, Sales, and
Customer Relationship Management. Indianapolis: Wiley Publishing.
Linoff, G. and Berry, M. J. A. (2001). Mining the Web: Transforming Customer Data into
Customer Value. New York: John Wiley & Sons.
Weiss, S. M. and Indurkhya, N. (1998). Predictive Data Mining: A Practical Guide. San
Francisco: Morgan Kaufmann Publishers.
Module Mnemonic: MSCI534
Module Title: Optimisation and Heuristics
Module Convenor: Dr Adam Letchford
Assessment: Coursework (30%), Exam (70%)
Credits: 10
Term: L1, L2
Prerequisites: (MATH551 and MATH552) OR (CFAS406 and CFAS440)
Module Mnemonic: SCC401
Module Title: Elements of Distributed Systems
Module Convenor: Dr Amit Chopra and Dr Barry Porter
Assessment: 100% Coursework
In this module we explore the fundamental principles, techniques and technologies that underpin
today's global IT infrastructure. It is one of two complementary modules that comprise the Systems
stream of the Data Science MSc, which together enable students to assess new systems
technologies, to know where technologies fit in a comprehensive schema, and to know what to read
to go deeper.
The principal ethos of the module is to focus on the properties of system components, with the aim
of encouraging a principled understanding of the strengths, weaknesses, scalability and bottlenecks
of systems components. This will enable graduates of the MSc to be able to make intelligent and
well-reasoned trade-offs between fundamental building blocks of distributed systems in today's IT
infrastructure.
Further, the course will review state-of-the-art thinking on the algorithms and technologies
behind such architectures, placing these within the framework of the current research agenda.
Topics to be covered will include:
The module will cover two key areas, fundamental techniques/ principles and fundamental
technologies/ paradigms. Fundamental techniques/ principles will include coverage of the following:
Caching
Tiering
Replication
Synchronization
Failure
Reliability
Fundamental technologies/ paradigms will include coverage of the following:
Interaction paradigms in distributed systems
Peer-to-peer architecture
Scalable and high-performance networking
Scalable and enterprise storage
Data acquisition (e.g. sensor networks)
Enterprise computing and scalable processing
Large scale distributed information systems (e.g. high-performance web architectures)
High performance computer clusters, grid architectures
Enterprise security
Organisational impacts (e.g. data protection, security)
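Caching is the first of the fundamental techniques listed above. As an illustration only, the sketch below assumes a least-recently-used (LRU) eviction policy, one of several possible policies, built on Python's `OrderedDict`; the class name and interface are hypothetical.

```python
from collections import OrderedDict
from typing import Hashable, Optional


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._store: "OrderedDict[Hashable, object]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[object]:
        if key not in self._store:
            return None  # cache miss: the caller falls back to the slower tier
        self._store.move_to_end(key)  # mark as most recently used
        return self._store[key]

    def put(self, key: Hashable, value: object) -> None:
        self._store[key] = value
        self._store.move_to_end(key)
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict the least recently used entry


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" becomes most recently used
cache.put("c", 3)  # evicts "b"
print(cache.get("b"))  # prints None
```

The trade-off the module emphasises shows up directly here: a larger capacity raises the hit rate but costs memory, and the eviction policy only helps if recent use predicts future use.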
On successful completion of this module students will:
Demonstrate a deep understanding of the fundamental elements that underpin state-of-the-art
enterprise distributed systems;
Describe and critically evaluate core techniques and paradigms used within enterprise IT
systems;
Understand and appreciate the trade-offs, strengths and limitations of systems elements in
principle and practice in modern IT systems.
Module Mnemonic: SCC411
Module Title: Systems Architecture and Integration
Module Convenor: Dr Barry Porter and Dr Amit Chopra
Assessment: 100% Coursework
Module Details:
In this module we explore the architectural approaches, techniques and technologies that underpin
today's global IT infrastructure and particularly large-scale enterprise IT systems. It is one of two
complementary modules that comprise the Systems stream of the Data Science MSc, which together
provide a broad knowledge and context of systems architecture enabling students to assess new
systems technologies, to know where technologies fit in the larger scheme of enterprise systems and
state of the art research thinking, and to know what to read to go deeper.
The principal ethos of the module is to focus on the principles, emergent properties and application
of systems elements as used in large-scale and high performance systems. Detailed case studies and
invited industrial speakers will be used to provide supporting real-world context and a basis for
interactive seminar discussions.
Topics to be covered will include:
Supported by a consideration of emerging issues and implications arising from these new
technologies:
Commercial considerations
Legal and ethical considerations
New development and support paradigms, including open sourcing
In addition to the discussion and seminar led aspects of the course, we envisage hands-on
measurement-based coursework that looks empirically at the scalability of a significant technology,
e.g. a cloud system such as Amazon EC2.
On successful completion of this module students will:
Module Mnemonic: SCC413
Module Title: Applied Data Mining
Module Convenor: Dr Plamen Angelov and Dr Matthew Rowe
Assessment: 100% Coursework
Module Details:
This module will provide students with up-to-date information on current applications of data in
both industry and research. The module will build on SCC.403 Fundamentals of Data by explaining
how data is processed and applied at large-scale across a variety of different areas.
Topics to be covered will include:
The Semantic Web: primer, crawling and spidering Linked Data, open-track large-scale
problems (e.g. Billion Triples Challenge), distributed and federated querying, distributed
reasoning, ontology alignment.
The Social Web: primer, user-generated content and crowd-sourced data, social networks
(theories, analysis), recommendation (collaborative filtering, content recommendation
challenges, and friend recommendation/link prediction).
The Scientific Web: from big data to big science, open data, citizen science, and case studies
(virtual environmental observatories, collaboration networks).
Scalable data processing: primer, scaling the semantic web (scaling distributed reasoning
using MapReduce), scaling the social web (collaborative filtering, link prediction), and
scalable network analysis for the scientific web.
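Collaborative filtering, listed under the Social Web topic, can be illustrated with a user-based variant: a missing rating is predicted as a similarity-weighted average of other users' ratings for that item. The sketch below assumes cosine similarity over co-rated items and ratings on a positive scale such as 1 to 5; the function names and data are hypothetical, and a scalable version would distribute this computation (for example with MapReduce) rather than loop over users in memory.

```python
import math
from typing import Dict, Optional

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity computed over the items both users have rated."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_rating(ratings: Ratings, user: str, item: str) -> Optional[float]:
    """User-based collaborative filtering: similarity-weighted average of the
    ratings other users gave to `item`; None if nobody similar rated it."""
    numerator = denominator = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        sim = cosine_similarity(ratings[user], their_ratings)
        if sim <= 0:
            continue  # ignore dissimilar users in this simple sketch
        numerator += sim * their_ratings[item]
        denominator += sim
    return numerator / denominator if denominator else None


ratings = {
    "u1": {"i1": 5, "i2": 3},
    "u2": {"i1": 5, "i2": 3, "i3": 4},
    "u3": {"i1": 1, "i2": 5, "i3": 2},
}
print(predict_rating(ratings, "u1", "i3"))
```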
Module Mnemonic: SCC460
Module Title: Data Science Fundamentals
Module Convenor: Dr Matthew Rowe
Assessment: 100% Coursework
Duration: 20 hours
Credits: 15
Term: M2, L1, L2
This module teaches students how data science is performed within academia and industry
(via invited talks), research methods and how different research strategies are applied across
different disciplines, and data science techniques for processing and analysing data. Students will
engage in group project work, based on project briefs provided by industrial speakers, within
multi-skilled teams (e.g. computing students, statistics students, environmental science students) in order
to apply their data science skills to researching and solving an industrial data science problem.
Topics covered will include:
The role of the data scientist and the evolving epistemology of data science.
The language of research, how to form research questions, writing literature reviews, and
variance of research strategies across disciplines.
Ethics surrounding data collection and re-sharing, and unwanted inferences.
Identifying potential data sources and the data acquisition processes.
Defining and quantifying biases, and data preparation (e.g. cleaning, standardisation, etc.).
Choosing a potential model for data, understanding model requirements and constraints,
specifying model properties a priori, and fitting models.
Inspection of data and results using plots, and hypothesis and significance tests.
Writing up and presenting findings.
Learning: Students will learn through a series of group exercises around research studies and
projects related to data science topics. Invited talks from industry tackling data science problems will
be given to teach the students about the application of data science skills in industry and academia.
Students will gain knowledge of:
Assessment: Assessment comprises a group project (50%) and an individual project proposal (50%).
The group project will involve working on a given data science problem provided by
industry.
Recommended texts and other learning resources:
O'Neil, C. and Schutt, R. (2013) Doing Data Science: Straight Talk from the Frontline.
O'Reilly.
Trochim, W. (2006) The Research Methods Knowledge Base. Cengage Learning.
Module Mnemonic: SCC461
Module Title: Programming for Data Science
Module Convenor: Dr Matthew Rowe, Dr Jun Zhao, Stuart Sharples
Assessment: Coursework (50%), End of module report (50%)
Duration: 25 hours
Credits: 15
Term: M1, M2, L1
This module aims to provide students with the necessary programming skills to statistically process
and explore disparate datasets using R, to become confident in using this language to create and
analyse variables in order to discover patterns and relationships through the use of visualisation,
testing and modelling. It also aims to provide students with experience in using object-oriented
programming concepts and principles to read in data from both local files and databases so that it
can be merged together, using record-reconciliation techniques, and then output this into a single
file for processing; this will be taught using the object-oriented programming language Java. The
teaching of both Java and R is essential here as the former is well-suited to handling data, via the
creation of bespoke data objects, while the latter is good for statistically assessing data.
Students will gain experience of working through exercise tasks and discussing their work with their
peers, thereby fostering interpersonal communication skills, in particular in how students
approach the tasks and their proposed solutions. Students will also gain an understanding of how to
conceptualise a problem using available programming semantics, and determine the appropriate
level of abstraction required in their solutions.
Topics covered will include:
Principles of object-oriented programming (e.g. variables, abstraction, inheritance,
polymorphism);
Using R to manipulate data sets and to create features from data, while becoming familiar,
at a more technical level, with its data objects;
Summarising and visualising data using custom functions and third-party libraries;
Using Java to read in data from disparate sources and write data to local files;
Performing record-reconciliation to merge heterogeneous datasets together;
Producing clean, friendly, reusable code.
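Record reconciliation, as listed above, means deciding which records in different sources describe the same entity and merging them. The module teaches this in Java; the illustration below is written in Python and assumes the simplest deterministic approach, exact matching on a normalised key (here a hypothetical `email` field). The function names and records are hypothetical, and real data usually also need fuzzy matching of names and addresses.

```python
from typing import Dict, Iterable, List

Record = Dict[str, str]


def normalise_key(value: str) -> str:
    """Canonical form of the matching key: surrounding spaces removed, lower case."""
    return value.strip().lower()


def reconcile(left: Iterable[Record], right: Iterable[Record],
              key: str = "email") -> List[Record]:
    """Merge two record sets by exact match on a normalised key.

    Matched records are combined into one; where both sources hold the same
    field, the value from `left` wins. Unmatched records from either side are
    kept, so no data is silently dropped. Duplicate keys within a single
    source are not handled specially in this sketch.
    """
    merged: Dict[str, Record] = {}
    for record in right:
        merged[normalise_key(record[key])] = dict(record)
    for record in left:
        k = normalise_key(record[key])
        combined = merged.get(k, {})
        combined.update(record)  # left-hand values overwrite right-hand ones
        merged[k] = combined
    return list(merged.values())


crm = [{"email": "JSmith@Example.com", "name": "J. Smith"}]
hr = [{"email": " jsmith@example.com ", "dept": "Maths"},
      {"email": "a.n.other@example.com", "dept": "Physics"}]
for row in reconcile(crm, hr):
    print(row)
```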
On successful completion of this module students will be able to:
Use R to manipulate and clean data sets and to create features from data;
Have an understanding of the different data objects that exist in R and how to interact with
them;
Create and maintain custom functions and scripts which facilitate methodical data analysis;
Produce data visualisations for both exploration purposes and for inclusion in reports;
Read in data from local files and databases using Java;
Identify the correct level of abstraction to model a system or process in Java and construct
the necessary classes to represent that;
Choose and apply an appropriate record-reconciliation approach for integrating data
together.
Bibliography:
Introductory statistics with R. Dalgaard, Peter. Springer, 2008. ISBN-13: 978-0387954752
R Cookbook. Paul Teetor. O'Reilly Media; 1 edition. 2011. ISBN-13: 978-0596809157.
Java, A Beginner's Guide, 5th Edition. Herbert Schildt. McGraw-Hill Osborne. 2011. ISBN-13:
978-0071606325.
Lent Term:
Summer Term:
Lent Term:
Summer Term: