

Jason Moore

Do you want to know why Google, Facebook and Amazon are worth so much money? 

Living in the information age, we are awash with data. Our lives are recorded digitally in minute detail by the devices with which we interact, often on a second-by-second basis: where we are, what information we seek, what we create, what we buy, with whom we communicate, and so on. We can also gather data in unprecedented quantities on any question that interests us, and store it in perpetuity, readily accessible to anyone with an internet connection. These reams of data can answer a huge range of questions of fundamental interest, provided we can translate the data into terms that we can understand. With the right data, we can create cancer therapies tailored to an individual's genetics, we can predict the outcome of elections ahead of time with 98% accuracy, and we can describe the fundamental processes sculpting the world around us in unprecedented detail.

In this course we will learn many of the techniques that we can use to ask and answer questions of datasets far too vast for the human mind to comprehend in toto. Using the freely available statistical software R and freely available online datasets, we will see the power of computer-driven multivariate statistical analyses. With these newly gained skills and tools, you will find a dataset of your own, pose some hypotheses, analyse your data and draw new insights into the world around us.

The societal issues associated with big data are also complex, ranging from the recent revelations about NSA and GCHQ collection of data on innocent citizens to reports that credit card companies are able to predict both a pregnancy and its due date before the mother herself knows. We will debate these issues as we begin to understand the breadth and power of big data analyses.

The course will use online tutorials in the statistical programming language R. Peer-reviewed articles giving background to the introductory exercises will be supplied. The majority of the course will consist of students' individual projects, and students will be responsible for researching and reading the relevant literature.

Students must attend all classes and participate actively. We will learn how to analyse large datasets using the statistical programming language R. Be prepared to delve into its depths! 

Students will undertake a series of exercises at the beginning of the course to familiarise themselves with R and with the manipulation of large datasets. Each exercise will be written up for credit.

A long paper investigating the ethical issues raised by a chosen aspect of big data will be due halfway through the semester.

The second half of the semester will comprise a student-driven research project in an area of your choice. You will be expected to locate a suitably large dataset, to analyse it using the techniques you have learnt during the class, and to present a formal write-up of your analyses and conclusions.



Jason Moore earned his Ph.D. from the University of Cambridge in 2006, with a focus on the reconstruction of vertebrate palaeoecological patterns. Since then, he has worked on a large number of studies examining ecology in the past, using both recent assemblages (e.g. bones from the Yellowstone River in Montana and shells from Baja California) and fossil assemblages (from a range of time periods in the US and India). Dr. Moore is fascinated both by the complexity of teasing ecological information from the past and by the remarkable insight that can be gleaned with the correct techniques.