Meeting time & location: MWF 2:20 pm to 3:50 pm at WH-100E.
Office hours: Thursday 10 to 11 am, excluding holidays. I am usually there, but please email me beforehand to confirm, just in case.
This course is a 4-credit course, which means that in addition to the scheduled lectures/discussions, students are expected to do at least 9.5 hours of course-related work each week during the semester. This includes things like: completing assigned readings, participating in lab sessions, studying for tests and examinations, preparing written assignments, completing internship or clinical placement requirements, and other tasks that must be completed to earn credit in the course.
Prerequisite
I assume that you have knowledge of (Advanced) Linear Algebra and Statistical Inference (or Mathematical Statistics). Knowledge of linear regression models is recommended.
Topics
Review of the linear algebra and optimization techniques necessary for the course.
Multivariate statistical analysis: random vectors and matrices, sample mean and sample covariance, the multivariate normal distribution, the multivariate Central Limit Theorem, assessing normality and outlier detection, Hotelling's T-squared statistic, confidence ellipsoids, simultaneous confidence intervals, Bonferroni methods, multivariate analysis of variance (MANOVA), multivariate linear regression.
Modern multivariate techniques: principal component analysis, factor analysis, canonical correlation analysis, nonnegative matrix factorization, independent component analysis, multidimensional scaling, classification methods including linear and quadratic discriminant analysis, clustering methods including K-means, hierarchical clustering, and Gaussian mixture models.
Advanced topics, if time permits: Gaussian process regression, multiple testing, Gaussian graphical models, multivariate methods for high-dimensional data.
Learning Outcomes
Process and visualize different data types.
Identify and evaluate appropriate data analytics techniques to be used.
Understand the underlying mechanism of multivariate models and evaluate and interpret such models.
Use analytical tools and software widely used in practice.
Learn to present and communicate the findings effectively.
Recommended Texts
The required text is Johnson & Wichern 2007. Härdle & Simar 2012 is also recommended.
Elementary
Johnson, Richard A & Wichern, Dean W. 2007. Applied multivariate statistical analysis. Upper Saddle River, N.J: Pearson Prentice Hall.
Härdle, Wolfgang & Simar, Léopold. 2012. Applied multivariate statistical analysis. Berlin: Springer (see also http://www.quantlet.de for sample code; search “MVA”). There is a newer (4th) edition, which should work as well.
Advanced and applied
Izenman, Alan Julian. 2013. Modern multivariate statistical techniques: Regression, classification, and manifold learning. New York: Springer New York. Book Home Page (including R, S-plus and MATLAB code and data sets)
Hastie, Trevor, Tibshirani, Robert, and Friedman, J. H. 2009. The elements of statistical learning: Data mining, inference, and prediction. New York, NY: Springer New York.
James, Witten, Hastie and Tibshirani, 2014. An Introduction to Statistical Learning with Applications in R. Book Home Page. The PDF file of the book can be downloaded for free. There is also an R package for this book.
Theoretical
Anderson, T. W. 2003. An introduction to multivariate statistical analysis. Hoboken, N.J: Wiley-Interscience.
Muirhead, Robb J. 1982. Aspects of multivariate statistical theory. New York: Wiley.
Working with R or SAS
Wickham, Hadley & Grolemund, Garrett. R for Data Science. https://r4ds.had.co.nz/
Everitt, Brian, and Hothorn, Torsten. 2011. An introduction to applied multivariate analysis with R. New York: Springer.
Khattree, Ravindra, and Naik, Dayanand N. 1999. Applied multivariate statistics with SAS software. Cary, NC: SAS Institute.
Khattree, Ravindra, and Naik, Dayanand N. 2000. Multivariate data reduction and discrimination with SAS software. Cary, NC: SAS Institute.
Brightspace
Brightspace will be used for recording grades on assignments and exams and for distributing solutions. The code and lecture notes can also be found on Brightspace.
Grading
Homework (35%): assigned biweekly.
Midterm exam (35%): a midterm exam focusing on the theoretical part of the course will be administered.
Course project (25%): the project will involve data analysis using multivariate techniques. The data set for the project will be provided by the instructor. In your project report, you will summarize the results of your data analysis and describe your findings in the style of a research article.
Lecture attendance and participation (5%): meaningful activity on Piazza (asking and answering questions) also counts.
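To see how the four weights above combine, here is a minimal sketch in Python. It assumes each component is scored on a 0 to 100 scale and that the final score is a plain weighted average with no curve or rounding; the syllabus does not state this, so treat it as an illustration rather than the official calculation. The function name and the example scores are hypothetical.

```python
# Weights copied from the grading breakdown above (percent of the final grade).
WEIGHTS = {
    "homework": 35,
    "midterm": 35,
    "project": 25,
    "participation": 5,
}


def final_score(scores):
    """Weighted average of component scores, each assumed to be on a 0-100 scale.

    Assumption: a plain weighted average; any curve or rounding policy
    is not specified in the syllabus and is not modeled here.
    """
    # Sanity check: the listed weights must account for the whole grade.
    assert sum(WEIGHTS.values()) == 100
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical component scores, for illustration only.
example = {"homework": 90, "midterm": 80, "project": 85, "participation": 100}
print(final_score(example))  # 85.75
```

With these hypothetical scores, homework and the midterm together account for 59.5 of the 85.75 points, which reflects their combined 70% weight.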