Course name: Model Selection in High Dimensions
Course code: MESA256
Credit hours: 4.00
Objective
The learning objectives are for the student to 1) acquire the fundamental notions of model selection, such as out-of-sample validity, prediction error, and over- and under-fitting; 2) study regularized regression methods such as the (adaptive) lasso, MCP, SCAD, and the elastic net; 3) practice algorithms and methods such as cross-validation, CART, LARS, OMP, SIS, and stepwise/streamwise/stagewise search; and 4) apply these methods in a project analyzing a (big) dataset.
Description
Model selection in high dimensions is an active subject of research, spanning machine learning and artificial intelligence algorithms, statistical inference, and sometimes a mix of the two. We focus on the frequentist approach to model selection, with the aim of presenting methods that have the properties needed for out-of-sample (or population) validity, within as broad a theoretical framework as possible, one in which different aspects of validity can be measured. We therefore anchor the content in an inferential statistics approach, essentially for causal models.
More specifically, model selection in high dimensions is presented under two main headings: statistical methods or criteria for measuring statistical validity, and fast algorithms for high-dimensional settings (high in both the number of observations and the number of inputs) that avoid the simultaneous comparison of all possible models.
The models considered are mainly linear regression and logistic regression (for classification), and the application domains range from economics to the medical sciences.
An important part of the class is devoted to practicing high-dimensional model selection methods with R packages. This includes a semester project that is used for the final evaluation.