Experiments with sparse regression models
We will be generating data from regression models in order to understand the properties of various estimators and procedures. Our basic framework requires generating p predictors, collected in a matrix X, and a target variable y as follows:
x_i ∼ N_p(0, S) (1)
y_i = β_1 x_{i1} + … + β_p x_{ip} + ε_i ,  ε_i ∼ N(0, σ²) (2)
for i = 1, …, n, where S_{jk} = ρ^{|j−k|} for some correlation level −1 ≤ ρ ≤ 1 and elements j, k ∈ {1, …, p}.
(Hint: be thorough and explore the effect of various choices of n, p, σ², and ρ.)
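The data-generating process above can be sketched as follows. This is an illustrative implementation, not part of the assignment: the function name `simulate` and its signature are my own choices, and the Toeplitz covariance is built directly from the definition S_{jk} = ρ^{|j−k|}.

```python
import numpy as np

def simulate(n, p, beta, rho, sigma2, rng=None):
    """Draw (X, y) from the Gaussian linear model in equations (1)-(2).

    Rows of X are N_p(0, S) with Toeplitz covariance S[j, k] = rho**|j - k|,
    and y = X @ beta + eps with eps ~ N(0, sigma2 * I).
    """
    rng = np.random.default_rng(rng)
    idx = np.arange(p)
    S = rho ** np.abs(idx[:, None] - idx[None, :])  # S_{jk} = rho^{|j-k|}
    L = np.linalg.cholesky(S)                       # S = L @ L.T
    X = rng.standard_normal((n, p)) @ L.T           # rows ~ N_p(0, S)
    eps = np.sqrt(sigma2) * rng.standard_normal(n)  # noise ~ N(0, sigma2)
    y = X @ beta + eps
    return X, y
```

Varying n, p, σ², and ρ then amounts to calling this function over a grid of parameter settings, e.g. `simulate(100, 20, beta, rho=0.9, sigma2=4.0)` for a highly correlated, noisy design.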
HEALTH WARNINGS:
• I won’t accept a sloppy copy-paste of a million tables without structure and motivation. Your main task is to build a story and explain what works and what doesn’t, in a structured and thorough way. Your report should be scientific and evidence-based, not opinion- or intuition-based like a newspaper article or a blog piece.
• You should submit all your code in a clear and reproducible form. I won’t accept the use of built-in functions (other than the functions for lasso/elastic net).
• You can use MATLAB, Python or R. I can read other languages, but it will be harder for me to run your code and replicate things, so you are advised NOT to work in C++, Java, Stata etc.