The package vbdvsarmadillo
replicates the simulation experiment of Variational Bayes Dynamic Variable Selection (VBDVS), proposed by Koop and Korobilis (2020), in optimized Rcpp code based on the C++ Armadillo linear algebra template library. Computational efficiency improves substantially, and once the underlying computation runs under the Microsoft R Open distribution, the optimized code even slightly outperforms MATLAB (the usual benchmark for optimized matrix computation). Microsoft R Open, an enhanced R distribution, is available for free either through the link provided or through your Microsoft Azure account.
The basic DGP used for the Monte Carlo experiment is described as follows and is the same as in Koop and Korobilis (2020). \[\begin{align} y_{t}~&=~\beta_{1t} x_{1 t}+\beta_{2t}x_{2 t}+\ldots+\beta_{pt}x_{p t}+\sigma_{t} \varepsilon_{t}, ~ \varepsilon_{t} \sim \mathcal{N}(0,1)\\[2mm] x_{j,t}~&\sim~\mathcal{N}(0,1),~j=1,\ldots,p\\[2mm] \beta_{j, t}~&=~s_{j, t} \times \theta_{j, t},~\text{$s_{j,t}$ is $0$-$1$ indicator variable}\\[2mm] \theta_{j, t}~&=~\underline{\theta}_{j}+\underline{\rho}\left(\theta_{j, t-1}-\underline{\theta}_{j}\right)+\underline{\delta} \eta_{j, t}, ~\eta_{j, t} \sim \mathcal{N}(0,1)\\[2mm] \log \left(\sigma_{t}^{2}\right)~&=~\underline{\sigma}^{2}+\underline{\phi}\left(\log \left(\sigma_{t-1}^{2}\right)-\underline{\sigma}^{2}\right)+\underline{\xi} \zeta_{t}, ~\zeta_{t} \sim \mathcal{N}(0,1)\\[2mm] \theta_{j, 0}~&=~\underline{\theta}_{j}, \quad \log \left(\sigma_{0}^{2}\right)=\underline{\sigma}^{2} \end{align}\]
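As a minimal sketch of the DGP above, the following self-contained R code simulates a small version of it. The parameter values (`rho`, `delta`, `phi`, `xi`, `sigma2_bar`) are illustrative choices, not the exact ones used in the package; the packaged function sptvpsv_reg_dgp implements the full version, including the 0-1 indicators `s`.

```r
set.seed(1)
Tt <- 200; p <- 4                       # small example: 4 predictors
rho <- 0.99; delta <- 0.1               # hypothetical AR parameters for theta
phi <- 0.98; xi <- 0.1                  # hypothetical AR parameters for log-volatility
sigma2_bar <- log(0.1)                  # hypothetical mean of log(sigma_t^2)
theta_bar <- c(-1.7, 2.9, 1.4, -2.3)    # unconditional coefficient means
s <- matrix(1, Tt, p)                   # all predictors active in this sketch

x <- matrix(rnorm(Tt * p), Tt, p)       # x_{j,t} ~ N(0,1)
theta <- matrix(0, Tt, p); theta[1, ] <- theta_bar
log_sig2 <- numeric(Tt); log_sig2[1] <- sigma2_bar
for (t in 2:Tt) {
  # AR(1) evolution of the time-varying coefficients theta_{j,t}
  theta[t, ] <- theta_bar + rho * (theta[t - 1, ] - theta_bar) + delta * rnorm(p)
  # AR(1) evolution of the log-variance
  log_sig2[t] <- sigma2_bar + phi * (log_sig2[t - 1] - sigma2_bar) + xi * rnorm(1)
}
beta <- s * theta                       # beta_{j,t} = s_{j,t} * theta_{j,t}
y <- rowSums(x * beta) + sqrt(exp(log_sig2)) * rnorm(Tt)
```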
library(vbdvsarmadillo)
library(magrittr)
# Basic usage is as follows
x <- read.csv("C:/AY2020-2021/Codes Library/MATLAB/tvp_bayes_matlab/new_100_x.csv",
header = F)
x <- as.matrix(x)
y <- read.csv("C:/AY2020-2021/Codes Library/MATLAB/tvp_bayes_matlab/new_100_y.csv",
header = F)
y <- y[, 1]
tic <- Sys.time()
res <- VBDVS_armadillo(list(y = y, x = x, maxiter = 100,
prior = 2, q = 0, delta = 0.9, c0 = 25, h0 = 12, g0 = 1))
toc <- Sys.time()
toc - tic
#> Time difference of 10.57869 secs
# source('sptvpsv_reg_dgp.R')
The chunk above shows that computational time improves significantly, to roughly 10 seconds, and it can even drop to around 8 seconds under the optimized Microsoft ML Server distribution, which is also freely available. Moreover, on a CPU with more cores the improvement is larger still.
We can now demonstrate, in the following chunk, how data is generated using the function sptvpsv_reg_dgp
included in this package vbdvsarmadillo
and then apply VBDVS_armadillo
, another function contained in vbdvsarmadillo
, to implement the VBDVS algorithm.
# DGP
Tt <- 200 # Number of observations to generate
p <- 200 # Number of predictors to generate
s <- matrix(1, Tt, p)
s[round(Tt/3):Tt, 1] <- 0
s[round(Tt/2):Tt, 3] <- 0
s[1:round(Tt/2), 4] <- 0
s[, 5:p] <- 0
theta0 <- c(-1.7, 2.9, 1.4, -2.3, rep(0, p - 4))
mu <- s[1, ] * theta0
param <- list()
param[["s"]] <- s
param[["theta0"]] <- theta0
param[["sigma0"]] <- 0.1
param[["mu"]] <- mu
dgp_components <- sptvpsv_reg_dgp(Tt, p, param)
y <- dgp_components$y
x <- dgp_components$x
# Implementing VBDVS
tic <- Sys.time()
res <- VBDVS_armadillo(list(y = y, x = x, maxiter = 100,
prior = 2, q = 0, delta = 0.9, c0 = 25, h0 = 12, g0 = 1))
toc <- Sys.time()
toc - tic
#> Time difference of 52.17439 secs
# Partially demonstrate the results
res$tv_probs[1:10, 1:10] # Inclusion probabilities over the first 10 periods (first 10 predictors)
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] 1 1 0.0005878838 0.0005878838 0.0005878838 0.0005878838
#> [2,] 1 1 0.0005878847 0.0005878845 0.0005878855 0.0005878843
#> [3,] 1 1 0.0005878820 0.0005878820 0.0005878820 0.0005878820
#> [4,] 1 1 0.0004759708 0.0004759650 0.0004759668 0.0004759660
#> [5,] 1 1 0.0004759715 0.0004759642 0.0004759641 0.0004759680
#> [6,] 1 1 0.0004759638 0.0004759640 0.0004759639 0.0004759638
#> [7,] 1 1 0.0004759690 0.0004759715 0.0004759645 0.0004759641
#> [8,] 1 1 0.0004759662 0.0004759696 0.0004759639 0.0004759664
#> [9,] 1 1 0.0004759640 0.0004759683 0.0004759749 0.0004759641
#> [10,] 1 1 0.0004208790 0.0004208763 0.0004208758 0.0004208784
#> [,7] [,8] [,9] [,10]
#> [1,] 0.0005878838 0.0005878838 0.0005878838 0.0005878838
#> [2,] 0.0005878843 0.0005878843 0.0005878843 0.0005878846
#> [3,] 0.0005878820 0.0005878820 0.0005878820 0.0005878820
#> [4,] 0.0004759671 0.0004759640 0.0004759651 0.0004759640
#> [5,] 0.0004759663 0.0004759639 0.0004759639 0.0004759669
#> [6,] 0.0004759639 0.0004759638 0.0004759639 0.0004759640
#> [7,] 0.0004759708 0.0004759675 0.0004759639 0.0004759652
#> [8,] 0.0004759704 0.0004759640 0.0004759723 0.0004759656
#> [9,] 0.0004759670 0.0004759833 0.0004759779 0.0004759643
#> [10,] 0.0004208890 0.0004208761 0.0004208773 0.0004208845
res$tv_probs[191:200, 1:10] # Inclusion probabilities over the last 10 periods
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] 0.0001020304 1 0.0001020306 1 0.0001020306 0.0001020305
#> [2,] 0.0001020304 1 0.0001020304 1 0.0001020305 0.0001020309
#> [3,] 0.0001020365 1 0.0001020345 1 0.0001020318 0.0001020307
#> [4,] 0.0001020362 1 0.0001020308 1 0.0001020474 0.0001020305
#> [5,] 0.0001020316 1 0.0001020329 1 0.0001020307 0.0001020306
#> [6,] 0.0001020461 1 0.0001020348 1 0.0001020499 0.0001020334
#> [7,] 0.0001020425 1 0.0001020322 1 0.0001020335 0.0001020314
#> [8,] 0.0001020418 1 0.0001020405 1 0.0001020331 0.0001020341
#> [9,] 0.0001020387 1 0.0001020525 1 0.0001020436 0.0001020464
#> [10,] 0.0001020306 1 0.0001020306 1 0.0001020312 0.0001020305
#> [,7] [,8] [,9] [,10]
#> [1,] 0.0001020310 0.0001020307 0.0001020305 0.0001020305
#> [2,] 0.0001020332 0.0001020309 0.0001020304 0.0001020304
#> [3,] 0.0001020305 0.0001020307 0.0001020305 0.0001020305
#> [4,] 0.0001020343 0.0001020345 0.0001020333 0.0001020305
#> [5,] 0.0001020335 0.0001020310 0.0001020320 0.0001020323
#> [6,] 0.0001020421 0.0001020392 0.0001020320 0.0001020315
#> [7,] 0.0001020535 0.0001020307 0.0001020417 0.0001020394
#> [8,] 0.0001020337 0.0001020310 0.0001020450 0.0001020415
#> [9,] 0.0001020452 0.0001020746 0.0001020706 0.0001020385
#> [10,] 0.0001020310 0.0001020305 0.0001020304 0.0001020305
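The inclusion probabilities above can be turned into a 0-1 selection path, for example by thresholding at 0.5. The threshold and the synthetic probability matrix below are our illustrative choices so the snippet is self-contained; in practice you would apply this to res$tv_probs, which we assume is a Tt x p matrix.

```r
set.seed(1)
Tt <- 200; p <- 200
# Placeholder standing in for res$tv_probs: near-zero everywhere,
# with predictor 2 always included (as in the output above)
tv_probs <- matrix(runif(Tt * p, 0, 0.01), Tt, p)
tv_probs[, 2] <- 1

s_hat <- (tv_probs > 0.5) * 1   # 0-1 time-varying selection indicator
colMeans(s_hat)[1:3]            # fraction of periods each predictor is selected
```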
In this section we replicate the main Monte Carlo experiment carried out in Koop and Korobilis (2020). The figure shows the estimated coefficients (medians over 100 Monte Carlo iterations) over time, together with the true values used to generate the data. Cyan solid lines are the coefficients fitted by the variational Bayes algorithm, and black long-dashed lines are the true values; each panel shows how the corresponding value evolves over time. The grey areas mark the \(16\%\) and \(84\%\) quantiles (over 100 Monte Carlo iterations). The length of the time series is \(T=200\) and the total number of covariates is \(p=200\).
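The plotted summaries can be computed as follows. This is a sketch assuming the estimates of one coefficient across the 100 Monte Carlo iterations are stacked in a hypothetical Tt x n_mc matrix `beta_hat_mc` (filled with random placeholders here).

```r
set.seed(1)
Tt <- 200; n_mc <- 100
beta_hat_mc <- matrix(rnorm(Tt * n_mc), Tt, n_mc)  # placeholder MC estimates

beta_median <- apply(beta_hat_mc, 1, median)               # plotted solid line
beta_lo <- apply(beta_hat_mc, 1, quantile, probs = 0.16)   # lower edge of grey band
beta_hi <- apply(beta_hat_mc, 1, quantile, probs = 0.84)   # upper edge of grey band
```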
In this figure we show how the posterior inclusion probabilities of the first \(20\) covariates evolve over time. The length of the time series is \(T=200\) and the total number of covariates is \(p=200\).
Time-varying inclusion probability
All the corresponding calculations have now been shifted to a High Performance Computing Cluster (HPCC), but the underlying computation is still based on the previously discussed procedures implemented in vbdvsarmadillo
. Hence we do not demonstrate the code in a chunk.
Koop, Gary, and Dimitris Korobilis. 2020. “Bayesian Dynamic Variable Selection in High Dimensions.” https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3246472.