Here is the manual, or poor man's, demonstration:
> set.seed(0)
> # The correlation matrix
> corr_matrix = matrix(cbind(1, .80, .2, .80, 1, .7, .2, .7, 1), nrow=3)
> nvar = 3 # Three columns of correlated data points
> nobs = 1e6 # One million observations for each column
> std_norm = matrix(rnorm(nvar * nobs),nrow=nobs, ncol=nvar) # N(0,1)
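$$\text{Corr} = \begin{bmatrix} 1 & .8 & .2 \\ .8 & 1 & .7 \\ .2 & .7 & 1 \end{bmatrix}$$

$$N = \begin{bmatrix}
 & {[,1]} & {[,2]} & {[,3]} \\
{[1,]} & -1.0806338 & 0.6563913 & 0.8400443 \\
{[2,]} & -1.1434241 & -0.1729738 & -0.9884772 \\
\vdots & \vdots & \vdots & \vdots \\
{[999999,]} & 0.4861827 & 0.03563006 & -2.1176976 \\
{[1000000,]} & -0.4394551 & 1.69265517 & -1.9534729
\end{bmatrix}$$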
1. SVD METHOD:
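$$\left[\underset{[3\times 3]}{U}\;\underset{\Sigma^{0.5}}{\begin{bmatrix}\sqrt{d_1} & 0 & 0 \\ 0 & \sqrt{d_2} & 0 \\ 0 & 0 & \sqrt{d_3}\end{bmatrix}}\;\underset{[3\times 10^6]}{N^T}\right]^T$$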
> ptm <- proc.time()
> # Singular Value Decomposition method:
> svd = svd(corr_matrix)
> rand_data_svd = t(svd$u %*% diag(sqrt(svd$d)) %*% t(std_norm))
> proc.time() - ptm
user system elapsed
0.29 0.05 0.34
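As a short sketch of why the SVD recipe gives the target correlations: the symbols $z$, $x$ and $A$ below are notation introduced here for illustration, not taken from the answer. Let $z$ be one row of $N$ written as a column vector, so that $\operatorname{Cov}(z) = I$, and let $x = Az$ with $A = U\,\Sigma^{0.5}$. Because Corr is symmetric and positive semi-definite, its SVD satisfies $\text{Corr} = U\,\Sigma\,U^T$, and therefore

$$\operatorname{Cov}(x) = A\,\operatorname{Cov}(z)\,A^T = A A^T = U\,\Sigma^{0.5}\,\Sigma^{0.5}\,U^T = U\,\Sigma\,U^T = \text{Corr}.$$

The same argument applies to any matrix $A$ with $A A^T = \text{Corr}$, which is the property a Cholesky factor has as well.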
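2. CHOLESKY METHOD:
$$\left[\underset{\text{Ch}}{\begin{bmatrix} c_{11} & 0 & 0 \\ c_{21} & c_{22} & 0 \\ c_{31} & c_{32} & c_{33} \end{bmatrix}}\;\underset{[3\times 10^6]}{N^T}\right]^T$$
> ptm <- proc.time()
> # Cholesky method: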
> chole = t(chol(corr_matrix))
> rand_data_chole = t(chole %*% t(std_norm))
> proc.time() - ptm
user system elapsed
0.25 0.03 0.31
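Both methods should give data whose correlations match corr_matrix. The following is a minimal sketch of how one might check that, not part of the original timing runs. The names dec, a_svd, a_chol, x_svd, x_chol and the err_/ok_ variables are introduced here for illustration, and nobs is reduced to 1e5 (an assumption) so the check runs quickly.

```r
set.seed(0)

# Same target correlation matrix as above
corr_matrix <- matrix(c( 1, .8, .2,
                        .8,  1, .7,
                        .2, .7,  1), nrow = 3)

nobs <- 1e5  # assumption: fewer draws than the 1e6 used above
std_norm <- matrix(rnorm(3 * nobs), nrow = nobs, ncol = 3)  # N(0,1) draws

# Mixing matrix from each decomposition
dec    <- svd(corr_matrix)
a_svd  <- dec$u %*% diag(sqrt(dec$d))  # U %*% Sigma^0.5
a_chol <- t(chol(corr_matrix))         # lower-triangular Cholesky factor

# Exact check: each factor times its transpose reproduces corr_matrix
ok_svd  <- isTRUE(all.equal(a_svd  %*% t(a_svd),  corr_matrix))
ok_chol <- isTRUE(all.equal(a_chol %*% t(a_chol), corr_matrix))

# Empirical check: sample correlations of the transformed data
x_svd  <- std_norm %*% t(a_svd)   # equals t(a_svd %*% t(std_norm))
x_chol <- std_norm %*% t(a_chol)
err_svd  <- max(abs(cor(x_svd)  - corr_matrix))
err_chol <- max(abs(cor(x_chol) - corr_matrix))

print(c(svd = ok_svd, chol = ok_chol))
print(c(svd = err_svd, chol = err_chol))
```

The two data sets are not identical row by row, since the two factors differ, but both should have sample correlations close to the target, with an error that shrinks as nobs grows.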
Thank you to @userr11852 for pointing out that there is a better way to measure the performance difference between SVD and Cholesky, which favors the latter: the microbenchmark function. At his suggestion, here is the result:
> library(microbenchmark)
> microbenchmark(chol(corr_matrix), svd(corr_matrix))
Unit: microseconds
              expr     min     lq      mean  median      uq     max neval cld
 chol(corr_matrix)  24.104  25.05  28.74036  25.995  26.467  95.469   100   a
  svd(corr_matrix) 108.701 110.12 116.27794 111.065 112.719 223.074   100   b