1. (Specific Q.) SAS Code for calculating a covariance matrix
I am trying to understand the following code, which generates the data for a chi-square Q-Q plot. On line 9, Xd is defined as X - (One / n)` * X. This calculation looks strange to me because the dimensions of the first two matrices do not match. For instance, X is a 55x7 matrix, and (One/n) is a 55x55 matrix in which every element is 1/55. The transpose of (One/n) is also a 55x55 matrix in which every element is 1/55. Thus, the calculation is (55x7) - (55x55) * (55x7). However, in the (55x7) - (55x55) part, what happens to the other 55x48 elements? The first 55x7 block would hold the difference between X and 1/55, but the remaining 55x48 block would hold -1/55.
X - (One/n)` * X
(55x7) - (55x55) * (55x7)
In addition, I would like to know what Xd means. From your comment, “mean-centered data matrix”, I expect the deviation of each element from its column mean. However, I get different values. For example, suppose X is a 7x3 matrix.
X =
1 3 6
2 2 1
3 4 4
4 6 2
5 2 9
6 5 4
7 9 5
(one/n)` =
1/7 1/7 1/7 1/7 1/7 1/7 1/7
1/7 1/7 1/7 1/7 1/7 1/7 1/7
1/7 1/7 1/7 1/7 1/7 1/7 1/7
1/7 1/7 1/7 1/7 1/7 1/7 1/7
1/7 1/7 1/7 1/7 1/7 1/7 1/7
1/7 1/7 1/7 1/7 1/7 1/7 1/7
1/7 1/7 1/7 1/7 1/7 1/7 1/7
X – (one/n) =
0.86 2.86 5.86 -0.14 -0.14 -0.14 -0.14
1.86 1.86 0.86 -0.14 -0.14 -0.14 -0.14
2.86 3.86 3.86 -0.14 -0.14 -0.14 -0.14
3.86 5.86 1.86 -0.14 -0.14 -0.14 -0.14
4.86 1.86 8.86 -0.14 -0.14 -0.14 -0.14
5.86 4.86 3.86 -0.14 -0.14 -0.14 -0.14
6.86 8.86 4.86 -0.14 -0.14 -0.14 -0.14
X – (one/n) * X =
21.00 28.57 28.57
5.00 9.57 13.57
19.00 28.57 33.57
18.00 27.57 33.57
32.00 50.57 63.57
24.00 39.57 52.57
36.00 54.57 66.57
However, I expect the following deviation values.
-3 -1.428571429 1.571428571
-2 -2.428571429 -3.428571429
-1 -0.428571429 -0.428571429
0 1.571428571 -2.428571429
1 -2.428571429 4.571428571
2 0.571428571 -0.428571429
3 4.571428571 0.571428571
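One way to check this is to run the 7x3 example through the same statement in PROC IML. The sketch below assumes that matrix multiplication is applied before subtraction, so line 9 is read as X - ((One/n)` * X), that is, (7x3) - (7x7)*(7x3). Under that reading, every row of (One/n)` * X holds the column means (4, 4.43, 4.43), and the difference reproduces the deviations I expected. The names M and Xd2 are mine and are not part of the original program.

```sas
proc iml;
/* The 7x3 example matrix from the question */
X = {1 3 6,
     2 2 1,
     3 4 4,
     4 6 2,
     5 2 9,
     6 5 4,
     7 9 5};
n = nrow(X);
One = J(n, n, 1);              /* n x n matrix of 1s, as on line 8 */

/* The product is evaluated before the subtraction, so the dimensions
   are (7x3) - (7x7)*(7x3) = (7x3) - (7x3) */
M  = (One / n)` * X;           /* each row holds the column means */
Xd = X - M;                    /* deviations from the column means */

/* Assumed shortcut for comparison: X[:,] is the 1x3 row of column means */
Xd2 = X - repeat(X[:,], n, 1);

print M, Xd, Xd2;
quit;
```

If Xd and Xd2 print the same values, the difference from my hand calculation would come from the order of operations rather than from the code.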
Also, I have seen two kinds of SAS code for calculating a covariance matrix. One is this direct method, which computes the covariance matrix S from Xd. The other is an indirect method, which takes the covariance matrix from the output of “proc corr”. Why do you calculate the covariance matrix directly in this example rather than taking it from “proc corr”?
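For comparison, the indirect method I have in mind looks roughly like the sketch below. It assumes the same macro variables as the program in this question; covout is a hypothetical data set name that I chose for illustration.

```sas
%let inputdata = isqs6348.t1_7;
%let varlist = m100 m200 m400 m800 m1500 m3000 marathon;

/* Let PROC CORR compute the covariance matrix and save it.
   covout is a hypothetical output data set name. */
proc corr data=&inputdata cov outp=covout noprint;
   var &varlist;
run;

proc iml;
/* With the COV option, the OUTP= data set includes rows with
   _TYPE_ = "COV" that hold the covariance matrix */
use covout where(_TYPE_ = "COV");
read all var { &varlist } into S;
close covout;
print S;
quit;
```

Both routes should give the same p x p matrix S, so my question is only about why one is preferred here.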
/* sas program for generating data for chi-square q-q plots */
1 %let inputdata = isqs6348.t1_7; /* this line must be edited */
2 %let varlist = m100 m200 m400 m800 m1500 m3000 marathon ; /* this line must be edited */
3 proc iml;
4 use &inputdata;
5 read all var { &varlist } into X;
6 n = nrow(X);
7 p = ncol(X);
8 One = J(n,n,1); /* just a n x n square matrix full of 1s (nxn)*/
9 Xd = X - (One / n)` * X; /* mean-centered data matrix (nxp)*/
10 S = (1 / (n-1)) * Xd`*Xd; /* covariance matrix (pxp) */
11 Sinv = inv(S);
12 chisq = j(n,1,0);
13 do i = 1 to n;
14 chisq[i] = Xd[i,] * Sinv * Xd[i,]`; /* Squared distance from obs i to the mean */
15 end;
16 probs = (rank(chisq) - j(n,1,.5))/n; /* contains (r-.5)/n values */
17 quants = 2*gaminv(probs, p/2); /* contains chi-square quantiles */
18 plotdata = quants||chisq;
19 create chisqqdata(rename=(col1=chiquant col2=distsq)) from plotdata;
20 append from plotdata;
21 quit;
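The program above only creates the data set chisqqdata. A minimal sketch of how I would draw the plot from it is shown below; the axis labels and the y = x reference line are my own choices, not part of the original program.

```sas
/* Plot the squared distances against the chi-square quantiles
   stored in chisqqdata by the PROC IML step */
proc sgplot data=chisqqdata;
   scatter x=chiquant y=distsq;
   lineparm x=0 y=0 slope=1;    /* reference line y = x */
   xaxis label="Chi-square quantile";
   yaxis label="Squared distance";
run;
```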
2. (General Q.) Variance vs Eigenvalue and Eigenvector
In the last class, we learned that variance plays a crucial role in the two-sample test. For example, in the univariate two-sample test, we assume that group 1 and group 2 have the same variance, sigma squared. Likewise, in the multivariate two-sample test, we assume that the two groups have the same number of variables and the same covariance matrix, capital sigma. However, I think this kind of test has a problem. For example, consider a univariate case and the following graph, in which (a) has mean = 5 and variance = 0.4 and (b) has mean = 5 and variance = 0.4, so the two groups have the same mean and variance and differ only in direction.
Although the two samples have different shapes, the null hypothesis cannot be rejected because the two-sample test only considers the difference between the means of the two groups. I think the result would be the same for the multivariate two-sample test. In contrast, the eigenvalues and eigenvectors of a covariance matrix can show the length and direction of the variation. So, is there any test that compares eigenvalues between two groups, or eigenvectors between two groups? Or is there any test that uses eigenvalues and eigenvectors? I think a test using eigenvalues and eigenvectors would be more robust than a test using variance only.
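To make the question concrete, here is a minimal sketch of the comparison I have in mind. The data set twogroups and the variables group, x1, and x2 are hypothetical and simulated only for illustration: both groups have the same means and variances, but the correlation has opposite signs. As far as I understand, PROC DISCRIM with POOL=TEST reports a test of homogeneity of the within-group covariance matrices, which compares the whole matrices rather than the eigenvalues or eigenvectors separately.

```sas
/* Hypothetical simulated data: equal means and variances,
   opposite sign of the correlation between x1 and x2 */
data twogroups;
   call streaminit(1);
   do group = 1 to 2;
      do i = 1 to 50;
         z1 = rand("Normal");
         z2 = rand("Normal");
         x1 = 5 + z1;
         if group = 1 then x2 = 5 + 0.8*z1 + 0.6*z2;
         else x2 = 5 - 0.8*z1 + 0.6*z2;
         output;
      end;
   end;
   drop i z1 z2;
run;

proc iml;
use twogroups;
read all var {x1 x2} where(group = 1) into X1;
read all var {x1 x2} where(group = 2) into X2;
close twogroups;

S1 = cov(X1);                  /* sample covariance matrix, group 1 */
S2 = cov(X2);                  /* sample covariance matrix, group 2 */

call eigen(val1, vec1, S1);    /* eigenvalues and eigenvectors of S1 */
call eigen(val2, vec2, S2);    /* eigenvalues and eigenvectors of S2 */
print val1 vec1, val2 vec2;
quit;

/* Test of equal covariance matrices across the two groups */
proc discrim data=twogroups pool=test;
   class group;
   var x1 x2;
run;
```

In this sketch I would expect the eigenvalues of the two groups to be similar while the eigenvectors point in different directions, which is the situation the test on means alone would not detect.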