26

In a least squares regression algorithm, I have to do the following operations to compute the regression coefficients:

  • Matrix multiplication, complexity: $O(C^2N)$
  • Matrix inversion, complexity: $O(C^3)$
  • Matrix multiplication, complexity: $O(C^2N)$
  • Matrix multiplication, complexity: $O(CN)$

where $N$ is the number of training examples and $C$ is the total number of features/variables.

How can I determine the overall computational complexity of this algorithm?

EDIT: I studied least squares regression from the book Introduction to Data Mining by Pang-Ning Tan. The explanation of linear least squares regression is given in the appendix, where a solution via the normal equation is provided (something of the form $a=(X^TX)^{-1}X^Ty$, which involves 3 matrix multiplications and 1 matrix inversion).
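For concreteness, here is a minimal NumPy sketch of that computation, written to mirror the four operations listed above (the function and variable names are my own, not from the book, and it is meant only to make the costs explicit, not as a recommended implementation):

```python
import numpy as np

def ols_normal_equation(X, y):
    """X is an N x C matrix of training examples, y is a length-N vector.
    Computes a = (X^T X)^{-1} X^T y via the four operations listed above."""
    XtX     = X.T @ X              # (C x N)(N x C) multiplication:  O(C^2 N)
    XtX_inv = np.linalg.inv(XtX)   # inversion of a C x C matrix:    O(C^3)
    P       = XtX_inv @ X.T        # (C x C)(C x N) multiplication:  O(C^2 N)
    a       = P @ y                # (C x N)(N x 1) multiplication:  O(C N)
    return a

# Example with random data: N = 1000 examples, C = 5 features
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X @ np.array([1.0, -2.0, 0.5, 3.0, 0.0]) + 0.1 * rng.normal(size=1000)
print(ols_normal_equation(X, y))   # roughly recovers [1, -2, 0.5, 3, 0]
```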

My goal is to determine the overall computational complexity of the algorithm. Above, I have listed the 4 operations needed to compute the regression coefficients with their own complexity. Based on this information, can we determine the overall complexity of the algorithm?

Thanks!

  • 0
    Could you expand further? You give three different measures of effort for matrix multiplication, and I'm not sure which is right. Also, there are at least three methods I know of for doing *linear* least squares (and a bit more for *nonlinear* least squares). What are you trying to do? Where did you get the algorithm you currently have? – 2011-11-22
  • 0
    @J.M. I have provided more elaboration in my question. Please let me know if you need more information. – 2011-11-22
  • 2
    Okay... forming $\mathbf M=\mathbf X^\top\mathbf X$ is a matrix multiplication. Forming $\mathbf b=\mathbf X^\top \mathbf y$ is a matrix-vector multiplication. But, please, *please*, **please**, do not use inversion+multiplication to compute $\mathbf c=\mathbf M^{-1}\mathbf b$! It is better computational practice to form the Cholesky decomposition of $\mathbf M$ and use that in the computation of $\mathbf c$. – 2011-11-22
  • 0
    Sure, thanks for the advice! However, I still need to know how to determine the computational complexity of my current implementation. Can it be inferred from the information I provided above? – 2011-11-22
  • 0
    Actually, all I want to know is this: of the 4 matrix operations I listed above (with their own complexities), which one has the highest degree of complexity? 3 of them have the same degree of complexity, so I'm not sure which one I can assign as the algorithm's overall complexity. – 2011-11-22

2 Answers

44

For a least squares regression with $N$ training examples and $C$ features, it takes:

  • $O(C^2N)$ to multiply $X^T$ by $X$
  • $O(CN)$ to multiply $X^T$ by $Y$
  • $O(C^3)$ to compute the LU (or Cholesky) factorization of $X^TX$ and use that to compute the product $(X^TX)^{-1} (X^TY)$

Asymptotically, $O(C^2N)$ dominates $O(CN)$ so we can forget the $O(CN)$ part.

Since you're using the normal equations I will assume that $N > C$; otherwise the matrix $X^TX$ would be singular (and hence non-invertible). With $N > C$, the $O(C^2N)$ term asymptotically dominates the $O(C^3)$ term.

Therefore the total time complexity is $O(C^2N)$. Note that this is only the asymptotic complexity; in particular, for smallish $C$ and $N$ you may find that computing the LU or Cholesky decomposition of $X^TX$ takes significantly longer than multiplying $X^T$ by $X$.
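For illustration, here is a sketch of that breakdown using SciPy's Cholesky routines (the function name is mine; `numpy.linalg.solve`, which uses an LU factorization, would work equally well for the last step):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def ols_via_cholesky(X, Y):
    """X is N x C with N > C, Y is a length-N vector."""
    M = X.T @ X                      # O(C^2 N)  -- dominant term when N > C
    b = X.T @ Y                      # O(C N)
    c, lower = cho_factor(M)         # O(C^3)    -- Cholesky factorization of X^T X
    return cho_solve((c, lower), b)  # O(C^2)    -- two triangular solves
```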

Edit: Note that if you have two datasets with the same features, labeled 1 and 2, and you have already computed $X_1^T X_1$, $X_1^TY_1$ and $X_2^TX_2$, $X_2^TY_2$, then training on the combined dataset is only $O(C^3)$, since you just add the relevant matrices and vectors and then solve a $C\times C$ linear system.

In particular, this allows you to do very fast bootstrap, jackknife and cross-validation when you are training an OLS regression (or variants like ridge regression, lasso, constrained OLS, etc.).
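A sketch of that idea (the helper names are my own), assuming the per-dataset products are computed once and reused:

```python
import numpy as np

def sufficient_stats(X, Y):
    """O(C^2 N), done once per dataset i: returns (X_i^T X_i, X_i^T Y_i)."""
    return X.T @ X, X.T @ Y

def fit_combined(stats):
    """Train on the union of the datasets in O(C^3): add the C x C Gram
    matrices and the C-vectors, then solve a single C x C linear system."""
    M = sum(XtX for XtX, _ in stats)
    b = sum(XtY for _, XtY in stats)
    return np.linalg.solve(M, b)

# e.g. fit_combined([sufficient_stats(X1, Y1), sufficient_stats(X2, Y2)])
```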

  • 5
    Thanks for the answer Chris! I'm really impressed by how you noticed that $N > C$, and thus $O(C^2N)$ dominates $O(C^3)$. – 2011-11-22
  • 7
    Would it be possible to have a formal reference for OLS' computational complexity (an article, a textbook, or similar)? I would like to cite it in an academic paper. – 2016-01-27
0

I was confused about why the matrix multiplication was claimed to be $\mathcal{O}(C^2 N)$ here, which is $\mathcal{O}(N^3)$ when $N=C$ and therefore much slower than Strassen's $\mathcal{O}(N^{2.8})$ and Le Gall's $\mathcal{O}(N^{2.37})$.

It turns out that we are doing rectangular matrix multiplication where $N \ggg C$. There's much more data than variables.

There are algorithms with far better scaling than the naive $\mathcal{O}(C^2N)$ cost for the dominant part in Chris Taylor's answer.

You can compute $X^TX$ with complexity $\mathcal{O}(C^{1.8}N)$ by using Strassen's $\mathcal{O}(C^{2.8})$ algorithm on $C \times C$ blocks, and in theory you can also do it in $\mathcal{O}(C^{1.37}N)$ by using the best known complexity for multiplying $C \times C$ matrices (but keep in mind that Strassen's algorithm is often preferred, because of the large constant hidden in the latter algorithm).

I obtained this from the first equation of page 262 of this paper, by setting:

  • $m=p=C$
  • $n=N$
  • $N > C$
  • $\omega = 2.8$ for Strassen, and $2.37$ for the algorithm with best theoretical (but not practical) asymptotic cost.
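A quick way to see where the $\mathcal{O}(C^{\omega-1}N)$ bound comes from (my own back-of-the-envelope check, not taken from the paper): split the $N \times C$ matrix $X$ into $N/C$ row blocks $X_i$, each of size $C \times C$, so that

$$X^TX = \sum_{i=1}^{N/C} X_i^TX_i.$$

Each of the $N/C$ square products $X_i^TX_i$ costs $\mathcal{O}(C^\omega)$ with a fast multiplication algorithm, and summing them costs only $\mathcal{O}\big((N/C)\cdot C^2\big) = \mathcal{O}(CN)$, so the total is $\mathcal{O}\big((N/C)\cdot C^\omega\big) = \mathcal{O}(C^{\omega-1}N)$. This gives $\mathcal{O}(C^{1.8}N)$ for Strassen and $\mathcal{O}(C^{1.37}N)$ for the best known $\omega$.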