Let $p(x_1,x_2,x_3)$ be a scalar function. The goal is to find $x_1, x_2, x_3$ that minimize $p(x_1,x_2,x_3)$. Now consider the gradient descent method: $$ \left( \begin{array}{c} x_1 \\ x_2 \\ x_3 \\ \end{array} \right)_{k+1} = \left( \begin{array}{c} x_1 \\ x_2 \\ x_3 \\ \end{array} \right)_{k} - \alpha_k \left( \begin{array}{c} \frac{\partial p}{\partial x_1} \\ \frac{\partial p}{\partial x_2} \\ \frac{\partial p}{\partial x_3} \\ \end{array} \right)_{k} $$ where $\alpha_k$ is the step size.
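For concreteness, here is a minimal Python sketch of this iteration. I assume the quadratic test function $p(\mathbf{x}) = x_1^2 + 2x_2^2 + 3x_3^2$; the name `grad_p` and all constants are my own illustrative choices, not part of the question.

```python
import numpy as np

def grad_p(x):
    # Gradient of the illustrative test function p(x) = x1^2 + 2*x2^2 + 3*x3^2.
    return np.array([2.0 * x[0], 4.0 * x[1], 6.0 * x[2]])

x = np.array([1.0, 1.0, 1.0])   # initial guess x_0
alpha = 0.1                     # fixed step size alpha_k
for k in range(100):
    x = x - alpha * grad_p(x)   # x_{k+1} = x_k - alpha_k * grad p(x_k)

print(x)                        # approaches the minimizer (0, 0, 0)
```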
My question is: can the above iterative process be carried out in a distributed manner? This could be motivated, for example, by settings where the computational resources themselves are distributed across several machines. The following is my view of this problem.
Rewrite the above equation as the three componentwise updates $$ x_{1,k+1}=x_{1,k}-\alpha_{1,k} \left.\frac{\partial p}{\partial x_1}\right|_{k} $$ $$ x_{2,k+1}=x_{2,k}-\alpha_{2,k} \left.\frac{\partial p}{\partial x_2}\right|_{k} $$ $$ x_{3,k+1}=x_{3,k}-\alpha_{3,k} \left.\frac{\partial p}{\partial x_3}\right|_{k} $$ Then the three updates can be computed on three computers, respectively. Here I have a question: do the step sizes $\alpha_{1,k}, \alpha_{2,k}, \alpha_{3,k}$ matter? Should we keep $\alpha_{1,k}=\alpha_{2,k}=\alpha_{3,k}$? In other words, is the following iteration still gradient descent? If $\alpha_{1,k}, \alpha_{2,k}, \alpha_{3,k}$ differ from each other, the overall step is no longer along $-\nabla_\mathbf{x} p(\mathbf{x})$. $$ \left( \begin{array}{c} x_1 \\ x_2 \\ x_3 \\ \end{array} \right)_{k+1} = \left( \begin{array}{c} x_1 \\ x_2 \\ x_3 \\ \end{array} \right)_{k} - \left( \begin{array}{ccc} \alpha_{1,k} & 0 & 0 \\ 0 & \alpha_{2,k} & 0 \\ 0 & 0 & \alpha_{3,k} \\ \end{array} \right) \left( \begin{array}{c} \frac{\partial p}{\partial x_1} \\ \frac{\partial p}{\partial x_2} \\ \frac{\partial p}{\partial x_3} \\ \end{array} \right)_{k} $$
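To make the question concrete, here is a sketch of the componentwise variant with unequal step sizes, again on the hypothetical quadratic from the sketch above. Each update line could in principle run on a different machine, provided all three share the current iterate when evaluating the partial derivatives.

```python
import numpy as np

def grad_p(x):
    # Same illustrative test function as above: p(x) = x1^2 + 2*x2^2 + 3*x3^2.
    return np.array([2.0 * x[0], 4.0 * x[1], 6.0 * x[2]])

x = np.array([1.0, 1.0, 1.0])
alphas = np.array([0.3, 0.15, 0.1])   # distinct alpha_{i,k}, held fixed here
for k in range(100):
    g = grad_p(x)        # every partial is evaluated at the SAME iterate x_k
    x = x - alphas * g   # x_{i,k+1} = x_{i,k} - alpha_{i,k} * dp/dx_i;
                         # elementwise product == multiplying by diag(alphas)

print(x)                 # converges to (0, 0, 0) for these step sizes
```

Note that `alphas * g` is exactly the multiplication by the diagonal step-size matrix written above; the sketch only makes the setup concrete and does not by itself settle whether unequal $\alpha_{i,k}$ preserve the guarantees of gradient descent.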