Let $p(x_1,x_2,x_3)$ be a scalar function. The goal is to find $x_1, x_2, x_3$ that minimize $p(x_1,x_2,x_3)$. Consider the gradient descent method: $ \left( \begin{array}{c} x_1 \\ x_2 \\ x_3 \\ \end{array} \right)_{k+1} = \left( \begin{array}{c} x_1 \\ x_2 \\ x_3 \\ \end{array} \right)_{k} - \alpha_k \left( \begin{array}{c} \frac{\partial p}{\partial x_1} \\ \frac{\partial p}{\partial x_2} \\ \frac{\partial p}{\partial x_3} \\ \end{array} \right)_{k} $ where $\alpha_k > 0$ is the step size.
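For concreteness, here is a minimal numerical sketch of this iteration in Python. The quadratic objective and the fixed step size $\alpha_k = 0.1$ are my own illustrative choices, not part of the question.

```python
import numpy as np

# Illustrative objective (my own choice): p(x) = x1^2 + 2*x2^2 + 3*x3^2.
def p(x):
    return x[0]**2 + 2 * x[1]**2 + 3 * x[2]**2

def grad_p(x):
    # Hand-computed gradient of the quadratic above.
    return np.array([2 * x[0], 4 * x[1], 6 * x[2]])

x = np.array([1.0, -1.0, 0.5])   # initial guess
alpha = 0.1                      # fixed step size (alpha_k = 0.1 for all k)
for k in range(100):
    x = x - alpha * grad_p(x)    # the vector update above

print(x, p(x))  # x is driven toward the minimizer (0, 0, 0)
```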
My question is: can the above iterative process be carried out in a distributed manner? This could be motivated by practical considerations such as distributed computational resources. The following is my thinking on this problem.
Rewrite the above equation as $ x_{1,k+1}=x_{1,k}-\alpha_{1,k} \left( \frac{\partial p}{\partial x_1} \right)_{k} $ $ x_{2,k+1}=x_{2,k}-\alpha_{2,k} \left( \frac{\partial p}{\partial x_2} \right)_{k} $ $ x_{3,k+1}=x_{3,k}-\alpha_{3,k} \left( \frac{\partial p}{\partial x_3} \right)_{k} $ Then the three equations can be computed on three separate computers. Here I have a question: do the step sizes $\alpha_{1,k}, \alpha_{2,k}, \alpha_{3,k}$ matter? Should we keep $\alpha_{1,k}=\alpha_{2,k}=\alpha_{3,k}$? In other words, is the following iteration still gradient descent? If $\alpha_{1,k}, \alpha_{2,k}, \alpha_{3,k}$ differ from one another, the overall movement is no longer along $-\nabla_\mathbf{x} p(\mathbf{x})$. $ \left( \begin{array}{c} x_1 \\ x_2 \\ x_3 \\ \end{array} \right)_{k+1} = \left( \begin{array}{c} x_1 \\ x_2 \\ x_3 \\ \end{array} \right)_{k} - \left( \begin{array}{ccc} \alpha_{1,k} & 0 & 0 \\ 0 & \alpha_{2,k} & 0 \\ 0 & 0 & \alpha_{3,k} \\ \end{array} \right) \left( \begin{array}{c} \frac{\partial p}{\partial x_1} \\ \frac{\partial p}{\partial x_2} \\ \frac{\partial p}{\partial x_3} \\ \end{array} \right)_{k} $
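To make the comparison concrete, below is a sketch of the diagonally scaled update on the same illustrative quadratic as above; the particular values of $\alpha_{1,k}, \alpha_{2,k}, \alpha_{3,k}$ are arbitrary choices for demonstration, not values suggested by any theory.

```python
import numpy as np

def grad_p(x):
    # Same illustrative quadratic as before: p(x) = x1^2 + 2*x2^2 + 3*x3^2.
    return np.array([2 * x[0], 4 * x[1], 6 * x[2]])

x = np.array([1.0, -1.0, 0.5])
alphas = np.array([0.1, 0.05, 0.02])  # distinct per-coordinate step sizes (arbitrary)
for k in range(300):
    # Elementwise multiplication implements the diagonal matrix
    # diag(alpha_1, alpha_2, alpha_3); each coordinate update needs only
    # its own gradient component, so it could run on a separate machine.
    x = x - alphas * grad_p(x)

print(x)  # converges to (0, 0, 0) in this example, even though the
          # iterates no longer move along -grad p
```

One observation: as long as every $\alpha_{i,k} > 0$, the scaled direction $-D_k \nabla p$ (with $D_k$ the diagonal matrix above) is still a descent direction, since $\nabla p^\top D_k \nabla p > 0$ whenever $\nabla p \neq 0$; it is just no longer the steepest-descent direction.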