The general strategy is this. To take the variation $\delta$ of an expression involving $\text{grad}\, X$, first take the variation term by term, letting the $\delta$ pass inside the $\text{grad}$, so that you end up with $\text{grad}(\delta X)$. All of this sits inside the integral, so you then integrate by parts to get the $\text{grad}$ off the $\delta X$ (taking the boundary terms to vanish).
Here is a common example. Consider$$E = \int (f^2 + \text{grad}\,f \cdot \text{grad}\,f).$$Let us compute $\delta E$. We have$$\delta E = \int (2f\,\delta f + 2 \,\text{grad}\,f \cdot \text{grad}(\delta f)).$$Now integrating the second term by parts, we get$$\delta E = \int (2f\,\delta f - 2\,\text{div}(\text{grad}\,f)\,\delta f).$$That is,$$\delta E = \int (\delta f)(2f - 2\,\text{div}(\text{grad}\,f)).$$Now we have$${{\delta E}\over{\delta f}} = 2f - 2\,\text{div}(\text{grad}\,f).$$So, ${\delta E}/{\delta f} = 0$ yields$$\text{div}(\text{grad}\,f) = f.$$You do a similar thing in your example.
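If it helps to see the formula ${\delta E}/{\delta f} = 2f - 2\,\text{div}(\text{grad}\,f)$ checked numerically, here is a minimal sketch in one dimension. The periodic grid (so that boundary terms drop out), the test function $f = \sin x$, and the names `energy` and `functional_derivative` are all my own choices for illustration, not part of the problem above.

```python
import math

def energy(f, h):
    """Discretized E = integral of (f^2 + f'^2) on a periodic grid (assumed setup)."""
    n = len(f)
    total = 0.0
    for i in range(n):
        df = (f[(i + 1) % n] - f[i]) / h  # forward difference for f'
        total += (f[i] ** 2 + df ** 2) * h
    return total

def functional_derivative(f, h, i, eps=1e-6):
    """Central-difference estimate of (dE/df_i)/h, the discrete delta E / delta f at x_i."""
    up = list(f)
    up[i] += eps
    down = list(f)
    down[i] -= eps
    return (energy(up, h) - energy(down, h)) / (2 * eps * h)

n = 200
h = 2 * math.pi / n
xs = [i * h for i in range(n)]
f = [math.sin(x) for x in xs]

# For f = sin x we have f'' = -sin x, so 2f - 2f'' = 4 sin x.
max_err = max(
    abs(functional_derivative(f, h, i) - 4 * math.sin(xs[i]))
    for i in range(0, n, 10)
)
print("max deviation from 2f - 2f'':", max_err)
```

The deviation should be small and shrink as the grid is refined, since the only discrepancy comes from replacing $f''$ by a finite difference.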
My grads are $\nabla_j$, $\nabla_k$. Since the gradients are with respect to different coordinates, what would then happen upon integrating the second term by parts? Also I need two different $f$'s as I need a complex conjugate. Can you please update this all? How are the variations of the two terms I have posted different?
For Hermitian conjugate, we have $\delta(X^\text{H}) = (\delta X)^\text{H}$. For integration by parts, we take it that the integral is over all the coordinates. So, we can integrate by parts, e.g.$$(\text{something})\nabla_j(\text{something else}),$$yielding$$-\nabla_j(\text{something})(\text{something else}).$$In the final integral, we get$$\int \left((\text{stuff})\,\delta X + (\text{other stuff})(\delta X)^\text{H}\right).$$The vanishing of the variation then requires that $\text{stuff} = 0$ and $(\text{other stuff}) = 0$. (Probably, $(\text{other stuff})$ will be the Hermitian conjugate of $\text{stuff}$, so we will get just one condition.) On the other hand, if we have a term such as $\text{Tr}((\text{stuff})(\delta X)^\text{T})$ we replace it by its equal, $\text{Tr}((\text{stuff})^\text{T}(\delta X))$.
I do understand integration by parts; you addressed that in the update, thanks. What I do not get are the details of the calculation, which is what I am specifically looking for in the bounty. I have no idea what something, something else, stuff, and other stuff are. I do believe you're right; I just don't know what any of this means. (This is clearly way simpler to you than to me.) I need to see an actual proof. How do all the indices of $f$ and the gradients come into play? This is not addressed anywhere in your answer. Thanks so much for your help.
Thanks, I really appreciate the update, but I still have questions. The simplification using "stuff", "other stuff", "something", etc. just makes it a lot more confusing to me; as I said, I am looking for a complete, rigorous solution. A simple question I have is the following: if$$E = \int \left(f^2 + \nabla_j f_{\alpha j} \nabla_k \bar{f}_{\alpha k}\right)$$what is $\delta E$? Is it$$\delta E = \int \left(2f\,\delta f + 2\nabla_jf_{\alpha j} \cdot \nabla_k \delta \bar{f}_{\alpha k}\right)?$$I do get how you got $\delta E$ in the example you provided, but that is the simplest case. (It is the only one I get.)
I think that, in order that I can understand this, you will have to do the following.
First, eliminate everything that is not relevant to your question. Is it necessary that there be matrices? If not, use ordinary functions. Is it necessary that things be complex? If not, use reals. Is it necessary that there be many independent variables? If not, use one variable. Are there terms in your integral that are not germane to your question? Then remove them. That is, find the simplest, most transparent situation in which your issue arises.
Second, state fully what (in that context) the question is. That is, tell me what everything is a function of, what the integrals are over, what you wish to do with those integrals, etc.
Yes, it is necessary that everything be matrices. I can do the simple case for scalars (ordinary functions). It is also necessary that there be that many independent variables. The terms in my integral are all relevant, and there is nothing to remove. The simplest situation in which my issue arises is what you posted as your answer (the only situation I understand). If you can't help any more, thanks again; I appreciate your help. I will try editing my post now to make it simpler. However, I need to deal with complex matrices and with $\nabla_j$, $\nabla_k$ (that's the whole trouble for me; I am fine with scalars).
I made it much simpler and showed what I think. Note that I made it simpler by forgetting about the different gradients (I just chose both gradients to be $\nabla_k$) and by making the matrix indices the same. I cannot make it any simpler without removing the structure of the problem; I must deal with complex matrices. Please let me know if you have any suggestions. Thanks again for everything, I really appreciate it.
Thanks. Now I understand. Yes, what you are doing is correct.
Something similar arises for ordinary functions. Let $f$ be a function of a complex variable $z$. The actual $f$ may be written in terms of $z$ and $\overline{z}$ (the complex conjugate of $z$). For example, $f = z\overline{z}$. What is $df/dz$? How do you do the "$d/dz$" of $\overline{z}$? The rule is that you regard $z$ and $\overline{z}$ as independent for purposes of taking the derivative. Thus,$${{df}\over{dz}} = \overline{z}, \quad {{df}\over{d\overline{z}}} = z.$$This is convenient because, for example, we have (for a general $f$)$$\overline{{df}\over{dz}} = { {d\overline f}\over{d\overline z}}.$$
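As a sketch of how this rule plays out with indices, take just the gradient term from the comments, $G = \int \nabla_j f_{\alpha j}\,\nabla_k \bar{f}_{\alpha k}$. I am assuming here (these are my assumptions, not something stated above) that repeated indices are summed, that the coordinates are real, and that the fields fall off so that the boundary terms vanish. Treating $f$ and $\bar{f}$ as independent, the variation is
$$\delta G = \int \left(\nabla_j(\delta f_{\alpha j})\,\nabla_k \bar{f}_{\alpha k} + \nabla_j f_{\alpha j}\,\nabla_k(\delta \bar{f}_{\alpha k})\right).$$
Integrating each term by parts, $\nabla_j$ in the first and $\nabla_k$ in the second, gives
$$\delta G = -\int \left(\delta f_{\alpha j}\,\nabla_j\nabla_k \bar{f}_{\alpha k} + \delta \bar{f}_{\alpha k}\,\nabla_k\nabla_j f_{\alpha j}\right),$$
so that
$${{\delta G}\over{\delta f_{\alpha j}}} = -\nabla_j\nabla_k \bar{f}_{\alpha k}, \quad {{\delta G}\over{\delta \bar{f}_{\alpha k}}} = -\nabla_k\nabla_j f_{\alpha j}.$$
Under these assumptions the two conditions are complex conjugates of one another (relabel the free index), so they amount to a single equation, just as with $\text{stuff}$ and $(\text{other stuff})$ above.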