Here $p$ is the probability of a success in a single trial, $n$ is the number of trials, and $X$ is a random variable representing the number of successes in a run of $n$ trials. On average you would expect to get $pn$ successes, expected value of $X$ is $pn$. The expected value of $\frac{X}n$ is therefore $\frac{pn}n=p$. If we define $\hat p$ to be $\frac{X}n$, the fraction of trials that are successes, we expect it to be about $p$, give or take a bit.
That ‘give or take a bit’ is measured by the standard deviation $\sigma_{\hat p}=\sqrt{\frac{p(1-p)}n}$. Now when $n$ is reasonably large, both $X$, the number of successes, and $\hat p$, the fraction of successes, are approximately normally distributed. The mean of the normal distribution that approximates $\hat p$ is $p$, the expected value of $\hat p$, and its standard deviation is $\sigma_{\hat p}$.
The standard normal distribution, however, has a mean of $0$ and a standard deviation of $1$, so in order to use the standard tables prepared from it, you have to shift and rescale the distribution of $\hat p$ so that it has a mean of $0$ and a standard deviation of $1$. The formula does this in two steps. First it replaces $\hat p$ by $\hat p-p$. Thus, if you get a sample that hits the expected fraction of successes on the nose, your $\hat p$ will equal $p$, and $\hat p-p$ will be $0$. More generally, $\hat p-p$ measures measures the deviation of your sample’s fraction of successes from the expected fraction; when $\hat p-p>0$, you got more than the average number of successes, and when \hat p-p<0, you got fewer. Subtracting $p$ from $\hat p$ just shifts the centre of the distribution from $p$ down to $0$: since the mean value of $\hat p$ is $p$, the mean value of $\hat p-p$ is $0$.
Then we want to rescale the distribution to standardize its spread, so that it has a standard deviation of $1$. Right now its standard deviation is still $\sigma_{\hat p}$: shifting it down by $p$ units doesn’t change its shape, or how must it’s spread out. We want to convert each $\sigma_{\hat p}$ units of spread to $1$ unit. This is like wanting to convert a spread in feet to a spread in yards: you have to divide by the $3$ feet per yard. Here I have to divide by the $\sigma_{\hat p}$ $\hat p$-units per standard unit; the result is
$\frac{\hat p-p}{\sigma_{\hat p}}=\frac{\hat p-p}{\sqrt{p(1-p)/n}}\;.$
For convenience let’s call this quantity $Y$. $Y$ is $\hat p$ shifted down by $p$ units and rescaled to have a standard deviation of $1$. Since $\hat p$ is approximately normally distributed, so is $Y$, and its mean and standard deviation are $0$ and $1$, respectively. This means that $Y$ is approximated by the standard normal distribution.
Suppose that $Z$ is a random variable with the standard normal distribution. If 0<\alpha\le\frac12, $z_\alpha$ is by definition the number with the property that $P(Z\ge z_\alpha)=\alpha$. Thus, $P(Z\ge z_{\alpha/2})=\alpha/2$. The standard normal distribution is symmetric about its mean $0$, so $P(Z\le -z_{\alpha/2})=\alpha/2$ as well. Thus, $P(Z\le -z_{\alpha/2}\text{ or }Z\ge z_{\alpha/2})=\frac{\alpha}2+\frac{\alpha}2=\alpha\;,$ and hence P(-z_{\alpha/2} the probability that $Z$ falls between $-z_{\alpha/2}$ and $z_{\alpha/2}$ is $1-\alpha$.
Finally, $Y$ is distributed approximately like $Z$, with the same mean and standard deviation, so the same is approximately true of $Y$:
P(-z_{\alpha/2}
And since $Y=\frac{\hat p-p}{\sigma_{\hat p}}=\frac{\hat p-p}{\sqrt{p(1-p)/n}}\;,$ $(1)$ reduces to
P\left(-z_{\alpha/2} < \frac{\hat p- p}{\sqrt{p(1-p)/n}} < z_{\alpha/2}\right) \approx 1 - \alpha\;.