Ilmari and leonbloy have already said quite a bit and provided pertinent links; here are some more thoughts:
First, about the primes: the accepted answer to this question at stackoverflow sounds right to me. It's not that either the multiplier or the hash table size need to be primes; they should just be coprime. If you're writing the code for both, there's no reason for them to be prime, but if you're writing the code for only one of them (which often happens in practice), you might want to choose primes, or at least numbers with large prime factors, to reduce the chances of a "prime collision" between the multiplier and the hash table size.
Second, about hashcodes being "well distributed" or "optimally spread": That's only part of the story. For long strings, where injectivity is not an option, random spread is good, but for short strings, injectivity is even better than random spread. Ideally, a multiplier-based hash function should have a random spread for initial inputs but be injective with respect to the last few inputs.
ASCII codes for letters of the same case differ only in the low $5$ bits. So for strings ending in lowercase letters and $32$-bit hash values, you can get injectivity with respect to the last six characters if the multiplier is greater than $32$, whereas for multipliers less than $32$ the lowest bit of the penultimate input influences at most the last $5$ bits and thus its contribution isn't linearly independent from the contributions of all the other bits. In that case you're only using half the available hash values, and the values for the last six inputs form pairs that are mapped to the same hash values.
If the multiplier is greater than $32$ and the multiplication doesn't overflow, injectivity with respect to the last $6$ lowercase characters is guaranteed. This is the case up to a multiplier of $47$ (since $2^4\cdot47^5\lesssim2^{32}$), but even beyond that, the contributions from different bits are usually linearly independent (there's a good chance for $30$ random vectors in $\mathbb F_2^{32}$ to be linearly independent); the first odd multiplier for which they aren't is $95$.
So it's not immediately clear why $33$ should be better for ASCII strings than any other odd multiplier greater than $32$ with remainder $1$ mod $4$ (see Ilmari's answer). However, more generally speaking, the lower the multiplier, the more final inputs get mapped injectively before they start getting scrambled by the overflow. So it could well be that the $33$ is a good trade-off between getting the lower $5$ bits of ASCII characters out of each other's way (which requires $f\ge32$) and delaying the onset of scrambling by overflow (which requires low $f$).
A note on the difference between the XOR variant and the one where $j$ is added instead: My considerations above about linear independence apply only to the XOR case, in which we can treat the hash value as a vector in $\mathbb F_2^{32}$ and the hash function as an affine transform on that space. In the case of addition, we don't need the multiplier to be greater than $32$ for the hash to be injective with respect to the last six inputs, just greater than the number of lowercase letters, which is $26$, so in that case there's no particular advantage to $33$. That's borne out by the experimental results that leonbloy linked to, where $31$ (labeled "K&R") even does slightly better on words than $33$ (labeled "Bernstein"); you can see near the top of the page or by following the Bernstein link that these results refer to the case with addition. (This is consistent with the idea that lower multipliers are generally better as long as they map the lowercase letters injectively, but it seems to cast some doubt on Ilmari's point about the remainder mod $4$?)
Examples where the combination of $31$ and XOR fails to be injective already occur with $3$ letters; for instance, "rox" and "rng" are both mapped to 1A607
.
This is all just about the advantages of a multiplier with respect to avoiding hash collisions. Often the speed of computing the hash function is at least as important as avoiding collisions, and in that respect $33$ has the advantage over most other multipliers (though not over $31$) that it can easily be computed with a shift and an addition, as you explained.