The Fourier transform - any Fourier transform - splits a signal into "frequencies", and measures the amplitude and phase (the alignment) of each frequency.
In the case of sound, these are the pitches you can hear. But in the case of an image, things are less obvious. The mathematics is still the same, but it's harder to wrap your brain around.
The Fourier transform measures "spatial frequencies" in the image. If you imagine horizontal or vertical bars of colour repeating at different spacings, these are the "frequencies" that the Fourier transform is measuring. Much like a sound signal, an image with long, rolling, smooth colour transitions is dominated by low frequencies, whereas one with abrupt changes in colour has lots of high frequencies.
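To make the smooth-versus-abrupt idea concrete, here is a minimal sketch using NumPy's 2-D FFT. The helper name `high_frequency_fraction`, the 0.1 cycles-per-pixel cutoff, and the two test images are all made up for illustration; nothing here is a standard measure.

```python
import numpy as np

def high_frequency_fraction(image, cutoff=0.1):
    """Fraction of spectral energy above `cutoff` cycles per pixel (hypothetical helper)."""
    spectrum = np.fft.fft2(image)
    energy = np.abs(spectrum) ** 2
    # Spatial frequency of every FFT bin, in cycles per pixel.
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    energy[0, 0] = 0.0  # ignore the zero frequency (average brightness)
    return energy[radius > cutoff].sum() / energy.sum()

n = 64
x = np.arange(n)
# One slow, rolling cycle across the whole image.
smooth = np.tile(np.sin(2 * np.pi * x / n), (n, 1))
# Hard-edged vertical bars, two pixels dark then two pixels light.
stripes = np.tile((x // 2) % 2, (n, 1)).astype(float)

smooth_fraction = high_frequency_fraction(smooth)
stripes_fraction = high_frequency_fraction(stripes)
print(smooth_fraction, stripes_fraction)
```

The smooth image should put essentially none of its energy above the cutoff, while the striped one should put essentially all of it there.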
The Fourier transform thus has a couple of uses in image processing. I can think of two:
First, when you change any signal, its spectrum obviously changes as well. When you take a photograph and the camera moves, you get a blurry image. It's not at all obvious how you could try to "unblur" this image. But, when you talk about the spectrum of the image, a blur is simply a low-pass filtering operation. In principle, if you undo that filtering, you could unblur the image.
(Obviously, that's the theory. In practice, it's not that simple...)
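Here is a rough sketch of both halves of that story, assuming a known blur that wraps around the image edges (which is what multiplying FFTs gives you). The random "photograph", the 5-pixel horizontal smear, and the noise level are all invented for the example. Real deblurring has to estimate the blur and cope with noise far more carefully than this.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((32, 32))          # stand-in for a sharp photograph

# Hypothetical camera shake: smear each pixel over 5 pixels horizontally.
kernel = np.zeros_like(image)
kernel[0, :5] = 1 / 5

H = np.fft.fft2(kernel)               # how much the blur scales each frequency
blurred = np.fft.ifft2(np.fft.fft2(image) * H).real

# "Undo the filtering": divide the blurred spectrum by H.
recovered = np.fft.ifft2(np.fft.fft2(blurred) / H).real

# The same division applied to a slightly noisy blurred image.
noise_sigma = 0.01
noisy = blurred + rng.normal(0, noise_sigma, image.shape)
recovered_noisy = np.fft.ifft2(np.fft.fft2(noisy) / H).real

clean_error = np.sqrt(np.mean((recovered - image) ** 2))
noisy_error = np.sqrt(np.mean((recovered_noisy - image) ** 2))
print(clean_error, noisy_error)
```

With no noise, the division recovers the original almost exactly. With even a little noise, the frequencies that the blur nearly wiped out get divided by tiny numbers, so the noise is hugely amplified. That is one reason the practice is harder than the theory.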
Lots of other interesting things you could do to an image are quite complicated in terms of what happens to the individual pixels, but very simple in terms of how the spectrum changes. So using the Fourier transform to get you a spectrum is an obvious step.
Second, the Fourier transform is useful for image compression. If you save the individual pixel colours less accurately, the image just looks like some God-awful computer graphics from the 1980s. But if you save the spectrum less accurately, the picture just gets slightly blurry, which is far less annoying.
By doing a sophisticated analysis of the way the human visual system processes image data, you can estimate which frequencies in a given image are "the most important", and store those with high precision, while throwing away any "less important" frequencies. This is roughly how JPEG and friends work (JPEG itself uses the discrete cosine transform, a close relative of the Fourier transform).
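As a toy illustration of throwing frequencies away, the sketch below keeps only the low-frequency part of a spectrum and rebuilds the image from it. This is not how JPEG actually does it: JPEG works on 8x8 blocks with the discrete cosine transform and quantises coefficients rather than simply deleting them. The helper `keep_low_frequencies`, the cutoff, and the test image are all made up for the example.

```python
import numpy as np

def keep_low_frequencies(image, cutoff):
    """Zero every FFT coefficient above `cutoff` cycles per pixel (hypothetical helper)."""
    spectrum = np.fft.fft2(image)
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    mask = (np.abs(fx) <= cutoff) & (np.abs(fy) <= cutoff)
    approx = np.fft.ifft2(spectrum * mask).real
    return approx, int(mask.sum())

n = 64
y, x = np.mgrid[0:n, 0:n]
# Smooth, rolling content plus a faint pixel-level checkerboard as "fine detail".
smooth_part = np.sin(2 * np.pi * x / n) + 0.5 * np.cos(2 * np.pi * 3 * y / n)
fine_detail = 0.05 * (-1.0) ** (x + y)
image = smooth_part + fine_detail

approx, kept = keep_low_frequencies(image, cutoff=0.1)
rms_error = np.sqrt(np.mean((approx - image) ** 2))
print(f"kept {kept} of {image.size} coefficients, RMS error {rms_error:.3f}")
```

Here only about 4% of the coefficients survive, and the only thing lost is the faint checkerboard. The broad shapes come through untouched.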