In layman's terms:
Let's start with the Fourier series, which Fourier first described in a paper on modelling heat diffusion. The idea is that any reasonably well-behaved periodic function can be approximated by adding up lots of sine and cosine functions. The more terms you use, the more accurate the approximation becomes:
$f(x) = \frac{a_0}{2} + a_1\cos x + a_2\cos 2x + a_3\cos 3x + \cdots + b_1\sin x + b_2\sin 2x + b_3\sin 3x + \cdots$
To find the coefficients, the trick is to multiply $f$ by the matching cosine and integrate over one period: $ a_n = \displaystyle\frac{1}{\pi}\int_{-\pi}^\pi f(x) \cos(nx)\, dx, $ and similarly for the sine functions: $ b_n = \displaystyle\frac{1}{\pi}\int_{-\pi}^\pi f(x) \sin(nx)\, dx. $
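These formulae give the Fourier coefficients. Their continuous counterparts, where the whole-number frequency $n$ is replaced by a continuous frequency and the integral runs over the entire real line, are known as the Fourier cosine and sine transforms.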
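The coefficient integrals above can be approximated numerically. The following is only a minimal sketch, assuming NumPy, a midpoint Riemann sum for the integrals, and the usual convention that the constant term of the series is $a_0/2$. The function names `fourier_coefficients` and `partial_sum` and the test function $f(x) = |x|$ are chosen here for illustration and are not part of the original explanation.

```python
import numpy as np

def fourier_coefficients(f, n_terms, n_samples=20000):
    """Approximate a_0..a_N and b_0..b_N on [-pi, pi] with a midpoint Riemann sum."""
    dx = 2 * np.pi / n_samples
    x = -np.pi + (np.arange(n_samples) + 0.5) * dx   # midpoints of each cell
    fx = f(x)
    n = np.arange(n_terms + 1)[:, None]              # column of frequencies 0..N
    a = (fx * np.cos(n * x)).sum(axis=1) * dx / np.pi
    b = (fx * np.sin(n * x)).sum(axis=1) * dx / np.pi
    return a, b

def partial_sum(a, b, x):
    """Evaluate a_0/2 + sum_{n>=1} (a_n cos(nx) + b_n sin(nx)) at the points x."""
    n = np.arange(1, len(a))[:, None]
    terms = a[1:, None] * np.cos(n * x) + b[1:, None] * np.sin(n * x)
    return a[0] / 2 + terms.sum(axis=0)

f = np.abs                                 # illustrative test function: f(x) = |x|
a, b = fourier_coefficients(f, 50)
x = np.linspace(-np.pi, np.pi, 1001)

for n_terms in (1, 5, 50):
    approx = partial_sum(a[: n_terms + 1], b[: n_terms + 1], x)
    print(n_terms, np.max(np.abs(approx - f(x))))
```

The printed maximum error shrinks as more terms are kept, which is the "more terms, better approximation" behaviour described above.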
Euler's formula $e^{i\theta} = \cos(\theta) + i \sin(\theta)$ relates the complex exponential to the cosine and sine functions. The Fourier sine and cosine transforms can thus be combined into a single transform: $ \hat{f}(\xi) = \int_{-\infty}^{\infty} f(x)\ e^{- 2\pi i x \xi} \, dx, $ and this explains why you see exponential functions in the Fourier transform instead of cosine and sine functions.
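To see that the exponential form really is just the cosine and sine parts bundled together, one can compare the two numerically. This is a rough sketch, assuming NumPy and a plain Riemann sum on a truncated interval. The Gaussian $f(x) = e^{-\pi x^2}$ is an illustrative choice, picked because its transform under this convention is known to be $e^{-\pi \xi^2}$, and the function names are hypothetical.

```python
import numpy as np

# Sample a Gaussian on an interval wide enough that the tails are negligible.
x = np.linspace(-10.0, 10.0, 20001)
dx = x[1] - x[0]
f = np.exp(-np.pi * x**2)

def transform_exponential(xi):
    """Riemann-sum approximation of the integral of f(x) e^{-2 pi i x xi}."""
    return np.sum(f * np.exp(-2j * np.pi * x * xi)) * dx

def transform_cos_sin(xi):
    """Same quantity assembled from separate cosine and sine integrals."""
    cos_part = np.sum(f * np.cos(2 * np.pi * x * xi)) * dx
    sin_part = np.sum(f * np.sin(2 * np.pi * x * xi)) * dx
    # e^{-i t} = cos(t) - i sin(t), hence the minus sign on the sine part.
    return cos_part - 1j * sin_part

xi = 0.7
print(transform_exponential(xi))
print(transform_cos_sin(xi))
print(np.exp(-np.pi * xi**2))   # known closed-form transform of this Gaussian
```

The first two values agree to rounding error, and both match the closed form closely, since the sine part vanishes for an even function like this one.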
Now, in image processing we are typically working with two-dimensional images, so the Fourier transform is applied twice, once along each axis: $ \displaystyle F(u,v)=\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x,y)\ e^{-j2\pi(ux+vy)} \, dx \, dy. $ (Here $j$ is the same imaginary unit as $i$ above; $j$ is the usual notation in engineering.)
Since images actually come as discrete pixels rather than as a continuous function, the integrals are replaced by summations, which gives the discrete Fourier transform (DFT).
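The discrete, two-dimensional version can be sketched in a few lines. The example below assumes NumPy and its unnormalised forward DFT convention. It writes the double summation out directly for a small random "image" and compares it with two passes of the one-dimensional transform and with `np.fft.fft2`. The helper name `dft2_direct` and the 8×6 test array are made up for illustration.

```python
import numpy as np

def dft2_direct(image):
    """2-D DFT written out as a double sum (slow; for illustration only)."""
    M, N = image.shape
    x = np.arange(M)[:, None]   # row index of each pixel
    y = np.arange(N)[None, :]   # column index of each pixel
    F = np.zeros((M, N), dtype=complex)
    for u in range(M):
        for v in range(N):
            kernel = np.exp(-2j * np.pi * (u * x / M + v * y / N))
            F[u, v] = np.sum(image * kernel)
    return F

rng = np.random.default_rng(0)
image = rng.random((8, 6))      # stand-in for a tiny grayscale image

F_direct = dft2_direct(image)
# "Doing the transform twice": 1-D DFT down the columns, then along the rows.
F_two_pass = np.fft.fft(np.fft.fft(image, axis=0), axis=1)
F_library = np.fft.fft2(image)

print(np.allclose(F_direct, F_two_pass))
print(np.allclose(F_direct, F_library))
```

In practice you would call the library routine, which uses the fast Fourier transform; the explicit double loop is only there to show that the integrals have become sums over pixel indices.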