A coin may be biased: you get "heads" with probability $p\in(0,1)$ on each toss. The probability of getting exactly two heads in six independent trials is $\binom{6}{2}p^2(1-p)^4$. The likelihood function is $L(p) = \binom{6}{2}p^2(1-p)^4$. It's the same probability, now regarded as a function of $p$ with the observed count $2$ held fixed.
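As a quick sanity check (a minimal sketch assuming Python with `scipy`, which the text does not mention), the closed-form expression above agrees with the binomial probability mass function evaluated at the observed count $x=2$:

```python
from math import comb
from scipy.stats import binom

# Likelihood of the observed x = 2 heads in n = 6 tosses, as a function of p.
def likelihood(p):
    return comb(6, 2) * p**2 * (1 - p)**4

# The same number is the binomial pmf with the count held fixed at 2.
for p in (0.2, 1/3, 0.5, 0.8):
    print(f"p = {p:.3f}   L(p) = {likelihood(p):.4f}   "
          f"binom.pmf(2, 6, p) = {binom.pmf(2, 6, p):.4f}")
```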
With $p$ fixed, the probability of getting "heads" exactly $x$ times, viewed as a function of $x$, is the probability density function with respect to counting measure, i.e. the probability mass function. But with $x$ fixed (in the example above, $x=2$), the same expression viewed as a function of $p$ is the likelihood function
$ L(p) = \binom{6}{2}p^2(1-p)^4, $
and as a function of $p$ it is not a probability density function. The log-likelihood function is merely the logarithm of the likelihood function:
$ \ell(p) = \log\binom{6}{2} + 2\log p + 4\log(1-p). $
The logarithm is used simply because it is an easier function to differentiate. One does not usually go on to compute $L'(p)$ itself: the goal is only the location of the maximum, not the rate of change at particular points, and since $\log$ is an increasing function, $\ell$ and $L$ attain their maxima at the same value of $p$.
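To complete the example, setting the derivative of the log-likelihood to zero locates that maximum explicitly:
$ \ell'(p) = \frac{2}{p} - \frac{4}{1-p} = 0 \quad\Longleftrightarrow\quad 2(1-p) = 4p \quad\Longleftrightarrow\quad p = \tfrac13, $
so the maximum-likelihood estimate is $\hat p = 2/6 = 1/3$, the observed proportion of heads.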
Maximum-likelihood estimation is not the only use of likelihood functions. Another is this: multiply a prior probability density for $p$ by the likelihood function and then normalize, and you get the posterior probability density function of $p$. That is Bayes' theorem. Bayes himself did this originally in the context of the binomial distribution, just as in the example above. In that kind of problem one generally has no occasion to take the logarithm explicitly.
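To make this concrete: with a uniform prior on $p$ (an illustrative assumption; the text above does not fix a prior), the posterior is proportional to $p^2(1-p)^4$, which is the $\mathrm{Beta}(3,5)$ density. A minimal numerical sketch, again assuming Python with `numpy` and `scipy`:

```python
import numpy as np
from math import comb
from scipy.stats import beta

# Grid over (0, 1); the uniform prior is an illustrative assumption, not from the text.
p = np.linspace(0.001, 0.999, 999)
dp = p[1] - p[0]
prior = np.ones_like(p)
likelihood = comb(6, 2) * p**2 * (1 - p)**4

# Posterior density: prior times likelihood, normalized to integrate to 1 on the grid.
unnormalized = prior * likelihood
posterior = unnormalized / (unnormalized.sum() * dp)

# With a uniform prior the posterior is the Beta(3, 5) density; compare numerically.
print(np.max(np.abs(posterior - beta.pdf(p, 3, 5))))  # small discretization error
```

Note that the normalization is done numerically on the grid; no logarithm is needed anywhere, in keeping with the remark above.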