In general, a "feature vector" is a list of values (numbers) that captures the features of your signal that are relevant to some specific task (here, use as input to a speech recognition algorithm), in an efficient and expressive way.
A concrete example. Suppose that, as the first step in your procedure, you divide your audio signal (say, a 24 kHz mono signal) into "frames" (fragments of fixed length, say 50 ms). You are now going to build an appropriate "feature vector" for each one of these frames.
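The framing step can be sketched in a few lines. This is a minimal illustration under the numbers assumed above (24 kHz, 50 ms, non-overlapping frames); the function name `split_into_frames` is hypothetical, and the synthetic tone stands in for real audio.

```python
import math

def split_into_frames(signal, sample_rate=24000, frame_ms=50):
    """Split a mono signal (list of floats) into non-overlapping fixed-length frames.

    Trailing samples that do not fill a whole frame are dropped.
    """
    frame_len = sample_rate * frame_ms // 1000  # 24000 * 50 / 1000 = 1200 samples
    n_frames = len(signal) // frame_len
    return [signal[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]

# One second of a synthetic 220 Hz tone, standing in for real speech.
sr = 24000
signal = [math.sin(2 * math.pi * 220 * n / sr) for n in range(sr)]
frames = split_into_frames(signal, sr, 50)
print(len(frames), len(frames[0]))  # 20 1200
```

Real front ends commonly use overlapping frames (a shorter hop than the frame length) and multiply each frame by a window function; plain back-to-back slicing just keeps the idea visible.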
A frame here is composed of 1200 samples (24,000 samples/s × 0.05 s), which you store (say) in a row matrix in Matlab. Well, I could already consider this matrix a "feature vector" (it certainly represents the audio signal: each number is the signal amplitude at one instant of time). But this "trivial" vector is not very apt, because there are too many numbers and because they are not in themselves very 'expressive': I want to distinguish a vowel from a consonant, for example, and these 1200 numbers say little about it; the same speaker saying the same vowel will probably produce two such vectors that are very different. I don't want that.
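A small experiment makes the "very different vectors" point concrete. As a simplifying assumption, a pure 200 Hz tone stands in for a steady vowel; the second frame is the same tone cut 2.5 ms (half a period) later. Compared as raw 1200-sample vectors, the two versions of the "same sound" end up twice as far apart as the tone is from silence.

```python
import math

sr = 24000
frame_len = 1200  # 50 ms at 24 kHz

def tone(freq, phase=0.0):
    """A 50 ms frame of a pure tone (a crude stand-in for a steady vowel)."""
    return [math.sin(2 * math.pi * freq * n / sr + phase) for n in range(frame_len)]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a = tone(200.0)                 # the tone...
b = tone(200.0, phase=math.pi)  # ...the same tone, framed half a period (60 samples) later
c = [0.0] * frame_len           # silence

print(round(euclidean(a, b), 1))  # 49.0  (same sound, different raw vector)
print(round(euclidean(a, c), 1))  # 24.5  (tone vs. silence)
```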
A first transformation that gives a more useful feature vector is the Fourier transform of the frame (more precisely its magnitude or power spectrum; the sequence of these spectra over successive frames is the spectrogram). You probably have the basic idea (from music: graphic equalizers, etc.). Instead of having a list ("vector") of 1200 samples (each for an instant of time), I now have a "vector" of (say) 128 numbers that tell me how much energy the audio has in each "frequency band" (within that frame). This is more efficient (fewer numbers) and more expressive (perhaps I can start roughly distinguishing vowels from consonants, male from female voices, etc., just by looking at these numbers).
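The spectrum-to-bands step can be sketched as follows. This is an illustrative version, not a standard recipe: it assumes a Hann window, uses a naive O(N²) DFT to stay dependency-free (in practice you would call an FFT routine such as `numpy.fft.rfft`, or `fft` in Matlab), and sums the 601 power bins of a 1200-sample frame into 128 equal-width bands. The function names are hypothetical.

```python
import cmath
import math

def power_spectrum(frame):
    """|DFT|^2 of a Hann-windowed frame, bins 0..N/2 (naive O(N^2) DFT)."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i, x in enumerate(frame)]
    # Precompute e^{-2*pi*j*m/N}; bin k, sample i uses index (k*i) mod N.
    twiddle = [cmath.exp(-2j * math.pi * m / n) for m in range(n)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[i] * twiddle[(k * i) % n] for i in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum

def band_energies(frame, n_bands=128):
    """Sum the power spectrum into n_bands equal-width frequency bands."""
    spec = power_spectrum(frame)
    bands = [0.0] * n_bands
    for k, p in enumerate(spec):
        bands[k * n_bands // len(spec)] += p
    return bands

sr = 24000
frame = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(1200)]  # 1 kHz test tone
energies = band_energies(frame)
loudest = max(range(len(energies)), key=energies.__getitem__)
print(len(energies), loudest)  # 128 10
```

With 1200 samples at 24 kHz the DFT bins are 20 Hz apart, so band 10 collects bins 47 to 51 (roughly 940 to 1020 Hz), which is where the 1 kHz tone lands.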
Further transformations follow (MEL: rescale the frequency axis to the mel scale, which roughly matches human pitch perception; CEPSTRUM: take the log, then an inverse Fourier transform, or rather a DCT, which is conceptually similar here), and finally you trim the least important elements from your feature vector. Each step gives you a different feature vector, hopefully more efficient/expressive than the previous one. The whole procedure can sound a little complex and esoteric if you are not familiar with all this. But conceptually, from the point of view of understanding what it means to compute a suitable "feature vector", these last steps are analogous to the first one.
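Put together, the chain Fourier → mel → log → DCT → trim gives what are usually called MFCCs. The sketch below is a bare-bones version under several assumptions that are choices, not requirements: the HTK-style mel formula `2595 * log10(1 + f/700)`, 26 triangular filters, an unnormalised DCT-II, and keeping the first 13 coefficients. It skips refinements real implementations often add (pre-emphasis, liftering, delta features). All function names are hypothetical; for real work a library routine such as `librosa.feature.mfcc` is the usual choice.

```python
import cmath
import math

def power_spectrum(frame):
    """|DFT|^2 of a Hann-windowed frame, bins 0..N/2 (naive DFT, fine for a demo)."""
    n = len(frame)
    w = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i, x in enumerate(frame)]
    tw = [cmath.exp(-2j * math.pi * m / n) for m in range(n)]
    return [abs(sum(w[i] * tw[(k * i) % n] for i in range(n))) ** 2 for k in range(n // 2 + 1)]

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank_energies(spec, sr, n_filters=26):
    """MEL step: triangular filters evenly spaced on the mel scale, applied to the power spectrum."""
    bin_hz = (sr / 2) / (len(spec) - 1)  # frequency spacing of the DFT bins
    top = hz_to_mel(sr / 2)
    # n_filters + 2 edge frequencies, evenly spaced in mel between 0 Hz and Nyquist.
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    energies = []
    for j in range(n_filters):
        lo, mid, hi = edges[j], edges[j + 1], edges[j + 2]
        e = 0.0
        for k, p in enumerate(spec):
            f = k * bin_hz
            if lo < f <= mid:
                e += p * (f - lo) / (mid - lo)   # rising edge of the triangle
            elif mid < f < hi:
                e += p * (hi - f) / (hi - mid)   # falling edge of the triangle
        energies.append(e)
    return energies

def dct_ii(x):
    """Unnormalised DCT-II: c_q = sum_j x_j * cos(pi * q * (j + 0.5) / J)."""
    n = len(x)
    return [sum(x[j] * math.cos(math.pi * q * (j + 0.5) / n) for j in range(n)) for q in range(n)]

def mfcc(frame, sr=24000, n_filters=26, n_ceps=13):
    spec = power_spectrum(frame)                        # Fourier step
    mel = mel_filterbank_energies(spec, sr, n_filters)  # MEL step
    log_mel = [math.log(e + 1e-10) for e in mel]        # CEPSTRUM, part 1: log (small floor avoids log 0)
    ceps = dct_ii(log_mel)                              # CEPSTRUM, part 2: DCT
    return ceps[:n_ceps]                                # trim: keep only the first n_ceps

sr = 24000
frame = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(1200)]
features = mfcc(frame, sr)
print(len(features))  # 13
```

Each stage shrinks or reshapes the vector: 1200 samples → 601 spectral bins → 26 mel energies → 13 cepstral coefficients.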