One interpretation of the Kullback-Leibler divergence $D_{KL}(P \Vert Q)$ is that it measures the extra message length per character needed to send a message drawn from distribution $P$ using an encoding scheme optimized for distribution $Q$. If a character is assigned a very small probability by $Q$, then the encoding scheme will use a very large number of bits for that character, since it is expected to occur so rarely. (By Kraft's inequality, the ideal code length is $-\log_2(q_i)$ bits, where $q_i$ is the probability that $Q$ assigns to the character.) If that character in fact occurs frequently under $P$, then a message drawn from $P$ becomes expensive to encode: that character alone contributes $-p_i \log_2(q_i)$ bits per character of the message, on average. Therefore, if the $i$-th component of $Q$ goes to zero while the corresponding component of $P$ remains fixed and positive, the KL divergence $D_{KL}(P \Vert Q)$ should (and does) diverge logarithmically, growing as $-p_i \log_2(q_i)$.
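To make the coding interpretation concrete, recall that $D_{KL}(P \Vert Q) = \sum_i p_i \log_2(p_i / q_i)$, which equals the cross-entropy $-\sum_i p_i \log_2(q_i)$ (the average code length per character under the code optimized for $Q$) minus the entropy of $P$ (the average length under the code optimized for $P$ itself). The following Python sketch checks this numerically and exhibits the logarithmic blow-up as one component of $Q$ shrinks; the three-character distribution and the function names are illustrative choices, not anything specified above.

```python
import numpy as np

def kl_divergence_bits(p, q):
    """D_KL(P || Q) in bits: sum over i of p_i * log2(p_i / q_i)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0                      # terms with p_i = 0 contribute nothing
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

def cross_entropy_bits(p, q):
    """Average code length (bits per character) when symbols drawn from P
    are encoded with an ideal code for Q: -sum over i of p_i * log2(q_i)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return -np.sum(p[mask] * np.log2(q[mask]))

p = np.array([0.5, 0.3, 0.2])         # toy "true" source distribution P
entropy_p = cross_entropy_bits(p, p)  # bits/character under the code optimized for P

# Shrink q_3 toward zero while p_3 stays fixed at 0.2. The extra code length
# (cross-entropy minus entropy) equals D_KL(P || Q), and both grow like
# -p_3 * log2(q_3), up to a bounded constant.
for q3 in [0.2, 0.02, 0.002, 0.0002]:
    q = np.array([0.5, 0.3, 0.0])
    q[:2] *= (1.0 - q3) / q[:2].sum()  # renormalize the other entries so Q sums to 1
    q[2] = q3
    extra = cross_entropy_bits(p, q) - entropy_p
    print(f"q_3 = {q3:.4f}:  D_KL = {kl_divergence_bits(p, q):6.3f} bits,  "
          f"extra length = {extra:6.3f} bits,  "
          f"-p_3*log2(q_3) = {-p[2] * np.log2(q3):6.3f} bits")
```

In the printed output the extra code length and $D_{KL}(P \Vert Q)$ coincide exactly, and both grow by roughly $p_3 \log_2 10 \approx 0.66$ bits each time $q_3$ shrinks by a factor of ten, matching the $-p_i \log_2(q_i)$ rate of divergence described above.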