A CRC computation is as follows. You have a data polynomial
$d(x) = \sum_{i=0}^{n-1} d_ix^i$ where the $d_i \in \{0, 1\}$ are
$n$ bits to
be transmitted (or recorded). What is actually transmitted (or
recorded) is
$$t(x) = x^{16}d(x) + p(x)$$
where $p(x)$ is the ${\textit remainder~polynomial}$
when $x^{16}d(x)$ is divided by the CRC polynomial
$C(x) = x^{16} + x^{12} + x^5 + 1$. Note that polynomial division yields
$$
x^{16}d(x) = q(x)C(x) + p(x)
$$
where the quotient $q(x)$ is discarded. But the above equation
can be re-arranged as
$$
q(x)C(x) = x^{16}d(x) - p(x) = x^{16}d(x) + p(x) = t(x)
$$
(because arithmetic on the polynomial coefficients is done modulo $2$
and subtraction is thus the same as addition),
and thus $t(x)$ is a multiple of the CRC polynomial $C(x)$.
Since $p(x)$ is of degree $15$ or less, the
high-order coefficients of the polynomial $t(x)$ are the data bits
while the low-order coefficients are the so-called CRC bits or parity
bits, that is
$$
t(x) = d_{n-1}x^{n-1 + 16} + d_{n-2}x^{n-2+16}
+ \cdots d_0x^{16} + p_{15}x^{15} + p_{14}x^{14} + \cdots + p_0.
$$
In computing $p(x)$ via polynomial long division using paper and pencil,
you need to remember that
(i) $\quad \quad$ $d(x)$ is multiplied by $x^{16}$ before
beginning the long division
(ii) $\quad \quad$ the subtractions of polynomial coefficients that occur in
the long division process are all done modulo $2$ and thus are the same as additions modulo $2$ (that is, the XOR operation)
(iii) $\quad \quad$ the long division continues till $x^{16}d_0$ is processed
and the remainder is a polynomial of degree $15$ or less.
Practical CRC systems often have bells-and-whistles (such as the high-order
bits $8$ bits in $x^{16}d(x)$ are complemented before the division process
begins etc.) which I have not included above because these are likely
not of interest in this forum. However, ignoring such details in your
paper-and-pencil computations will make your results differ
from the ones your machine is giving you.
At the receiving (or reading) end, what you have is a polynomial
$r(x)$ which might be $\textit slightly$ different from $t(x)$
if transmission (or read) errors changed some of the bits. The
CRC error detection process merely divides $r(x)$ by $C(x)$; no
need to multiply $r(x)$ by $x^{16}$ first. If
the remainder is nonzero, then $r(x)$ is $\textit not$ a multiple
of $C(x)$ and so the receiver knows that transmission (or read) errors
have occurred. But if the remainder is $0$, then $r(x)$ is a
multiple of $C(x)$, and
with high probability, is the same as $t(x)$. The low-order
$16$ bits in $r(x)$ are discarded and the high-order $n$ bits
are accepted as error-free data. When the remainder is $0$, there
is a small probability that the difference between $r(x)$ and $t(x)$
is a multiple of $C(x)$. Such errors are referred to as undetected
errors. The probability of undetected error decreases as the degree
of $C(x)$ increases. For this reason, many modern systems use
CRC polynomials of degree 32.