In relation to the final equation of the example in the accepted answer (+1):
The fact that the conditional probability mass function of the random vector $\mathrm X = \left(\mathrm X_1,\mathrm X_2, \dots,\mathrm X_n \right),$ corresponding to $n$ iid samples, given a statistic $T(\mathrm X)$ of this random vector, does not depend on the population parameter $\theta$ can be understood through the partition of the sample space induced by the statistic. The intuition is that of Venn diagrams grouping together those samples of size $n$ that add up to the same value: the weak compositions of $n \bar{\mathrm x}=\sum_{i=1}^n \mathrm x_i$ into $n$ parts, counted by $[x^{n \bar{\mathrm x}}]\left(x^0+x^1+x^2+\cdots\right)^n.$ For instance, in the case of the Poisson distribution, which has support $\mathbb N\cup\{0\},$ the mean of samples of size $n=10$ would partition the sample space (diagrammatically) as

[diagram: samples of size $n=10$ grouped into cells sharing the same sum $n\bar{\mathrm x}$]
This explains why, since the event $\{\mathrm X=\mathrm x\}$ is contained in the event $\{T(\mathrm X)=T(\mathrm x)\},$
$\Pr\left(\mathrm X=\mathrm x \cap T(\mathrm X)=T(\mathrm x)\right)=\Pr\left(\mathrm X=\mathrm x\right)$
allowing the following "test" for a sufficient statistic:
$\begin{align} \Pr\left(\mathrm X=\mathrm x \vert T(\mathrm X)=T(\mathrm x)\right)&=\frac{\Pr\left(\mathrm X=\mathrm x \cap T(\mathrm X)=T(\mathrm x)\right)}{\Pr\left(T(\mathrm X)=T(\mathrm x) \right)}\\[2ex] &=\frac{\Pr\left(\mathrm X=\mathrm x \right)}{\Pr\left(T(\mathrm X)=T(\mathrm x) \right)} \end{align} $
i.e. if the ratio of the probability of the sample to the probability of the statistic is the same for all values of $\theta,$ the statistic is sufficient: $\Pr\left(\mathrm X=\mathrm x \vert T(\mathrm X)=T(\mathrm x)\right)$ does not depend on $\theta.$
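As a quick numerical illustration of this test (my own sketch, not from the original answer), take the Poisson case mentioned above: for $n$ iid Poisson$(\lambda)$ draws the sum $T(\mathrm X)=\sum_{i=1}^n \mathrm X_i$ is Poisson$(n\lambda),$ and the ratio comes out identical for every $\lambda$:

```python
from math import exp, factorial, prod

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

x = [2, 0, 3, 1, 2]    # a fixed sample of n = 5 Poisson draws
n, t = len(x), sum(x)  # T(x) = sum of the sample

for lam in (0.5, 1.0, 2.0, 5.0):
    p_sample = prod(poisson_pmf(xi, lam) for xi in x)  # P(X = x)
    p_stat = poisson_pmf(t, n * lam)                   # P(T = t): a sum of iid Poissons is Poisson(n*lam)
    print(f"lambda = {lam:3.1f}  ratio = {p_sample / p_stat:.10f}")
```

The printed ratio equals $t!\,/\left(n^{t}\prod_i x_i!\right),$ which is free of $\lambda,$ so the sum (equivalently the mean) is sufficient.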
Moving on to the example in the accepted answer (two draws from a normal $N(\mu,\sigma^2)$ distribution, $\mathrm X =(\mathrm X_1, \mathrm X_2),$ standing in for the entire sample $(\mathrm X_1, \mathrm X_2, \dots, \mathrm X_n)$ of the more general case), and transitioning from discrete probability distributions (as assumed up to this point) to continuous ones (from pmf to pdf), the joint pdf of $n$ independent (iid) Gaussians with equal variance is:
$\begin{align} f_\mathrm X\left(\mathrm x\mid\mu\right)&=\prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left({\frac{-(x_i-\mu)^2}{2\sigma^2}}\right)\\[2ex] &=\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left({\frac{-\sum_{i=1}^n(x_i-\mu)^2}{2\sigma^2}}\right)\\[2ex] &=\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left({\frac{-\sum_{i=1}^n(x_i-\bar x + \bar x -\mu)^2}{2\sigma^2}}\right)\\[2ex] &=\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left({\frac{-\left(\sum_{i=1}^n(x_i-\bar x)^2 + n(\bar x -\mu)^2\right)}{2\sigma^2}}\right) \end{align}$

where the last equality holds because the cross term $2(\bar x-\mu)\sum_{i=1}^n(x_i-\bar x)$ vanishes, since $\sum_{i=1}^n(x_i-\bar x)=0.$
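As a quick sanity check of that decomposition (my own sketch, not part of the original answer):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10)
mu, xbar = 0.7, x.mean()   # mu is arbitrary

lhs = ((x - mu) ** 2).sum()
rhs = ((x - xbar) ** 2).sum() + x.size * (xbar - mu) ** 2
print(np.isclose(lhs, rhs))  # True: the cross term vanishes
```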
The ratio of pdfs (the denominator being the pdf of the sampling distribution of the sample mean, which for the normal is $N(\mu,\sigma^2/n)$) results in
$\begin{align} \frac{f_\mathrm X(\mathrm x\mid \mu)}{q_{T(\mathrm X)}\left(T(\mathrm x)\mid \mu\right)}&=\frac{\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left({\frac{-\left(\sum_{i=1}^n(x_i-\bar x)^2 + n(\bar x -\mu)^2\right)}{2\sigma^2}}\right)} {\frac{1}{\left(2\pi\frac{\sigma^2}{n}\right)^{1/2}}\exp\left({\frac{-n(\bar x-\mu)^2}{2\sigma^2}}\right)}\\[2ex] &\propto \exp{\left(\frac{-\sum_{i=1}^n(x_i-\bar x)^2 }{2\sigma^2} \right)} \end{align}$
eliminating the dependence on $\mu.$
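The same can be verified numerically (a minimal sketch, assuming SciPy is available): whatever $\mu$ is plugged in, the ratio of the joint pdf to the pdf of the sample mean is unchanged:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, sigma = 10, 1.0
x = rng.normal(0.0, sigma, size=n)   # any fixed sample will do
xbar = x.mean()

for mu in (-1.0, 0.0, 0.5, 2.0):
    joint = norm.pdf(x, loc=mu, scale=sigma).prod()          # f_X(x | mu)
    stat = norm.pdf(xbar, loc=mu, scale=sigma / np.sqrt(n))  # pdf of N(mu, sigma^2/n) at x-bar
    print(f"mu = {mu:5.2f}  ratio = {joint / stat:.10e}")
```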
Consequently, the sample mean is a sufficient statistic. This is all beautifully explained in *Statistical Inference* by George Casella and Roger L. Berger.
In contradistinction, the maximum value of the sample, which is a sufficient statistic for a uniform $[0,\theta]$ distribution with unknown $\theta,$ would not be sufficient to estimate the mean of Gaussian samples. The histogram of the maximum value of samples of size $10$ from the uniform $[0,3]$ shows how the $\theta$ parameter is approximated, allowing the rest of the information in the sample to be discarded:

[figure: histogram of the sample maximum over repeated samples of size $10$ from the uniform $[0,3],$ piling up just below $\theta=3$]
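A sketch to reproduce a plot of this kind (the number of simulated samples, $10^5$ here, is my own choice):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
maxima = rng.uniform(0.0, 3.0, size=(100_000, 10)).max(axis=1)  # max of each sample of size 10

plt.hist(maxima, bins=60)
plt.axvline(3.0, color="red", linestyle="--", label=r"$\theta = 3$")
plt.xlabel("sample maximum")
plt.ylabel("frequency")
plt.legend()
plt.show()
```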
The maximum is simply an extreme example of taking a single random variable from the sample vector, which was posted as a counterexample to a sufficient statistic in the accepted answer.
In this case, the pdf of the statistic becomes unwieldy: the maximum of $n$ iid Gaussian draws has pdf $n\,F(x)^{n-1}f(x),$ where the normal cdf

$F(x)=\frac{1}{2}+\frac{1}{2}\operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt 2}\right)$

involves the error function, which (among other differences between the numerator and denominator of the ratio of pdfs) precludes getting rid of $\mu.$
Intuitively, knowing the maximum value of each sample does not summarize all the information regarding the population mean, $\mu,$ available in the sample. This is visually clear when plotting the sampling distribution of the means of $10^6$ simulations of $n=10$ samples from $N(0,1)$ (on the left) versus the sampling distribution of the maximum values (on the right):

[figure: left, sampling distribution of the sample mean; right, sampling distribution of the sample maximum, over $10^6$ simulated $N(0,1)$ samples of size $n=10$]
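A sketch to reproduce the two panels (assuming NumPy and matplotlib):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
samples = rng.normal(0.0, 1.0, size=(1_000_000, 10))  # 10^6 samples of n = 10 from N(0, 1)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
ax1.hist(samples.mean(axis=1), bins=100)
ax1.set_title("sampling distribution of the mean")
ax2.hist(samples.max(axis=1), bins=100)
ax2.set_title("sampling distribution of the maximum")
plt.show()
```

The means cluster around $0,$ while the maxima cluster well above it, anticipating the bias noted below.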
The latter discards information available in the complete sample that is necessary to estimate the population mean; it is also biased.