Following your own answer, we can use Derek Goldrei, Propositional and Predicate Calculus A Model of Argument (2005) for the right explanation.
He states a "standard" Hilbert-style propositional calculus; see page 87 for axioms, inference rule (only one : modus ponens) and the "basic" definition of the derivability property :
$\Gamma \vdash \varphi$ as a shorthand for ‘there is a derivation of $\varphi$ from $\Gamma$’.
Recall that :
Let $\Gamma$ be a set (possibly empty) of formulas and let $\varphi$ be a formula. A derivation of $\Gamma \vdash \varphi$ within $S$ is a finite sequence of formulas $φ_1, φ_2, φ_3, . . . , φ_n$, where the final formula $φ_n$ in the sequence is the formula $\varphi$ and the inclusion of each formula $φ_i$ can be explained in one of the following ways:
(i) $φ_i \in \Gamma$; [...]
Then see page 89 :
Theorem 3.1 (the thinning rule) : (ii) Suppose that there is a derivation of $\Gamma \vdash \varphi$ and that $\Delta$ is a set of formulas with $\Gamma ⊆ \Delta$. Then there is also a derivation of $\Delta \vdash \varphi$.
See the important note in the same page :
Theorem 3.1 is a theorem about a formal proof system, rather than a formal theorem of it. To help distinguish between these two uses of the word ’theorem’, a result about a formal system is often called a metatheorem.
Thus, as per Peter's comment, in an Hilbert-style presentation of propositional logic (like Goldrei's one) it is more correct to state the thinning rule as a meta-theore; ths means that it "works" ate the "meta-level" stating a property of the derivability relation and not at the level of the calculus (where modus ponens "works").
You are right in pointing at a difference between Goldrei's predicate calculus abd the propositional one. Consider the definition of derivation with the new generalization inference rule [page 221] :
(iv) $φ_i$ is of the form $∀xφ_j$, where $φ_j$ is a previous formula in the sequence, i.e. where $j < i$, and the quantified variable $x$ does not appear free in any formula in the set $\Gamma$.
In addition, if we have a derivation $\Gamma_0 \vdash φ$ obeying the conditions (i) to (iv) above for some subset $\Gamma_0$ of $\Gamma$, we shall also declare that $\Gamma \vdash φ$.
Thanks to this, if we have a derivation of $\Gamma \vdash φ$, this means that there is a derivation following conditions (i) to (iv) of $\Gamma_0 \vdash φ$ for some subset $\Gamma_0$ of $Γ$.
This is the thinning rule for the system. For propositional calculus this was a metatheorem about the system, but for predicate calculus it is built into the system.
The reason is explained in page 223, commenting the proof of $∀xφ(x) \vdash ∀yφ(y)$, for any formula $φ(x)$ in which $x$ appears as a free variable and for any variable $y$ which doesn’t appear in $φ(x)$.
Let’s use this derivation as an example to explain why we have built the thinning rule into the definition of $\Gamma \vdash φ$. If $x$ appears as a free variable in $φ(x)$ and $y$ doesn’t appear in $φ(x)$, then we have $∀xφ(x) \vDash ∀yφ(y)$.
This will follow from the soundness theorem for the system, but it is also easily verified directly from the definition of logical consequence. It follows straightforwardly from the properties of logical consequence that $\Delta, ∀xφ(x) \vDash ∀yφ(y)$ for any set of formulas $\Delta$ – we might say that the thinning rule applies to logical consequence – and it would then be desirable, given that we want a completeness theorem for the system, that $\Delta, ∀xφ(x) \vdash ∀yφ(y)$.
Without the thinning rule for the formal system, we would run into trouble if the set $\Delta$ contained any formulas involving $y$ as a free variable, for instance just the formula $φ(y)$. Our derivation of $∀xφ(x) \vdash ∀yφ(y)$ wouldn’t give a derivation of $φ(y), ∀xφ(x) \vdash ∀yφ(y)$ because the use of Gen on [one of the line of the derivation] would then quantify a variable appearing free in one of the assumptions, albeit an assumption not exploited anywhere in the derivation! But thanks to the thinning rule, we can say that $φ(y), ∀xφ(x) \vdash ∀yφ(y)$, because there is a derivation of $∀yφ(y)$ using a subset of the set of assumptions.