To answer your first question: the "axioms of probability" help us articulate our intuition and the properties we want a probability function to have.
For instance, if you toss a normal coin and you know that tails occurs with probability $\frac1{5}$, then the probability of heads is fixed at $\frac4{5}$. You have only one degree of freedom in this case.
Similarly, if you are rolling a normal die with $6$ faces displaying numbers from $1$ to $6$ and if you know that $2$ occurs with probability $\frac5{21}$ and $5$ occurs with probability $\frac6{97}$, then this fixes the probability of some of the other events. For instance, the probability of getting neither $2$ nor $5$ is fixed at $1 - \frac5{21} - \frac6{97}$.
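The arithmetic in the coin and die examples can be checked with exact fractions. This is only an illustrative sketch; the variable names are made up here.

```python
from fractions import Fraction

# Coin: fixing P(tails) determines P(heads) by complementation.
p_tail = Fraction(1, 5)
p_head = 1 - p_tail

# Die: fixing P({2}) and P({5}) determines P(neither 2 nor 5),
# because {2} and {5} are disjoint and the total probability is 1.
p_two = Fraction(5, 21)
p_five = Fraction(6, 97)
p_neither = 1 - p_two - p_five

print(p_head)     # 4/5
print(p_neither)  # 1426/2037
```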
The collection of sets whose probabilities become fixed once the probabilities of some sets are specified is the motivation for the definition of an algebra / $\sigma$-algebra.
Further, note that your probability need not be defined on the entire $2^S$. In fact, in many interesting cases you cannot define a function that matches our intuitive notion of probability on the entire $2^S$. (For instance, there is no non-negative, translation-invariant, countably additive function on all of $2^{\mathbb{R}}$ that assigns the interval $[0,1]$ the value $1$.)
It is enough to define the function on a $\sigma$-algebra of subsets of $S$ rather than on the entire $2^S$ (which is itself a $\sigma$-algebra). A very simple $\sigma$-algebra is $\{\emptyset,A,A^C,S\}$, called the $\sigma$-algebra generated by the set $A$. It allows us to talk about the probability of the event $A$: defining the probability of $A$ fixes the probability of every other element of the $\sigma$-algebra generated by $A$. More generally, if you are interested in some collection of events, say $\mathcal{C} = \{A_{\alpha}\}_{\alpha \in \Gamma}$, it is enough to restrict the definition of probability to the $\sigma$-algebra generated by $\mathcal{C}$, denoted by $\sigma(\mathcal{C})$.
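For a finite sample space, the generated $\sigma$-algebra can be computed by brute force: start from the generating sets and keep adding complements and unions until nothing new appears. The sketch below assumes a finite sample space (where closure under finite unions is all that is needed); the function name `generated_sigma_algebra` is hypothetical and chosen only for this illustration.

```python
from itertools import combinations

def generated_sigma_algebra(sample_space, generators):
    """Smallest family containing the generators that is closed under
    complement and union. Only valid for a finite sample space."""
    omega = frozenset(sample_space)
    family = {frozenset(), omega} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(family)
        # Close under complements.
        for a in current:
            complement = omega - a
            if complement not in family:
                family.add(complement)
                changed = True
        # Close under pairwise unions.
        for a, b in combinations(current, 2):
            union = a | b
            if union not in family:
                family.add(union)
                changed = True
    return family

die = {1, 2, 3, 4, 5, 6}
generated_by_one_set = generated_sigma_algebra(die, [{2}])
generated_by_two_sets = generated_sigma_algebra(die, [{2}, {5}])

print(len(generated_by_one_set))   # 4
print(len(generated_by_two_sets))  # 8
```

With the single generator $\{2\}$ the result is $\{\emptyset, \{2\}, \{2\}^C, S\}$. With the generators $\{2\}$ and $\{5\}$ the result has eight elements, one of which is the event "neither $2$ nor $5$" from the die example.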
For the second question, "How do we know that such a function has anything to do with whatever 'probability' means 'in the real world'? All we've done is identify real numbers in $[0,1]$ with subsets of the set $S$":
All we have done so far is, as you said, assign to each set a number between $0$ and $1$ in a way that satisfies certain axioms. It is up to you to identify this with probability in the "real world".
For instance, let $S = \{0,1\}$ and define $\mathbb{P}(\{0\}) = \frac1{147}$. You may or may not choose this to model the toss of a coin. If you believe the coin is fair, you will not choose this model since this says the coin is biased.
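To see that this biased assignment is a perfectly valid probability in the axiomatic sense, one can check the axioms directly on all four events of $2^S$. This is a minimal sketch; the names `point_mass`, `prob`, and `events` are illustrative only.

```python
from fractions import Fraction
from itertools import combinations

S = (0, 1)
# Fixing P({0}) = 1/147 forces P({1}) = 146/147.
point_mass = {0: Fraction(1, 147), 1: 1 - Fraction(1, 147)}

def prob(event):
    """Probability of an event, i.e. of a subset of S."""
    return sum((point_mass[s] for s in event), Fraction(0))

# All subsets of S: the empty set, {0}, {1}, and S itself.
events = [frozenset(c) for r in range(len(S) + 1) for c in combinations(S, r)]

# Axioms: non-negativity, total mass 1, additivity on disjoint events.
non_negative = all(prob(e) >= 0 for e in events)
total_is_one = prob(frozenset(S)) == 1
additive = all(
    prob(a | b) == prob(a) + prob(b)
    for a in events for b in events if not (a & b)
)

print(non_negative, total_is_one, additive)  # True True True
print(prob(frozenset({1})))                  # 146/147
```

The axioms hold, so the mathematics has no objection to this model. Whether it describes a particular coin is a separate question.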
The theory sets up the framework, and it is up to the modeler to check whether the model fits "reality" or the problem they are interested in.