Why Zariski topology?
Why do we usually consider the Zariski topology on $\mathbb A^n_k$ in algebraic geometry? Ultimately, it seems a not very interesting topology: in fact, the open sets are very large and it doesn't satisfy the Hausdorff separation axiom. OK, the basis is very simple, but what are the advantages?
- It seems to be the natural candidate. It allows spaces to be built from algebraic equations, and algebraic geometry is exactly that subject: building geometries from algebra. – 2012-06-22
- Dear Galoisfan, these answers are related: http://math.stackexchange.com/a/53931/221 and http://mathoverflow.net/questions/21502/what-is-the-zariski-topology-good-bad-for/21529#21529 Regards, – 2012-06-23
- "it seems a not very interesting topology": aha, so basically anything one doesn't understand yet is not very interesting? The Zariski topology is very, very, very natural: you just want the zero-sets $\{f = 0\}$ to be closed, and evaluation has to take place in residue fields. Thus you arrive at $\{\mathfrak{p}: f \in \mathfrak{p}\}$ as basic closed subsets. The same is true *verbatim* for manifolds. Also, it is a misconception that non-Hausdorff spaces are pathological. Some people try to make you believe this, without any evidence. – 2012-06-23
- I think the fact that open sets are very large and easy to play with is precisely the point of the Zariski topology: in algebraic geometry there is no pressure to consider $\mathbb{Q}(i) \subset \mathbb{C}$ topologically. – 2017-10-05
6 Answers
To appreciate the Zariski topology it helps to have a fairly broad view about what a topological space is. Topological spaces in full generality are, confusingly, not very topological in the naive sense! As discussed in this math.SE question, I think it is better to think of point-set topology as being about semidecidable properties (which are the open sets). The familiar kind of topology induced by a metric is about the specific property of being close in a metric sense, but other kinds of topologies are about different kinds of properties.
The Zariski topology is about the property of non-vanishing of polynomials. The semidecidable properties here are the properties "some polynomial in this set does not vanish here." Intuitively speaking, the reason this is semidecidable is that you can compute the value of a polynomial at a point to finite precision, and once you show that it is sufficiently different from zero, it cannot be zero.
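To make the semidecidability claim concrete, here is a minimal Python sketch for one real point and one polynomial (an illustration of my own, not taken from the answer). It assumes the point is only accessible through rational approximations: a hypothetical callable `approx(n)` returning a rational within $2^{-n}$ of the point. The polynomial is evaluated with interval arithmetic at increasing precision; if the enclosure of the value ever excludes $0$, non-vanishing is confirmed. If the polynomial does vanish there, no finite precision can confirm anything, which is exactly the one-sided behaviour of a semidecidable (open) property. The cutoff `max_precision` exists only so that the sketch terminates.

```python
from fractions import Fraction
from math import isqrt


def interval_mul(a, b):
    # Product of the closed intervals a = [a0, a1] and b = [b0, b1].
    products = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(products), max(products))


def eval_on_interval(coeffs, lo, hi):
    # Horner's rule in interval arithmetic. coeffs lists the coefficients from
    # the leading one down to the constant term; the returned interval
    # contains f(t) for every t in [lo, hi].
    x = (lo, hi)
    acc = (Fraction(coeffs[0]), Fraction(coeffs[0]))
    for c in coeffs[1:]:
        acc = interval_mul(acc, x)
        acc = (acc[0] + c, acc[1] + c)
    return acc


def confirm_nonvanishing(coeffs, approx, max_precision=60):
    # approx(n) is assumed to return a rational q with |q - x| <= 2**-n.
    # Returns True as soon as the enclosure of f(x) excludes 0. If f(x) == 0
    # this can never happen, so after max_precision refinements we return
    # None ("not confirmed yet"), never False.
    for n in range(max_precision + 1):
        q = approx(n)
        err = Fraction(1, 2 ** n)
        lo, hi = eval_on_interval(coeffs, q - err, q + err)
        if lo > 0 or hi < 0:
            return True
    return None


def sqrt2_approx(n):
    # floor(sqrt(2) * 2**n) / 2**n, which lies within 2**-n of sqrt(2).
    return Fraction(isqrt(2 * 4 ** n), 2 ** n)
```

With `sqrt2_approx`, the check that $x^2-3\neq0$ at $\sqrt2$ succeeds after a couple of refinements, while $x^2-2$, which vanishes at $\sqrt2$, is never confirmed and the call returns `None`.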
The fact that the Zariski topology isn't Hausdorff isn't a weird property of the Zariski topology; it tells you something important about how vanishing of polynomials behaves, namely that the behavior of a polynomial on a few points can tell you a lot about its behavior at seemingly far-away points. This is intrinsic to the nature of algebraic geometry and pretending that the Zariski topology doesn't exist won't make it go away.
Okay, so what can you actually do with it? Here are a couple of things:
- If two polynomials agree on a Zariski-dense subset, then they agree identically. This is a surprisingly useful way to prove polynomial identities; for example, it can famously be used to prove the Cayley-Hamilton theorem.
- Moving to the Zariski topology on schemes allows the use of generic points. I am not familiar with examples of this technique in use though.
- Serre famously made use of the Zariski topology to introduce sheaf cohomology to algebraic geometry, which was (as I understand it) a crucial innovation.
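As a sanity check on the identity that the density argument proves (not a proof, and not the density argument itself), the following Python sketch computes characteristic polynomials with the Faddeev-LeVerrier recurrence, a standard method chosen here only for convenience, and verifies $p_A(A)=0$ exactly over the rationals. It deliberately includes non-diagonalizable matrices, which lie outside the Zariski-dense set of matrices with distinct eigenvalues where the identity is easy to see; the density argument is what guarantees it there too. All function names are my own.

```python
from fractions import Fraction


def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def add_scaled_identity(X, s):
    # Returns X + s * I.
    n = len(X)
    return [[X[i][j] + (s if i == j else 0) for j in range(n)] for i in range(n)]


def trace(X):
    return sum(X[i][i] for i in range(len(X)))


def char_poly(A):
    # Faddeev-LeVerrier: returns [c_0, ..., c_n] with det(tI - A) = sum c_k t^k.
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    c = [Fraction(0)] * (n + 1)
    c[n] = Fraction(1)
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        M = add_scaled_identity(matmul(A, M), c[n - k + 1])
        c[n - k] = -trace(matmul(A, M)) / k
    return c


def eval_poly_at_matrix(c, A):
    # Horner's rule with a matrix argument: computes sum_k c_k A^k.
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    result = [[Fraction(0)] * n for _ in range(n)]
    for coeff in reversed(c):
        result = add_scaled_identity(matmul(result, A), coeff)
    return result


def cayley_hamilton_holds(A):
    P = eval_poly_at_matrix(char_poly(A), A)
    return all(entry == 0 for row in P for entry in row)
```

For the Jordan block $\begin{pmatrix}1&1\\0&1\end{pmatrix}$, `cayley_hamilton_holds` returns `True`, whereas evaluating the wrong polynomial $t^2-1$ at the same matrix gives the nonzero matrix $\begin{pmatrix}0&2\\0&0\end{pmatrix}$.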
To really appreciate the Zariski topology it helps to generalize it to arbitrary commutative rings. An important motivational example: if $X$ is a compact Hausdorff space and $C(X)$ is the ring of continuous functions $X \to \mathbb{R}$, then the maximal spectrum of $C(X)$ not only can be identified with $X$, but has the same topology! (This is an exercise in Atiyah-Macdonald.)
The rings one gets in this way are precisely the self-adjoint parts of unital commutative complex C*-algebras by the commutative Gelfand-Naimark theorem, and in fact you get a (contravariant) equivalence of categories. Moreover, by the Serre-Swan theorem, the category of real vector bundles on $X$ is naturally equivalent to the category of finitely generated projective modules over $C(X)$.
It helps to think about this example like a physicist. Think of $X$ as the set of possible states of some physical system and the elements of $C(X)$ as observations one can make about the system; the value of a function at a point is the result of the observation in a fixed state. The Zariski topology here captures all semidecidable properties that you can decide using the observations in $C(X)$. For example, if one of the functions in $C(X)$ is called "temperature," there is a corresponding semidecidable property "the temperature of the system is strictly between $0$ and $100$ degrees," which you can confirm by computing the temperature to finite precision.
(What if $X$ is not compact? Then if you work with the ring $C_b(X)$ of bounded continuous functions on $X$, there are consistent sets of possible values of the observables which do not arise from an actual state of your system; they are points in the Stone-Čech compactification $\beta X$ instead.)
Here's another example that I like: let $B$ be a Boolean ring, which is a ring satisfying $b^2 = b$ for all $b \in B$. Then every element of $B$ can be identified with a subset of the maximal spectrum of $B$, namely the set of maximal ideals not containing it. This idea can be used to
- prove Stone's representation theorem for Boolean algebras,
- deduce the existence of ultrafilters from the existence of maximal ideals in rings, and
- prove the compactness theorem in propositional logic (without proving the completeness theorem)!
For a discussion, see my blog post Boolean rings, ultrafilters, and Stone's representation theorem.
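Here is a small Python sketch of this identification for the Boolean ring of all subsets of a three-element set (addition is symmetric difference, multiplication is intersection); the finite example is my own choice, and Stone's theorem is what extends the picture to arbitrary Boolean rings. It finds the prime ideals by brute force (in a Boolean ring these are exactly the maximal ideals, since $B/\mathfrak p$ is a Boolean integral domain and hence $\mathbb F_2$), and then checks that sending $b$ to the set of maximal ideals not containing $b$ is injective and turns $+$ and $\cdot$ into symmetric difference and intersection.

```python
from itertools import combinations

POINTS = (0, 1, 2)
# Boolean ring of all subsets of POINTS: addition is symmetric difference,
# multiplication is intersection, 0 is the empty set and 1 is POINTS itself.
ELEMENTS = [frozenset(s) for r in range(len(POINTS) + 1)
            for s in combinations(POINTS, r)]
ZERO, ONE = frozenset(), frozenset(POINTS)


def add(a, b):
    return a ^ b


def mul(a, b):
    return a & b


def is_prime_ideal(I):
    if ZERO not in I or ONE in I:
        return False
    if any(add(a, b) not in I for a in I for b in I):
        return False
    if any(mul(a, r) not in I for a in I for r in ELEMENTS):
        return False
    # Primality: ab in I forces a in I or b in I.
    return all(a in I or b in I
               for a in ELEMENTS for b in ELEMENTS if mul(a, b) in I)


def prime_ideals():
    # Brute force over all 2**8 subsets of the 8-element ring.
    found = []
    for mask in range(1 << len(ELEMENTS)):
        I = frozenset(e for i, e in enumerate(ELEMENTS) if (mask >> i) & 1)
        if is_prime_ideal(I):
            found.append(I)
    return found


PRIMES = prime_ideals()


def support(b):
    # The subset of the spectrum attached to b: the maximal ideals missing b.
    return frozenset(P for P in PRIMES if b not in P)
```

There turn out to be exactly three maximal ideals, one for each point $s$ (the subsets not containing $s$), so `support` recovers each subset from its "points".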
- I think that it’s better to think of point-set topology as being about topological spaces *tout court*. That includes spaces with little additional structure every bit as much as it includes spaces with fancy structure. You’re really talking not about how to think of point-set topology, but rather about how to think of its applications. – 2012-06-22
- @Qiaochu Yuan♦, your answers are always illuminating. You are younger than me and your preparation exceeds that of many professors that I have seen in my life. Congratulations! – 2012-06-22
- I think "between 0 and 100 degrees inclusive" should be "between 0 and 100 degrees exclusive". – 2012-06-23
I would just like to mention a pleasant feature of the Zariski topology which is, to my knowledge, never addressed in algebraic geometry books (counterexample, anybody?).
The Zariski topology is never Hausdorff in positive dimension, but apart from that it behaves like a normal ($T_4$) space in the affine case, in the following sense.
This means that for an affine variety (or an affine scheme) $X$, given two disjoint closed subsets $C,D\subset X$ there exists a regular function $f\in \mathcal O(X)$ with $f(c)=1$ for all $c\in C$ and $f(d)=0$ for all $d\in D$.
Even more astonishingly, you can take arbitrary regular functions $g\in \mathcal O(C)$, $h \in \mathcal O(D)$ and interpolate them to an $f\in \mathcal O(X)$ such that $f|_C=g$ and $f|_D=h$.
This is due to the fact that in algebraic geometry you define the functions first, the polynomials (or one of their quotient rings), and then you deduce from them a topology.
In classical topology (as in calculus or analysis) you define the topological space first (through a metric, say) and then you investigate the continuous functions on these spaces.
And so it can happen (in contrast to algebraic geometry) that you don't have enough functions to separate disjoint closed subsets from each other.
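The interpolation statement can be checked in a few lines for an affine scheme $X=\operatorname{Spec}A$ with $C=V(I)$, $D=V(J)$, $\mathcal O(C)=A/I$ and $\mathcal O(D)=A/J$ (a sketch; the letters $a$, $b$, $G$, $H$ are introduced only for this argument). Since $V(I+J)=C\cap D=\emptyset$ and every proper ideal lies in a maximal ideal, $I+J=A$, so we may write

$$1=a+b,\qquad a\in I,\quad b\in J.$$

Choose lifts $G,H\in A$ of $g\in A/I$ and $h\in A/J$ (possible because $A\to A/I$ and $A\to A/J$ are surjective) and put

$$f=G\,b+H\,a.$$

Modulo $I$ we have $a\equiv0$ and $b=1-a\equiv1$, so $f\equiv G\equiv g$; modulo $J$ we have $b\equiv0$ and $a\equiv1$, so $f\equiv H\equiv h$. Taking $g=1$, $h=0$ gives the separating function $f=b$. For instance, in $A=k[x]$ with $C=\{0\}=V(x)$ and $D=\{1\}=V(x-1)$, the identity $1=x+(1-x)$ gives $f=1-x$, which equals $1$ at $0$ and $0$ at $1$.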
Edit
In the same vein (equivalently, really), let me mention that affine algebraic varieties (or affine schemes) satisfy the analogue of the Tietze extension property: every regular function on a closed subset $C\subset X$ extends to a regular function on $X$.
In the language of schemes, this is the absolute triviality that, for $C=V(I)$, the morphism $\mathcal O(X)=A \to \mathcal O(C)=A/I$ is surjective!
And it is a triviality because it is built into the foundations of algebraic geometry: the Zariski topology is constructed out of the functions (and Grothendieck's genius was to force every element of any commutative ring to be a function!).
- Dear Georges: there seems to be a typo in your second paragraph: I think you meant to say "$f(d)=0$ for all $d \in D$". (What you wrote is correct, but perhaps less interesting.) – 2013-10-24
- Dear @Asal, I have corrected my typo: thanks a lot for drawing my attention to it. That what I actually wrote was "perhaps less interesting" must be the understatement of the day: congratulations for being so polite and amusing at the same time! – 2013-10-27
- Dear Georges, you're welcome. I find that a little _litotes_ can be not ineffective in conveying a point gently. As for politeness, I take my lead from you! – 2013-10-27
- Dear @Georges, is a continuous function with respect to the Zariski topology necessarily regular? If not, we cannot say this satisfies the T4 axiom. – 2017-06-21
- Dear @Hang: you are right! My remark only points to an analogy: regular functions on a variety behave like continuous functions on a T4 space. But indeed, as you correctly remark, continuous functions needn't be regular. – 2017-06-21
I'd like to give my personal perspective, which I believe is a more elementary version of Zhen Lin's. What I have been able to explain for myself is a) why the Zariski topology is natural to consider when talking about vanishing sets, b) how the Zariski topology on sets of prime ideals of a ring $R$ suggests that locally ringed spaces are good general objects for geometric considerations, c) why the Zariski topology on the set of all prime ideals $\DeclareMathOperator{\Spec}{Spec}\Spec R$ gives us affine schemes, and d) when it is OK to use the Zariski topology only on the set of maximal ideals $\DeclareMathOperator{\maxSpec}{maxSpec}\maxSpec R$ (so in particular why, over an algebraically closed $\Bbbk$, we can think of $\mathbb A^n_\Bbbk$ as $n$-tuples $(a_1,\dots,a_n)$ of elements of $\Bbbk$ identified with the maximal ideals $\left\langle x_1-a_1,\dots,x_n-a_n\right\rangle$).
Vanishing Sets and the Zariski Topology
Imagine that we have a ring $R$ and a set (space) $X$, and that we think of $R$ as "functions" on $X$, in the sense that for every $x\in X$ there is a set $R_x$ of "values at $x$" such that we can think of $x$ as a (surjective) evaluation function $x\colon R\to R_x$ given by $f\mapsto f(x)$. If we try to axiomatize the notion "$f\in R$ vanishes at $x\in X$", we arrive at:
- $f\in R$ and $g\in R$ have the same value at $x$ if and only if $f-g$ vanishes at $x$;
- if $f$ vanishes at $x$, then $fg$ also vanishes at $x$ for every $g\in R$.
These are enough to ensure that every $x\colon R\to R_x$ induces a ring structure on $R_x$ such that the set of functions $f$ vanishing at $x$ is precisely the ideal $\ker x\subset R$. Requiring that the constant unit (i.e. $1$) does not vanish anywhere ensures that these ideals are proper, i.e. none of the $R_x$ is the zero ring.
It is not difficult to show that, given a set of points $S\subset X$ in our space, the set of functions in $R$ vanishing on $S$ is an ideal $I(S)$ of $R$, and in particular that it is the intersection of the ideals $\ker x$ associated to the points $x\in S$, i.e. $I(S)=\bigcap\{\ker x\colon x\in S\}$. Similarly, given any set of functions $J\subset R$, the vanishing set $V(J)=\{x\in X\colon f(x)=0\text{ for all }f\in J\}$ can be described as the set of points $x$ whose associated ideal $\ker x$ contains $J$, i.e. $V(J)=\{x\in X\colon J \subset \ker x\}$.
Since the idea of algebraic geometry is to establish geometric objects as zero-loci of functions, that is, as vanishing sets, we care about the following easy-to-check properties of the operator $V$:
- $V(J)=V(\langle J\rangle)$, where $\langle J\rangle$ is the ideal generated by $J$, so from now on we'll only consider ideals of $R$ as our sets of functions $I$, $J$, etc.
- $V(0)=X$
- $I\subset J$ implies $V(J)\subset V(I)$
- $V(\sum_\lambda I_\lambda)=\bigcap_\lambda V(I_\lambda)$
- $V(I)\cup V(J)\subset V(I\cap J)$
The last statement is NOT an equality in general. Indeed, if $\ker x$ is not a prime ideal, there may be $f,g\notin\ker x$ with $(f)\cap(g)\subset\ker x$ (for instance $R=\mathbb Z$, $\ker x=6\mathbb Z$, $f=2$, $g=3$); then $x\notin V(f)\cup V(g)$, but $x\in V((f)\cap (g))$. This is bad, since it means that the vanishing sets in this general context are not necessarily closed under finite unions, which makes it extremely difficult to effectively decompose them into smaller pieces. Pretty much the only easily verifiable condition guaranteeing closure under finite unions is to require that all the associated ideals $\ker x$ are prime (so that the $R_x$ are integral domains), in which case the vanishing sets $V(I)$ satisfy the axioms for closed sets of a topology, which I call the Zariski topology induced by $R$ on $X$.
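The failure described above can be reproduced in a small Python model (a toy setup of my own): $R=\mathbb Z$, and a finite set of points, each an evaluation map $\mathbb Z\to\mathbb Z/n$ recorded by its modulus $n$, so that $f$ vanishes at $n$ exactly when $n\mid f$. With a point whose kernel $6\mathbb Z$ is not prime, $V(2)\cup V(3)$ is not a vanishing set at all; with only prime kernels the problem disappears. Since $\mathbb Z$ is a principal ideal domain it suffices to test principal ideals, and the finite search range only illustrates what the divisibility argument proves.

```python
from math import lcm  # Python 3.9+


def vanishes(f, n):
    # f lies in the kernel nZ of the evaluation map Z -> Z/n.
    return f % n == 0


def V(generators, points):
    # Points at which every generator (hence the whole ideal) vanishes.
    return {n for n in points if all(vanishes(f, n) for f in generators)}


def union_defect(f, g, points):
    # Points of V((f) intersect (g)) missing from V(f) union V(g).
    # In Z, the intersection (f) cap (g) is the principal ideal (lcm(f, g)).
    return V([lcm(f, g)], points) - (V([f], points) | V([g], points))


def is_vanishing_set(subset, points, search=range(100)):
    # Z is a PID, so every vanishing set is V((k)) for a single k; only a
    # finite range of k is searched, which illustrates rather than proves.
    return any(V([k], points) == subset for k in search)


MIXED = {2, 3, 5, 6}     # the point 6 has the non-prime kernel 6Z
PRIMES_ONLY = {2, 3, 5}  # every kernel is a prime ideal
```

Here `union_defect(2, 3, MIXED)` is `{6}`, and no $k$ satisfies $V((k))=\{2,3\}$ once the point $6$ is present, because anything divisible by $2$ and $3$ is divisible by $6$.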
Note that $X$ can be thought of as a multiset of prime ideals of $R$.
Vanishing Sets and Locally Ringed Spaces
We want to do more: we want to study the sheaf of vanishing sets on $X$. Of course, this makes no sense as I've stated it, since sheaves are defined relative to a topology (roughly, if something is a local phenomenon, then it is a sheaf), and we have not specified a topology on $X$. Observe, however, that being a closed set in a topology is a local property in that topology, in the sense that if $X=\bigcup_i U_i$ is an open cover and each $S\cap U_i$ is closed in $U_i$, then $S$ is closed in $X$. It follows that under the Zariski topology on $X$ induced by $R$, vanishing sets are a sheaf.
But if vanishing sets are a sheaf, and each vanishing set is given by a ''function'' in $R$ on $X$, we had better make ''functions'' on $X$ into a sheaf as well. There is essentially one reasonable way to do this, namely by appropriately restricting functions in $R$ to open subsets $U$.
First, a simplification. Since every vanishing set is an intersection of hypersurfaces (vanishing sets of single ''functions'': since $I=\sum_\alpha (f_\alpha)$, we have $V(I)=\bigcap_{\alpha}V(f_\alpha)$), it is completely useless to have ''functions'' $f\in R$ that do not vanish at any point $x$: they provide extra ideals which say nothing about the points of the space. It is clear that we should demand that any $f$ that does not vanish anywhere should be a unit of $R$, and to achieve this we may replace $R$ with its localization $S^{-1}R$ at the multiplicative system $S=\{f\in R\colon f(x)\neq0\ \forall x\in X\}$ (the system is multiplicative since the $R_x$ are integral domains). This leaves the vanishing sets exactly the same, while giving us a slightly simpler ring to encode them (the fewer ideals, the better).
Having said this, suppose that we have an open set $U\subset X$. Whatever ring $R_U$ we associate to $U$, we want its vanishing sets to be closed sets of $U$. We should also have a restriction map $\DeclareMathOperator{\res}{res}\res_{X,U}\colon R\to R_U$ to tell us how to restrict ''functions'' on $X$ to ''functions'' on $U$. This map should be a ring homomorphism if we have any sense of decency (plus the preimage under it of the ideal of $R_U$ vanishing at $S\subset U$ has to be the ideal of $R$ vanishing at $S\subset X$). Furthermore, given the above convention, if $f\in R$ does not vanish at any point of $U$, then it should get sent to a unit. Hence, $R_U$ will necessarily admit a homomorphism from the localization $S^{-1}R$, where $S=\{f\in R\colon f(x)\neq0\ \forall x\in U\}$. Thus, we can define what I call the ''structure presheaf'' $\mathscr F_X$ by $\mathscr F_X(U)=S^{-1}R$ for $S=\{f\in R\colon f(x)\neq0\ \forall x\in U\}$, and take $\res_{U,V}$, the restriction map from functions on $U$ to functions on $V$, to be the localization of $R_U=\mathscr F_X(U)$ at $S=\{f\in R_U\colon f(x)\neq0\ \forall x\in V\}$.
A key property of this presheaf is that its stalks are local rings and that they encode vanishing. In particular, since $f$ vanishes at a point $x$ if and only if $f\in\ker x$, it is not hard to see that the stalk $\mathscr F_{X,x}$ at $x$ is the localization of $R=\mathscr F_X(X)$ at the prime ideal $\ker x$, and hence that $f$ vanishes at a point $x$ if and only if $f$ is a non-unit in the stalk $\mathscr F_{X,x}$. Consequently, the sheafification $\mathscr O_X$ of $\mathscr F_X$ is precisely what I call the ''structure sheaf'' of $X$ (remember that $X$ is a multiset of prime ideals, not the set of all prime ideals, which is the usual context for the structure sheaf). This sheaf is quite elusive, but it has the property that $(X,\mathscr O_X)$ is a locally ringed space (its stalks are local rings), and that vanishing sets can be extracted from the stalks $\mathscr O_{X,x}$: $f\in\mathscr O_X(U)$ vanishes at a point $x$ if $f$ localizes to a non-unit in $\mathscr O_{X,x}$. Hence the study of vanishing sets becomes a special case of the study of locally ringed spaces!
Affine Schemes -- the most basic locally ringed spaces
Why is the structure sheaf $\mathscr O_X$ elusive (the one from above, for $X$ a set with a ring $R$ of ''functions'' on it)? Because $\mathscr O_X(U)$ is not necessarily $\mathscr F_X(U)$, the localization of $R$ at the set of functions that vanish nowhere on $U$. In fact, the top ring $\mathscr O_X(X)$ itself is not necessarily $R=\mathscr F_X(X)$, which means that it's actually really hard to compute $\mathscr O_X$. Indeed, the path to affine schemes begins with trying to compute $\mathscr O_X$.
One easy explicit description of $\mathscr O_X$ comes from noticing that if for any $f\in R$ we set $X_f=X\setminus V(f)$, then the $X_f$ form a basis of open sets for $X$ in the Zariski topology, since $X_{fg}=X_f\cap X_g$ (because every $\ker x$ is prime) and $\bigcup_\alpha X_{f_\alpha}=X\setminus V(\sum_\alpha(f_\alpha))$.
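For $R=\mathbb Z$ and $X=\operatorname{Spec}\mathbb Z$ the two facts used above can be checked directly. The Python sketch below (my own illustration) truncates $\operatorname{Spec}\mathbb Z$ to the generic point $(0)$ and the primes below $50$, an assumption made only to keep the sets finite, and uses the fact that in $\mathbb Z$ the ideal $\sum_\alpha(f_\alpha)$ is generated by $\gcd_\alpha f_\alpha$.

```python
from functools import reduce
from math import gcd

# A truncated model of Spec Z: the generic point (0), encoded as 0,
# together with the primes below 50.
POINTS = [0] + [p for p in range(2, 50) if all(p % d for d in range(2, p))]


def vanishes(f, p):
    # f lies in the prime ideal pZ (for p = 0 this means f == 0).
    return f == 0 if p == 0 else f % p == 0


def basic_open(f):
    # X_f = X \ V(f): the points where f does not vanish.
    return frozenset(p for p in POINTS if not vanishes(f, p))


def V_of_ideal_sum(fs):
    # In Z the ideal (f_1) + ... + (f_k) is principal, generated by the gcd.
    g = reduce(gcd, fs)
    return frozenset(p for p in POINTS if vanishes(g, p))
```

For example, $X_{12}\cup X_{18}$ misses exactly the points $(2)$ and $(3)$, which is $X\setminus V((12)+(18))=X\setminus V(6)$, and adding $X_{35}$ covers everything because $\gcd(12,18,35)=1$.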
We know that $\mathscr F_X(X_f)=R_f$, where $R_f=S^{-1}R$ is the localization of $R$ at $S=\{g\in R\colon g(x)\neq0\ \forall x\in X_f\}$, or equivalently (whenever the condition below holds) at $S=\{1,f,f^2,\dots\}$. Hence, the structure sheaf $\mathscr O_X$ is fully determined by the $R_f$ according to the rule $\mathscr O_X(U)=\varprojlim_{X_f\subset U}R_f$. This is still really hard to compute unless a certain miracle occurs, which is the following: if $V(f)\subset V(J)$ always implies $J\subset\DeclareMathOperator{\rad}{rad}\rad f$, then the $X_f$ satisfy what Eisenbud and Harris call the $\mathscr B$-sheaf axioms, which imply that $\mathscr O_X(X_f)=\mathscr F_X(X_f)=R_f$.
Why doesn't $V(f)\subset V(J)$ always imply $J\subset\rad f$? Well, we certainly have $I(V(J))\subset I(V(f))$, and $J\subset I(V(J))=\bigcap\{\ker x\colon J\subset\ker x\}$, but $I(V(f))=\bigcap\{\ker x\colon f\in\ker x\}$, which could be strictly bigger than $\rad f=\bigcap\{\mathfrak p\subset R\colon f\in\mathfrak p\}$, as not all prime ideals of $R$ are necessarily $\ker x$ for some $x\in X$. Requiring that every prime ideal $\mathfrak p\in\Spec R$ correspond to an $x\in X$, and removing the unnecessary duplicates (two points $x$ and $y$ are not distinguishable by vanishing sets if $\ker x=\ker y$), we obtain that $X=\Spec R$ is a simple sufficient condition for the structure sheaf $\mathscr O_X$ to be mostly computable (we would know that $\mathscr O_X(X_f)=R_f$).
$\maxSpec$
So far I have explained (to the best of my ability) why the Zariski topology on $\Spec R$ is natural. This does not quite answer the question posed if we think of $\mathbb A^n_\Bbbk$, as is often done, as the set of maximal ideals $\maxSpec\Bbbk[x_1,\dots,x_n]$ rather than the set of prime ideals $\Spec \Bbbk[x_1,\dots,x_n]$. The reason for doing this is the Jacobson property: a ring is Jacobson if every prime ideal is the intersection of the maximal ideals containing it. It should be clear from the above that for such rings $R$ we also have that $V(f)\subset V(J)$ implies $J\subset\rad f$, since the intersection of the prime ideals containing $f$ is the same as the intersection of the maximal ideals containing $f$ whenever $R$ is Jacobson. Hence, the structure sheaf for $X=\maxSpec(R)$ satisfies $\mathscr O_X(X_f)=R_f$ when $R$ is Jacobson.
So when is $R$ Jacobson? Well, as can be read in Eisenbud's Commutative Algebra with a View toward Algebraic Geometry, fields are certainly Jacobson, $\mathbb Z$ is Jacobson, and the most general version of the Nullstellensatz states that if $R$ is a Jacobson ring, then so is $R[x]$. So in particular, $\Bbbk[x_1,\dots,x_n]$ is Jacobson, which is why we can do algebraic geometry on $\mathbb A^n_\Bbbk$ using $\maxSpec$ instead of $\Spec$ (so that for $\Bbbk$ algebraically closed, for example, we can identify $\mathbb A^n_\Bbbk$ with $n$-tuples of elements of $\Bbbk$).
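To see why the Jacobson hypothesis matters, here is a small contrasting example (my own choice, not from the answer above): the local ring $R=\mathbb Z_{(p)}$ for a prime $p$. It is not Jacobson, because its prime ideal $(0)$ is not the intersection of the maximal ideals containing it (there is only one, $(p)$).

$$\operatorname{maxSpec} R=\{(p)\},\qquad \operatorname{Spec} R=\{(0),(p)\}.$$

On $X=\operatorname{maxSpec} R$ the condition fails: for $f=0$ and $J=(p)$ we have $V(0)=X=V(J)$, but $(p)\not\subset\operatorname{rad}(0)=(0)$, since $R$ is a domain. The conclusion fails as well: for $f=p$ the basic open set $X_p$ is empty, so $\mathscr O_X(X_p)=0$, while

$$R_p=\mathbb Z_{(p)}[1/p]=\mathbb Q\neq0.$$

On $\operatorname{Spec} R$, by contrast, $X_p=\{(0)\}$ and the structure sheaf gives $\mathscr O(X_p)=\mathbb Q=R_p$, as expected.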
- This is a well thought-out, mature analysis: +1 – 2012-07-05
Let me ask you this: can you suggest another reasonable topology? Here, we just demand that $\{0\}$ be a closed set and that polynomials be continuous. These two conditions are certainly reasonable to demand. Do you have any other idea for a topology which you will be able to define without any reference to a specific field?
- This is not really an answer but a comment. – 2012-06-22
- Questioning the question is not an answer. – 2012-06-22
- For example, in the Euclidean topology polynomials are continuous and $\{0\}$ is closed. – 2012-06-22
- This is a rhetorical question. Anyhow, as I said, the topology is forced on us once we take {0} to be closed and polynomials to be continuous. – 2012-06-22
- @Galoisfan, to define the Euclidean topology you must make a reference to a specific field. Given an abstract field which you know nothing about, you cannot define a topology like the Euclidean topology. – 2012-06-22
- @anonymous About this point you're right! – 2012-06-22
- This explanation is also [how Eisenbud and Harris explain the Zariski topology in *The Geometry of Schemes*](http://i.stack.imgur.com/89OBY.png). – 2012-06-22
- Just for the sake of a counterexample, the discrete topology on any field will make the polynomials continuous; but it is true that the Zariski topology is the weakest topology in which the polynomials are continuous. – 2012-06-23
- @Norbert: you can't *seriously* be questioning the Socratic Method... ? (And sorry for the late reply!) – 2013-10-27
- "Do you have any other idea for a topology .....?" Why should we even equip it with a topology? We're just studying roots of polynomials. – 2016-07-14
Here is a somewhat sophisticated reason for using the Zariski topology, but perhaps it will be more convincing to someone with algebraic or logical leanings.
Suppose we are agreed that localising at prime ideals is a good thing to do when studying commutative rings – this shouldn't be too controversial, given the good properties that local rings and localisation have. Unfortunately, not every commutative ring is a local ring. Nonetheless, by a suitable "change of base", every commutative ring "becomes" a local ring!
To be precise, let $R$ be a commutative ring, and let $\mathcal{R}$ be the category of finitely presented $R$-algebras. The Zariski topology (in the sense of a Grothendieck topology) on $\mathcal{R}^\textrm{op}$ has the following universal property: there is an equivalence of categories between the category of all local $R$-algebras and the category of all left exact Zariski-cocontinuous functors $\mathcal{R}^\textrm{op} \to \textbf{Set}$, where such a functor $F : \mathcal{R}^\textrm{op} \to \textbf{Set}$ corresponds to the local ring $F(R[x])$, and a local ring $A$ corresponds to the functor $\textbf{Alg}_R(-, A)$. Something similar is true when we replace $\textbf{Set}$ with any other locally small and cocomplete topos, so we say that the classifying topos for local $R$-algebras is the Grothendieck topos $\textbf{Sh}(\mathcal{R}^\textrm{op}, \textrm{Zar})$.
Now, by a Yoneda-style argument, there is a "universal" local $R$-algebra in $\textbf{Sh}(\mathcal{R}^\textrm{op}, \textrm{Zar})$, namely the functor $\mathscr{O} = \textbf{Alg}_R(R[x], -)$, which sends each $R$-algebra to its underlying set: informally, we might say that the ring $R$ "becomes" a local $R$-algebra in $\textbf{Sh}(\mathcal{R}^\textrm{op}, \textrm{Zar})$! But what does this have to do with schemes? Well, according to the Grothendieck school, we should think of a Grothendieck topos as "representing" a space of some kind; but by the very universal property of $\textbf{Sh}(\mathcal{R}^\textrm{op}, \textrm{Zar})$ as a classifying topos, there are as many points of $\textbf{Sh}(\mathcal{R}^\textrm{op}, \textrm{Zar})$ as there are local $R$-algebras, i.e. a proper class!
However, by a miracle I do not yet fully understand, the scheme $\operatorname{Spec} R$ can be extracted as follows: we take the full subcategory $\mathcal{B}$ of $\mathcal{R}^\textrm{op}$ spanned by the principal localisations of $R$ (i.e. those rings of the form $R [1/f]$), and take the induced topology on $\mathcal{B}$ to form the Grothendieck topos $\textbf{Sh}(\mathcal{B}, \textrm{Zar})$. There is then an essential geometric morphism $$i_! \dashv i^* \dashv i_* : \textbf{Sh}(\mathcal{B}, \textrm{Zar}) \to \textbf{Sh}(\mathcal{R}^\textrm{op}, \textrm{Zar})$$ induced by the inclusion $\mathcal{B} \hookrightarrow \mathcal{R}^\textrm{op}$, and by the universal property of $\textbf{Sh}(\mathcal{R}^\textrm{op}, \textrm{Zar})$, the inverse image $i^* \mathscr{O}$ of the universal local $R$-algebra $\mathscr{O}$ is a local $R$-algebra in $\textbf{Sh}(\mathcal{B}, \textrm{Zar})$. Because $\mathcal{B}$ is a preorder category, $\textbf{Sh}(\mathcal{B}, \textrm{Zar})$ is a localic topos, and it can be shown that it is equivalent to the topos $\textbf{Sh}(\operatorname{Spec} R)$, and under this equivalence, $i^* \mathscr{O}$ is identified with the structure sheaf of $\operatorname{Spec} R$.
A similar argument can be used to justify the definition of the étale topology: in the étale topology for $\operatorname{Spec} R$, $R$ "becomes" a strictly henselian local ring. This is explained by Wraith in his 1979 paper, Generic Galois theory of local rings.
The following answer has a similar spirit to Zhen Lin's. Like his answer, it has a strong logical flavor; unlike his, no toposes appear (though they are lurking in the background).
That said, allow me a slight switch in terminology: let's define $\operatorname{Spec} A$ not as the set of prime ideals of $A$, but as the set of filters in $A$. The axioms for a filter are precisely dual to the axioms of a prime ideal, so that a subset of $A$ is a prime ideal if and only if its complement is a filter. For instance, while prime ideals have the axiom "$x \in \mathfrak{p} \wedge y \in \mathfrak{p} \Rightarrow x+y \in \mathfrak{p}$", filters have the axiom "$x + y \in F \Rightarrow x \in F \vee y \in F$".
It happens that the axioms of a filter have a certain logical form; they form a so-called "geometric theory". For any geometric theory, there is an associated space of its models, which will be automatically endowed with a suitable topology. In the case of the geometric theory of filters, a model is precisely a filter, so the space of models coincides with the spectrum; and the automatically given topology is precisely the Zariski topology.
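As a finite sanity check of the duality between prime ideals and filters, the Python sketch below enumerates all subsets of $\mathbb Z/12$ and tests both sets of axioms. The answer only quotes the additive axiom for filters; the others used here ($1\in F$, $0\notin F$, and $xy\in F$ if and only if $x\in F$ and $y\in F$) are my assumption of the standard list, obtained by complementing the prime-ideal axioms. The brute force over $2^{12}$ subsets is deliberately naive.

```python
N = 12  # the ring Z/12
ELEMENTS = range(N)


def is_prime_ideal(P):
    if 0 not in P or 1 in P:
        return False
    for x in ELEMENTS:
        for y in ELEMENTS:
            if x in P and y in P and (x + y) % N not in P:
                return False  # not closed under addition
            if x in P and (x * y) % N not in P:
                return False  # not closed under multiplication by R
            if (x * y) % N in P and x not in P and y not in P:
                return False  # not prime
    return True


def is_filter(F):
    if 1 not in F or 0 in F:
        return False
    for x in ELEMENTS:
        for y in ELEMENTS:
            # xy in F  <=>  x in F and y in F
            if ((x * y) % N in F) != (x in F and y in F):
                return False
            # x + y in F  =>  x in F or y in F
            if (x + y) % N in F and x not in F and y not in F:
                return False
    return True


def subsets():
    for mask in range(1 << N):
        yield frozenset(x for x in ELEMENTS if (mask >> x) & 1)


prime_ideals = [S for S in subsets() if is_prime_ideal(S)]
filters = [S for S in subsets() if is_filter(S)]
```

The search finds the two prime ideals $(2)$ and $(3)$ of $\mathbb Z/12$, and the filters are exactly their complements.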
A very readable introduction to this point of view is given in notes by Steve Vickers ("Continuity and Geometric Logic").
