# Ultrafilters – I. the Stone-Čech compactification

Hi, I’m a second year math undergrad at UChicago. This group blog of my peers is in large part a result of the idea for the title, which is hilarious. Hopefully it will eventually lead to something interesting or substantive, or at least neither.

Beginnings are boring, so let’s hop to it. This will be a fluffy-ish post about the pedagogy of the Stone–Čech compactification, as a setting in which to introduce ultrafilters. Later, I hope to talk about Stone duality (or at least representation of Boolean algebras), ultralimits/ultrapowers, and topological connections to logic/model theory. I won’t discuss filters in relation to convergence or nets much, since those discussions are extremely common, and I find the exposition a bit tedious.

Terry Tao’s fantastic voting analogy for ultrafilters is one of my favorite intuitions I’ve ever read. After sitting through a lecture on said compactification this week, I thought about the analogy as it related to the ultrafilter construction.

The lecture was by Prof. Calegari, one of the best teachers I’ve ever had at conferring intuition with interesting examples, but unusually, I still felt uneasy at the end. I’ve never made a satisfying connection between the interval-product construction (which naturally follows from the universal property) and the ultrafilter construction (which feels more intuitive and concrete to me) on discrete topological spaces, and the lecture made no attempt to.

Let’s start at the beginning and see where our nose takes us. A few preliminary definitions:

A filter ${\mathcal{D}}$ on a set ${X}$ is a nonempty collection of subsets of ${X}$, not containing the empty set, which is upward closed and closed under finite intersections. An ultrafilter is a filter such that for each ${A\subset X}$, exactly one of ${A}$ and ${X\setminus A}$ is in ${\mathcal{D}}$.
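To make the axioms concrete, here’s a brute-force sanity check (my own Python sketch, not from the lecture): enumerate every family of subsets of a three-element set and keep exactly those satisfying the axioms. The finite case is degenerate – every ultrafilter turns out to be principal – but it makes the definitions tangible.

```python
from itertools import combinations

# Enumerate every family of subsets of X = {0, 1, 2} and test the axioms.
X = frozenset({0, 1, 2})
subsets = [frozenset(c) for r in range(len(X) + 1) for c in combinations(X, r)]

def is_filter(D):
    # proper (no empty set), nonempty, upward closed, closed under intersection
    if not D or frozenset() in D:
        return False
    if any(A <= B and B not in D for A in D for B in subsets):
        return False
    return all(A & B in D for A in D for B in D)

def is_ultrafilter(D):
    # exactly one of A, X \ A belongs to D, for every A
    return is_filter(D) and all((A in D) != ((X - A) in D) for A in subsets)

families = [frozenset(F) for r in range(len(subsets) + 1)
            for F in combinations(subsets, r)]
ultrafilters = [D for D in families if is_ultrafilter(D)]

# On a finite set, every ultrafilter is principal: one per point of X.
principal = [frozenset(A for A in subsets if x in A) for x in X]
```

Running this confirms that the only ultrafilters found are the three principal ones.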

The usual intuition is that a filter can measure which sets are “big,” hence the upward closure – of course supersets of big sets are big. Further, this “bigness” is “hiding” somewhere; it can’t be some distributed or emergent property like majority or positive density. So (dropping the scare quotes to reduce ridiculousness) we can find where the bigness comes from by taking intersections. For example, in the case of what’s known as a principal filter, the bigness hides in a particular subset ${A\subset X}$, and the big sets are precisely those containing ${A}$. Of course, principal ultrafilters then must have all that bigness concentrated in a singleton set, a veritable God of bigness who generously gives her blessings to all sets which include her.

I’ve never really liked this, because the intersection/distributed bigness property didn’t really seem precise or intuitively formulated to me. Calegari had an interesting way of presenting ultrafilters as a ${\{0,1\}}$-valued finitely-additive pseudo-probability measure on subsets of ${X}$, but while this is neat, I don’t think it adds much intuition: the lack of countable additivity makes the probability framing a bit strange and foreign.

So I was delighted by Tao’s voting intuition. It’s probably best to read about it at his blog; I’ll explain it briefly after some other setup. It begins with the usual motivating example for ultrafilters (outside of model theory).

## I. Preliminaries and intuition

Take ${X=\mathbb{N}}$, and consider functions to some compact interval, say, ${f:X\rightarrow [0,1]}$ – essentially, bounded real sequences.

Let’s first present it the way Calegari did, which I think is pretty standard: by compactness, every such function has a convergent subsequence, but many don’t actually converge. What if we wanted to make all these functions converge? Well, there are potentially many different subsequential limits for any given sequence, so there is no canonical way of choosing a limit. But every ultrafilter provides a uniform way to choose one: the preimages of ${[0,1/2)}$ and ${[1/2,1]}$ partition ${X}$, so exactly one of them is a “big” set under an ultrafilter on ${X}$. Then bisect that interval again, to get a big set which maps into a still smaller interval; repeat ad infinitum, so that the nested intervals shrink to a single point. We can think of the function being evaluated at an ultrafilter ${\mathcal{D}}$ as this limit under ${\mathcal{D}}$, and this provides an extension of ${f}$ to the set of ultrafilters on ${X}$ which plays nicely, in the sense of algebra working normally (${(f+g)(\mathcal{D})=f(\mathcal{D})+g(\mathcal{D})}$, etc.).
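The bisection procedure can be sketched in code if we treat the ultrafilter as an abstract “bigness oracle” – a hypothetical predicate-tester of my own devising. Only principal ultrafilters are actually computable, so that’s what we can demonstrate with; a nonprincipal oracle can only be reasoned about.

```python
# is_big(P) answers whether {n : P(n)} is "big" in the chosen ultrafilter.
def ultralimit(f, is_big, steps=50):
    """Halve [lo, hi] repeatedly, keeping the half whose preimage is big."""
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        if is_big(lambda n: lo <= f(n) < mid):  # is f^{-1}([lo, mid)) big?
            hi = mid
        else:
            lo = mid
    return lo

# Principal ultrafilter at k: a set is big iff it contains k.
def principal(k):
    return lambda P: P(k)

f = lambda n: 1.0 / (n + 2)   # a bounded sequence with values in [0, 1]
# ultralimit(f, principal(4)) should recover f(4) = 1/6
```

As a sanity check, the limit along the principal ultrafilter at ${k}$ is just ${f(k)}$, matching the observation below about evaluation at “normal” integers.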

Notice the nice property that evaluation at a “normal” integer ${k}$ can be seen as evaluation at the principal ultrafilter generated by ${k}$: since ${\{k\}}$ is “big” by itself in that ultrafilter, the limit is just the value ${f(k)}$. Hence we can think of the original ${X}$ as being embedded in the ultrafilter set, which we will denote ${\beta X}$ – the Stone-Čech compactification – with ${\beta f}$ being the extension of ${f}$. The analogous construction is, I hope, clear on general ${X}$.

Tao’s idea is just to look at this as a voting system. Instead of ${[0,1]}$, let’s say we have a compact discrete set, like ${\{0,1\}}$, which are two political positions. We have an infinite population of people, namely, ${\mathbb{N}}$, who need to choose a decision on a given issue ${f}$, on which they have positions ${f(1),f(2),\ldots}$. A political choice is kind of like a limit – if a lot of people want something(-ish), that’s a sign we should pick it. So how do we choose coherently?

Well, we can think of “bigness” as corresponding to groups which have political power. For a given division of the people into voting for ${0}$ and voting for ${1}$, we should make the choice the same way for each issue – exactly one of the two sets is “big” and will be chosen. Further, if a set of the population is big enough to choose the outcome, certainly any bigger one is.

So that’s two out of three properties of the ultrafilter. The last one, intersectionality, comes from considering two propositions ${f}$ and ${g}$: if ${f}$ and ${g}$ both pass, then the proposition “${f}$ and ${g}$” should pass, so the intersection of their big sets of supporters should also be big. In short, a perfect political system.
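It’s worth seeing why a naive alternative like majority rule fails exactly this last axiom; a quick check on three voters (a toy example of my own):

```python
from itertools import combinations

# Majority rule on voters X = {1, 2, 3}: a set is "big" when it is a strict
# majority. Upward closure holds, and exactly one of each complementary pair
# is big -- but closure under intersections fails.
X = frozenset({1, 2, 3})
subsets = [frozenset(c) for r in range(len(X) + 1) for c in combinations(X, r)]
big = [S for S in subsets if 2 * len(S) > len(X)]

A, B = frozenset({1, 2}), frozenset({2, 3})
# A and B are both majorities, but their intersection {2} is not:
print(A in big, B in big, (A & B) in big)  # True True False
```

So two majority coalitions can pass propositions whose conjunction has no majority support – majority rule is not a filter, which is precisely why ultrafilters are the right notion here.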

We could give all this power to one person, in which case we have a principal ultrafilter – a dictatorship. Nonprincipal ultrafilters are precisely those which are not dictatorships, and thus cannot have any finite big sets at all. In fact, it is not hard to see that changing any finite number of people’s positions cannot change the outcome at all; individual votes are utterly useless – so we have a democracy of sorts. Eat your heart out, Hofstadter.

This is easily extended to any finite discrete set. On more general compact sets, like ${[0,1]}$, the choice of ultrafilter can still be thought of in the same way, as defining a social choice of sorts, though now the power of the big sets isn’t that their choice passes, but that something pretty close to it (a limit!) does – the compromise necessary when one has a spectrum of possible outcomes.

Setting our intuition aside for now, the point is to compactify, and did we do that? It kind of seems like it – the image of ${\beta X}$ under ${\beta f}$ is certainly compact, since all the subsequential limit points have been added via ultrafilters. It seems like our construction – and the voting analogy – breaks down when we try to compactify a more general set like ${(0,1)}$, but for any discrete set, at least, we seem to have succeeded.

As it turns out, our suspicions are correct, but this is not obvious. After all, what even is the induced topology on the ultrafilter space ${\beta X}$? And outside of our concrete considerations of bounded maps from ${X=\mathbb{N}}$ into the reals, why is this a natural construction?

## II. The universal property and functoriality

Taking a very general point of view, a compactification of a space ${X}$ is just an embedding into a compact space, made “minimal” in the sense that we demand ${X}$ be dense in it. Just as the (hopefully) familiar Alexandrov one-point compactification is the “coarsest” possible compactification of a locally compact space – by construction, it gives only the bare minimum of extra open sets – we can think of the Stone-Čech compactification as the finest, or most general, possible compactification.

In fact, it satisfies the following universal property:

Every continuous map ${f}$ of ${X}$ into a compact Hausdorff space extends uniquely to a continuous map ${\beta f}$ from ${\beta X}$, as we saw in our ultrafilter example.
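In diagram form (with ${\iota: X \rightarrow \beta X}$ the canonical map), the standard statement is that the triangle below commutes for a unique ${\beta f}$:

```latex
% Universal property of \beta X: every continuous f : X -> K, with K
% compact Hausdorff, factors uniquely through \iota : X -> \beta X.
\[
\begin{array}{ccc}
X & \xrightarrow{\;\iota\;} & \beta X \\
  & \searrow^{\!f}          & \big\downarrow \beta f \\
  &                         & K
\end{array}
\qquad
\beta f \circ \iota = f .
\]
```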

Another way to look at this is to see that the Stone-Čech compactification has a simple abstract niceness which the Alexandrov compactification lacks: it’s a functor ${\beta: \text{Top} \rightarrow \text{Top}_{\text{CHaus}}}$, from the category of topological spaces to the category of compact Hausdorff spaces. From this perspective, it is simply the left adjoint of the fully faithful inclusion functor in the opposite direction, and the universal property is obvious, as of course ${\beta K \cong K}$ for compact Hausdorff ${K}$.

Given this very abstract and “natural” motivation, we can come up with the following naive construction of ${\beta}$: given ${X}$, take the closure of its image in

$\displaystyle \prod_{f\in \mathcal{F}} K_f$

where ${\mathcal{F}}$ is the collection of all continuous maps of ${X}$ into compact Hausdorff spaces, and ${f(X)\subset K_f}$. This is compact by Tychonoff’s theorem, and clearly satisfies our universal property in either formulation. Of course, this doesn’t work because ${\mathcal{F}}$ is a proper class, but we can easily fix it by seeing that something as simple as ${[0,1]}$ is a cogenerator of ${\text{Top}_{\text{CReg}}}$. Then we have that ${X}$ is included in

$\displaystyle \prod_{f\in \hom(X,[0,1])} [0,1]_f.$

and the closure of its image is our compactification. Roughly, the way the cogenerator argument works is that compact Hausdorff spaces will all embed in some unit hypercube (see, e.g., Munkres), so morphisms from ${X}$ to different compact Hausdorff spaces can all be distinguished inside our unit interval product, giving us our universal property.

A technical point: the inclusion ${X\rightarrow \beta X}$ is an embedding only for completely regular Hausdorff ${X}$ (a well-known but tedious result, see e.g. this), so only in that case is our construction a “compactification”. However, the categorical niceness only requires continuous maps (since those are the arrows in ${\text{Top}}$), so our functor is defined on all of ${\text{Top}}$ regardless.

## III. The Stone topology

With definitions out of the way, we can get to interesting stuff. So we have this natural construction, from an abstract point of view. Let’s consider our construction on just discrete spaces again. Where are the ultrafilters?

Well, let’s return to our voting analogy. The Stone-Čech compactification can be seen as democratization – transitioning from individual decisions (corresponding to principal ultrafilters) to our nonprincipal system in which nobody’s choice matters.

The image of ${X}$ in ${\prod_{f\in \mathcal{F}} [0,1]_f}$ is the set of “dictatorial outcomes” – the collection of preference profiles of each single “voter” (element of ${X}$) across all issues ${f}$. Its closure ${\beta X}$ then also includes the “democratic outcomes” for a fixed democratization (choice of which sets of voters are “big” enough to make decisions). Intuitively, the outcomes associated with nonprincipal ultrafilters are among the limit points of the image of ${X}$ because they are arbitrarily close (subsequential limits) to dictatorial outcomes in every coordinate, and the converse holds because the intersection of finitely many “big” sets (indices defining a class of subsequential limits) is big (and still defines the same subsequential limits). This can be rigorized in a fairly straightforward way.

What about the induced topology? In the product construction, the open sets are obviously just the intersections of open sets of ${\prod [0,1]}$ with the closure of the image of ${X}$, putting open-set restrictions on finitely many of the coordinates. But this doesn’t lend itself to very much intuition. ${X}$ is discrete, but could be of any size; conceivably, its image in the various ${[0,1]_f}$ factors could look like anything, and an open subset of the product could slice up finitely many of these images in any way. This formulation doesn’t help us work with the topological structure much.

Let’s take a different tack. The open sets of ${\prod [0,1]_f}$, as with any product topology, can be generated by the subbasis ${\pi^{-1}_f(U)}$ for open ${U\subset [0,1]}$, where ${\pi_f}$ is the projection onto the coordinate corresponding to function ${f}$. We find that ${U}$ restricts the ${f}$ coordinate to a certain set of evaluations of ${f}$ on ${X}$ – say, those at ${x\in X_U}$ – as well as all limit points thereof.

As we saw earlier, limit points correspond to nonprincipal ultrafilters, and in fact we find our key correspondence: ${\text{points in the closure of } X_U \leftrightarrow \text{nonprincipal ultrafilters containing } X_U}$, something which is not too difficult to see: ${X_U}$ is “big” in a voting system precisely when its say in the voting system allows it to determine the outcome – i.e., either one of its members is a dictator, or the result is one of its limit points. Rigorizing this is left to the reader.

Then in this construction, it is immediate that the subbasis element ${\beta X \cap \pi^{-1}_f(U)}$ is uniquely specified by this set of ultrafilters, since each ultrafilter corresponds to a unique evaluation across all coordinates.

Thus, if we prefer to view ${\beta X}$’s underlying set as the set of ultrafilters on ${X}$, we have as subbasis the collection of sets of the form ${\hat{A}=\{\mathcal{U} \in \beta X\,|\,A\in \mathcal{U}\}}$ for all ${A\subset X}$ (every subset is open, since ${X}$ is discrete) – sets of ultrafilters which consider ${A}$ “big”. More or less, this is all democratic systems in which “voting bloc” ${A}$ holds power. This is known as the Stone topology.

It’s not hard to see that ${\widehat{A}\cup \widehat{B} = \widehat{A\cup B}}$ and ${\widehat{A}\cap \widehat{B} = \widehat{A\cap B}}$, so this is actually a basis! In fact, the only open sets not contained in this basis are infinite unions of basis elements.

We can further prove that ${\widehat{X\setminus A} = \beta X \setminus \widehat{A}}$ as well; all three of these facts are instructive exercises. As a result, all the open sets besides the infinite unions are actually clopen! Symmetrically, the closed sets are precisely arbitrary intersections of our base sets, with finite ones being clopen.
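These three identities can be machine-checked on a finite discrete ${X}$, where every ultrafilter is principal (a standard fact for finite sets) – a toy verification of my own, not a proof for infinite ${X}$:

```python
from itertools import combinations

# Check the hat identities on X = {0, 1, 2}, whose ultrafilters are exactly
# the three principal ones (so betaX is just X in disguise).
X = frozenset({0, 1, 2})
subsets = [frozenset(c) for r in range(len(X) + 1) for c in combinations(X, r)]
ultrafilters = [frozenset(A for A in subsets if x in A) for x in X]
betaX = frozenset(range(len(ultrafilters)))

def hat(A):
    # \hat{A} = set of (indices of) ultrafilters which consider A "big"
    return frozenset(i for i, U in enumerate(ultrafilters) if A in U)

for A in subsets:
    assert hat(X - A) == betaX - hat(A)          # maximality
    for B in subsets:
        assert hat(A) | hat(B) == hat(A | B)     # primeness
        assert hat(A) & hat(B) == hat(A & B)     # filter intersection
print("all identities hold")
```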

All these nice properties are essentially “topological translations” of properties of ultrafilters. The base being closed under finite intersections is the intersectionality of filters, closure under finite unions is the “primeness” of ultrafilters, and closure under complements is the maximality of ultrafilters.

In voting language, Hausdorffness says that for distinct ultrafilters/voting systems ${F}$ and ${G}$, there exist voting blocs ${A}$ and ${B}$, holding power in ${F}$ and ${G}$ respectively, such that no system makes both ${A}$ and ${B}$ big – i.e. ${A}$ and ${B}$ are disjoint. So Hausdorffness translates to “distinct ultrafilters contain disjoint sets,” which is a corollary of the maximality of ultrafilters (prove it!). Notice that this also follows immediately from the open basis being closed under complements, which jibes with our earlier interpretation.

The fact that this is a compact Hausdorff space follows immediately from equivalence with the product formulation. But just for fun, let us briefly see that the compactness and Hausdorffness of ${\beta X}$ in this construction are also a consequence of using ultrafilters themselves to define topological properties.

This is, in fact, perhaps the “right” way to do it, from an “everything in mathematics is trivial from a sufficiently high vantage point” (i.e. categorical) perspective, because we simply prove the universal property, in the same vein as we did for the natural numbers earlier: if ${f:X\rightarrow K}$ is a map to a compact Hausdorff space, it can be extended to every ultrafilter on ${X}$, because the image (pushforward) of an ultrafilter is an ultrafilter, and, per Qiaochu’s post, every ultrafilter on ${K}$ converges to a unique point. The definition of the image filter evidently makes this map the unique continuous one under the Stone topology, so we have our ${\beta f}$.
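A quick sketch of that pushforward, ${f_*(\mathcal{U}) = \{B \subseteq K \,|\, f^{-1}(B)\in \mathcal{U}\}}$, checked on small finite sets (the sets, points, and map here are my own illustrative choices, and finite examples only exercise the principal case):

```python
from itertools import combinations

def powerset(S):
    S = list(S)
    return [frozenset(c) for r in range(len(S) + 1) for c in combinations(S, r)]

# An illustrative map f : X -> K between small finite sets.
X = frozenset({0, 1, 2, 3})
K = frozenset({'a', 'b'})
f = {0: 'a', 1: 'b', 2: 'a', 3: 'b'}

def pushforward(U, f, K):
    # f_*(U) = { B subset of K : f^{-1}(B) is in U }
    return frozenset(B for B in powerset(K)
                     if frozenset(x for x in f if f[x] in B) in U)

U = frozenset(A for A in powerset(X) if 2 in A)   # principal ultrafilter at 2
V = pushforward(U, f, K)
# V is the principal ultrafilter at f(2) = 'a' on K
```

The pushforward of the principal ultrafilter at ${x}$ is the principal ultrafilter at ${f(x)}$ – exactly the “evaluation at a normal point” observation from section I.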