We first present the definition of the cutoff phenomenon, as given in the Levin-Peres-Wilmer book:

Definition (Cutoff) A family of Markov Chains has a cutoff time $t_n$ with window $w_n$ if $w_n = o(t_n)$ and

\[\lim_{\alpha \rightarrow \infty} \liminf_{n \rightarrow \infty} d_n(t_n - \alpha w_n) = 1\] \[\lim_{\alpha \rightarrow \infty} \limsup_{n \rightarrow \infty} d_n(t_n + \alpha w_n) = 0\]

First, notice that this is a definition for a family of Markov Chains, and not for a single Markov Chain. This family is parameterized by $n$, which is the size of the domain. Examples of this $n$ may be the number of cards you are shuffling, or the dimension of a vector, or the number of nodes in a graph. This means that the Markov Chain is different for each $n$, because the state space is different for each $n$.

Cutoff times are intrinsically tied to the size of the state space. The following statements convey the same idea, but might still help build intuition:

  1. While you may observe a cutoff time $t_n$ for a chain for some parameter $n$, you should remember that it is a cutoff time tied to that particular $n$, and that the cutoff might be (and probably will be) different for different $n$.
  2. A cutoff time is not a single time $t$ that converges as $n \rightarrow \infty$, but rather a sequence $\{t_1, t_2, \dots \}$, one for each chain in the family.
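To make the "family" point concrete, here is a minimal Python sketch. It uses the lazy random walk on the $n$-dimensional hypercube (a standard example from the Levin-Peres-Wilmer book), projected onto the Hamming weight so that we only need an $(n+1)$-state chain; the function names are mine, chosen for illustration:

```python
import numpy as np
from math import comb

def weight_chain(n):
    """Lazy random walk on the n-dimensional hypercube, projected onto
    the Hamming weight: a birth-death chain on {0, 1, ..., n}."""
    P = np.zeros((n + 1, n + 1))
    for k in range(n + 1):
        P[k, k] += 0.5                       # lazy: hold with probability 1/2
        if k < n:
            P[k, k + 1] = (n - k) / (2 * n)  # flip a 0-coordinate to 1
        if k > 0:
            P[k, k - 1] = k / (2 * n)        # flip a 1-coordinate to 0
    return P

def d_n(n, t):
    """Total variation distance to stationarity after t steps, starting
    from the all-zeros vertex. By symmetry over coordinates, this equals
    the TVD of the full 2^n-state walk."""
    mu = np.zeros(n + 1)
    mu[0] = 1.0                              # start at Hamming weight 0
    mu = mu @ np.linalg.matrix_power(weight_chain(n), t)
    pi = np.array([comb(n, k) for k in range(n + 1)]) / 2 ** n  # Binomial(n, 1/2)
    return 0.5 * np.abs(mu - pi).sum()

# A *different* chain for each n: the state space itself changes size,
# and d_n is a separate function of t for each member of the family.
for n in [4, 8, 16]:
    print(n, d_n(n, 0), d_n(n, 10 * n))
```

Note that $d_4$, $d_8$, and $d_{16}$ are three different distance functions, one per chain in the family.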

Building up the definition of Cutoff Times

Let us build up the definition slowly, to see how we could have come up with it ourselves.

The phenomenon we are trying to capture is that the total variation distance stays very close to 1 until some time step $t$ after which it quickly descends to 0. Additionally, the sharper the descent the more interesting the phenomenon is, so we would like to somehow capture the size of the interval of this descent.

Let’s say that $t$ is the time step in the middle of this rapid descent. This gives us some window $w \in [ 0, t ]$: the descent starts at $t - w$ and ends at $t + w$. Importantly, note that $w$ as we define it here is not a parameter we get to choose - it is an observed property of the Markov Chain, defined as half of the time taken for its total variation distance to go from 1 to 0.
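In code we can estimate $t$ and $w$ from an observed TVD curve. Since in practice $d(t)$ only approaches 1 and 0 rather than hitting them exactly, this sketch uses thresholds $1 - \varepsilon$ and $\varepsilon$ in their place; the threshold and the function name are my own choices, not part of the formal definition:

```python
import numpy as np

def descent_midpoint_and_window(d_vals, eps=0.05):
    """Given d_vals[t] = total variation distance after t steps,
    estimate the midpoint t of the descent and the half-width w.
    Thresholds 1 - eps and eps stand in for the exact values 1 and 0."""
    d = np.asarray(d_vals)
    start = int(np.argmax(d < 1 - eps))  # first step where the descent has begun
    end = int(np.argmax(d < eps))        # first step where it has finished
    return (start + end) // 2, (end - start) // 2

# Illustrative synthetic curve: a logistic drop centred on step 50.
# (A real curve would come from an actual chain's distribution.)
ts = np.arange(100)
d_vals = 1 / (1 + np.exp((ts - 50) / 3))
print(descent_midpoint_and_window(d_vals))  # -> (50, 8)
```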


Defining a time $t$ that captures a drop in TVD

(Attempt 1) We say that our Markov Chain has a cutoff $t$ if for some $w \in [ 0, t ]$

\begin{equation} d(t - w) = 1 \end{equation}

\begin{equation} d(t + w) = 0 \end{equation}

Note that the $w \in [ 0, t ]$ condition implies that $w \leq t$. (This is the precursor to what will later become the $w_n = o(t_n)$ condition in the final definition.)

This definition already gives us a time $t$ sitting in the middle of the descent. Note, however, that it does not capture how fast the total variation distance goes from 1 to 0, only that it does. Therefore any Markov Chain that converges to its stationary distribution (which includes every irreducible, aperiodic Markov Chain) has such a time $t$. We are explicitly interested in a definition that says that this descent from TVD 1 to 0 happens very fast, so we will need to impose a rule that the window $w$ can only be so big.
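For instance, the lazy random walk on a 20-cycle converges to the uniform distribution, so it has such a time $t$, yet its descent is smeared over hundreds of steps, a stretch comparable to the convergence time itself. A small self-contained sketch (the chain is standard; the code is mine):

```python
import numpy as np

def lazy_cycle(n):
    """Lazy simple random walk on the n-cycle: hold w.p. 1/2,
    otherwise step to a uniformly chosen neighbour."""
    P = np.zeros((n, n))
    for i in range(n):
        P[i, i] = 0.5
        P[i, (i + 1) % n] = 0.25
        P[i, (i - 1) % n] = 0.25
    return P

n = 20
P = lazy_cycle(n)
pi = np.full(n, 1 / n)           # stationary distribution: uniform
mu = np.zeros(n); mu[0] = 1.0    # start at a fixed state
for t in [0, 50, 200, 800]:
    d = 0.5 * np.abs(mu @ np.linalg.matrix_power(P, t) - pi).sum()
    print(t, round(d, 3))        # the TVD drifts down gradually, no sharp drop
```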


Setting an upper limit on the size of the window $w$

(Attempt 2) We say that our Markov Chain has a cutoff $t$ if, for some chosen $c \in [0, t]$, we have $w < c$ and:

\[d(t - w) = 1\] \[d(t + w) = 0\]

We can now control how large the window can be for our Markov Chain to have a cutoff time. This allows us to filter out chains that do not converge to their stationary distribution “fast enough”, where our definition of “fast enough” is that the TVD drop from 1 to 0 should happen within $2w < 2c$ steps.
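Attempt 2 translates into a simple check. As before, the exact values 1 and 0 are replaced by thresholds $1 - \varepsilon$ and $\varepsilon$ (my own relaxation, for numerical use); the function name is illustrative:

```python
import numpy as np

def has_sharp_drop(d_of, t, c, eps=0.05):
    """Attempt-2-style check: is there a window w < c such that
    d(t - w) is still ~1 while d(t + w) is already ~0?"""
    for w in range(min(c, t)):   # enforces w < c (and w <= t)
        if d_of(t - w) >= 1 - eps and d_of(t + w) <= eps:
            return True
    return False

# A synthetic logistic TVD curve dropping around step 50, over ~18 steps:
logistic_d = lambda t: 1 / (1 + np.exp((t - 50) / 3))
print(has_sharp_drop(logistic_d, 50, 20))  # -> True  (a window of ~9 steps fits)
print(has_sharp_drop(logistic_d, 50, 5))   # -> False (descent wider than 2 * 5)
```

Shrinking $c$ tightens the definition: the same curve passes with $c = 20$ but fails with $c = 5$.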


Parameterizing by domain size $n$

The previous definition already suffices to define a cutoff time for a specific Markov Chain. The actual definition extends it to a phenomenon that appears as the problem size $n$ increases, making it a property of a family of Markov Chains.

Let us think about why parameterizing this by $n$ is useful.

First of all, it seems like this is a phenomenon that appears only for large enough $n$, so this extension might be a result of necessity rather than just utility.

Second, even in terms of plain interestingness, I think it is interesting that the total variation distance does not drop from 1 to 0 for a Markov Chain running on a small domain (say size $n = 1$) but does do so for large $n$. This parameterization does not take away from analyzing Markov Chains with small $n$, but it does give us a way to say that we are only really interested when this phenomenon occurs for large problems. Our definition now becomes:

(Attempt 3) We say that our family of Markov Chains has a cutoff $(t_n, c_n)$ if $w_n < c_n$ for some chosen $c_n \in [0, t_n]$ and:

\[\liminf_{n \rightarrow \infty}d_n(t_n - w_n) = 1\] \[\limsup_{n \rightarrow \infty}d_n(t_n + w_n) = 0\]

The $\liminf$ and $\limsup$ are simply to account for the fact that the limits might not actually exist.

But now we have put ourselves in a slightly awkward position of having to choose an infinite sequence $\{c_n\}$ to control our windows $\{w_n\}$. It would be easier to not worry about this and use a constant $c$ instead across the family of Markov Chains:


A uniform upper bound on the window sizes $w_n$

(Attempt 4) We say that our family of Markov Chains has a cutoff $(t_n, c)$ if $w_n < c$ for all $n$, for some chosen constant $c \in [0, t_n]$, and:

\[\liminf_{n \rightarrow \infty}d_n(t_n - w_n) = 1\] \[\limsup_{n \rightarrow \infty}d_n(t_n + w_n) = 0\]

We still have to deal with the somewhat annoying task of choosing $c$.

On the one hand, this lets us dictate exactly how big we allow our windows to be for all $n$.

On the other hand, if we are only interested in sharp TVD drops at large $n$, then maybe we should say that for these large $n$ we expect the cutoff window to tend towards 0, and do away with the pesky parameter $c$. Another downside of having to explicitly state a uniform $c$ is that it is restrictive for chains with small $n$, because we expect their windows to be wider.

There is a normative claim here: we are more interested in the latter. If we are fine with giving up control over exactly how large $w_n$ can be, and it suffices that for large $n$ the windows get very, very small, then we could use this definition:


Doing away with explicitly stating the window sizes

(Attempt 5) We say that our family of Markov Chains has a cutoff $t_n$ if $w_n < c$ for all $n$ and:

\[\lim_{c \rightarrow 0}\liminf_{n \rightarrow \infty}d_n(t_n - w_n) = 1\] \[\lim_{c \rightarrow 0}\limsup_{n \rightarrow \infty}d_n(t_n + w_n) = 0\]

Now we don’t even need to choose $c$!

I believe that the above definition captures the spirit of the actual cutoff definition reproduced at the top of this post, though it is in fact more demanding: it forces the windows $w_n$ themselves to shrink, whereas the standard definition only requires $w_n = o(t_n)$, i.e. windows that are small relative to the cutoff times.

However, the above equations look just a little confusing because $c \rightarrow 0$ but there is no $c$ in the equations themselves. The standard definition takes a cleaner route: keep $w_n$ as the natural scale of the descent (requiring only $w_n = o(t_n)$), and control $\{w_n\}$ with a multiplicative parameter $\alpha$, probing the chain $\alpha$ windows away from $t_n$:


(Final) Controlling the window sizes multiplicatively instead of additively

(Final Attempt) We say that our family of Markov Chains has a cutoff $t_n$ with window $w_n$ if $w_n = o(t_n)$ and:

\[\lim_{\alpha \rightarrow \infty}\liminf_{n \rightarrow \infty}d_n(t_n - \alpha w_n) = 1\] \[\lim_{\alpha \rightarrow \infty}\limsup_{n \rightarrow \infty}d_n(t_n + \alpha w_n) = 0\]

Taking $\alpha$ large means looking many windows before (or after) $t_n$, where the distance must have saturated at 1 (or 0), so the entire descent is confined to a stretch of order $w_n$ around $t_n$. We have now arrived at our final definition.
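To see the definition in action, the sketch below takes the lazy random walk on the $n$-dimensional hypercube, projected onto the Hamming weight, a family known to exhibit cutoff (it is treated in detail in the Levin-Peres-Wilmer book). It measures the width of the descent from TVD 0.9 to 0.1 relative to the time to first reach 1/2; under cutoff this ratio shrinks as $n$ grows, which is exactly the $w_n = o(t_n)$ picture. The thresholds and function names are my own choices:

```python
import numpy as np
from math import comb

def weight_chain(n):
    """Lazy random walk on the n-dimensional hypercube, projected onto
    the Hamming weight (an (n+1)-state birth-death chain)."""
    P = np.zeros((n + 1, n + 1))
    for k in range(n + 1):
        P[k, k] += 0.5                       # lazy: hold with probability 1/2
        if k < n:
            P[k, k + 1] = (n - k) / (2 * n)  # flip a 0-coordinate to 1
        if k > 0:
            P[k, k - 1] = k / (2 * n)        # flip a 1-coordinate to 0
    return P

def tvd_curve(n, t_max):
    """d_n(t) for t = 0..t_max, starting from the all-zeros vertex."""
    P = weight_chain(n)
    pi = np.array([comb(n, k) for k in range(n + 1)]) / 2 ** n
    mu = np.zeros(n + 1); mu[0] = 1.0
    out = []
    for _ in range(t_max + 1):
        out.append(0.5 * np.abs(mu - pi).sum())
        mu = mu @ P
    return np.array(out)

def relative_width(n):
    """(steps for d_n to fall from 0.9 to 0.1) / (steps to first reach 1/2).
    Cutoff means this ratio shrinks as n grows: w_n = o(t_n)."""
    d = tvd_curve(n, 6 * n * max(1, int(np.log(n))))
    start = int(np.argmax(d < 0.9))
    half = int(np.argmax(d < 0.5))
    end = int(np.argmax(d < 0.1))
    return (end - start) / half

for n in [16, 64, 256]:
    print(n, round(relative_width(n), 2))  # the descent sharpens relative to t_n
```

The printed ratios decrease with $n$: the drop occupies a smaller and smaller fraction of the time it takes to happen, which is the cutoff phenomenon in one number.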