A random walk on a graph G is a random process on the vertices of G in which, at each step in the walk, we choose uniformly at random among the neighbors of the current vertex. Random walks have been studied extensively, and are used in a variety of algorithms involving graphs. For a comprehensive survey on random walks on graphs, see  , and for applications of spectral techniques to random walk theory, see  . Random walks on graphs have the useful property that given any initial distribution on the vertex set, the random walk converges to a unique stationary distribution as long as the graph is connected and not bipartite. The speed at which this convergence takes place is referred to as the mixing rate of the random walk. In a graph where a random walk has a fast mixing rate, vertices can be sampled quickly using this random process, making this a useful tool in theoretical computer science.
A non-backtracking random walk on a graph is a random walk with the added condition that, on a given step, we are not allowed to return to the vertex visited on the previous step. Viewed as a walk on vertices, a non-backtracking random walk loses the property of being a Markov chain, making its analysis somewhat more difficult. However, their study has received increased interest in recent years. Recently, Angel, Friedman, and Hoory  studied non-backtracking walks on the universal cover of a graph. Fitzner and Hofstad  studied the convergence of non-backtracking random walks on lattices and tori. Krzakala et al.  use a matrix related to non-backtracking walks to study spectral clustering algorithms. Most pertinent to the current paper, Alon, Benjamini, Lubetzky, and Sodin  studied the mixing rate of a non-backtracking walk for regular graphs. In particular, they prove that in most cases, a non-backtracking random walk on a regular graph has a faster mixing rate than a random walk allowing backtracking.
In this paper, we study the mixing rate for a non-backtracking random walk, with the goal of removing the condition of regularity needed in the results of Alon et al. in  . We take a different approach than Alon et al. by looking at the non-backtracking walk as a walk along directed edges of a graph, as is done in  . This allows us to turn the non-backtracking random walk into a Markov chain, but on a larger state space, which in turn allows us to determine the stationary distribution to which a non-backtracking walk converges for a general graph, whether or not it is regular. In the case of regular graphs, our approach allows us to compute the spectrum of the transition probability matrix for a non-backtracking random walk, expressed in terms of the eigenvalues of the adjacency matrix. This allows for easy comparison of the mixing rates of a non-backtracking random walk, and an ordinary random walk. As a corollary, this gives us an alternate proof of the result in  for regular graphs. Our approach gives more information than the approach in  , since we give the full spectrum of the transition probability matrix. In addition, we are able to compute the spectrum of the non-backtracking transition probability matrix for biregular graphs. As a corollary, we generalize the result in  for regular graphs to an analogous result for biregular graphs.
A key component in our proof is a weighted version of a result known as Ihara’s Theorem, also called the Ihara zeta identity, which relates an operator indexed by the directed edge set of a graph to an operator indexed by the vertex set of the graph. Ihara’s Theorem was first considered in the study of number theoretic zeta functions on graphs, and was first proved for regular graphs by Ihara in 1966 (see  ). Numerous other proofs have been given since, along with generalizations to irregular graphs, by Hashimoto (  , 1989), Bass (  , 1992), Stark and Terras (  , 1996), Kotani and Sunada (  , 2000), and others. We will give an elementary proof of Ihara’s Theorem that, to our knowledge, is original. In addition, we follow ideas similar to those in  to obtain a version of Ihara’s Theorem with weights that allows us to study the relevant transition probability matrices for random walks.
The remainder of this paper is organized as follows. In Section 2, we give the necessary background and preliminary information on random walks, and develop the corresponding theory for non-backtracking walks, including the convergence of a non- backtracking walk to a stationary distribution for a general graph. We accomplish this via walks on the directed edges of a graph. We also investigate bounds obtained from the normalized Laplacian for a directed graph. We also give the relevant background on Ihara’s Theorem, and a new elementary proof. In Section 3, we prove our weighted version of Ihara’s formula. Finally, in Section 4, we use this formula to obtain the spectrum of the transition probability matrix for a non-backtracking random walk for regular and biregular graphs. This gives a new proof of the result of Alon et al. concerning the mixing rate of a non-backtracking random walk on a regular graph, and generalizes this result to the class of biregular graphs.
2.1. Random Walks
Throughout this paper, we will let denote a graph with vertex set V and (undirected) edge set E, and we will let, and vol(G) denote the sum of the degrees of all the vertices of G. A random walk on a graph is a sequence of vertices where is chosen uniformly at random among the neighbors of. Random walks on graphs are well-studied, and considerable literature exists about them. See in particular  and  for good surveys, especially in the use of spectral techniques in studying random walks on graphs.
The adjacency matrix A of G is the matrix with rows and columns indexed by V given by
It is a well-known fact that the entry of is the number of walks of length k starting at vertex u and ending at vertex v. Define D to be the diagonal matrix with rows and columns indexed by V with, where denotes the degree of vertex v. A random walk on a graph G is a Markov process with transition probability matrix, so
Given any starting probability distribution on the vertex set V, the resulting expected distribution after applying k random walk steps is given by. Here we are considering and as row vectors in.
Note that, in general, P is not symmetric for an irregular graph, but is similar to the symmetric matrix. Thus, the eigenvalues of P are real, and if we order them as, then it is easy to see that with eigenvector, and. By Perron-Frobenius theory, if the matrix P is irreducible, then we have that, and if P is aperiodic, then. The matrix P being irreducible and aperiodic corresponds to the graph G being connected and non-bipartite.
The stationary distribution for a random walk on G is given by
The stationary distribution has the important property the, so that a random walk with initial distribution will stay at at each step. An important fact about the stationary distribution is that if G is a connected graph that is not bipartite, then for any initial distribution on, we have
for all v (see  ).
Knowing that a random walk will converge to some stationary distribution, a fundamental question to consider is to determine how quickly the random walk approaches the stationary distribution, or in other words, to determine the mixing rate. In order to make this question precise, we need to consider how to measure the distance between two distribution vectors.
Several measures for defining the mixing rate of a random walk have been given (see  ). Classically, the mixing rate is defined in terms of the pointwise distance (see  ). That is, the mixing rate is
Note that a small mixing rate corresponds to fast mixing. Alternatively, the mixing rate can be considered in terms of the standard (Euclidean) norm, the relative pointwise distance, the total variation distance, or the -squared distance. In general, these measures can yield different distances, but spectral bounds on the mixing rate are essentially the same for each. See  for a detailed comparison of each. For our purposes, we will primarily be concerned with the -squared distance, which will be defined below.
The mixing rate of a random walk is directly related to the eigenvalues of P.
Theorem 1 (Corollary 5.2 of) Let G be a connected non-bipartite graph with transition probability matrix P, and let the eigenvalues of P be. Then the mixing rate is.
Thus, the smaller the eigenvalues of P, the faster the random walk converges to its stationary distribution.
2.2. Non-Backtracking Random Walks
A non-backtracking random walk on G is a sequence of vertices where is chosen randomly among the neighbors of such that for. In other words, a non-backtracking random walk is a random walk in which a step is not allowed to go back to the immediately previous state. A non-back- tracking random walk on a graph is not a Markov chain since, in any given state, we need to remember the previous step in order to take the next step. In order for this to be well-defined, we assume throughout the remainder of the paper that the minimim degree of G is at least 2.
Define to be the transition probability matrix for a k-step non-back- tracking random walk on the vertices. That is is the probability that a non-backtracking random walk starting at vertex u ends up at vertex v after k steps. Note that, where is the transition matrix for an ordinary random walk on G. However, is not simply since a non-backtracking random walk is not a Markov chain.
This process can be turned into a Markov chain, however, by changing the state space from the vertices of the graph to the directed edges of the graph. That is, replace each edge in E with two directed edges (one in each direction). Then the non-back- tracking random walk is a sequence of directed edges where if, and then and. That is, the non-back- tracking condition restricts the walk from moving from an edge to the edge going in the opposite direction. Denote the set of directed edges by. The transition probability matrix for this process we will call. Observe that
Note that is a matrix. Note also that is the transition matrix for a walk with k steps on the directed edges.
Lemma 1. Given any graph G, the matrix as defined above is doubly stochastic.
Proof. Observe first that the rows of the matrix sum to 1, as it is a transition probability matrix. In addition, the columns of sum to 1. To see this, consider the column indexed by the directed edge.
The entry of this column corresponding to the row indexed by is if and if. Since this is equal to. Otherwise, the entry is 0.
Thus the column sum is
as claimed. □
Define the distribution by
where is the vector of length 2m with each entry equal to 1.
Lemma 2. Let be any distribution on the directed edges of G. If the matrix is irreducible and aperiodic, then
Proof. It follows from Lemma 1 that is a stationary distribution for. This follows because, since the columns of sum to 1, we have
Therefore, if the sequence converges, it must converge to. Now, being irreducible and aperiodic are precisely the conditions for this to converge. □
Let f be a probability distribution on the vertices of G. Then f can be turned into a distribution on as follows. Define
Conversely, given a distribution on, define a distribution g on the vertices by
Thus, given any starting distribution on the vertex set of G, we can compute the distribution after k non-backtracking random walk steps as follows. First compute the distribution on the directed edges as above, then compute, then is given by. The following proposition tells us that this converges to the same stationary distribution as an ordinary random walk on a graph.
Theorem 2. Given a graph G and a starting distribution on the vertices of G, define to be the distribution on the vertices after k non-backtracking
random walk steps. Define the distribution by (note that
this is the stationary distribution for an ordinary random walk on G). Then if the matrix is irreducible and aperiodic, then for any starting distribution on V, we have
Proof. As described above, take the distribution on vertices to the corresponding distribution on directed edges. Then define. Then by Lemma 2,
converges to. Now, and observe that
So pulling the distribution on directed edges back to a distribution on the vertices yields. Thus the result follows. □
Definition 1. The -squared distance for measuring convergence of a random walk is defined by
Notice that since,
Theorem 3. Let be the eigenvalues of. Then the convergence rate for the non-backtracking random walk with respect to the -squared distance is bounded above by
Proof. We have
Observe that is orthogonal, which is the eigenvector for, so we see that
2.3. Non-Backtracking Walks as Walks on a Directed Graph
The transition probability matrix for the walk on directed edges can be thought of as a transition matrix for a random walk on a directed line graph of the graph G. In this way, theory for random walks on directed graphs can be applied to analyze non-back- tracking random walks. Random walks on directed graphs have been studied by Chung in  by way of a directed version of the normalized graph Laplacian matrix. In  , the Laplacian for a directed graph is defined as follows. Let P be the transition probability matrix for a random walk on the directed graph, and let be its Perron vector, that is,. Then let be the diagonal matrix with the entries of along the diagonal. Then the Laplacian for the directed graph is defined as
This produces a symmetric matrix that thus has real eigenvalues. Those eigenvalues are then related to the convergence rate of a random walk on the directed graph. In particular, the convergence rate is bounded above by, where is the second smallest eigenvalue of (see Theorem 7 of  ).
Applying this now to non-backtracking random walks, define as before. Then as seen above, is the constant vector with for all v. Then the directed Laplacian for a non-backtracking walk becomes
Then Theorem 1 of  , applied to the matrix as defined, gives the Rayleigh quotient for a function by
From this it is clear that is positive semidefinite with smallest eigenvalue. If are the eigenvalues of, then Theorem 7 from  implies that the convergence rate for the corresponding random walk is bounded above by
We remark that for an ordinary random walk on an undirected graph G, the convergence rate is also on the order of, where now denotes the normalized Laplacian of the undirected graph G. Note that
where denotes the Rayleigh quotient with respect to
with given above.
The following result shows that the Laplacian bound does not give an improvement for non-backtracking random walks over ordinary random walks.
Proposition 1. Let G be any graph, and let be the normalized graph Laplacian and the non-backtracking Laplacian defined above. Then we have
Proof. Let be the function orthogonal to D that achieves the minimum in the Rayleigh quotient for. So
Define by. Observe that
So is orthogonal to . Therefore
2.4. Ihara’s Theorem
The transition probability matrix defined above is a weighted version of an important matrix that comes up in the study of zeta functions on finite graphs. We define B to be the matrix with rows and columns indexed by the set of directed edges of G as follows.
The matrix B can be thought of as a non-backtracking edge adjacency matrix, and the entries of describe the number of non-backtracking walks of length k from one directed edge to another, in the same way that the entries of powers of the adjacency matrix, , count the number of walks of length k from one vertex to another. The expression is closely related to zeta functions on finite graphs. A result known as Ihara’s Theorem further relates such zeta functions to a determinant expression involving the adjacency matrix. While we will not go into zeta functions on finite graphs in this paper, the following result equivalent to Ihara’s theorem will be of interest to us.
Ihara’s Theorem. For a graph G on n vertices and m edges, let B be the matrix defined above, let A denote the adjacency matrix, D the diagonal degree matrix, and I the identity. Then
We remark that the expression is the characteristic polynomial of B evaluated at, multiplied by the appropriate power of. In this way the complete spectrum of the matrix B is given by the reciprocals of the roots of the polynomial. Numerous proofs of this result exist in the literature  -  . For completeness, we will include here an elementary proof that uses only basic linear algebra. To the knowledge of the author, this proof is original. To begin, we will need a lemma giving a well-known property of determinants.
Lemma 3. Let M be a matrix, N a matrix, and A an invertible matrix. Then
Proof. Note that
Taking determinants of both sides gives the result. □
Proof of Ihara’s Theorem. Define S to be the matrix
so S is the endpoint incidence operator. Define T to be the matrix given by
so T is the starting point incidence operator. We will also define to be the matrix giving the reversal operator that switches a directed edge with its opposite. That is,
Now, a straightforward computation verifies that
Then from Lemma 3 and (1) we obtain
where u is chosen so that the matrix is inverivle.
Observe that, so that, so
. Thus, applying (2) and (3), the above becomes
where the last step is obtained by observing that. This is the desired equality for our choice of u. This is a polynomial of finite degree in u, and there are infinitely many u that make invertible, so the equality holds for all u. □
3. A Weighted Ihara’s Theorem
In this section, we will give a weighted version of Ihara’s Theorem. The proof presented in the previous section does not lend itself well to generalization to the weighted setting, so we will not follow that strategy. Rather, we will follow the main ideas of the proof of Ihara’s theorem found in  to obtain our weighted version of this result.
To each vertex we assign a weight, and let W be the diagonal matrix given by. Define S and T to be the matrices from the proof of Ihara’s Theorem in the previous section, and define and. So is the weighted version of the endpoint vertex-edge incidence operator, and is the weighted version of the starting point vertex-edge incidence operator. Define from the proof of Ihara’s Theorem, and define to be the weighted version of, that is
Finally, define the matrix by
Then is the weighted version of the non-backtracking edge adjacency matrix B seen above in Ihara’s theorem, with the weight on edge. We remark that if we take for each, then is exactly the transition probability matrix for a non-backtracking random walk on the directed edges of G defined in Section 2.2. This case is our primary focus, but we note that our computations apply for any arbitrary positive weights assigned to the vertices.
Now, a straightforward computation verifies that
We will define. Note that, so this is the adjacency matrix for the weighted graph with edge weights. The matrix is similar to, so when, this is the matrix whose entries are the transition probabilities for a single step of a non-backtracking random walk G.
From (5) and (6) we obtain the following equations.
We define to be the diagonal matrix and observe that. It then follows that
We remark that in the proof in  , they use the unweighted versions of each of these matrices, so rather than yields. Hence S and T will factor through, so that the term stays on the right hand side of the above equations. Here we have is a diagonal matrix with. Depending on the’s this matrix might not behave nicely with respect to the action of S and T, hence the extra terms that need to stay on the left-hand side above. This difference from  is one of the primary difficulties in generalizing this result.
We will now perform a change of basis to see how the operator behaves with respect to the decomposition of the space of functions as the direct sum of and. To this end, fix any basis of the subspace, and let R be the matrix whose columns are the vectors of that basis (note that has rank n). Define. This will be our change of basis
matrix. To obtain the inverse of M, form the matrix and observe that
Therefore we have that.
Applying this change of basis, direct computation, applying (7) and (9), yields
Therefore, the matrix is similar to the matrix
, so they have the same determinant. Thus, we have
proven a weighted version of Ihara’s Theorem, which we state as the following.
Theorem 4. Let G be a graph on n vertices and m edges, and assign an arbitrary positive weight assigned to each vertex x. Let be the weighted non-backtracking edge adjacency matrix with edge weight assigned to edge as defined in (4). Let be the weighted adjacency matrix with edge weight assigned to each edge. Let be the weighted reversal operator defined above, and the diagonal matrix with as defined above. Then we have
As a corollary to the decomposition in Equation (11), if we take for all x, then, and the usual unweighted Ihara’s Theorem falls out immediately.
If we take, then becomes the transition probability matrix for the non-backtracking walk on directed edges, and. This is
clearly similar to the matrix. So in this case is similar to the matrix whose entries are the transition probabilities for a single step in a non-backtracking random walk. (Note, however, that is not the transition probability matrix for a non-backtracking random walk.)
4. The Mixing Rate of Non-Backtracking Random Walks
4.1. An Alternate Proof for Regular Graphs
Applying the results of the previous section to regular graphs yields a different proof of the results from  on the mixing rate of non-backtracking random walks on regular graphs.
Let G be a regular graph where each vertex has degree d. Then choosing
for all x yields gives us that is the transition probability matrix for the non-backtracking random walk on G. We remark that, from the previous
section, we have, , , and.
Therefore, the decomposition in (11) becomes
Noting that can be thought of as block diagonal with m blocks of the form
, then taking determinants, we find that
where the product ranges over all the eigenvalues of the adjacency matrix A for. As remarked previously, the left hand side is the characteristic polynomial of evaluated at, so from this we obtain the spectrum of.
Theorem 5. Let G be a d-regular graph with m edges and n vertices, and let be the transition probability matrix for a non-backtracking random walk as defined above. Then the eigenvalues of are
where ranges over the eigenvalues of the adjacency matrix A, and each have multiplicity.
From this we obtain the result from  .
Corollary 1. Let G be a non-bipartite, connected d-regular graph on n vertices for, and let and denote the mixing rates of simple and non-backtracking random walk on G, respectively. Let be the second largest eigenvalue of the adjacency matrix of G in absolute value.
If and, then
Proof. We remark that the expression is precisely the ex-
pression derived by Alon et al. in  for the mixing rate of a non-backtracking random walk on a regular graph, and we may proceed with the analysis of the convergence rate in the same way they do. The convergence rate is given by the second largest eigenvalue of, which will be obtained setting to be the second largest eigenvalue of A. Let be this eigenvalue.
For we have
So. Since is the second largest eigenvalue of the transition
probability matrix P for the usual walk, the first case follows.
For, is complex, and we obtain
We remark that in this case that, a classic result of Nilli  related to
the Alon-Boppana Theorem implies that we are never too far below this bound. Indeed, the result states that if G is d-regular with diameter at least, then
. With the restriction that, then the diameter is at
least, and so, and the second case follows. □
4.2. Biregular Graphs
A graph G is called -biregular if it is bipartite and each vertex in one part of the bipartition has degree c, and each vertex of the other part has degree d. In the weighted
Ihara’s Theorem, we have, so in the case where G is -biregular, then we have. So since is a multiple of the
identity, as with regular graphs, in the decomposition (11), the term can be taken to the other side of the equation. Note that is diagonal with
if u has degree c, or if
u has degree d. Then is diagonal with entry
the decomposition (11) becomes
where is the adjacency matrix of G.
Note that is similar to a block diagonal matrix with blocks of the form
, so taking the determinant above we obtain
We will look at the matrix. Suppose the first part in the
bipartition of G has size r, and the second part has size s, where without loss of generality,. By row reduction, this has the same determinant as the matrix
Now, the above determinant is given by the product of the eigenvalues of the matrix. Observe that if is an eigenvalue of the adjacency matrix A, then is an eigenvalue of. Therefore, in all we have
where the product ranges over the largest eigenvalues of A (or in other words, ranges of the s eigenvalues of). Therefore the characteristic polynomial is given by
Thus we can explicitly obtain the eigenvalues of.
Theorem 6. Let G be a -biregular graph, let the part with degree c have size r, and the part with degree d have size s, and assume without loss of generality that. Suppose G has n vertices and m edges. Then the eigenvalues of the non-backtracking transition probability matrix defined above are
as well as the 4 roots of the polynomial
for each value of ranging over the s positive eigenvalues of the adjacency matrix A. These roots are
We can now give a version of Corollary 1 for -biregular graphs.
Corollary 2. Let G be a -biregular graph with. Let be the square of the second largest eigenvalue of the transition probability matrix P for a random walk on G, and let be the square of the second largest modulus of an eigenvalue of. Let be the second largest eigenvalue of the adjacency matrix of G. Then we have the following cases.
If and both c and d are, then
Proof. We need to compare the eigenvalues of to the eigenvalues of
. Note that for an eigenvalue of A, we have
which implies and. Then observe
so the eigenvalues of P are where ranges over the eigenvalues of A. Note that the largest eigenvalue of A is.
Let equal the expression (12), and consider the following cases.
If, then is real. Direct computation verifies that, evaluating the expression (12) at yields and for in this range. Therefore, in this case the eigenvalue of always has smaller absolute value than the corresponding eigenvalue of P, implying. The lower bound follows from (12) ignoring the square root inside. Thus the first case follows.
If, then is complex, and direct computation shows
A version of the Alon-Boppana Theorem exists for -biregular graphs as well, proven by Feng and Li in  (see also  ).
Theorem 7.  Let G be a -biregular graph, and let be the second largest eigenvalue of the adjacency matrix A of G. Then
where the diameter of G is greater than.
Observe that certainly the diameter is at least, so that the condition on the degrees and Theorem 7 imply that
As, so this gives the result for the second case. □
We have looked at non-backtracking random walks from the point of view of walking along directed edges. For the special cases of regular and biregular graphs, our weighted version of Ihara’s Theorem (Theorem 4) has given us the complete spectrum of the transition probability matrix for the non-bakctracking walk, allowing for easy comparison between the non-backracking mixing rate, and the mixing rate of the usual random walk. Clearly, it would be desirable to extend these reults to more general classes of graphs. The difficulty in applying Theorem 4 directly is with the term involving. As seen in Section 3, is a diagonal matrix with
In the case of regular and biregular graphs, this expression is constant (we get and for the d-regular and -biregular cases respectively), making simply a multiple of the identity. This allows the difficulty to be handled relatively easily. Regular and biragular graphs are in fact the only graphs for which is a multiple of the identity, suggesting that these exact techniques will not work as nicely on more general classes of graphs. If a cleaner version of Theorem 4 could be proven, then, aside from being interesting in its own rite, it could potentially be used to extend our results on non-backtracking random walks.
The author would like to thank Fan Chung for numerous helpful discussions throughout the process of writing this paper.
 Alon, N., Benjamini, I., Lubetzky, E. and Sodin, S. (2007) Non-Backtracking Random Walks Mix Faster. Communications in Contemporary Mathematics, 9, 585.
 Krzakala, F., Moore, C., Mossel, E., Neeman, J., Sly, A., Zdeborova, L. and Zhang, P. (2013) Spectral Redemption in Clustering Sparse Networks. Proceedings of the National Academy of Sciences, 110, 20935-20940.
 Ihara, Y. (1966) On Discrete Subgroups of the Two by Two Projective Linear Group over p-adic Fields. Journal of the Mathematical Society of Japan, 18, 219-235.
 Hishimoto, K. (1992) Artin-Type L-Functinos and the Density Theorem for Prime Cycles on Finite Graphs. International Journal of Mathematics, 3, 809-826.