Counting hypergraph colourings in the local lemma regime

We give a fully polynomial-time approximation scheme (FPTAS) to count the number of q-colourings for k-uniform hypergraphs with maximum degree ∆ if k ≥ 28 and q > 315∆^{14/(k−14)}. We also obtain a polynomial-time almost uniform sampler if q > 798∆^{16/(k−16/3)}. These are the first approximate counting and sampling algorithms in the regime q ≪ ∆ (for large ∆ and k) without any additional assumptions. Our method is based on the recent work of Moitra (STOC, 2017). One important contribution of ours is to remove the dependency between k and ∆ in Moitra's approach.


INTRODUCTION
Hypergraph colouring is a classic and important topic in combinatorics. Its study was initiated by Erdős' seminal result [Erd63], which gives an upper bound on the number of edges sufficient for a uniform hypergraph to be 2-colourable. Many important tools in the probabilistic method have been developed around this subject, such as the Lovász local lemma [EL75] and the Rödl nibble [Röd85].
In this paper, we consider the problem of approximately counting colourings in k-uniform hypergraphs. The most successful approach to approximate counting is Markov chain Monte Carlo (MCMC). See [DFK91,JS93,JSV04] for a few famous examples. Indeed, MCMC has been extensively studied for graph colourings in low-degree graphs. Jerrum [Jer95] showed that the simple and natural Markov chain, Glauber dynamics, mixes rapidly if q > 2∆, where q is the number of colours and ∆ is the maximum degree of the graph. As a consequence, there is a fully polynomial-time randomized approximation scheme (FPRAS) for the number of colourings if q > 2∆. This result initiated a long line of research, and the best bound in general, due to Vigoda [Vig00], requires that q > (11/6)∆. It is conjectured that Glauber dynamics is rapidly mixing if q > ∆ + 1, the "freezing" threshold, but current evidence typically requires extra conditions in addition to the maximum degree [HV03,DFHV13]. On the flip side, see [GSV15] for some (almost tight) NP-hardness results.
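As a point of reference, a single update step of the Glauber dynamics for proper graph colourings mentioned above can be sketched as follows (a minimal illustration of ours, not code from any of the cited works):

```python
import random

def glauber_step(colouring, adj, q, rng=random):
    """One step of Glauber dynamics for proper graph q-colourings:
    pick a uniformly random vertex and recolour it with a uniformly
    random colour not currently used by any of its neighbours."""
    v = rng.choice(sorted(colouring))
    free = [c for c in range(q) if all(colouring[u] != c for u in adj[v])]
    # free is nonempty whenever q > max degree, so in particular when q > 2*Delta
    colouring[v] = rng.choice(free)
    return colouring
```

Rapid mixing of this chain for q > 2∆ is Jerrum's result quoted above; the sketch only illustrates the transition rule, not the mixing analysis.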
In k-uniform hypergraphs, the Markov chain approach still works if q > C∆ for C = 1 when k ≥ 4 and C = 1.5 when k = 3 [BDK08,BDK06]. However, the local lemma implies that a hypergraph is q-colourable if q > C∆^{1/(k−1)} for some constant C. This threshold is much smaller than ∆ when ∆ is large. Moser and Tardos' algorithmic version of the local lemma [MT10] implies that we can efficiently find a q-colouring under the same condition. Indeed, the algorithmic local lemma has been a highly active area. See [KS11,HSS11,HS13a,HS13b,HV15,AI16,Kol16,CPS17,HLL+17] for various recent developments.
In view of the success of the algorithmic local lemma, it is natural to wonder whether we can also randomly generate hypergraph colourings, or equivalently, approximately count their number, beyond the q ≍ ∆ bound and approaching q ≍ ∆^{1/(k−1)}. Unfortunately, designing Markov chains quickly runs into trouble if q ≪ ∆. "Freezing" becomes possible in this regime (see [FM11] for examples), and the state space of proper hypergraph colourings may not be connected via changing the colour of a single vertex, the building-block move of Glauber dynamics.
The only successful application of MCMC in this regime is due to Frieze et al. [FM11,FA17], which requires that q > max{C_k log n, 500k^3∆^{1/(k−1)}} and that the hypergraph is simple. Here q = Ω(log n) is necessary to guarantee that "frozen" colourings are not prevalent. Furthermore, it is reasonable to believe that simple hypergraphs are much easier algorithmically than general ones, since their chromatic numbers are O((∆/log ∆)^{1/(k−1)}) [FM13], significantly smaller than the bound implied by the local lemma, and the related Glauber dynamics for hypergraph independent sets works significantly better in simple hypergraphs than in general ones [HSZ16].
Our main result is a positive step beyond the freezing barrier in general k-uniform hypergraphs. Our result also answers some open problems raised in [FM11].
Theorem 1. For integers ∆ ≥ 2, k ≥ 28, and q > 315∆^{14/(k−14)}, there is an FPTAS for the number of proper q-colourings of k-uniform hypergraphs with maximum degree ∆.
When k and ∆ are large, our result is better than the Markov chain results [BDK08,BDK06] and gets into the freezing regime. The exponent of our polynomial time bound depends on the constants k and ∆.
Our method is based on an intriguing result shown by Moitra [Moi17] recently, who gave fully polynomial-time deterministic approximation schemes (FPTAS) to count satisfying assignments of k-CNF formulas in the local lemma regime. It is not hard to see that Moitra's approach is rather general, and indeed it works for hypergraph colourings if some strong form of the local lemma condition holds, and k ≥ C log ∆ for some constant C, without any requirement on the connectedness of the state space. Unfortunately, the requirement that k ≥ C log ∆ is necessary for a "marking" argument to work in Moitra's approach. This is not an issue for k-CNF formulas, as in that setting the (strong) local lemma condition dictates that k ≥ C log ∆. However, for hypergraph colourings, we generally want k and ∆ to be two independent parameters. Marking is no longer possible in our general situation.
We briefly describe Moitra's approach before introducing our modifications. The first observation is that if the maximum degree is much smaller than the local lemma threshold, then the marginal distributions of individual variables in the target distribution are very close to uniform. As a consequence, if we couple two copies of the Gibbs distribution that assign different colours to a particular vertex, sequentially and in a vertexwise maximal fashion, then the discrepancy in the resulting coupling has logarithmic size with high probability. Then, one can set up a linear program to do binary search for the marginal probability, where the variables to solve mimic the transition probabilities in this coupling. The marking procedure ensures that these local (almost-)uniform properties hold at any point of the coupling process above, by choosing in advance a good set of vertices so that we only couple those vertices and nothing goes awry.
Since marking is no longer possible in our setting, we take an adaptive approach in the coupling procedure to ensure local (almost-)uniform properties, rather than marking what we are going to couple in advance. Although similar in spirit, our proof details are rather different from those by Moitra [Moi17]. Since this coupling (or the analysis thereof) is used repeatedly in the whole algorithm, we have to rework almost all other proofs as well. In addition, quite a few steps (or the success thereof) in Moitra's approach seem rather mysterious. Our proofs unravel some of those mysteries, streamline the argument, and tighten the bounds at various places. Hopefully they also shed some light on where the limit of the method is.
The outline above only gives an approximation of the marginal probabilities. Due to the lack of marking, we also need to provide new algorithms for approximate counting and sampling. For approximate counting, we use the local lemma again to find a good ordering of the vertices so that the standard self-reduction goes through. For sampling, we use the marginal algorithm as an oracle to faithfully simulate the true distribution, in an adaptive fashion similar to the coupling procedure. At the end of this process, not all vertices will be coloured. However, we show that with high probability all remaining connected components have logarithmic sizes, and we fill those in by brute-force enumeration. The threshold we obtain for sampling is larger than the one for approximate counting.
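On a toy instance, the self-reduction used for counting can be checked directly: the number of proper colourings equals a telescoping product of inverse conditional marginals along any vertex ordering. The sketch below (ours, for intuition only) computes the marginals by brute force, whereas in the algorithm they come from the LP-based estimator:

```python
from itertools import product

def is_proper(sigma, edges):
    # a hyperedge is violated iff all of its vertices get the same colour
    return all(len({sigma[v] for v in e}) > 1 for e in edges)

def count_direct(n, q, edges):
    # brute-force count of proper colourings of vertices 0..n-1
    return sum(is_proper(s, edges) for s in product(range(q), repeat=n))

def count_by_marginals(n, q, edges):
    # |C| = prod_i 1 / Pr[sigma(v_i) = c_i | v_0, ..., v_{i-1} pinned],
    # for any ordering and any sequence of extendable colours c_i
    colourings = [s for s in product(range(q), repeat=n) if is_proper(s, edges)]
    pinned = []
    total = 1.0
    for i in range(n):
        consistent = [s for s in colourings
                      if all(s[j] == pinned[j] for j in range(i))]
        c = consistent[0][i]  # pick a colour that is certainly extendable
        marginal = sum(s[i] == c for s in consistent) / len(consistent)
        total /= marginal
        pinned.append(c)
    return round(total)
```

Both functions agree on small instances; the point of the paper's ordering argument is that each pinning step leaves an instance on which the marginal estimator still applies.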
Theorem 2. For integers ∆ ≥ 2, k ≥ 28, and q > 798∆^{16/(k−16/3)}, there is a sampler whose distribution is ε-close in total variation distance to the uniform distribution on all proper colourings, with running time polynomial in the number of vertices and 1/ε.
The correlation decay approach to approximate counting [Wei06,BG08] has been successfully applied to graph colouring problems [LY13,LYZZ17] and hypergraph problems [BGG+16], but it seems difficult to combine the two in our setting. More recently, there has been other progress on approximate counting in the local lemma regime [HSZ16,GJL17,GJ17]. However, these results do not directly apply to our situation either. Indeed, our result can be seen as a further step towards linking the local lemma with approximate counting, as we make Moitra's approach applicable in a more general setting, where the constraint size does not have to be directly related to the probability of bad events or the dependency degree. However, there still seem to be a few difficulties, such as constraints that cannot be satisfied by partial assignments, in going further towards the most general abstract setting of the local lemma, and this is an interesting direction for the future.
The paper is organized as follows. Section 2 introduces basic notions as well as the local lemma, and Section 3 introduces the coupling procedure. We give the algorithm of estimating marginal probabilities in Section 4, and use this algorithm to do counting and sampling in Sections 5 and 6, respectively. To maintain flexibility, in Sections 3, 4, 5, and 6, we keep track of various parameters, and all parameters are optimized in Section 7. We conclude in Section 8 by describing the bottleneck of the current approach, and outlining the difficulties for further generalizations.

PRELIMINARIES
A hypergraph is a pair H = (V, E) where V is the set of vertices and E ⊆ 2^V is the set of hyperedges. We say a hypergraph H is k-uniform if every e ∈ E satisfies |e| = k. Let q ∈ N be the number of available colours. A proper colouring of H is an assignment σ ∈ [q]^V such that no hyperedge in E is monochromatic, namely σ satisfies |{σ(v) : v ∈ e}| > 1 for every e ∈ E.
Although our goal is to count colourings in k-uniform hypergraphs, as the algorithm progresses, vertices will be pinned to some fixed value. Therefore we will work with a slightly more general problem, namely hypergraph colouring with pinnings. Formally, an instance of hypergraph colouring with pinnings is a pair (H (V , E), P) where P = {P e ⊆ [q] : e ∈ E} and P e is the set of colours that are already present (pinned) inside the edge e. In the intermediate steps of our algorithms, P will be induced by pinning a subset of vertices, but it is more convenient to consider this slightly more general setup. For an instance with pinning, a colouring σ ∈ [q] V is proper if for every e ∈ E, it holds that |{σ (v) : v ∈ e} ∪ P e | > 1.
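The properness condition with pinnings, |{σ(v) : v ∈ e} ∪ P_e| > 1, translates directly into code; a minimal sketch of ours (representation choices are ours):

```python
def is_proper_with_pinnings(sigma, edges, pinned):
    """sigma: dict vertex -> colour; edges: list of vertex tuples;
    pinned: dict edge index -> set of colours P_e already present
    (pinned) inside that edge.  An edge is satisfied iff, counting
    the pinned colours, it sees at least two distinct colours."""
    for i, e in enumerate(edges):
        seen = {sigma[v] for v in e} | pinned.get(i, set())
        if len(seen) <= 1:
            return False
    return True
```

Note that a monochromatic edge can still be satisfied if some pinned colour P_e differs from the colour its remaining vertices receive, which is exactly how pinning a vertex induces the more general instances used in the algorithms.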
Denote by C the set of all proper colourings of (H, P). For any C′ ⊆ C, we use µ_{C′} to denote the uniform distribution over C′. Since there are no weights involved, µ_C is our target Gibbs distribution.
Let µ be a distribution over partial colourings ([q] ∪ {−})^V, where "−" denotes that the vertex is not coloured (yet). For a partial colouring σ′, let C_{σ′} denote the set of proper colourings consistent with σ′. We say µ(·) is pre-Gibbs with respect to µ_C if for every σ ∈ C,
  µ_C(σ) = Σ_{σ′: σ |= σ′} µ(σ′) µ_{C_{σ′}}(σ),
where σ |= σ′ means that the full colouring σ is consistent with the partial one σ′. In other words, if we draw a partial colouring σ′ from a pre-Gibbs distribution µ, and then complete σ′ uniformly at random conditioned on the coloured vertices (with respect to µ_C), the resulting distribution is exactly µ_C. Note that in our definition we do not require the support of µ to be all partial colourings.

Lovász Local Lemma
Let (H(V, E), P) be an instance of hypergraph colouring and q ∈ N be the number of colours. We use ∆ to denote the maximum degree of H. Although we consider k-uniform hypergraphs in Theorem 1, in both the sampling and the counting procedures we will pin vertices gradually. Throughout the section, we assume that for every e ∈ E, k′ ≤ |e| ≤ k. These are the instances that will emerge in Theorem 20 and Theorem 22. Let Lin(H) be the line graph of H, that is, vertices in Lin(H) are hyperedges in H and two hyperedges are adjacent if they share some vertex in H. The "dependency graph" of our problem is simply the line graph of H. For e ∈ E, let Γ(e) be the neighbourhood of e, namely the set {e′ | e ∩ e′ ≠ ∅}. It is clear that the maximum degree of Lin(H) is at most k(∆ − 1). Hence |Γ(e)| ≤ k(∆ − 1) for any e ∈ E. With a slight abuse of notation, for v ∈ V, let Γ(v) be the set of edges in E incident to v, i.e., Γ(v) := {e ∈ E : v ∈ e}. Furthermore, for any event B depending on a set of vertices ver(B), let Γ(B) be the set of edges on which B depends, i.e., Γ(B) = {e | e ∩ ver(B) ≠ ∅}.
The (asymmetric) Lovász Local Lemma (proved by Lovász and published by Spencer [Spe77]) states a sufficient condition for the existence of a proper colouring. Note that in the following Pr [·] refers to the product distribution where every vertex is coloured uniformly and independently.
Theorem 3. If there exists an assignment x : E → (0, 1) such that for every e ∈ E we have
  Pr[e is monochromatic] ≤ x(e) ∏_{e′∈Γ(e)} (1 − x(e′)),    (1)
then a proper colouring exists.
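For intuition, in the symmetric k-uniform setting the condition of Theorem 3 can be checked numerically. The choice x(e) = 1/(k∆) below mirrors the symmetric choices made in the proofs later; the function itself is a sketch of ours:

```python
def lll_condition_holds(q, k, delta):
    """Check the condition of Theorem 3 for a k-uniform hypergraph of
    maximum degree delta under the uniform product measure, with the
    symmetric choice x(e) = 1/(k*delta).

    In the product distribution, Pr[e monochromatic] = q**(1 - k),
    and each edge has |Gamma(e)| <= k*(delta - 1) neighbours."""
    x = 1.0 / (k * delta)
    return q ** (1 - k) <= x * (1 - x) ** (k * (delta - 1))
```

Up to lower-order terms, the condition holds as soon as q^{k−1} ≥ ek∆, which is the q ≍ ∆^{1/(k−1)} threshold discussed in the introduction.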
When the condition of Theorem 3 is met, we actually have good control over any event in the uniform distribution µ_C due to the next theorem, shown in [HSS11].

Theorem 4. Under the condition of Theorem 3, for any event B it holds that
  Pr_{σ∼µ_C}[B] ≤ Pr[B] ∏_{e∈Γ(B)} (1 − x(e))^{−1}.
Theorem 4 also allows us to have some quantitative control over the marginal probabilities.
Lemma 5. If k′ ≤ |e| ≤ k for every e ∈ E, t ≥ k, and q ≥ (et∆)^{1/(k′−1)}, then for any v ∈ V and any colour c ∈ [q],
  Pr_{σ∼µ_C}[σ(v) = c] ≤ (1/q)(1 − 1/(t∆))^{−∆}.
Proof. Let x(e) = 1/(t∆) for every e ∈ E. We first verify that (1) holds. Since |Γ(e)| ≤ k(∆ − 1) ≤ t∆ − 1 and (1 − 1/n)^{n−1} ≥ 1/e,
  x(e) ∏_{e′∈Γ(e)} (1 − x(e′)) ≥ (1/(t∆))(1 − 1/(t∆))^{t∆−1} ≥ 1/(et∆) ≥ q^{1−k′} ≥ Pr[e is monochromatic].
Hence, Theorem 4 applies. The event σ(v) = c depends only on v, so |Γ({σ(v) = c})| = |Γ(v)| ≤ ∆, and in the product distribution Pr[σ(v) = c] = 1/q. Then,
  Pr_{σ∼µ_C}[σ(v) = c] ≤ (1/q)(1 − 1/(t∆))^{−∆}. □
Unfortunately, Theorem 4 does not give lower bounds directly. We will instead bound the probability of blocking v from having colour c.
Lemma 6. If k′ ≤ |e| ≤ k for every e ∈ E, t ≥ k, and q ≥ (et∆)^{1/(k′−1)}, then for any v ∈ V and any colour c ∈ [q],
  Pr_{σ∼µ_C}[σ(v) = c] ≥ (1/q)(1 − ∆q^{1−k′}(1 − 1/(t∆))^{−(k(∆−1)+1)}).
Proof. Fix v and c. For every e ∈ Γ(v), let Block_e be the event that the vertices in e other than v all have the colour c. Clearly, conditioned on none of the Block_e occurring, the probability that v is coloured c is at least 1/q. Hence we have that
  Pr_{σ∼µ_C}[σ(v) = c] ≥ (1/q)(1 − Σ_{e∈Γ(v)} Pr_{σ∼µ_C}[Block_e]).
Clearly, in the product distribution, Pr[Block_e] = q^{1−|e|} ≤ q^{1−k′}. Again let x(e) = 1/(t∆) for every e ∈ E, so that (1) holds. Since |Γ(Block_e)| ≤ k(∆ − 1) + 1, by Theorem 4,
  Pr_{σ∼µ_C}[Block_e] ≤ q^{1−k′}(1 − 1/(t∆))^{−(k(∆−1)+1)}.
The claim follows since |Γ(v)| ≤ ∆. □
Lemma 7. If k′ ≤ |e| ≤ k for every e ∈ E, t ≥ k, and q ≥ (et∆)^{1/(k′−1)}, then for any v ∈ V and any colour c ∈ [q], the bounds of Lemma 5 and Lemma 6 hold simultaneously; in particular, Pr_{σ∼µ_C}[σ(v) = c] deviates from 1/q by at most a multiplicative factor of 1 ± O(1/t).

THE COUPLING
Recall that a partial colouring is an assignment σ ∈ ([q] ∪ {−})^V where "−" denotes an unassigned colour. Fix a vertex v ∈ V and two distinct colours c_1, c_2 ∈ [q]. We define two initial partial colourings X_0 and Y_0 that assign v the colours c_1 and c_2 respectively, and leave all other vertices unassigned. We use C_1 and C_2 to denote the sets of proper colourings with v fixed to c_1 and c_2 respectively. For a partial colouring X, we use C_X to denote the set of proper colourings consistent with X. Moitra [Moi17] introduced the following intriguing idea (in the setting of CNF formulas) to compute the ratio of the marginal probabilities at v. Couple µ_{C_1} and µ_{C_2} in a sequential way. Start from v, where the colours differ, and proceed in a breadth-first search manner, vertex by vertex. At each vertex we draw a colour from µ_{C_1} and µ_{C_2}, respectively, conditioned on all the existing colours, and couple them maximally. The process ends when the set of vertices coupled successfully forms a cut separating v from the uncoloured vertices. If every vertex we encounter has its marginal distribution close enough to uniform, then this coupling process terminates quickly with high probability. These local almost-uniform properties are guaranteed by Lemma 7. Then Moitra sets up a clever linear program (LP), where the variables mimic transition probabilities during the coupling (but in some conditional way), and shows that the LP is sufficient to recover the marginal distribution at v by a binary search.
We apply the same idea here for hypergraph colourings. However, one needs to carefully implement the coupling to guarantee that all marginal distributions encountered are close enough to uniform. Formally, we describe our coupling process in Algorithm 1. The coupling process applies to hypergraphs with edge size between k 1 and k for some parameter 0 < k 1 ≤ k. There is another parameter 0 < k 2 < k 1 and all these parameters will be set in Section 7. The output is a pair of partial colourings (X , Y ) extending X 0 and Y 0 respectively. Notice that in order to implement the coupling process, we fix an arbitrary ordering of edges and vertices in advance.
The set V col consists of all coloured vertices. Intuitively, the set V 1 contains vertices that have failed the coupling and V 2 is its complement. Once a hyperedge is satisfied by both partial colourings X and Y , it has no effect any more and is thus removed.
The main difference from Moitra's coupling [Moi17] is that we cannot choose in advance which vertices to couple ("marking"). Instead, we take an adaptive approach to ensure that no hyperedge becomes too small. Once k_2 vertices of a hyperedge are coloured, all the remaining vertices of that hyperedge are considered "failed" in the coupling (namely they are added to V_1). However, these failed vertices are left uncoloured.
Algorithm 1 outputs a pair of partial colourings X, Y defined on V_col and a partition of vertices V = V_1 ⊔ V_2. Any edge e in the original E such that e ∩ V_1 ≠ ∅ and e ∩ V_2 ≠ ∅ has been removed, because either it is satisfied by both X and Y, or k_2 vertices in e have been coloured. In the latter case, all vertices in e are either coloured or in V_1, namely e ⊆ V_1 ∪ V_col. Hence all edges intersecting both V_1 and V_2 \ V_col are satisfied by both X and Y. This fact will be useful later.
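Step 8 of Algorithm 1 below samples from the maximal coupling of two marginal distributions. For two explicitly given distributions over colours, such a coupling can be sampled as follows; this is a generic sketch of ours (the algorithm itself couples conditional marginals of the Gibbs distribution, which are not available in closed form):

```python
import random

def maximal_coupling(p, s, rng=random):
    """Sample a pair (X, Y) with X ~ p and Y ~ s such that Pr[X = Y] is
    maximized, i.e. equals 1 - d_TV(p, s).  Here p and s are dicts
    mapping colours to probabilities."""
    overlap = {}
    for c in set(p) | set(s):
        m = min(p.get(c, 0.0), s.get(c, 0.0))
        if m > 0:
            overlap[c] = m
    mass = sum(overlap.values())
    if rng.random() < mass:
        # Agreement: draw a single colour from the normalized overlap.
        c = _draw({c: w / mass for c, w in overlap.items()}, rng)
        return c, c
    # Disagreement: draw each side from its normalized residual measure.
    x = _draw(_residual(p, overlap, 1.0 - mass), rng)
    y = _draw(_residual(s, overlap, 1.0 - mass), rng)
    return x, y

def _residual(dist, overlap, norm):
    return {c: (w - overlap.get(c, 0.0)) / norm
            for c, w in dist.items() if w > overlap.get(c, 0.0)}

def _draw(dist, rng):
    # inverse-CDF sampling over a small dict of probabilities
    r = rng.random()
    acc = 0.0
    for c, w in dist.items():
        acc += w
        if r < acc:
            return c
    return c  # guard against floating-point slack
```

For example, with p = {1: 0.5, 2: 0.5} and s = {1: 0.5, 3: 0.5}, the pair agrees (necessarily on colour 1) with probability exactly 1 − d_TV(p, s) = 1/2, and otherwise outputs (2, 3).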
For u ∈ V, let Γ_ver(u) denote the neighbouring vertices of u (including u itself), namely Γ_ver(u) = {w | ∃e ∈ E, {u, w} ⊆ e}, and let Γ_ver(U) = ∪_{u∈U} Γ_ver(u) for a subset U ⊆ V.

Algorithm 1 The coupling process
1: Input: A hypergraph H(V, E) with pinnings P and k_1 ≤ |e| ≤ k for every e ∈ E, two partial colourings X_0 and Y_0.
6: Let e be the first such hyperedge;
7: Let u be the first vertex in e ∩ V_2;
8: Sample a pair of colours (c_x, c_y) according to the maximal coupling of the marginal distributions at u conditioned on X and Y respectively;
9: Extend X and Y by colouring u with c_x and c_y, respectively;
10: if c_x ≠ c_y then
12: end for
21: end while

The following lemma summarizes some properties of this random process.
Lemma 8. The following properties of Algorithm 1 hold: (1) All coloured vertices are either in V 1 or incident to V 1 , namely V col ⊆ Γ ver (V 1 ); (2) The distributions of X and Y are pre-Gibbs with respect to µ C 1 and µ C 2 respectively.
Proof. For (1), notice that whenever we add a vertex u into V_col, it must hold that u ∈ e for some e with e ∩ V_1 ≠ ∅ at the time. The claim follows from a simple induction.
For (2), we only prove the claim for X; the proof for Y is similar. The partial colouring X is generated in the following way: at each step either the process ends, or the next uncoloured vertex u is chosen and X is extended to u according to the correct conditional marginal distribution, and the process repeats. Our decisions (whether or not to halt, and which vertex u is next) depend on Y in addition to the partial colouring X so far.
An intermediate state S of Algorithm 1 consists of the partial colourings X, Y, the set V_col, and the set V_1. Our claim is that, conditioned on any valid S, the distribution of the final output (on the X side) of Algorithm 1 is pre-Gibbs with respect to µ_{C_X}. The lemma clearly follows from the claim by setting S to the initial state of Algorithm 1.
We induct on the maximum possible future steps of S. The base case is that S will halt immediately. Thus the output is simply X and completing it yields the uniform distribution on C X . That is, the output is pre-Gibbs.
For the induction step, S will not halt but rather extend the colourings to some vertex u. Let τ_S(·) denote the measure of completing the output of Algorithm 1 conditioned on S. Let X_{u←c} be the partial colouring defined on V_col ∪ {u} that extends X to u with colour c, and let S′ be an internal state consistent with X_{u←c}, denoted by S′ |= X_{u←c}. Moreover, let q(S′) be the probability of transitioning from S to S′. Since the marginal probability at u only depends on the partial colouring X so far, we have that
  Σ_{S′: S′ |= X_{u←c}} q(S′) = µ_{C_X}(X_{u←c}),    (4)
where µ_{C_X}(X_{u←c}) is in fact the marginal probability of the colour c at u conditioned on X. By our induction hypothesis, conditioned on S′, the final output is pre-Gibbs with respect to C_{X_{u←c}}. That is, for every σ ∈ C_{X_{u←c}},
  τ_{S′}(σ) = 1/|C_{X_{u←c}}|.    (5)
For σ ∈ C_X, suppose X_{u←c} is the partial colouring of σ restricted to V_col ∪ {u}. Then we have that
  τ_S(σ) = Σ_{S′: S′ |= X_{u←c}} q(S′) τ_{S′}(σ) = Σ_{S′: S′ |= X_{u←c}} q(S′)/|C_{X_{u←c}}| = µ_{C_X}(X_{u←c})/|C_{X_{u←c}}| = 1/|C_X|,
where in the second equality we use (5), in the third we use (4), and the last holds since µ_{C_X}(X_{u←c}) = |C_{X_{u←c}}|/|C_X|. The claim follows. □ Therefore, the output of Algorithm 1 is a coupling of two pre-Gibbs measures that are defined on the same set of vertices V_col. We use µ_cp(·, ·) to denote this joint distribution.
It is possible to show that the final size of |V_1| is O(log |V|) with high probability. This fact will not be directly used, and is indeed not strong enough for the algorithm and its analysis in the next section, so we omit its proof. What we will show eventually is that, conditioned on a randomly chosen colouring from C_1 or C_2, the probability that the coupling process has not terminated decays exponentially with the depth. There are two levels of randomness here, and they will be separated, since the linear program later will only be able to certify the second kind of randomness.
Later, in Section 6, when we do sampling, we will face a similar procedure, Algorithm 2, and we will show that the connected components produced by Algorithm 2 have size O(log |V|) with high probability (Lemma 21). This is in the same vein as |V_1| having size O(log |V|) with high probability in Algorithm 1.

COMPUTING THE MARGINALS
In the previous section, we introduced a random process to generate a joint distribution µ_cp(·, ·) of partial colourings, whose marginal distributions are pre-Gibbs. Recall that we fixed X(v) = c_1 and Y(v) = c_2. Let q_i denote the marginal probability in µ_C of v being coloured c_i, for i = 1, 2. That is, q_i = |C_i|/|C| for i = 1, 2. The coupling naturally induces an (imaginary) sampler to uniformly sample from C_1 ∪ C_2 as follows:
Step 1: Sample (X, Y) using Algorithm 1;
Step 2: Let v ← c_1 with probability q_1/(q_1 + q_2), and v ← c_2 otherwise;
Step 3: If v is coloured c_1, uniformly output a colouring in C_X; otherwise, uniformly output a colouring in C_Y.
We denote this sampler by S. The output of S is uniform over C_1 ∪ C_2 because, by Lemma 8, the output distribution of Algorithm 1, projected to either side, is pre-Gibbs, and we then choose the final colouring with the correct ratio.
One can represent the coupling process (Algorithm 1) as traversing a (deterministic) coupling tree T constructed as follows: each vertex in T represents a pair of partial colourings (x, y) defined on some V_col that have appeared in the coupling. We write (x, y) ∈ T if (x, y) is a pair of partial colourings represented by some vertex in T. Although the intermediate state of Algorithm 1 consists of the partial colourings x, y together with V_col and V_1, we can actually deduce V_col, as well as V_1, from x and y by simulating Algorithm 1 from the start. Thus the pair (x, y) determines either that the coupling should halt, or the next vertex u to extend to. In the coupling tree T, (x, y) either is a leaf or has q² children, which correspond to the q² possible ways to extend (x, y) by colouring u. The root of the tree is the initial pair (x_0, y_0) defined on {v}.
In the following, we identify a collection of conditional marginal probabilities that keeps the information of the coupling process.
First, consider a pair of partial colourings (x, y) ∈ T which is a leaf, and any two proper colourings σ_x, σ_y such that σ_x |= x and σ_y |= y. In the probability space induced by the sampler introduced above, define p^x_{x,y} (resp. p^y_{x,y}) to be the probability that the coupling produces (x, y) conditioned on the output of S being σ_x (resp. σ_y). These quantities are well defined and independent of the particular choices of σ_x and σ_y. Essentially we only condition on the random choice at Step 2 of S. Once that choice is made, the output is uniform over C_x or C_y. Perhaps a clearer way of seeing this independence is to give more explicit expressions for p^x_{x,y} and p^y_{x,y}. By Bayes' rule,
  p^x_{x,y} = µ_cp(x, y) |C_1| / |C_x|,    (6)
  p^y_{x,y} = µ_cp(x, y) |C_2| / |C_y|.    (7)
Combining the two identities above we obtain
  q_1/q_2 = |C_1|/|C_2| = (|C_x|/|C_y|) · (p^x_{x,y}/p^y_{x,y}).    (8)
A crucial observation is that, for every pair of partial colourings (x, y) that is a leaf of T with corresponding V_col, V_1, V_2, the ratio |C_x|/|C_y| can be computed in q^{|V_1\V_col|} time. This is because when Algorithm 1 terminates, all edges intersecting both V_1 and V_2 \ V_col are satisfied by both x and y. The numbers of ways of colouring blank vertices in V_2 cancel out, and we only need to enumerate all colourings of blank vertices inside V_1. Let r_{x,y} = |C_x|/|C_y|.
Next, consider an internal (x, y) in the coupling tree T. We interpret p^x_{x,y} and p^y_{x,y} as the probability that the coupling process has ever arrived at the internal pair of partial colourings (x, y), conditioned on the output of S being σ_x or σ_y, respectively. Note that this definition is consistent with our previous definition when (x, y) is a leaf of T. Recall that (x_0, y_0) is the root of T, namely x_0 and y_0 only colour v with c_1 and c_2, respectively. For (x_0, y_0), we have that
  p^x_{x_0,y_0} = p^y_{x_0,y_0} = 1.    (9)
Moreover, for an internal (x, y) whose children are defined on V_col ∪ {u}, and any colour c ∈ [q] with σ_x |= x_{u←c} (resp. σ_y |= y_{u←c}),
  Σ_{c′∈[q]} p^x_{x_{u←c}, y_{u←c′}} = p^x_{x,y},    (10)
  Σ_{c′∈[q]} p^y_{x_{u←c′}, y_{u←c}} = p^y_{x,y},    (11)
where we use x_{u←c} to denote the partial colouring that extends x by assigning colour c to the vertex u.
In fact, when the coupling process is at some internal node (x, y) of the coupling tree, defined on V_col, and the next step is to sample the colour of a vertex u, one can recover the distribution of that colour from the values p^x_{x,y}, p^y_{x,y} at (x, y) and its children, by solving linear constraints using Bayes' rule. Therefore, the collection {p^x_{x,y}, p^y_{x,y} : (x, y) ∈ T} encodes all information of the coupling process.

The linear program
The values p^x_{x,y} and p^y_{x,y} are unknown, and we are going to impose a few necessary linear constraints on them. The basic constraints are derived from (8), (9), (10), and (11). To this end, for every node (x, y) in T, we introduce two LP variables aiming to mimic p^x_{x,y} and p^y_{x,y}. The full coupling tree T is too big, and we will truncate it at some depth L > 0; the quantity L will be set later. We will perform a binary search to estimate the ratio q_1/q_2 using the truncated coupling tree. Thus, we introduce two guesses r and r̄ for lower and upper bounds of q_1/q_2, respectively. Let T_L be the coupling tree truncated at depth L, and denote by L(T) the leaves of a tree T. Since the coupling procedure colours one vertex at a time, for any node (x, y) ∈ T_L, we have |V_col| ≤ L, where V_col is determined by (x, y). Formally, we have three types of constraints.
Constraints 1: For every leaf (x, y) ∈ L(T_L) with corresponding |V_col| < L, we have the constraints
  r · p^y_{x,y} ≤ r_{x,y} · p^x_{x,y} ≤ r̄ · p^y_{x,y}.
Constraints 1 are relaxed versions of identity (8). These constraints are the most critical ones. However, in order to compute r_{x,y}, one needs an exp(L) amount of time. This forces us to go only to logarithmic depth in the coupling tree, but we will show that this is enough.
Constraints 2: For the root (x_0, y_0) ∈ T, we have
  p^x_{x_0,y_0} = p^y_{x_0,y_0} = 1.
Moreover, for every non-leaf (x, y) ∈ T_L with corresponding |V_col| < L, let u be the next vertex to couple. For every colour c ∈ [q], we have the following constraints:
  Σ_{c′∈[q]} p^x_{x_{u←c}, y_{u←c′}} = p^x_{x,y} and Σ_{c′∈[q]} p^y_{x_{u←c′}, y_{u←c}} = p^y_{x,y}.
These constraints faithfully realize the properties (9), (10), and (11).
Constraints 3 reflect the fact that the coupling at individual vertices is very likely to succeed, by Lemma 7. Assume the conditions of Lemma 7 are met with t = t*. We claim that the true values {p^x_{x,y}} satisfy the corresponding inequalities; Constraints 3 then follow from Constraints 2. We use (6) to show the claim. By Lemma 7, the marginal distribution at u conditioned on either partial colouring is close to uniform. Again by Lemma 7, the coupling at u with any colour c succeeds with probability at least (1/q)(1 − 1/t*). Thus the ratio µ_cp(x_{u←c}, y_{u←c})/µ_cp(x, y), which can be viewed as the probability, conditioned on reaching (x, y), of coupling u successfully with colour c, is at least (1/q)(1 − 1/t*).
Combining these facts with (6) proves the claim. Similar inequalities hold for {p^y_{x,y}} due to (7).

Analysis of the LP
In this subsection, we show that the LP can be used to obtain an efficient and accurate estimator of marginals.
Theorem 9. Let ∆ ≥ 2 and k > 0 be two integers. Let 0 < β < 1 be a constant, and let 0 < k_2 < k_1 ≤ k be integers. Let H = (V, E) be a hypergraph with pinnings P and maximum degree ∆ such that k_1 ≤ |e| ≤ k for every e ∈ E. Under suitable conditions on the parameters (which are optimized in Section 7), there is a deterministic algorithm that, for every v ∈ V, c ∈ [q], and ε > 0, computes a number p approximating the marginal probability of v being coloured c within error ε, in time poly(1/ε).
Before diving into the proof details, let us first imagine that we set up the LP for the whole coupling tree. Doing so would require an exponential amount of time, but we show that it can indeed be used to recover accurate information. Due to Constraints 2, a simple induction shows that for every L ≤ |V| and σ ∈ C_1,
  Σ_{(x,y)∈L(T_L): σ|=x} p^x_{x,y} = 1.
Similar equalities hold on the Y side. Using this, we rewrite the ratio |C_1|/|C_2| as a sum over the leaves of T. Recall r_{x,y} = |C_x|/|C_y|. By Constraints 1, for any (x, y) ∈ L(T), the value r_{x,y} p^x_{x,y} is sandwiched between r p^y_{x,y} and r̄ p^y_{x,y}; summing over the leaves, the guesses r and r̄ certify lower and upper bounds on |C_1|/|C_2|. Unfortunately, as the size and the computational cost of setting up the LP are exponential in L, we have to truncate it early. The rest of our task is to show that the error caused by the truncation is small. One may notice that in the analysis above we do not use Constraints 3. Indeed, those constraints are used to bound the truncation error.
Intuitively, the truncation error comes from the proper colourings for which the coupling does not halt by depth L (since we cannot impose Constraints 1 for these nodes). A naive approach would then try to show that, conditioned on any proper colouring as the final output, the coupling terminates quickly. This is unfortunately not true: there exist "bad" colourings for which the coupling does not terminate by level L. For example, given the ordering of vertices and edges, a proper colouring σ ∈ C_1 may give all vertices encountered in Algorithm 1 the same colour. Conditioned on this σ on the X side, Algorithm 1 will not stop until all edges are enumerated.
We will show, nonetheless, that the fraction of "bad" colourings is small. Let us formally define bad colourings first. We need to use the notion of {2, 3}-trees. This notion dates back to Alon's parallel local lemma algorithm [Alo91].
Definition 10 ({2, 3}-tree). Let G = (V, E) be a graph. A set of vertices T ⊆ V is a {2, 3}-tree if (1) for any u, v ∈ T, dist_G(u, v) ≥ 2; and (2) if one adds an edge between every u, v ∈ T such that dist_G(u, v) = 2 or 3, then T is connected.
We will need to count the number of {2, 3}-trees later for union bounds; a lemma due to Borgs et al. [BCKL13], which counts the number of connected induced subgraphs in a bounded-degree graph, serves this purpose. We also need the fact that any set B of hyperedges containing e* that is connected in L²(H) (the square of the line graph) contains a {2, 3}-tree T in Lin(H) with e* ∈ T and |T| ≥ |B|/(k∆). Proof. We construct T greedily starting from T_0 := {e*}. Given T_i, let B ← B \ Γ(T_i), and then let T_{i+1} be T_i plus the first hyperedge in B which has distance ≤ 3 from T_i in Lin(H). If no such hyperedge exists, the process stops.
We claim that when the process stops, all hyperedges in B have been removed. Suppose instead that a nonempty subset B′ ⊂ B remains, and choose an arbitrary e ∈ B′. Since B is connected in L²(H), there is a shortest path P ⊆ B from e to some e′ ∈ T in L²(H). Assume that P is e → · · · → e_1 → e_2 → e′ (where e_1 may equal e). The minimality of |P| implies that e_1, e_2 ∉ T. If dist_Lin(H)(T, e_2) = 1, then dist_Lin(H)(T, e_1) ≤ 1 + dist_Lin(H)(e_1, e_2) ≤ 3, which contradicts the construction of T, as e_1 would have been added to T. Otherwise dist_Lin(H)(T, e_2) = 2, which again contradicts the construction of T, as e_2 would have been added to T.
For the size of T, notice that in every step of the process, at most k∆ hyperedges are removed from B. Hence |T| ≥ |B|/(k∆). □ We now define bad colourings. Let e_0 be the first edge in Γ(v). Recall that in the coupling process we attempt to colour at most k_2 vertices in each edge, where 0 < k_2 < k_1. We will have another parameter 0 < β < 1, which denotes the fraction of (partially) monochromatic hyperedges in a bad colouring. All parameters will be set in Section 7.
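The greedy construction of a {2, 3}-tree in the proof above can be sketched for an arbitrary graph standing in for Lin(H); the function names are ours:

```python
from collections import deque

def dist_from_set(adj, sources):
    """BFS distances from a set of source nodes in graph `adj`
    (dict: node -> set of neighbours)."""
    d = {s: 0 for s in sources}
    dq = deque(sources)
    while dq:
        u = dq.popleft()
        for w in adj[u]:
            if w not in d:
                d[w] = d[u] + 1
                dq.append(w)
    return d

def greedy_23_tree(adj, b_set, start):
    """Greedily extract a {2, 3}-tree from the candidate set b_set,
    starting at `start`: repeatedly discard candidates adjacent to the
    current tree and add the first candidate at distance 2 or 3."""
    tree = {start}
    cand = set(b_set)
    while True:
        d = dist_from_set(adj, tree)
        # distance 99 stands for "unreachable" in this small sketch
        cand -= {u for u in cand if d.get(u, 99) <= 1}   # drop Gamma(tree)
        nxt = [u for u in sorted(cand) if 2 <= d.get(u, 99) <= 3]
        if not nxt:
            return tree
        tree.add(nxt[0])
```

On a path 0–1–2–3–4–5–6 with all nodes as candidates, the procedure returns {0, 2, 4, 6}: pairwise non-adjacent nodes that become connected once distance-2 and distance-3 pairs are linked, matching Definition 10.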
Definition 14 (bad colourings). Let ℓ > 0 be an integer and β > 0 be a constant. A colouring σ ∈ C_1 is ℓ-bad if there exist a {2, 3}-tree T in Lin(H) and a set V_col such that (1) |T| = ℓ and e_0 ∈ T; (2) for every e ∈ T, |e ∩ V_col| = k_2; (3) the partial colouring of σ restricted to V_col makes at least βℓ hyperedges in T (partially) monochromatic. We say σ ∈ C_1 is ℓ-good if it is not ℓ-bad.
Note that since T is a {2, 3}-tree in Lin(H) in Definition 14, all hyperedges in T are pairwise disjoint.
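As an illustration, the greedy extraction in the proof of Lemma 13 can be sketched in code. The adjacency-list graph below is a generic stand-in for Lin(H), and the path-graph demo is purely illustrative.

```python
from collections import deque

def graph_dist(adj, s, t):
    """BFS distance between vertices s and t in an undirected graph."""
    if s == t:
        return 0
    seen, frontier = {s}, deque([(s, 0)])
    while frontier:
        u, du = frontier.popleft()
        for w in adj[u]:
            if w == t:
                return du + 1
            if w not in seen:
                seen.add(w)
                frontier.append((w, du + 1))
    return float("inf")

def greedy_23_tree(adj, B, e_star):
    """Greedy extraction from the proof of Lemma 13: repeatedly discard the
    neighbourhood of the current tree T, then add the first remaining element
    of B within distance 3 of T (distance 1 is impossible after discarding)."""
    T = [e_star]
    remaining = [e for e in B if e != e_star and graph_dist(adj, e, e_star) > 1]
    while True:
        cand = next((e for e in remaining
                     if min(graph_dist(adj, e, t) for t in T) <= 3), None)
        if cand is None:
            return T
        T.append(cand)
        remaining = [e for e in remaining
                     if e != cand and graph_dist(adj, e, cand) > 1]

# demo: on a path 0-1-...-9, starting from 0, the greedy picks 0, 2, 4, 6, 8
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
T = greedy_23_tree(adj, list(range(10)), 0)
assert all(graph_dist(adj, u, v) >= 2 for u in T for v in T if u != v)
```

Every pair of selected vertices is at distance at least 2, and each new vertex sits at distance 2 or 3 from the current tree, matching Definition 10.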
We show that the fraction of bad proper colourings among all proper colourings in C 1 is small. This allows us to throw away bad colourings in the estimates later.
Lemma 15. Let ∆ ≥ 2 and 0 < k_2 < k_1 ≤ k all be integers. Let 0 < β < 1 be a constant. Let H = (V, E) be a hypergraph with pinnings P, where the maximum degree is ∆ and k_1 ≤ |e| ≤ k for every e ∈ E. Suppose q^{1−k_2} < β and q ≥ (ek∆)^{1/(k_1−2)}. Then the fraction of ℓ-bad colourings in C_1 is exponentially small in ℓ.

Proof. Fix a {2, 3}-tree T = {e_1, e_2, ..., e_ℓ} in Lin(H) of size ℓ and a set V_col such that |e ∩ V_col| = k_2 for every e ∈ T. We say σ is ℓ-bad with respect to T and V_col if σ, T, and V_col satisfy the requirements in Definition 14. Denote by Z_{V_col}, or simply Z, the number of (partially) monochromatic hyperedges obtained by first drawing from µ_{C_1} and then revealing the colours of the vertices in V_col. We use Theorem 4 to bound the probability that Z ≥ βℓ.
Indeed, µ_{C_1} can be viewed as the uniform distribution over proper colourings of an instance where v is pinned to colour c_1. In this instance, we have k_1 − 1 ≤ |e| ≤ k for every e ∈ E. Hence, in the product distribution (where all vertices are coloured independently and uniformly), Pr[e is monochromatic] ≤ q^{2−k_1} ≤ 1/(ek∆) for every e ∈ E by assumption. We set x(e) = 1/(k∆) in Theorem 4 and verify condition (1).

In the product distribution, for e ∈ T, all vertices in e ∩ V_col are monochromatic with probability p* := q^{1−k_2} < β. Since T is a {2, 3}-tree in Lin(H), the edges of T are disjoint and these events are independent in the product distribution; hence a multiplicative Chernoff bound with mean p*ℓ and γ = β/p* − 1 > 0 bounds the probability that at least βℓ of them occur. For each edge e ∈ T, there are at most k(∆ − 1) + 1 ≤ k∆ − 1 edges that intersect e (including itself). The random variable Z thus depends on at most (k∆ − 1)ℓ hyperedges, and applying Theorem 4 with x(e) = 1/(k∆) transfers the bound to µ_{C_1}.

To finish the argument, we still need to account for all {2, 3}-trees and all choices of V_col by a union bound. Since the maximum degree of Lin(H) is k∆, the total number of {2, 3}-trees of size ℓ containing e_0 is at most (ek³∆³/2)^ℓ. Putting everything together with the assumed bounds on q finishes the proof. □

Let (x, y) ∈ T be a pair of partial colourings in the coupling tree, defined on V_col. We now prove some structural properties of (x, y). Say an edge e ∈ E with e ∩ V_col ≠ ∅ is blocked by (x, y) if one of the following holds: (1) x(u) ≠ y(u) for some u ∈ e.
(2) |e ∩ V_col| = k_2 and e is satisfied by neither x nor y.
Notice that all edges in Γ(v) are always blocked, and in particular, e 0 is always blocked.
Let us denote by B_{x,y} the set of edges blocked by (x, y). Then B_{x,y} always contains a large {2, 3}-tree.
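For concreteness, the blocked-edge condition can be phrased as a short routine. Partial colourings are dicts on V_col here, and "satisfied" is read as "the edge already contains two differently coloured vertices", which is this sketch's interpretation of the definition above.

```python
def is_satisfied(edge, colouring):
    """An edge is satisfied once two of its coloured vertices differ."""
    return len({colouring[v] for v in edge if v in colouring}) >= 2

def blocked_edges(edges, x, y, k2):
    """Edges blocked by the pair (x, y): either the two sides disagree inside
    the edge, or k2 of its vertices are coloured without satisfying it."""
    V_col = set(x)  # x and y are defined on the same vertex set
    blocked = []
    for e in edges:
        in_col = [v for v in e if v in V_col]
        if any(x[v] != y[v] for v in in_col):
            blocked.append(e)                       # condition (1)
        elif len(in_col) == k2 and not (is_satisfied(e, x) or is_satisfied(e, y)):
            blocked.append(e)                       # condition (2)
    return blocked

# demo: the first edge has a disagreement at vertex 1; the second is satisfied
x = {0: 'r', 1: 'r', 2: 'r', 3: 'b'}
y = {0: 'r', 1: 'g', 2: 'r', 3: 'b'}
assert blocked_edges([(0, 1, 2, 6), (2, 3, 4, 5)], x, y, k2=2) == [(0, 1, 2, 6)]
```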
Lemma 16. Let (x, y) ∈ T be a pair of partial colourings in the coupling tree defined on V_col with corresponding V_1. Assume |V_col| = L. There exists a {2, 3}-tree T ⊆ B_{x,y} in Lin(H) of size at least L/(k³∆²) containing e_0.
Proof. We first claim that B_{x,y} is connected in L²(H), by induction on L. Once an edge is blocked during Algorithm 1, it remains blocked until the end. If u is the next vertex to be coloured in Algorithm 1, then u must be adjacent to some vertex u′ ∈ V_1, and u′ is in some edge e blocked by the current (x, y). Therefore any newly blocked edge caused by colouring u has distance at most 2 to e.
Since e_0 is always blocked, e_0 ∈ B_{x,y}. By Lemma 13, there exists a {2, 3}-tree T ⊆ B_{x,y} in Lin(H) such that |T| ≥ |B_{x,y}|/(k∆). Next we claim that |B_{x,y}| ≥ L/(k²∆). This is because every vertex in V_1 belongs to some blocked edge, and hence |V_1| ≤ k|B_{x,y}|. By item (1) of Lemma 8, V_col ⊆ Γ_ver(V_1), which implies L = |V_col| ≤ |Γ_ver(V_1)| ≤ k∆|V_1|. Combining these facts yields the lemma. □

Recall that T_L is the tree obtained from the coupling tree T by truncating at depth L, and L(T_L) is its set of leaves. Because of Constraints 2, for every proper colouring σ ∈ C_1, it holds that

Σ_{(x,y) ∈ L(T_L): σ |= x} p^x_{x,y} = 1. (12)

However, in Constraints 1, our linear program only contains constraints for those p^x_{x,y} and p^y_{x,y} whose V_col is of size strictly smaller than L. The next lemma shows that, for an ℓ-good colouring σ, solving for p^x_{x,y}, p^y_{x,y} provides a good approximation to the identity (12).
Lemma 17. Let 0 < β < 1 be a constant. Let H = (V, E) be a hypergraph with pinnings P and maximum degree ∆ such that |e| ≤ k for all e ∈ E. Let σ ∈ C_1 be ℓ-good where ℓ is an integer. If p^x_{x,y} is a collection of values satisfying all our linear constraints, with t* = 5(e²k³∆³/2)^{1/(1−β)} in Constraints 3, up to level L = k³∆²ℓ, then it holds that

Σ_{(x,y) ∈ L(T_L): |V_col| < L and σ |= x} p^x_{x,y} ≥ 1 − e^{−ℓ}. (13)

Proof. We construct a new coupling process similar to Algorithm 1, and show that the left-hand side of (13) is the probability of an event defined by the new process. We modify S in the following two ways: (1) condition on the final output being σ; (2) use the probabilities induced by the LP solution p^x_{x,y} instead of the true values. To be more specific, consider each step where one needs to extend (x, y), defined on V_col, to a new vertex u. Call the new colourings (x′, y′). Since the output σ is fixed, we simply reveal x′(u) = σ(u). In the original S, the colour y′(u) is drawn according to an optimal coupling of (x′, y′) on u.
Here, we set y′(u) to colour c with probability p^x_{x_{u←σ(u)}, y_{u←c}} / p^x_{x,y}. This is well-defined since p^x_{x,y} satisfies Constraints 2. If this process reaches depth L, then it stops.
The output of the new coupling defines a distribution over pairs of partial colourings (x, y) such that σ |= x, and we denote it by µ. We claim that

Pr_{(X,Y)∼µ}[|V_col| = L] ≤ Σ_{T: {2,3}-tree in Lin(H), |T| = ℓ, e_0 ∈ T} Pr_{(X,Y)∼µ}[T ⊆ B_{X,Y}]. (14)

The left-hand side of (14) is the probability that our new coupling reaches some (x, y) with |V_col| = L. Lemma 16 implies that in this case the set B_{x,y} of blocked edges contains a {2, 3}-tree of size at least L/(k³∆²) = ℓ. Thus the probability of reaching depth L is upper bounded by the right-hand side of (14).
Fix a {2, 3}-tree T of size ℓ. Since σ is ℓ-good, whatever the choice of V_col is, at least a (1 − β) fraction of the hyperedges in T are not monochromatic on the X side. However, if T ⊆ B_{X,Y}, then at least ⌊(1 − β)|T|⌋ hyperedges satisfy (1) σ(v) ≠ Y(v) for some v ∈ e ∩ V_col, or (2) |e ∩ V_col| = k_2 and σ|_{V_col} = X|_{V_col} satisfies e but Y does not. Case (2) implies case (1), since if one partial colouring satisfies e and another does not, they must differ at some v ∈ e ∩ V_col. We use T′ = {e_1, e_2, ..., e_{|T′|}} to denote these hyperedges in T. For each edge of T′, there must be at least one vertex on which the (modified) coupling fails, which happens with probability at most 5/t* due to Constraints 3. Since T is a {2, 3}-tree in Lin(H), all of these failed couplings are on distinct vertices and thus happen independently. Hence, in this new coupling, the probability that every edge in T′ is blocked due to at least one failed vertex is at most (5/t*)^{⌊(1−β)ℓ⌋}.

We still need to apply a union bound. The number of {2, 3}-trees of size ℓ in Lin(H) containing e_0 is, by Corollary 12, at most (ek³∆³/2)^ℓ. Therefore the right-hand side of (14) is at most

(ek³∆³/2)^ℓ · (5/t*)^{(1−β)ℓ} ≤ e^{−ℓ}, (15)

since we have chosen t* = 5(e²k³∆³/2)^{1/(1−β)} in Constraints 3. The lemma follows by combining (12), (14), and (15). □

Note that in Lemma 17 we do not explicitly require a lower bound on q nor a lower bound on the size of the edges. However, these requirements are implicit, since we have set t* to be large in Constraints 3.
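The cancellation in (15) is driven entirely by the choice of t*; a quick numeric sanity check (the values of k, ∆, and β below are arbitrary illustrations):

```python
import math

def t_star(k, delta, beta):
    """t* = 5 (e^2 k^3 Δ^3 / 2)^{1/(1-β)}, as chosen in Constraints 3."""
    return 5 * (math.e ** 2 * k ** 3 * delta ** 3 / 2) ** (1 / (1 - beta))

def union_bound(k, delta, beta, ell):
    """(number of {2,3}-trees of size ℓ) times (per-tree failure probability):
    (e k^3 Δ^3 / 2)^ℓ * (5 / t*)^{(1-β)ℓ}, the left-hand side of (15)."""
    trees = (math.e * k ** 3 * delta ** 3 / 2) ** ell
    fail = (5 / t_star(k, delta, beta)) ** ((1 - beta) * ell)
    return trees * fail

# with this t*, the product collapses to exactly e^{-ℓ} for every ℓ
for ell in (1, 5, 20):
    assert math.isclose(union_bound(28, 10, 0.5, ell), math.exp(-ell), rel_tol=1e-6)
```

Algebraically, (5/t*)^{(1−β)ℓ} = (e²k³∆³/2)^{−ℓ}, so multiplying by (ek³∆³/2)^ℓ leaves (e/e²)^ℓ = e^{−ℓ}.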
Lemma 15 and Lemma 17 also hold for any σ ∈ C_2. Now we can prove (Lemma 18) that any solution to the LP provides accurate estimates of the ratio in question.

Proof. Let ℓ = L/(k³∆²). Let

Z_1 := Σ_{σ ∈ C_1} Σ_{(x,y) ∈ L(T): |V_col| < L and σ |= x} p^x_{x,y}.
Exchange the order of summation, so that Z_1 can be computed from the LP solution. A similar quantity Z_2 can be defined and bounded by replacing p^x_{x,y} with p^y_{x,y}. Constraints 1 relate p^x_{x,y} and p^y_{x,y} for every (x, y) ∈ L(T) with |V_col| < L, and hence relate Z_1 and Z_2. We then relate |C_1| with Z_1. It is easy to see, by (12), that Z_1 ≤ |C_1|. The lower bound is more complicated: by Lemma 15, at most a small fraction of the colourings σ ∈ C_1 are ℓ-bad, and by Lemma 17, each ℓ-good σ contributes at least 1 − e^{−ℓ} to Z_1. Similar bounds hold for |C_2| and Z_2. Combining (16), (17), (18), and their counterparts for |C_2| and Z_2, we obtain two-sided bounds on the ratio. We then set up a binary search to find bounds r and r̄ that are close enough to the true ratio.
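The binary search step is standard; a sketch with a generic monotone feasibility oracle `feasible(r)` standing in for testing a candidate ratio against the LP (the oracle and its monotonicity are assumptions of this sketch):

```python
def binary_search_ratio(feasible, lo, hi, eps):
    """Find lo <= r* <= hi with hi - lo <= eps, assuming feasible(r) is
    monotone: True for r <= r* and False for r > r*."""
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if feasible(mid):
            lo = mid  # mid is still feasible, so r* >= mid
        else:
            hi = mid  # mid is infeasible, so r* < mid
    return lo, hi

# toy oracle with true ratio r* = 0.37
r_lo, r_hi = binary_search_ratio(lambda r: r <= 0.37, 0.0, 1.0, 1e-6)
assert r_lo <= 0.37 <= r_hi and r_hi - r_lo <= 1e-6
```

Each iteration halves the interval, so O(log(1/eps)) oracle calls suffice.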
We are now ready to prove the main theorem of this section.
Proof of Theorem 9. Take L = k³∆² log(2(1 + t*)/ε), so that e^{−ℓ} ≤ ε/(2(1 + t*)) where ℓ = L/(k³∆²). We claim that the true values of p^x_{x,y}, p^y_{x,y} always satisfy our LP. This is trivial for Constraints 1 and 2. For Constraints 3, recall that t* = 5(e²k³∆³/2)^{1/(1−β)} > k, and we only need to verify the conditions of Lemma 7 with t = t*. At any point of Algorithm 1, the number of uncoloured vertices in an edge is at least k_1 − k_2. Hence we set k′ = k_1 − k_2 in Lemma 7; the required lower bound on q holds by our assumption.

Fix the colour c. It follows from Lemma 18 that for every c′ ∈ [q], we can apply the binary search algorithm to obtain a value p_{c′}, which estimates the ratio of the marginal probabilities of colours c and c′ at v. We then combine the values p_{c′} over all c′ ∈ [q] into an estimate of the marginal probability of colour c at v. Therefore, the total running time of our estimator is poly(n, 1/ε). □

APPROXIMATE COUNTING
Now we give our FPTAS for the number of proper q-colourings of a k-uniform hypergraph H with maximum degree ∆. The next lemma guarantees us a "good" proper colouring σ so that we can use the algorithm in Theorem 9 to compute the marginal probability of σ .
Lemma 19. Let k_1^C be an integer such that q^{k−k_1^C−1} ≥ e(k − k_1^C)∆. Let v_1, . . . , v_n be an arbitrary ordering of the vertices of a k-uniform hypergraph H = (V, E). There exists a proper colouring σ such that for every hyperedge e ∈ E, the partial colouring σ restricted to its first k − k_1^C vertices is not monochromatic. Moreover, σ can be found in deterministic polynomial time.
Proof. Let k′ = k − k_1^C. Consider a new hypergraph H′ = (V, E′) on the same vertex set V, in which every e ∈ E is replaced by its first k′ vertices. We set x(e) = 1/(k′∆) in Theorem 3 and verify condition (1) for every e ∈ E′. Hence, Theorem 3 implies that there exists a proper colouring σ of H′, which satisfies the requirement of the lemma.
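The verification here instantiates the asymmetric local lemma with x(e) = 1/(k′∆); a small checker, in which the bound of k′∆ on the number of neighbouring bad events is the assumption this sketch makes:

```python
def lll_condition_holds(q, k_prime, delta):
    """Asymmetric Lovász local lemma with x(e) = 1/(k'Δ): each bad event
    (an edge of size k' being monochromatic) has probability q^(1 - k') and
    at most k'Δ neighbours (an upper bound assumed in this sketch).
    The condition is p <= x * (1 - x)^(number of neighbours)."""
    x = 1 / (k_prime * delta)
    p = q ** (1 - k_prime)
    return p <= x * (1 - x) ** (k_prime * delta)

# a comfortable instance passes; a tight one with tiny q fails
assert lll_condition_holds(q=10, k_prime=10, delta=5)
assert not lll_condition_holds(q=2, k_prime=2, delta=10)
```

Since (1 − x)^{k′∆} ≥ 1/e for x = 1/(k′∆), a sufficient condition is q^{k′−1} ≥ e·k′∆, which is the kind of bound assumed in the lemma statement.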
In order to find σ, we have left a bit of slack in our bound on q. Thus the deterministic algorithm from [MT10] applies. □

Theorem 20. Assume the conditions of Theorem 9 (on q, ∆, k, k_1, k_2, and β) with k_1 = k_1^C hold, together with the conditions of Lemma 19. There is an FPTAS for the number of proper q-colourings of a k-uniform hypergraph H = (V, E) with maximum degree ∆.
Proof. Let n = |V|. Choose an arbitrary ordering v_1, . . . , v_n of the vertices of V. Lemma 19 implies that we can find a proper colouring σ so that every hyperedge is properly coloured by its first k − k_1^C vertices. Let Z = |C| be the number of proper colourings of H. For every ε > 0, we will deterministically compute a number Z̃ in time polynomial in n and 1/ε such that e^{−ε} Z ≤ Z̃ ≤ e^{ε} Z.
As before, let µ_C be the uniform distribution over C, the set of all proper colourings of H. We will actually estimate µ_C(σ) = 1/Z. To this end, we create a sequence of hypergraphs {H_i} with pinnings {P_i} inductively. Let H_1 = H and let P_1 be empty. Given H_i = (V_i, E_i) and P_i, we find the next vertex u_i under the ordering that is contained in at least one hyperedge of H_i, and pin its colour to σ(u_i). This induces a pinning P_{i+1} on all hyperedges in E_i. Then H_{i+1} is obtained by removing u_i from V_i and removing from E_i all hyperedges that are properly coloured under P_{i+1}. We also truncate the pinning P_{i+1} accordingly. If E_{n′} is empty for some n′ ≤ n, the process terminates. Notice that the construction above yields a subset of vertices u_1, . . . , u_{n′} with n′ ≤ n, whose ordering is consistent with the given one.
We claim that k_1^C ≤ |e| ≤ k for any i ∈ [n′] and any e ∈ E_i. This is because an edge e shrinks in the process as its vertices are pinned according to σ, but Lemma 19 guarantees that e is removed before k − k_1^C of its vertices are coloured. Therefore, together with our assumptions, Theorem 9 applies with k_1 = k_1^C. Let p_i be the marginal probability of colour σ(u_i) at u_i in H_i with pinning P_i, and let p_i = 1/q for all i ≥ n′. It is easy to see that Z^{−1} = µ_C(σ) = Π_{i=1}^{n} p_i. Thus we can obtain the desired estimate Z̃ by approximating each p_i within a factor e^{±ε/n}. To this end, we appeal to Theorem 9 with ε′ = ε/n. □
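The telescoping-product argument above can be checked on a toy instance, with brute-force enumeration standing in for the oracle of Theorem 9 and exact marginals in place of estimates (the instance and all names below are illustrative):

```python
from itertools import product

def proper_colourings(n, edges, q, pin):
    """All colourings of n vertices with q colours consistent with the partial
    assignment `pin`, such that no hyperedge is monochromatic."""
    out = []
    for col in product(range(q), repeat=n):
        if all(col[v] == c for v, c in pin.items()) and \
           all(len({col[v] for v in e}) > 1 for e in edges):
            out.append(col)
    return out

def count_by_marginals(n, edges, q, sigma):
    """Recover Z from the telescoping product Z^{-1} = prod_i p_i, where p_i
    is the marginal of sigma at v_i given the previous pins (exact here)."""
    pin, inv_z = {}, 1.0
    for v in range(n):
        cols = proper_colourings(n, edges, q, pin)
        inv_z *= sum(1 for col in cols if col[v] == sigma[v]) / len(cols)
        pin[v] = sigma[v]
    return round(1 / inv_z)

# tiny 3-uniform instance: two hyperedges sharing a vertex, q = 3
edges = [(0, 1, 2), (2, 3, 4)]
sigma = (0, 0, 1, 0, 0)  # a proper colouring: neither edge is monochromatic
assert count_by_marginals(5, edges, 3, sigma) == len(proper_colourings(5, edges, 3, {}))
```

Replacing the exact marginals by e^{±ε/n}-accurate estimates, as in the proof, perturbs the product by at most e^{±ε}.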

SAMPLING
Finally we give the algorithm to sample proper colourings almost uniformly. As usual, let H = (V, E) be a k-uniform hypergraph with maximum degree ∆, let q be the number of colours, and let C be the set of proper colourings. Let n = |V|. Algorithm 2 samples a colouring in C within total variation distance ε from µ_C. Similarly to the coupling process in Section 3, we assume that there is an arbitrary fixed ordering of all vertices and hyperedges. There is a parameter 0 < k_1^S < k − 1 in Algorithm 2, which will be set in Section 7. We first assume that, at Line 9, the oracle call to Theorem 9 always returns a value within the correct range. This simplification allows us to identify a threshold involving the parameter k_1^S that guarantees small connected components; it will be combined with the conditions of Theorem 9 later.
Algorithm 2 An almost uniform sampler for proper colourings
1: Input: a k-uniform hypergraph H = (V, E) with maximum degree ∆ and 0 < ε < 1
2: Output: a colouring in C
3: Let X be the partial colouring with X(v) = − for every v ∈ V initially;
4: while E is nonempty do
5:   Choose the first uncoloured v ∈ V such that every e ∈ Γ(v) contains more than k_1^S uncoloured vertices;
6:   if no such vertex v exists then
7:     break the while loop;
8:   end if
9:   Apply the algorithm in Theorem 9 to compute the marginal distribution on v with precision ε/(2n), and extend X with a colour on v drawn according to this distribution;
10:  Remove from E all hyperedges that are now satisfied;
11: end while
12: Let S be the set of uncoloured vertices contained in remaining hyperedges, and let H_S be the sub-hypergraph induced on S;
13: Let L = k²∆ log(2n∆/ε);
14: if H_S contains a connected component of size at least L then extend X arbitrarily; otherwise, for each connected component of H_S, extend X with a uniformly random proper colouring of that component, found by enumeration;
15: return X.

Lemma 21. Under the conditions of Theorem 9 with k_1 = k_1^S, together with a suitable lower bound on q, the probability that the condition on line 14 of Algorithm 2 holds, i.e., that H_S contains a connected component of size at least L, is at most ε/2.

Proof. The proof idea is to show that the existence of a large component in H_S implies the existence of a large {2, 3}-tree in Lin(H) whose vertices are hyperedges that are not yet satisfied although k − k_1^S of their vertices are already coloured. Then we show that the probability of the latter event is small. Assume that the sampler ends the while loop with a partial colouring X and the corresponding H_S. We say an edge e ∈ E is bad if X does not satisfy e and |e ∩ S| = k_1^S, namely e is partially monochromatic under X although k − k_1^S of its vertices have been coloured. Also, say a vertex v ∈ S is blocked by an edge e ∈ E if v ∈ e and e is bad.
Fix an arbitrary bad hyperedge e_0 that is contained in a connected component of size at least L in H_S. We denote the set of vertices of this component by U and the induced hypergraph by H_U. It is clear that every vertex in S is blocked by some bad edge.
Let F be the set of all bad edges incident to U. Then e_0 ∈ F. Since every vertex in U is blocked by some edge in F and every edge in F contains at most k vertices, |F| ≥ L/k. We claim that F is connected in L²(H), for the following reason. For any two edges e_1, e_2 ∈ F, since H_U is connected, there exists a path in H_U connecting e_1 and e_2. Every vertex along this path is blocked by some edge in F, and each adjacent pair of vertices along the path corresponds to a pair of edges in F within distance 2 in Lin(H).
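The component-size condition on line 14 of Algorithm 2 is a plain connected-components computation on H_S; a union-find sketch (the data layout is illustrative):

```python
def hypergraph_components(vertices, edges):
    """Connected components of a hypergraph: two vertices are in the same
    component when they are linked by a chain of shared hyperedges (as in H_S)."""
    parent = {v: v for v in vertices}

    def find(v):
        # path-halving find
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for e in edges:
        e = [v for v in e if v in parent]
        for u in e[1:]:
            parent[find(u)] = find(e[0])  # union into the first vertex's set

    comps = {}
    for v in vertices:
        comps.setdefault(find(v), set()).add(v)
    return list(comps.values())

# demo: two components of sizes 4 and 2; line 14 would compare max size to L
comps = hypergraph_components(range(6), [(0, 1, 2), (2, 3), (4, 5)])
assert sorted(map(len, comps)) == [2, 4]
```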
Lemma 13 implies that F contains a {2, 3}-tree of size at least ℓ = L/(k²∆) containing e_0. Fix such a {2, 3}-tree T = {e_1, . . . , e_{|T|}}. Let µ be the distribution of our sampler at the end of the while loop.

Since e_i ∩ e_j = ∅ for every i ≠ j, and Theorem 9 guarantees that our estimated marginals are within a factor e^{ε/(2n)} of the true ones, for every 1 ≤ i ≤ |T| we can apply Lemma 7 with k′ = k_1^S and t = k to bound the probability that e_i is bad; applying Lemma 7 here requires a suitable lower bound on q. By the union bound over the at most (ek³∆³/2)^ℓ choices of the {2, 3}-tree and the at most n∆ ≥ |E| choices of e_0, the probability that H_S contains a component of size at least L is small. As L = k²∆ log(2n∆/ε) and ℓ = L/(k²∆), we have e^{−ℓ} ≤ ε/(2n∆). Hence, by (19), the probability in line 14 is at most ε/2. □

Now we are ready to give the sampling algorithm.
Theorem 22. Assume the conditions of Theorem 9 (on q, ∆, k, k_1, k_2, and β) with k_1 = k_1^S hold, together with the conditions of Lemma 21. For any k-uniform hypergraph H = (V, E) with maximum degree ∆ and any ε > 0, Algorithm 2 outputs a proper colouring whose distribution is within total variation distance ε of the uniform distribution, and the running time is poly(n, 1/ε) where n = |V|.
Proof. First we check that the condition of Theorem 9 is met with k_1 = k_1^S when it is invoked at Line 9 of Algorithm 2. This is because whenever we colour a vertex, we ensure that all hyperedges have at least k_1^S uncoloured vertices afterwards. Hence we can apply Theorem 9 with the pinnings P induced by the partial colouring X so far.
We use µ̂(·) to denote the distribution of the final output of Algorithm 2. Recall that µ_C is the uniform distribution over C. We shall bound the total variation distance dist_TV(µ_C, µ̂). To this end, we introduce two intermediate distributions. Let µ_1(·) be the distribution of the output of Algorithm 2 ignoring the condition on line 14; namely, it never checks the sizes of the connected components in H_S and proceeds to enumerate all the proper colourings on S in any case. This is unrealistic, since doing so could require exponential time. We also define another distribution µ_2(·), which is the same as µ_1(·) except that at line 9 it uses the true marginals instead of the estimates obtained by calling Theorem 9.
Denote by B the event that the condition on line 14 holds, and let p_fail be its probability. By Lemma 21, p_fail ≤ ε/2.
First note that µ_2 = µ_C. Consider the distribution of the partial colouring obtained immediately after the while loop, i.e., the partial colouring X. One can apply an induction similar to the proof of Lemma 8 to show that it follows a pre-Gibbs distribution. Therefore, conditioned on X, sampling a uniform proper colouring of the remaining vertices results in a uniform proper colouring.
We then bound dist_TV(µ_1, µ_2). For a particular partial colouring x, we use E_x to denote the event that the sampler produces x at the end of the while loop, namely X = x. We expand dist_TV(µ_1, µ_2) as a sum over partial colourings x. If x can never appear at the end of the while loop in Algorithm 2, then Pr_{µ_1}[E_x] = Pr_{µ_2}[E_x] = 0. Otherwise, since the enumeration steps are identical and correct in both µ_1 and µ_2 conditioned on E_x, the contribution of x is determined by |Pr_{µ_1}[E_x] − Pr_{µ_2}[E_x]|, spread over C_x, where C_x is again the set of proper colourings consistent with the partial colouring x; this yields (20).

Fix a partial colouring x defined on V_col ⊆ V that is a possible outcome of the while loop. We note that the order of visiting V_col is determined by x; say this order is v_1, . . . , v_s. Then Pr_{µ_2}[E_x] = Π_{i=1}^{s} p_i and Pr_{µ_1}[E_x] = Π_{i=1}^{s} p̂_i, where p_i is the true marginal at v_i and p̂_i is our estimate of p_i using Theorem 9 with error ε/(2n). Theorem 9 implies that e^{−ε/(2n)} p_i ≤ p̂_i ≤ e^{ε/(2n)} p_i.

Therefore, since s ≤ n, we have

e^{−ε/2} Pr_{µ_2}[E_x] ≤ Pr_{µ_1}[E_x] ≤ e^{ε/2} Pr_{µ_2}[E_x]. (21)

Plugging (21) into (20), we obtain dist_TV(µ_1, µ_2) ≤ (e^{ε/2} − 1)/2 ≤ ε/2. Finally we bound dist_TV(µ̂, µ_1). Since the behaviours of µ̂ and µ_1 are identical when B does not happen, we have Pr_{Z∼µ̂}[Z = σ | ¬B] = Pr_{Z∼µ_1}[Z = σ | ¬B]. It implies that dist_TV(µ̂, µ_1) ≤ p_fail ≤ ε/2. Combining the bounds above yields dist_TV(µ_C, µ̂) ≤ ε.
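The step from the pointwise envelope (21) to a total-variation bound can be spot-checked numerically. Here `mu1` is a pointwise e^{±ε/2} perturbation of `mu2` and need not be exactly normalized; the constant (e^{ε/2} − 1)/2 is the crude bound this sketch verifies, not necessarily the exact constant in the paper's display.

```python
import math
import random

def tv_distance(mu, nu):
    """Total variation distance between two distributions given as dicts."""
    support = set(mu) | set(nu)
    return sum(abs(mu.get(s, 0.0) - nu.get(s, 0.0)) for s in support) / 2

# if mu1(s) lies within a factor e^{±eps/2} of mu2(s) for every s, then each
# term |mu1(s) - mu2(s)| is at most (e^{eps/2} - 1) mu2(s), and the mu2 values
# sum to 1, giving tv(mu1, mu2) <= (e^{eps/2} - 1) / 2
random.seed(0)
eps = 0.1
w = [random.random() for _ in range(50)]
mu2 = {i: x / sum(w) for i, x in enumerate(w)}
mu1 = {i: p * math.exp(random.uniform(-eps / 2, eps / 2)) for i, p in mu2.items()}
assert tv_distance(mu1, mu2) <= (math.exp(eps / 2) - 1) / 2
```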
It remains to bound the running time of the sampler. The sampler calls the subroutine estimating marginals at most n times, and each call costs poly(n, 1/ε) time. Finally, when the condition on line 14 does not hold, the sampler enumerates proper colourings on connected components of size O(log(n/ε)). Therefore, the total running time is poly(n, 1/ε). □

The distribution µ_1 has a small multiplicative error compared with the uniform distribution µ_C. We remark that there are standard algorithms to turn such a distribution into an exact sampler, dating back to [Bac88, JVV86]. However, since we cannot completely avoid the event B, we can only bound the error of the final distribution µ̂ in total variation distance.

SETTLING ALL PARAMETERS
We have defined the following parameters throughout the paper:
• k_1^C: the number of vertices in a hyperedge that are not fixed in approximate counting, Theorem 20;
• k_1^S: the number of vertices in a hyperedge that are not fixed in sampling, Theorem 22;
• k_2: the number of vertices in a hyperedge that Algorithm 1 would attempt to couple;
• β: the fraction of hyperedges that are monochromatic in Definition 14.
We want our bound for approximate counting to have the form q > C∆^{14/(k−14)}, and our bound for sampling the form q > C∆^{16/(k−16/3)}. One can verify that k ≥ 28 and C ≥ 798 suffice for the latter. This yields Theorem 2. We note that these constraints also hold for k ≥ 6 and C ≥ 3 × 10^{10}.

CONCLUDING REMARKS
In this paper we give approximate counting and sampling algorithms for hypergraph colourings when the parameters are in the local lemma regime. One important open question is how to obtain an optimal constant in the exponent of ∆ in Theorems 1 and 2. This constant comes from three places: bounding the number of "bad" colourings (Lemma 15), bounding the errors (in the LP) incurred by "good" colourings (Lemma 17), and leaving some slack for either counting (Theorem 20) or sampling (Theorem 22). It seems to us that the last slack is difficult to reduce, and a tighter result, if possible, would come from improvements on the first two parts, although our analysis there has been pushed to the limit.
Another future direction is to generalize this approach to general constraint satisfaction problems (CSPs), or equivalently, the general setup of the (variable version) local lemma. Our analysis relies on a crucial property of hypergraph colourings: all constraints can be satisfied by partial assignments, ideally with appropriate probabilities. To be more specific, suppose a constraint C contains k variables. We require the property that, when a subset of k′ variables is randomly assigned, the probability that C is still not satisfied is roughly c^{−k′} for some constant c > 1. This property does not necessarily hold in general, even for symmetric constraints. One such example is when the variables take values in [q] and the constraint is satisfied unless the sum of all its variables is 0 modulo q. We can take q large enough that strong local lemma conditions hold, and yet this constraint cannot be satisfied by any proper subset of its variables. In particular, it is problematic to bound our notion of "bad colourings" (Definition 14) when constraints cannot be satisfied by partial assignments. New ideas are required to handle more general settings.