
I’ve just uploaded to the arXiv my paper “Products of consecutive integers with unusual anatomy“. This paper answers some questions of Erdős and Graham which were initially motivated by the study of the Diophantine factorial equation

\displaystyle  a_1! a_2! a_3! = m^2

where {a_1 < a_2 < a_3} and {m} are positive integers. Writing {(a,N,N+H) = (a_1,a_2,a_3)}, one can rewrite this equation as

\displaystyle  s( (N+1) \dots (N+H) ) = s(a!) \ \ \ \ \ (1)

where {s(n)} denotes the squarefree part of {n} (the smallest factor of {n} formed by dividing out a perfect square). For instance, we have

\displaystyle s(8 \times 9 \times 10) = 5 = s(6!)

which corresponds to the solution {6! 7! 10! = (6!\times 7!)^2} to the original equation.
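As a quick numerical sanity check, here is a short Python sketch (purely illustrative, not taken from the paper; the helper squarefree_part is an ad hoc trial-division routine) verifying this example:

```python
from math import factorial

def squarefree_part(n):
    """s(n): divide out the largest perfect square factor of n, by trial division."""
    s, d = 1, 2
    while d * d <= n:
        e = 0
        while n % d == 0:
            n //= d
            e += 1
        if e % 2 == 1:
            s *= d
        d += 1
    return s * n  # any leftover n > 1 is a prime appearing to the first power

# s(8*9*10) = s(6!) = 5, matching the solution 6! * 7! * 10! = (6! * 7!)^2
assert squarefree_part(8 * 9 * 10) == 5
assert squarefree_part(factorial(6)) == 5
assert factorial(6) * factorial(7) * factorial(10) == (factorial(6) * factorial(7)) ** 2
print("example checks out")
```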

The equation (1) ties into the general question of what the anatomy (prime factorization) of the product {(N+1) \dots (N+H)} looks like. This is a venerable topic, with the first major result being the Sylvester-Schur theorem (going back to Sylvester in 1892) that the largest prime factor of {(N+1) \dots (N+H)} is greater than {H} whenever {N \geq H}. Another notable result is the Erdős-Selfridge theorem that the product {(N+1) \dots (N+H)} is never a perfect power for {H > 1}.

Erdős and Graham were able to show that solutions to (1) were somewhat rare, in that the set of possible values of {N+H} had density zero. For them, the hardest case to treat was when the interval {\{N+1,\dots,N+H\}} was what they called bad, in the sense that {(N+1) \dots (N+H)} was divisible by the square of its largest prime factor. They were able, with some effort, to show that the union of all bad intervals also had density zero, which was a key ingredient in the proof of the previous result about solutions to (1). They isolated a subcase of the bad intervals, which they called the very bad intervals, in which the product {(N+1) \dots (N+H)} was a powerful number (divisible by the square of every prime factor).

A later paper of Luca, Saradha, and Shorey made the bounds more quantitative, showing that both the set of values of {N+H}, as well as the union of bad intervals, had density {O( \exp(-c \log^{1/4} x (\log\log x)^{3/4}))} for some absolute constant {c>0}. In the other direction, just by considering the case {H=1}, one can show that the number of possible values of {N+H} up to {x} is at least {(c_3^1+o(1)) \sqrt{x}}, where {c_3^1} is the constant

\displaystyle  c_3^1 = 1 + \sum_{a \geq 2: a \neq n^2 \forall n} \frac{1}{s(a!)^{1/2}} = 3.70951\dots.
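As an illustration (and not part of the paper's arguments), one can numerically approximate this constant in a few lines of Python, computing {s(a!)} via Legendre's formula; the terms decay rapidly (since {s(a!)} is at least the product of the primes in {(a/2,a]}), so truncating the sum loses essentially nothing:

```python
import math

def squarefree_part_factorial(a):
    """s(a!): product of primes p <= a whose exponent in a! (Legendre's formula) is odd."""
    s = 1
    for p in range(2, a + 1):
        if all(p % q for q in range(2, math.isqrt(p) + 1)):  # p is prime
            e, q = 0, p
            while q <= a:
                e += a // q
                q *= p
            if e % 2 == 1:
                s *= p
    return s

total = 1.0  # the "1 +" term
for a in range(2, 81):  # the tail beyond a = 80 is negligible
    if math.isqrt(a) ** 2 == a:
        continue  # skip perfect squares a = n^2
    total += 1.0 / math.sqrt(squarefree_part_factorial(a))
print(total)  # should land close to 3.70951...
```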

As for the bad intervals, by again considering the case {H=1}, it is possible to show that the number of bad points up to {x} is

\displaystyle  x / \exp((\sqrt{2}+o(1)) \sqrt{\log x \log\log x});

see for instance this paper of Ivic. Similarly, the union of the very bad intervals contains the set of powerful numbers; Golomb worked out that the number of powerful numbers up to {x} is {\sim \frac{\zeta(3/2)}{\zeta(3)} \sqrt{x}}.
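Golomb's asymptotic is easy to test numerically; the following Python sketch (not from any of the cited papers) counts powerful numbers via the standard unique representation {a^2 b^3} with {b} squarefree. The agreement is only rough at moderate sizes, as there is a known lower-order correction of size about {x^{1/3}}:

```python
import math

def is_squarefree(b):
    d = 2
    while d * d <= b:
        if b % (d * d) == 0:
            return False
        d += 1
    return True

def count_powerful(x):
    """Count powerful numbers up to x, using the representation a^2 * b^3 with b squarefree."""
    total, b = 0, 1
    while b ** 3 <= x:
        if is_squarefree(b):
            total += math.isqrt(x // b ** 3)
        b += 1
    return total

x = 10 ** 10
golomb = 2.612375 / 1.202057 * math.sqrt(x)  # zeta(3/2)/zeta(3) * sqrt(x)
print(count_powerful(x), round(golomb))
```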

It was conjectured by Erdős and Graham that all of these lower bounds are in fact sharp (up to {1+o(1)} multiplicative factors); this is Erdős Problem 380 (and a portion of Erdős Problem 374). The main result of this paper is to confirm this conjecture in two cases and come close in the third:

Theorem 1
  • The number of numbers up to {x} that lie in a bad interval of length {H>1} is {O(1/\log^{1-o(1)} x)} times the number of bad points up to {x}.
  • The number of numbers up to {x} that lie in a very bad interval of length {H>1} is {O(x^{2/5+o(1)})}.
  • The number of numbers up to {x} of the form {N+H} for a solution to (1) is {O(x^{1/2+o(1)})}.

Not surprisingly, the methods of proof involve many standard tools in analytic number theory, such as the prime number theorem (and its variants in short intervals), zero density estimates, Vinogradov’s bounds on exponential sums, asymptotics for smooth numbers, the large sieve, the fundamental lemma of sieve theory, and the Burgess bound for character sums. There was one point where I needed a small amount of algebraic number theory (the classification of solutions to a generalized Pell equation), which was the one place where I turned to AI for assistance (though I ended up rewriting the AI argument myself). One amusing point is that I specifically needed the recent zero density theorem of Guth and Maynard (as converted to a bound on exceptions to the prime number theorem in short intervals by Gafni and myself); previous zero density theorems were barely not strong enough to close the arguments.

A few more details on the methods of proof. It turns out that very bad intervals, or intervals solving (1), are both rather short, in that the bound {H \leq \exp(\log^{2/3+o(1)} N)} holds. The reason for this is that the primes {p} that are larger than {H} (in the very bad case) or {C H \log N} for a large constant {C} (in the (1) case) cannot actually divide any of the numbers {N+1,\dots,N+H} unless they divide that number at least twice. This creates a constraint on the fractional parts of {N/p} and {N/p^2} that turns out to be inconsistent with the equidistribution results on those fractional parts coming from Vinogradov's bounds on exponential sums unless {H} is small. In the very bad case, this forces a linear relation between two powerful numbers; after expressing powerful numbers as the product of a square and a cube, matters then boil down to counting solutions to an equation such as

\displaystyle  n_1^2 n_2^3 + 1 = m_1^2 m_2^3

with say {n_1,n_2,m_1,m_2 \leq x^{1/5}}. The number of solutions here turns out to be {O(x^{2/5})} by work of Aktas-Murty and of Chan; we generalize the arguments in the former to handle a slightly more general equation. A similar argument handles solutions to (1), except in one regime where the parameter {a} is somewhat large (comparable to {H \log N}), in which case one instead collects some congruence conditions on {N} and applies the large sieve.
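To make the counting problem concrete (this is just naive enumeration, not the argument of the paper), one can hash all numbers of the form {a^2 b^3} with {a,b \leq B} and look for consecutive pairs; the bound quoted above corresponds to {O(B^2)} solutions when all four variables are at most {B}:

```python
from collections import defaultdict

def count_solutions(B):
    """Count quadruples (n1, n2, m1, m2) in [1, B]^4 with n1^2 n2^3 + 1 = m1^2 m2^3,
    by hashing every value of the form a^2 * b^3 with a, b <= B."""
    reps = defaultdict(int)
    for a in range(1, B + 1):
        for b in range(1, B + 1):
            reps[a * a * b * b * b] += 1
    return sum(count * reps.get(value + 1, 0) for value, count in reps.items())

for B in (10, 30, 100):
    print(B, count_solutions(B))  # e.g. (1, 2, 3, 1) gives 1^2 * 2^3 + 1 = 3^2 * 1^3
```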

The situation with bad intervals is more delicate, because there is no obvious way to make {H} small in all cases. However, by the large sieve (as well as the Guth–Maynard theorem), one can show that the contribution of large {H} is negligible, and from bounds on smooth numbers one can show that the interval {\{N+1,\dots,N+H\}} contains a number with a particularly specific anatomy, of the form {p_0^2 p_1 \dots p_{1000} m'} where {p_0, \dots, p_{1000}} are all primes of roughly the same size, and {m'} is a smoother factor involving smaller primes. The rest of the bad interval creates some congruence conditions on the product {p_1 \dots p_{1000}}. Using some character sum estimates coming from the Burgess bounds, we find that the residue of {p_1\dots p_{1000}} becomes fairly equidistributed amongst the primitive congruence classes to a given modulus when one perturbs the primes {p_1,\dots,p_{1000}} randomly (there are some complications from exceptional characters of Siegel zero type, but we can use a large values estimate to keep their total contribution under control). This allows us to show that the congruence conditions coming from the bad interval are restrictive enough to make non-trivial bad intervals quite rare compared to bad points. One innovation in this regard is to set up an “anti-sieve”: the elements of a bad interval tend to have an elevated chance of being divisible by small primes, and one can use moment methods to show that an excessive number of small prime divisors is somewhat rare. This can be compared to standard sieve arguments, which often seek to limit the event that a number has an unexpectedly deficient number of small prime divisors.

I’ve just uploaded to the arXiv the paper “Decomposing a factorial into large factors“. This paper studies the quantity {t(N)}, defined as the largest quantity such that it is possible to factorize {N!} into {N} factors {a_1, \dots, a_N}, each of which is at least {t(N)}. The first few values of this sequence are

\displaystyle  1,1,1,2,2,2,2,2,3,3,3,3,3,4, \dots

(OEIS A034258). For instance, we have {t(9)=3}, because on the one hand we can factor

\displaystyle  9! = 3 \times 3 \times 3 \times 3 \times 4 \times 4 \times 5 \times 7 \times 8

but on the other hand it is not possible to factorize {9!} into nine factors, each of which is {4} or higher.
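This sort of small case is easy to confirm by brute force; here is a short Python sketch (purely illustrative) that searches over non-decreasing factorizations:

```python
from math import factorial

def can_split(n, k, min_factor):
    """Can n be written as a product of k factors, each >= min_factor?
    Naive recursive search over non-decreasing factor sequences."""
    if k == 1:
        return n >= min_factor
    d = min_factor
    while d ** k <= n:  # the smallest factor of a valid non-decreasing split is at most n^(1/k)
        if n % d == 0 and can_split(n // d, k - 1, d):
            return True
        d += 1
    return False

assert can_split(factorial(9), 9, 3)       # e.g. the factorization 3*3*3*3*4*4*5*7*8
assert not can_split(factorial(9), 9, 4)   # no factorization into nine factors all >= 4
print("t(9) = 3 confirmed")
```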

This quantity {t(N)} was introduced by Erdős, who asked for upper and lower bounds on {t(N)}; informally, this asks how equitably one can split up {N!} into {N} factors. For an arbitrary number, this is essentially a variant of the notorious knapsack problem (after taking logarithms), but one can hope that the specific structure of the factorial {N!} can make this particular knapsack-type problem more tractable. Since

\displaystyle  N! = a_1 \dots a_N \geq t(N)^N

for any putative factorization, we obtain an upper bound

\displaystyle  t(N) \leq (N!)^{1/N} = \frac{N}{e} + O(\log N) \ \ \ \ \ (1)

thanks to the Stirling approximation. At one point, Erdős, Selfridge, and Straus claimed that this upper bound was asymptotically sharp, in the sense that

\displaystyle  t(N) = \frac{N}{e} + o(N) \ \ \ \ \ (2)

as {N \rightarrow \infty}; informally, this means we can split {N!} into {N} factors that are (mostly) approximately the same size, when {N} is large. However, as reported in this later paper, Erdős “believed that Straus had written up our proof… Unfortunately Straus suddenly died and no trace was ever found of his notes. Furthermore, we never could reconstruct our proof, so our assertion now can be called only a conjecture”.
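Incidentally, the size of the error term in (1) is easy to see numerically; the following quick Python computation (only illustrative) evaluates {(N!)^{1/N} - N/e} via the log-gamma function and shows it growing only logarithmically in {N}:

```python
import math

# The gap in (1): (N!)^(1/N) - N/e, which grows only logarithmically in N.
for N in (100, 1000, 10_000, 100_000):
    root = math.exp(math.lgamma(N + 1) / N)   # (N!)^(1/N) via the log-gamma function
    gap = root - N / math.e
    print(N, round(gap, 3), round(gap / math.log(N), 3))
```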

Some further exploration of {t(N)} was conducted by Guy and Selfridge. There is a simple construction that gives the lower bound

\displaystyle  t(N) \geq \frac{3}{16} N - o(N)

that comes from starting with the standard factorization {N! = 1 \times 2 \times \dots \times N} and transferring some powers of {2} from the later part of the sequence to the earlier part to rebalance the terms somewhat. More precisely, if one removes one power of two from the even numbers between {\frac{3}{8}N} and {N}, and one additional power of two from the multiples of four between {\frac{3}{4}N} and {N}, this frees up {\frac{3}{8}N + o(N)} powers of two that one can then distribute amongst the numbers up to {\frac{3}{16} N} to bring them all up to at least {\frac{3}{16} N - o(N)} in size (a numerical sketch of this rebalancing appears after the list below). A more complicated procedure involving transferring both powers of {2} and {3} then gives the improvement {t(N) \geq \frac{1}{4} N - o(N)}. At this point, however, things got more complicated, and the following conjectures were made by Guy and Selfridge:
  • (i) Is {\frac{t(N)}{N} \leq \frac{1}{e}} for all {N \neq 1,2,4}?
  • (ii) Is {t(N) \geq \lfloor 2N/7 \rfloor} for all {N \neq 56}? (At {N=56}, this conjecture barely fails: {t(56) = 15 < 16 = \lfloor 2 \times 56/7 \rfloor}.)
  • (iii) Is {\frac{t(N)}{N} \geq \frac{1}{3}} for all {N \geq 300000}?
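Here is the promised numerical sketch of the basic {3/16} rebalancing construction (the thresholds are taken verbatim from the description above, with no attempt at optimization; the code is illustrative rather than part of any proof):

```python
from math import factorial, prod

def rebalanced_factorization(N):
    """The 3/16 rebalancing: start from N! = 1 * 2 * ... * N, remove one power of 2 from
    the even numbers above 3N/8 and one more from the multiples of 4 above 3N/4, then
    spend the freed powers of 2 on the small terms to push them up towards 3N/16."""
    factors = list(range(1, N + 1))
    freed = 0
    for i, n in enumerate(factors):
        if n > 3 * N // 8 and n % 2 == 0:
            factors[i] //= 2
            freed += 1
        if n > 3 * N // 4 and n % 4 == 0:
            factors[i] //= 2
            freed += 1
    target = 3 * N // 16
    for i in range(len(factors)):
        while factors[i] < target and freed > 0:
            factors[i] *= 2
            freed -= 1
    return factors, freed

N = 1000
factors, leftover = rebalanced_factorization(N)
assert prod(factors) * 2 ** leftover == factorial(N)   # no powers of 2 created or lost
print(min(factors), 3 * N / 16, leftover)              # the min factor lands close to 3N/16
```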

In this note we establish the bounds

\displaystyle  \frac{1}{e} - \frac{O(1)}{\log N} \leq \frac{t(N)}{N} \leq \frac{1}{e} - \frac{c_0+o(1)}{\log N} \ \ \ \ \ (3)

as {N \rightarrow \infty}, where {c_0} is the explicit constant

\displaystyle  c_0 := \frac{1}{e} \int_0^1 \left \lfloor \frac{1}{x} \right\rfloor \log \left( ex \left \lceil \frac{1}{ex} \right\rceil \right)\ dx \approx 0.3044.
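This constant can be checked numerically with a crude Riemann sum (the code below is only illustrative, and the grid size limits the accuracy to a few decimal places):

```python
import math

# Crude Riemann-sum evaluation of
#   c_0 = (1/e) * integral_0^1 floor(1/x) * log( e x * ceil(1/(e x)) ) dx.
# The integrand is bounded (it lies between 0 and about e), so a midpoint rule on a
# fine grid already gives a few correct digits.
M = 10 ** 6
total = 0.0
for i in range(M):
    x = (i + 0.5) / M
    total += math.floor(1 / x) * math.log(math.e * x * math.ceil(1 / (math.e * x)))
c0 = total / (M * math.e)
print(c0)  # should come out near 0.3044
```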

In particular this recovers the lost result (2). An upper bound of the shape

\displaystyle  \frac{t(N)}{N} \leq \frac{1}{e} - \frac{c+o(1)}{\log N} \ \ \ \ \ (4)

for some {c>0} was previously conjectured by Erdős and Graham (Erdős problem #391). We conjecture that the upper bound in (3) is sharp, thus

\displaystyle  \frac{t(N)}{N} = \frac{1}{e} - \frac{c_0+o(1)}{\log N}, \ \ \ \ \ (5)

which is consistent with the above conjectures (i), (ii), (iii) of Guy and Selfridge, although numerically the convergence is somewhat slow.

The upper bound argument for (3) is simple enough that it could also be modified to establish the first conjecture (i) of Guy and Selfridge; in principle, (ii) and (iii) are now also reducible to a finite computation, but unfortunately the implied constants in the lower bound of (3) are too weak to make this directly feasible. However, it may be possible to now crowdsource the verification of (ii) and (iii) by supplying a suitable set of factorizations to cover medium sized {N}, combined with some effective version of the lower bound argument that can establish {\frac{t(N)}{N} \geq \frac{1}{3}} for all {N} past a certain threshold. The value {N=300000} singled out by Guy and Selfridge appears to be quite a suitable test case: the constructions I tried fell just a little short of the conjectured threshold of {100000}, but it seems barely within reach that a sufficiently efficient rearrangement of factors can work here.

We now describe the proof of the upper and lower bound in (3). To improve upon the trivial upper bound (1), one can use the large prime factors of {N!}. Indeed, every prime {p} between {N/e} and {N} divides {N!} at least once (and the ones between {N/e} and {N/2} divide it twice), and any factor {a_i} that contains such a prime therefore has to be significantly larger than the benchmark value of {N/e}. This observation already readily leads to some upper bound of the shape (4) for some {c>0}; using also the primes {p} that are slightly less than {N/e} (noting that any multiple of {p} that exceeds {N/e} must in fact be at least {\lceil N/ep \rceil p}) is what leads to the precise constant {c_0}.

For previous lower bound constructions, one started with the initial factorization {N! = 1 \times \dots \times N} and then tried to “improve” this factorization by moving around some of the prime factors. For the lower bound in (3), we start instead with an approximate factorization roughly of the shape

\displaystyle  N! \approx (\prod_{t \leq n < t + 2N/A, \hbox{ odd}} n)^A

where {t} is the target lower bound (so, slightly smaller than {N/e}), and {A} is a moderately sized natural number parameter (we will take {A \asymp \log^3 N}, although there is significant flexibility here). If we denote the right-hand side here by {B}, then {B} is basically a product of {N} numbers of size at least {t}. It is not literally equal to {N!}; however, an easy application of Legendre's formula shows that for small odd primes {p}, {N!} and {B} have almost exactly the same number of factors of {p}. On the other hand, as {B} is odd, {B} contains no factors of {2}, while {N!} contains about {N} such factors. The prime factorizations of {B} and {N!} differ somewhat at large primes, but {B} has slightly more such prime factors than {N!} (about {\frac{N}{\log N} \log 2} such factors, in fact). By some careful applications of the prime number theorem, one can tweak some of the large primes appearing in {B} to make the prime factorization of {B} and {N!} agree almost exactly, except that {B} is missing most of the powers of {2} in {N!}, while having some additional large prime factors beyond those contained in {N!} to compensate. With a suitable choice of threshold {t}, one can then replace these excess large prime factors with powers of two to obtain a factorization of {N!} into {N} terms that are all at least {t}, giving the lower bound.

The general approach of first locating some approximate factorization of {N!} (where the approximation is in the “adelic” sense of having not just approximately the right magnitude, but also approximately the right number of factors of {p} for various primes {p}), and then moving factors around to get an exact factorization of {N!}, looks promising for also resolving the conjectures (ii), (iii) mentioned above. For instance, I was numerically able to verify that {t(300000) \geq 90000} by the following procedure:

  • Start with the approximate factorization {N! \approx B} with {N = 300000} and {B = (\prod_{90000 \leq n < 102000, \hbox{ odd}} n)^{50}}. Thus {B} is the product of {N} odd numbers, each of which is at least {90000}.
  • Call an odd prime {B}-heavy if it divides {B} more often than {N!}, and {N!}-heavy if it divides {N!} more often than {B}. It turns out that there are {14891} more {B}-heavy primes than {N!}-heavy primes (counting multiplicity); see the bookkeeping sketch after this list. On the other hand, {N!} contains {299992} powers of {2}, while {B} has none. This represents the (multi-)set of primes one has to redistribute in order to convert a factorization of {B} to a factorization of {N!}.
  • Using a greedy algorithm, one can match a {B}-heavy prime {p'} to each {N!}-heavy prime {p} (counting multiplicity) in such a way that {p' \leq 2^{m_p} p} for a small {m_p} (in most cases one can make {m_p=0}, and often one also has {p'=p}). If we then replace {p'} in the factorization of {B} by {2^{m_p} p} for each {N!}-heavy prime {p}, this increases {B} (and does not decrease any of the {N} factors of {B}), while eliminating all the {N!}-heavy primes. With a somewhat crude matching algorithm, I was able to do this using {\sum_p m_p = 39992} of the {299992} powers of {2} dividing {N!}, leaving {260000} powers remaining at my disposal. (I don’t claim that this is the most efficient matching, in terms of powers of two required, but it sufficed.)
  • There are still {14891} {B}-heavy primes left over in the factorization of (the modified version of) {B}. Replacing each of these primes with {2^{17} \geq 90000}, and then distributing the remaining {260000 - 17 \times 14891 = 6853} powers of two arbitrarily, we obtain a factorization of {N!} into {N} terms, each of which is at least {90000}.
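Here is the bookkeeping sketch promised in the second step above. It recomputes the exponent of each prime in {N!} (via Legendre's formula) and in {B} (by factorizing the 6000 odd numbers in the product), and tallies the heavy primes; assuming I have set up the count in the same way as described above, it should reproduce the figures {299992} and {14891}. The actual matching and redistribution steps are not included.

```python
from collections import Counter

N = 300_000

def factorial_exponent(p, n):
    """Exponent of the prime p in n!, via Legendre's formula."""
    e, q = 0, p
    while q <= n:
        e += n // q
        q *= p
    return e

# Exponents of the primes in B = (product of odd n in [90000, 102000))^50,
# obtained by trial-dividing each of the 6000 odd numbers in the product.
B_exp = Counter()
for n in range(90_001, 102_000, 2):
    m, d = n, 3
    while d * d <= m:
        while m % d == 0:
            B_exp[d] += 50
            m //= d
        d += 2
    if m > 1:
        B_exp[m] += 50

# Sieve the primes up to N and compare exponents in B and in N!.
sieve = bytearray([1]) * (N + 1)
sieve[0] = sieve[1] = 0
for i in range(2, int(N ** 0.5) + 1):
    if sieve[i]:
        sieve[i * i :: i] = bytearray(len(range(i * i, N + 1, i)))

excess_B = excess_factorial = 0  # counted with multiplicity, odd primes only
for p in range(3, N + 1):
    if sieve[p]:
        diff = B_exp.get(p, 0) - factorial_exponent(p, N)
        if diff > 0:
            excess_B += diff
        else:
            excess_factorial -= diff

print("powers of 2 in N!:", factorial_exponent(2, N))                 # should be 299992
print("B-heavy minus N!-heavy primes:", excess_B - excess_factorial)  # the post quotes 14891
```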

However, I was not able to adjust parameters to reach {t(300000) \geq 100000} in this manner. Perhaps some readers here who are adept with computers can come up with a more efficient construction to get closer to this bound? If one can find a way to reach this bound, most likely it can be adapted to then resolve conjectures (ii) and (iii) above after some additional numerical effort.

UPDATE: There is now an active Github project to track the latest progress, coming from multiple contributors.

In this supplemental set of notes we derive some approximations for {n!}, when {n} is large, and in particular Stirling’s formula. This formula (and related formulae for binomial coefficients {\binom{n}{m}}) will be useful for estimating a number of combinatorial quantities in this course, and also in allowing one to analyse discrete random walks accurately.
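As a quick illustration of what is to come, one can compare {n!} against the Stirling approximation {\sqrt{2\pi n} (n/e)^n} in a couple of lines of Python:

```python
import math

# Compare n! with Stirling's approximation sqrt(2*pi*n) * (n/e)^n.
for n in (5, 10, 50, 100):
    ratio = math.factorial(n) / (math.sqrt(2 * math.pi * n) * (n / math.e) ** n)
    print(n, ratio)   # tends to 1 as n grows (the relative error is about 1/(12n))
```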
