This is a continuation of an article written a while ago about the usage of mathematical notation and its evolution. We have covered most of the history of modern mathematical notation; in this section we shall take a fun detour, comparing mathematical notation to what I would argue is a programming language.

An Imperative Detour

Prime numbers are a cornerstone of our modern cryptography. Yet, how would you evaluate the $n^\text{th}$ prime number?

From the perspective of programming this is a trivial task that any self-respecting programmer should be able to code up in a couple of minutes. You plug in the definition of prime numbers and directly evaluate.

The simplest approach, correct essentially by definition, is to check whether the number is divisible by any of the candidate factors from $2$ up to its square root[1]. At some point one will have written the following code.

let mut primes = vec![2]; // and also add 1, if you think that's a good idea
'outer: for p in 3u64.. {
    // trial-divide by every f up to and including √p
    for f in 2..=(p as f64).sqrt() as u64 {
        if p % f == 0 {
            continue 'outer;
        }
    }
    primes.push(p);
}

This is rather a crude first cut, but not a bad one. You could make it a bit better by noting that the only even prime is $2$, and therefore you only need to check odd candidates. Better yet, if you have a bit more memory to spare, you could reuse the primes you have already generated: since every composite number has a prime factor no greater than its square root, it suffices to trial-divide by the stored primes up to $\sqrt{p}$, instead of re-checking divisibility by every integer in the range.
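Here is a minimal sketch of that refinement (the names are my own, not a canonical implementation): only odd candidates are considered, and only the primes already found serve as trial divisors.

  let mut primes: Vec<u64> = vec![2];
  let mut p = 3u64;
  loop {
      let limit = (p as f64).sqrt() as u64;
      // only primes up to √p can disqualify p
      if primes.iter().take_while(|&&q| q <= limit).all(|&q| p % q != 0) {
          primes.push(p);
      }
      p += 2; // skip even numbers: 2 is the lone even prime
  }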

This is what I would call a first approximation of the sieve of Eratosthenes. As you may have guessed, the method has been known since at least Eratosthenes' own lifetime. In all fairness, perhaps a school student would not come up with the refined version, but at least the naive one, checking every divisor in a range, is a trivial thing to compose in any self-respecting programming language.

Note

One could be curious whether the functional representation is any clearer. From my perspective, the efficient, tail-recursive version is not.

Is this version computationally efficient? Surprisingly so. A mathematician would have preferred what is known as a closed-form representation, but looking for one is both a daunting task and not necessarily one that results in a better outcome.

Binet's formula

To understand why mathematicians prefer what are known as closed-form solutions, let us consider a different problem, where closed forms are more useful.

Specifically, let's look at finding the $n^\text{th}$ Fibonacci number. By using the same approach we have used before, we could write out a simple linear algorithm for evaluating the number for a given index.
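A minimal sketch of that linear algorithm (the function name is mine): one addition per index, carrying the last two terms along.

  fn fib(n: u64) -> u64 {
      // iterate the recurrence, starting the sequence from 1, 1
      let (mut a, mut b) = (0u64, 1u64);
      for _ in 0..n {
          let next = a + b;
          a = b;
          b = next;
      }
      a // after n steps, a holds the n-th Fibonacci number
  }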

When I say linear, however, I mean linear in the magnitude of the index. As it turns out, when we say linear in e.g. Cormen, Leiserson, and Stein, but not e.g. in the real world, we sort of neglect that evaluating a formula is considered an $O(1)$ operation, despite often being a rather involved algebraic manipulation.

As such, because we are mostly optimising for a notation that does not see arbitrarily complex arithmetic operations as adding computational complexity, we can replace the linear algorithm with a constant-time one by virtue of a closed-form solution:

\[ F_n = \frac{1}{\sqrt{5}} \left[ \left( \frac{1 + \sqrt{5}}{2} \right)^n - \left( \frac{1 - \sqrt{5}}{2} \right)^n \right] \]

This is a well-known fact in many circles. Intuitively it makes absolutely no sense, but it takes little effort and some mathematical induction to convince oneself that this does indeed describe the Fibonacci sequence starting from $1, 1$ and progressing onwards. How it was arrived at is an interesting matter, which we shall describe in a few moments.

But for the time being, consider the complexity of evaluating this formula for any index. Suppose we do it for $1$: the square roots cancel and we arrive at $1$. How about $3$? Well, I can assure you that only a small fraction of readers will even attempt to evaluate this number, much less accurately. At best you will use a calculator, which shows you the magic, but also avoids teaching you how labour-intensive this operation truly is.

From the perspective of a programmer, this solution has some quirks. Firstly, let's consider the algorithmic complexity of the operation. Naively, it is $O(1)$, because that is what you were told and taught. In reality, the answer depends on the implementation. If, say, you did the operation honestly, by keeping track of the symbols, evaluating a polynomial raised to a power, doing the cancellations, and using only the fact that $\sqrt{5}$ is a number that when raised to the power of $2$ gives you $5$… you would hardly end up with an $O(1)$ solution. In fact, simply raising the polynomials to the power of $n$ is already an $O(n)$ problem.
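To make the "honest" route concrete, here is a sketch (my own construction, not anything canonical) that tracks numbers of the form $a + b\varphi$ exactly, using only the rewrite rule $\varphi^2 = \varphi + 1$. The $n$ successive multiplications make the $O(n)$ cost visible, and the symbols incidentally hand you the answer, since $\varphi^n = F_{n-1} + F_n\varphi$.

  fn fib_symbolic(n: u32) -> u64 {
      // represent x = a + b*phi with integer coefficients; phi^0 = 1 + 0*phi
      let (mut a, mut b) = (1u64, 0u64);
      for _ in 0..n {
          // (a + b*phi) * phi = b + (a + b)*phi, since phi^2 = phi + 1
          let (next_a, next_b) = (b, a + b);
          a = next_a;
          b = next_b;
      }
      b // the coefficient of phi in phi^n is the n-th Fibonacci number
  }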

On the other hand, there is a solution towards which both idiots and geniuses gravitate, with only one of those groups able to explain why it works, the other appealing to the cargo cult. Indeed, in the real world you would compute elements of the Fibonacci sequence by means of IEEE-754 floating-point numbers, and then convert the result into an integer. This allows you to avoid the explosion of terms inherent in tracking the polynomials raised to the $n$. Instead you obtain approximations of the so-called golden ratio, subtract two numbers with similar exponents, and pray that you don't lose too much precision. Beyond a certain point you cannot trust the output of this implementation, and that point can be evaluated for a given width of floating-point number. However, from the perspective of algorithmic complexity, despite what I told you, this approximate method will give you a precise answer in $O(1)$ operations.
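A sketch of that route (names mine, and the cutoff is an assumption worth checking on your platform): the second term of Binet's formula is smaller than $1/2$ in magnitude, so rounding absorbs it entirely, until accumulated rounding error catches up somewhere around $n \approx 70$ for an f64.

  fn fib_binet(n: u32) -> u64 {
      let sqrt5 = 5.0f64.sqrt();
      let phi = (1.0 + sqrt5) / 2.0; // approximation of the golden ratio
      // the ((1 - sqrt 5)/2)^n term is below 1/2 in magnitude,
      // so rounding removes it
      (phi.powf(n as f64) / sqrt5).round() as u64
  }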

How come?

Well, realistically, this is because exponentiating a floating-point number can be done quickly. You're not raising a number to an integer power by repeatedly multiplying it with itself. You are not even doing the slightly more clever thing of repeatedly squaring the number while halving the power, which would be $O(\log n)$. No. You're doing the following operation:

std::f64::<impl f64>::powf::h6d9fecf86d5c6f10:
	sub     rsp, 24
	movsd   qword ptr [rsp], xmm0
	movsd   qword ptr [rsp + 8], xmm1
	mov     rax, qword ptr [rip + pow@GOTPCREL]
	call    rax
	movsd   qword ptr [rsp + 16], xmm0
	movsd   xmm0, qword ptr [rsp + 16]
	add     rsp, 24
	ret

What does that do? Well, it calls `pow` through the GOT, of course. What does that then do? Depends on the platform. But I can assure you that it most likely does quite a bit of the approximate computation in logarithmic space, where exponentiation is equivalent to multiplication, and figures out a mantissa that is close enough to the correct answer.
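In spirit, and with heavy hedging, since any real libm implementation is far more careful about accuracy and edge cases, the move to logarithmic space looks like this:

  // x^y = exp(y * ln x): exponentiation becomes one multiplication
  // in log space (a sketch of the idea, not of any actual libm)
  fn powf_sketch(x: f64, y: f64) -> f64 {
      (y * x.ln()).exp()
  }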

Unsatisfying? Indeed. But if you ask a programmer, this is the triumph of closed forms: the case where knowing some mathematics made an operation simpler. And if you suggest to them that a lookup table of Fibonacci numbers would similarly reduce the time complexity to $O(1)$, without even sacrificing precision, you would be laughed off. That is different, after all.

Not exactly. We have specialised hardware that is capable of performing certain kinds of operations faster. Binet's formula, for all its faults when computing the $n^\text{th}$ number by hand, maps rather well onto the operations we have taught computers to do well. The pure memoised addition does not benefit from e.g. SIMD instructions in the same way, even though it is the simplest method.

So where does that leave us? Why do we consider closed-form solutions to be so important? I have a suspicion that our education has something to do with it. When you were in school you were told that you wouldn't always have a calculator at hand, but these days that seems to be the opposite of the truth. You now carry in your pocket a device capable of wonderful feats of computation, though it is often artificially prohibited from performing them[2]. When we learn mathematics, we first learn symbolic reasoning of the kind that led to Binet's formula, and only later learn to talk about algorithms in the abstract.

Mathematics has not adopted programming models the way it once adopted algebraic notation; programming languages remain non-standard and are largely considered out-of-tree with respect to the symbolic reasoning that came before them. We would need to rewrite our books. And there have been multiple failed attempts.

The Closed-form Primes

This brings me to the subject that prompted this entire series.

A lesser-known mathematician, C. P. Willans, came up with a closed-form solution for finding the $n^\text{th}$ prime. This solution is not considered… well, a closed form for primes, despite the fact that it very clearly does what Binet's formula did for the Fibonacci sequence.

What is perhaps even more obvious today than in 1964, when the formula was published, is that while it uses the standard algebraic notation, it very clearly follows an algorithmic pattern of thinking. \[ F(p) = \left\lfloor \cos^2\left[\pi\frac{(p-1)!+1}{p}\right] \right\rfloor \]

The mathematical insight is confined to this line. It is a construction that comes from a lesser-known theorem about primes, one that allows you to construct what is known as a detector function.

Willans' great contribution is finding a closed form for this prime detector. But finding it required less traditional mathematical acumen, and more of the constructive engineering that is common in programming. Wilson's theorem[3] states that

\[ (p-1)! \equiv -1 \pmod{p} \]

In human terms: the number \((p-1)! + 1\) is divisible by \(p\) if and only if $p$ is prime. Willans used this factoid to construct most of the above expression. The trigonometry and the floor function are simply there to convert the factoid into a function that evaluates to $1$ for primes and $0$ otherwise.
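The same detector can be written with modular arithmetic instead of the cosine trick; a sketch (the name is mine, and the naive `u64` products overflow past $j \approx 2^{32}$):

  // F(j) = 1 iff (j-1)! + 1 is divisible by j, per Wilson's theorem.
  // Note that F(1) = 1 too, an edge case Willans' outer sum accounts for.
  fn detector(j: u64) -> u64 {
      let fact_mod = (1..j).fold(1u64, |acc, k| acc * k % j);
      ((fact_mod + 1) % j == 0) as u64
  }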

Now, if you had a Turing-complete programming language, perhaps of the imperative kind, you would use this function as a black box to construct the primes directly. Sure, it would involve a great number of integer operations, given that even for candidates as small as $1024$ the factorial is immense, but let us humour this for a few moments.

You would perhaps create a stream of prime candidates and keep the ones for which the function evaluates to $1$, dropping the ones for which it evaluates to $0$. Indeed, this is what we did in the previous chapter.
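With the `detector` sketched above, that stream is one line:

  // filter the candidate stream through the Wilson detector
  let primes: Vec<u64> = (2u64..).filter(|&j| detector(j) == 1).take(8).collect();
  assert_eq!(primes, vec![2, 3, 5, 7, 11, 13, 17, 19]);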

However, this is not something that can be easily represented in standard algebraic notation. In effect, it requires a great deal of mental gymnastics. For a given $n$, to find $p_n$, the $n^\text{th}$ prime, one needs the following construction:

\[ p_n = 1 + \sum_{m=1}^{2^{n}} \left\lfloor {\left\lfloor \frac{n} {\sum_{j=1}^m F(j)} \right \rfloor}^{\frac{1}{n}} \right \rfloor \]

I shall spare you the details of this construction; it is done much better in this video:

Instead, I would like you to focus on the following observations:

  1. Algebraic notation is Turing complete.
  2. Due to the limitations of said notation, we have to emulate memory by assigning it to algebraic objects.
  3. The closed-form solution is a compact, but not necessarily efficient representation of the operations that need to be performed.

That last point deserves a bit of expansion. Simply put, and often with a great deal of effort, one can replicate any form of solution in this notation. And it is often useful to sacrifice computational effectiveness in order to maintain a smaller footprint in one's memory.

Willans' formula effectively composes the $n^\text{th}$ prime from ones added to each other a given number of times. In principle this form of operational complexity could be optimised on, say, a quantum computer, but otherwise it does not map onto what we know to be efficient operations on modern hardware.
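To make the algorithmic reading concrete, here is a direct transcription of the formula (a sketch only: `f64` stops representing factorials exactly around $18!$, so the detector, and with it the whole function, is only faithful for the first handful of $n$):

  fn willans(n: u32) -> u32 {
      let pi = std::f64::consts::PI;
      let mut p = 1u32;
      for m in 1..=(1u64 << n) {
          // pi_star(m) = sum of the detector F(j) for j = 1..m
          let pi_star: f64 = (1..=m)
              .map(|j| {
                  let fact: f64 = (1..j).map(|k| k as f64).product();
                  let c = (pi * (fact + 1.0) / j as f64).cos();
                  (c * c).floor()
              })
              .sum();
          // the summand is 1 while pi_star <= n, and 0 afterwards
          p += ((n as f64 / pi_star).floor().powf(1.0 / n as f64)).floor() as u32;
      }
      p
  }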

The closed-form solution can be used for symbolic manipulation, but it is a mistake, one caused by doing far too much of this in school, to assume that mere manipulation of complex objects such as this one will yield any new information.

This solution is not recognised by the international community of mathematicians precisely because it does not solve anything. There is no grand conspiracy to suppress the information; it is practically useless, because the sieve of Eratosthenes lets you reason about much more, and simultaneously yields a better answer.

Arithmetic Progression

While the previous two examples were there to highlight the deficiencies of modern algebraic notation, there are a great many more examples where said notation inherently allows for greater insight.

One key situation is one that Carl Friedrich Gauss found trivial. Consider how you would optimise the evaluation of the arithmetic series:

\[ \sum_{n=1}^N n \]

In programming language parlance, this is effectively asking how you would optimise

  int accumulator = 0;
  for (int j = 1; j <= N; ++j) {
  	accumulator += j;
  }

Working on the source code requires a great deal of formal discipline. Simply put, one needs a system to ensure that one has not, in fact, made a mistake in the process, and those guard rails seldom exist as part of the standard toolchain. To figure out how to optimise this, one would have to look at the accumulator variable on each iteration. And that offers less than no useful insight: we already knew that the terms grow in an arithmetic progression.

One could try a linear or an exponential approximation. That sounds like a good trick, despite yielding a rather useless approximation of what is in fact quadratic growth.

One could then try to fit a generalised polynomial, and maybe find the one that is actually useful. However, the methodology is far from obvious, and at that point we are no longer reasoning about the source code.

If we took a slightly different approach, the situation is not much better.

  let rec sum n acc =
     if n = 0 then acc else sum (n - 1) (acc + n)

True, this is more compact, and one can expand the stack of evaluations to find the numbers. But one only gets the crucial insight after fully expanding the evaluation for a couple of cases.

We need a separate notation for symbolic reasoning that is actually pretty simple. By the definition of the sum,

\[ \sum_{n=1}^N n = 1 + 2 + 3 + 4 + \ldots + (N-2) + (N-1) + N \]

One can already spot a pattern here. The numbers on the left increase by one, while the numbers at the right end decrease by one from $N$. The two drifts offset each other when we pair the terms up, allowing us to rewrite the sum as

\[ \sum_{n=1}^N n = (N + 1) + ((N-1) + 2) + ((N-2) + 3) + \ldots \\ = (N+1) + (N+1) + (N+1) + \ldots + (N+1) \\ = \frac{N}{2} (N+1) \]

We use the simple fact that there is a name for an operation of adding the number to itself $N$ times. We call that in-no-way-special operation multiplication, and even though technically we have not changed the complexity of the operation, doing the work both by hand and with a computer results in a much more efficient outcome.
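In code, Gauss' rewrite turns the loop above into a single multiplication and a division (a sketch in Rust, to match the earlier examples):

  fn sum_to(n: u64) -> u64 {
      // N/2 pairs, each summing to N + 1; one of n and n + 1 is even,
      // so the integer division is exact
      n * (n + 1) / 2
  }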

But why?

Back to Basics

One needs to understand what we even consider a more efficient operation to be. Truth be told, from the perspective of mathematics there are no more or less efficient operations. The square root of two just is, and computing it is asking a question like: "if we started building our notion of number from the naturals, then extended it to the rationals, and then expressed those as decimal numbers, what would an approximate representation of the number close to $\sqrt{2}$ look like?". We are thus required to find a connection between those systems of numbers.

From first principles, however, what is the complexity of doing that evaluation? It, of course, depends on what we consider to be given knowledge. Evaluating the number $5$ involves either:

  1. No effort at all, because it's just the number $5$,
  2. Five operations, because the number $5$ is defined as the successor of $4$, which is itself the successor of $3$, which succeeds $2$, which succeeds $1$, which succeeds $0$, the neutral element.

Adding two numbers from the perspective of the fundamental axioms can also have more than one complexity. From the perspective of how the natural numbers are defined, adding a number $n$ to a number $m$ involves $O(n+m)$ operations, because you need to evaluate the recursive chain of successor calls, come down to zero in both cases, glue the chains together at zero, compose a single string again, and count off how many successor operations it takes to arrive at the sum.

Note
If this particular approach seems baroque, consider that it is exactly what you would do with Roman numerals, or with finger counting.
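For the curious, the successor-chain addition can be written out directly (a sketch; the type and names are mine):

  // unary naturals: a number is a chain of successors over zero
  enum Nat {
      Zero,
      Succ(Box<Nat>),
  }

  // walking all of n to glue the chains at zero costs O(n) steps,
  // on top of the O(n + m) cost of having built the operands at all
  fn add(n: Nat, m: Nat) -> Nat {
      match n {
          Nat::Zero => m,
          Nat::Succ(inner) => Nat::Succ(Box::new(add(*inner, m))),
      }
  }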

With a little bit of symbolic reasoning, and by introducing the concept of multiplication, you can then come back and use the positional system of recording numbers. Within that system you can apply heuristics that simplify the computations; those heuristics are what you use when adding two large numbers with pen and paper. The positional system represents the number of tallies in terms of convenient landmarks: the powers of the chosen base.

One such heuristic is positional addition with carries. If you know the digit makeup of the representation of number $n$ and the digit makeup of $m$, you can figure out the digit makeup of the number $n+m$, and it would require $O(\max \{\log_{10} n, \log_{10} m \})$ operations.
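A sketch of that heuristic (digit vectors are least-significant-digit first; names mine):

  // schoolbook addition: one pass over the digits with a carry,
  // O(max(len(n), len(m))) steps
  fn add_digits(n: &[u8], m: &[u8]) -> Vec<u8> {
      let mut out = Vec::new();
      let mut carry = 0u8;
      for i in 0..n.len().max(m.len()) {
          let s = carry + n.get(i).copied().unwrap_or(0) + m.get(i).copied().unwrap_or(0);
          out.push(s % 10);
          carry = s / 10;
      }
      if carry > 0 {
          out.push(carry);
      }
      out
  }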

This by itself is not an optimisation: if we wanted to expand the tallies, the cost would still be $O(n+m)$. But because we have re-framed the problem, from evaluating a string of successor calls to finding a representation from given representations, we have made it possible to operate on far larger numbers with the same effort.

So how does Gauss' insight help us find the sums of arithmetic progressions? Well, the heuristic is that we can evaluate the multiplication of two numbers relatively more easily than most other similar combinations. We remember the multiplication table, because it is something we evaluate commonly, but we need not memorise a table of partial arithmetic sums, because their connection to the multiplication table is compact.

Binet's formula is thus simply a compact representation of the fact that, accidentally, the quadratic equation

\[ x^2 - x - 1 = 0 \]

can behave somewhat similarly to the terms of the Fibonacci sequence; the formula hijacks that connection and represents it compactly. Evaluating the formula itself then becomes a primitive description of an algorithm.
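For completeness, a sketch of that connection: guess solutions of the form $F_n = x^n$, and the recurrence picks out the roots of the quadratic; the initial conditions then fix the mixture.

\[ F_n = F_{n-1} + F_{n-2} \;\Rightarrow\; x^2 = x + 1 \;\Rightarrow\; x = \frac{1 \pm \sqrt{5}}{2} \]

\[ F_n = a \left( \frac{1 + \sqrt{5}}{2} \right)^n + b \left( \frac{1 - \sqrt{5}}{2} \right)^n, \qquad F_1 = F_2 = 1 \;\Rightarrow\; a = -b = \frac{1}{\sqrt{5}} \]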

The Role of Notation

The notation that we use is a one-to-one mapping between symbols and agreed-upon standard operations (which are, by the way, arbitrary), and it is this mapping that allows you to reason about the algorithm.

Imperative and functional programming languages are similarly descriptions of algorithms, and reasoning about them is also something we ought to learn to do. Yet algebra classes are mandatory for children, while debugger lessons are optional even for seasoned programmers.

And I have a feeling that this is because programming languages are mostly designed to reason about problems in a very specific way. To be more precise, they are useful for understanding transformations of state in the style of the von Neumann model, and not by any means for reflecting our symbolic understanding.

For the record, I think these things can be fixed. Unfortunately, things like Lisp are becoming ever more niche and less popular, and programming languages that treat mathematical reasoning on an equal footing with implementing an algorithm are rare.

In most cases, the best one can hope for is a programming language with a strong type system, which would allow one, but not necessarily require one, to express one's reasoning.

Unfortunately, languages like Rust have heavily restrictive rules about what can, and more importantly cannot, be reflected in their type system, namely traits.

Fortunately, this opens a niche for a language that is a bit of a mix between an IPython notebook, a mathematical environment like Maple or Mathematica, and a good old-fashioned procedural programming model. Usually the need for reasoning about a program is phrased in terms of correctness or reliability, but if anything, I hope to have made a good case for why it matters for performance as well.

Footnotes

[1] A composite number is divisible by some number other than one and itself, and divisors come in pairs $d \cdot e = n$. If both members of a pair were greater than $\sqrt{n}$, their product would exceed $n$ itself; so if no divisor up to the square root has been discovered, none exists.

[2] Apple, for example, does not allow you to have any form of compiler or general-purpose execution environment on its mobile devices.

[3] https://mathworld.wolfram.com/WilsonsTheorem.html