Category Theory proofs in Idris
— type theory, functional programming, Idris — 11 min read
Idris is a programming language with dependent types. It is similar to Agda, but hews more closely to Haskell. The goal of Idris is to bring dependent types to generalpurpose programming. It supports multiple compilation targets, including C and Javascript.
Dependent types provide an unprecedented level of type safety. A quick example is a typesafe printf implementation (video). They are also useful for theorem proving. According to the CurryHoward correspondence, mathematical propositions can be represented in a program as types. An implementation that satisfies a given type serves as a proof of the corresponding proposition. In other words, inhabited types represent true propositions.
The CurryHoward correspondence applies to every language with type checking. But the type systems in most languages are not expressive enough to build very interesting propositions. On the other hand, dependent types can express quantification (i.e., the mathematical concepts of universal quantification (∀) and existential quantification (∃)). This makes it possible to translate a lot of interesting math into machineverified code.
This post is written in literate Idris. The original markup can be compiled and typechecked. Code blocks that are prefixed with greaterthan symbols (>) in the markup are evaluated. Blocks that are marked off with three backticks are given for illustrative purposes and are not evaluated.
In Idris, partial functions are allowed by default. A totality requirement can be specified perfunction. This line enforces totality checking by default for functions in this module.
A function that is total is guaranteed to terminate and to return a welltyped output for every possible input.^{1} A function that does not terminate, or that throws a runtime error for some inputs, is said to be partial.
A partial function can introduce a logical contradiction, which would make proofs unreliable. So totality checking is useful for theorem proving.
Theorem proving
Consider the definition of the natural number type in the Idris standard library:
This defines a type, Nat
,
with two constructors for producing values:
a number may be zero (Z
),
or it may be one greater than another number (S Nat
).
A type can be indexed by another type. That describes a type produced by a type constructor that takes one or more values as parameters. Here is a constructor for indexed types from the Idris standard library:
This declares that LTE
is a constructor that takes two Nat
values as parameters,
and produces a concrete Type
.
The types that LTE
constructs also happen to be propositions which state that
one natural number is less than or equal to another.
It should be read,
"the natural number n is less than or equal to the natural number m".
Two value constructors are given.
They are written so that a value that satisfies a given LTE
type be written
if and only if the n
in that type is less than or equal to the m
.
In this way, a value that satisfies a type of the form LTE n m
is a proof that n
really is less than or equal to m
.
lteZero
is a singleton value  it is a constructor that takes no arguments.
But its type contains a variable; so it is polymorphic.
lteZero
can satisfy any type of the form, LTE Z n
.
lteZero
is effectively an axiom, stating a fundamental property of natural numbers.
Given the definition of LTE
it is possible to write a proposition,
such as,
"zero is less than or equal to every natural number".
The proposition is written as a function that takes a number as input.
The value that is given is assigned to the variable n
,
which is used to specify the return type.
Thus the return type of nonNegative
depends on the input value.
Wherever you see (a : A)
it can be read as,
"Some value of type A
will be given here,
and that value will be assigned to the variable a
."
To write an implementation for nonNegative
,
it is necessary to produce a value of the appropriate LTE
type
without any information about what input might be given 
other than the fact that it will be a natural number.
Totality checking is enabled,
so any implementation must be applicable to every possible input.
Thus a type of the form,
(x : A) > P x
describes universal quantification over the type A
.
nonNegative
happens to be a restatement of the axiom, lteZero
.
So an implementation / proof is trivial:
On the other hand, lteSucc
maps a given proof to a proof of a related proposition.
It is used in proofsbyinduction.
For example, a proof that every number is less than or equal to itself:
The proof that zero is equal to itself is given by the axiom. For every other number, the proof is given as an inductive step using a proof for the nextsmallest number.
Because the type of lteSucc
is that of a function,
it can be read as a proposition involving logical implication:
"n <= m implies that n + 1 <= m + 1."
In general, types of the form
P x > Q y
can be read as logical implication.
We have seen that the input value to a function can be labelled and referenced in the output type. That is a special property of functions. For example, it is not permissible to label the value in the first position of a tuple type to reference it in the second position. But there is a special construction, the dependent pair, which does allow this. Dependent pairs are used to represent existential quantification.
A dependent pair type of the form
(x : A ** P x)
is read as existential quantification over the type A
.
A proof of a proposition with existential quantification can be given as a pair of an arbitrary value and a proof that the proposition holds for that value. For example, here is a proof of the Archimedean Property of natural numbers, "For every natural number, n, there exists a natural number, m, where m > n":
The quantified proposition uses (S n)
instead of just n
to indicate that m
must be strictly greater than n

greaterthanorequalto is not sufficient.
The proof supplies S n
as a witness 
a specific value that is used to prove that the quantified proposition holds.
Remember that S n
is just another way of writing n + 1
.
The second component of the dependent pair value must be a proof that the witness
is greater than or equal to S n
 as is required by the type.
Since S n
and S n
are equal, lteReflexive
suffices.
Definition of Category
I have been studying Category Theory; so I decided to use that as a topic for exercises when learning about Idris. If you are wondering what Category Theory is all about, take a look at Why Category Theory Matters.
There is a much more complete description of Category Theory concepts written in Agda. The definition below was an exercise for me in learning about Category Theory concepts myself.
A category is a set of objects combined with a set of arrows that encode relations between objects. When talking about all possible categories, the concepts of "object" and "arrow" are very abstract. They could be pretty much anything. Descriptions of specific categories make specific statements about what objects are and what arrows are.
Let's implement a type class to capture this definition.
Arrows are indexed by objects.
That is, the type of an arrow carries its domain
(the object that an arrow originates from)
and its codomain
(the object that an arrow points to).
Note that obj
is given as an unqualified variable.
In some categories objects will be types  as in the Set category.
In others they will be plain values.
The methods of this type class will define the category laws.
For starters, there must be an id arrow for every object.
This line specifies that an instance of this type class must provide
a cId
implementation with the given type:
As was shown above,
a function type serves as a proposition with universal quantification.
So the type of cId
states that every object must be both the domain
and codomain of at least one arrow.
The implementation will also provide a means to identify that arrow.
Arrows must be composable.
If one arrow points from objects a
to b
,
and another arrow points from 'b' to 'c',
then it must be possible to combine them
to produce an arrow from 'a' to 'c'.
Arguments in curly braces are implicit parameters. In most cases the compiler will infer those values. So they are generally not given as explicit arguments when invoking the function. However, it is possible to provide implicit parameters explicitly when needed.
cId
and cComp
are the only required functions that actually produce arrows.
But it is necessary to provide more specific rules about how they should behave.
These propositions state that id arrows must not only have the same
object as domain and codomain 
they must also be identities under composition.
Implementations of cIdLeft
and cIdRight
will never be used at runtime 
they are just proofs that cId
and cComp
obey the category laws.
Here (=)
is a type constructor, very much like LTE
.
Its definition looks like this:
The parameters to (=)
can be any expressions 
including plain values or types.
refl
is another axiom, stating that equality is reflexive.
Basically, to prove that two expressions are equivalent
it is necessary to demonstrate to the type checker that
they have the same normal form.
This implies that there might be multiple arrows pointing between the same two objects; and that those arrows are not necessarily equivalent. When talking about all possible categories, it is not possible to say what it is that makes arrows different or the same. Rather, any specific category must have its own definition of equality of arrows. Some categories will have many arrows between each pair of objects; some will have at most one.
One more proof is required to complete the Category
type class.
Arrow composition must be associative.
I have cheated slightly.
In the types for cIdLeft
, cIdRight
, and cCompAssociative
I left the implicit arguments to cComp
implicit.
But I was not able to get those definitions to typecheck.
I actually had to list out the implicit parameters
when applying cComp
in a type expression.
The working definitions are a bit more difficult to read:
It has been pointed out to me that the compiler is not able to determine
which implementation of cidLeft
, cIdRight
, or cCompAssociative
should be invoked,
unless the implicit parameters are listed.
If Category were implemented as a record type instead of as a type class
this would probably not be necessary.
A partial ordering category
With the definition of Category in place, it is possible to describe specific categories  and to prove that they obey the category laws.
The LTE
type constructor can be used to
describe a category where arrows are LTE
relations,
and objects are natural numbers.
In this category,
there are arrows that point from each number
to every other number that is larger,
and also back to the number itself.
An id arrow in this category is a proof that a number is less than or equal to itself.
This is the same thing that was proved by lteReflexive
;
so the implementation is the same.
Arrow composition is a proof that the lessthanorequalto relation is transitive.
For reference, here is the type for cComp
specialized for LTE
:
And the proof construction:
In the first equation, the second input is lteZero
;
which implies that a
is zero.
a
is also the domain in the LTE
result that we want to prove;
so the proof is trivial.
The second equation takes advantage of the fact that the lteSucc
constructors of input arrows can be recursively unwrapped,
until reaching the base case, where the second input is lteZero
.
There is no pattern for a case where the first input is lteZero
where the second is not.
It is not required, because that case is not allowed by the type of cComp

and the type checker is able to confirm that.
If it were necessary to make explicit that this case is not possible,
that could be stated with a third equation:
The keyword impossible
is one tool available for proofs of falsehood.
Now to prove the remaining category laws.
Demonstrating identity under composition of proofs is a little strange. What we need to show is that composing an identity arrow with any other arrow does not introduce new information.
In the base case of cIdLeft
, if the given arrow, f
, is lteZero
then its domain must be zero.
Therefore the identity arrow it is composed with must also be lteZero
.
Those are the same normal form,
so the proof invokes refl
 the axiom of reflexivity of equality.
The inductive step applies cong
,
which is a proof of congruence of equal expressions.
It is defined in the standard library:
In this case the f
that cong
infers as its implicit parameter is lteSucc
.
cong
is an example of a proof constructor:
it takes a proof as input and returns a proof of a related proposition.
Functions like cong
are the building blocks of multistep proofs.
You may notice that cong
takes refl
as an argument and returns it.
cong
has equality proofs as its input and output.
By definition, the only possible value of an equality type is refl
.
The base case of cIdRight
is not as trivial as the base case of cIdLeft
.
the codomain of lteZero
is not necessarily zero;
so the identity arrow involved could be some lteSucc
value.
However the compiler is able to do some normalization automatically.
That means that it is not necessary to spell out every step.
The proof of associativity follows a similar pattern.
A monoid as a category
The natural numbers form a monoid under addition. In particular:
 Two numbers can be combined by addition to produce a third number.
 There is an additive identity (0).
 Addition is associative.
The monoid also forms a category with just one object,
(which will be arbitrarily represented with ()
)
where the arrows are integers,
and arrows are composed by addition.
Since there are an unbounded number of natural numbers,
in this category
there are an unbounded number of arrows pointing from ()
back to ()
.
This category is made a bit complicated by the requirement that
arrows are indexed by domain and codomain.
Those indexes will not be meaningful in a category with just one object.
But for the sake of generality,
a trivial higherkinded wrapper around Nat
is needed.
In Idris, as in Haskell,
()
is a type with exactly one value,
which is also called ()
.
To make it clear that a NatArrow
is really just a Nat
,
we can prove that the two types are isomorphic.
An isomorphism is a bidirectional mapping that preserves information in both directions.
to
and from
specify the mapping;
toFrom
and fromTo
prove that information is preserved.
Now the definition of the Nat category:
This implementation uses (+)
for arrow composition.
It reuses proofs of identity and associativity for (+)
.
It is only necessary to apply cong
to the predefined proofs
to show that applying the getNat
constructor to both sides of an equality
does not change anything.
For another look at monoids in Idris, see Verifying the monoid laws using Idris (video).
Category of sets
In the Set category, objects are sets and arrows are functions.
The implementation here will use Type
as the type of objects.
In Idris, the type of every small type is Type
.
For example, the type of Nat
is Type
.
The type of String
is also Type
.
The type of Type
is Type 1

meaning that Type
is not a small type itself.
The definition of Category
requires proofs of equivalence for arrows.
But there is no predefined comparison operator for functions in Idris.
All that you can really do with a function is to give it some inputs,
and check what its outputs are.
To produce a viable category, it is necessary to introduce the axiom of function extensionality:
If two functions produce the same output for all possible inputs, then they are equivalent.
funext
takes as input two functions
and a proof that those functions produce the same output for all inputs.
Note the parentheses in that type:
As function types serve as universal quantifiers,
higherorder functions provide a means to nest quantification
inside of larger propositions.
I am told that it is not possible to prove funext
in Idris 
and that that limitation is not unique to Idris.
Therefore, function extensionality must be given as an axiom:
believe_me
is a "proof" to use sparingly.
Implementations for an identity function and for composition of functions
are provided in the standard library.
In fact, Idris includes a Control.Category
module
with a predefined Category
type class.
But that definition does not include all of the category laws.
What remains to be defined are proofs of identity and associativity.
Proving that id (f x)
and f (id x)
reduce to f x
is sufficiently easy that the compiler can do most of the work on its own.
To make the next step to f x = f x > f = f
is just a matter of applying funext
.
In order to prove associativity, it is helpful to have a helper proof that could be described as, "pointful composition".
The proof of associativity is a bit complicated. So it is broken into steps here, with proven intermediate propositions given for each step.
There are two standard library functions used here that have not been introduced yet. They are:
When a free, lowercase variable appears in a type expression,
Idris inserts an additional implicit parameter at the beginning of the expression.
It is up to the compiler to infer the types of the free variables.
After carrying out that expansion, the type of trans
looks like this:
There may be more steps listed in compAssociative
than are really required.
Hopefully including them helps to clarify the proof.
In Haskell, (>)
is an ordinary type constructor that could be used
as the arrow type in a type class like Category
.
It seems that is not the case in Idris.
To work around that, the standard library includes a type, Morphism
,
that is isomorphic to the function type.
The proofs above provide everything that is required to prove the validity of a category of Types and functions. But the definition here uses Morphism for arrows; so a little extra translating is necessary.
The operator (~>)
is an infix alias for Morphism
.
As with Nat
and NatArrow
,
the correspondence between (>)
and Morphism
is made official
with a proof of isomorphism.
It is my hope that I can use these definitions to work out exercises as I continue to explore Category theory.
 The Halting Problem states that there are programs that cannot be proven to terminate. That does not mean that it is impossible to prove that any program terminates. Idris and other languages with totality checking put some restrictions on the forms that functions are allowed to take so that totality checking is possible.↩