[UPDATE, 2015-02-23: The post has been slightly updated based on comments by Edward Kmett.]
In trying to understand the Bound library, and in particular, its limitations (if any) in terms of expressiveness and efficiency. I will here try to implement a small subset of the Feldspar EDSL using Bound.
DISCLAIMER: Statements in this text may very well reflect my misunderstandings rather than actual truths about Bound.
Note that I’m assuming basic familiarity with Bound. If you’re not familiar with the library, see its documentation.
I would be happy to receive feedback and hints for how to improve the code.
The whole code can be found here: FeldsparBound.lhs
Implementing Feldspar using Bound
Here is the subset of Feldspar that we will work with:
data Data a
= Var a
| Int Int
| Add (Data a) (Data a)
| Ix (Data a) (Data a)
| Par (Data a) (Scope () Data a)
deriving (Eq,Show,Functor,Foldable,Traversable)
Ix
represents array indexing, and Par
represents array construction, corresponding to the following function on lists:
parList :: Int -> (Int -> a) -> [a]
parList l f = [f i | i <- [0 .. l-1]]
For simplicity, I chose to put the Scope
directly on Par
instead of going through a lambda as we do in Feldspar.
The Monad
instance defines substitution in a standard way:
instance Monad Data
where
return = Var
Var a >>= k = k a
Int a >>= k = Int a
Add a b >>= k = Add (a >>= k) (b >>= k)
Ix a i >>= k = Ix (a >>= k) (i >>= k)
Par l f >>= k = Par (l >>= k) (f >>>= k)
instance Applicative Data where pure = return; (<*>) = ap
User interface
Here I choose to represent free variables as strings:
type Name = String
type D = Data Name
The user interface is then as follows:
v :: Name -> D
v = Var
instance Num D
where
fromInteger = Int . fromInteger
(+) = Add
(!) :: D -> D -> D
(!) = Ix
par :: Name -> D -> D -> D
par i l f = Par l $ abstract1 i f
Here are two simple example expressions:
ex1 = par "i" 10 $ v "i" + 0
-- Corresponds to `parList 10 $ \i -> i + 0`
ex2 = par "i" 10 $ v "arr" ! (v "i" + 0)
-- Corresponds to `parList 10 $ \i -> arr !! (i + 0)`
In the definition of par
, we use abstract1
is to bind the free variable (convert to deBruijn). It has to traverse the whole body, so the complexity is quadratic in the number of nested binders.
Transformations
Now to the interesting question of how one goes about to transform terms in the above representation.
Ideally, we want to do this without looking at deBruijn indices at all. Two questions arise:
- How to apply a transformation under a
Scope
? - How to pattern match on binders and variables and rewrite them?
For the first question, my initial naive attempt was to use a combination of instantiate1
and abstract1
to lift a transformation on D
to a transformation on Scope
:
transScopeBAD :: (D -> D) -> Scope () Data Name -> Scope () Data Name
transScopeBAD f = abstract1 "x" . f . instantiate1 (Var "x")
The use of instantiate1
changes all occurrences of the bound variable to the free variable Var "x"
and abstract1
re-binds that variable after f
has been applied to the body. Unfortunately, this doesn’t work. If there are nested scopes, their corresponding variables will all be renamed to "x"
, so the innermost abstract1
is going to capture all of them. Also, since abstract1
has to traverse the whole body, nested uses of transScope
are going to be inefficient.
A better approach is to work more directly with the representation and use fromScope
and toScope
(here with the type specialized to Data
):
transScope :: (Data (Var b a) -> Data (Var b a)) -> Scope b Data a -> Scope b Data a
transScope f = toScope . f . fromScope
(This version of transScope
was suggested by Edward. In my initial attempt, I used hoistScope
instead.)
Note how the type clearly expresses the fact that the term passed to the transformation function f
has one extra variable (of type b
) in addition to the free variables (of type a
). This information was missing in transScopeBAD
.
Using transScope
, we can now write the following simplifier:
simplify :: Data a -> Data a
simplify e = simp $ case e of
Add a b -> Add (simplify a) (simplify b)
Ix a i -> Ix (simplify a) (simplify i)
Par l f -> Par (simplify l) $ transScope simplify f
_ -> e
where
simp (Add (Int 0) b) = b
simp (Add a (Int 0)) = a
-- The catch-all case `simp e = e` will come later
Note that it is important for simplify
to be polymorphic in the free variables since we’re using it at a different type when passing it to transScope
. This requirement was not obvious to me at first. (In retrospect, it is obvious that the polymorpic type is preferred, since I don’t want the simplifier to do anything with the free variables.)
To make the simplifier more interesting, we add a case corresponding to the following rule for parList
(defined above):
parList l f !! i ==> f i
This turns out to be easy:
simp (Ix (Par l f) i) = instantiate1 i f
Another similar rule that we may want to express is
parList l (\i -> arr !! i) ==> arr (where i `notElem` freeVars arr)
(In both of these rules, we ignore out-of-bounds indexing for simplicity.)
To express this rule, we need to pattern match on binders and variables. My initial attempt was the following:
-- simp (Par l f)
-- | Ix arr i <- instantiate1 (v "i") f
-- , i == v "i"
-- , "i" `Foldable.notElem` arr
-- = arr
This looks quite nice from a readability perspective, but it doesn’t work because transScope
does not permit free variables as strings. (And, in general, using instantiate1
in this way has the same problem as transScopeBAD
above, that the variable "i"
might easily clash with other instantiated variables.)
Instead, it seems that one has to work more directly with the deBruijn representation:
simp (Par l f)
| Ix arr i <- fromScope f
, Var (B ()) <- i
, Just arr' <- lower arr
= arr'
-- Catch-all case to complete the definition of `simp`:
simp e = e
The lower
function tries to lower the scope of an expression, and succeeds if the expression does not refer to the closest bound variable:
lower :: Data (Var b a) -> Maybe (Data a)
lower = traverse $ unvar (const Nothing) Just
Note the use of fromScope
rather than unscope
above. This is to change the generalized deBruijn representation into one the standard representation which lower
expects.
I would have hoped to avoid matching explicitly on the deBruijn representation of i
(namely Var (B ())
). But I guess this is just unavoidable?
The check that i
does not occur in arr
is replaced by the pattern match on the result of lower
. One nice thing here is that the type system does not allow us to return arr
directly, so we cannot accidentally forget to check whether i
occurs in arr
(which I tend to forget everytime!).
Examples
*Main> ex1
Par (Int 10) (Scope (Add (Var (B ())) (Int 0)))
*Main> simplify ex1
Par (Int 10) (Scope (Var (B ())))
*Main> ex2
Par (Int 10) (Scope (Ix (Var (F (Var "arr"))) (Add (Var (B ())) (Int 0))))
*Main> simplify ex2
Var "arr"
Discussion
At first I was tempted to use instantiate1
and abstract1
everywhere and to represent free variables as strings. Although this leads to code that looks readable, it does not work in general:
- Representing free variables as strings can easily lead to capturing bugs
- It leads to inefficient code:
abstract1
has to traverse the whole expression, so it should be used sparinglyinstantiate1
may not have to traverse the whole expression due to the generalized deBruijn representation, but I’m not sure what this means in practice
From a capturing point of view, it may be considered OK to use abstract1
and variables as strings the way I did in the user interface above. The user will just have to make sure to use distinct variable names. However, due to the nested uses of abstract1
, it is still very inefficient to construct terms in this way. A better way is probably to construct terms in a different representation and then convert them all to deBruijn in a single pass.
Once I realized that I couldn’t use instantiate1
and abstract1
for transformations under a Scope
, I was forced to use fromScope
/toScope
which expose a bit the representation of variables. However, this turned out to be a very good thing that just made the code safer.
The resulting code does not use abstract1
, so it avoids that source of inefficiency. There is one use of instantiate1
, but that is a substitution coming from the rewrite rule – it has nothing to do with the Bound library (and it could be avoided by rewriting to a let binding instead of performing the substitution).
In the end, there appears to be only one problematic inefficiency left: the lower
function. In order to know whether an expression refers to the closest bound variable, lower
naturally has to traverse the whole expression. Of course, this is true regardless of how variables are represented. But in a standard named representation of variables, it is e.g. possible to return a set of free variables from simplify
, so that membership can be tested without traversing the expression. I don’t see how that could be done here.
In conclusion, I see many advantages of using Bound, especially when it comes to safety and not worrying about capturing. However, there are some things that make me hesitate on adopting it for something like Feldspar:
- I wasn’t able to completely hide the deBruijn indices in the simplifier. Can it be done? Is this going to be a big problem for people trying to express rewrite rules?
- The
lower
function is a potential source of inefficiency in a simplifier. Can it be fixed? Will there be other inefficiencies as the rules get more complex?
If someone has the answers to those questions, I would be happy to know!