Thursday, January 22, 2015

Trying out the Bound library

[UPDATE, 2015-02-23: The post has been slightly updated based on comments by Edward Kmett.]

In trying to understand the Bound library, and in particular, its limitations (if any) in terms of expressiveness and efficiency. I will here try to implement a small subset of the Feldspar EDSL using Bound.

DISCLAIMER: Statements in this text may very well reflect my misunderstandings rather than actual truths about Bound.

Note that I’m assuming basic familiarity with Bound. If you’re not familiar with the library, see its documentation.

I would be happy to receive feedback and hints for how to improve the code.

The whole code can be found here: FeldsparBound.lhs

Implementing Feldspar using Bound

Here is the subset of Feldspar that we will work with:

data Data a
    = Var a
    | Int Int
    | Add (Data a) (Data a)
    | Ix (Data a) (Data a)
    | Par (Data a) (Scope () Data a)
  deriving (Eq,Show,Functor,Foldable,Traversable)

Ix represents array indexing, and Par represents array construction, corresponding to the following function on lists:

parList :: Int -> (Int -> a) -> [a]
parList l f = [f i | i <- [0 .. l-1]]

For simplicity, I chose to put the Scope directly on Par instead of going through a lambda as we do in Feldspar.

The Monad instance defines substitution in a standard way:

instance Monad Data
  where
    return = Var
    Var a   >>= k = k a
    Int a   >>= k = Int a
    Add a b >>= k = Add (a >>= k) (b >>= k)
    Ix a i  >>= k = Ix (a >>= k) (i >>= k)
    Par l f >>= k = Par (l >>= k) (f >>>= k)

instance Applicative Data where pure = return; (<*>) = ap

User interface

Here I choose to represent free variables as strings:

type Name = String

type D = Data Name

The user interface is then as follows:

v :: Name -> D
v = Var

instance Num D
  where
    fromInteger = Int . fromInteger
    (+) = Add

(!) :: D -> D -> D
(!) = Ix

par :: Name -> D -> D -> D
par i l f = Par l $ abstract1 i f

Here are two simple example expressions:

ex1 = par "i" 10 $ v "i" + 0
  -- Corresponds to `parList 10 $ \i -> i + 0`

ex2 = par "i" 10 $ v "arr" ! (v "i" + 0)
  -- Corresponds to `parList 10 $ \i -> arr !! (i + 0)`

In the definition of par, we use abstract1 is to bind the free variable (convert to deBruijn). It has to traverse the whole body, so the complexity is quadratic in the number of nested binders.

Transformations

Now to the interesting question of how one goes about to transform terms in the above representation.

Ideally, we want to do this without looking at deBruijn indices at all. Two questions arise:

  1. How to apply a transformation under a Scope?
  2. How to pattern match on binders and variables and rewrite them?

For the first question, my initial naive attempt was to use a combination of instantiate1 and abstract1 to lift a transformation on D to a transformation on Scope:

transScopeBAD :: (D -> D) -> Scope () Data Name -> Scope () Data Name
transScopeBAD f = abstract1 "x" . f . instantiate1 (Var "x")

The use of instantiate1 changes all occurrences of the bound variable to the free variable Var "x" and abstract1 re-binds that variable after f has been applied to the body. Unfortunately, this doesn’t work. If there are nested scopes, their corresponding variables will all be renamed to "x", so the innermost abstract1 is going to capture all of them. Also, since abstract1 has to traverse the whole body, nested uses of transScope are going to be inefficient.

A better approach is to work more directly with the representation and use fromScope and toScope (here with the type specialized to Data):

transScope :: (Data (Var b a) -> Data (Var b a)) -> Scope b Data a -> Scope b Data a
transScope f = toScope . f . fromScope

(This version of transScope was suggested by Edward. In my initial attempt, I used hoistScope instead.)

Note how the type clearly expresses the fact that the term passed to the transformation function f has one extra variable (of type b) in addition to the free variables (of type a). This information was missing in transScopeBAD.

Using transScope, we can now write the following simplifier:

simplify :: Data a -> Data a
simplify e = simp $ case e of
    Add a b -> Add (simplify a) (simplify b)
    Ix a i  -> Ix (simplify a) (simplify i)
    Par l f -> Par (simplify l) $ transScope simplify f
    _ -> e
  where
    simp (Add (Int 0) b) = b
    simp (Add a (Int 0)) = a
    -- The catch-all case `simp e = e` will come later

Note that it is important for simplify to be polymorphic in the free variables since we’re using it at a different type when passing it to transScope. This requirement was not obvious to me at first. (In retrospect, it is obvious that the polymorpic type is preferred, since I don’t want the simplifier to do anything with the free variables.)

To make the simplifier more interesting, we add a case corresponding to the following rule for parList (defined above):

parList l f !! i   ==>   f i

This turns out to be easy:

    simp (Ix (Par l f) i) = instantiate1 i f

Another similar rule that we may want to express is

parList l (\i -> arr !! i)   ==>   arr   (where i `notElem` freeVars arr)

(In both of these rules, we ignore out-of-bounds indexing for simplicity.)

To express this rule, we need to pattern match on binders and variables. My initial attempt was the following:

--     simp (Par l f)
--       | Ix arr i <- instantiate1 (v "i") f
--       , i == v "i"
--       , "i" `Foldable.notElem` arr
--       = arr

This looks quite nice from a readability perspective, but it doesn’t work because transScope does not permit free variables as strings. (And, in general, using instantiate1 in this way has the same problem as transScopeBAD above, that the variable "i" might easily clash with other instantiated variables.)

Instead, it seems that one has to work more directly with the deBruijn representation:

    simp (Par l f)
      | Ix arr i   <- fromScope f
      , Var (B ()) <- i
      , Just arr'  <- lower arr
      = arr'

    -- Catch-all case to complete the definition of `simp`:
    simp e = e

The lower function tries to lower the scope of an expression, and succeeds if the expression does not refer to the closest bound variable:

lower :: Data (Var b a) -> Maybe (Data a)
lower = traverse $ unvar (const Nothing) Just

Note the use of fromScope rather than unscope above. This is to change the generalized deBruijn representation into one the standard representation which lower expects.

I would have hoped to avoid matching explicitly on the deBruijn representation of i (namely Var (B ())). But I guess this is just unavoidable?

The check that i does not occur in arr is replaced by the pattern match on the result of lower. One nice thing here is that the type system does not allow us to return arr directly, so we cannot accidentally forget to check whether i occurs in arr (which I tend to forget everytime!).

Examples

*Main> ex1
Par (Int 10) (Scope (Add (Var (B ())) (Int 0)))

*Main> simplify ex1
Par (Int 10) (Scope (Var (B ())))

*Main> ex2
Par (Int 10) (Scope (Ix (Var (F (Var "arr"))) (Add (Var (B ())) (Int 0))))

*Main> simplify ex2
Var "arr"

Discussion

At first I was tempted to use instantiate1 and abstract1 everywhere and to represent free variables as strings. Although this leads to code that looks readable, it does not work in general:

  • Representing free variables as strings can easily lead to capturing bugs
  • It leads to inefficient code:
    • abstract1 has to traverse the whole expression, so it should be used sparingly
    • instantiate1 may not have to traverse the whole expression due to the generalized deBruijn representation, but I’m not sure what this means in practice

From a capturing point of view, it may be considered OK to use abstract1 and variables as strings the way I did in the user interface above. The user will just have to make sure to use distinct variable names. However, due to the nested uses of abstract1, it is still very inefficient to construct terms in this way. A better way is probably to construct terms in a different representation and then convert them all to deBruijn in a single pass.

Once I realized that I couldn’t use instantiate1 and abstract1 for transformations under a Scope, I was forced to use fromScope/toScope which expose a bit the representation of variables. However, this turned out to be a very good thing that just made the code safer.

The resulting code does not use abstract1, so it avoids that source of inefficiency. There is one use of instantiate1, but that is a substitution coming from the rewrite rule – it has nothing to do with the Bound library (and it could be avoided by rewriting to a let binding instead of performing the substitution).

In the end, there appears to be only one problematic inefficiency left: the lower function. In order to know whether an expression refers to the closest bound variable, lower naturally has to traverse the whole expression. Of course, this is true regardless of how variables are represented. But in a standard named representation of variables, it is e.g. possible to return a set of free variables from simplify, so that membership can be tested without traversing the expression. I don’t see how that could be done here.

In conclusion, I see many advantages of using Bound, especially when it comes to safety and not worrying about capturing. However, there are some things that make me hesitate on adopting it for something like Feldspar:

  • I wasn’t able to completely hide the deBruijn indices in the simplifier. Can it be done? Is this going to be a big problem for people trying to express rewrite rules?
  • The lower function is a potential source of inefficiency in a simplifier. Can it be fixed? Will there be other inefficiencies as the rules get more complex?

If someone has the answers to those questions, I would be happy to know!