Tuesday, April 17, 2012

DSLs Are Generic Solutions

Last week, I wrote a grant proposal to develop generic methods for implementation of embedded domain-specific languages (DSLs). I tried to argue that DSLs are becoming increasingly important as software development is getting more and more complicated. The goal of the project is to make it dead easy to implement high-quality DSLs embedded in Haskell. As I was trying to promote this idea, I realized that it might not at all be obvious to the reviewer that an ever-increasing number of DSLs is a good thing. Is there not a risk of just causing a confusing jungle of different language flavors?

This is a valid concern; if the primary purpose of introducing a DSL is provide domain-specific syntax, then having multitude of different DSLs is indeed going to be confusing. However, the most important part of DSLs is not their syntax (even if syntax is important). My view of a DSL is this:1

A DSL is a way for an experienced programmer to make a generic solution available to other (perhaps less experienced) programmers in such a way that only problem-specific information needs to be supplied.

The Unix DSL jungle

Consider the following example: A Unix shell has access to lots of small programs aimed to solve specific classes of problems. For example, the program ls is a solution to the problem of listing the contents of a directory. Moreover, it is a generic solution, since it can list the contents in many different ways. Specific solutions are obtained by providing options at the command line. The command ls -a will include hidden files in the listing, and ls -a -R will additionally list the contents of sub-directories recursively. The different options to ls can be seen as a DSL to specialize ls to specific use-cases. ls is a very successful DSL – compare to programming the same functionality using lower-level constructs, e.g. looping over the directory contents testing whether or not a file is hidden, etc.2 Still, the important aspect of ls is not its syntax (though, again, that is also important), but rather the program ls itself (the DSL implementation), which is a generic solution that can be reused in many different situations.

If we consider all commands in a Unix shell environment, it may appear as a jungle of different DSLs. However, this is not at all a problem. Every command is a rather limited sub-DSL which can usually be summarized in a relatively short manual page. Moreover, the interaction between the sub-DSLs is done using a common data format, text strings, which makes it possible to write shell programs that seamlessly combine different sub-DSLs.

What would be confusing is if there were several different versions of ls, all with slightly different syntax. Luckily that is not the case.

The Hackage library jungle

We can also draw a parallel to programming libraries. A library can be seen as a DSL. (Especially when considering embedded DSLs, the border to libraries is not very clear.) The Haskell package database Hackage has since the launch in 2007(?) received nearly 4000 packages (2012-04-16). There are quite some occurrences of confusingly similar libraries, which may give a feeling of Hackage as a jungle, but in all Hackage has been a huge success. The risk of confusion does not mean that we should stop developing libraries, merely that some extra care is needed to specify the purpose of and relation between different libraries.

Reusability is key

I have claimed that a DSL is a way for an experienced programmer to make a generic solution available to other (perhaps less experienced) programmers in such a way that only problem-specific information needs to be supplied. Clearly, syntax is an important component to this definition (“only problem-specific information”), but I think the main focus should be on “making a generic solution available”. It is all about reusing programming effort! We might just as well say that a DSL is a way to have a one-time implementation effort potentially benefit a large number of DSL users for a long time.

As long as there are generalizable solutions out there, there will be a need for new DSLs!


  1. The general definition is, of course, broader.

  2. OK, I know at least one language where looping and testing can be expressed very succinctly, but you get the point.

Difference between LTL and CTL

(Note that this article assume some intuitive understanding of CTL.)

I’m currently teaching a course on Hardware Description and Verification, in which we cover computation tree logic (CTL) model checking. In preparing for the course, I wanted to fully understand the differences between CTL and LTL. The common informal explanation that LTL has linear time while CTL has branching time is useless without further explanation. I had a certain understanding of CTL, and my initial assumption was that LTL is what you get by excluding all E formulas from CTL. This assumption turned out to be wrong.

My assumption would imply that LTL is a subset of CTL, but the Wikipedia page claims that neither logic is a subset of the other. It seems like I’m not alone in being confused about the difference between the two logics: This Stack Exchange page has the same question. The answer on that page talks about the LTL formula

FGp

which, apparently, is not the same as the CTL formula

AFAGp

(as I previously thought). After reading the Stack Exchange answer I started to get a sense of the difference, but I still wasn’t able to formulate it easily. So I had no more excuses not to look into the actual formal semantics of LTL and CTL. This survey by Emerson served as a good reference for that.

Common superset: CTL*

The difference between LTL and CTL is easiest to understand by considering CTL* which has both LTL and CTL as subsets. CTL* has two syntactic categories giving rise to two different kinds of formulas: state formulas and path formulas. A state formula is something that holds for a given state in a system (modeled as a Kripke structure), while a path formula holds for a given path. A path is an infintie sequence of states such that each consecutive pair of states is in the transition relation; i.e. a trace of an infinite run of the system. Note that, even though a state formula is a property of a single state, it can still talk about future states by using the path quantifiers A and E.

Now, the important thing to note is that an LTL formula is a (restricted) CTL* path formula while a CTL formula is a (restricted) CTL* state formula. Path formulas are not very interesting on their own, since one is usually interested in expressing properties of all or some paths from the initial state. Hence, the real meaning of an LTL formula is obtained by adding an implicit A quantifier to it turning it into a state formula. Thus the meaning of the LTL formula f is the meaning of the CTL* formula Af.

The example

With this understanding, we can try to reformulate the above examples in English:

The LTL formula FGp can be read as

“Every path (implicit A) will eventually (F) reach a point after which p is always true (Gp).”

Or perhaps more clearly:

“Every path has a finite prefix after which p is always true.”

The CTL formula AFAGp can be read as

“Every path (A) will eventually (F) reach a state from which every path (A) has the property that p is always true (Gp).”

These formulas are subtly different, and they can be distinguished by the following system of three states:

S1S1orS2(atomic: p)

S2S3(atomic: ¬p)

S3S3(atomic: p)

Every run of this system will either (1) stay in S1 forever, or (2) wait some finite time and then make a transition to S2 and S3. In both cases, it holds that the trace has a finite prefix after which p is always true. Hence the LTL formula holds.

However, the CTL formula does not hold, because there is a path that never reaches a state at which AGp holds. This is the path in which the system stays in S1 forever. Even if it stays in S1 forever, it always has the possibility to escape to S2.

Summary

As demonstrated by the above example, the difference between LTL and CTL (except for the absense of E in LTL) is that an LTL formula can be verified by considering each path in isolation. If each individual path obeys the path formula, the LTL formula is true. To verify a CTL formula, one may also have to consider the alternative possibilities for each state in a path.