Friday, October 06, 2006

A categorization of DSLs

At this year's JAOO conference, I hosted the track called "DSLs and Beyond". Martin Fowler was supposed to do the intro session, but he had problems with his back and couldn't make it, so I had to prepare an alternative intro session. I decided to talk about categorizations of DSLs, and this blog post is basically what I came up with.

Domain Selection: If you talk about a Domain Specific Language, the first thing you need to do is define the domain for which you want to build your DSL. In the context of software development it is useful to distinguish (at least) two kinds of domains: Technical domains address key technical issues related to software development such as Distribution, Failover and Load-Balancing, Persistence and Transactions, GUI Design or Concurrency and Realtime. Functional (or "Business") domains represent the business/professional issues; examples include Banking, Human Resource Management, Insurance, Engine Controllers or Astronomical Telescope Control. Sometimes these two kinds of domains are also called Horizontal and Vertical domains. You also have to decide whether your DSL should be rather general to the domain you've selected or specific to your use cases in the domain; in other words, selecting a domain always implies a tradeoff between more general applicability of the DSL and more specificity (and thus, better support for your specific needs).

Expressive Power: The next issue you have to decide upon is whether your DSL should allow creative construction or use a metaphor that is more configuration-like. Creative construction means that the DSL defines a number of "words" from which you can creatively construct "sentences", the models. Configuration languages, on the other hand, typically allow you to select and deselect features that your system might or might not have - this selection can be as simple as three checkboxes or as sophisticated as a feature model. The two alternatives - creative construction and configuration - are in fact not alternatives; they are the two extremes of a continuum, and your DSL can be located anywhere on that continuum. In any case, you have to decide on the expressive power of the DSL and the freedom you want to give your users (a lot of freedom in the case of creative construction, less freedom for configuration DSLs). Note that, of course, this issue is related to the domain. Certain domains lend themselves to configuration, others are better served by creative construction. In general, I think it is fair to say that the more mature a domain is, the more configuration-like the DSL used to describe models in that domain will be.
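
To make the two ends of that continuum a bit more tangible, here is a minimal Python sketch. Everything in it is made up for illustration: the feature names, the StateMachine vocabulary and the helper functions are assumptions, not taken from any particular tool.

```python
from dataclasses import dataclass, field

# --- Configuration end: the user only selects from a fixed feature set. ---
# The feature names are hypothetical examples.
AVAILABLE_FEATURES = {"persistence", "failover", "logging"}

def configure(selected):
    """Return a feature -> bool map; reject anything outside the fixed set."""
    unknown = set(selected) - AVAILABLE_FEATURES
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")
    return {name: name in selected for name in sorted(AVAILABLE_FEATURES)}

# --- Creative construction end: a small vocabulary, freely combined. ---
@dataclass(frozen=True)
class Transition:
    source: str
    event: str
    target: str

@dataclass
class StateMachine:
    states: set = field(default_factory=set)
    transitions: list = field(default_factory=list)

    def state(self, name):
        self.states.add(name)
        return self

    def on(self, source, event, target):
        # Both ends of a transition must be states that were declared before.
        if source not in self.states or target not in self.states:
            raise ValueError("transition refers to an undeclared state")
        self.transitions.append(Transition(source, event, target))
        return self

# Configuration: three "checkboxes", two of them ticked.
config = configure({"persistence", "logging"})

# Creative construction: a "sentence" built from the words state/on.
door = (StateMachine()
        .state("open").state("closed")
        .on("open", "push", "closed")
        .on("closed", "pull", "open"))
```

The configuration side can only ever produce one of a handful of valid results, while the construction side accepts an unbounded number of models and therefore needs constraint checks such as the one in on().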

Concrete Syntax: Yet another dimension along which DSLs can be structured is their concrete syntax. Graphical notations are what everybody talks about in MDA. However, textual notations are also very useful. You can even use tabular notations or some kind of custom-made forms (or GUI dialogs). Graphical notations are useful if you want to describe structures or relationships between things (and also to impress prospective customers :-)). Textual notations are well suited for behavioral or algorithmic aspects (although state machines and Petri nets kind of prove the opposite :-)). Typically, graphical as well as textual modeling is a form of creative construction. For configuration DSLs, tabular notations or specific GUI forms are often well suited, and textual notations are often useful there, too. The amount of work you need to put into the construction of the editor is vastly different for these various alternatives. Building graphical editors is quite a bit of work (in spite of tools such as GMF!). Tools for building textual editors are starting to emerge (e.g. oAW's xText or INRIA's TCS and TGE). From my perspective, building textual editors (even with syntax highlighting, constraint checks and code completion) will probably stay less complex than building graphical ones. The effort for building GUI-based editors (incl. tree views and the like) is probably even lower ... and the editors are usually less pleasing and productive. There's one thing that has to be kept in mind: if you use textual editors, things like diff and merge are much easier. So the integration of the models (i.e. text files) with CVS etc. is much simpler compared to all other kinds of models (ever built a graphical diff for your GMF editor? Or tried to diff on the object structures directly?)
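
As a small illustration of the diff argument: if a model is just text, a generic line-based diff already gives a readable result. The sketch below uses Python's standard difflib module; the "entity" model syntax and the file names are invented for the example.

```python
import difflib

# Two versions of a hypothetical textual model (the syntax is made up).
old_model = """\
entity Customer {
  name: String
  email: String
}
"""
new_model = """\
entity Customer {
  name: String
  email: String
  phone: String
}
"""

# A plain line-based diff, the same kind a version control system produces.
diff_lines = list(difflib.unified_diff(
    old_model.splitlines(),
    new_model.splitlines(),
    fromfile="customer.model (old)",
    tofile="customer.model (new)",
    lineterm="",
))
print("\n".join(diff_lines))
```

No model-specific tooling is involved here, which is the point: a graphical or object-structure model would need a custom comparison before you could get anything similar.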

Execution: Another important aspect is the way the "program" is executed. There are two fundamental forms. One is transformation (also known as compilation), the other one is interpretation. Each of these approaches involves tradeoffs with respect to performance, code size, the ability to change stuff at runtime, reflection, etc. We all know about these tradeoffs from programming languages. There are also ways to combine the two: you can, for example, generate some XML document from a graphical representation of a state chart and then interpret this XML document. In general, it is easier to write a code generator (or any kind of transformer, for that matter) than an interpreter, since you're dealing with more concrete things. There is another reason that can explain why transformation and code generation are the predominant form of executing DSLs (at least, external ones): if you have to run your system on a predefined infrastructure that requires certain artifacts to be present (such as J2EE, or many embedded OSs), you have no choice but to generate these artifacts. Interpretation cannot help you there.
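
The difference between the two forms can be sketched with a toy example in Python. The model format (a nested dictionary describing a state machine) and the function names interpret and generate are assumptions chosen for illustration; a real generator would typically emit code for the target platform rather than Python, and this one assumes the model contains at least one transition.

```python
# A hypothetical state-machine model: {state: {event: next_state}}.
MODEL = {
    "open": {"push": "closed"},
    "closed": {"pull": "open"},
}

def interpret(model, start, events):
    """Interpretation: walk the model at runtime, one event at a time."""
    state = start
    for event in events:
        state = model[state][event]
    return state

def generate(model, func_name="run"):
    """Transformation: emit Python source with the transitions hard-coded."""
    lines = [f"def {func_name}(start, events):",
             "    state = start",
             "    for event in events:"]
    keyword = "if"
    for source, transitions in model.items():
        for event, target in transitions.items():
            lines.append(
                f"        {keyword} state == {source!r} and event == {event!r}:")
            lines.append(f"            state = {target!r}")
            keyword = "elif"
    lines += ["        else:",
              "            raise KeyError((state, event))",
              "    return state"]
    return "\n".join(lines) + "\n"

source_code = generate(MODEL)
namespace = {}
exec(source_code, namespace)  # stands in for "compile and deploy the artifact"
run = namespace["run"]

events = ["push", "pull", "push"]
result_interpreted = interpret(MODEL, "open", events)
result_generated = run("open", events)
```

Both paths give the same result for the same model and events. The interpreter keeps the model around at runtime (so it could be changed on the fly), whereas the generated function no longer needs the model at all.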

Internal vs. External: There's another aspect, one that is actually considered by many to be the most important distinguishing feature of DSLs: a DSL can either be separate from "normal" programming languages or it can be embedded in one (which is then often called the "host language"). The premier language these days for embedding DSLs is Ruby (historically, Lisp was the champion there). To be able to embed a DSL into a language, the syntax of the host language must be flexible enough that you can "tweak" it a bit to suit the DSL. You'll always be limited by the kind of syntax the host language allows (tweaked or not) - that's the primary disadvantage of this approach: external DSLs are more flexible with respect to syntax. However, internal DSLs have one big advantage: symbolic integration (a term borrowed from Martin Fowler). This means that a symbol that is defined in the DSL part of the overall program can be used in the host language - and vice versa. Also, you can for example use the expression language of the host language in the DSL - very useful! Often internal DSLs are interpreted (as in Ruby, relying heavily on runtime metaprogramming). However, you can also use compile-time metaprogramming to get a more static approach to embedded DSLs. Laurie Tratt's Converge language is an example here.
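
Here is a rough sketch of what symbolic integration looks like in practice, written in Python rather than Ruby. The Pricing "DSL", its rule method and the BULK_THRESHOLD constant are all hypothetical names invented for this example.

```python
class Rule:
    def __init__(self, name, condition, discount):
        self.name = name
        self.condition = condition  # a plain host-language callable
        self.discount = discount

class Pricing:
    """A tiny, hypothetical internal DSL for discount rules."""

    def __init__(self):
        self.rules = []

    def rule(self, name, when, discount):
        self.rules.append(Rule(name, when, discount))
        return self  # returning self allows the chained, sentence-like style

    def price(self, order):
        # Apply the first matching rule; no match means no discount.
        for r in self.rules:
            if r.condition(order):
                return round(order["total"] * (1 - r.discount), 2)
        return order["total"]

# A host-language symbol, defined outside the DSL ...
BULK_THRESHOLD = 100

# ... used inside the DSL, together with host-language expressions (lambdas).
pricing = (Pricing()
           .rule("bulk", when=lambda o: o["quantity"] >= BULK_THRESHOLD,
                 discount=0.10)
           .rule("vip", when=lambda o: o["customer"] == "vip",
                 discount=0.05))

# The other direction: DSL-defined rules are ordinary objects for the host.
rule_names = [r.name for r in pricing.rules]
bulk_price = pricing.price(
    {"quantity": 120, "customer": "regular", "total": 200.0})
```

Note how the limits of the approach show up as well: the "syntax" is whatever Python's method chaining and keyword arguments allow, nothing more.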

Tooling: So, these are the most important distinguishing characteristics of DSLs. There's another consideration. It's not a language characteristic, but it is important: tool support. Are there decent editors? Debuggers? What about refactoring support? You should also take these aspects into account when selecting a DSL approach.
