Wednesday, July 16, 2008
 

Replying to Martin Fowler's recent posts

Martin Fowler, in the context of writing his book on DSLs, recently had two interesting blog posts (this one, and that one).

I agree with some parts (e.g. the MDA stuff), and strongly disagree with other parts.

Now, because I had to go flying for the last two days (yipieeh :-)) I did not immediately reply, and hence, Sven was faster :-). I agree with everything Sven says, and I don't repeat it here.

However, there's one interesting issue that pops up from time to time: repositories. From time to time, people (including me and Sven) point out that it is some kind of problem to store models in repositories. The arguments basically center around the fact that "we want to manage everything as files, in CVS or SVN".

Now, I agree with that, however:

What if you don't have any text at all, if you store everything in models and repositories, don't all the integration problems go away then? Of course you still need a diff/merge facility (based on the models' concrete syntax, not XML!) and a way to version things, but assuming your infrastructure provides that, wouldn't that be ok?

Also, what is a repository in the first place? A couple of text files in a file system clearly isn't a repository. A database storing an AST clearly is.

But what about a CVS/SVN? What about a bunch of text files arranged in a specific directory structure, with an index file rebuilt from time to time by an indexer?
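To make that last question concrete, here is a minimal Python sketch of "text files plus an index rebuilt by an indexer". Everything specific in it is an assumption made up for illustration: the `.model` file extension, the `entity Name` declaration syntax, the `index.json` file name and the function name `rebuild_index`.

```python
import json
import re
import tempfile
from pathlib import Path

# Hypothetical declaration syntax: a line such as "entity Customer".
DECLARATION = re.compile(r"^\s*entity\s+(\w+)", re.MULTILINE)


def rebuild_index(root: Path) -> dict[str, str]:
    """Scan all *.model files below root and map each declared name to its file."""
    index: dict[str, str] = {}
    for path in sorted(root.rglob("*.model")):
        text = path.read_text(encoding="utf-8")
        for name in DECLARATION.findall(text):
            index[name] = path.relative_to(root).as_posix()
    # The index is itself just another text file stored next to the models.
    (root / "index.json").write_text(
        json.dumps(index, indent=2, sort_keys=True), encoding="utf-8"
    )
    return index


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "crm").mkdir()
        (root / "crm" / "customer.model").write_text(
            "entity Customer\nentity Address\n", encoding="utf-8"
        )
        (root / "billing.model").write_text("entity Invoice\n", encoding="utf-8")
        print(rebuild_index(root))
```

The models stay plain files that CVS/SVN can version, while the index is derived data that can be thrown away and rebuilt at any time.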

My point is that a repository is not per se a bad thing, provided it meets the following criteria: (1) you store all your relevant stuff in it, (2) it provides versioning facilities, and (3) it supports diff/merge on a meaningful abstraction level.

As it happens, CVS/SVN provides all of this for text files. Hence, it is the lowest common denominator "repository".

Any comments or thoughts? I would really like to get a somewhat better grasp on this repository thing, to find out what *really* is the problem.


 
Comments:
Not having a good way to diff/merge models so that people can work on them independently is a big drawback of a Model Driven approach. It's absolutely critical from an agile standpoint that people can work independently and merge the changes back into a working whole. Otherwise, you lose a big advantage of an agile approach to development.

It's one of the biggest pain points I have with migrating to a model driven approach. If I don't have a text representation that is easy to merge/diff to see the changes, then a model driven approach is going to be a low priority for me.

The other issue is on model exchange. It is still very difficult unless everybody is using the EXACT same tool and EXACT same version to get models exchanged. It's getting better but if you are working with many different people from a variety of organizations all with their own particular tool to do modeling, you can't exchange and merge in an easy way.

Text and code provide a happy medium for now, but a Model Driven approach needs to address these issues before it will really be of benefit to me.
 
"""
What if you don't have any text at all, if you store everything in models and repositories, don't all the integration problems go away then? Of course you still need a diff/merge facility (based on the models' concrete syntax, not XML!) and a way to version things, but assuming your infrastructure provides that, wouldn't that be ok?
"""

Besides the fact that this would mean reinventing a lot of proven tools, IMHO it would be on par with a text file based solution. Maybe a bit better because of the ability to diff/merge in the concrete syntax.
But it would also feel a bit heavyweight and locked up, wouldn't it? I like text files, because I can edit and read them everywhere: via web, on the command line, in Eclipse and since last week even on my iPhone :-).
 
Obviously you need a way to merge. Did I say anywhere that this is not the case?

Of course text-based models are easy to diff/merge, that's why I am currently a big proponent of textual DSLs.

Markus
 
@Sven:

Of course text files have all the advantages you talk about. And again, that's why I am currently a big proponent of textual DSLs.

But on the other hand, if you really want to mix different forms of concrete syntax (textual, graphical, etc.) then you do want to be able to diff/merge using those, right?

Also, text files tend to be a bit hard to scale. Often the minimum you need is some kind of "cross-indexer" via a database so you can efficiently cross-ref, search, etc. In a "real repository" that's easier.

Consider the Xtext case. What do you do once you have hundreds of Xtext resources? Each linking into each other. How do you efficiently load, unload, search, find-refs, etc? You need some kind of (in memory or persistent) index.
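As a rough illustration of what such an index could look like, here is a minimal in-memory sketch in Python. It is not how Xtext works; the class `CrossRefIndex`, its methods and the resource names are hypothetical, and a real implementation would have to extract the defined and referenced names from the parsed resources.

```python
from collections import defaultdict


class CrossRefIndex:
    """In-memory index: which resource defines a name, and which resources refer to it."""

    def __init__(self) -> None:
        self.defined_in: dict[str, str] = {}
        self.referenced_by: dict[str, set[str]] = defaultdict(set)

    def add_resource(self, resource: str, defines: list[str], refers_to: list[str]) -> None:
        """Record the names a resource defines and the names it refers to."""
        for name in defines:
            self.defined_in[name] = resource
        for name in refers_to:
            self.referenced_by[name].add(resource)

    def remove_resource(self, resource: str) -> None:
        """Drop all entries of a resource, e.g. before re-indexing it after an edit."""
        self.defined_in = {n: r for n, r in self.defined_in.items() if r != resource}
        for users in self.referenced_by.values():
            users.discard(resource)

    def find_references(self, name: str) -> list[str]:
        """Answer find-refs from the index alone, without loading any resource."""
        return sorted(self.referenced_by.get(name, set()))


index = CrossRefIndex()
index.add_resource("customer.dsl", defines=["Customer"], refers_to=["Address"])
index.add_resource("address.dsl", defines=["Address"], refers_to=[])
index.add_resource("order.dsl", defines=["Order"], refers_to=["Customer", "Address"])
print(index.find_references("Address"))  # ['customer.dsl', 'order.dsl']
```

With something like this, only the resource that changed needs to be re-parsed, and searches never have to load all the files.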

Doesn't that make this a repository?

Markus
 
Ok, there are two different aspects in what you define as a repository:
a) the team stuff, merge diff, versioning, etc.

b) the data sink which knows about the language semantics and provides rich tools for viewing, querying and editing the information in it.

The first aspect is solved for all kinds of textual information (by CVS, SVN, etc.).

The latter is solved for specific textual languages. For instance the Java IDE in Eclipse uses the workspace as its repository. JDT provides a rich API, much richer than any object-oriented database I have worked with. I also think they proved the scalability of building this kind of repository on top of text files.

So what we have to do is to provide an IDE for DSLs using basically the same techniques JDT uses and have everything based on a common infrastructure (EMF, etc.).
There are a lot of tools in an IDE which could be generic and therefore work with every DSL without customization. Additional alternative syntaxes could be made on top of the textual syntax.

I think it's good to have one primary syntax showing all the information. And I think graphical syntaxes are overvalued.

BTW: textual languages have the additional, *very* important advantage that migration to new versions of a language is much easier (search and replace).
 
@Sven:

1) I don't think we disagree. But it's still an interesting discussion.

2) I am not sure the comments facility is the best place for having the discussion :-)

so here are some replies:

> a) the team stuff, merge diff,
> versioning, etc.
> b) the data sink which knows
> about the language semantics and
> provides rich tools for viewing,
> querying and editing the
> information in it.

right. I am not sure that you actually need a repository for doing the "rich" stuff, but I have seen it work only in repo based environments.

> The first aspect is solved for
> all kinds of textual information
> (by CVS, SVN, etc.).

of course.

But the important caveat is this: it only works if the concrete syntax the user is used to working with is the text representation. The argument that you can diff/merge something as (XML) text that has been drawn via a graphical editor does not work. People don't want to diff/merge in a "serialization form" if they are used to working graphically.
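To illustrate the difference, here is a small Python sketch of a diff computed on model elements instead of on serialized lines. The representation of a model as `{element_id: {property: value}}` and the function name `diff_models` are assumptions made for this example. Showing the result in the user's graphical notation is the hard part and is not covered here.

```python
def diff_models(old: dict[str, dict], new: dict[str, dict]) -> list[str]:
    """Compare two models given as {element_id: {property: value}}."""
    changes = []
    for element_id in sorted(old.keys() | new.keys()):
        if element_id not in new:
            changes.append(f"removed {element_id}")
        elif element_id not in old:
            changes.append(f"added {element_id}")
        else:
            before, after = old[element_id], new[element_id]
            for prop in sorted(before.keys() | after.keys()):
                if before.get(prop) != after.get(prop):
                    changes.append(
                        f"changed {element_id}.{prop}: "
                        f"{before.get(prop)!r} -> {after.get(prop)!r}"
                    )
    return changes


old_model = {"State:Idle": {"label": "Idle"}, "State:Busy": {"label": "Busy"}}
new_model = {"State:Idle": {"label": "Waiting"}, "State:Done": {"label": "Done"}}
for change in diff_models(old_model, new_model):
    print(change)
# removed State:Busy
# added State:Done
# changed State:Idle.label: 'Idle' -> 'Waiting'
```

The output talks about states and labels, which is what somebody working with a diagram knows, and it does not depend on how the elements happen to be ordered or formatted in the XML.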

> The latter is solved for
> specific textual languages. For
> instance the Java IDE in Eclipse
> uses the workspace as its
> repository.

absolutely! That's a good example.

And connecting to my previous comments, it actually does build indexes and stuff (in the .metadata folder). So, having a bunch of text files, various indexes, and a good IDE (now to be called modeling tool :-)) is one way to go.

> JDT provides a rich API, much
> richer than every object
> oriented database I have worked
> with.

sure. Because it is not generic. Using modeling terminology, JDT is metamodel-specific (M2), whereas an OO database would typically work on the meta meta level (M3).

> I also think they proved the
> scalability of building this
> kind of repository on top of
> text files.

yes -- see above.

> So what we have to do is to
> provide an IDE for DSLs using
> basically the same techniques JDT
> uses and have everything being
> based on a common infrastructure
> (EMF, etc.).

right, I agree. However, that's not available today (yet).


> There are a lot of tools in an
> IDE which could be generic and
> therefore work with every DSL
> without customization.

yes and no. Of course you can have frameworks, but you'll have to customize quite a bit in a language-specific way (of course, that's what IMP and TMF are aiming at, so nothing fundamentally new here). Maybe my point is this: in addition to being able to easily create nice text editors, a DSL framework needs to take into account what's necessary to "scale" the models/resources. Current Xtext, e.g., doesn't do that very well.

> Additional alternative syntaxes
> could be made on top of the
> textual syntax.

yes - but see my comment above. This only works if people actually work with it!

> I think it's good to have one
> primary syntax showing all the
> information. And I think
> graphical syntaxes are
> overvalued.

of course. But they are not generally bad or useless. And while I am in favour of visualization, I don't think we want to go without graphical syntax in the long run. And then we need graphical diff/merge (because we can solve the other aspect of the repository by the "IDE approach", right?)

> BTW.: textual languages have the
> additional *very* important
> advantage of that migration to
> new versions of a language is
> much easier (search and replace).

agree. I have been using this argument recently with a colleague (you know him, lives in the same city as I), and he thought that wasn't a good argument :-)
 
I disagree with David's comments because an XML Schema is a model and I don't think he's directing any of his comments at XML Schema. So that leaves me confused because there's nothing about the model driven approach, also known as the XML Schema driven approach in the XML realm of the multiverse, that precludes using all the established source code control tools.

It also seems to me that all models ought to have a human readable textual representation, even if they have other representations. I should point out that I consider XML to be a poor excuse for a human readable notation, although arguably it's adequate though far from ideal. As such, all models ought to be amenable to storage using standard tools and ought to be amenable to differencing and merging the same way. Not only ought things be this way, they generally are; again, take XML Schema as a case in point.

This doesn't mean that a pure textual representation is necessarily the ideal representation. Consider a diagram or a richly formatted document described with HTML and CSS as obvious examples. So it's not the case that a textual diff/merge will be an ideal way to represent the concepts being compared or merged. That's why we have cool stuff to do this at the model-level.

Also consider that even if a model is stored in a repository in a structured way, which would be ideal from the point of view of querying the deep structure and global relationships across the instances, if the model has a textual representation, as ought to be the case, diff/merge can still be supported against the virtual textual representation, if there isn't actually a much better way, such as comparing the actual visual representation of two graphs.

Ultimately the point is that models are plastic and are amenable to being molded to suit what is beauty in the eye of the beholder. Most people are already doing model driven development, but they just call their model something else and consider it to be too wonderful to be a mere model.
 
@Ed:

> It also seems to me that all models ought to have a human readable
> textual representation, even if they have other representations.
> I should point out that I consider XML to be a poor excuse for a
> human readable notation, although arguably it's adequate though far
> from ideal. As such, all models ought to be amenable to storage using
> standard tools and ought to be amenable to differencing and merging the
> same way. Not only ought things be this way, they generally are; again,
> take XML Schema as a case in point.

I agree that the storage format should be accessible on a lower level
(I would accept an SQL interface as similarly useful as XML). However,
my point is this: if people are used to working with a graphical notation
of something, then the XML format, even if it is technically human readable,
is NOT suitable for diff/merge, because the people who use the graphical
notation have no idea about the XML. It's like if you had to diff/merge
Word documents using Microsoft's internal file format. Nobody does that,
even if it was XML, and hence technically human readable.

Unless the primary, user-facing notation is textual, using a textual
"serialization format" is not useful for diff/merge.

> This doesn't mean that a pure textual representation is necessarily the
> ideal representation. Consider a diagram or a richly formatted document
> described with HTML and CSS as obvious examples. So it's not the case that
> a textual diff/merge will be an ideal way to represent the concepts being
> compared or merged. That's why we have cool stuff to do this at the model-level.

right. Except we don't have such "cool stuff" for GMF. And that's where it
is really needed, since nobody wants to work with EMF trees.

> Ultimately the point is that models are plastic and are amenable to being
> molded to suit what is beauty in the eye of the beholder.

yes. In theory. In practice you need the tools. And that's what I am
talking about. For graphical DSLs (specifically: GMF) those don't exist, AFAIK.

> Most people are already doing model driven development, but they just call
> their model something else and consider it to be too wonderful to be a mere model.

hehe :-)
 
markus said:
My point is that a repository is not per se a bad thing, provided it meets the following criteria: (1) you store all your relevant stuff in it, (2) it provides versioning facilities, and (3) it supports diff/merge on a meaningful abstraction level.

I would like to add a few things you need in practice:
(4) optional locking. Sometimes things can NOT be merged, so I need locking
(5) an open and documented format
(6) an open and documented API; it should be a common API, e.g. like JMI is for MOF
(7) a facility to export/import it to the next repository (like cvs2svn ;-)
(8) it should be scalable
(9) it should be partitionable
(10) rights management etc.
(11) a log (who changed what, when, why)
...

(and a lot more ;-)

cheers,
Bernhard.
 
My main issue with the whole MD approach is this: It stinks from all ends.

Some tools force me to use a graphical editor. Graphical editors mean lock in. I can't extend them. I can't write a small tool to fix any pain I might have.

Then we have the repository. As long as I can't store all my information (texts, docs, ideas, Wiki, pictures, Word documents, links) in that repository and edit that information in the editors I need, it's going to be part of the problem.

Then we have brilliant people coming up with cool models for things that I never encountered in my whole career. When I try to model a simple problem which I have to solve, I routinely find that the MDA tool or language or whatever just can't do it. Duh.

If I can model what I need, the model will be more complex than anything I can do in code.

From my limited point of view, MDA is further away from pen&paper than text files are, and therefore it solves no problem I have.
 
@Aaron:
I guess if it solves none of your problems, then I would use other approaches that do solve your problems.

Model-Driven Development and DSLs obviously do solve problems that other people have.

Markus
 
A project I used to work on, IBM WebSphere Integration Edition, had a tonne of what one might consider DSLs (e.g. BPEL files, Decision Tables, etc.). These were serialized into XML via EMF, and then stored in whatever repository was integrated in Eclipse. At first we had no decent compare/merge, just XML level. We knew that was wrong, but it turned out that many of our users were ClearCase users and used pessimistic locking, which avoids conflicts and thus avoids a major requirement for merge. Of course optimistic repos like CVS/SVN have an inherently greater need for compare/merge.

Additional development complexity arises when you have many files for one logical type (e.g. plugin editor is split over plugin.xml and manifest.xml), or one file containing many logical artifacts. This is referred to as the "logical/physical problem".
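A tiny Python sketch may help to picture this mapping. The table below is purely illustrative: `processes.xml` and the two process names are hypothetical, and a real tool would derive the mapping from the models rather than hard-code it.

```python
# Hypothetical mapping from logical artifacts to the physical files that store them.
LOGICAL_TO_PHYSICAL = {
    "plugin descriptor": ["plugin.xml", "manifest.xml"],  # one logical artifact, two files
    "process OrderHandling": ["processes.xml"],           # two logical artifacts, one file
    "process Billing": ["processes.xml"],
}


def affected_artifacts(changed_files: set[str], mapping: dict[str, list[str]]) -> list[str]:
    """Return the logical artifacts touched by a set of changed physical files."""
    return sorted(name for name, files in mapping.items() if changed_files & set(files))


print(affected_artifacts({"processes.xml"}, LOGICAL_TO_PHYSICAL))
# ['process Billing', 'process OrderHandling']
print(affected_artifacts({"manifest.xml"}, LOGICAL_TO_PHYSICAL))
# ['plugin descriptor']
```

A compare/merge tool that works on the logical level needs this kind of lookup in both directions: which files to fetch for one artifact, and which artifacts to show when a file has changed.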

The good news about Eclipse is that compare/merge is first class (vs. command line CVS with no repo integration). Plus there's some support for the logical/physical mapping. So a solution path exists. But it's expensive to do well, and I know the WID team is still working on improving their support. The fact that years later our plugin.xml and .project compares are still textual is somewhat telling.

Nonetheless, none of these issues preclude a DSL approach for me, which I believe to be an important and needed trend in software development. It does, though, raise the investment bar.
 
VCS or databases? A great question. My answer is a bit long to be added as a comment, so I replied in this post.
 
Interesting that you picked the issue of repositories as the primary point. I'd agree, and I guess Martin Fowler thinks similarly as he's written a separate article on them:
http://martinfowler.com/bliki/RepositoryBasedCode.html

Since I find his idea of repositories differs from my experience in important areas, I've written my own reply on these issues:
http://www.metacase.com/blogs/stevek/blogView?showComments=true&entry=3395579842
 
Just to add a link here for completeness: Richard Welke's seminal article on The CASE Repository is now online. More in my blog entry: "The Model Repository (was: The CASE Repository)"
 

ABOUT ME
This is Markus Voelter's Blog. It is not intended as a replacement for my regular web site, but rather as a companion that contains ideas, thoughts and loose ends.
