More on the evolution of programming languages

May 14th, 2003 § 4 comments

Almost two months ago, I wrote a post about the evolution of programming languages. In it, I noted that many of today’s mainstream languages are just syntactic variations of older languages and consequently contribute nothing new to the evolution of the field. I also asked what evolutionary path new languages would follow, considering that programming needs will obviously not remain constant. The post was more of an open question than an argument: I was trying to imagine what languages would look like in the future, based on current trends such as dynamic typing, just-in-time virtual machines, generic programming, and other concepts that are not necessarily new but have gained wider acceptance of late. I also argued that many of the most powerful languages are based on a simple core that can easily be extended to represent new concepts.

Interestingly enough, in the weeks that followed that post, I found that many people were interested in the same topic and had independently written about it in their weblogs or sites. I also found pointers to interesting material that, together with those posts, helped me answer some of my questions and form a clearer picture of the future of programming languages, one that validated some of my own thoughts.

One of the first documents I came across was a PowerPoint presentation by Todd Proebsting, a Microsoft researcher. In the presentation, titled Disruptive Programming Language Technologies (PowerPoint, 218KB), Proebsting argues that the most important problem in developing a modern programming language is improving programmers’ productivity, as hardware is now powerful enough that compilers no longer need to optimize code and data structures aggressively. Although I obviously didn’t attend his presentation, I can easily see Proebsting’s point. In my own post I wrote that, as the age of inefficient virtual machines was over, experimentation would likely increase, because efficient virtual machines provide more fertile ground for tinkering with new concepts without the constraints of forced optimization.

The main point of Proebsting’s presentation is not to show how languages can be designed, but rather how they can succeed where others failed. However, his view of the problem is noteworthy: he says that proper algorithms and design are sufficient to create efficient programs with modern compilers, and that language designers instead need to worry about ways to help programmers produce good programs quickly, correctly, and easily. He goes on to note that languages are the place to tackle the problem, because they are at the root of the programming process, as opposed to software engineering and analysis.

Another article I came across was the widely circulated The Hundred-Year Language, by Paul Graham. In that article, based on a talk he gave at PyCon, Graham tries to imagine what programming languages will look like a century from now. He argues that the languages of the future will probably be based on a clean and concise core of simple axioms from which all the other features of the language can be derived, a point I also made in my post, although less clearly. He wrote:

At the very least, it has to be a useful exercise to look closely at the core of a language to see if there are any axioms that could be weeded out. I’ve found in my long career as a slob that cruft breeds cruft, and I’ve seen this happen in software as well as under beds and in the corners of rooms.

I have a hunch that the main branches of the evolutionary tree pass through the languages that have the smallest, cleanest cores. The more of a language you can write in itself, the better.

Graham is a Lisp evangelist, and Lisp is, of course, the perfect example of a concise language. It has a very small set of syntactic and semantic rules that can be extended almost unimaginably, creating new ways of expressing problems that remain highly intelligible and maintainable. I think Graham is right. Syntactically simpler languages are inherently more powerful because they don’t require programmers to hold too many concepts in their minds at the same time, which naturally leads to more succinct code and fewer bugs. Also, like Proebsting, Graham believes that hardware speed will not be a problem for future programming languages. Instead, it will help language designers concentrate on what really matters: simplifying data structures to allow better expressiveness. Graham goes on to explain how that can happen. He writes:

Most data structures exist because of speed. For example, many languages today have both strings and lists. Semantically, strings are more or less a subset of lists in which the elements are characters. So why do you need a separate data type? You don’t, really. Strings only exist for efficiency. But it’s lame to clutter up the semantics of the language with hacks to make programs run faster. Having strings in a language seems to be a case of premature optimization.

Smalltalk is a good example of the kind of simplification programming languages need. Take strings: Smalltalk provides a generic class, appropriately named String, to handle them. The String class is a descendant of the CharacterArray class, another generic class that handles arbitrary arrays of characters, as its name says. The CharacterArray class, in turn, descends from the Collection class, which is again a generic class handling arbitrary collections. Each step in the class hierarchy adds just enough functionality to handle its problem domain. So, the Collection class has other descendants like Set, Bag, and Dictionary to handle special kinds of collections. Likewise, the String class has special descendants to handle different string types. Those descendants integrate easily into a larger class framework to provide specific functionality without cluttering the language itself with obscure and unnecessary syntactic devices. It’s interesting to note that the introduction of generics in some statically typed languages is just a poor man’s attempt to emulate the natural polymorphism of dynamically typed languages as exemplified by Smalltalk.
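To make the idea of "each step adds just enough" concrete, here is a rough Python sketch that mirrors the class names above. It is only an illustration under stated assumptions: the classes, the `select` and `as_uppercase` methods, and the flat three-level hierarchy are hypothetical simplifications, not the actual Smalltalk class library (which has further intermediate classes). The point is that a string is modelled as a collection of characters rather than as a separate primitive type.

```python
class Collection:
    """Hypothetical generic container: stores arbitrary elements and can iterate them."""

    def __init__(self, items=()):
        self._items = list(items)

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def select(self, predicate):
        # Loosely modelled on Smalltalk's select:, returning an instance of the receiver's own class
        return type(self)(x for x in self._items if predicate(x))


class CharacterArray(Collection):
    """Adds only one constraint: every element must be a single character."""

    def __init__(self, items=()):
        items = list(items)
        if not all(isinstance(c, str) and len(c) == 1 for c in items):
            raise TypeError("CharacterArray holds single characters only")
        super().__init__(items)

    def as_uppercase(self):
        return type(self)(c.upper() for c in self)


class String(CharacterArray):
    """Adds only what is string-specific: rendering and concatenation."""

    def __str__(self):
        return "".join(self)

    def __add__(self, other):
        return String(list(self) + list(other))


s = String("hello")
print(str(s.select(lambda c: c in "aeiou")))      # eo
print(str(s.as_uppercase() + String("!")))        # HELLO!
```

Note that `select` is written once, in `Collection`, yet filtering a `String` still yields a `String`; the subclasses never had to reimplement it.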

However, further in his article, Graham slams object-orientedness, and writes:

(…) Though I don’t think [object-orientedness] has much to offer good programmers, except in certain specialized domains, it is irresistible to large organizations. Object-oriented programming offers a sustainable way to write spaghetti code. It lets you accrete programs as a series of patches. Large organizations always tend to develop software this way, and I expect this to be as true in a hundred years as it is today.

I think Graham misses a point about programming here. Most good programs are written in a series of small steps, and Graham himself recognizes this in other articles. The biggest problem with object-orientedness is not the paradigm itself, but the way languages are implemented and the way environments are built to support them. Just as the Lisp environment is part of what makes that language powerful, object-oriented languages need a powerful environment to thrive. Take a look at Smalltalk. In that language, applications are usually written in a series of small steps, with simple classes building on top of other simple classes, leading to a highly flexible and maintainable application. However, without proper support from the environment, the relationships between the classes would quickly spiral out of control. Smalltalk has a powerful environment that follows the same philosophy as the language, showing programmers a high-level view of the classes in the system and allowing them to quickly manipulate, refactor, and extend those classes. Modern object-oriented languages like Java and C# fail to recognize this and rely on trivial development environments that are little more than glorified text editors. The languages themselves have other shortcomings, of course, but the lack of good environments just adds to their problems.

One could indeed argue that if the language were simpler, the environment would not be needed. However, taking Smalltalk as an example once again, it’s easy to see that the language is built around a few rules, the most important of which is the message-passing mechanism. From this point of view, it’s possible to consider classes as simple packages of separate functionality that pass messages back and forth to make an application work, which means that Smalltalk is as simple as it needs to be. As Einstein once said: “Things should be made as simple as possible, but no simpler.” The environment just makes finding and handling those packages easier.
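Python is not a pure message-passing language, but it offers a rough analogue of the idea that every operation is a message sent to an object. The sketch below assumes a hypothetical `MessageLogger` proxy: it uses Python's `__getattr__` hook, which runs only when normal attribute lookup fails and so loosely resembles Smalltalk's `doesNotUnderstand:`, to intercept each message by name, record it, and forward it to the wrapped object.

```python
class MessageLogger:
    """Hypothetical proxy that records every message it forwards to a wrapped object."""

    def __init__(self, target):
        self._target = target
        self.log = []

    def __getattr__(self, selector):
        # Only reached for names the proxy itself lacks, so every "message"
        # aimed at the target passes through here.
        method = getattr(self._target, selector)

        def forward(*args):
            self.log.append(selector)
            return method(*args)

        return forward


items = MessageLogger([3, 1, 2])
items.append(0)
items.sort()
print(items._target)   # [0, 1, 2, 3]
print(items.log)       # ['append', 'sort']
```

Because dispatch happens by name at run time, the proxy needs no knowledge of the list class it wraps; the same class would forward messages to any object.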

Finally, it should be noted that Smalltalk allows manipulation of the core classes that make up its framework. A programmer can modify and add to the behavior of base classes like Boolean and Number without needing to derive new classes from them. This is another factor in making it an extremely powerful language, and it’s again a concept notably lacking from Java, C#, and other object-oriented languages.
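For comparison, the following Python sketch shows how far a mainstream dynamic language goes in this direction. `Account` and `is_overdrawn` are hypothetical names standing in for "a class defined elsewhere" and "behaviour added later". Python lets you attach new methods to an existing user-defined class without subclassing, and even instances created earlier pick them up; CPython, however, refuses the same trick on built-in types such as `int`, which is exactly where Smalltalk stays open.

```python
class Account:
    """Stands in for a class defined elsewhere, e.g. in a library module (hypothetical)."""

    def __init__(self, balance):
        self.balance = balance


early = Account(-5)   # created before the class is extended


def is_overdrawn(self):
    # New behaviour added after the fact, without deriving a subclass
    return self.balance < 0


# Attach the function to the existing class; method lookup goes through the
# class, so existing instances understand the new message too.
Account.is_overdrawn = is_overdrawn
print(early.is_overdrawn())   # True

# Built-in types are closed in CPython: this assignment raises TypeError.
int_is_open = True
try:
    int.is_even = lambda self: self % 2 == 0
except TypeError as exc:
    int_is_open = False
    print("cannot extend int:", exc)
```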

The problem with Smalltalk, then, is not its object-oriented nature, but rather the features it lacks to be a truly extensible language. Of course, it’s possible to extend the language parser, but that goes beyond what most programmers would accept doing to improve a language.

Another good approach to extensibility in object-oriented languages is to mix different paradigms as Python does. Much of Python’s success results from its ability to intermix different concepts, integrating imperative, functional, and object-oriented notions, to create a readable and expressive language, while keeping its simplicity.
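As a small illustration of that mix, this sketch (modern Python 3.7+, since it uses `dataclasses`; the `Order` record and both functions are hypothetical) puts an object-oriented record type, an imperative accumulation loop, and a functional fold side by side in a few lines, each used where it reads most naturally.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass
class Order:
    """Object-oriented piece: a small record type (hypothetical example)."""
    customer: str
    amount: float


def total_by_customer(orders):
    # Imperative piece: an explicit loop updating a mutable dictionary
    totals = {}
    for order in orders:
        totals[order.customer] = totals.get(order.customer, 0) + order.amount
    return totals


def grand_total(orders):
    # Functional piece: a fold over a generator expression, with no mutation
    return reduce(lambda acc, amount: acc + amount, (o.amount for o in orders), 0)


orders = [Order("ann", 10.0), Order("bob", 5.5), Order("ann", 2.5)]
print(total_by_customer(orders))   # {'ann': 12.5, 'bob': 5.5}
print(grand_total(orders))         # 18.0
```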

Another interesting document I found was a paper by two other Microsoft researchers, Erik Meijer and Wolfram Schulte, titled Unifying Tables, Objects, and Documents (PDF, 163KB). Their paper also focuses on programmer productivity and proposes extensions to a statically typed object-oriented language (which can be C# or Java) to natively support some common kinds of data present in today’s environments, namely relational data (SQL tables) and hierarchical data (XML). The extensions are interesting, and they would greatly simplify the task of manipulating such structures in those languages: the paper presents alternative syntax for streams, tuples, unions, content classes, and queries. However, adding to the language core cannot solve the fundamental problem of making the language directly extensible by programmers. The proposed extensions are just syntactic sugar, and a language explicitly designed for extensibility would accept such extensions naturally, which also means programmers would use them without any conscious effort to evolve the language.

This paper reminds me of a problem I have had to confront in all my years as a Web developer. When I started developing Web applications, I was using Borland’s Delphi as my primary language. Naturally, I decided to use it for Web applications as well, because I could leverage my knowledge of the language and its class framework. At the time, some people I knew were using Visual Basic to develop CGI applications, so I ported the framework they were using to Delphi. I quickly found that, although the framework was a big help in development, the task of producing HTML and handling forms was terribly boring and too manual for my taste. Therefore, in the following years, I developed many special frameworks to make HTML processing easier and to handle form submission and validation automatically. I went through three different commercial frameworks and many of my own. However, I was never satisfied, because even with templating mechanisms, the task was still boring, repetitive, and error prone. HTML, as a structured language, simply didn’t fit with Delphi. I have used other languages as well, but the problem remains in all of them. So, the approach the paper describes for handling HTML is interesting, because it makes the structure of HTML (or XML, for that matter) part of the language, reducing context switching and improving productivity. Moreover, it’s much better than current environments that allow programmers to intermix HTML and program code, because it keeps a clear separation between logic and presentation.

Meijer and Schulte’s paper is limited in that it only considers a restricted form of language extensibility, which would have to be implemented in the core of the language and would not solve other general programming problems. Another document, Donovan Kolbly’s PhD dissertation, which was mentioned in the Lambda the Ultimate thread discussing Meijer and Schulte’s paper, proposes new approaches to the construction of arbitrarily extensible languages based on Earley parsing. In this case, the target language would acquire functionality similar to Lisp macros, making it arbitrarily extensible.

It’s worth noting that extensibility comes at a price. In a post about two months ago, Rafe Colburn showed an extremely idiosyncratic bit of Perl code from Blosxom, a weblogging tool, that was nearly (he says completely) incomprehensible to non-Perl programmers. Perl’s syntax allows programmers to express an algorithm much more concisely than many other languages, but that very terseness can quickly make code unreadable. An openly extensible language would suffer from the same problem: parts of the code using an extended syntax would require additional explanation to be understood. In some cases, the need for documentation would cancel out the benefits of extensibility.

In another of his articles, the also widely read Hackers and Painters, Paul Graham writes:

Source code, too, should explain itself. If I could get people to remember just one quote about programming, it would be the one at the beginning of Structure and Interpretation of Computer Programs:

Programs should be written for people to read, and only incidentally for machines to execute.

Extensibility is always better in the long run because it allows the language to grow beyond its original purpose, but it doesn’t mean much if it can’t be understood when needed.

Extensibility is also needed in today’s programming scenario because of the demands of an interconnected world of fast-changing protocols. Granted, one can always resort to raw sockets to implement anything requiring a network connection, but it’s much better to integrate some mechanism into the language itself. Python’s XML-RPC implementation, for example, allows remote methods to be used transparently, as if they belonged to the object representing the remote connection. Database integration, structured text handling, and similar subjects could equally benefit from such an approach.
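The following sketch shows that transparency with today's standard library modules (`xmlrpc.server` and `xmlrpc.client` in Python 3; the Python 2 equivalents were `SimpleXMLRPCServer` and `xmlrpclib`). The `add` function is a hypothetical example service, and to keep the block self-contained the server runs in a background thread on a free local port chosen by the OS.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def add(a, b):
    # Hypothetical remote service
    return a + b


# Server side: bind to an OS-chosen free port on localhost and expose `add`.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(add, "add")
host, port = server.server_address
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: the proxy turns attribute access into remote calls, so the
# remote function reads like an ordinary method of the connection object.
with ServerProxy(f"http://{host}:{port}/") as proxy:
    result = proxy.add(2, 3)
print(result)   # 5

server.shutdown()
server.server_close()
```

No socket handling or message encoding appears in the client code; the proxy object is the whole interface.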

What do all those things mean? I think it’s possible to come to some conclusions from the facts mentioned above:

Firstly, programming languages in the future will likely be more about productivity than raw power. As hardware becomes more powerful, there will be more room for experimentation without the constraints of required optimization. Virtual machines (and possibly extended processors) will allow languages to outgrow their memory- and processor-starved origins.

Secondly, programming languages in the future will likely be more concise, built around a simple core. Human beings cannot hold too many concepts in their minds at the same time, and programmers are no different. Languages that fit completely in a programmer’s mind are much more efficient, because the programmer can make full use of the language without struggling to remember too many different concepts.

Thirdly, programming languages in the future will likely aggregate different paradigms. Each existing programming paradigm provides a different benefit to a programming language. It’s not necessary to build a language around just one paradigm, as Python and other languages have demonstrated.

Fourthly, programming languages in the future will likely rely on good development environments to improve productivity. I think a good environment is paramount to productivity. Today’s environments are little more than expanded text editors, and there is much to improve in this area. Testing and refactoring are just two examples of what good environments can provide in addition to the language itself.

Fifthly, programming languages in the future will likely find a way to balance terseness and readability. Succinctness in a programming language leads to more productivity, but can degrade readability. Ways will have to be found to compensate for terseness in a language while keeping it understandable. As languages are just notations, it’s a matter of finding the proper way to represent programming constructs in a form that is both terse and readable.

Finally, programming languages in the future will likely integrate extension mechanisms into their syntax, allowing programmers to grow the language as needed. Many programming tasks can be enormously simplified by language extensions, and in many cases those extensions are nothing more than syntactic sugar. Take, for example, the for-each loops present in many languages in one form or another. They are just a simplified interface to an enumeration; in the most basic sense, all loops are while loops. Providing programmers with ways to hide complex structures and algorithms behind a syntactic facade is a powerful approach to extensibility. Semantic extensibility is harder, but can also be done. As I said before, power comes at a price, but the price must be paid if a language is to evolve.
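To see the claim that a for-each loop is sugar over an enumeration, here is a simplified Python rendering of what `for item in iterable: body(item)` does, written as a plain while loop over the iterator protocol (`iter` and `next`). The `for_each` helper is a hypothetical name, and the sketch ignores details such as the loop's `else` clause.

```python
def for_each(iterable, body):
    """Roughly equivalent to `for item in iterable: body(item)`, as a while loop."""
    iterator = iter(iterable)        # ask the collection for an enumerator
    while True:
        try:
            item = next(iterator)    # advance the enumerator
        except StopIteration:        # enumerator exhausted: leave the loop
            break
        body(item)


seen = []
for_each("abc", seen.append)
print(seen)   # ['a', 'b', 'c']
```

The `for` statement hides exactly this boilerplate behind a one-line facade, which is the kind of syntactic extension the paragraph argues programmers should be able to define themselves.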

Some final observations:

In all the points above, I’m talking about near-future programming languages. It’s quite possible that some yet unknown genius will discover a new programming paradigm that radically changes the way we program. Obviously, such a development cannot be predicted, and I’m not considering it in the analysis above.

I’m interested in hearing complementary analysis, rebuttals, and even flames. Feel free to use the comments below to provide feedback.

§ 4 Responses to More on the evolution of programming languages

  • Amit Patel says:

    Random comments:

    Sometimes having N simple but very different things is easier for my brain to deal with than having 1 thing that can do it all. I can deal with C and Scheme and Python all at the same time because they’re so different. I have trouble dealing with C and C++ at the same time, or Scheme and Lisp at the same time.

    It seems that we had really cool (Lisp, Smalltalk) development environments in the past, and today’s environments are relatively dumb. Are we really going to move in the direction of smarter environments?

    Languages that rely on a development environment have a harder time becoming successful because they’re typically only accessible on one platform. Languages that only rely on text editors are usable just about everywhere.

    Extending classes defined elsewhere sounds cool. What worries me is that an application takes on a life of its own, in its own little world. If I want to send you something of mine, I have to ship you my entire Squeak world. I can’t just send you my class, because it depends on Integer having been modified in some way, maybe using my own Logging or Exception framework. I ran into this problem on LambdaMOO, where objects defined on one MOO could not be used on another MOO because the basic types (person, room, etc.) were different. Customization and reuse leads to dependencies. Dependencies make it harder to extract something to reuse it elsewhere.

  • Ronaldo says:

    [ Sometimes having N simple but very different things is easier for my brain to deal with than having 1 thing that can do it all. I can deal with C and Scheme and Python all at the same time because they’re so different. I have trouble dealing with C and C++ at the same time, or Scheme and Lisp at the same time. ]

    I guess that has something to do with syntax. When I switch between Delphi and Visual Basic (aaaargh!) at work, I have problems with their similar syntax. For example, I keep using semicolons in VB. But I think programmers can benefit from such switching, because it forces them to look at problems from different angles, and that helps them develop their skills. However, I believe minimalist syntaxes help more, because they can be used in their entirety without context switching within the same language.

    [ It seems that we had really cool (Lisp, Smalltalk) development environments in the past, and today’s environments are relatively dumb. Are we really going to move in the direction of smarter environments?

    Languages that rely on a development environment have a harder time becoming successful because they’re typically only accessible on one platform. Languages that only rely on text editors are usable just about everywhere. ]

    Well, as I wrote in the post, I think a good environment helps a lot. Smalltalk showed that, and it also showed that you can have a good environment on multiple platforms if the language is open. I believe the problem is that people today are just too used to glorified text editors and don’t worry much about good environments. Nonetheless, every time a good environment is created for a given language, people realize they’re much more productive with it. Delphi, for example, has many components that integrate directly into its IDE to help programmers be more productive. One such component is Bold, a model-driven architecture tool that allows the problem domain of the application to be mapped to classes that seamlessly access the database. You can code everything by hand, but it will take much longer. Also, people are always trying to create extensible editors that integrate tightly with the languages they support. That’s especially true of Java, PHP, Perl, and Python. I think that also shows that good environments will be much more successful in the future.

    [ Extending classes defined elsewhere sounds cool. What worries me is that an application takes on a life of its own, in its own little world. If I want to send you something of mine, I have to ship you my entire Squeak world. I can’t just send you my class, because it depends on Integer having been modified in some way, maybe using my own Logging or Exception framework. I ran into this problem on LambdaMOO, where objects defined on one MOO could not be used on another MOO because the basic types (person, room, etc.) were different. Customization and reuse leads to dependencies. Dependencies make it harder to extract something to reuse it elsewhere. ]

    Well, Smalltalk deals nicely with this problem. The environment keeps track of what has been changed, and you can send just what was modified. And, in practice, you rarely have conflicting changes in different projects, as many of those changes do not modify the base behavior of the classes. Of course, you can still have problems, especially when you are modifying classes in the problem domain. However, those problems can happen in any kind of project. For example, changes in the implementation of interfaces in a Java project can cause the same kind of trouble. Also, Smalltalkers tend to program in small, independent pieces, and I guess that also helps avoid problems.

  • James says:

    With the exception of point #3 (aggregate different paradigms), this reads like a *history* of programming language design, rather than predictions of the future.

  • Ronaldo says:

    What about going back to the roots? :-)


You are currently reading More on the evolution of programming languages at Reflective Surface.
