Building a cross-platform & cross-language IDE [Part I]

Hi, and welcome to Part I of this article series where we'll build a cross-platform & cross-language IDE.

If you haven't done so already, feel free to read the Introduction article in order to get a basic idea of what we're trying to achieve here.

In this part and the next one, we'll focus solely on the back-end piece, keeping our client piece (the IDE) for later.

First of all, let's try to get a high-level idea of what our back-end should achieve/offer, without getting into any implementation details or language/technology choices just yet.

As a client (we'll build one later: our IDE), here are the two basic things we'll expect from the back-end (functional requirements):

  • [FR1] When requested, the back-end should return the list of languages it supports.
  • [FR2] The back-end should offer compilation (optional, depending on the language) and execution of any code sent by the client (if written in a supported language), and send the compilation/execution output back to the client in real time.

I'm not going to discuss the REPL or the ability to add new languages just yet. Let's keep some more fun for later on ;)

Ideally, our back-end should also comply with some non-functional requirements, such as:

  • [NFR1] Be built using widely adopted standards so that it can be easily integrated and accessed by any client. Yes, we are going to build our own client IDE, but that's no reason to tightly couple our back-end to our client. Unless absolutely necessary, it's better to avoid this and go for a highly decoupled architecture, promoting re-usability and allowing anyone to build any kind of client.
  • [NFR2] We should be able to run the back-end locally (for testing purposes or even for final use) or remotely.
  • [NFR3] It should be easy enough to install the back-end on any of the major platforms (Windows/Linux/OSX).
  • [NFR4] Requirements in terms of software to be installed/configured in order to run the back-end should be kept to a strict minimum.
  • [NFR5] It should be easy enough to uninstall the back-end properly.

Given this set of functional and non-functional requirements, here is the direction we will take:

  • Our back-end functionality will be exposed through a web service built as a REST API, thus complying with [NFR1] and [NFR2]. We will build this service as a Node.js application (meaning that JavaScript will be the language we'll use to implement the public surface of our back-end). See the sketch just after this list.
  • We'll use Docker to contain the various environments used to compile/execute code in each of the supported languages (more on that later). This indirectly makes our back-end a bit more compliant with [NFR3], [NFR4] and [NFR5].
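
Just to make this a bit more concrete, here is a minimal sketch of what the public surface of such a REST API could look like. Be aware that the route names, port and language list are pure assumptions at this point; we'll settle on the real ones when we start coding:

```js
// A minimal sketch of our future REST surface (route names are assumptions)
var express = require('express');
var app = express();

// [FR1] return the list of supported languages
app.get('/languages', function (req, res) {
  res.json(['java', 'python', 'haskell']); // hypothetical list; will come from our config
});

// [FR2] compile/execute code sent by the client (request body handling omitted for now)
app.post('/execute', function (req, res) {
  res.status(501).send('Not implemented yet');
});

app.listen(8080);
```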

You've most probably heard of Node.js and/or Docker (maybe even played with them, or you might even be an expert).
Both of these technologies are getting quite some hype, and it's actually pretty hard nowadays not to stumble upon an article mentioning one or the other.
Anyway, if you are not familiar with one or both of these technologies, don't worry too much. Before working on this project/article I had never really used either of them, but considering the scope of our project, there is not too much to learn.

Please note that we could have implemented our REST API in almost any language (C++, Java, C#, Go, your-favorite-language-here ...). I chose Node.js because it is quite popular nowadays and I wanted to play with it a bit. What's better than a concrete hands-on project to learn a language? ;)

That being said, regarding the choice of Docker as the containing environment for our language runtimes, there are not really many alternatives (that I know of) that fit our needs so nicely; given the requirements, Docker just imposed itself as the technology to use.

Now that we've selected the technologies we're going to use to build our back-end, we need to set up our machine to make use of them.
In the rest of this article I will quickly introduce Node.js and Docker and how they fit in the overall picture that is our back-end solution.
If you are already familiar with one or both of these technologies you won't learn much here, and you can just skip the "brief intro to" sections.

As I said in the introduction article, I won't go too deep into technical details regarding these technologies. This series is not intended to be a Node.js or Docker tutorial. If you feel like learning more about these technologies, there are plenty of resources freely available online.

A brief intro to Docker

Docker is a newcomer on the virtualization scene (the first version was released in 2013) but it immediately gained a lot of traction.
It is open-source software, written in Go and running on Linux (a Windows version is on the way).
Even if you don't know anything about Docker, chances are that you have heard of VMware or VirtualBox, or even used them.
From a very high-level point of view, Docker can be seen as one of these players. It lets you create and run virtual machines (called containers in Docker terminology) that are isolated from each other and from the host OS.

That being said, when taking a closer look, Docker is quite different from the aforementioned virtualization software. The major difference lies in the fact that where VMware and VirtualBox use a specific piece of software called a "hypervisor" to emulate hardware (virtual hardware), Docker has no such thing. Instead, it talks directly to the Linux kernel of the host environment and has no notion of emulated hardware: it accesses the real host hardware directly through the host Linux kernel. It also means that there is no real "guest OS" running in Docker containers. You can only use Linux-based distributions, which all of course use the underlying host Linux kernel.
One of the immediate benefits is that, freed of both of these overheads (hypervisor and guest OS), containers are much leaner and much, much faster to start.

However, because Docker containers run directly on top of the host Linux kernel, you can pretty much forget about running a container with Windows inside, for example. It's just not possible at the time of this writing (even if, as said before, a Windows version of Docker is in development) and is not really the purpose of Docker anyway.

Introducing Docker wouldn't be complete without discussing images.
A Docker image is the basis of a Docker container. When you start a container you need to tell Docker which image you want to run inside it. You can build your own images (we will actually build one for our back-end for easy distribution), either from scratch or based on other images (a kind of inheritance; "layers" is the terminology used by Docker), or you can retrieve and run pre-built images.
Docker provides a central online repository of Docker images called Docker Hub. On this repository you'll find many images, some home-made, others official and flagged as such. For example, you can find anything from OS images (Ubuntu, Debian, Arch ...) to databases (Mongo, Redis, CouchDB ...) and even languages/platforms (Node.js, C#, Java ...). Sounds like something we might be interested in for our back-end, doesn't it? ;)

Installing Docker

You can install Docker on any major platform (Windows, OSX, Linux).
For Windows and OSX, the installer will automatically download and install VirtualBox in order to run a lightweight Linux distro where Docker will run (remember, Docker only runs on Linux for now). This plumbing is pretty transparent and fully taken care of by the installer.

If you want to follow this article series in practice, please go and install Docker, as it is one of the prerequisites for running our back-end.

Please click on the platform of your choice to be redirected to the installation instructions on the official Docker website: Windows / Mac / Linux.

Docker use for our back-end

As mentioned in our Docker introduction, there are a lot of official images for various computer languages, for example Java, Haskell and Python, just to name a few. Clicking on these links will take you to the repositories containing these images on Docker Hub.

It might seem strange to run a "virtual machine" containing a language; a language is not an OS, after all. To clarify, each of these language images is built on top of a base image, which is a Linux distribution image (Ubuntu, Debian, Alpine ...).
Concretely, this means that when we run the Java image inside a container, we in fact run a "virtual machine" consisting of a base OS on top of which the JDK and JVM are installed.

Given these clarifications, our back-end will make use of Docker as follows:
When a client (our IDE) wants to compile and run some code, it will somehow send it over the wire to our back-end. Our back-end will then download the image needed for the language the code is written in (if the image has not already been downloaded). It will then kick off a container running this image and send the source code to it, to be compiled and executed inside the container, where all the necessary tools and compilers for the language will be present.
During compilation/execution we will attach our back-end application to the stdout and stderr streams of the container, to get back the compilation and execution output and/or errors.
Once execution is completed, the container will automatically stop.
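
As a rough sketch of this flow, here is how a Node.js app could start a container and capture its output (the image name, docker flags and inline script below are purely illustrative assumptions; the real implementation will come in the next part):

```js
// Sketch: run code inside a container and stream its output back
var spawn = require('child_process').spawn;

// '--rm' removes the container once it stops, cleaning up temporary files;
// 'node:latest' is just an example image, picked according to the requested language
var container = spawn('docker', [
  'run', '--rm', 'node:latest',
  'node', '-e', 'console.log("hello from the container")'
]);

container.stdout.on('data', function (data) {
  process.stdout.write(data); // in the real back-end, forwarded to the client
});
container.stderr.on('data', function (data) {
  process.stderr.write(data); // compilation/execution errors travel this way
});
container.on('close', function (code) {
  console.log('container exited with code ' + code);
});
```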

Starting a new container for each compilation/execution request might seem like overkill, and you may think it will induce a lot of lag, but that's actually not the case. As said during the brief intro on Docker, starting a container is actually very fast, so you won't even notice the lag (except the first time a language image is downloaded, which can of course take quite some time, but that has nothing to do with starting containers). The advantage is that we keep resource usage pretty low, and all temporary files generated during compilation/execution are destroyed once the container is stopped, providing nice isolation from one compilation to the next.

Part of this process can be seen on the architectural diagram of our back-end at the end of this article.

A brief intro to Node.js

Node.js is also a relative newcomer in the world of runtime environments. It's been around a bit longer than Docker (introduced in 2009, but only widely adopted around 2011) and, like Docker, it has gained major traction (it is worth noting that Node.js is also open source). One of the reasons behind this is that the language used to write Node.js applications is one of the most popular languages in the world: JavaScript.

JavaScript was long used mostly as a client-side language for web applications and websites. The "revolution" with Node.js is about bringing JavaScript to the server side (it is not the first initiative to accomplish this, but it is by far the most successful and performant one).
Execution of the JavaScript code is handled by Google's V8 engine, a very powerful JavaScript engine written in C++, which is also open source and which you may actually use every day without even knowing it, as it powers JavaScript execution in the Google Chrome browser.

There is not much to learn about Node.js itself in order to start developing Node.js applications. If you know how to program in JavaScript, then you should have no problem moving to Node.js. Anyway, in the next part of this series, once we actually start coding, I'll guide you through the basics as we go on with development.

Most Node.js applications are quite "simple", in the sense that many of them consist of a single source file of a few thousand lines at most. Spoiler alert: our finalized back-end will be around 200 lines.
This is due to two main factors. First, the JavaScript language is not very verbose (compared to a heavily verbose language such as Java, for example). Secondly, Node.js apps are assembled from many small pieces (each piece can be seen as a specific library, a "module" in Node.js terminology), and most of these pieces have already been written by other developers, meaning that you can mostly focus on the business logic of your app and don't have to care about much more than that. One of the reasons why so many modules are around is surely that Node.js is driven by a very large and active open-source community.

Installing Node.js

Node.js can be installed on all the major platforms (Windows/OSX/Linux).

Installation is done through an installer and the process is pretty much straightforward: download the installer, run it, follow the instructions, done ;)
Installers for all the platforms can be found on the official website by clicking here. Just choose the LTS version and download the installer for your platform/CPU.

As was the case for Docker, installing Node.js is a prerequisite to be able to follow this article in practice.

Apart from Node.js and Docker, we won't need to install anything else.

Node.js use for our back-end

As said previously, we could have used another runtime environment/language to implement our API.
That being said, we chose Node.js to develop our back-end API, which will be exposed to the client.
It is worth noting that we will use the http module of Node.js to "self-host" our back-end application, meaning that we won't need an external web server to host it. Our application will itself be the server, as the snippet below illustrates.
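
Here is the canonical http module example to show what "self-hosting" means: a few lines of JavaScript and the application is a fully working web server, no Apache or IIS required (the port is arbitrary):

```js
// The application is itself the server: no external web server needed
var http = require('http');

var server = http.createServer(function (req, res) {
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('Hello from our self-hosted back-end\n');
});

server.listen(8080, function () {
  console.log('Listening on http://localhost:8080');
});
```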

In addition to the http module, we will also use three other modules:

  • express, which will allow us to build our REST API easily.
  • child_process, which will let us spawn processes. This will be of use to run docker commands to pull images or run containers.
  • underscore, which will provide us with quite a few helper functions to make our code even less verbose.

Of these four modules, only express and underscore are third-party modules; http and child_process are core modules, part of Node.js itself.
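
In code, pulling all four modules in boils down to a handful of require calls; the two third-party ones simply have to be installed first with npm (Node's package manager):

```js
// Third-party modules must be installed first: npm install express underscore
var http = require('http');                 // core: HTTP server (self-hosting)
var spawn = require('child_process').spawn; // core: spawn docker processes
var express = require('express');           // third-party: REST API framework
var _ = require('underscore');              // third-party: helper functions
```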

A closer look at our back-end architecture

Finally, before jumping into coding in the next part of this series: it is said that a picture is worth a thousand words, so here is an overview of our back-end architecture, a.k.a. "the big picture" (well, it's actually quite small, but whatever!).

You can see here that the client app (our IDE) will interact with our back-end through the Node.js app.
HTTP requests at the low level will be intercepted by the http module and directed to the express framework containing our API (0), which will handle them accordingly. If the request is to compile/execute some code, then our Node.js app will use the child_process module to do the necessary work (i.e. pulling the image if needed (1), then starting the appropriate container to run the code (2)). Output will then travel in the reverse direction and traverse our whole stack back to the client.

You'll also notice the config.json file that is part of our Node.js app. This file will hold the configuration of our back-end, consisting for now solely of the languages that we will support. You could think that a database would be better suited to this purpose, but in fact it would only make things more complex; it's much better to stick to the YAGNI principle at this point!
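
As an illustration, config.json could look something like this (the exact schema shown here is an assumption; we'll define the real one in the next part):

```json
{
  "languages": [
    { "name": "python", "image": "python:latest", "compiled": false },
    { "name": "java",   "image": "java:latest",   "compiled": true }
  ]
}
```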

Before closing this part, I'd like to point out that this architectural diagram is not yet fully complete. Indeed, the architecture could enforce the [NFR3], [NFR4] and [NFR5] requirements a bit more. Let's think about this for a minute.
Using a back-end based on this architecture, one would have to install and configure Node.js as well as Docker. Wouldn't it be nice if we could package things in such a way that only one of these pieces of software would be needed?

I'll let you think about it until we meet again in the next part of this series, where I will present the solution and modify the architecture accordingly (not much will change).
Hint: if you've seen the movie "Inception", it will look as if the answer was inspired by it. If you've not seen this movie ... wait, what? Seriously!? How could you have missed it? Go watch it NOW! -but don't forget to install Node.js and Docker first- :)

This marks the end of this part. It was a pretty dense one, especially if you had no prior knowledge of Node.js or Docker. Take some time to absorb this content, review it, and maybe read a bit more around it (the official documentation on the Docker and Node.js websites is a good and easy starting point).

'Til next time!