This blog post is a companion document to two Chit Chat Across the Pond segments I will be recording with Allison Sheridan on the NosillaCast over the next two weeks. The first of the two shows is now out, and can be found here. Once the second show is out I’ll add that link in too.

In episode 474, when Allison was chatting with Donal Burr about Apple’s new Swift programming language, she said she didn’t understand what a compiler was, so I thought it might be fun to try to address that! But rather than focus in on just that one very specific question, I thought it would be more useful to take a high-level look at computer programming in general, so that some of the conversations around various developer platforms will make more sense to the majority of NosillaCast listeners, who are non-programmers.

I find things always make more sense with examples, so I’m going to provide a number of them throughout this post. If you want to play along, you’ll need to have Apple’s command line developer tools installed on your Mac (or the GNU C Compiler, AKA gcc, installed on a Linux computer/VM). I find it’s helpful to have the developer tools installed on any Mac, even if you don’t program, because they add a lot of command line tools to OS X. If you don’t have them installed, I suggest you have a read of this c|net article. One of the examples uses Java. If you have Java installed, by all means play along, but if not, I wouldn’t recommend installing Java just for this one example.

With the preliminaries out of the way, let’s get stuck in. In true NosillaCast style, we’ll start with a problem to be solved.

Why Do We Need Programming Languages?

When it comes to telling a computer what to do, we hit a major language barrier. Computers only understand binary machine codes, and with the possible exception of a handful of uber uber uber nerds, humans just don’t. To illustrate the magnitude of the problem, below is the actual binary code for a REALLY simple computer program (in a scrollable box so it doesn’t take up the entire page):

It’s pretty clear this is not a format in which humans can easily work!

All those 1s and 0s represent the instructions to the CPU, as well as the data the instructions should work on/with. We can re-write that same information in a slightly more human-readable form by representing the instructions to the CPU, and the various registers on the CPU, with simple codes/names, and representing the data in ASCII form. This is what we call assembly language. The above binary code can be written in Intel x86 assembly language as:

    .section     __TEXT,__text,regular,pure_instructions
    .globl     _main
    .align     4, 0x90
_main:                                  ## @main
## BB#0:
    pushl     %ebp
    movl     %esp, %ebp
    subl     $8, %esp
    movl     $L_str, (%esp)
    calll     _puts
    xorl     %eax, %eax
    addl     $8, %esp
    popl     %ebp
    ret

    .section     __TEXT,__cstring,cstring_literals
L_str:                                  ## @str
    .asciz     "Hello World!"


.subsections_via_symbols

It’s still very low-level, but at least humans can read it, and, if you’ve got the requisite skills, program a computer in it (Steve Gibson famously programmed SpinRite in assembly). However, it’s still not a very human-friendly way to express instructions to a computer. This is why high-level computer languages were invented.

The above assembly code (and the binary above that) are equivalent to the following very short C program:

#include <stdio.h>

int main()
{
    printf("Hello World!\n");

}

This is obviously much more human-friendly, and even many non-programmers will probably be able to intuit that this simple program just prints “Hello World!”.

Obviously, to get from this nice human readable form to the binary we saw above, we need some kind of converter, and this is where compilers and interpreters come in.

Compiled Languages -v- Interpreted Languages

There are two basic approaches programming languages use to get from human-readable code to binary codes the computer can execute: they can either use a compiler, or an interpreter.

A compiler takes the human-readable code and transforms it into the binary code for a specific computer architecture and OS, and then saves those 1s and 0s to a file which can then be executed over and over again without the compiler’s involvement. This is how the vast vast majority of commercial software works. Every app you buy in the Mac, iOS, Windows or Google Play app stores has been compiled, as have all the big commercial apps you buy direct from the makers like Photoshop, Office, and so on. The same is true of the majority of open-source apps many of us use like Firefox, LibreOffice, and the GIMP.

Compiling has many advantages. It is very efficient, because you do it once and can then execute the code over and over again without any translation overhead. It also means you can distribute your app without ever sharing any of your source code, which is very important to commercial software vendors.

There is, however, another approach, and while it’s rarely used for large software products, it is used by every scripting language I have ever encountered. Scripting languages don’t have a traditional compiler; instead they have an interpreter. In many ways interpreters are very similar to compilers: they do the same basic translation between a human-readable language and computer code, but they don’t create an executable binary file that can be used over and over again. Instead, they translate the code on the fly. Each time you run a script, you are in effect recompiling it.

This is obviously less efficient, because the translation happens each time the code is run, but it has advantages too. Firstly, compilation can be slow, because the whole program has to be translated (and usually optimised), then linked together with everything it needs into an executable file, before it can be run at all. It’s not unusual for an app to take a few minutes to compile, and larger projects can even take hours! This can make tweaking and testing code painful. Interpreted languages run pretty much instantly, so you can tweak, test, and tweak again very quickly.

Compiled code has another disadvantage. The binary codes are different for different CPU architectures and for different operating systems, so you have to compile a separate version of your code for each platform you support. This is why download pages often give you a lot of different downloads to choose from. The same script, on the other hand, will run on any platform that has an interpreter for that language, so scripts are much more portable.

Perl is an example of an interpreted language. I can write some Perl code on my Mac, run and test it, and then deploy it on a Linux server, or give it to a friend using Windows to run on their computer. Perl code can be run on any computer that has a Perl interpreter.

The Third Way – Compiled AND Interpreted Languages

With the rise of Java, a new approach gained real traction, using both a compiler AND an interpreter to get the portability of an interpreted language with much of the efficiency of a compiled language.

Human-readable Java code is compiled to machine code by the Java Compiler, but it’s not machine code for any real computer architecture. Instead, it gets compiled down to the machine code for the Java Virtual Machine, usually called bytecode.

This compiled Java code is then run using an interpreter called the Java Runtime Environment (or JRE).

You might imagine that this kind of hybrid approach would give you the worst of both approaches, but it actually doesn’t, because of one very important fact – it is much easier to translate from one type of machine code to another than from human-readable code to machine code. This means that the Java interpreter is only a little slower than running code compiled for the actual architecture, but it has all the portability of interpreted code. Modern JREs narrow the gap further by translating frequently used bytecode into real machine code as the program runs, a technique called just-in-time (JIT) compilation. The idea is that you compile Java code once, then run it on any platform that has a JRE.

Microsoft have taken this idea to the next level with their .NET framework. The basic model is the same: you compile your human-readable code down to a generic machine code, then interpret it as you execute. They took it one step further, though, by supporting multiple different human-readable languages that all compile down to the same .NET machine codes.

Getting Practical

Some Compiled Languages:

  • C
  • C++
  • Objective C
  • Swift

Some Interpreted Languages:

  • Shell scripts (e.g. DOS Batch files, Unix/Linux shell scripts)
  • BASIC
  • Perl
  • PHP
  • Python
  • Ruby
  • JavaScript
  • AppleScript
  • VBScript

Some Hybrid Languages:

  • Java
  • Perl 6 (with the Parrot VM)
  • C# (pronounced C-sharp)

Before we get stuck into some practical examples, create a folder and open it in both the Finder and the Terminal. We’ll save all our sample files into this folder, and compile and run them all from there.

Programming always has to be done using a plain-text editor. You can use TextEdit.app, but only if you switch it to plain-text mode (Format → Make Plain Text). Alternatively you could use a command-line editor like nano or vi, or you could use a programming editor like the free TextWrangler, or the cheap Smultron 6 (my favourite for casual programming).

It could be argued that it’s more instructive to copy and paste the code yourself, but, if you’d prefer not to, I’ve compiled all the sample code into a single ZIP file which you can download here.

Example 1 – Writing a Compiled Program (in C)

  1. Save the code below in a file called ex1-hello.c
  2. Compile it with the command: gcc -o ex1-hello ex1-hello.c
  3. Execute it with the command: ./ex1-hello

#include <stdio.h>

int main()
{
    printf("Hello World!\n");

}

Example 2 – Writing an Interpreted Program (in Perl)

  1. Save the code below in a file called ex2-hello.pl
  2. Execute it with the Perl interpreter: perl ex2-hello.pl
  3. OPTIONALLY – make the script self-executing (POSIX OSes allow a script to specify which interpreter the OS should use to execute it by means of the so-called shebang line, i.e. #! followed by the path to an interpreter, as the first line of the script)
    1. Make the file executable with: chmod 755 ex2-hello.pl
    2. Execute the file: ./ex2-hello.pl

#!/usr/bin/perl
print "Hello World!\n";

OPTIONAL: Example 3 – Writing a Hybrid Program (in Java)

  1. Save the code below in a file called Ex3Hello.java (the naming convention differs from the other examples because Java requires the file name to match the name of the public class it contains)
  2. Compile the human-readable code to Java machine code: javac Ex3Hello.java
  3. Execute the code with the Java interpreter: java Ex3Hello

public class Ex3Hello{
    public static void main(String args[]){
        System.out.println("Hello World!");
    }
}

Loosely Typed -v- Strongly Typed Languages

Another major difference between different programming languages is in how they store information. Programs are data manipulators, so storing information is an absolutely essential part of every programming language. Programming languages all use variables to store information, but they all have their own rules for how variables work.

The information stored within a variable can be just about anything, a boolean true or false value, a single character, a string of characters, a whole number, a decimal number, a date, a time, or a complex record describing a person’s CV and so on and so forth. These different kinds of information are referred to as types.

While every programming language has its own unique quirks when it comes to how it deals with variables, you can broadly split languages into two groups: those that have very strict typing rules, and those that have very loose typing rules, or, in programmer jargon, strongly typed and loosely typed languages.

In a strongly typed language, the programmer has to specify exactly what type of information a variable can store at the moment they create (or declare) that variable. In a loosely typed language you just declare a variable by giving it a name, and then put whatever data you want in there.

To illustrate this point, let’s see an example of each approach:

Example 4 – Declaring Variables in a Loosely Typed Language (Perl)

  1. Save the code below into a file called ex4-looseVariables.pl
  2. Execute the code with the Perl interpreter: perl ex4-looseVariables.pl

#!/usr/bin/perl

my $a = 42;
my $b = 3.1415;
my $c = 'd';
my $d = 'Donkey';

print "$a, $b, $c, $d\n";

Notice that we create variables of four different types (an integer, a decimal number, a character, and a string of characters), but we declare each one in exactly the same way.

Example 5 – Declaring Variables in a Strongly Typed Language (C)

  1. Save the code below in a file called ex5-strongVariables.c
  2. Compile it with: gcc -o ex5-strongVariables ex5-strongVariables.c
  3. Execute it with: ./ex5-strongVariables

#include <stdio.h>

int main()
{
    int a = 42;
    float b = 3.1415;
    char c = 'd';
    char d[] = "Donkey";
    printf("%d, %f, %c, %s\n", a, b, c, d);
}

In this example we declare the same four variables as we did above, but this time we have to explicitly give each variable a type as we create it (int for an integer, float for a decimal number, char for a character, and char[] for a string of characters). We also have to tell the printf command what type of variable to expect at each insertion point in the string (%d for an int, %f for a float, %c for a char and %s for a char[]).

Some Loosely Typed Languages:

  • BASH
  • Perl
  • PHP
  • Python
  • JavaScript
  • Objective C

Some Strongly Typed Languages:

  • C
  • C++
  • Java
  • Swift

Just like with compiled versus interpreted languages, there are pros and cons to each approach. You’ll notice that most of the loosely typed languages are scripting languages. That’s because it’s much quicker and easier to program in a loosely typed language, so they are very well suited to small quick projects. But they have a very big downside: the looseness prevents the compiler/interpreter from doing much type-checking before the code runs, so a whole bunch of errors go un-caught until the program is running.

This leads nicely to a very important point that really explains why Swift is causing such a stir – not all errors are equal, and language design choices can push some errors from run-time back to compile-time, which results in more stable programs. Let’s look at that concept in more detail now.

Not All Errors Are Equal

There’s a whole spectrum of types of error that us imperfect humans can introduce into programs as we write them, but they’re not all equal! For our purposes today, good errors are those that are easy to track down, and bad errors are the sneaky kind that take time and effort to find and fix.

The easiest errors to find and track down are syntax errors. Programming languages have very well defined grammar rules, just like human languages do, and if you break them, your code is said to be syntactically incorrect. When humans do be getting there grammar wrong, we have the intelligence to figure out what the speaker meant (as I just demonstrated), but computers have no intelligence, so when you make a syntax error in a programming language, the compiler compiling it, or the interpreter interpreting it, will quit with an error. You just can’t miss these kinds of errors, because while your code has them it simply won’t run!

To illustrate the point, let’s intentionally break the code in the first two examples and see what happens.

Example 6 – a Syntax Error in a Compiled Language (C)

  1. Duplicate the file ex1-hello.c, and save it as ex6-syntaxerror.c
  2. Delete the last line from the file (the one that just has } on it), and save the file
  3. Try to compile the code with: gcc -o ex6-syntaxerror ex6-syntaxerror.c

You should get a compiler error, and no executable file will have been created:

bart-imac2013:CCATP140622 bart$ gcc -o ex6-syntaxerror ex6-syntaxerror.c
ex6-syntaxerror.c:5:30: error: expected '}'
    printf("Hello World!\n");
                             ^
ex6-syntaxerror.c:4:1: note: to match this '{'
{
^
1 error generated.
bart-imac2013:CCATP140622 bart$

Example 7 – a Syntax Error in an Interpreted Language (Perl)

  1. Duplicate the file ex2-hello.pl, and save it as ex7-syntaxerror.pl
  2. Edit the file and change print to pirnt on the second line
  3. Try to run the script with the Perl interpreter: perl ex7-syntaxerror.pl

Again, the script does not execute, and the interpreter exits with an error:

bart-imac2013:CCATP140622 bart$ perl ex7-syntaxerror.pl 
String found where operator expected at ex7-syntaxerror.pl line 2, near "pirnt "Hello World!\n""
	(Do you need to predeclare pirnt?)
syntax error at ex7-syntaxerror.pl line 2, near "pirnt "Hello World!\n""
Execution of ex7-syntaxerror.pl aborted due to compilation errors.
bart-imac2013:CCATP140622 bart$

At the very other end of the spectrum are logic errors – the programmer has implemented exactly the algorithm he or she was asked to, but, there was a mistake in the spec, so it actually doesn’t do what it was supposed to, even though it compiles and runs. No compiler or interpreter can ever come to your rescue here, no matter how well designed your language is!

Run-Time Errors Suck!

Every programmer’s worst nightmare is an intermittent bug that only shows up when the code is in use, and only under certain conditions. These can happen in every language, no matter how well designed it is.

To illustrate the point, let’s intentionally create some compiled and interpreted code which suffers from the same simplistic intermittent run-time error. We’ll write a C program and a Perl script which each take two numbers as command line arguments, divide the first by the second, and then print out the answer.

Example 8 – An Intentional Intermittent Run-Time Error (in C)

  1. Save the code below in a file called ex8-divide.c
  2. Compile it with: gcc -o ex8-divide ex8-divide.c
  3. Test that it works by using it to divide 100 by 4: ./ex8-divide 100 4
  4. Test some more combinations, say 9 and 3, 16 and 4, and 270 and 90
  5. Now trigger the intentionally planted bug: ./ex8-divide 22 0

#include <stdio.h>
#include <stdlib.h>

int main( int argc, char *argv[] )
{
    /* Make sure two arguments were supplied, or whine */
    if( argc != 3 )
    {
        printf("Invalid arguments - you must provide two integer numbers!\n");
        exit(1);
    }

    /* Convert our arguments to integers */
    int a = atoi(argv[1]);
    int b = atoi(argv[2]);

    /* Do the division */
    int ans = a/b;

    /* print the answer */
    printf("%d divided by %d equals %d\n", a, b, ans);
}

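The program works fine for many combinations of numbers, but if you pass 0 as the second number, the program crashes! (At least it does on the Intel Mac I used. Dividing an integer by zero is undefined behaviour in C, so exactly what happens can vary between CPUs and compilers.)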

bart-imac2013:CCATP140622 bart$ ./ex8-divide 22 0
Floating point exception: 8
bart-imac2013:CCATP140622 bart$

Example 9 – An Intentional Intermittent Run-Time Error (in Perl)

  1. Save the code below in a file called ex9-divide.pl
  2. Test that it works by using it to divide 100 by 4: perl ex9-divide.pl 100 4
  3. Test some more combinations, say 9 and 3, 16 and 4, and 270 and 90
  4. Now trigger the intentionally planted bug: perl ex9-divide.pl 22 0

#!/usr/bin/perl

# make sure we got two arguments, or whine
unless(scalar @ARGV == 2){
    print "Invalid arguments - you must provide two numbers!\n";
    exit 1;
}

# read the numbers from the arguments
(my $num1, my $num2) = @ARGV;

# do the division
my $ans = $num1/$num2;

# print the answer
print "$num1 divided by $num2 equals $ans\n";

As with the C example, all works fine for most numbers, but again, if we pass 0 as the second number, the script crashes!

bart-imac2013:CCATP140622 bart$ perl ex9-divide.pl 22 0
Illegal division by zero at ex9-divide.pl line 13.
bart-imac2013:CCATP140622 bart$

Compile-Time Errors -v- Run-Time Errors

As we have seen, some errors will always be picked up by the compiler or the interpreter, and some will never be, regardless of how you design your language. However, between these two zones there’s a very interesting grey area, where decisions made when designing a programming language can push some types of error from run-time, which is bad, to compile-time, which is good!

As with everything else, these decisions come with compromises, so there are plenty of really good reasons to use languages that don’t push as many types of error to compile-time as possible.

There are lots and lots of different ways languages can push errors to compile-time, but we’ll just pick one to illustrate the point, by returning to the concept of loosely typed languages as compared to strongly typed languages.

Pushing Type Errors to Compile-Time

A type error occurs when you try to do something to data of a particular kind that makes no sense. For example, you can’t divide four by baboon, that’s just arrant nonsense! If you try to force a computer to do something like that it will crash, just like it did when we tried to make it divide by zero.

Because loosely typed languages let you store anything in any variable, the interpreter can’t spot when you try to do something impossible until the code is running and you present it with the impossible operation, so type errors in loosely typed languages are always run-time errors.

Let’s intentionally create one!

Example 10 – An Intentional Type Error in a Loosely Typed Language (Perl)

This example is rather contrived, but it illustrates the point. We’ll create a subroutine that accepts two arguments, divides one by the other, and returns the result. The subroutine can obviously only work when it’s passed two numbers, so we’ll intentionally pass it something else to trigger a type error.

  1. Save the code below into a file called ex10-typeerror.pl
  2. Run it to trigger the error: perl ex10-typeerror.pl

#!/usr/bin/perl

# define our subroutine for dividing two numbers
sub divide{
    my $x = shift;
    my $y = shift;

    return $x/$y;
}

# print something to prove we are in runtime
print "I'm running!\n";

# now trigger the type error by dividing 4 by a baboon
my $a = 4;
my $b = 'baboon';
my $ans = divide($a, $b);

Running it produces:

bart-imac2013:CCATP140622 bart$ perl ex10-typeerror.pl
I'm running!
Illegal division by zero at ex10-typeerror.pl line 8.
bart-imac2013:CCATP140622 bart$

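Because Perl is loosely typed, it can’t check whether legal values will be passed to the subroutine. The subroutine doesn’t tell the interpreter what types it expects, and variables have no type when they’re defined, so all Perl knows is that a call will be made which passes some values to a subroutine. The only way you can find out you have an error is at run-time. Notice also that the error message talks about division by zero, not baboons. When Perl needs a number but is given a string that doesn’t start with one, it quietly treats that string as 0, which makes the real mistake even harder to track down.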

Now let’s contrast this behaviour with what you get from a strongly typed language, C in this case.

Example 11 – An Intentional Type Error in a Strongly Typed Language (C)

  1. Save the code below in a file called ex11-typeerror.c
  2. Try to compile it (and watch the compiler complain) with: gcc -o ex11-typeerror ex11-typeerror.c

#include <stdio.h>

/* Define a subroutine to divide two numbers */
int divide(int x, int y){
    return x/y;
}

int main()
{
    /* trigger a type error by trying to divide 4 by a baboon */
    int a = 4;
    char b[] = "baboon";
    int ans = divide(a, b);
}

This time the problem is reported before the program gets anywhere near running:

bart-imac2013:CCATP140622 bart$ gcc -o ex11-typeerror ex11-typeerror.c
ex11-typeerror.c:13:25: warning: incompatible pointer to integer conversion
      passing 'char [7]' to parameter of type 'int' [-Wint-conversion]
    int ans = divide(a, b);
                        ^
ex11-typeerror.c:4:23: note: passing argument to parameter 'y' here
int divide(int x, int y){
                      ^
1 warning generated.
bart-imac2013:CCATP140622 bart$

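The C compiler was instantly able to detect that our code had a bug. The subroutine declaration explicitly stated that it needed two integers as input and would return an integer as output, and each variable declaration also specified a type. So the compiler knew that divide expected two integers, but that b was not an integer, and it complained. That pushed what was a run-time error in Perl to a compile-time problem in C. Note that the compiler used here only reported a warning, which means it still produced an executable. To make such problems stop the build you can add the -Werror flag, and more recent versions of clang and gcc treat this particular conversion as an error by default.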

Loose Typing is not All Bad

Like I said before, this is always about pros and cons. Loosely typed languages tend to suffer from more run-time errors because type errors can’t be detected up front. But, as we’ve already said, loosely typed languages tend to be easier to program quickly in, and there are other advantages too. To illustrate this point, let’s revisit our two programs for dividing numbers (ex8-divide.c & ex9-divide.pl).

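Because C is strongly typed, we had to define the types of all the variables involved, so our program explicitly, and ONLY, divides integers. It will always truncate the answer to a whole number (simply throwing away anything after the decimal point), even when the actual result is not a whole number. We can illustrate this problem by trying to divide 10 by 3, and then 5 by 0.25. In the second case atoi stops reading at the decimal point, so 0.25 becomes 0, and we’re right back to dividing by zero: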

bart-imac2013:CCATP140622 bart$ ./ex8-divide 10 3
10 divided by 3 equals 3
bart-imac2013:CCATP140622 bart$ ./ex8-divide 5 0.25
Floating point exception: 8
bart-imac2013:CCATP140622 bart$

Our loosely typed Perl script, on the other hand, has no problem switching back and forth between whole numbers and decimals as needed:

bart-imac2013:CCATP140622 bart$ perl ex9-divide.pl 10 3
10 divided by 3 equals 3.33333333333333
bart-imac2013:CCATP140622 bart$ perl ex9-divide.pl 5 0.25
5 divided by 0.25 equals 20
bart-imac2013:CCATP140622 bart$

Conclusions

All programming languages aim to solve the same problem, to allow humans to tell computers what to do, but there are many different ways to approach this problem, and different languages do things differently. We've only looked at a few of these differences; there are many many many more. Probably the biggest differentiator we've ignored is the so-called programming paradigm the language follows: is it procedural, object-oriented, functional, or something else? The key point, though, is that the many many choices programming language designers make all have pros and cons. There is absolutely no such thing as the one perfect programming language. The most you can argue is that a given language is the best fit for a given task, or class of tasks (and you'll always find someone who's willing and able to argue that you're wrong!).

This is where developers' joy at Apple's introduction of Swift comes in: it seems to be a language better suited to the development of desktop and mobile apps than Objective C. It's compiled, like Objective C, but unlike Objective C, it's strongly typed. That forces developers to be much more explicit when defining variables and subroutines, and hence pushes lots of run-time errors in their large code-bases back to compile-time. Objective C also has a more forgiving syntax, allowing developers to take some shortcuts, whether they intended to or not. C and Objective C both allow the curly brackets on control statements such as if to be omitted when only one statement is affected by the control statement. It's this flexibility that famously led to the Goto Fail bug. Swift has a stricter syntax, enforcing braces on all control statements, even those only controlling a single line of code.

Hopefully this very very high-level overview of the massive sea of programming languages has armed you with enough knowledge to understand at least some of the conversations you'll encounter about programming languages, and, perhaps, even whetted your appetite enough to consider learning to program!