Longest Words Followup – Java -v- Perl

Filed Under 42 (Life the Universe & Everything), Computers & Tech on November 8, 2010 at 6:40 pm

Yesterday I posted about using Perl to solve the question “what’s the longest word I can type with just half a keyboard?”. Connor and I were joking that it would be a lot more difficult with Java, first to write the code, then to run it.

I used exactly the same algorithm for the Java program, even keeping the same variable names, and printed the results in the same format (I verified that the output matched with the Unix diff command). I also did my best to use the built-in Java functionality and java.util classes to minimise the amount of heavy lifting my code had to do.

So, this is the resulting code:

import java.util.Vector;
import java.util.Enumeration;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;

public class dict{
    public static void main(String args[]){
        //declare the needed variables
        String longestLeft="", longestRight="";
        int minLength = 10;
        Vector<String> longLeftWords = new Vector<String>();
        Vector<String> longRightWords = new Vector<String>();
        
        try{
            //open the dictionary file
            File file = new File(args[0]);
            BufferedReader reader = null;
            reader = new BufferedReader(new FileReader(file));
        
            // loop through the file
            String line;
            while((line = reader.readLine()) != null){
                // remove the trailing new line character from the string
                line = line.replaceAll("\n|\r", "");
            
                // check for characters on the right, if not, then it's an all-left word
                if(!line.toLowerCase().matches(".*[yuiophjklnm].*")){
                    if(line.length() >= minLength){
                        longLeftWords.add(line);
                    }
                    if(line.length() > longestLeft.length()){
                        longestLeft = line;
                    }
                }
            
                // vice versa: no characters on the left means it's an all-right word
                if(!line.toLowerCase().matches(".*[qwertasdfgzxcvb].*")){
                    if(line.length() >= minLength){
                        longRightWords.add(line);
                    }
                    if(line.length() > longestRight.length()){
                        longestRight = line;
                    }
                }
            }
        
            // close the dictionary file
            reader.close();
        }catch(Exception e){
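            // guard against a missing argument so the error handler itself cannot throw
            System.out.println("\n\nERROR - Failed to read the dictionary file '" + (args.length > 0 ? args[0] : "(none given)") + "'\n");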
            e.printStackTrace();
            System.exit(1);
        }
        
        // print the results
        System.out.println("\nLong words (at least " + minLength + " letters) with the left-side of the KB only:");
        Enumeration words = longLeftWords.elements();
        while(words.hasMoreElements()){
            System.out.println("\t" + (String)words.nextElement());
        }
        System.out.println("\t\t(total: " + longLeftWords.size() + ")");
        System.out.println("\nLong words (at least " + minLength + " letters) with the right-side of the KB only:");
        words = longRightWords.elements();
        while(words.hasMoreElements()){
            System.out.println("\t" + (String)words.nextElement());
        }
        System.out.println("\t\t(total: " + longRightWords.size() + ")");
        System.out.println("\nLongest left-only word: " + longestLeft + " (" + longestLeft.length() + " letters)");
        System.out.println("\nLongest right-only word: " + longestRight + " (" + longestRight.length() + " letters)\n");
    }
}

The obvious thing is that it’s longer than yesterday’s final deluxe Perl solution, about twice as long in fact. The code is also much wordier, with longer lines than the Perl version. There’s also a heck of a lot of ‘fluff’ in Java. Reading the input literally takes two characters in Perl (<>), while in Java it takes about six lines when you include the mandatory exception handling. Getting a variable-length array is also far more cumbersome. Using java.util.Vector helps a lot, but I then used java.util.Enumeration to iterate through the vector for printing, which is clunkier than a simple foreach loop in Perl (Java 5 and later do have an enhanced for loop that works on a Vector, so this part could be tidier). Finally, notice how much clunkier the regular expressions are! There’s nothing in Java as trivial as Perl’s m operator.
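For comparison, here is a minimal sketch of the printing step using Java's enhanced for loop (available since Java 5) rather than java.util.Enumeration. The class name ForEachSketch and the method formatWords are made up for illustration and are not part of the program above; the sample words are just two left-hand-only examples.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration, not part of the original dict program.
class ForEachSketch {
    // Builds the same tab-indented listing the original prints, one word per line,
    // followed by the total count.
    static String formatWords(List<String> words) {
        StringBuilder out = new StringBuilder();
        // Enhanced for loop: no Enumeration and no cast needed.
        for (String word : words) {
            out.append("\t").append(word).append("\n");
        }
        out.append("\t\t(total: ").append(words.size()).append(")");
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> longLeftWords = new ArrayList<String>();
        longLeftWords.add("stewardesses");
        longLeftWords.add("aftereffects");
        System.out.println(formatWords(longLeftWords));
    }
}
```

This is still wordier than Perl's foreach, but it drops the cast and the hasMoreElements()/nextElement() pair.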

OK, so the code is longer, fluffier, and harder to read and write, but how does it run? The simple answer: slower! About three times slower, in fact:

bartmbp:Temp bart$ time ./dict.pl /usr/share/dict/words >>/dev/null

real	0m0.761s
user	0m0.275s
sys	0m0.010s
bartmbp:Temp bart$ time java dict /usr/share/dict/words >>/dev/null

real	0m2.391s
user	0m2.230s
sys	0m0.121s
bartmbp:Temp bart$

Given that Perl is a scripting language and Java is at least partially compiled, you’d expect Java to have the edge. But, when it comes to pattern matching, Perl is in its element, while Java is really rather lost. I think it’s Java’s poor RE engine that’s making the difference here.
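Two caveats are worth keeping in mind before pinning this entirely on the RE engine. The "real" figure for Java includes JVM start-up time, and String.matches() compiles its regular expression afresh on every call, so the program above compiles two patterns for every line of the dictionary. A minimal sketch of the same two tests with the patterns compiled once is shown below. The class name HalfKeyboard and its method names are made up for illustration, and this version has not been timed, so no speed-up is claimed.

```java
import java.util.regex.Pattern;

// Hypothetical helper class for illustration, not part of the original dict program.
class HalfKeyboard {
    // Compiled once and reused; same character classes as the original code.
    private static final Pattern RIGHT_KEYS = Pattern.compile("[yuiophjklnm]");
    private static final Pattern LEFT_KEYS = Pattern.compile("[qwertasdfgzxcvb]");

    // True if the word contains no right-hand letters.
    static boolean isLeftOnly(String word) {
        return !RIGHT_KEYS.matcher(word.toLowerCase()).find();
    }

    // True if the word contains no left-hand letters.
    static boolean isRightOnly(String word) {
        return !LEFT_KEYS.matcher(word.toLowerCase()).find();
    }

    public static void main(String[] args) {
        System.out.println(isLeftOnly("stewardesses")); // true
        System.out.println(isRightOnly("polyphony"));   // true
        System.out.println(isLeftOnly("keyboard"));     // false
    }
}
```

Using find() on a bare character class gives the same answer as matches() on the ".*[...].*" form for single-line input, since both simply ask whether any character from the class appears in the word.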

So, there you have it, Perl really is quicker and simpler for messing with text. Who knew 😉

The Author – Bart Busschots
