String.replaceAll(String, String) wowes

Filed Under Databases, Java, Software Development on September 2, 2005 at 1:26 am

When you are dealing with databases one of the most common things you have to do is to escape out all single quotes in text being entered into the DB to stop PostgreSQL/MySQL/*SQL having a cow and to prevent injection attacks.

In most languages this is trivial, take Perl for example

  1. my $sqlText =~ s/'/\\'/g;

Could it be that easy in Java … not on your life!

At first I thought, well, Java being written by smart people, they must have included somewhere in the vast Java API a simple function to escape SQL, like mysql_escape(String) in PHP. I tried to find such a function with Google, no joy, so then I read through every part of the API that I could think of that is remotely related to data bases, and found nothing. If there is such a function in Java it is too well hidden and I’d really appreciate it if someone could point it out to me!

So, that means I had to rely on Java’s REs … oh dear!

String.replaceAll() would SEEM to be the answer, and indeed it is, but it’s not as straight forward as one would expect, and the documentation is quite frankly poo! The documentation doesn’t in any way explain Java’s mad behavior!

Since a single quote should not be escaped in a Java String and a backslash should it is logical to conclude that the following would work:

  1. String escaped = unescaped.replaceAll("'", "\\'");

However, it doesn’t, that just results in replacing all single quotes with a single quote and nothing else, the backslash disappears!

The API says that the first string taken is an RE and not a normal string so you have to escape anything an RE would consider special, the docs say that the replacement is just a String, nothing fancy, just a String containing your replacement. Turns out the docs are misleading because the replacement is actually an RE too so it gets interpreted twice so you need to escape everything double like you would if you were using Perl to write JS to write HTML, hence, the following is the code to escape single quotes:

  1. String escaped = unescaped.replaceAll("'", "\\\\'");

Go figure!

This stupid bug has held up a project I am working on for about half a day. I have said it before and I will say it again, Java RE support is RUBBISH!

Comments

9 Responses to “String.replaceAll(String, String) wowes”

  1. iopred on December 11th, 2007 1:06 pm

    You just saved me a lot of headaches, thank you for writing this guy!

    It seems we were going through the same thought process, except I didn’t realise tho whole double-double escape!

    Thanks again!

  2. razmaspaz on December 11th, 2007 4:55 pm

    Maybe I’m missing something here. First why can’t you just use a prepared statement and avoid this whole thing? Second Check out jakarta commons for StringEscapeUtils:
    http://commons.apache.org/lang/api/org/apache/commons/lang/StringEscapeUtils.html

  3. Bart B on December 11th, 2007 5:09 pm

    Hi Razmaspaz, no, you’re not missing anything. Prepared statements are a good way to avoid having to validate your own SQL strings. Using a third partly like the Jakarta Commons library you linked would also work for SQL.

    However, that does not change the fact that you do need regular expressions from time to time, and that Java’s RE support SUCKS. It also doesn’t change the point of this post, that if you’re getting funny results from String.replaceAll() it could well be because both the RE and the replacement need to be double-escaped. Counter-intuitive, a pain in the behind and head-wreaking, but necessary to get Java REs to work.

    Bart.

  4. Rolando Lopez on April 10th, 2009 6:32 pm

    I’m new to the JAVA world and spent a good 15 minutes trying to figure out why my single quotes were not being escaped! Thank you for your post, it saved me from more headache.

  5. a.e on January 30th, 2010 9:41 am

    Thank you so much! I spent 4 hours late into the night going nuts over this. I was dealing with around 400,000+ characters being parsed and inserted into a MEDIUMTEXT data type and didn’t recognize the problem right away, and even when I did I still couldn’t work with getInputStream().useDelimiter(); in a beneficial way.

    In any case, if it weren’t for your capability to elucidate the matter so concisely I’d still be going nuts. Thanks!

  6. skiaddict1 on February 12th, 2010 2:40 am

    Oh my god thank you so much!!! I’ve been going insane this afternoon, stupidly trusting Java’s documentation that the second String is just a plain String and not an RE. All I wanted was to transform \r and \n into “\par” (for RTF) and could I get that damn slash character to appear in the output?! grrrrrrr!

    And it’s not so much that Java’s RE support is rubbish (about which I have no opinion either way) — it’s that their documentation for replaceAll() is LIES!!!

    But, thanks to you, problem fixed, on to the next fight…

  7. Danny Rathjens on March 19th, 2010 3:27 am

    At least they added a caveat to the docs, :)
    “Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string”
    http://java.sun.com/javase/6/docs/api/java/lang/String.html#replaceAll(java.lang.String,%20java.lang.String)

  8. Jeff Weier on August 12th, 2010 6:35 pm

    So skiaddict1 … how did you transform \r and \n into “\par” (for RTF)?

    I’ve tried:
    String escaped = unescaped.replaceAll(“\n”, “\\par”);
    String escaped = unescaped.replaceAll(“\n”, “\\\\par”);

    Neither of these work. Anyone have an idea?

  9. Alberto on September 25th, 2010 11:50 am

    Thanks! Just what I was looking for, like you I didn’t put enough backslashes, haha

Leave a Reply




Before you post a comment please remember that commenting on my blog is a privilege not a right. I won't approve comments that are obscene, offensive or insulting. For more info please read this post.

Subscribe without commenting