Nerd Quizzer

February 15th, 2007

I’ve got an Apache need and I’ve hit my limit with mod_rewrite and regular expressions. I’ve tried using a Rewrite Map (thanks, Jon!) amongst other things.

Problem: I need to replace incoming URL requests that have underscores with URLs that have dashes. Since the URLs (URIs? UTIs?) were dynamically generated, the rewrite rules only need to replace the underscores with dashes. So “site/archives/2002/12/25/donkey_kong_pants_fiends” needs to be “site/archives/2002/12/25/donkey-kong-pants-fiends”. Should not require a great deal of calculus, but it appears it does.

I’ve thought about a smart 404, but then won’t that dork searchbots? Wouldn’t it be better to do a simple redirect? Why is this so hard? Is there a God?

Nerd tutorials start with a stick figure in step one and then in step two show a Da Vinci masterpiece. All in code. There’s a reason that I’m a nerd dilettante and not a nerd. I need the stick figure to evolve slowly into a slightly better stick figure.

I’ve also thought about a quickie PHP file that would do the job, but not sure that’s the best way to go.

Thoughts, oh great internet?


This entry is filed under geek, tech.

14 Responses to “Nerd Quizzer”

  1.
    maerk Says:

    Something like:

    Pattern: ([a-z]*)_
    Replacement: $1-

    Not sure how you’d get that to “repeat” for multiple word/underscore combinations, since that would only match the first instance of “word_”.

    But hopefully it’s a start…
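
To see the “only the first instance” problem in isolation, here is a small Python sketch. Python is used purely for illustration; it is not mod_rewrite. The sketch applies the pattern above to the example URL from the post, first once and then to every match. One caveat: a RewriteRule substitution replaces the whole URL path rather than just the matched fragment, so this shows the regex behaviour only, not what Apache would do with the rule as written.

```python
import re

# The pattern from the comment above: lowercase letters followed by an underscore
PATTERN = re.compile(r"([a-z]*)_")

url = "site/archives/2002/12/25/donkey_kong_pants_fiends"

# One application rewrites only the first "word_" pair
once = PATTERN.sub(r"\1-", url, count=1)

# Applying it to every match handles any number of underscores
every = PATTERN.sub(r"\1-", url)

print(once)
print(every)
```

The second form is the behaviour the post is after: every underscore replaced, however many words the slug contains.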

  2.
    blurb Says:

    maerk, thanks for that. Isn’t there an “N” variable that can be used? As in “do this N times up to 10”?

    I just re-read this post. Should not post when delirious.

  3.
    jon deal Says:

    I think I have another thing to try. I’ll e-mail you.

    Turning in my geek card if this doesn’t work.

    maerk’s thing is kind of what I was going for yesterday, using the “chain” [C] flag to make it repeat.

    grumble.

    Need a diet Coke now….

  4.
    etherdust Says:

    You could try something like this…

    In the server or virtual host config (RewriteMap is not allowed in .htaccess):
    RewriteEngine on
    RewriteMap dash-map prg:/path/to/dasher.pl
    RewriteRule ^(.*_.*)$ ${dash-map:$1}

    Create dasher.pl as follows and change the RewriteMap line above as appropriate:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # disable buffered I/O which would lead
    # to deadloops for the Apache server
    $| = 1;

    # read URLs one per line from stdin and
    # generate substitution URL on stdout
    while (<STDIN>) {
        tr/_/-/;    # turn every underscore into a dash
        print $_;
    }

    This should work — I tested the Perl script and know that it works. The one part I’m sketchy on is the RewriteRule line. I started with the example from the mod_rewrite instructions on the Apache Web site and customized a bit for your situation.

    Don’t forget to change the #! line to point to the correct location for Perl. The location above is correct on most systems.

  5.
    blurb Says:

    etherdust,
    It appears you want my URLs to have underscores? The final print $_; seems to indicate that. Or is _ what you are calling the variable? Very nutty, this mod_rewrite world we live in.

  6.
    jon deal Says:

    etherdust…

    I was trying to do something like that with RewriteMap yesterday, but he got server 500 errors. “RewriteMap not allowed here”

    jon… You’d have to put the dasher.pl in the root dir (like the map.txt from yesterday) and then put your prg:/path/to/your/html/root/dasher.pl on line 3 of the .htaccess file.

    His way *should* work, too, if you can do RewriteMap.

  7.
    blurb Says:

    Rewrite Map should work now.

    UPDATE: it does not work. On the phone with Liquidweb.

    UPDATE: RewriteMap should work, but perhaps not in the context it is being used… Senior Tech is in this afternoon. Why does mod_rewrite strike such fear in the great and mighty nerd expanse of the internet?

  8.
    mikeswimm Says:

    Hey John,

    If this is just a temporary thing to fix the URLs until Google reindexes, I have a sloppy but easy solution.

    You could reroute all archived articles (i.e., anything starting with /site/archive or whatever) to a single PHP page using mod_rewrite (which, as you know, is easy). There you could do a string replace with PHP (which is way more straightforward) and spit the updated URL into a JS redirect that would go to the right page.

    Super sloppy, but it would work, and it is way easier than spending the next week in apacheland.
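
The core of that idea is small enough to sketch. The Python below is only an illustration of the logic, not the PHP page itself, and the function name `redirect_for` is made up for this example. It also swaps the JS redirect for an HTTP 301 (Moved Permanently) status, on the assumption that a permanent redirect is the friendlier signal for search bots, which is the worry raised in the post.

```python
from typing import Optional, Tuple


def redirect_for(path: str) -> Optional[Tuple[int, str]]:
    """Return (status, location) if the path needs redirecting, else None.

    Hypothetical helper: the single catch-all page would call this with
    the requested path and send the result back as a redirect.
    """
    if "_" not in path:
        return None  # nothing to fix; serve the page as usual
    # 301 marks the move as permanent
    return 301, path.replace("_", "-")


result = redirect_for("/archives/2002/12/25/donkey_kong_pants_fiends")
print(result)
```

A path that already uses dashes returns None, so the redirect cannot loop back on itself.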

  9.
    mikeswimm Says:

    oops.

    As soon as I left that comment I noticed that you hint at a php script solution at the end of the post.

    Friday Afternoon != Reading Comprehension.

  10.
    jon deal Says:

    I still don’t understand why this mess, placed in a .htaccess file, which lives in the http://www.blurbomat.com/archives/ directory, wouldn’t work:

    RewriteEngine On
    RewriteRule ([^_]*)_(.*) $1-$2 [NC,C]
    RewriteRule ([^_]*)_(.*) $1-$2 [NC,C]
    RewriteRule ([^_]*)_(.*) $1-$2 [NC,C]
    RewriteRule ([^_]*)_(.*) $1-$2 [NC,C]
    RewriteRule ([^_]*)_(.*) $1-$2 [NC,C]
    RewriteRule ([^_]*)_(.*) $1-$2 [NC,C]
    RewriteRule ([^_]*)_(.*) $1-$2 [NC,C]
    RewriteRule ([^_]*)_(.*) $1-$2 [NC,L]

    I know, it’s messy and maybe I’m missing something critical. Perhaps someone could enlighten, fix it, or make it prettier?

  11.
    Coelecanth Says:

    Widdershins, incense and blood. Really, it’s the only way.

  12.
    Ortizzle Says:

    Nothing to do with the above. Just wanted to share this information:

    April 12th is NATIONAL LICORICE DAY: Gear up!
    http://tinyurl.com/dcctb

    Sorry for the interruption. — Hope your Apache Need gets solved.

  13.
    Chris E Says:

    Jon,

    I can’t test this myself since I am also on LiquidWeb and wouldn’t you know it, RewriteMap is disabled on the box I’m hosted on too.

    However, I wanted to point out that while etherdust’s solution _should_ work, it won’t if you just copy and paste the code because WordPress escaped some of the characters in the Perl script.

    The while condition should read:

    while (<STDIN>)

    Not sure if you are a Perl guy, but that tells the script to read from STDIN. With the empty condition that WordPress left behind, it’ll just loop forever, for obvious reasons.

    Also you asked about $_, which is just the Perl syntax for “the last line that was read in” (from STDIN, in this case). So it is just printing out the post-translated URL.
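
For anyone who does not read Perl, the same read-translate-print loop can be sketched in Python. This is an illustrative equivalent under my own naming (`dash_lines` is a made-up function), not a drop-in replacement for dasher.pl. The demo feeds it an in-memory string; a real map program would pass `sys.stdin` and `sys.stdout` instead.

```python
import io
from typing import TextIO


def dash_lines(source: TextIO, sink: TextIO) -> None:
    """Copy lines from source to sink, turning underscores into dashes."""
    for line in source:                      # plays the role of while (<STDIN>)
        sink.write(line.replace("_", "-"))   # like tr/_/-/ followed by print $_
        sink.flush()                         # push each answer out at once, like $| = 1


# Demo with in-memory streams standing in for stdin and stdout
demo_in = io.StringIO("donkey_kong_pants_fiends\n")
demo_out = io.StringIO()
dash_lines(demo_in, demo_out)
print(demo_out.getvalue(), end="")
```

Here `line` plays the part of `$_`: it holds whatever was just read, and after the replacement it holds the dashed version that gets written out.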

  14.
    blurb Says:

    Jon Deal, Akismet spammed your comments. I left the most recent one. That’s exactly what I did. I had to build real files though. Dynamic wouldn’t cut it for whatever reasons.




Copyright 2001-2008 Armstrong Media, LLC. All rights reserved. Terms of Service. This is the paranoid section of the site.