Nerd Quizzer
February 15th, 2007
I’ve got an Apache need and I’ve hit my limit with mod_rewrite and regular expressions. I’ve tried using a Rewrite Map (thanks, Jon!) amongst other things.
Problem: I need to replace incoming URL requests that have underscores with URLs that have dashes. Since the URLs (URIs? UTIs?) were dynamically generated, the rewrite rules only need to replace the underscores with dashes. So “site/archives/2002/12/25/donkey_kong_pants_fiends” needs to be “site/archives/2002/12/25/donkey-kong-pants-fiends”. Should not require a great deal of calculus, but it appears it does.
I’ve thought about a smart 404, but then won’t that dork searchbots? Wouldn’t it be better to do a simple redirect? Why is this so hard? Is there a God?
Nerd tutorials start with a stick figure in step one and then in step two show a Da Vinci masterpiece. All in code. There’s a reason that I’m a nerd dilettante and not a nerd. I need the stick figure to evolve slowly into a slightly better stick figure.
I’ve also thought about a quickie PHP file that would do the job, but not sure that’s the best way to go.
Thoughts, oh great internet?
This entry is filed under geek, tech.

February 16th, 2007 at 5:46 am
Something like:
Pattern: ([a-z]*)_
Replacement: $1-
Not sure how you’d get that to “repeat” for multiple word/underscore combinations, since that would only match the first instance of “word_”.
But hopefully it’s a start…
February 16th, 2007 at 8:14 am
maerk, thanks for that. Isn’t there a “N” variable that can be used? As in “do this N times up to 10”?
I just re-read this post. Should not post when delirious.
February 16th, 2007 at 9:30 am
I think I have another thing to try. I’ll e-mail you.
Turning in my geek card if this doesn’t work.
maerk’s thing is kind of what I was going for yesterday and using the “chain” [C] flag to make it repeat.
grumble.
Need a diet Coke now….
February 16th, 2007 at 9:44 am
You could try something like this…
In .htaccess:
RewriteEngine on
RewriteMap dash-map prg:/path/to/dasher.pl
RewriteRule ^(.*_.*)$ ${dash-map:$1}
Create dasher.pl as follows and change the RewriteMap line above as appropriate:
#!/usr/bin/perl
# disable buffered I/O which would lead
# to deadloops for the Apache server
$| = 1;
# read URLs one per line from stdin and
# generate substitution URL on stdout
while () {
tr/_/\-/;
print $_;
}
This should work — I tested the Perl script and know that it works. The one part I’m sketchy on is the RewriteRule line. I started with the example from the mod_rewrite instructions on the Apache Web site and customized a bit for your situation.
Don’t forget to change the #! line to point to the correct location for Perl. The location above is correct on most systems.
February 16th, 2007 at 9:47 am
etherdust,
It appears you want my URLs to have underscores? The final print $_; seems to indicate that. Or is _ what you are calling the variable? Very nutty, this mod_rewrite world we live in.
February 16th, 2007 at 10:17 am
etherdust…
I was trying to do something like that with RewriteMap yesterday, but he got server 500 errors. “RewriteMap not allowed here”
jon… You’d have to put the dasher.pl in the root dir (like the map.txt from yesterday) and then put your prg:/path/to/your/html/root/dasher.pl on line 3 of the .htaccess file.
His way *should* work, too, if you can do RewriteMap.
February 16th, 2007 at 10:29 am
Rewrite Map should work now.
UPDATE: it does not work. On the phone with Liquidweb.
UPDATE: RewriteMap should work, but perhaps not in the context it is being used… Senior Tech is in this afternoon. Why does mod_rewrite strike such fear in the great and mighty nerd expanse of the internet?
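For what it's worth, the "RewriteMap not allowed here" error fits how Apache scopes the directive: a RewriteMap can only be declared in the main server or virtual host configuration, although a map declared there can then be used by rules in .htaccess. Below is a rough, untested sketch of that split. It assumes the host is willing to edit the server config, reuses the placeholder dasher.pl path from the earlier comment, and assumes the .htaccess lives in an /archives/ directory.

```apache
# --- main server or <VirtualHost> configuration (NOT .htaccess) ---
RewriteEngine on
# prg: maps are launched when the server starts; the path is a placeholder
RewriteMap dash-map prg:/path/to/dasher.pl

# --- .htaccess in the archives directory (assumed location) ---
RewriteEngine on
# Only paths that still contain an underscore are handed to the map;
# the map's output has none, so the rule cannot match a second time.
RewriteRule ^(.*_.*)$ /archives/${dash-map:$1} [R=301,L]
```

On shared hosting the first half is usually out of the customer's hands, which may be why support had to get involved.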
February 16th, 2007 at 1:40 pm
Hey John,
If this is just a temporary thing to fix the urls until google reindexes I have a sloppy but easy solution.
You could reroute all archived articles (i.e. anything starting with /site/archive or whatever) to a single PHP page using mod_rewrite (which as you know is easy). There you could do a string replace with PHP (which is way more straightforward) and spit the updated URL into a JS redirect that would go to the right page.
Super sloppy, but it would work, and it is way easier than spending the next week in apacheland.
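The mod_rewrite half of that idea might look something like the sketch below. The script name fix-underscores.php and its path parameter are made up for illustration, and the rule assumes the .htaccess sits inside the archives directory while the script sits outside it.

```apache
RewriteEngine On

# Hand any archive path that still contains an underscore to one PHP page.
# "fix-underscores.php" and "path" are hypothetical names, not from the thread.
# No R flag, so this is an internal rewrite: the visitor never sees the script URL.
RewriteRule ^(.*_.*)$ /fix-underscores.php?path=$1 [L]
```

The page itself would then only need to swap underscores for dashes in the value it receives and send the visitor on to the result.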
February 16th, 2007 at 2:28 pm
oops.
As soon as I left that comment I noticed that you hint at a php script solution at the end of the post.
Friday Afternoon != Reading Comprehension.
February 16th, 2007 at 3:52 pm
I still don’t understand why this mess, placed in a .htaccess file, which lives in the http://www.blurbomat.com/archives/ directory, wouldn’t work:
RewriteEngine On
RewriteRule ([^_]*)_(.*) $1-$2 [NC,C]
RewriteRule ([^_]*)_(.*) $1-$2 [NC,C]
RewriteRule ([^_]*)_(.*) $1-$2 [NC,C]
RewriteRule ([^_]*)_(.*) $1-$2 [NC,C]
RewriteRule ([^_]*)_(.*) $1-$2 [NC,C]
RewriteRule ([^_]*)_(.*) $1-$2 [NC,C]
RewriteRule ([^_]*)_(.*) $1-$2 [NC,C]
RewriteRule ([^_]*)_(.*) $1-$2 [NC,L]
I know, it’s messy and maybe I’m missing something critical. Perhaps someone could enlighten, fix it, or make it prettier?
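One thing that may be relevant here: the [C] flag chains a rule to the one after it (if a chained rule fails to match, the rest of the chain is skipped), and none of the rules above carries an R flag, so whatever they do stays an internal rewrite rather than a redirect the browser or a searchbot can see. A commonly used alternative is the [N] flag, which restarts rule processing with the rewritten path. The following is an untested sketch, not a verified fix; the /archives/ prefix assumes the file lives in that directory, as described in the comment.

```apache
RewriteEngine On

# Two or more underscores left: swap the first one for a dash, then
# start the rules over with the result ([N] = next round).
RewriteRule ^([^_]*)_([^_]*_.*)$ $1-$2 [N]

# Exactly one underscore left: swap it and send a permanent redirect.
# The absolute /archives/ prefix is an assumption about where this file lives.
RewriteRule ^([^_]*)_([^_]*)$ /archives/$1-$2 [R=301,L]
```

For 2002/12/25/donkey_kong_pants_fiends, the first rule would fire twice (donkey-kong_pants_fiends, then donkey-kong-pants_fiends) and the second rule would redirect to /archives/2002/12/25/donkey-kong-pants-fiends. Because each round removes one underscore, the loop ends on its own.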
February 16th, 2007 at 5:27 pm
Widdershins, incense and blood. Really, it’s the only way.
February 16th, 2007 at 8:57 pm
Nothing to do with the above. Just wanted to share this information:
April 12th is NATIONAL LICORICE DAY: Gear up!
http://tinyurl.com/dcctb
Sorry for the interruption. — Hope your Apache Need gets solved.
February 16th, 2007 at 9:46 pm
Jon,
I can’t test this myself since I am also on LiquidWeb and wouldn’t you know it, RewriteMap is disabled on the box I’m hosted on too.
However, I wanted to point out that while etherdust’s solution _should_ work, it won’t if you just copy and paste the code because WordPress escaped some of the characters in the Perl script.
The while condition should read:
while (<>)
That is, with an empty pair of angle brackets inside the parentheses. Not sure if you are a Perl guy, but that tells the script to read from STDIN. If you just copy and paste the version with empty parentheses, it’ll just loop forever, for obvious reasons.
Also you asked about $_, which is just the Perl syntax for “the last line that was read in” (from STDIN, in this case). So it is just printing out the post-translated URL.
February 17th, 2007 at 3:39 pm
Jon Deal, Akismet spammed your comments. I left the most recent one. That’s exactly what I did. I had to build real files though. Dynamic wouldn’t cut it for whatever reasons.