Loading...

Regex builder

Submitted by: etlgfx
Updated: ; Posted:

One of the changes I made to my parser is how smileys are handled. Whereas the old parser had a character tokenizer which made writing the logic for parsing smileys tedious to say the least, I felt it would be worth it to attempt to build an auto regex builder script.

Basically I wanted to be able to convert this:

array(
        ':)' => 'happy',
        ':(' => 'sad',
        ':/' => 'dunno',
        ':@' => 'yelling',
...

Into:

/(\:(\(|\)|\*|/|@|D|O|P)|;\)|\>\:(\(|D|\[)|B\)|O\:\))/

Since we're dealing with a set of tiny string which, most of which with similar prefixes, my original parsing logic looked a lot like a tree, or state machine if you will. Seeing as a regular expression is basically just a fancy hard to read state machine, I figured it would be a pretty easy way to represent my smileys.

So here's what we ended up with:

  1. Order smiley array so that all prefixes are in order

  2. Create a recursive array definition of the smileys

  3. Collapse the recursive array back into a regex

The intermediate representation looks like so:

array (
  '\\:' => 
  array (
    '\\(' => 
    array (),
    '\\)' => 
    array (),
    '\\*' => 
    array (),
  ),
  ';' => 
  array (
    '\\)' => 
    array (),
  ),
...

You can see the tree structure taking place. Note the extra backslashes though, these are for escaping all PCRE special characters.

And now, for your enjoyment, the full algorithm.

/** @class RegexUtil
 */
class RegexUtil {
	/**
	 * Convert an array of strings to a regular expression by finding common
	 * prefixes
	 * 
	 * @param $array
	 *
	 * @see self::flattenRegexArray()
	 *
	 * @returns string regular expression enclosed in hash marks '#'
	 */
	public static function arrayToRegex(array $array) {
		sort($array);

		$regex = array();

		foreach ($array as $match) {

			$current =& $regex;

			for ($i = 0; $i < strlen($match); $i++) {
				$c = preg_quote($match[$i], '#');

				if (isset($current[$c])) {
					$current =& $current[$c];
				}
				else {
					$current[$c] = array();
					$current =& $current[$c];
				}
			}
		}

		return '#'. self::flattenRegexArray($regex) .'#';
	}

	/**
	 * Flatten an array tree representation of a regular expression into
	 * a regex string
	 *
	 * @param $regex array tree representation of a regular expression
	 *
	 * @see self::arrayToRegex()
	 *
	 * @returns string regular expression
	 */
	private static function flattenRegexArray(array $regex) {
		if (!$regex) {
			$re = '';
		}
		else if (count($regex) == 1) {
			reset($regex);
			$re = key($regex) . self::flattenRegexArray(current($regex));
		}
		else {
			$re = '(';
			foreach ($regex as $char => $nextRegex) {
				$re .= $char;

				if ($nextRegex)
					$re .= self::flattenRegexArray($nextRegex);

				$re .= '|';
			}
			$re = substr($re, 0, -1);
			$re .= ')';
		}

		return $re;
	}
}

Now I don't pretend that this is by any means the most efficient way of doing this, but it's something that's relatively simple and took only an hour or two to come up with.

Questions? Thoughts?

3:50a Jan 21st, '11

discount ugg boots calf height boot

ugg classic cardy

features genuine twin-face sheepskin. All boots

ugg kids

the Classic Collection feature a soft foam insole

ugg boots sale

covered with genuine sheepskin and flexible outsole designed for refreshing comfort with every step. With a reinforced heel, raw seams, and signature heel

ugg boots cheap

label are all aplenty of the

ugg boots bailey button

precision craftsmanship of discount ugg boots . The discount ugg boots is built to be durable yet comfortable, rugged yet soft, and classic yet stylish. This boots, a signature UGG silhouette, features genuine Twin face sheepskin for incredible comfort.

ugg boots knit

The discount ugg boots

ugg boots kids

is a ust have for any guy's wardrobe. This boot contains materials that are not waterproof. Experience the tension of the day

ugg boots classic tall

release when you slip your feet into a pair of cheap discount ugg boots . The very original boots were disbursed by an Australian

ugg boots classic cardy

manufacturing company named UGG. These boots were released to the public known as Australian Sheepskin Boots.

ugg classic tall

UGG also makes sheepskin slippers and other footwear.

2:56a Feb 9th, '11
7:49a Apr 1st, '11