Sunday, October 13, 2013

Anagram: Erewhon is nowhere

Requirements: PHP 5.4+ and Apache web server - both installed and running

In this article, we will learn how to write a code to find anagrams. As a quick introduction, anagrams are the results of a specific word's reordering. For instance, "erewhon" and "nowhere" are anagrams. The original word or phrase is known as the subject of the anagram[1].
 After showing their basic implementation in PHP, we'll take a look at their real-world effects and implement some of them.
To start, create a folder in the root directory called "anagram" (you can call it whatever you want). This is where all the files created throughout this tutorial must be saved.
Then inside that folder, create an input text file called "test.txt" (you can call it whatever you want). This is the test containing the anagrams to evaluate and sort. For this tutorial, we'll be using the following sample lines:

pot,pot,opt,godi,dogi,cat,tac
pro,tro,franc,ranfc,rop,water,treaw
root,toor,tro,ort,lamp,foo,malp
kindle,xbox,playstation,gamecube,meucagbe,boxx,ledink,linked,stationplay
book,cook,hook,took,obok,ocok,kooh,off

Insert those lines in the text file "test.txt". Our program will traverse that file per line, and for each line, it will group anagrams together. The file in the program in charge of that job will be called "anagram.class.php" and will be instantiated in the file "index.php" to process the file "test.txt". For readability performance, we use the same color marker as geany.
------------------------------------------------------------------------------------------------------------------------------
anagram.class.php

<?php 
 namespace Anagram
 {
   class Program 
   {
     public function main()
     {
       try
       {
          $file = 'test.txt';
          $lines = @file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
          foreach ($lines as $line)
            $this->cluster($line);
       }
       catch(Exception $e){
         echo "Couldn't read the text file";
         echo "Caught exception: ", $e->getMessage(),"\n";
       }
     
     }
     
     public function cluster($line)
     {
      $regexs = array();   //- to store the regular expressions(regex) or matching patterns of the subjects
      $a_group = array();  //- to store the set of anagrams 
      
      echo "Initial input: ".$line."<br>";
      $words = explode(",", $line);
      usort($words, array('Anagram\Program','cmp')); //- we 1st sort the input array by increasing string length to facilitate the search
      
      //- Initial values of the regex set and the anagram set
      $pattern = $words[0];  
      $regexs[$pattern] = $this -> regExp($pattern); 
      $a_group[$pattern] = $pattern;
      
        for ($i = 1; $i<count($words); $i++) {
       
            $found = 0;
           
            foreach ($regexs as $key =>$regex){   
                               
 //- we simply look through the existing set of regular expressions to see if the current value match one of those. If yes, we add it to the corresponding anagram set
                  if (preg_match($regex, $words[$i]) && (strlen($key) == strlen($words[$i])))
                        { $a_group[$key] .= ", ".$words[$i]; $found = 1; }
            }   
           
//- After the above traversal, if there is no match with existing set of regular expressions (thus with the existing set of anagrams), then we add the current value's regex to the set of regexs and to the set of anagrams
             if($found == 0) {
                  $pattern = $words[$i];
                  $tmp = $this -> regExp($words[$i]); 
                  if(!in_array($tmp, $regexs)) $regexs[$pattern] = $tmp;
                  $a_group[$pattern] =  $pattern;
             }

        }          
        //- Finally, we display the results       
       foreach ($a_group as $k => $anagram_set)
          echo "<u>Anagrams of ".$k."</u>: ".$anagram_set."<br>";
        echo "<br>";
     }
     
     private static function cmp($a, $b){
       return strlen($b)- strlen($a);
     }
     
     private static function regExp($string){
       $regchars = preg_split('//', $string, -1, PREG_SPLIT_NO_EMPTY); //- we split the pattern to find its corresponding matching patterns
       $regex = "/";
       foreach($regchars as $char) $regex .= "(?=.*".$char.")";  
       return $regex."/";
     }
   }
 }

?>

index.php
<?php 
include
"anagram.class.php";
use Anagram\Program as Groupname;

//-  after including the main piece of cake (anagram.class.php), we instantiate it as the variable "test" and called its main function to do the thing
$test = new Groupname;
$test ->main();
?>
-----------------------------------------------------------------------------------------------------------------------------------

View the results|Download source

At the first glance, some might think that anagram lie in the domain of intelligence testing, with no real-world application. Well, this is not true. Anagram applications are commonly found in games and security.
- In games, we have name anagrammers which are applications that find the anagrammed meaning of people's name. For example:  For "Tom Cruise", we have "So I'm cuter". Also, anagrams are also applied in the famous Scrabble.
- In security, they are used as permutation ciphers. Basically, the input text to protect is splitted into segments of size e (a random number) and the letters within each segment are permuted according to that number.
 Other applications of anagrams include : Establishment of priority, Puzzles[1], Internet Anagram Server to find character's names in movies/music/novels and synonyms for authors or novels' titles [2].

 In our next article, we will be implementing the anagram server and a permutation cipher, and furthermore, explain how anagrams are being incorporated in these systems.

References:
1) http://en.wikipedia.org/wiki/Anagram
2) http://www.wordsmith.org/anagram/literature.html
3) http://teachnet.com/lessonplans/language-arts/anagrams-beyond-computer-solutions/


Don't forget to comment and share this article, add me on G+ and subscribe to the blog's RSS feeds. See you in the next post.

No comments:

Post a Comment