ng" /> (6)

Here, the height of the tree represents the total time needed for performing calculation. As height of the tree represents the total number of steps required for calculation and each step is performed in a unit time. Hence we can say that the time complexity for loop 3 in part 2 of the proposed algorithm is O(log2m) using O(m) processors. Here m is length of the input word. Also it is easy to see that the remaining steps in part 2 will take unit time. Hence making the overall search time complexity of the algorithm in part 2 as O(log2m).

In case of loop 1, the operation performed in the inner loop will only take unit time. Which will make the overall time complexity of the loop 1 as O(1) using O(MV) processors [22] . Here M is length of the longest word present in the language set and V is total number of alphabets in the language set.

Like loop 3, loop 2 in part 1 of the algorithm will also take O(log2m) time using O(m) processors [22] . Now considering the time complexity of the loop 2 in part 1 and acknowledging the fact that other operations that are outside the loop 2 in part 1 will take only unit time. We can state that the overall time complexity for building the datastructure in part 1 is also O(log2m).

Now, for the deletion algorithm in part 3. The loop 4 will have the time complexity of O(log2m) using O(m) processors, as we have shown above for similar cases. The remaining steps of the part 3 will be executed in O(1) time. Hence for this part also the overall time complexity will be O(log2m).

Hence, we can state that the proposed Prime Box algorithm can be solved in O(log2m) time using at least O(m/log2m) processors [22] . Now it is also clear that by using this parallel approach we have improved the time complexity of the algorithm to O(log2m). Which in the case of sequential approach would have been O(m).

5. Proving Correctness of the Search Algorithm

We will use mathematical Induction to prove the correctness of this search algorithm. We have to start from loop 3 in part 2 of the proposed algorithm. First we have to prove the precondition, that is searching a word of unit length. When the length of the input word is one. The for loop will run for only one time and then terminate. In the run only one step will be evaluated in unit time. In which two operands the Unique_Prime_Index with value one and the prime number from the prime box will be multiplied with each other to get the unique array index for the word. Hence the overall time taken will be of the order O(1) for the loop 3 in this case. Also each step in the algorithm that are outside loop 3 will be evaluated in unit time. Hence we can state that we can search a word of unit length in O(1) time. This can again be verified, as we know that the overall search time complexity is O(log2m). The value of operands here are two. So the overall time complexity for this precondition will be O(log22) or O(1).

Now we will prove that the above search algorithm will also work for some word of length (K + 1). As the word has length of (K + 1). So the number of leaf nodes in the conceptual balanced binary tree will be (K + 1)/2, because multiplication is binary associative operation. Now using expression (2) and (3) the expression (1) can be modified and rewritten as,

or,

or,

or, (7)

Using expression (1) and (7), We conclude that,

or, (8)

Now taking log of both sides of the expression (8),

or,

or,

or, (9)

The for loop in loop 3 of part 2 will finally terminate after iterating for (K + 1) times, as (K + 1) is the length of the given input. Now, concluding from expression (9), we can say that time complexity in this case is log2(K + 1). We also know that here length of the word m = (K + 1). So again we have proven that the time complexity for the loop 3 in part 2 is O(log2m). Further it is easy to see that the remaining steps of the search algorithm will be finished in O(1) time. Resulting in the overall time complexity of the search algorithm as O(log2m). So we can see that a word of (K + 1) length can be searched in log2(m) time.

Hence by mathematical induction it is proved that for any given word of length m, we can search it in a time of O(log2m).

6. Applying Parallel Prime Box Algorithm

The Proposed algorithm has three parts. First is building the data structure, where insertion of words takes place, second is the search algorithm and third part is associated with the deletion of word from the datastructure. We can also update any word from the data structure if required. Actually this datastructure updating requires two functions working consecutively. First doing the deletion and then inserting the new word. We will see the working of the algorithm step by step starting with the insertion of the word “an”.

Before we start first in loop 1 the Prime box data structure is created that is a Two dimensional m x 26 array of unique prime numbers stored in it.

6.1. Building the Datastructure: Inserting the Word “an”

Now we enter loop 2. Here a prime number for each alphabet of the word is retrieved from the prime box datastructure and multiplied together to obtain the Unique_Prime_Index value. Then at that index in Location array we mark its entry true.

Here, using the prime box datastructure mentioned in Table 1. We will get the prime values for “a” as 2 and for “n” as 173. Then these values will be multiplied to get a Unique_Prime_Index value for the word “an” as 346. Then we set its entry true by marking Location [346] = 1 (See Figure 3).

6.2. Search Algorithm: Searching the Word “an”

We enter loop 3. Here a prime number for each alphabet of the word is retrieved from the prime box datastructure and multiplied together to obtain the Unique_Prime_Index value. Then we check at that index in Location array whether it is true or not. If it is true then we return that word is present otherwise we return false.

Here, using the prime box datastructure mentioned in Table 1. We will get the prime values for “a” as 2 and for “n” as 173. Then these values will be multiplied to get a Unique_Prime_Index value for the word “an” as 346. Then we check the entry at Location [346]. If it is “1” then return that the word is present otherwise return false (See Figure 4).

Figure 3. Building the datastructure: Inserting the word “an”.

Figure 4. Search Algorithm: Searching the word “an”.

6.3. Deletion Algorithm: Deleting the Word “an”

We enter loop 4. Here a prime number for each alphabet of the word is retrieved from the prime box datastructure and multiplied together to obtain the Unique_Prime_Index value. Then at that index in Location array we mark its entry False.

Here, using the prime box datastructure mentioned in Table 1. We will get the prime values for “a” as 2 and for “n” as 173. Then these values will be multiplied to get a Unique_Prime_Index value for the word “an” as 346. Then we set its entry false by marking Location [346] = 0 (See Figure 5).

7. Results

We have implemented the parallel approaches using C# and executed the code to get the data for different words with varying word length from the English dictionary. The column 'word' has the searched words and another column 'search time' has the respective listing of the time taken by the algorithm to search those words. Here we have not mentioned any fixed database size or total number of words in the dictionary, as our novel approach is independent of the total size.

With our current implementation in C#. It is only possible to lookup a word till word size of four. As increasing the size further will create the index values that are outside the current range of the arrays defined in C#.

We can see the data obtained after executing the code for Prime Box algorithm in Table 2 (See Table 2). The plot for the search time is logarithmic graph, with another curved line for C*log2m as its upper bound (See Figure 6). This confirms that the time complexity for the proposed parallel algorithm is O(log2m).

Eventually after analyzing the graph for the insertion time (See Figure 7) and deletion time (See Figure 8) of the algorithm. It is clear that the graph for C*log2(m) is the upper bound over both the graphs of insertion time and deletion time. This justifies their O(log2m) time complexity.

Further we have implemented the Trie search algorithm in C#. The search time values for Trie algorithm together with the search time values for Prime Box algorithm are presented in Table 3 (See Table 3). We have considered Trie algorithm for comparison with the proposed algorithm as Trie algorithm is also free from the total number of words present in the dictionary and hence is also suitable for dynamic dictionary search. In the plots for both the algorithms (See Figure 9). The Trie algorithm has a linear graph with positive slope justifying its O(m) time complexity and Prime Box algorithm has a logarithmic curve justifying its O(log2m) time complexity. It is clearly visible from the graphs in Figure 9 (See Figure 9) that the proposed Prime Box algorithm is more efficient than Trie search algorithm because of its smaller search time.

Figure 5. Deletion Algorithm: Deleting the word “an”.

Figure 6. Graph for search time of prime box Algorithm: Search time is O(lg m).

Figure 7. Graph for insertion time of prime box Algorithm: Insertion time is O (lg m).

Figure 8. Graph for deletion time of prime box Algorithm: Deletion time is O (lg m).

Figure 9. Comparing search time graphs for TRIE and prime box parallel search Algorithm.

Table 2. Search times for both sequential and parallel approaches.

Table 3. Comparing search time of TRIE with prime box algorithm.

8. Conclusions

Analyzing data and graphs provided, it is inferred that it is possible to lookup a word in a dynamic dictionary in O(log2m) time using the prime box parallel search algorithm. Unlike any other algorithm, this approach is independent of the total number of words present in the dictionary and hence well suits the need for searching a dynamic dictionary with increasing size.

Our main emphasis while designing this algorithm was to minimize the time complexity. Hence as future work, still there is scope for further improvement in the algorithm in terms of its space complexity.

Cite this paper
Pandey, A. , Wolde-Gabriel, B. and Jarso, E. (2016) Prime Box Parallel Search Algorithm: Searching Dynamic Dictionary in O(lg m) Time. Journal of Computer and Communications, 4, 134-145. doi: 10.4236/jcc.2016.44012.
References

[1]   Cormen, T.H., Leiserson, C.E., Rivest, R.L. and Stein, C. (2001) Introduction to Algorithms. 2nd Edition, The MIT Press, London.

[2]   Maass, M.G. (2006) Average-Case Analysis of Approximate Trie Search. Algorithmica, 46, 469-491.
http://dx.doi.org/10.1007/s00453-006-0126-4

[3]   Fredkin, E. (1960) Trie Memory. Communications of the ACM, 3, 490-499.
http://dx.doi.org/10.1145/367390.367400

[4]   Knuth, D.E. (1973) The Art of Computer Programming, Vol. III: Sorting and Searching.

[5]   Aoe, J.I. and Fujikawa, M. (1987) An Efficient Representation of Hierarchical Semantic Primitives—An Aid to Machine Translation Systems. Proceedings of Second International Conference on Supercomputing, 361-370.

[6]   Peterson, J.L. (1980) Computer Programs for Spelling Correction: An Experiment in Program Design. Springer Berlin Heidelberg, 1-129.
http://dx.doi.org/10.1007/3-540-10259-0_1

[7]   Aoe, J.I. (1989) An Efficient Implementation of Static String Pattern Matching Machines. IEEE Transactions on Software Engineering, 15, 1010-1016. http://dx.doi.org/10.1109/32.31357

[8]   Aoe, J., Yamamoto, Y. and Shimada, R. (1984) A Method for Improving String Pattern Matching Machines. IEEE Transactions on Software Engineering, 10, 116-120.
http://dx.doi.org/10.1109/TSE.1984.5010205

[9]   Stefanakis, G. (2009) Design and Implementation of a Range Trie for Address Lookup. Doctoral Dissertation, TU Delft, Delft University of Technology, Delft.
www.repository.tudelft.nl/assets/uuid:a1490c8b-35a3-4f41-a433-0e0d0899c833/thesis.pdf

[10]   Shang, H. (1995) Trie Methods for Text and Spatial Data on Secondary Storage.
www.cs.mcgill.ca/~tim/cv/theses/shang.ps.gz

[11]   Zhao, X. (2000) Trie Methods for Structured Data on Secondary Storage.
www.cs.mcgill.ca/~tim/cv/theses/zhao.ps.gz

[12]   De La Briandais, R. (1959) File Searching Using Variable Length Keys. Proceedings of Western Joint Computer Conference, ACM. 295-298.
http://dx.doi.org/10.1145/1457838.1457895

[13]   Aoe, J.I., Morimoto, K. and Sato, T. (1992) An Efficient Implementation of Trie Structures. Software: Practice and Experience, 22, 695-721.
http://dx.doi.org/10.1002/spe.4380220902

[14]   Comer, D.E. and Shen, V.Y. (1979) Hash-Binary Search: A Fast Technique for Searching an English Spelling Dictionary.

[15]   Pagh, R. (2001) Low Redundancy in Static Dictionaries with Constant Query Time. SIAM Journal on Computing, 31, 353-363.
http://dx.doi.org/10.1137/S0097539700369909

[16]   Belazzougui, D., Boldi, P., Pagh, R. and Vigna, S. (2009) Monotone Minimal Perfect Hashing: Searching a Sorted Table with O (1) Accesses. Proceedings of the 12th Annual ACM-SIAM Symposium on Discrete Algorithms, 785-794.
http://dx.doi.org/10.1137/1.9781611973068.86

[17]   Belazzougui, D., Boldi, P., Pagh, R. and Vigna, S. (2011) Theory and Practice of Monotone Minimal Perfect Hashing. Journal of Experimental Algorithmics (JEA), 16, 3-2.
http://dx.doi.org/10.1145/1963190.2025378

[18]   Botelho, F.C., Pagh, R. and Ziviani, N. (2007) Simple and Space-Efficient Minimal Perfect Hash Functions, Algorithms and Data Structures. Springer Berlin Heidelberg, 139-150.

[19]   Amir, A., Farach, M., Galil, Z., Giancarlo, R. and Park, K. (1994) Dynamic Dictionary Matching. Journal of Computer and System Sciences, 49, 208-222.
http://dx.doi.org/10.1016/S0022-0000(05)80047-9

[20]   Khancome, C. and Boonjing, V. (2014) A New Linear-Time Dynamic Dictionary Matching Algorithm. Computing and Informatics, 32, 897-923.

[21]   Chatterjee, S. and Prins, J. (2005) COMP 203: Parallel and Distributed Computing. PRAM Algorithms. Course Notes.
www.cs.yale.edu/homes/arvind/cs424/readings/pram.pdf

[22]   Xavier, C. and Iyengar, S.S. (1998) Introduction to Parallel Algorithms, Vol. 1. John Wiley & Sons, Hoboken.

 
 
Top