Adaptive Indexing of Documents Using Genetic Algorithms and Relevance Feedback
Abstract
Background:
This paper investigates the problem of retrieving the documents that satisfy a user's information need. The main aim of an information retrieval system is to retrieve all and only the relevant documents.
Materials and Methods:
A genetic algorithm is used to adapt the document indexes according to relevance judgments collected from users. A genetic algorithm is a powerful search tool that applies Darwinian principles and evolutionary techniques to explore complex spaces, which makes it well suited to adapting document indexes. Sampling is performed using three methods: roulette wheel, roulette wheel with elitism, and stochastic universal sampling. The fitness function is computed using Jaccard's coefficient, which measures the closeness between a query and a document index.
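As an illustration of the fitness measure, the following minimal sketch computes Jaccard's coefficient between a query and a candidate document index, both treated as sets of keywords. The function names `jaccard` and `fitness`, the set-based representation, and the averaging over several relevant queries are assumptions made for this example; the abstract does not specify how the paper aggregates judgments.

```python
def jaccard(query_terms, index_terms):
    """Jaccard's coefficient |Q & D| / |Q | D| for two keyword collections."""
    q, d = set(query_terms), set(index_terms)
    union = q | d
    if not union:
        return 0.0  # assumption: two empty descriptions are scored as 0
    return len(q & d) / len(union)


def fitness(index_terms, relevant_queries):
    """Mean Jaccard similarity between a candidate index and the queries
    for which users judged the document relevant (assumed aggregation)."""
    if not relevant_queries:
        return 0.0
    scores = [jaccard(q, index_terms) for q in relevant_queries]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    candidate = ["genetic", "algorithm"]
    queries = [["genetic", "search"], ["genetic", "algorithm"]]
    print(fitness(candidate, queries))
```

A candidate index that shares more keywords with the relevant queries, and carries fewer unrelated keywords, scores closer to 1.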
Results:
The results show that the adapted descriptions are more efficient and closer to the needs of the population of users of the information retrieval system. Of the three sampling methods, stochastic universal sampling gave the best results.
Conclusion:
The keywords used to describe the content of documents have statistical dependencies among them, and it is difficult to accommodate these dependencies in a retrieval system. A genetic algorithm can take these dependencies into account as it operates. According to the schema theorem and the building block hypothesis [10], the fittest schemata are propagated from generation to generation, where they are sampled, recombined, mutated and resampled to form strings of potentially higher worth. Another advantage of the genetic algorithm is its reliance on the feedback provided by users of the retrieval system to adapt document descriptions and produce a new set of descriptions closer to the needs of the user population.
Three fitness-proportionate selection variations were tested: roulette wheel sampling, roulette wheel with elitism, and stochastic universal sampling. The results indicate that stochastic universal sampling is superior to the other two.
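The three selection variations named above can be sketched as follows. This is a generic textbook formulation under stated assumptions, not the paper's implementation: the function names, the `n_elite` parameter, and the requirement that fitness values be non-negative with a positive sum are all choices made for this example.

```python
import random


def roulette_wheel(population, fitnesses, n, rng=random):
    """Draw n individuals with replacement, each spin independent and
    proportional to fitness."""
    return rng.choices(population, weights=fitnesses, k=n)


def roulette_with_elitism(population, fitnesses, n, n_elite=1, rng=random):
    """Copy the n_elite fittest individuals unchanged, then fill the
    remaining slots by roulette wheel (n_elite is an assumed parameter)."""
    ranked = sorted(range(len(population)),
                    key=lambda i: fitnesses[i], reverse=True)
    elite = [population[i] for i in ranked[:n_elite]]
    return elite + roulette_wheel(population, fitnesses, n - len(elite), rng)


def stochastic_universal_sampling(population, fitnesses, n, rng=random):
    """Place n equally spaced pointers on the wheel after a single random
    offset, so each individual's count stays close to its expected value."""
    step = sum(fitnesses) / n
    start = rng.uniform(0, step)
    picks, i, cumulative = [], 0, fitnesses[0]
    for k in range(n):
        pointer = start + k * step
        # Advance to the wheel segment that contains this pointer.
        while cumulative < pointer and i < len(population) - 1:
            i += 1
            cumulative += fitnesses[i]
        picks.append(population[i])
    return picks


if __name__ == "__main__":
    pop = ["d1", "d2", "d3", "d4"]
    fit = [0.4, 0.3, 0.2, 0.1]
    print(roulette_wheel(pop, fit, 4))
    print(roulette_with_elitism(pop, fit, 4))
    print(stochastic_universal_sampling(pop, fit, 4))
```

Because stochastic universal sampling uses one random draw for all pointers, it has lower sampling variance than repeated independent spins of the roulette wheel, which is one commonly cited reason it may perform better.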
Article Details
This work is licensed under a Creative Commons Attribution 4.0 International License.