Summary:
The technological advancements in computer networks and the substantial reduction of their production costs have caused a massive explosion of digitally stored information.
In particular, textual information is becoming increasingly available in electronic form.
Finding text documents dealing with a certain topic is not a simple task. Users need tools to sift through non-relevant information and retrieve only pieces of information relevant to their needs.
Traditional information retrieval (IR) methods based on search term frequency appear to have reached their limits, and newer ranking methods based on hyperlink information are not applicable to unlinked documents.
The retrieval of documents based on the positions of search terms in a document has the potential of yielding improvements, because other terms in the environment where a search term appears (i.e. the neighborhood) are considered. That is to say, the grammatical type, position and frequency of other words help to clarify and specify the meaning of a given search term.
However, the additional analysis makes position-based methods slower than methods based on term frequency, and storing term positions requires more space. These drawbacks directly affect the most user-critical phase of the retrieval process, namely query evaluation, which explains why contemporary retrieval systems make little use of positional information.
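The storage trade-off described above can be illustrated with a minimal sketch. It is not taken from the thesis: the function names `build_indexes` and `min_distance` and the two toy documents are assumptions made for illustration only. A frequency index keeps one count per term and document, whereas a positional index keeps every position, which costs more space but allows the neighborhood of a search term to be examined.

```python
from collections import defaultdict

def build_indexes(docs):
    """Build a frequency index and a positional index from {doc_id: text}.

    Hypothetical helper for illustration; tokenization is a plain
    lowercase whitespace split.
    """
    freq_index = defaultdict(dict)  # term -> {doc_id: count}
    pos_index = defaultdict(dict)   # term -> {doc_id: [positions]}
    for doc_id, text in docs.items():
        for position, term in enumerate(text.lower().split()):
            pos_index[term].setdefault(doc_id, []).append(position)
            freq_index[term][doc_id] = freq_index[term].get(doc_id, 0) + 1
    return freq_index, pos_index

def min_distance(pos_index, doc_id, term_a, term_b):
    """Smallest token distance between two terms in one document, or None."""
    a = pos_index.get(term_a, {}).get(doc_id, [])
    b = pos_index.get(term_b, {}).get(doc_id, [])
    if not a or not b:
        return None
    return min(abs(i - j) for i in a for j in b)

# Toy collection (assumed data, not from the thesis).
docs = {
    "d1": "jaguar speed on the open road",
    "d2": "the jaguar hunts at night and its speed is legendary",
}
freq_index, pos_index = build_indexes(docs)

# Both documents contain each query term exactly once, so term frequency
# alone cannot separate them; the positional index can.
print(freq_index["jaguar"], freq_index["speed"])
print(min_distance(pos_index, "d1", "jaguar", "speed"))
print(min_distance(pos_index, "d2", "jaguar", "speed"))
```

In this toy case the two query terms are adjacent in the first document and six tokens apart in the second, a distinction that is only available because the positions were stored.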
This thesis explores the possibility of extending traditional information retrieval systems with positional information in an efficient manner that permits us to optimize the retrieval performance by handling term positions at query evaluation time.
To this end, several abstract representations of term positions are investigated that allow positional data to be stored and processed efficiently. In the Gauss model, methods from descriptive statistics are used to estimate term positional information, because they reduce the influence of outliers and irregularities in the data. The Fourier model represents positional information by means of Fourier series. In the Hilbert model, methods from functional analysis are used to provide reliable estimates of term positions and simple mathematical operators for handling positional data.
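The general idea of replacing a raw list of positions with a compact, fixed-size description can be sketched as follows. This is only an illustration under stated assumptions, not the formulation used in the thesis: positions are mapped to the unit interval, the summary statistics are a plain mean and standard deviation, and the Fourier part computes the first coefficients of the empirical distribution of positions. All function names are hypothetical.

```python
import math

def normalized_positions(positions, doc_length):
    """Map token positions to the unit interval (assumed normalization)."""
    return [(p + 0.5) / doc_length for p in positions]

def position_summary(positions, doc_length):
    """Mean and standard deviation of the normalized positions."""
    xs = normalized_positions(positions, doc_length)
    mean = sum(xs) / len(xs)
    variance = sum((x - mean) ** 2 for x in xs) / len(xs)
    return mean, math.sqrt(variance)

def fourier_coefficients(positions, doc_length, order):
    """First `order` Fourier coefficient pairs (a_k, b_k) of the empirical
    position distribution on [0, 1), with
    a_k = (2/n) * sum(cos(2*pi*k*x_i)) and b_k = (2/n) * sum(sin(2*pi*k*x_i)).
    """
    xs = normalized_positions(positions, doc_length)
    n = len(xs)
    coefficients = []
    for k in range(1, order + 1):
        a_k = 2.0 / n * sum(math.cos(2 * math.pi * k * x) for x in xs)
        b_k = 2.0 / n * sum(math.sin(2 * math.pi * k * x) for x in xs)
        coefficients.append((a_k, b_k))
    return coefficients

def coefficient_overlap(c1, c2):
    """Dot product of two coefficient lists; an illustrative similarity only."""
    return sum(a1 * a2 + b1 * b2 for (a1, b1), (a2, b2) in zip(c1, c2))

# A term occurring at tokens 0 and 1 of a two-token document.
mean, spread = position_summary([0, 1], 2)

# Two terms clustered near the start, one near the end, of a 100-token document.
early_a = fourier_coefficients([2, 5, 9], 100, order=3)
early_b = fourier_coefficients([3, 6, 11], 100, order=3)
late = fourier_coefficients([52, 55, 59], 100, order=3)
print(coefficient_overlap(early_a, early_b) > coefficient_overlap(early_a, late))
```

Whatever the number of occurrences, each term is described by a fixed number of values, so terms can be compared with simple arithmetic on these values at query time instead of by scanning position lists.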
The proposed models are evaluated experimentally using standard resources of the IR research community (Text Retrieval Conference, TREC). The experiments show that using positional information can improve the quality of search results, and the proposed models outperform the state-of-the-art retrieval utilities they were compared against.
The term position models open new possibilities for analyzing and handling textual data. For instance, document clustering and the compression of positional data based on these models are promising topics for future research.