Information Retrieval: Data Structures and Algorithms
Format: PDF / Kindle (mobi) / ePub
Information retrieval is a sub-field of computer science that deals with the automated storage and retrieval of documents. Providing the latest information retrieval techniques, this guide discusses information retrieval data structures and algorithms, including implementations in C. Aimed at software engineers building systems with text processing components, it provides a descriptive and evaluative explanation of storage and retrieval systems, file structures, term and query operations, document operations and hardware. Contains techniques for handling inverted files, signature files, and file organizations for optical disks. Discusses such operations as lexical analysis and stoplists, stemming algorithms, thesaurus construction, and relevance feedback and other query modification techniques. Provides information on Boolean operations, hashing algorithms, ranking algorithms and clustering algorithms. In addition to being of interest to software engineering professionals, this book will be useful to information science and library science professionals who are interested in text retrieval technology.
Baeza-Yates and Larson 1989). This technique does not deteriorate update time. Adaptive splits: two bucket sizes of relative ratios 1/2 are used. However, splits are not symmetric (balanced), and they depend on the insertion point. This technique achieves 77 percent average storage.
Average Height of Trees in Digital Search and Dynamic Hashing." Inf. Proc. Letters, 13, 64-66. ULLMAN, J. 1972. "A Note on the Efficiency of Hashing Functions." JACM, 19(3), 569-75. WEINER, P. 1973. "Linear Pattern Matching Algorithm," in FOCS, vol. 14, pp. 1-11. WILLIAMS, F. 1959. "Handling Identifiers as Internal Symbols in Language Processors." CACM, 2(6), 21-24. YAO, A. 1979. "The Complexity of Pattern Matching for a Random String." SIAM J. Computing, 8, 368-87.
Archive. The smallest unit of registration in the CDFS organization is the file. The basic unit of organization in the CDFS is called a "transaction." A transaction results from the process of writing a complete group of files on the optical disk. All the files in a transaction group are placed on the disk immediately adjacent to the position of the previous transaction. Each individual file is stored contiguously. At the end of a transaction, an updated directory list for the entire file system.
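The transaction layout described in this excerpt can be illustrated with a small sketch. It is not the CDFS implementation: the class `WriteOnceDisk`, its method names, and the directory encoding are hypothetical, and it assumes that the updated directory list is appended after the files of each transaction (the excerpt breaks off before saying so).

```python
class WriteOnceDisk:
    """Toy model of an append-only (write-once) disk holding transactions.

    Hypothetical illustration only; names and layout are assumptions.
    """

    def __init__(self):
        self._data = bytearray()      # bytes already written; never overwritten
        self._directory = {}          # file name -> (offset, length) of latest copy
        self.directory_offsets = []   # where each directory snapshot starts

    def commit_transaction(self, files):
        """Write a group of files, then a directory list for the whole system."""
        for name, payload in files.items():
            offset = len(self._data)  # directly after whatever was written last
            self._data += payload     # each file is stored contiguously
            self._directory[name] = (offset, len(payload))
        # Assumed step: append an updated directory list covering every file.
        self.directory_offsets.append(len(self._data))
        self._data += repr(sorted(self._directory.items())).encode()

    def read(self, name):
        """Return the most recently written copy of a file."""
        offset, length = self._directory[name]
        return bytes(self._data[offset:offset + length])

    def raw(self):
        """Return everything written so far, for inspection."""
        return bytes(self._data)


disk = WriteOnceDisk()
disk.commit_transaction({"a.txt": b"hello", "b.txt": b"world"})
disk.commit_transaction({"a.txt": b"HELLO2"})
print(disk.read("a.txt"))   # latest copy of a.txt
print(disk.raw()[:10])      # the first transaction's bytes are still in place
```

In this sketch a rewritten file never replaces its earlier copy; the newer directory entry simply points at the later position, which is the behavior a write-once medium forces.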
Defect). An error such as this would require the entire time-stamp list to be copied and a scheme for identifying and locating alternate time-stamp lists on the disk. The utility of the Optical File Cabinet file system is also difficult to establish. As a replacement for magnetic disks it is an expensive substitute, as the write-once disks eventually fill up and must be replaced. The most appropriate application seems to be in fault-tolerant systems that require their entire file systems to be.
Recall), b = 1 (precision equals recall in importance), b = 2 (recall twice as important as precision). The data analysis showed that searchers truncated at or near the root morpheme boundaries. The mean number of characters that terms truncated by searchers varied from root boundaries in a positive direction was .825 with a standard deviation of .429. Deviations in a negative direction were even smaller with a mean of .035 and a standard deviation of .089. The tests of correlation revealed no.
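The weighting parameter b in this excerpt can be made concrete with a short sketch. It assumes the measure being described is van Rijsbergen's effectiveness measure E, where lower values are better; the excerpt does not show the formula, so both the formula and the function name `e_measure` are assumptions here.

```python
def e_measure(precision: float, recall: float, b: float = 1.0) -> float:
    """Assumed form: E = 1 - (1 + b^2) * P * R / (b^2 * P + R).

    b = 1 weights precision and recall equally; b = 2 weights recall
    twice as heavily as precision. Lower E means better retrieval.
    """
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    denominator = b * b * precision + recall
    if denominator == 0.0:
        return 1.0  # nothing relevant retrieved: worst possible score
    return 1.0 - (1.0 + b * b) * precision * recall / denominator


# With b = 2, a run with low recall scores worse than one with low precision.
low_precision = e_measure(precision=0.5, recall=1.0, b=2.0)
low_recall = e_measure(precision=1.0, recall=0.5, b=2.0)
print(low_precision < low_recall)
```

With b = 1 the two runs above would receive the same score, since swapping precision and recall leaves the formula unchanged.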