Amazon.com
Of all the tasks programmers are asked to perform, storing, compressing, and retrieving information are some of the most challenging--and critical to many applications. Managing Gigabytes: Compressing and Indexing Documents and Images is a treasure trove of theory, practical illustration, and general discussion of this fascinating technical subject.
Ian Witten, Alistair Moffat, and Timothy Bell have updated their original work with this even more impressive second edition. This version adds recent techniques such as block-sorting, new indexing techniques, new lossless compression strategies, and many other elements to the mix. In short, this work is a comprehensive summary of text and image compression, indexing, and querying techniques. The history of relevant algorithm development is woven well with a practical discussion of challenges, pitfalls, and specific solutions.
This title is a textbook-style exposition on the topic, with its information organized very clearly into topics such as compression, indexing, and so forth. In addition to diagrams and example text transformations, the authors use "pseudo-code" to present algorithms in a language-independent manner wherever possible. They also supplement the reading with mg--their own implementation of the techniques. The mg C language source code is freely available on the Web.
Alone, this book is an impressive collection of information. Nevertheless, the authors list numerous titles for further reading in selected topics. Whether you're in the midst of application development and need solutions fast or are merely curious about how top-notch information management is done, this hardcover is an excellent investment. --Stephen W. Plain
Topics covered: Text compression models, including Huffman, LZW, and their variants; trends in information management; index creation and compression; image compression; performance issues; and overall system implementation.
Book Description
In this fully updated second edition of the highly acclaimed Managing Gigabytes, authors Witten, Moffat, and Bell continue to provide unparalleled coverage of state-of-the-art techniques for compressing and indexing data. Whatever your field, if you work with large quantities of information, this book is essential reading--an authoritative theoretical resource and a practical guide to meeting the toughest storage and access challenges. It covers the latest developments in compression and indexing and their application on the Web and in digital libraries. It also details dozens of powerful techniques supported by mg, the authors' own system for compressing, storing, and retrieving text, images, and textual images. mg's source code is freely available on the Web.
* Up-to-date coverage of new text compression algorithms such as block sorting, approximate arithmetic coding, and fast Huffman coding
* New sections on content-based index compression and distributed querying, with 2 new data structures for fast indexing
* New coverage of image coding, including descriptions of de facto standards in use on the Web (GIF and PNG), information on CALIC, the new proposed JPEG Lossless standard, and JBIG2
* New information on the Internet and WWW, digital libraries, web search engines, and agent-based retrieval
* Accompanied by a public domain system called MG, which is a fully worked-out operational example of the advanced techniques developed and explained in the book
* New appendix on an existing digital library system that uses the MG software
Customer Reviews:
One of the best books on search engineering.......2007-04-20
It has been 8 years since it was published, and I can see it is still one of the best in the IR field. Without many long, magical equations, it is not hard for an ordinary reader to pick up. There are mainly two parts in the book. The first part is compression; most of it is just an introduction to the principles, since it does not make sense for the reader to invent or implement an algorithm. The second part is indexing (plus some querying), which I highly recommend because it is "practical".
The authors are smart people who can get things done; search for "mg" to find their website and "mg4j" for the ported Java implementation.
A Comprehensive Introduction To Text Retrieval Systems.......2005-07-30
A wonderful feature of this book is its practicality across various topics, including compression algorithms and theory, document and imaging systems, and information retrieval. Of personal interest to me, the authors not only cover a vast range of theory but also present it with simple, common-sense logic.
There are several examples that break down complex processes into simple, easy-to-understand logic, and the pages provide a smooth flow through the structured topics. Well organised, well presented and fully informative.
Truly an ideal book. This serves as a superior text for students studying document and imaging systems, processing, and information and multimedia retrieval. Beautiful!!!
Just on a personal note, it would be great to see some emphasis on web mining applications in future editions.
Great Book on Information Retrieval.......2004-05-03
Managing Gigabytes is the best book out there on information retrieval. If you're interested in implementing your own IR system, there's nothing available that comes close to this book. But the book is good not just because it's the only one out there: the writing is excellent, the algorithms are presented clearly and explained well, and the coverage is thorough. Additionally, the coverage of compression algorithms is the best I've found in any book. All algorithms and pseudo-code in the book are presented clearly enough that any competent programmer should be able to implement them. If all else fails, however, the free downloadable source code for the mg system can fill in any gaps.
All in all, this is the best computer science book I've purchased in years. I wish all CS books were written like this one: it doesn't skimp on the theory or on the implementation details.
The Wonderful Thing Is: It's the Only One.......2001-12-21
This is the only book there is that will actually teach you how to build an information retrieval system (aka search engine). It discusses all the algorithms and tradeoffs, and comes with free downloadable source code to experiment with. Some of the material is standard, but covered in more implementation detail here than anywhere else. Some of the material is novel: you won't find better coverage of compression unless you hand-assemble twenty research papers, and reverse-engineer them to figure out how they're implemented. But with "Managing Gigabytes", it's all here. (Although, after a particularly invigorating discussion of how to string together a bunch of techniques to compress their corpus and save a couple hundred MB, I did a check and found you could buy 512MB of RAM for less than the cost of the book. Knowledge is Power, but sometimes a little cash is more powerful.) The only negative is that this book is not called "Managing Terabytes", as the first edition promised/threatened it might be. RAM and disk are cheap, but not that cheap, and for now terabytes (and sometimes petabytes) are managed only by NASA, Google, and a few others. I can't wait to see the third edition!
Very clear, but misses some key real-world issues.......2001-08-15
As others have said, MG is a good introductory text for Information Retrieval. However, I think it spends a little too much time on compression techniques and lacks a good discussion of incremental or on-line indexing. The book tends to assume that the set of texts to be searched is static; if new documents can be added or old ones deleted, the whole problem becomes much harder and many of MG's techniques are no longer relevant. That said, I strongly look forward to Managing Terabytes (if it ever appears).
Book Description
This will be the third edition of the highly successful Text Information Retrieval Systems. The book's purpose is to teach people who will be searching or designing text retrieval systems how the systems work. For designers, it covers problems they will face and reviews currently available solutions to provide a basis for more advanced study. For the searcher its purpose is to describe why such systems work as they do. The book is primarily about computer-based retrieval systems, but the principles apply to nonmechanized ones as well. The book covers the nature of information, how it is organized for use by a computer, how search functions are carried out, and some of the theory underlying these functions. As well, it discusses the interaction between user and system and how retrieved items, users, and complete systems are evaluated. A limited knowledge of mathematics and of computing is assumed.
This third edition will be updated to include coverage of the WWW and current search engines. In many cases, examples of non-web searching will be replaced with web-based illustrations. Coverage of interfaces, various features available to assist searchers, and areas in which search assistance is not available will also be covered. In addition, the book will have a web dimension which will include relevant material available online, to be used in conjunction with the text.
* Follow-up to the award-winning 2nd edition
* Focuses on computer-based systems, but the basic principles can be applied to any information-seeking context
Customer Reviews:
not [much] about web searching.......2007-06-03
The book takes the reader through a quick summary of the history of text IRS. The readership is mostly assumed to be librarians, whose task is to search for information. Much of the book has a traditional feel, describing a discipline that strives to be precise and orderly, most famously through the imposition of a cataloging system such as the Dewey or Library of Congress methods.
The book also deals with recent changes, most notably the Web. There is some consideration of the problem of classifying web sites and web pages. But this is not a text on web search engines per se. That has proved to be a vast, economically important field; it's just not covered much here.
Important ideas that are also germane to readers involved in web searching are still explained, such as having an ontology of well-defined terms or a consistent metadata schema, as with the Dublin Core.
This book reminds me of texts in the early 90s that covered SGML, mostly for publishers, just as the SGML-inspired HTML started taking off with the new Web. The SGML books were correct, but limited in their audience, while a much larger world of HTML was emerging. Likewise here. The ideas bubbling around the Dublin Core and ontologies are really not being driven by traditional printed texts, or even by the databases that exist but are not on the web.
Book Description
Integrative Document and Content Management: Strategies for Exploiting Enterprise Knowledge blends theory and practice to provide practical knowledge and guidelines to enterprises wishing to understand the importance of managing documents to their operations along with presentation of document content to facilitate business planning and operations support. This book gives extensive pointers to those who propose to embark upon the implementation of integrated document management systems.
Customer Reviews:
Integrative Document & Content Management book review.......2005-08-08
Covers all the relevant aspects of DMS and more. Very comprehensive.
Book Description
The extensively revised and completely updated second edition of this popular textbook provides LIS practitioners and students with a vital guide to the organization of information. After a broad overview of the concept and its role in human endeavors, Taylor proceeds to a detailed and insightful discussion of such basic retrieval tools as bibliographies, catalogs, indexes, finding aids, registers, databases, major bibliographic utilities, and other organizing entities. After tracing the development of the organization of recorded information in Western civilization from 2000 B.C.E. to the present, the author addresses topics that include encoding standards (MARC, SGML, and various DTDs), metadata (description, access, and access control), verbal subject analysis including controlled vocabularies and ontologies, classification theory and methodology, arrangement and display, and system design.
Customer Reviews:
Solid introductory textbook .......2006-08-22
I used this textbook for a core cataloging course for my LIS degree. It provides a good foundation for different information systems. The author includes important discussion of Web technologies, like XML, and how they are used in conjunction with traditional library encoding systems like MARC. It includes sections on systems design that are not deep, but appropriate given the introductory nature of this book. Unlike some other reviewers, I had no trouble with acronyms, and other definitions: the book has a thorough index, in addition to a glossary.
Very Technical and Difficult to Understand.......2005-12-14
This book was the required textbook for a Master's Degree course in library science which I recently completed. I wish the instructor would have chosen a different book. This book is full of technical jargon that is very difficult to understand. Having worked in a library for the past ten years, I still found the terminology used in this book to be very hard to follow and comprehend. The examples that are given are difficult to understand, and the text itself is extremely difficult to read and digest.
For an introductory textbook, this book is very difficult to read and understand. I got very little out of reading it, and I sometimes found myself having more questions after reading the book than I did before. If you are a student that is required to use this book, I can only hope that you get more from it than I did.
Disorganized Information........2005-11-07
This text was required reading for an introductory cataloging class. I have found it to be difficult to navigate as an introductory piece. The writing is subpar and it is clear that this is a second edition work.
For example, the author will introduce an acronym and fail to identify what it stands for, or you will find materials that should be restricted to later chapters incorporated into earlier chapters, which creates a scattered organization feel.
Her work in Wynar's Introduction to Cataloging and Classification is of significantly higher quality and is more apropos for an introductory level class than this text.
All in all, I would not recommend this text for an introduction to knowledge organization. Hope you never have to work with it in class... and if you do, pick up supplementary materials; you are going to need them.
library school text.......2005-10-09
This is a decent book. Not that exciting, but required reading for me.
Excellent textbook!.......2005-09-21
This is an excellent textbook. It is well organized and clearly written, explaining difficult concepts in understandable language for students. It covers the history of information organization in libraries, archives, and museums, as well as tools such as inventories, bibliographies, catalogs and indexes, and methods and standards of codification; reviews the history and development of the internet; describes different kinds of databases; and discusses metadata as a theoretical and practical concept, reviewing different kinds of metadata schemes. I highly recommend it.
Book Description
The Text REtrieval Conference (TREC), a yearly workshop hosted by the US government's National Institute of Standards and Technology, provides the infrastructure necessary for large-scale evaluation of text retrieval methodologies. With the goal of accelerating research in this area, TREC created the first large test collections of full-text documents and standardized retrieval evaluation. The impact has been significant; since TREC's beginning in 1992, retrieval effectiveness has approximately doubled. TREC has built a variety of large test collections, including collections for such specialized retrieval tasks as cross-language retrieval and retrieval of speech. Moreover, TREC has accelerated the transfer of research ideas into commercial systems, as demonstrated in the number of retrieval techniques developed in TREC that are now used in Web search engines.
This book provides a comprehensive review of TREC research, summarizing the variety of TREC results, documenting the best practices in experimental information retrieval, and suggesting areas for further research. The first part of the book describes TREC's history, test collections, and retrieval methodology. Next, the book provides "track" reports -- describing the evaluations of specific tasks, including routing and filtering, interactive retrieval, and retrieving noisy text. The final part of the book offers perspectives on TREC from such participants as Microsoft Research, University of Massachusetts, Cornell University, University of Waterloo, City University of New York, and IBM. The book will be of interest to researchers in information retrieval and related technologies, including natural language processing.
Looking Good in Print: Deluxe Cd-Rom Edition (Looking Good in Print)
Roger C. Parker and Carrie Beverly
Manufacturer: Ventana Communications Group
ProductGroup: Book
Binding: Paperback
ASIN: 1566044715
Customer Reviews:
Too bad Ventana got hold of it!.......2003-10-03
This book is a solid presentation on print design principles and techniques with great examples -- it has been since Parker's first edition. The CD-ROM is falsely advertised as providing "examples, templates, tips and shareware." It has some marginal shareware and overdone advertising for Ventana, period. STRONGLY recommend keeping the older editions.
Good book, faulty CD-ROM.......2001-12-28
I found this book unused in one of our offices, apparently purchased a couple of years ago. The content was good to great. Many of the ideas are common knowledge to an experienced designer, but it's good to have them all put together in one place.
Unfortunately the CD-ROM contained only its Windows files and all the Mac content was missing. I wonder if this is the case with all of the first printing? This is not new to me. I have found other CD-ROMs that claim to have Mac files as well as PC files, but somewhere during the production process the Mac stuff gets lost. From my studio, I produce for both platforms and find it hard to believe that more care isn't taken by some publishers.
So then, in conclusion... the book is a solid piece of work. Hopefully the cross-platform glitch was fixed in the new edition.
Excellent book of all the printing ins and outs.......1998-09-03
This book is a must-have for all designers. Everyone from expert to novice can find the information needed to make sure a project looks as good in print as it does on the screen.
Automatic Text Processing: The Transformation Analysis and Retrieval of Information by Computer (Addison-Wesley series in computer science)
Gerard Salton
Manufacturer: Addison-Wesley Pub (Sd)
ProductGroup: Book
Binding: Hardcover
ASIN: 0201122278
Customer Reviews:
This is the best book I read in this field.......2005-03-20
Compared with other similar books, this is the best book I have read in this field. You have to read every sentence in this book very carefully; then you get the idea of how to do it or how to implement it. If the author could update it with his new research results, it would be even better.
Very readable book with good coverage of topics.......2002-11-05
It's a pity that most of the author's books seem to be out of print, since he's a very clear writer. This book isn't necessarily accessible to the complete beginner--the intended audience is advanced computer science students, computational linguists/natural language processing people, and library/information science students--but if you fall into one of those categories, you should find this a very readable book. The coverage of topics is heavily slanted towards practical applications--editing and formatting, compression, encryption, file access, information retrieval, indexing, abstracting, spell checking, syntax and style checking--rather than towards theoretical background. That's not a bad thing, though--for many of those topics, this might be the most accessible resource you'll find.
The book was published in the late 80's, and hence is a bit dated by now--for instance, the statistical revolution in NLP pretty much isn't covered (Bayes doesn't even show up in the index). However, that in no way detracts from the value of what IS covered.
Multimedia Information Retrieval: Content-Based Information Retrieval from Large Text and Audio Databases (The Springer International Series in Engineering and Computer Science)
Peter Schäuble
Manufacturer: Springer
ProductGroup: Book
Binding: Hardcover
ASIN: 0792398998
Book Description
Multimedia Information Retrieval: Content-Based Information Retrieval from Large Text and Audio Databases addresses the future need for sophisticated search techniques that will be required to find relevant information in large digital data repositories, such as digital libraries and other multimedia databases. Because of the dramatically increasing amount of multimedia data available, there is a growing need for new search techniques that provide not only fewer bits, but also the most relevant bits, to those searching for multimedia digital data. This book serves to bridge the gap between classic ranking of text documents and modern information retrieval where composite multimedia documents are searched for relevant information.
Multimedia Information Retrieval: Content-Based Information Retrieval from Large Text and Audio Databases begins to pave the way for speech retrieval; only recently has the search for information in speech recordings become feasible. This book provides the necessary introduction to speech recognition while discussing probabilistic retrieval and text retrieval, key topics in classic information retrieval. The book then discusses speech retrieval, which is even more challenging than retrieving text documents because word boundaries are difficult to detect, and recognition errors affect the retrieval effectiveness. This book also addresses the problem of integrating information retrieval and database functions, since there is an increasing need for retrieving information from frequently changing data collections which are organized and managed by a database system.
Multimedia Information Retrieval: Content-Based Information Retrieval from Large Text and Audio Databases serves as an excellent reference source and may be used as a text for advanced courses on the topic.
Portable Document Format Reference Manual (APL)
Tim Bienz, Richard Cohn, and Adobe Systems Inc.
Manufacturer: Addison-Wesley (C)
ProductGroup: Book
Binding: Paperback
ASIN: 0201626284
Customer Reviews:
Absolutely not up to date!.......1998-04-30
While in structure and content up to par with other Adobe books, this book should be taken off the Amazon shelves! It is completely out of date, and you can download the newest version of the PDF specification from the Adobe web site -- free of charge!
Book Description
This book constitutes the thoroughly refereed postproceedings of the 6th Workshop of the Cross-Language Evaluation Forum, CLEF 2005, held in Vienna, Austria in September 2005.
The 111 revised papers presented together with an introduction were carefully reviewed and selected for inclusion in the book. The papers are organized in topical sections on multilingual textual document retrieval, cross-language and more, monolingual experiments, domain-specific information retrieval, interactive cross-language information retrieval, multiple language question answering, cross-language retrieval in image collections, cross-language speech retrieval, multilingual Web track, cross-language geographical retrieval, and evaluation issues.