from Wiktionary, Creative Commons Attribution/Share-Alike License.
- noun information retrieval A mathematical approximation to the
importance of a particular word in a given piece of text.
If something like “bebo” was never searched for before this year and was then reasonably high in the top searches this year, TF-IDF would rank it very highly.
If you have a collection of documents (or popular search terms for different years), TF-IDF gives you the terms that help to best differentiate one document from the rest.
This looks a lot like TF-IDF which is commonly used in data mining to uncover terms that help to best differentiate one “document” from another, which seems to me to be what something like Zeitgeist should be going for.
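The idea in the quotations above can be sketched in a few lines of Python. This is a minimal illustration, not any particular library's implementation: it assumes term frequency is the raw count divided by document length and that idf = log(N / df), which is only one of several common weighting variants. The function name `tf_idf` and the yearly search data are made up for the example.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per document.

    docs is a list of token lists. Assumes tf = count / document length
    and idf = log(N / df); other weighting variants exist.
    """
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

# Hypothetical "top searches" per year; the data is invented for illustration.
searches_by_year = [
    ["weather", "news", "maps", "weather"],
    ["weather", "news", "maps", "news"],
    ["weather", "news", "bebo", "bebo"],
]
scores = tf_idf(searches_by_year)
# The term that best distinguishes the last year from the others.
top_term = max(scores[2], key=scores[2].get)
```

Terms that occur in every year, such as "weather" here, get an idf of log(1) = 0 and drop out, while a term that appears only in the last year is ranked first for that year.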
Search Basics • Goal: Identify documents that are similar to the input query • Lucene uses a modified Vector Space Model (VSM) - Boolean + VSM - TF-IDF - The words in the document and the query each define a vector (dj and q) in an n-dimensional space, where w = weight assigned to term - Sim(q1, d1) = cos Θ, where Θ is the angle between the vectors q1 and d1 - In Lucene, the boolean approach restricts what documents to score
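The cosine measure named in the slide text above can be sketched as follows. This shows the plain vector space model only; it is not Lucene's scoring function, which the quotation itself describes as modified. The function name, the sparse-dict representation and the term weights are assumptions made for the example.

```python
import math

def cosine_similarity(q, d):
    """Cosine of the angle between two sparse term-weight vectors (dicts)."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(w * d.get(term, 0.0) for term, w in q.items())
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    norm_d = math.sqrt(sum(w * w for w in d.values()))
    if norm_q == 0.0 or norm_d == 0.0:
        return 0.0  # Convention chosen here: an empty vector matches nothing.
    return dot / (norm_q * norm_d)

# Hypothetical term weights (for instance TF-IDF scores); values are illustrative.
q1 = {"lucene": 1.0, "search": 1.0}
d1 = {"lucene": 2.0, "search": 2.0}
d2 = {"restaurant": 3.0}
sim_same_direction = cosine_similarity(q1, d1)
sim_no_overlap = cosine_similarity(q1, d2)
```

Because the measure depends only on the angle, d1 scores as a near-perfect match for q1 even though its weights are twice as large, and a document sharing no terms with the query scores zero.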
Traditional web page search does IR / TF-IDF / page rank stuff pretty well on the Web at large, but if you want to do a specific type of search, for restaurants, images, etc., web search isn't necessarily the best option.