Definitions

from Wiktionary, Creative Commons Attribution/Share-Alike License.

  • noun (information retrieval) A mathematical approximation to the importance of a particular word in a given piece of text.
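
The definition above does not give a formula. One common formulation is sketched here as an illustration; the symbols t (term), d (document), D (collection), N (number of documents in D), and f (raw count) are notation assumed for this sketch, and other weighting variants exist.

```latex
\mathrm{tf}(t, d) = \frac{f_{t,d}}{\sum_{t'} f_{t',d}}, \qquad
\mathrm{idf}(t, D) = \log \frac{N}{\left|\{\, d' \in D : t \in d' \,\}\right|}, \qquad
\text{tf-idf}(t, d, D) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t, D)
```

Under this variant, a term that appears in every document has idf = log 1 = 0, so it carries no weight, while a term that is frequent in one document but rare in the collection scores highest.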

Etymologies

from Wiktionary, Creative Commons Attribution/Share-Alike License

Abbreviation of "term frequency-inverse document frequency".

Examples

  • If something like “bebo” was never searched for before this year and then reasonably high in the top searches this year, TF-IDF would rank it very highly.

    Google Top Searches: Based on Everything and Nothing Michael Arrington 2005

  • If you have a collection of documents (or popular search terms for different years), TF-IDF gives you the terms that help to best differentiate one document from the rest.

    Google Top Searches: Based on Everything and Nothing Michael Arrington 2005

  • This looks a lot like TF-IDF which is commonly used in data mining to uncover terms that help to best differentiate one “document” from another, which seems to me to be what something like Zeitgeist should be going for.

    Google Top Searches: Based on Everything and Nothing Michael Arrington 2005

  • Search Basics. Goal: identify documents that are similar to the input query. Lucene uses a modified Vector Space Model (VSM): Boolean + VSM, TF-IDF. The words in the document and the query each define a vector in an n-dimensional space, where w = weight assigned to term, and Sim(q1, d1) = cos Θ. In Lucene, the boolean approach restricts what documents to score.

    Recently Uploaded Slideshows 2009

  • Traditional web page search does IR / TF-IDF / page rank stuff pretty well on the Web at large, but if you want to do a specific type of search, for restaurants, images, etc., web search isn't necessarily the best option.

    Recently Uploaded Slideshows 2009
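
The examples above describe TF-IDF as picking out the terms that best differentiate one document from the rest, and mention cosine similarity between weighted term vectors. A minimal Python sketch of both ideas follows. It assumes one common weighting variant (relative term frequency times log(N / document frequency)); the function names `tf_idf` and `cosine` and the sample search lists are hypothetical, invented for illustration, and this is not the scoring used by Lucene or any other particular library.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: weight = (count / document length) * log(N / df).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def cosine(a, b):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical "top searches" for two years, one string per year.
searches_by_year = [
    "weather news maps music",
    "weather news maps bebo bebo",
]

weights = tf_idf(searches_by_year)
# The term that best distinguishes the second year from the first.
top_term = max(weights[1], key=weights[1].get)
print(top_term)
```

Terms shared by both years ("weather", "news", "maps") get a weight of zero because log(2 / 2) = 0, so only the terms unique to a year contribute to its vector. Real systems usually smooth the idf term and normalize tokens more carefully than `str.split` does.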
