Definitions

from Wiktionary, Creative Commons Attribution/Share-Alike License

  • n. A mathematical approximation to the importance of a particular word in a given piece of text.
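
The definition above does not commit to a formula. As a hedged illustration, one common textbook formulation (an assumption here, not something the entry states) multiplies a term's frequency in a document by the logarithm of the inverse fraction of documents that contain the term. The symbols t (term), d (document), D (collection), and N (number of documents in D) are introduced only for this sketch:

```latex
\[
\operatorname{tfidf}(t, d, D) = \operatorname{tf}(t, d) \cdot \operatorname{idf}(t, D),
\qquad
\operatorname{idf}(t, D) = \log \frac{N}{\lvert \{\, d' \in D : t \in d' \,\} \rvert}
\]
```

Under this variant, a term that appears in every document gets idf = log 1 = 0, so it contributes nothing to the score, while a term confined to a single document gets the largest idf.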

Etymologies

from Wiktionary, Creative Commons Attribution/Share-Alike License

Abbreviation of "term frequency–inverse document frequency".

Examples

  • If something like “bebo” was never searched for before this year and then reasonably high in the top searches this year, TF-IDF would rank it very highly.

    Google Top Searches: Based on Everything and Nothing

  • If you have a collection of documents (or popular search terms for different years), TF-IDF gives you the terms that help to best differentiate one document from the rest.

    Google Top Searches: Based on Everything and Nothing

  • This looks a lot like TF-IDF, which is commonly used in data mining to uncover terms that help to best differentiate one “document” from another, which seems to me to be what something like Zeitgeist should be going for.

    Google Top Searches: Based on Everything and Nothing

  • Search Basics • Goal: identify documents that are similar to the input query • Lucene uses a modified Vector Space Model (VSM): Boolean + VSM, TF-IDF • The words in the document and the query each define a vector in an n-dimensional space • Sim(q1, d1) = cos Θ • w = weight assigned to term • In Lucene, the boolean approach restricts what documents to score

    Recently Uploaded Slideshows

  • Traditional web page search does IR / TF-IDF / page rank stuff pretty well on the Web at large, but if you want to do a specific type of search, for restaurants, images, etc., web search isn't necessarily the best option.

    Recently Uploaded Slideshows
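
The examples above describe two uses of TF-IDF: surfacing the terms that best set one document apart from the rest, and comparing weighted term vectors by the cosine of the angle between them. The following is a minimal Python sketch of both, assuming a plain tf × log(N/df) weighting. It is not Lucene's actual scoring, and the function names (`tf_idf`, `cosine`) and the per-year search data are hypothetical, invented for illustration only.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting: (count / document length) * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical "documents": made-up top search terms for three years.
searches = {
    "2005": "weather maps news lyrics".split(),
    "2006": "weather maps news bebo".split(),
    "2007": "weather maps news iphone".split(),
}
weights = dict(zip(searches, tf_idf(list(searches.values()))))

# The term that best differentiates 2006 from the other years.
top_2006 = max(weights["2006"], key=weights["2006"].get)
print(top_2006)
```

In this toy data, terms shared by every year receive a weight of zero, so only the term unique to a year can rank first for that year. Production systems typically smooth the idf term and normalize for document length, which this sketch leaves out.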
