474 words so far. Help us reach the 500 mark!
To my knowledge there is no way of extracting these words from any text automatically. This remains a manual job.
Remember: the compound must have a figurative meaning.
The word "poppywash" is a new entry on Wordnik (added by me today). Stephen Fry uses "poppywash" in the sense of "bullshit" on page 6 of the 2011 Penguin edition of "The Fry Chronicles". Urban Dictionary defines the word as "bullshit, old English slang for expressing anger".
In the age of artificial intelligence and the semantic web, collocations sorted by thematic category are useful to people who work in computer-assisted translation (CAT) and strive to improve machine translation tools based on semantically annotated statistical information.
Here is a word cloud of about 200 nouns that have proved to be the most productive constituents of noun-noun collocations. The bigger the print, the more frequently the noun enters into collocational relationships with others. With these 200 nouns you can compose more than 10,000 valid collocations. How can I tell? Because they were derived from more than 10,000 noun-noun collocations: http://www.wordle.net/show/wrdl/5929881/Most_active_noun-noun_collocates
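A rough sketch of how such productivity counts could be computed (not necessarily how this word cloud was made): count how often each noun appears as either half of a two-word collocation, then rank the nouns. It assumes one collocation per string, written as two space-separated nouns; the function name `collocate_frequencies` and the sample entries are hypothetical.

```python
from collections import Counter


def collocate_frequencies(collocations):
    """Count how often each noun occurs as either constituent of a
    two-word (noun-noun) collocation such as "word cloud".
    Hypothetical helper for illustration only."""
    counts = Counter()
    for collocation in collocations:
        parts = collocation.lower().split()
        if len(parts) != 2:
            continue  # skip anything that is not a two-word compound
        counts.update(parts)
    return counts


# Hypothetical sample; the real collection had more than 10,000 entries.
sample = ["word cloud", "cloud storage", "word list",
          "list price", "price war", "word order"]

freqs = collocate_frequencies(sample)
# The most productive nouns would get the biggest print in a word cloud.
for noun, n in freqs.most_common(3):
    print(noun, n)
```

In a word cloud tool, these counts would serve as the weights that decide font size.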
Thank you for your comments and corrections.
Answering your question:
The purpose of my investigation is to have fun with a language I am learning and love, and to understand why certain nouns are more active collocates than others. Beyond that, I collect words and collocations that I find useful in my work as an interpreter (for interpreters, collocations are a lot more interesting than words, because each collocation represents a neuronal connection).
No, VocabGrabber wasn't my source, only the tool with which I sorted a big collection (mostly hand-picked) of noun-noun collocations that I had put together over the years. Oh yes, another motivation: I love text mining as a pastime (I am crazy). However strange it might sound to you, I don't give a (what's the proper word here?) whether I am first or last on this list. That's the point I have always wanted to make. I just like Wordnik and playing around with it. Maybe too much for my health, but it has little to do with adrenaline. I hope you liked my reply.
(I have no idea why Wordnik made two files from one)
Apart from the fact that uploading frequency lists to Wordnik is a rather unimaginative practice, you should at least take care to clean the rubbish out of your lists before you do so.
Wordnik is “all the WORDS”, so uploading prime numbers in batches and frequency lists that include the words' ordinal numbers does not make much sense to me.
But I may be wrong. So please feel free to download the file I have cleaned for you and upload it again under your name (I will certainly delete it from my lists once you have done so). I think it’s just silly to compete for primacy by any means, “just because”.
I certainly have no intention of competing with you in any way, and whether or not I upload files in the future will have nothing to do with your existence.
Text mining and putting together lists should be fun or could be a job-related obligation, but it is certainly not something that should motivate anyone to fight. Feel free to use Wordnik and be the FIRST, if this is your passion, but stop uploading numbers. Or have you thought of going swimming instead?
Being a lexicographer definitely has something in common with being a thief, but even criminals of our sort may take pride in manually adding some value to the stuff we have stolen.
"curly quotation mark" and "forward slash" were simply "quotation mark" and "slash" before typewriters and computers introduced new breeds of these characters. Can you think of other traditional characters that now need an adjective in front because modern character sets have made their plain names obsolete?
This list contains the words that can be defined as follows:
- the person upon whom one coughs
- appalled over how much weight you have gained
- to give up all hope of ever having a flat stomach
- to attempt an explanation while drunk
- describes a condition in which you absentmindedly answer the door in your nightgown
- to walk with a lisp
- olive-flavored mouthwash
- emergency vehicle that picks you up after you are run over
- a rapidly receding hairline
- a humorous question on an exam
- the formal, dignified bearing adopted by proctologists
- a Rastafarian proctologist
- a person who sprinkles his conversation with Yiddishisms
- the belief that when you die, your soul flies up onto the roof and gets stuck there
- an opening in the front of boxer shorts worn by Jewish men
Reversible collocations. Some are perfect matches; for others you have to use your imagination. What they have in common is that, reversed, they are equally valid collocations, sometimes with a surprising meaning. Reversing collocations reveals a lot about their inherent semantic structure.
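Spotting reversible candidates in an existing collection can be automated, even if judging the surprising meaning cannot. A minimal sketch, assuming two-word collocations stored as space-separated strings: it reports only pairs where both orders already occur in the collection. The function name `reversible_pairs` and the sample data are hypothetical.

```python
def reversible_pairs(collocations):
    """Return (collocation, reversed collocation) pairs where both word
    orders occur in the collection, e.g. "boat house" / "house boat".
    Hypothetical helper for illustration only."""
    known = {c.lower() for c in collocations}
    pairs = []
    for c in sorted(known):
        first, _, second = c.partition(" ")
        if not second or " " in second:
            continue  # only two-word collocations are reversed here
        reversed_c = f"{second} {first}"
        # Report each pair once, with the alphabetically smaller form first.
        if reversed_c in known and c < reversed_c:
            pairs.append((c, reversed_c))
    return pairs


# Hypothetical sample collection.
sample = ["house boat", "boat house", "race horse", "horse race", "dog show"]
print(reversible_pairs(sample))
```

"dog show" is not reported because "show dog" is missing from the sample, although it is a valid collocation; the sketch can only confirm reversals that someone has already recorded.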
Using an adequate set of collocates (in this case 20 words) and a permutation generator, you can easily generate hundreds of collocations. Possibly the number of collocations in a given language is at least one, if not two, orders of magnitude larger than the number of words. If the number of distinct words in English is estimated to be in the millions, the number of English collocations must be in the tens to hundreds of millions. Is there a point in collecting and recording collocations? I am open to interesting arguments pro and con.
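The arithmetic behind "hundreds of collocations": with n collocates, a permutation generator yields n(n − 1) ordered two-word pairs, so 20 collocates give 20 × 19 = 380 candidates before any manual filtering. A minimal sketch using Python's `itertools.permutations`; the 20 collocates and the function name `candidate_collocations` are hypothetical, not the original set.

```python
from itertools import permutations


def candidate_collocations(collocates):
    """Generate every ordered noun-noun pair "A B" (A != B).
    These are candidates only: each still has to be checked by hand,
    since many pairs will not be real collocations."""
    return [f"{a} {b}" for a, b in permutations(collocates, 2)]


# Hypothetical set of 20 collocates.
collocates = ["water", "fire", "house", "boat", "book", "case", "light",
              "bulb", "door", "bell", "tea", "cup", "rain", "coat", "sun",
              "flower", "paper", "clip", "bird", "cage"]

candidates = candidate_collocations(collocates)
print(len(candidates))  # 20 * 19 = 380
```

Because permutations are ordered, both "house boat" and "boat house" appear, which is exactly where the reversible collocations come from.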
My hand-made glossary of all Odyssean terms, including our childhood favourites like "bright-eyed Athene", "wine-dark sea", "rosy-fingered Dawn", "long-suffering Odysseus"... Enjoy, and add more if any others spring to mind.
Thank you for the deep concern you have for my lists.
Actions speak louder than words. I have removed all unimaginative lists.
How true that lexicographers are thieves/kidnappers. Until the next wordhiccupnik... keep collocating.
Unless you are also an English major at ELTE, I'm not sure we know each other. Thanks for the greeting anyway :) I'm still just getting to know Wordnik, so I may turn to you with a question or two.