TagCrowd is a web application for visualizing word frequencies in any text by creating what is popularly known as a word cloud, text cloud or tag cloud.
It was created by Daniel Steinbock while a PhD student at Stanford University.
A word cloud is a beautiful, informative image that communicates much in a single glance. TagCrowd specializes in making word clouds easy to read, analyze and compare, for a variety of useful purposes that continues to grow.
There are three ways of entering text into TagCrowd to generate a word cloud: paste the text directly, enter the URL of a web page, or upload a plain text file.
After providing your text source, hit the Visualize! button to see the result with the default options. You now have several options available to tweak your cloud into a form you're happy with.
Choose the written language of the text you are visualizing. TagCrowd maintains a list of common words (called a 'stop list') for each supported language so these words won't show up in your word cloud. If you wish to turn this function off, select 'none' for the language. If there are additional words you want to remove from your word cloud, see "Don't show these words" below.
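As a rough illustration of what a stop list does, here is a minimal Python sketch. The STOP_LISTS table and the remove_stop_words function are hypothetical names made up for this example, and the word lists are tiny placeholders rather than the lists TagCrowd actually uses.

```python
import re

# Hypothetical, deliberately tiny stop lists for illustration only.
STOP_LISTS = {
    "english": {"the", "a", "of", "and", "it", "is"},
    "none": set(),  # choosing 'none' turns stop-word filtering off
}

def remove_stop_words(text, language="english"):
    """Return the lowercased words of `text` that are not on the stop list."""
    stop_words = STOP_LISTS[language]
    words = re.findall(r"[a-z']+", text.lower())
    return [word for word in words if word not in stop_words]
```

With the language set to "none", every word is kept, which mirrors the effect of selecting 'none' in the options.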
TagCrowd works by counting the frequency of every word in your source text and visualizing the top N of these as a word cloud. You set the value of N with this field. The appropriate value will depend on your application and the size of your source text. In general, it's better to use smaller clouds for shorter source texts and larger clouds for longer source texts.
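The counting step can be sketched in a few lines of Python. This is only an assumption about how such a count might be done, not TagCrowd's own code; top_words and max_words are illustrative names, and the simple tokenizer only handles unaccented lowercase letters.

```python
import re
from collections import Counter

def top_words(text, max_words):
    """Count every word in `text` and return the `max_words` most frequent."""
    words = re.findall(r"[a-z']+", text.lower())
    # most_common sorts by count, highest first
    return Counter(words).most_common(max_words)
```

For a short text, a small max_words keeps rare words from crowding the result; a longer text can support a larger value.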
Words must appear at least this number of times in order to show up in the word cloud. For example, if you enter '2' for the minimum frequency, only words that appear at least twice in your text source will be included in your word cloud.
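The minimum frequency setting amounts to a simple filter on the word counts. A minimal sketch, with apply_minimum_frequency as a hypothetical helper name:

```python
from collections import Counter

def apply_minimum_frequency(counts, minimum):
    """Keep only the words that appear at least `minimum` times."""
    return {word: n for word, n in counts.items() if n >= minimum}

counts = Counter("tea tea milk tea milk sugar".split())
frequent = apply_minimum_frequency(counts, 2)  # 'sugar' appears once and is dropped
```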
Marking 'yes' here will display the actual number of times each word appears in your text source.
TagCrowd uses the standard Porter Stemming Algorithm to detect and combine similar words. For example, the words 'teachers', 'teaching' and 'teach' will all be combined so your word cloud is less redundant. The most frequent of the variants is chosen to represent them all. In the case of a tie, the shorter variant is used.
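The grouping and tie-breaking rule can be sketched as follows. Note that crude_stem below is NOT the Porter algorithm; it is a deliberately simple suffix stripper standing in for it so the example stays short. The function names are hypothetical as well.

```python
from collections import Counter, defaultdict

# Crude stand-in for a real stemmer (an assumption for this example, not Porter).
SUFFIXES = ("ers", "ing", "er", "s")

def crude_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def group_similar_words(words):
    """Combine words that share a stem under one representative word."""
    variants = defaultdict(Counter)
    for word in words:
        variants[crude_stem(word)][word] += 1
    grouped = {}
    for forms in variants.values():
        # Most frequent variant wins; on a tie, the shorter variant wins.
        representative = min(forms, key=lambda w: (-forms[w], len(w)))
        grouped[representative] = sum(forms.values())
    return grouped
```

Given 'teachers', 'teaching', 'teach' and a second 'teaching', the group is shown as 'teaching' with a combined count of 4.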
You may see words in your cloud that are irrelevant. Type those words here to remove them from the cloud.
TagCrowd word clouds are free to use under a Creative Commons Attribution License. That means you can use them for commercial and non-commercial purposes as long as you attribute TagCrowd.com with a name and a link.
If you find TagCrowd valuable in your business and would like to ensure its continued existence, you can buy the developer a cup of coffee.
Because this is a personal project, I don't currently have the resources or server capacity to support an external API.
The text you enter into TagCrowd is not stored anywhere, nor is it ever shared with anyone. You are the only one who will ever see what you put in and get out of TagCrowd. That said, data transfers on unencrypted channels are by nature insecure. So you can have about as much confidence in the privacy of your data with TagCrowd as you do with sending unencrypted email.
Save your word cloud as PDF by clicking on the 'Save as...' button under the cloud, then choosing PDF. You'll get a download link.
To make an image of the word cloud, take a screenshot. On Windows, the Print Screen key copies the screen to the clipboard so you can paste it into an image editor. On a Mac, just hit Command-Shift-4 and drag a box around the cloud you want to save; you'll save a screenshot image to your Desktop. If you use Linux, you probably already know how to do this.
You’ll find a “CUSTOMIZE” section near the top of the HTML Embed code where you can customize some of the CSS styles to suit the style of your webpage. Custom styles include font and font size, overall cloud size, margins, padding, borders and background color.
In the future we’ll introduce controls for changing the color and fonts without having to edit CSS.
You can of course edit the CSS and HTML that lie outside the customize section, but this is an advanced change and we can’t provide support for it.
Use a tilde character ~ between words you want to keep together. To do this, run a find & replace on the original text file and insert a ~ (tilde character) between the words you want to group. For example: replace 'New York' with 'New~York', 'word cloud' with 'word~cloud', etc. The resulting cloud will have non-breaking spaces inserted for the tilde.
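If you would rather script the find & replace, a minimal Python sketch might look like this. The helper names join_phrases and display_form are made up for the example; display_form only illustrates the non-breaking space substitution described above and is not TagCrowd's code.

```python
def join_phrases(text, phrases):
    """Replace the spaces inside each phrase with tildes."""
    for phrase in phrases:
        text = text.replace(phrase, phrase.replace(" ", "~"))
    return text

def display_form(word):
    """Show a grouped word with non-breaking spaces in place of tildes."""
    return word.replace("~", "\u00a0")

prepared = join_phrases("I love New York and word cloud art", ["New York", "word cloud"])
```

The replacement is case-sensitive, so 'new york' in lowercase would need its own entry in the list.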
TagCrowd uses language-specific lists of common words to keep word clouds relevant. You can always disable this feature by setting the Language to 'none'. To prevent particular words from being removed, add a ~ (tilde character) to the end of any word you want to preserve.
For example, 'IT' is an acronym for 'information technology', but it's also the common English word 'it'. Replace all occurrences of 'IT' with 'IT~' to keep it in the cloud. Just be careful you're only marking the words you actually want to keep. In this example, don't mark the common word 'it' as well in your text.
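One way to mark only the exact word you mean is a case-sensitive, whole-word replacement. Here is a minimal sketch using Python's re module; protect_word is a hypothetical helper name.

```python
import re

def protect_word(text, word):
    """Append ~ to case-sensitive, whole-word matches of `word`."""
    # (?!~) skips occurrences that are already marked.
    pattern = rf"\b{re.escape(word)}\b(?!~)"
    return re.sub(pattern, lambda match: match.group(0) + "~", text)

marked = protect_word("IT staff said it works", "IT")
```

Because the match is case-sensitive, the common word 'it' is left unmarked, and running the function twice does not add a second tilde.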
TagCrowd is Unicode compliant and offers basic support for many languages. Choose the language of your source text in the Options section of the TagCrowd application. "Basic support" currently means languages based on the Latin alphabet (i.e. most of Europe), and all accented characters are converted to plain characters. For example, the characters é, ä, ç become e, a, c. Since this is the first international version of TagCrowd, there will certainly be some bugs. Please let us know if you find any.
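The accent conversion can be approximated with Python's standard unicodedata module. This is a sketch of one common technique (decompose, then drop the combining marks), not necessarily the method TagCrowd uses; fold_accents is an illustrative name.

```python
import unicodedata

def fold_accents(text):
    """Convert accented characters to their plain base characters."""
    # NFD splits 'é' into 'e' plus a separate combining accent mark.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

Letters that have no decomposition, such as ø, pass through this sketch unchanged.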
TagCrowd can only support languages for which we have a list of common words, known as a 'stop list' or 'stop words'. Currently supported languages include Czech, Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Norwegian, Polish, Portuguese, Romanian, Spanish and Swedish. (If you view the stop words for your language, be sure to use UTF-8 text encoding.)
TagCrowd reads the web document at the address you provide and extracts the text, ignoring the HTML code that surrounds it. Because of the wide variation in web standards compliance, this process isn't 100% reliable. Note: TagCrowd only crawls a single web document, not the whole domain. Also, TagCrowd can't discriminate between content text and extraneous text like navigation menus, so if you want a more focused cloud, try copy/pasting the exact text you want to visualize.
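To see why navigation text ends up in the cloud, here is a minimal text extractor built on Python's standard html.parser module. It is a sketch under the assumption that extraction simply keeps all visible text; TextExtractor and extract_text are hypothetical names, not TagCrowd's implementation.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect text content, skipping <script> and <style> blocks."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())

def extract_text(html):
    """Return the text of an HTML document with the markup removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

A page containing a menu item 'Home' and a paragraph yields both pieces of text, since a parser like this has no way to tell content from navigation.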
Upload the file to TagCrowd or link to it wherever it's posted on the web. Only plain text files are accepted (or HTML files if providing a web page URL). The maximum file size is 5 megabytes. The maximum for pasted text is about 500 kilobytes, though that limit depends more on your browser and computer than on TagCrowd's servers.
After you generate your word cloud, click the 'Save as...' button underneath. Click the option for 'HTML embed' and a box of HTML code will appear. Copy and paste the code into any web page that allows in-line stylesheets. Feel free to modify the colors and font sizes in the stylesheet to customize your cloud, as long as you keep the reference to TagCrowd. You can also add URLs into the links so the words in the cloud link somewhere. This code should work with most blog software -- but not all. If it doesn't, try moving the style information from the code into your blog's external stylesheet. We're working on a way to improve compatibility.
TagCrowd software only runs on TagCrowd.com. You are free to use our HTML and PDF embedding options, and save screenshots. The application itself is not currently for sale.