Algorithm to implement a word cloud like Wordle

Layout, Fonts, Typography, Tag Cloud, Wordle Word-Cloud

Layout Problem Overview


Context

My Questions

  • Is there an algorithm available that does what Wordle does?
  • If not, what are some alternatives that produce similar kinds of output?

Why I'm asking

  • just curious
  • want to learn

Layout Solutions


Solution 1 - Layout

I'm the creator of Wordle. Here's how Wordle actually works:

Count the words, throw away boring words, and sort by the count, descending. Keep the top N words for some N. Assign each word a font size proportional to its count. Generate a Java2D Shape for each word, using the Java2D API.
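A minimal Python sketch of this first step (counting, filtering, sorting, and sizing). It is not Wordle's Java code: the function name `weighted_words`, the tiny `STOP_WORDS` set, and the `max_font_size` parameter are assumptions made for illustration, and generating the glyph shapes is left out.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that"}


def weighted_words(text, top_n=100, max_font_size=96.0):
    """Return [(word, count, font_size)] sorted by count, descending."""
    words = re.findall(r"[a-z']+", text.lower())
    # Throw away the "boring" words before counting.
    counts = Counter(w for w in words if w not in STOP_WORDS)
    top = counts.most_common(top_n)  # already sorted by count, descending
    if not top:
        return []
    largest = top[0][1]
    # Font size proportional to count; the most frequent word gets max_font_size.
    return [(w, c, max_font_size * c / largest) for w, c in top]
```

Calling `weighted_words(some_text, top_n=50)` gives the list that the placement step would then walk in order.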

Each word "wants" to be somewhere, such as "at some random x position in the vertical center". In decreasing order of frequency, do this for each word:

place the word where it wants to be
while it intersects any of the previously placed words
    move it one step along an ever-increasing spiral
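The loop above can be sketched in Python as follows. This is a simplification under stated assumptions: each word is treated as an axis-aligned rectangle `(x, y, w, h)` rather than a real glyph shape, the spiral is an Archimedean one with made-up `step` and `turn` values, and the names `place_words` and `spiral_offsets` are hypothetical.

```python
import math
import random


def intersects(a, b):
    """Axis-aligned rectangles as (x, y, w, h); True if they overlap."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def spiral_offsets(step=2.0, turn=0.3):
    """Yield (dx, dy) offsets along an ever-growing Archimedean spiral."""
    theta = 0.0
    while True:
        r = step * theta
        yield r * math.cos(theta), r * math.sin(theta)
        theta += turn


def place_words(sizes, width=800.0, rng=None, max_steps=100_000):
    """sizes: [(w, h)] in decreasing order of frequency. Returns placed rects."""
    rng = rng or random.Random(0)
    placed = []
    for w, h in sizes:
        # Where the word "wants" to be: random x, vertically centred on y = 0.
        x0 = rng.uniform(0, width - w)
        y0 = -h / 2
        # The first offset is (0, 0), so the preferred position is tried first.
        for steps, (dx, dy) in enumerate(spiral_offsets()):
            rect = (x0 + dx, y0 + dy, w, h)
            if not any(intersects(rect, other) for other in placed):
                placed.append(rect)
                break
            if steps >= max_steps:
                raise RuntimeError("could not place word")
    return placed
```

The `any(...)` test compares each candidate against every placed word, which is the slow part that the intersection-testing tricks are meant to speed up.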

That's it. The hard part is in doing the intersection-testing efficiently, for which I use last-hit caching, hierarchical bounding boxes, and a quadtree spatial index (all of which are things you can learn more about with some diligent googling).
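As a rough illustration of two of those ideas, here is a Python sketch of a quadtree over rectangles combined with last-hit caching. It assumes axis-aligned rectangles `(x, y, w, h)` instead of glyph shapes, and the class names `QuadTree` and `CollisionIndex` are invented for this example; it is not how Wordle itself is written.

```python
def intersects(a, b):
    """True if rectangles a and b, given as (x, y, w, h), overlap."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def contains(outer, inner):
    """True if rectangle inner lies entirely inside rectangle outer."""
    ox, oy, ow, oh = outer
    ix, iy, iw, ih = inner
    return ox <= ix and oy <= iy and ix + iw <= ox + ow and iy + ih <= oy + oh


class QuadTree:
    """Quadtree of rectangles; a rectangle that fits no child stays in the parent."""

    def __init__(self, bounds, capacity=4, depth=0, max_depth=8):
        self.bounds, self.capacity = bounds, capacity
        self.depth, self.max_depth = depth, max_depth
        self.items = []
        self.children = None

    def _child_for(self, rect):
        for child in self.children:
            if contains(child.bounds, rect):
                return child
        return None

    def insert(self, rect):
        if self.children is not None:
            child = self._child_for(rect)
            if child is not None:
                child.insert(rect)
                return
        self.items.append(rect)
        if (self.children is None and len(self.items) > self.capacity
                and self.depth < self.max_depth):
            self._split()

    def _split(self):
        x, y, w, h = self.bounds
        hw, hh = w / 2, h / 2
        self.children = [
            QuadTree((cx, cy, hw, hh), self.capacity, self.depth + 1, self.max_depth)
            for cx in (x, x + hw) for cy in (y, y + hh)
        ]
        old, self.items = self.items, []
        for rect in old:
            child = self._child_for(rect)
            if child is not None:
                child.insert(rect)
            else:
                self.items.append(rect)

    def first_hit(self, rect):
        """Return some stored rectangle overlapping rect, or None."""
        for item in self.items:
            if intersects(item, rect):
                return item
        for child in self.children or ():
            # Only descend into quadrants the query rectangle touches.
            if intersects(child.bounds, rect):
                hit = child.first_hit(rect)
                if hit is not None:
                    return hit
        return None


class CollisionIndex:
    """Quadtree lookup with last-hit caching in front of it."""

    def __init__(self, bounds):
        self.tree = QuadTree(bounds)
        self.last_hit = None

    def collides(self, rect):
        # A word stepping along the spiral usually still overlaps whatever
        # blocked it on the previous step, so test that rectangle first.
        if self.last_hit is not None and intersects(self.last_hit, rect):
            return True
        self.last_hit = self.tree.first_hit(rect)
        return self.last_hit is not None

    def add(self, rect):
        self.tree.insert(rect)
        self.last_hit = None
```

In a placement loop, `index.collides(candidate)` would stand in for a scan over all placed words, and `index.add(candidate)` would be called once the word settles.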

Edit: As Reto Aebersold pointed out, there's now a book chapter, freely available, that covers this same territory: Beautiful Visualization, Chapter 3: Wordle

Solution 2 - Layout

Here's a really nice JavaScript one from Jason Davies that uses d3. You can even use webfonts with it.

Demo: http://www.jasondavies.com/wordcloud/

GitHub: https://github.com/jasondavies/d3-cloud

Solution 3 - Layout

I've implemented the algorithm described by Jonathan Feinberg in Python to create a tag cloud. It is a long way from the beautiful clouds of wordle.net, but it gives you an idea of how it could be done.

You can find the project here.

Solution 4 - Layout

I've created a Silverlight component that uses the algorithm Jonathan suggests here. The source code and example projects are all available on my blog:

http://whydoidoit.com

[Image: color word cloud]

My cloud lets you color and size words based on different weightings and it supports word selection (from a coordinate) and selected word highlighting. The source is yours to use as you see fit.

[Image: example word cloud]

Solution 5 - Layout

I'm working on WordCram, a Processing library for making word clouds. It's pretty heavily influenced by Wordle, and is informed by the same PDF aeby linked to above. It handles the collision detection for you, and lets you focus on how you want your words laid out, colored, rotated, etc.

Solution 6 - Layout

http://code.google.com/apis/visualization/documentation/gallery.html

Check out the word cloud visualization. Not as fancy as wordle.net but real easy to add to your site.

Solution 7 - Layout

I was looking for a Wordle-like visualization that would let me assign the color, initial position, and size of a string based on other data, such as its relevance within a text. I didn't find anything, but thanks to the information I found here (especially Jonathan's explanation and aeby's link), I could finally implement 'Cloudio', which comes relatively close to Wordle (at least I think so...) and offers the features I was looking for.

It is implemented with SWT and JFace, and I tried to integrate it into the MVC model of JFace, so that you can set content and label providers to modify the layout of a cloud and add it to other Eclipse plugins or RCP apps. You can also modify the way the initial position of a string is calculated, so it is not difficult to use it for cluster visualization and similar purposes. It is still poorly documented and limited in some ways (and I did the initial upload a few hours ago, so it might still be a bit buggy), but if you're interested, here's the link:

And here's a link to some created clouds, in case you want a quick impression: https://github.com/sschwieb/Cloudio/wiki/Example-Clouds

Cheers, Stephan

Solution 8 - Layout

Here is my implementation of a Wordle-like cloud. It uses the same spiral algorithm and a quadtree data structure.

http://sourcecodecloud.codeplex.com

or

http://www.codeproject.com/Articles/224231/Word-Cloud-Tag-Cloud-Generator-Control-for-NET-Win

Solution 9 - Layout

Lion and Lamb is an open-source iOS app that creates word clouds using the most frequent words from a chosen book of the Bible.

It's based on the algorithm as described by Jonathan Feinberg. Hit testing does utilize a quad tree, but the bounding boxes are based on the glyph's bounding rectangle. I want to break the glyph down into many smaller bounding rects to enable word placement within a glyph's bounding box.
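To illustrate the idea of breaking a glyph into smaller rectangles, here is a hedged Python sketch. It assumes the glyph has already been rasterized into a small bitmap (a list of strings where `#` marks ink), which is not how the app represents glyphs; the function name `ink_rects` and the `min_size` parameter are hypothetical. It returns a flat list of leaf rectangles from a quadtree-style subdivision rather than a full hierarchy.

```python
def ink_rects(bitmap, min_size=1):
    """bitmap: list of equal-length strings, '#' marks ink.

    Returns [(x, y, w, h)] rectangles that cover every inked pixel and skip
    regions that are entirely empty, so other words can be placed there.
    """
    def visit(x, y, w, h, out):
        cells = [bitmap[cy][cx] == "#"
                 for cy in range(y, y + h) for cx in range(x, x + w)]
        if not any(cells):
            return  # empty region: leave it free for other words
        if all(cells) or (w <= min_size and h <= min_size):
            out.append((x, y, w, h))  # solid (or smallest allowed) block
            return
        # Mixed region: split into up to four quadrants and recurse.
        hw, hh = max(w // 2, 1), max(h // 2, 1)
        for nx, nw in ((x, hw), (x + hw, w - hw)):
            for ny, nh in ((y, hh), (y + hh, h - hh)):
                if nw > 0 and nh > 0:
                    visit(nx, ny, nw, nh, out)

    out = []
    if bitmap and bitmap[0]:
        visit(0, 0, len(bitmap[0]), len(bitmap), out)
    return out


# A crude 4x4 "L": the upper-right area is empty and stays available.
LETTER_L = [
    "#...",
    "#...",
    "#...",
    "####",
]
```

For `LETTER_L`, the returned rectangles cover only the stem and the foot, so a small word tested against them (instead of against the full 4x4 box) could sit in the upper-right corner. The sketch rescans pixels at every level, which is fine for illustration but not tuned for speed.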

GitHub: https://github.com/PetahChristian/LionAndLamb

[Image: a word cloud of the Bible book of Revelation]

Solution 10 - Layout

I have a Tag Cloud generator here, which I call Disorganizer :)

The sources are TagCloudService, the Razor markup control, and a WinForm for testing purposes. You can put it in your blog, profile, etc. with a little wrapper around it. It relies heavily on C# 4.0 and the System.Drawing namespace.

I created it because with the other cloud generators you cannot click on tags to navigate and cannot create hover animations to show that they are clickable. Since showing a hover animation in HTML is necessary for me (I'm doing this with overlaid, absolutely positioned <a> tags), I haven't developed any-angle word display; words are either vertical or horizontal.

Warning: The above links may become invalid in a few months. I plan to slowly untie it from the surrounding project into a separate project.

You can see a working demo on this sample blog post, but it is incomplete, and on an incomplete site. Contact me if you want to contribute; I will get on with separating it out as soon as possible.

Solution 11 - Layout

Here is yet another end-to-end implementation of wordle in Python 3 largely based on the initial outline by Jonathan Feinberg (QuadTrees, spirals, etc.).

The code (commented, with a detailed README file) is freely available at this GitHub repository, and this is a sample wordle created with the code.

[Image: Macbeth]

Solution 12 - Layout

I've implemented a word cloud generator called WordCloud.jl in the Julia language. A brief description of its algorithm can be found here.

Unlike most other implementations, I designed it around gradient optimization. It’s a non-greedy algorithm in which words can still be moved after they have been positioned. Thus the size of the words and the shape and size of the background mask can be kept unchanged during generation. This makes the outputs more accurate and easier to customize. Furthermore, it can also generate some fancy outputs like these:

Comparison of Obama's and Trump's inaugural addresses, and Wikipedia: Julia

[Image: comparison wordcloud]

[Image: julia wordcloud]

Solution 13 - Layout

There is a pretty nice little JavaScript library made by Tim Dream:

https://github.com/timdream/wordcloud2.js/blob/gh-pages/API.md

It can create a word cloud on a canvas or with HTML tags with a lot of options to modify the result. It comes really close to wordle's output.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | namenlos | View Question on Stackoverflow
Solution 1 - Layout | Jonathan Feinberg | View Answer on Stackoverflow
Solution 2 - Layout | johnpolacek | View Answer on Stackoverflow
Solution 3 - Layout | Reto Aebersold | View Answer on Stackoverflow
Solution 4 - Layout | Mike Talbot | View Answer on Stackoverflow
Solution 5 - Layout | Dan Bernier | View Answer on Stackoverflow
Solution 6 - Layout | Wavel | View Answer on Stackoverflow
Solution 7 - Layout | sschwieb | View Answer on Stackoverflow
Solution 8 - Layout | George Mamaladze | View Answer on Stackoverflow
Solution 9 - Layout | user4151918 | View Answer on Stackoverflow
Solution 10 - Layout | Zasz | View Answer on Stackoverflow
Solution 11 - Layout | Hayk | View Answer on Stackoverflow
Solution 12 - Layout | guoyongzhi | View Answer on Stackoverflow
Solution 13 - Layout | n.r. | View Answer on Stackoverflow