Word Clouds and Topic Modeling – Are they the same?

Word clouds have become a powerful visualization feature in many social analytics tools. They are often used to convey what are commonly referred to as the "topics" of discussion in blog posts, documents, social media conversations, and so on. A word cloud is essentially a visual representation of the term frequency counts of words in the data set, after filtering out commonly used verbs, adjectives, pronouns, and similar words. A larger rendering of a particular word implies that it was used more frequently in the corpus being analyzed. Some tools further classify these words into a predefined set of categories such as technology, music, sports, or politics and present these categories to the user as topics.
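
The term-frequency idea behind a word cloud fits in a few lines of Python. This is a minimal sketch, assuming a tiny hand-picked stop-word list and a linear mapping from count to font size; the function name and the font range are illustrative, and real tools use much larger stop-word lists and their own sizing rules.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption); real tools use far larger lists.
STOP_WORDS = {
    "the", "a", "an", "and", "or", "is", "are", "was", "to", "of",
    "in", "on", "for", "with", "it", "this", "that", "i", "you", "my",
}

def word_cloud_weights(docs, min_font=10, max_font=48):
    """Count term frequencies across docs and map each count to a font size."""
    counts = Counter()
    for doc in docs:
        tokens = re.findall(r"[a-z0-9']+", doc.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    if not counts:
        return {}
    top = max(counts.values())
    # Size grows linearly with frequency; the most frequent word gets max_font.
    return {
        word: min_font + (max_font - min_font) * n / top
        for word, n in counts.most_common()
    }

docs = [
    "The new phone has a great screen",
    "Great deal on the phone today",
    "Screen protector for my phone",
]
for word, size in word_cloud_weights(docs).items():
    print(f"{word}: {size:.1f}")
```

Note that nothing here records which words occur together; the cloud is purely a frequency ranking.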

Word clouds tend to obscure the underlying topics of discussion, surfacing only the most frequently used words. They offer no insight into how the words relate to one another or whether they appear in a related set of conversations. Topic modeling, on the other hand, focuses on discovering abstract topics in a collection of data in order to uncover the hidden semantic structure of a body of text. Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) are examples of Natural Language Processing (NLP) techniques used to perform topic modeling. When correctly implemented, topic modeling can be a very powerful tool in social media analytics. It can partition conversations that all reference the same hashtag or keywords into distinct threads, each focused on a distinct set of topics. Because the topics are abstract and machine-derived, each one is presented as a set of the words most representative of the topic, together with a score that roughly indicates how important each word is to that topic. Note that the same word can appear in different topics with different scores.
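
To make the contrast concrete, here is a minimal sketch of LDA fitted with collapsed Gibbs sampling, one standard way to infer the model. It uses only the Python standard library; the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, the iteration count, and the toy tweets are illustrative assumptions, not taken from any particular tool. Each topic comes out as a probability for every word in the vocabulary, which is why the same word can score in more than one topic.

```python
import random
from collections import defaultdict

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized docs with collapsed Gibbs sampling (illustrative sketch).

    Returns (topic_word, doc_topic): per-topic word probabilities and
    per-document topic proportions.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    n_dk = [[0] * num_topics for _ in docs]               # doc -> topic counts
    n_kw = [defaultdict(int) for _ in range(num_topics)]  # topic -> word counts
    n_k = [0] * num_topics                                # tokens per topic
    # Start from a random topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove this token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional p(z = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    topic_word = [
        {w: (n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in vocab}
        for k in range(num_topics)
    ]
    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    return topic_word, doc_topic

# Toy, pre-tokenized "tweets" (hypothetical data).
docs = [
    "giveaway deal phone screen protector".split(),
    "phone exchange deal giveaway".split(),
    "wireless charging dock phone".split(),
    "charging dock wireless vertical".split(),
]
topic_word, doc_topic = lda_gibbs(docs, num_topics=2)
for k, dist in enumerate(topic_word):
    top = sorted(dist, key=dist.get, reverse=True)[:4]
    print(k, [(w, round(dist[w], 2)) for w in top])
```

Because every word keeps a nonzero score in every topic, a widely used word such as "phone" here can rank highly in several topics at once.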


As an example, consider a large corpus of Twitter posts from September containing the words "Galaxy S3 edge." The figure below shows the top 4 topics and the list of words describing each topic (words with a higher score are shown in a proportionally larger font). The conversations associated with these topics are also separated into different buckets, and a conversation may be associated with one or more topics. Topic 1 in this example reflects a set of conversations (see tweets on the right) discussing giveaway deals for the Galaxy Edge, iPhone exchange deals, and screen protectors. Similarly, Topic 4 indicates conversations focused on a wireless charging vertical dock. Note that the word Samsung, which obviously has a high term frequency count, appears in multiple topics.
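
Separating conversations into topic buckets, where a single conversation can land in several, can be sketched as thresholding each post's topic proportions. This is one simple assignment rule, not necessarily the one any given tool uses; the 0.3 threshold and the proportions below are made-up illustrative values, and topic indices are zero-based (index 0 corresponds to Topic 1 in the text).

```python
def bucket_by_topic(doc_topic, threshold=0.3):
    """Map each topic index to the documents whose proportion for it meets threshold.

    A document can land in several buckets, or in none if no topic dominates.
    The threshold is an illustrative assumption.
    """
    buckets = {k: [] for k in range(len(doc_topic[0]))} if doc_topic else {}
    for d, proportions in enumerate(doc_topic):
        for k, p in enumerate(proportions):
            if p >= threshold:
                buckets[k].append(d)
    return buckets

# Hypothetical per-tweet topic proportions over 4 topics (each row sums to 1).
doc_topic = [
    [0.70, 0.10, 0.10, 0.10],   # mostly topic 0
    [0.45, 0.05, 0.05, 0.45],   # split between topics 0 and 3
    [0.05, 0.05, 0.10, 0.80],   # mostly topic 3
    [0.25, 0.25, 0.25, 0.25],   # no dominant topic
]
print(bucket_by_topic(doc_topic))
# {0: [0, 1], 1: [], 2: [], 3: [1, 2]}
```

The second tweet appears in two buckets, which mirrors the point that a conversation may be associated with more than one topic.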




Most social media analytics tools on the market today do not perform topic modeling as defined in the machine learning and NLP literature; instead, word clouds are used as proxies for topics. For social media feeds such as Twitter, this is driven in part by the difficulty of performing topic modeling on short posts and by the content and structure of those posts (a mix of URLs, hashtags, symbols, etc.). Putting theory into practice can be a challenge.
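
Part of that practical challenge is simply cleaning tweets before any model sees them. The sketch below shows one plausible preprocessing step; the regular expressions, the small stop-word set, and the choices to drop URLs, @-mentions, and bare numbers while folding #hashtags into ordinary words are all assumptions rather than a standard recipe.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"#?\w+")

# Assumed stop words; "rt" marks a retweet and carries no topical content.
DEFAULT_STOP = frozenset({"the", "a", "an", "for", "on", "rt"})

def tokenize_tweet(text, stop_words=DEFAULT_STOP):
    """Turn a raw tweet into lowercase tokens suitable for a topic model."""
    text = URL_RE.sub(" ", text)       # remove links (shortened URLs are opaque)
    text = MENTION_RE.sub(" ", text)   # remove @user handles
    tokens = []
    for tok in TOKEN_RE.findall(text.lower()):
        tok = tok.lstrip("#")          # treat "#galaxy" and "galaxy" as one term
        if tok not in stop_words and not tok.isdigit():
            tokens.append(tok)
    return tokens

print(tokenize_tweet("RT @deals: Win a #Galaxy phone today! https://t.co/abc123 #giveaway"))
# ['win', 'galaxy', 'phone', 'today', 'giveaway']
```

Even after cleaning, a tweet yields only a handful of tokens, which is the sparsity that makes topic modeling on short text harder than on long documents.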


Scraawl, a social media analytics tool developed by Intelligent Automation, Inc., offers topic modeling as one of its advanced analytics. Scraawl's topic modeling uses a language-independent probabilistic model to discover patterns of word use within and across tweets. The model uses this information to identify up to 8 abstract "topics" that occur in the collection; for each topic, Scraawl displays the probabilistic score representing the likelihood of that topic and up to 15 top words. Unique to Scraawl is the ability to use the topic modeling analytic as a filter: users can filter their datasets by topics of interest and then apply further analytics, such as community detection or influence discovery, to the subset of posts that discuss those topics.
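
The filter-then-analyze workflow can be pictured roughly as follows. This is a hypothetical sketch, not Scraawl's implementation or API: the helper names, the data layout (a word-score dict per topic and a row of topic proportions per post), and the 0.3 threshold are assumptions; only the limits of 8 topics and 15 top words come from the description above.

```python
# Hypothetical sketch, not Scraawl's API. Only these two limits come from the text.
MAX_TOPICS = 8
TOP_WORDS = 15

def top_words(topic_word, n=TOP_WORDS):
    """Return the n highest-scoring (word, score) pairs for each of up to MAX_TOPICS topics."""
    return [
        sorted(dist.items(), key=lambda kv: kv[1], reverse=True)[:n]
        for dist in topic_word[:MAX_TOPICS]
    ]

def filter_posts(posts, doc_topic, topics_of_interest, threshold=0.3):
    """Keep posts whose proportion for any topic of interest meets the (assumed) threshold."""
    return [
        post
        for post, proportions in zip(posts, doc_topic)
        if any(proportions[k] >= threshold for k in topics_of_interest)
    ]

# Hypothetical posts and their topic proportions over 2 topics.
posts = ["giveaway deal!", "new wireless dock", "deal on a dock"]
doc_topic = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
subset = filter_posts(posts, doc_topic, topics_of_interest={1})
print(subset)
# ['new wireless dock', 'deal on a dock']
```

The resulting `subset` is what a downstream analytic, such as community detection, would then operate on instead of the full dataset.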


For more information on Scraawl, to request a demo, or to learn more about the wide range of advanced analytics offered in Scraawl’s professional, premium, or enterprise packages, visit www.scraawl.com. You can also sign up for a free personal account and start exploring some of Scraawl’s basic search and analytics capabilities.
