python词云可视化方法总结记录【简单词云+背景图片词云+自定义字体颜色词云】

系统 2644 0

       词云是一种非常漂亮的可视化展示方式,正所谓一图胜过千言万语,词云在之前的项目中我也有过很多的使用,可能对于我来说,一种很好的自我介绍方式就是词云吧,就像下面这样的:

python词云可视化方法总结记录【简单词云+背景图片词云+自定义字体颜色词云】_第1张图片

     个人觉还是会比枯燥的文字语言描述性的介绍会更吸引人一点吧。

      今天不是说要怎么用词云来做个人介绍,而是对工作中使用到比较多的词云计较做了一下总结,主要是包括三个方面:

1、诸如上面的简单形式矩形词云

2、基于背景图片数据来构建词云数据

3、某些场景下不想使用类似上面的默认的字体颜色,这里可以自定义词云的字体颜色

      接下来对上面三种类型的词云可视化方法进行demo实现与展示,具体如下:

      这里我们使用到的测试数据如下:

            
              The Zen of Python, by Tim Peters
            Beautiful is better than ugly.
        Explicit is better than implicit.
        Simple is better than complex.
        Complex is better than complicated.
        Flat is better than nested.
        Sparse is better than dense.
        Readability counts.
        Special cases aren't special enough to break the rules.
        Although practicality beats purity.
        Errors should never pass silently.
        Unless explicitly silenced.
        In the face of ambiguity, refuse the temptation to guess.
        There should be one-- and preferably text one --obvious way to do it.
        Although that way may not be obvious at first unless you're Dutch.
        Now is better than never.
        Although never is often better than *right* now.
        If the implementation is hard to explain, it's a bad idea.
        If the implementation is easy to explain, it may be a good idea.
        Namespaces are one honking great idea -- let's do more of those!
            
          


1、简单形式矩形词云实现如下:

            
              def simpleWC1(sep=' ',back='black',freDictpath='data_fre.json',savepath='res.png'):
    '''
    词云可视化Demo
    '''
    try:
        with open(freDictpath) as f:
            data=f.readlines()
            data_list=[one.strip().split(sep) for one in data if one]
        fre_dict={}
        for one_list in data_list:
            fre_dict[unicode(one_list[0])]=int(one_list[1])
    except:
        fre_dict=freDictpath
    wc=WordCloud(font_path='font/simhei.ttf',#设置字体  #simhei
                background_color=back, #背景颜色
                max_words=1300,# 词云显示的最大词数
                max_font_size=120, #字体最大值
                margin=3,  #词云图边距
                width=1800,  #词云图宽度
                height=800,  #词云图高度
                random_state=42)
    wc.generate_from_frequencies(fre_dict)  #从词频字典生成词云
    plt.figure()  
    plt.imshow(wc)
    plt.axis("off")
    wc.to_file(savepath)
            
          

      图像数据结果如下:
 

python词云可视化方法总结记录【简单词云+背景图片词云+自定义字体颜色词云】_第2张图片

2、 基于背景图像数据的词云可视化具体实现如下:

    先贴一下背景图像:

python词云可视化方法总结记录【简单词云+背景图片词云+自定义字体颜色词云】_第3张图片

     这也是一个比较经典的图像数据了,下面来看具体的实现:

            
              def simpleWC2(sep=' ',back='black',backPic='a.png',freDictpath='data_fre.json',savepath='res.png'):
    '''
    词云可视化Demo【使用背景图片】
    '''
    try:
        with open(freDictpath) as f:
            data=f.readlines()
            data_list=[one.strip().split(sep) for one in data if one]
        fre_dict={}
        for one_list in data_list:
            fre_dict[unicode(one_list[0])]=int(one_list[1])
    except:
        fre_dict=freDictpath
    back_coloring=imread(backPic)
    wc=WordCloud(font_path='simhei.ttf',#设置字体  #simhei
                background_color=back,max_words=1300,
                mask=back_coloring,#设置背景图片
                max_font_size=120, #字体最大值
                margin=3,width=1800,height=800,random_state=42,)
    wc.generate_from_frequencies(fre_dict)  #从词频字典生成词云
    wc.to_file(savepath)
            
          

      结果图像数据如下:

python词云可视化方法总结记录【简单词云+背景图片词云+自定义字体颜色词云】_第4张图片

3、 自定义词云字体颜色的具体实现如下:
 

            
              #自定义颜色列表
color_list=['#CD853F','#DC143C','#00FF7F','#FF6347','#8B008B','#00FFFF','#0000FF','#8B0000','#FF8C00',
            '#1E90FF','#00FF00','#FFD700','#008080','#008B8B','#8A2BE2','#228B22','#FA8072','#808080']


def simpleWC3(sep=' ',back='black',freDictpath='data_fre.json',savepath='res.png'):
    '''
    词云可视化Demo【自定义字体的颜色】
    '''
    #基于自定义颜色表构建colormap对象
    colormap=colors.ListedColormap(color_list)  
    try:
        with open(freDictpath) as f:
            data=f.readlines()
            data_list=[one.strip().split(sep) for one in data if one]
        fre_dict={}
        for one_list in data_list:
            fre_dict[unicode(one_list[0])]=int(one_list[1])
    except:
        fre_dict=freDictpath
    wc=WordCloud(font_path='font/simhei.ttf',#设置字体  #simhei
                background_color=back,  #背景颜色
                max_words=1300,  #词云显示的最大词数
                max_font_size=120,  #字体最大值
                colormap=colormap,  #自定义构建colormap对象
                margin=2,width=1800,height=800,random_state=42,
                prefer_horizontal=0.5)  #无法水平放置就垂直放置
    wc.generate_from_frequencies(fre_dict)
    plt.figure()  
    plt.imshow(wc)
    plt.axis("off")
    wc.to_file(savepath)
            
          

      结果图像数据如下:

python词云可视化方法总结记录【简单词云+背景图片词云+自定义字体颜色词云】_第5张图片

      上述三种方法就是我在具体工作中使用频度最高的三种词云可视化展示方法了,下面贴出来完整的代码实现,可以直接拿去跑的:

            
              #!usr/bin/env python
#encoding:utf-8
from __future__ import division

'''
__Author__:沂水寒城
功能: 词云的可视化模块
'''

import os
import sys
import json
import numpy as np
from PIL import Image
from scipy.misc import imread
from matplotlib import colors
import matplotlib.pyplot as plt
from matplotlib.font_manager import FontProperties
from wordcloud import WordCloud,ImageColorGenerator,STOPWORDS

reload(sys)
sys.setdefaultencoding('utf-8')

#自定义颜色列表
color_list=['#CD853F','#DC143C','#00FF7F','#FF6347','#8B008B','#00FFFF','#0000FF','#8B0000','#FF8C00',
            '#1E90FF','#00FF00','#FFD700','#008080','#008B8B','#8A2BE2','#228B22','#FA8072','#808080']



def simpleWC1(sep=' ',back='black',freDictpath='data_fre.json',savepath='res.png'):
    '''
    词云可视化Demo
    '''
    try:
        with open(freDictpath) as f:
            data=f.readlines()
            data_list=[one.strip().split(sep) for one in data if one]
        fre_dict={}
        for one_list in data_list:
            fre_dict[unicode(one_list[0])]=int(one_list[1])
    except:
        fre_dict=freDictpath
    wc=WordCloud(font_path='font/simhei.ttf',#设置字体  #simhei
                background_color=back, #背景颜色
                max_words=1300,# 词云显示的最大词数
                max_font_size=120, #字体最大值
                margin=3,  #词云图边距
                width=1800,  #词云图宽度
                height=800,  #词云图高度
                random_state=42)
    wc.generate_from_frequencies(fre_dict)  #从词频字典生成词云
    plt.figure()  
    plt.imshow(wc)
    plt.axis("off")
    wc.to_file(savepath)


def simpleWC2(sep=' ',back='black',backPic='a.png',freDictpath='data_fre.json',savepath='res.png'):
    '''
    词云可视化Demo【使用背景图片】
    '''
    try:
        with open(freDictpath) as f:
            data=f.readlines()
            data_list=[one.strip().split(sep) for one in data if one]
        fre_dict={}
        for one_list in data_list:
            fre_dict[unicode(one_list[0])]=int(one_list[1])
    except:
        fre_dict=freDictpath
    back_coloring=imread(backPic)
    wc=WordCloud(font_path='simhei.ttf',#设置字体  #simhei
                background_color=back,max_words=1300,
                mask=back_coloring,#设置背景图片
                max_font_size=120, #字体最大值
                margin=3,width=1800,height=800,random_state=42,)
    wc.generate_from_frequencies(fre_dict)  #从词频字典生成词云
    wc.to_file(savepath)


def simpleWC3(sep=' ',back='black',freDictpath='data_fre.json',savepath='res.png'):
    '''
    词云可视化Demo【自定义字体的颜色】
    '''
    #基于自定义颜色表构建colormap对象
    colormap=colors.ListedColormap(color_list)  
    try:
        with open(freDictpath) as f:
            data=f.readlines()
            data_list=[one.strip().split(sep) for one in data if one]
        fre_dict={}
        for one_list in data_list:
            fre_dict[unicode(one_list[0])]=int(one_list[1])
    except:
        fre_dict=freDictpath
    wc=WordCloud(font_path='font/simhei.ttf',#设置字体  #simhei
                background_color=back,  #背景颜色
                max_words=1300,  #词云显示的最大词数
                max_font_size=120,  #字体最大值
                colormap=colormap,  #自定义构建colormap对象
                margin=2,width=1800,height=800,random_state=42,
                prefer_horizontal=0.5)  #无法水平放置就垂直放置
    wc.generate_from_frequencies(fre_dict)
    plt.figure()  
    plt.imshow(wc)
    plt.axis("off")
    wc.to_file(savepath)



if __name__ == '__main__':
    text="""
        The Zen of Python, by Tim Peters
        Beautiful is better than ugly.
        Explicit is better than implicit.
        Simple is better than complex.
        Complex is better than complicated.
        Flat is better than nested.
        Sparse is better than dense.
        Readability counts.
        Special cases aren't special enough to break the rules.
        Although practicality beats purity.
        Errors should never pass silently.
        Unless explicitly silenced.
        In the face of ambiguity, refuse the temptation to guess.
        There should be one-- and preferably text one --obvious way to do it.
        Although that way may not be obvious at first unless you're Dutch.
        Now is better than never.
        Although never is often better than *right* now.
        If the implementation is hard to explain, it's a bad idea.
        If the implementation is easy to explain, it may be a good idea.
        Namespaces are one honking great idea -- let's do more of those!
        """
    word_list=text.split()
    fre_dict={}
    for one in word_list:
        if one in fre_dict:
            fre_dict[one]+=1
        else:
            fre_dict[one]=1
    simpleWC1(sep=' ',back='black',freDictpath=fre_dict,savepath='simpleWC1.png')
    simpleWC2(sep=' ',back='black',backPic='backPic/A.png',freDictpath=fre_dict,savepath='simpleWC2.png')
    simpleWC3(sep=' ',back='black',freDictpath=fre_dict,savepath='simpleWC3.png')
            
          

       记录一下,欢迎交流学习。


更多文章、技术交流、商务合作、联系博主

微信扫码或搜索:z360901061

微信扫一扫加我为好友

QQ号联系: 360901061

您的支持是博主写作最大的动力,如果您喜欢我的文章,感觉我的文章对您有帮助,请用微信扫描下面二维码支持博主2元、5元、10元、20元等您想捐的金额吧,狠狠点击下面给点支持吧,站长非常感激您!手机微信长按不能支付解决办法:请将微信支付二维码保存到相册,切换到微信,然后点击微信右上角扫一扫功能,选择支付二维码完成支付。

【本文对您有帮助就好】

您的支持是博主写作最大的动力,如果您喜欢我的文章,感觉我的文章对您有帮助,请用微信扫描上面二维码支持博主2元、5元、10元、自定义金额等您想捐的金额吧,站长会非常 感谢您的哦!!!

发表我的评论
最新评论 总共0条评论