I Scraped Data on 《雪中悍刀行》 with Python and Finally Understand Why It Is So Popular

Introduction

Hi everyone, I'm J哥.

How do you find the video ID?

Project structure:


I. The crawler:

II. Data processing

#coding=gbk
import csv
import time

csvFile = open("data.csv", 'w', newline='', encoding='utf-8')
writer = csv.writer(csvFile)
csvRow = []
#print(csvRow)
f = open("time.txt", 'r', encoding='utf-8')
for line in f:
    # each line of time.txt holds one Unix timestamp
    csvRow = int(line)
    #print(csvRow)
    timeArray = time.localtime(csvRow)
    csvRow = time.strftime("%Y-%m-%d %H:%M:%S", timeArray)
    print(csvRow)
    # split into two columns: date and time
    csvRow = csvRow.split()
    writer.writerow(csvRow)
f.close()
csvFile.close()
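The same timestamp conversion can be sketched with context managers and a small helper function. This is only a minimal sketch: the names timestamps_to_rows and convert_file are hypothetical, and skipping blank lines is my own assumption (the original script would raise on an empty line).

```python
import csv
import time


def timestamps_to_rows(lines):
    """Convert Unix-timestamp strings into [date, time] rows (local time)."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:  # assumption: skip blank lines rather than fail on int("")
            continue
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(int(line)))
        rows.append(stamp.split())  # ["YYYY-MM-DD", "HH:MM:SS"]
    return rows


def convert_file(src="time.txt", dst="data.csv"):
    """Read timestamps from src and write the two-column CSV to dst."""
    with open(src, encoding="utf-8") as f, \
            open(dst, "w", newline="", encoding="utf-8") as out:
        csv.writer(out).writerows(timestamps_to_rows(f))
```

Calling convert_file() with the default arguments reads time.txt and writes data.csv, and both files are closed even if a line fails to parse.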

#coding=gbk
import csv

csvFile = open("content.csv", 'w', newline='', encoding='utf-8')
writer = csv.writer(csvFile)
csvRow = []
f = open("content.txt", 'r', encoding='utf-8')
for line in f:
    # split each comment line on whitespace; one CSV row per line
    csvRow = line.split()
    writer.writerow(csvRow)
f.close()
csvFile.close()

III. Data analysis

1. Making the word cloud

wc.py

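import numpy as np
import re
import jieba
from wordcloud import WordCloud
from matplotlib import pyplot as plt
from PIL import Image
# Install the packages above yourself; if you don't know how, look it up on Baidu

f = open('content.txt', 'r', encoding='utf-8')  # data source: the text to build the word cloud from
txt = f.read()  # read the file
f.close()  # close the file; `with` would be better, but I couldn't be bothered to change it

# For article-style text you need jieba segmentation; you can also
# post-process the tokens yourself before generating the word cloud
newtxt = re.sub(r"[A-Za-z0-9!%\[\],。]", "", txt)
print(newtxt)
words = jieba.lcut(newtxt)

img = Image.open(r'wc.jpg')  # the shape you want the cloud to take
img_array = np.array(img)

# Configuration; collocations=False avoids repeated words
wordcloud = WordCloud(
    background_color="white",
    width=1080,
    height=960,
    font_path="../文悦新青年.otf",
    max_words=150,
    scale=10,  # sharpness
    max_font_size=100,
    mask=img_array,
    collocations=False
).generate(" ".join(words))  # pass the jieba tokens, space-separated, instead of the raw text
plt.imshow(wordcloud)
plt.axis('off')
plt.show()
wordcloud.to_file('wc.png')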

Outline image: wc.jpg


Word cloud: result.png

(Note: English letters need to be filtered out here.)
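As a small illustration of that filtering step, here is a sketch using the same character class as wc.py: ASCII letters, digits, and a few punctuation marks. The helper name strip_noise is hypothetical.

```python
import re

# Same character class as wc.py, written as a raw string.
_NOISE = re.compile(r"[A-Za-z0-9!%\[\],。]")


def strip_noise(text):
    """Remove ASCII letters, digits and the listed punctuation from text."""
    return _NOISE.sub("", text)


print(strip_noise("雪中abc123悍刀行!"))  # → 雪中悍刀行
```

Note that the class only lists the ASCII comma, so a full-width comma (，) in the comments is left untouched unless you add it to the pattern.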

Result: DrawBar.html
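The code that produced DrawBar.html is not shown in this post, so the following is only a sketch of how the data for such a bar chart might be prepared. It assumes, purely for illustration, that the chart plots the number of comments per hour of day and that data.csv has the two columns (date, time) written in the data-processing step. The names comments_per_hour and load_rows are hypothetical.

```python
import csv
from collections import Counter


def comments_per_hour(rows):
    """Count rows per hour of day; each row is [date, "HH:MM:SS"]."""
    counts = Counter(row[1][:2] for row in rows if len(row) >= 2)
    # Return all 24 hours in order so the chart has no gaps.
    return [(f"{h:02d}", counts.get(f"{h:02d}", 0)) for h in range(24)]


def load_rows(path="data.csv"):
    """Read every row of the CSV produced in the data-processing step."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f))
```

The resulting list of (hour, count) pairs can then be handed to whichever charting library you use to render the HTML bar chart.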

Result image

Summary
