textstat: A Text Readability Package

April 20, 2015

Python Web Scraping and Text Data Analysis

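textstat is a Python package for measuring text readability. For text at the article, paragraph, or sentence level, it can compute:

Syllable count: syllable_count
Word count: lexicon_count
Sentence count: sentence_count
A range of readability formulas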

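Currently supported languages are English (en), German (de), Spanish (es), French (fr), Italian (it), Dutch (nl), Polish (pl), and Russian (ru). Chinese is not supported yet.

The available readability formulas are: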

The Flesch Reading Ease formula
Flesch-Kincaid Grade Level
The Fog Scale (Gunning FOG Formula)
The SMOG Index
Automated Readability Index
The Coleman-Liau Index
Linsear Write Formula
Dale-Chall Readability Score
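To give a feel for what these formulas look like, here is a minimal sketch of the first two, written from their standard published definitions. It takes pre-computed counts as input, and the function names and signatures are illustrative only; this is not textstat's own code.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: roughly a US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts: 100 words, 5 sentences, 150 syllables.
ease = flesch_reading_ease(100, 5, 150)
grade = flesch_kincaid_grade(100, 5, 150)
print(round(ease, 3), round(grade, 2))
```

Both formulas use the same two ratios, words per sentence and syllables per word, which is why the library exposes syllable, word, and sentence counts as separate functions.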

Installation

pip install textstat

Syllable count

textstat.syllable_count(text)
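To show what a syllable count roughly means, here is a standard-library sketch that counts vowel groups per word. It is a simple heuristic for English only; the function name is made up for this post, and textstat's own counter works differently and will disagree on some words.

```python
import re


def naive_syllable_count(text: str) -> int:
    """Approximate English syllables by counting vowel groups in each word."""
    total = 0
    for word in re.findall(r"[a-z]+", text.lower()):
        count = len(re.findall(r"[aeiouy]+", word))
        # A trailing silent "e" (as in "make") usually adds no syllable,
        # but a final "le" (as in "table") usually does.
        if word.endswith("e") and not word.endswith("le") and count > 1:
            count -= 1
        # Every word counts as at least one syllable.
        total += max(count, 1)
    return total


print(naive_syllable_count("The cat sat on the mat"))
```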


Word count

textstat.lexicon_count(text, removepunct=True)
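The removepunct flag matters when punctuation stands on its own as a token. The sketch below imitates the idea with the standard library only: optionally strip punctuation, then split on whitespace. The function name is hypothetical, and the details of which characters textstat strips may differ.

```python
import string


def naive_lexicon_count(text: str, removepunct: bool = True) -> int:
    """Count whitespace-separated tokens, optionally dropping punctuation first."""
    if removepunct:
        # Delete every ASCII punctuation character before splitting.
        text = text.translate(str.maketrans("", "", string.punctuation))
    return len(text.split())


sample = "Hello , world !"
print(naive_lexicon_count(sample, removepunct=True))
print(naive_lexicon_count(sample, removepunct=False))
```

With removepunct=False, the stray "," and "!" are counted as words, which inflates the total.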


Readability

Each function takes text as input and returns a readability value.

textstat.flesch_reading_ease(text)
textstat.smog_index(text)
textstat.flesch_kincaid_grade(text)
textstat.coleman_liau_index(text)
textstat.automated_readability_index(text)
textstat.dale_chall_readability_score(text)
textstat.difficult_words(text)
textstat.linsear_write_formula(text)
textstat.gunning_fog(text)
textstat.text_standard(text)
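Some of these scores need nothing more than character, word, and sentence counts. As a rough illustration, the sketch below computes the Automated Readability Index straight from raw text using its standard formula. The tokenization is deliberately naive, the function name is invented for this example, and textstat's result for the same text may differ.

```python
import re


def naive_ari(text: str) -> float:
    """Automated Readability Index from naive character, word, and sentence counts."""
    # Treat runs of ., ! or ? as sentence boundaries.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = text.split()
    # Count only letters and digits as characters.
    chars = sum(ch.isalnum() for ch in text)
    if not sentences or not words:
        return 0.0
    return 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43


print(round(naive_ari("The cat sat. The dog ran."), 2))
```

Very short, simple sentences can produce a negative score, since the formula was calibrated on ordinary prose.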

For how each formula is computed and how to interpret its score, see the project page on GitHub:

https://github.com/shivam5992/textstat


About The Author

bjmayor

Programmer and coder (PHP, Python, iOS, Android, Go), product manager, entrepreneur.