Fixing the Morpheme-Counting Program - A Novice Engineer's Work Notes

The program from my previous post, which counts how often each morpheme appears, has already stopped working, so I'm going to fix it.

psyduck-take-it-easy.hatenablog.com

import MeCab
import io
import pandas

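def text_analysis():
    text = ""  # the text to analyze goes here
    # ChaSen-style output, with a user dictionary loaded via -u.
    m = MeCab.Tagger("-Ochasen -u /usr/local/lib/mecab/dic/ipadic/user.dic")
    # parse() returns one tab-separated line per morpheme, then a final EOS line.
    sentence = m.parse(text)
    # Wrap the string in a file-like object so pandas can read it.
    sentence = io.StringIO(sentence)
    sentence = pandas.read_csv(sentence, sep='\t', header=None)
    # Column 0 holds the surface form; count how often each one occurs.
    sentence = sentence[0].value_counts()
    print(sentence)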

if __name__ == '__main__':
    text_analysis()
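
To make the counting step easier to follow without MeCab or pandas installed, here is a minimal sketch that does the same tally with only the standard library. The SAMPLE_OUTPUT string is hand-written in the tab-separated layout that -Ochasen produces (it is not real MeCab output), and count_surface_forms is a hypothetical helper name. Unlike the pandas version, it skips the trailing EOS line so that it is not counted as a morpheme.

```python
from collections import Counter

# Hand-written sample in the ChaSen-style layout (an assumption, not real output):
# surface form, reading, base form, part of speech, then two empty fields.
SAMPLE_OUTPUT = (
    "すもも\tスモモ\tすもも\t名詞-一般\t\t\n"
    "も\tモ\tも\t助詞-係助詞\t\t\n"
    "もも\tモモ\tもも\t名詞-一般\t\t\n"
    "も\tモ\tも\t助詞-係助詞\t\t\n"
    "もも\tモモ\tもも\t名詞-一般\t\t\n"
    "の\tノ\tの\t助詞-連体化\t\t\n"
    "うち\tウチ\tうち\t名詞-非自立-副詞可能\t\t\n"
    "EOS\n"
)


def count_surface_forms(parsed):
    """Count how often each surface form (first column) appears."""
    counts = Counter()
    for line in parsed.splitlines():
        # Skip blank lines and the EOS marker that ends the output.
        if not line or line == "EOS":
            continue
        surface = line.split("\t", 1)[0]
        counts[surface] += 1
    return counts


counts = count_surface_forms(SAMPLE_OUTPUT)
print(counts.most_common())
```

Using Counter here is only a stand-in for value_counts(); the pandas approach in the program above remains the one the post is about.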

The text to analyze is assigned to text, but pasting it in as-is produces the following error.

SyntaxError: Non-UTF-8 code starting with '\xe3' in file [file name] on line 6, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
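
This message is about the bytes of the script file itself: Python 3 reads source code as UTF-8 unless the file declares another encoding (PEP 263), and it found bytes on the reported line that are not valid UTF-8. One possible cause is that the editor saved the file in another encoding such as Shift_JIS, though that is only a guess here. As a rough way to check, the sketch below scans a file in binary mode and reports which lines fail to decode as UTF-8. The helper name find_non_utf8_lines and the temporary demo file are assumptions made for illustration.

```python
import os
import tempfile


def find_non_utf8_lines(path):
    """Return (line number, first offending byte) for lines that are not valid UTF-8."""
    bad = []
    with open(path, "rb") as f:
        for lineno, raw in enumerate(f, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError as exc:
                # exc.start is the index of the first byte that could not be decoded.
                bad.append((lineno, raw[exc.start:exc.start + 1]))
    return bad


# Demo: build a small script whose second line is saved as Shift_JIS.
fd, demo_path = tempfile.mkstemp(suffix=".py")
with os.fdopen(fd, "wb") as f:
    f.write(b"import io\n")
    f.write('text = "日本語"\n'.encode("shift_jis"))

bad = find_non_utf8_lines(demo_path)
os.remove(demo_path)
print(bad)
```

If the scan reports the line that holds the pasted text, re-saving the script as UTF-8 in the editor is one thing worth trying.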

For now, as a stopgap, I tried putting "\xe3" at the very start of the text, so that the assignment reads:

text = "\xe3[text body]"
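
Because the SyntaxError concerns the encoding of the source file rather than the contents of the string, another option that may be worth trying is to keep the body text out of the script entirely and read it from a separate file with an explicit encoding. The following is a minimal sketch under that assumption. load_text, input.txt, and SAMPLE_TEXT are hypothetical names, and the demo writes its own temporary file so that it runs on its own.

```python
import tempfile
from pathlib import Path

# Hypothetical sample text, used only for the demo below.
SAMPLE_TEXT = "すもももももももものうち"


def load_text(path, encoding="utf-8"):
    """Read the text to analyze from a file, stating the encoding explicitly."""
    return Path(path).read_text(encoding=encoding)


# Demo: write the sample to a temporary UTF-8 file, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "input.txt"
    sample_path.write_text(SAMPLE_TEXT, encoding="utf-8")
    text = load_text(sample_path)

print(text)
```

In the program above, the result of load_text could then be assigned to text in place of the string literal, which keeps non-ASCII characters out of the script file.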