小農的家: Let's Write a Chatbot

As of January 15, 2019, Anaconda 3 (2018.12) ships with Python 3.7.1. To stay compatible with PyAudio 0.2.11, a Python 3.6 virtual environment has to be created.

Create the virtual environment

conda create -n python36env python=3.6 anaconda

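With the environment activated (conda activate python36env), install the SpeechRecognition package (version 3.8.1 at the time of writing)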

pip install SpeechRecognition

Install the PyAudio package (version 0.2.11 at the time of writing)

pip install PyAudio

Install the gTTS package (version 2.0.3 at the time of writing)

pip install gTTS

Install the pygame package (version 1.9.4 at the time of writing)

pip install pygame
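
Before moving on, it can help to confirm that the four packages really landed in the active environment. The sketch below is an addition, not part of the original post: the helper name missing_modules is hypothetical, and it relies only on the standard library's importlib.util.find_spec. Note that the import names differ from the pip names (SpeechRecognition is imported as speech_recognition, PyAudio as pyaudio, gTTS as gtts).

```python
import importlib.util

def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Import names of the packages installed above (they differ from the pip names).
REQUIRED = ['speech_recognition', 'pyaudio', 'gtts', 'pygame']

if __name__ == '__main__':
    missing = missing_modules(REQUIRED)
    if missing:
        print('Not installed in this environment:', ', '.join(missing))
    else:
        print('All required packages can be imported.')
```

If pyaudio shows up as missing, the pip install PyAudio step did not succeed in this environment and is worth repeating before running any of the versions below.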

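Version 1: the robot parrot

# Record one utterance from the microphone and transcribe it as Mandarin (Taiwan)
import speech_recognition as sr
recogn = sr.Recognizer()
with sr.Microphone() as source:
    speech = recogn.listen(source)
text = recogn.recognize_google(speech, language='zh-tw')
print(text)

# Convert the recognized text back to speech
from gtts import gTTS
tts = gTTS(text, lang='zh-tw')

# Reserve a temporary .mp3 path, then save the speech to it
import tempfile
with tempfile.NamedTemporaryFile(suffix='.mp3', delete=False) as fp:
    temp = fp.name
tts.save(temp)

# Play the file and wait until playback finishes before the script exits
import time
from pygame import mixer
mixer.init()
mixer.music.load(temp)
mixer.music.play()
while mixer.music.get_busy():
    time.sleep(0.1)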

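recognize_google() transcribes the audio in the language given by its language parameter. translate() and gTTS() take a language parameter in the same way: the language to translate into and the language to read aloud, respectively.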
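
To make the role of each language parameter concrete, the following sketch collects the keyword arguments each stage would receive. It is an illustration added here, not code from the post: the helper language_settings is hypothetical and calls none of the libraries. The parameter names language, dest, and lang are the ones used in this post's code (dest is the googletrans name; TextBlob's translate() uses to instead).

```python
def language_settings(spoken, target):
    """Build the language keyword arguments for each stage of the pipeline.

    spoken: language code of the speech to recognize, e.g. 'zh-tw'
    target: language code of the translated, spoken output, e.g. 'en'
    """
    return {
        'recognize_google': {'language': spoken},  # language being spoken
        'translate': {'dest': target},             # language to translate into
        'gTTS': {'lang': target},                  # language to read aloud
    }

settings = language_settings('zh-tw', 'en')
for stage, kwargs in settings.items():
    print(stage, kwargs)
```

For the parrot, which does no translation, spoken and target would simply be the same code.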

**********************************************************************************************************************

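Next, use the googletrans package to turn the parrot, which can only repeat what it hears, into an English interpreter.

Install the googletrans package (version 2.4.0 at the time of writing)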

pip install googletrans

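Version 2: spoken English translation (using the googletrans package)

# Record one utterance from the microphone and transcribe it as Mandarin (Taiwan)
import speech_recognition as sr
recogn = sr.Recognizer()
with sr.Microphone() as source:
    speech = recogn.listen(source)
text = recogn.recognize_google(speech, language='zh-tw')
print(text)

# Translate the recognized text into English
from googletrans import Translator
ts = Translator()
text = ts.translate(text, dest='en').text
print(text)

# Convert the English text to speech
from gtts import gTTS
tts = gTTS(text, lang='en')

# Reserve a temporary .mp3 path, then save the speech to it
import tempfile
with tempfile.NamedTemporaryFile(suffix='.mp3', delete=False) as fp:
    temp = fp.name
tts.save(temp)

# Play the file and wait until playback finishes before the script exits
import time
from pygame import mixer
mixer.init()
mixer.music.load(temp)
mixer.music.play()
while mixer.music.get_busy():
    time.sleep(0.1)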

**********************************************************************************************************************

Next, switch to the TextBlob package and make a Japanese interpreter.

Install the TextBlob package (version 0.15.2 at the time of writing)

pip install textblob

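Version 3: spoken Japanese translation (using the TextBlob package)

# Record one utterance from the microphone and transcribe it as Mandarin (Taiwan)
import speech_recognition as sr
recogn = sr.Recognizer()
with sr.Microphone() as source:
    speech = recogn.listen(source)
text = recogn.recognize_google(speech, language='zh-tw')
print(text)

# Translate the recognized text into Japanese; the result is a TextBlob
from textblob import TextBlob
blob = TextBlob(text)
text = blob.translate(to='ja')
print(text)

# Convert the Japanese text to speech (.raw is the plain string of the TextBlob)
from gtts import gTTS
tts = gTTS(text.raw, lang='ja')

# Reserve a temporary .mp3 path, then save the speech to it
import tempfile
with tempfile.NamedTemporaryFile(suffix='.mp3', delete=False) as fp:
    temp = fp.name
tts.save(temp)

# Play the file and wait until playback finishes before the script exits
import time
from pygame import mixer
mixer.init()
mixer.music.load(temp)
mixer.music.play()
while mixer.music.get_busy():
    time.sleep(0.1)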
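
Versions 1 to 3 share the same shape: listen, transform the text, speak. Only the middle step changes. The sketch below, an illustration rather than the post's code, makes that explicit by passing each step in as a function. The name run_pipeline and all the stand-in functions are hypothetical; the stand-ins replace the microphone, the translator, and the speaker so the example runs without audio hardware or network access.

```python
def run_pipeline(listen, transform, speak):
    """Run one listen -> transform -> speak round and return the spoken text."""
    heard = listen()
    reply = transform(heard)
    speak(reply)
    return reply

# Stand-ins so the sketch runs without a microphone or network access.
def fake_listen():
    return 'hello'

def parrot(text):
    # Version 1: repeat the text unchanged.
    return text

def shout(text):
    # Placeholder for a translation step such as the one in versions 2 and 3.
    return text.upper()

spoken = []
run_pipeline(fake_listen, parrot, spoken.append)
run_pipeline(fake_listen, shout, spoken.append)
print(spoken)  # ['hello', 'HELLO']
```

Swapping googletrans for TextBlob then amounts to passing a different transform function, with the listening and speaking code left alone.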

**********************************************************************************************************************

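Next, take the keyword recognized by the version 1 parrot, look it up on Wikipedia, clean the result with a regular expression, and read it aloud.

Version 4: chatbot (using Wikipedia)

# Record one utterance and transcribe it; the text becomes the search keyword
import speech_recognition as sr
recogn = sr.Recognizer()
with sr.Microphone() as source:
    speech = recogn.listen(source)
text = recogn.recognize_google(speech, language='zh-tw')
print(text)

# Fetch the Wikipedia article for the keyword and collect its paragraphs
from bs4 import BeautifulSoup
import requests
response = requests.get('https://zh.wikipedia.org/zh-tw/' + text)
bs = BeautifulSoup(response.text, 'lxml')
p_list = bs.find_all('p')
# Use the first paragraph that mentions the keyword in its first 10 characters
for p in p_list:
    if text in p.text[0:10]:
        content = p.text
        break
else:
    # No paragraph matched: "Even Wikipedia has no answer to your question"
    content = '你的問題連維基百科都沒有'

# Strip bracketed markers such as [1] from the paragraph
import re
text = re.sub(r'\[[^\]]+\]', '', content)
print(text)

# Convert the answer to speech
from gtts import gTTS
tts = gTTS(text, lang='zh-tw')

# Reserve a temporary .mp3 path, then save the speech to it
import tempfile
with tempfile.NamedTemporaryFile(suffix='.mp3', delete=False) as fp:
    temp = fp.name
tts.save(temp)

# Play the file and wait until playback finishes before the script exits
import time
from pygame import mixer
mixer.init()
mixer.music.load(temp)
mixer.music.play()
while mixer.music.get_busy():
    time.sleep(0.1)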

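Strictly speaking, the regular-expression step in this example belongs at the end of the if branch, so that the fallback message from the else branch is not run through the regex needlessly. It is kept after the loop here to make it easier to extend what the else branch can handle later.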
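
The alternative layout described in this note can be sketched as follows, with the regex cleanup moved into the matching branch. This is an illustration under stated assumptions, not the post's code: pick_answer is a hypothetical helper, and the sample list stands in for the paragraph texts that BeautifulSoup would return. The pattern and the fallback message are the ones used in the post.

```python
import re

CITATION = re.compile(r'\[[^\]]+\]')  # bracketed markers such as [1] or [註 2]
NOT_FOUND = '你的問題連維基百科都沒有'  # fallback message used in the post

def pick_answer(paragraphs, keyword):
    """Return the first paragraph that mentions `keyword` in its first 10
    characters, with bracketed markers removed, or the fallback message."""
    for paragraph in paragraphs:
        if keyword in paragraph[0:10]:
            # Regex cleanup runs only for text that came from Wikipedia.
            return CITATION.sub('', paragraph)
    return NOT_FOUND

# Made-up sample paragraphs, standing in for the scraped <p> texts.
sample = ['目錄', 'Python[1]是一種廣泛使用的程式語言[註 2]。']
print(pick_answer(sample, 'Python'))  # Python是一種廣泛使用的程式語言。
print(pick_answer(sample, 'Ruby'))    # 你的問題連維基百科都沒有
```

The trade-off is the one the note mentions: the fallback path skips the regex, but any later processing added for unmatched questions has to live somewhere other than this shared cleanup step.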

**********************************************************************************************************************

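Next, add a dialogue dictionary. Because it holds only three lines, '你是誰' (Who are you?), '我是誰' (Who am I?), and '你幾歲' (How old are you?), the bot is named 我是聊三句 (I Chat in Three Lines). Anything outside these three lines is sent to Wikipedia.

Version 5: 我是聊三句 (I Chat in Three Lines)

# Record one utterance from the microphone and transcribe it as Mandarin (Taiwan)
import speech_recognition as sr
recogn = sr.Recognizer()
with sr.Microphone() as source:
    speech = recogn.listen(source)
text = recogn.recognize_google(speech, language='zh-tw')
print(text)

# Canned answers: "I am 我是聊三句", "You are super handsome", "That's a secret"
QA = {'你是誰':'我是聊三句', '我是誰':'你是超級大帥哥', '你幾歲':'這是秘密'}
if text in QA:
    text = QA[text]
    print(text)
else:
    # Not a known question: look the text up on Wikipedia instead
    from bs4 import BeautifulSoup
    import requests
    response = requests.get('https://zh.wikipedia.org/zh-tw/' + text)
    bs = BeautifulSoup(response.text, 'lxml')
    p_list = bs.find_all('p')
    # Use the first paragraph that mentions the keyword in its first 10 characters
    for p in p_list:
        if text in p.text[0:10]:
            content = p.text
            break
    else:
        # No paragraph matched: "Even Wikipedia has no answer to your question"
        content = '你的問題連維基百科都沒有'
    # Strip bracketed markers such as [1] from the paragraph
    import re
    text = re.sub(r'\[[^\]]+\]', '', content)
    print(text)

# Convert the answer to speech
from gtts import gTTS
tts = gTTS(text, lang='zh-tw')

# Reserve a temporary .mp3 path, then save the speech to it
import tempfile
with tempfile.NamedTemporaryFile(suffix='.mp3', delete=False) as fp:
    temp = fp.name
tts.save(temp)

# Play the file and wait until playback finishes before the script exits
import time
from pygame import mixer
mixer.init()
mixer.music.load(temp)
mixer.music.play()
while mixer.music.get_busy():
    time.sleep(0.1)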
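
The dictionary-first, Wikipedia-second decision can be isolated into a small function, which makes it easy to try out without a microphone. The sketch below is an illustration only: reply and fake_wikipedia are hypothetical names, and fake_wikipedia merely stands in for the Wikipedia lookup. The QA dictionary is the one from the post.

```python
QA = {'你是誰': '我是聊三句', '我是誰': '你是超級大帥哥', '你幾歲': '這是秘密'}

def reply(text, fallback):
    """Answer from the dictionary when possible, otherwise call fallback(text)."""
    if text in QA:
        return QA[text]
    return fallback(text)

def fake_wikipedia(text):
    # Hypothetical stand-in for the Wikipedia lookup in the else branch.
    return '(查詢維基百科: ' + text + ')'

print(reply('你是誰', fake_wikipedia))  # 我是聊三句
print(reply('台北市', fake_wikipedia))  # (查詢維基百科: 台北市)
```

Note that the match is exact: a recognized sentence has to equal one of the three keys character for character, otherwise it falls through to the lookup.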

**********************************************************************************************************************

Finally, package the chatbot as a standalone executable

Install the pyinstaller package (version 3.4 at the time of writing)

conda install -c conda-forge pyinstaller

Package it into a single executable, chatbot.exe

pyinstaller -F chatbot.py

Done ^_^

6 comments:

崛智科技, February 11, 2019, 5:51 PM
The original TemporaryFile approach does not actually delete the temporary mp3 file. I suggest rewriting it as follows:

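import tempfile
from gtts import gTTS
from pygame import mixer

text = '你好嗎? 你叫甚麼名字? 6'
# On Windows, TemporaryFile is an alias of NamedTemporaryFile, so delete= and fp.name work;
# delete=True removes the file when the with block exits
with tempfile.TemporaryFile(dir='C:/Home/Projects/AI/speech/', delete=True) as fp:
    tts = gTTS(text, lang='zh-tw')
    tts.write_to_fp(fp)
    print(fp.name)
    mixer.init()
    fp.seek(0)
    mixer.music.load(fp)
    mixer.music.play()
    # Keep the file open until the user presses Enter, so playback can finish
    input('Press (Enter) to continue>')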
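
For readers who want the cleanup to be explicit and independent of the platform, here is a minimal sketch using only the standard library. It is neither the post's code nor the commenter's: the helper name temporary_mp3 is hypothetical, and the bytes written in the usage part are a stand-in for what tts.save() would produce. The idea is to create a named file with an .mp3 suffix, close it so other code can reopen it by path, and remove it in a finally block.

```python
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def temporary_mp3():
    """Yield the path of a fresh .mp3 temp file and delete it afterwards."""
    fp = tempfile.NamedTemporaryFile(suffix='.mp3', delete=False)
    fp.close()  # closed so that other code can reopen the path by name
    try:
        yield fp.name
    finally:
        if os.path.exists(fp.name):
            os.remove(fp.name)

with temporary_mp3() as path:
    with open(path, 'wb') as out:
        out.write(b'fake mp3 data')  # stand-in for tts.save(path)
    print(path.endswith('.mp3'), os.path.exists(path))  # True True
print(os.path.exists(path))  # False
```

When this is combined with pygame, playback should have finished and the file should no longer be in use before the with block exits, since removing a file that is still open can fail on Windows.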
崛智科技, February 11, 2019, 7:10 PM
import tempfile
from gtts import gTTS
from pygame import mixer, time

def speak2(text, lang='en'):
    try:
        tts = gTTS(text, lang=lang)
        mixer.init()
        with tempfile.TemporaryFile(dir='C:/Home/Projects/AI/speech/', delete=True) as sf:
            print('%s' % (sf.name))
            tts.write_to_fp(sf)
            sf.seek(0)
            clock = time.Clock()
            mixer.music.load(sf)
            mixer.music.play()
            while mixer.music.get_busy():
                # check if playback has finished
                clock.tick(30)
        print('DONE...')
    except Exception:
        raise

speak2('你好嗎? 你叫甚麼名字?', lang='zh-tw')
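shihnung, February 14, 2019, 12:14 AM
Thank you for the feedback, that is one more useful approach ^_^
When testing the program I specifically watched the random file names that tempfile generated and confirmed that the file really is closed automatically. You may want to try it again.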
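xxx 開發團隊, April 22, 2020, 4:58 AM
I installed this version: Anaconda3-2020.02-Windows-x86_64.exe
The environment was created with conda create -n python36env python=3.6 anaconda
Running pip install PyAudio fails with an error.
Does PyAudio only install if I use exactly the same Python version as you?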

Tuesday, January 15, 2019
