To build a Q&A bot that answers questions from a local file, you can follow these steps:
(1) Prepare a question-and-answer dataset
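As an illustration of step (1), here is a minimal sketch that writes a tiny dataset in the layout the later code expects: a UTF-8 CSV with a Questions column and an Answers column. The column names and the file name chatbot_data.csv come from the code in the following steps. The sample rows and the helper name write_sample_dataset are made up for this example.

```python
import csv

# Hypothetical sample rows; replace them with your own question/answer pairs.
SAMPLE_ROWS = [
    ("What are your opening hours?", "We are open from 9 am to 6 pm, Monday to Friday."),
    ("How do I reset my password?", "Click 'Forgot password' on the login page."),
    ("Where is the office located?", "The office is on the 3rd floor of the main building."),
]

def write_sample_dataset(path="chatbot_data.csv", rows=SAMPLE_ROWS):
    """Write question/answer pairs to a UTF-8 CSV with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        # Column names used by the later steps
        writer.writerow(["Questions", "Answers"])
        writer.writerows(rows)
    return path

if __name__ == "__main__":
    write_sample_dataset()
```

The csv module quotes any field that contains a comma, so answers with punctuation are read back intact.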
(2) Choose a machine learning model: bert-base-nli-mean-tokens
# Read the dataset (expects 'Questions' and 'Answers' columns)
import pandas as pd
df = pd.read_csv('chatbot_data.csv', encoding='utf-8')
print(df.shape)
print(df)
# Load a pretrained BERT model to generate sentence vectors
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('bert-base-nli-mean-tokens')
(3) Encode the questions and save the model
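import pickle
# Generate sentence vectors for the questions in the dataset
corpus_embeddings = model.encode(df['Questions'].tolist())
# Save the model and the data with pickle
with open('chatbot_model.pkl', 'wb') as f:
    pickle.dump((model, df), f)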
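To reuse the saved file later without recomputing anything, one option is to store the question embeddings alongside the model and data, then load everything back in one call. The sketch below is only an illustration: save_bundle and load_bundle are hypothetical helper names, and they work with any picklable objects.

```python
import pickle

def save_bundle(path, **objects):
    """Pickle the given named objects into one file as a dict."""
    with open(path, "wb") as f:
        pickle.dump(objects, f)

def load_bundle(path):
    """Load the dict of objects written by save_bundle."""
    with open(path, "rb") as f:
        return pickle.load(f)

# With the objects from the steps above, usage might look like:
#   save_bundle('chatbot_model.pkl', model=model, df=df,
#               corpus_embeddings=corpus_embeddings)
#   bundle = load_bundle('chatbot_model.pkl')
#   model, df = bundle['model'], bundle['df']
#   corpus_embeddings = bundle['corpus_embeddings']
```

Only load pickle files you created yourself, since unpickling can execute arbitrary code.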
(4) Get the answer
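from sklearn.metrics.pairwise import cosine_similarity

def get_answer(query):
    # Generate a vector for the query
    query_vec = model.encode([query])
    # Compute cosine similarity against every question vector
    sim_scores = cosine_similarity(query_vec, corpus_embeddings)
    # Take the index of the most similar question
    index = sim_scores.argmax()
    return df['Answers'].iloc[index]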
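To make the retrieval step concrete, here is a small pure-Python sketch of what the similarity lookup computes: the cosine of the angle between the query vector and each stored vector, followed by picking the largest score. The function names cosine and most_similar and the toy 3-dimensional vectors are assumptions for illustration. Real sentence embeddings have hundreds of dimensions.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(query_vec, corpus_vecs):
    """Return (index, score) of the corpus vector closest to the query."""
    scores = [cosine(query_vec, v) for v in corpus_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]

# Toy "embeddings" standing in for encoded questions.
corpus = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
index, score = most_similar([0.9, 0.8, 0.0], corpus)
print(index, round(score, 3))  # index is 2: the third vector points the same way as the query
```

Because the lookup always returns the best match, an unrelated query still gets some answer. Comparing the best score against a cutoff you choose and returning a fallback message below it is one way to handle that.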
(5) Test
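A simple way to test is to run a handful of paraphrased questions through get_answer and read the results. The sketch below is a minimal example. run_queries is a hypothetical helper that accepts any answering function, and the test questions are placeholders to replace with paraphrases of rows in your own dataset.

```python
def run_queries(answer_fn, queries):
    """Call answer_fn on each query, print the pair, and return all pairs."""
    results = []
    for query in queries:
        answer = answer_fn(query)
        print(f"Q: {query}\nA: {answer}\n")
        results.append((query, answer))
    return results

# Hypothetical test questions; use ones that paraphrase rows in your own dataset.
test_queries = ["When are you open?", "I forgot my password, what now?"]

# With the bot from step (4):
#   run_queries(get_answer, test_queries)
```

Checking that paraphrases land on the intended answers shows quickly whether the chosen model separates your questions well enough.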