Kun Shan University Institutional Repository: Item 987654321/30758

    Please use this identifier to cite or link to this item: http://ir.lib.ksu.edu.tw/handle/987654321/30758

    Title: Development of a Film Discussion Corpus Based on a Retrieval-Generative Architecture
    Original Title: 基於檢索生成式架構之電影討論語料庫開發
    Authors: He, Ying-Cheng (何應承)
    Advisor: Cheng, Chao-Jung (鄭朝榮)
    Keywords: Seq2Seq; Deep Learning; Retrieval-Based Model; Generative Model; Chatbot
    Date: 2019
    Issue Date: 2020-03-24 11:34:59 (UTC+8)
    Abstract: Most current chatbot dialogue designs do not use a corpus because of cost. Instead they apply preset question-and-answer dialogues directly, so the chatbot responds only when the user's query contains a relevant keyword, which makes it less attractive and less practical for consumers. If a chatbot were integrated with the messaging software that customers already use and could interact with them 24 hours a day, stores could chat with customers through it and learn their preferences. This thesis therefore develops a movie corpus and a movie knowledge base based on Natural Language Processing, using the PTT movie board as an example. A web crawler collects users' discussions of movie topics, and the text is first segmented with the Jieba word segmentation algorithm. The movie corpus is then used to train a Seq2Seq model, and the trained Seq2Seq model serves as the chatbot's movie question-answering module. To improve the accuracy of the system, the thesis combines retrieval-based and generative architectures, giving two modes. The system enters retrieval mode by default: when the user asks about a movie topic discussed on the PTT movie board, the retrieval model matches the query against question-answer pairs and uses a BM25 applicability check to decide whether to output the corresponding sentence from the retrieval-based movie knowledge base. If the BM25 condition is not met, the system switches to Seq2Seq mode and uses the trained movie question-answering module to generate a reply for the user. The retrieval-generative chatbot can discuss a wider range of movie topics interactively with users. Compared with earlier chatbots whose Q&A relied on custom modules in Dialogflow or wit.ai, the approach reduces the tedious work of defining intents and entities rules.
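The two-mode flow described in the abstract (retrieval first, generation as fallback) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the thesis: the whitespace tokenizer stands in for Jieba segmentation, the BM25 variant is the common Okapi form with a Lucene-style smoothed IDF, the parameters `k1`, `b`, and `threshold` are arbitrary, `generate_reply` is a hypothetical placeholder for the trained Seq2Seq module, and the question-answer pairs are toy data.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace splitting stands in for Jieba word segmentation.
    return text.lower().split()


class BM25:
    """Okapi BM25 over a small list of documents (smoothed, Lucene-style IDF)."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.docs = [tokenize(d) for d in docs]
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        self.tf = [Counter(d) for d in self.docs]
        # Document frequency: number of documents containing each term.
        df = Counter(term for d in self.docs for term in set(d))
        n = len(self.docs)
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1.0)
                    for t, f in df.items()}

    def score(self, query, index):
        tf, dl = self.tf[index], len(self.docs[index])
        total = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            f = tf[term]
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total


def generate_reply(question):
    # Hypothetical placeholder for the trained Seq2Seq question-answering module.
    return "[generated reply]"


def answer(question, bm25, qa_pairs, threshold=1.0):
    """Return (mode, reply): retrieval if the best BM25 score passes the threshold."""
    scores = [bm25.score(question, i) for i in range(len(qa_pairs))]
    best = max(range(len(scores)), key=scores.__getitem__)
    if scores[best] >= threshold:
        return "retrieval", qa_pairs[best][1]
    return "seq2seq", generate_reply(question)


# Toy question-answer pairs (illustrative only, not from the PTT corpus).
qa_pairs = [
    ("who directed the movie", "The director is named in the original post."),
    ("is the ending worth watching", "Many posters said they liked the ending."),
]
# Assumption: the index is built over the question side of each pair.
bm25 = BM25([q for q, _ in qa_pairs])

print(answer("who directed the movie", bm25, qa_pairs))
print(answer("unrelated words entirely", bm25, qa_pairs))
```

In this sketch a query that shares enough terms with a stored question is answered from the knowledge base, and any query scoring below the threshold is handed to the generative stub. In practice the threshold would need tuning against real data.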
    Appears in Collections: [Department of Information Engineering & Graduate School of Digital Life Technology] Dissertations and Theses

    Files in This Item:

    File                     Description   Size     Format
    107KSUT0392003-001.pdf                 5802Kb   Adobe PDF

    All items in KSUIR are protected by copyright, with all rights reserved.

    ©Kun Shan University Library and Information Center