This project is a chatterbot built by Abhimanyu Deora and Caleb Scott (NCSSM '15) during the 2015 NCSSM Miniterm. Built in Java, the chatbot uses Markov modeling to generate responses.
Two 5th-order Markov models are used - one modeling the text forward, and the other modeling the text backwards. The Markov models are initially generated from a given set of training data, and periodically re-generated from the user's inputs.
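As a rough illustration of the forward and backward models described above, here is a minimal sketch. It assumes the models operate on words rather than characters, which the description does not state. The class `MarkovSketch` and its methods are hypothetical names and are not taken from Markov3.java. The demo uses order 2 so that the tiny corpus has enough transitions; the chatbot itself uses order 5.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch of an n-th order, word-level Markov model (not the project's code).
public class MarkovSketch {
    private final int order;
    // Maps each window of `order` tokens to every token observed right after it.
    private final Map<List<String>, List<String>> transitions = new HashMap<>();
    private final Random random;

    public MarkovSketch(int order, long seed) {
        this.order = order;
        this.random = new Random(seed);
    }

    // Records, for every window of `order` tokens, the token that follows it.
    public void train(List<String> tokens) {
        for (int i = 0; i + order < tokens.size(); i++) {
            List<String> state = new ArrayList<>(tokens.subList(i, i + order));
            transitions.computeIfAbsent(state, k -> new ArrayList<>())
                       .add(tokens.get(i + order));
        }
    }

    // Extends the seed one token at a time until maxLength or an unseen state is reached.
    public List<String> generate(List<String> seed, int maxLength) {
        List<String> out = new ArrayList<>(seed);
        while (out.size() >= order && out.size() < maxLength) {
            List<String> state = out.subList(out.size() - order, out.size());
            List<String> next = transitions.get(state);
            if (next == null) {
                break; // no observed continuation for this state
            }
            out.add(next.get(random.nextInt(next.size())));
        }
        return out;
    }

    // Builds the backward model by training on the reversed token sequence.
    public static MarkovSketch backward(int order, long seed, List<String> tokens) {
        List<String> reversed = new ArrayList<>(tokens);
        Collections.reverse(reversed);
        MarkovSketch model = new MarkovSketch(order, seed);
        model.train(reversed);
        return model;
    }

    public static void main(String[] args) {
        List<String> corpus = Arrays.asList("the", "cat", "sat", "on", "the", "mat");
        MarkovSketch forward = new MarkovSketch(2, 42L);
        forward.train(corpus);
        MarkovSketch back = MarkovSketch.backward(2, 42L, corpus);
        System.out.println(forward.generate(Arrays.asList("the", "cat"), 10));
        System.out.println(back.generate(Arrays.asList("mat", "the"), 10));
    }
}
```

One plausible reason for keeping both directions is that a keyword can then be extended to the right with the forward model and to the left with the backward model, so the keyword can land in the middle of a generated sentence. That usage is an assumption here, not something the description confirms.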
When a user's input is received, the following steps are taken to generate a response:
- Extraneous words (the, an, a, etc.) stripped from the user's input
- Certain words transitioned to alternate forms (you -> I, your -> my, why -> because, etc.)
- Keywords identified based on known list of verbs and nouns
- Keywords used to seed Markov generation of set of candidate responses
- Candidate responses evaluated based on number of keywords from original input present in response
- Optimal response returned to user
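The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in ElizaImp3.java: the class `ResponsePipelineSketch`, its method names, and the tiny word lists are all hypothetical, and the Markov generation step is replaced by a fixed list of candidate responses so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the response pipeline (not the project's code).
public class ResponsePipelineSketch {
    // Assumed sample lists; the real project uses its own word lists.
    static final Set<String> EXTRANEOUS = new HashSet<>(Arrays.asList("the", "an", "a"));
    static final Map<String, String> SWAPS = new HashMap<>();
    static {
        SWAPS.put("you", "I");
        SWAPS.put("your", "my");
        SWAPS.put("why", "because");
    }

    // Steps 1 and 2: drop extraneous words, then swap words to their alternate forms.
    public static List<String> preprocess(String input) {
        List<String> out = new ArrayList<>();
        for (String word : input.toLowerCase().split("[^a-z']+")) {
            if (word.isEmpty() || EXTRANEOUS.contains(word)) {
                continue;
            }
            out.add(SWAPS.getOrDefault(word, word));
        }
        return out;
    }

    // Step 3: keep the words that appear in the known noun/verb set, without duplicates.
    public static List<String> findKeywords(List<String> words, Set<String> known) {
        List<String> keys = new ArrayList<>();
        for (String w : words) {
            if (known.contains(w) && !keys.contains(w)) {
                keys.add(w);
            }
        }
        return keys;
    }

    // Step 5: score a candidate by how many input keywords it contains.
    public static int score(String candidate, List<String> keywords) {
        Set<String> tokens =
            new HashSet<>(Arrays.asList(candidate.toLowerCase().split("[^a-z']+")));
        int hits = 0;
        for (String k : keywords) {
            if (tokens.contains(k)) {
                hits++;
            }
        }
        return hits;
    }

    // Step 6: return the highest-scoring candidate (first one wins ties).
    public static String pickBest(List<String> candidates, List<String> keywords) {
        String best = null;
        int bestScore = -1;
        for (String c : candidates) {
            int s = score(c, keywords);
            if (s > bestScore) {
                best = c;
                bestScore = s;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Set<String> known = new HashSet<>(Arrays.asList("like", "music", "dog"));
        List<String> words = preprocess("Why do you like the music?");
        List<String> keywords = findKeywords(words, known);
        // Step 4 stand-in: in the chatbot these would come from Markov generation.
        List<String> candidates = Arrays.asList(
            "I have a dog", "I like jazz", "Music is something I like");
        System.out.println(pickBest(candidates, keywords));
    }
}
```

With these sample lists, the input "Why do you like the music?" yields the keywords "like" and "music", and the third candidate is chosen because it contains both.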
ElizaImp3.java
- the core of the chatbot: processes user input, gets response from Markov3.java, gets keywords from KeyFinder.java
Markov3.java
- the Markov modeling engine: trains itself on given source data, generates responses based on given keywords, selects most appropriate response
KeyFinder.java
- keyword identifier: identifies keys based on a given set of verbs and nouns
SlangBanger.java
- slang identifier: an in-progress tool to generate associations between words and phrases in order to identify slang terms
conversation-data.txt
- training data: a formatted list of possible responses - not used by the chatterbot
conversation-data-single-line.txt
- training data: conversation-data.txt formatted onto a single line (\n characters replaced with spaces) - used by the chatterbot to train itself
nouns.txt
- noun list: a list of nouns used by KeyFinder.java to identify keywords
verbs.txt
- verb list: a list of verbs used by KeyFinder.java to identify keywords