Final Writeup: https://docs.google.com/document/d/e/2PACX-1vS5y_uyHZ650D299lw0y8cNAyFLzbQh0f8F_aOlV-sOv_KBlzLkJFOzMIvxPfHvpDoPU4sRgk0RaI2C/pub
Title: Summarizes the main idea of your project.
You can never have enough friends, so we decided to take this opportunity to build one; hence, we have called our project Chat Buddy.
Who: Names and logins of all your group members.
Luke McDevitt – lwmcde || Jonathan Dou – xdou3 ||
Introduction: What problem are you trying to solve and why? o If you are implementing an existing paper, describe the paper’s objectives and why you chose this paper. o If you are doing something new, detail how you arrived at this topic and what motivated you. What kind of problem is this? Classification? Regression? Structured prediction? Reinforcement Learning? Unsupervised Learning? etc.
We are trying to build a model that tackles the problem of simulating realistic human speech and interaction via text. The field offers a plethora of interesting subjects, but when we were initially discussing potential project topics, this one stuck out as distinct from many of the other ideas. We had both extensively enjoyed AIDungeon when we came across it earlier and thought we might take a somewhat similar track here. We do not plan to follow a set model or research paper, and will instead be trying to create what we deem an acceptable chatbot conversationalist. There isn't a particularly easy evaluation method, so we won't be stuck fine-tuning parameters for a 1% boost to accuracy. Instead, we'll be looking at the architecture and trying to elicit human-like responses from a fundamentally inhuman system. It should be interesting.
This is closest to a prediction problem: given an input prompt, we are predicting and emulating a natural response to it. (Our tentative plan may be subject to some change as we delve deeper into the actual implementation.)
Related Work: Are you aware of any, or is there any prior work that you drew on to do your project? o Please read and briefly summarize (no more than one paragraph) at least one paper/article/blog relevant to your topic beyond the paper you are re-implementing/novel idea you are researching. o In this section, also include URLs to any public implementations you find of the paper you’re trying to implement. Please keep this as a “living list”--if you stumble across a new implementation later down the line, add it to this list. https://chatbotslife.com/replicate-your-friend-with-transformer-bc5efe3a1596
This article describes using a fairly basic transformer model to analyze a dataset of several thousand emails between a pair of individuals in order to model their interactions; the transformer learns to predict an acceptable response to a given input. Although we will be implementing a different, multi-headed transformer model and using a different dataset, the article sketches the rough outline of our idea and demonstrates an implementation similar to the one we plan to build.
Data: What data are you using (if any)? o If you’re using a standard dataset (e.g. MNIST), you can just mention that briefly. Otherwise, say something more about where your data come from (especially if there’s anything interesting about how you will gather it). o How big is it? Will you need to do significant preprocessing?
We are going to use conversational datasets like the NPS Chat corpus and other existing datasets suitable for training conversational AI. The dataset size will be significant, but we have never encountered a case where preprocessing time was even remotely comparable to the training/testing time of the model, so we aren't particularly concerned about this aspect of the project.
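For a conversational corpus, preprocessing mostly means tokenizing utterances and mapping tokens to integer ids, with reserved ids for padding and sequence boundaries. The sketch below is a minimal, stdlib-only illustration of that pipeline; the function names and special tokens are our own placeholders, not part of any particular corpus loader.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split into word and punctuation tokens."""
    return re.findall(r"[a-z']+|[.,!?]", text.lower())

def build_vocab(utterances, min_count=1,
                specials=("<pad>", "<start>", "<end>", "<unk>")):
    """Map tokens to integer ids, reserving the first ids for special tokens."""
    counts = Counter(tok for utt in utterances for tok in tokenize(utt))
    vocab = {tok: i for i, tok in enumerate(specials)}
    for tok, c in counts.most_common():
        if c >= min_count and tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab

def encode(utterance, vocab):
    """Turn one utterance into a list of ids, bracketed by <start>/<end>."""
    unk = vocab["<unk>"]
    return ([vocab["<start>"]]
            + [vocab.get(t, unk) for t in tokenize(utterance)]
            + [vocab["<end>"]])
```

In practice we would likely swap this for a Keras `TextVectorization` layer or a subword tokenizer, but the shape of the output (padded id sequences) stays the same.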
Methodology: What is the architecture of your model? • How are you training the model? • If you are implementing an existing paper, detail what you think will be the hardest part about implementing the model here. • If you are doing something new, justify your design. Also note some backup ideas you may have to experiment with if you run into issues.
We are going to write a transformer model with multi-headed attention using TensorFlow. We will compare its results with an implementation of GPT-2 (maybe GPT-3 if accessible). If we have extra time, we may also write an LSTM model as another point of comparison.
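The core of the planned model is scaled dot-product attention split across several heads. As a conceptual sketch (in NumPy rather than our eventual TensorFlow code, and with projection matrices passed in explicitly for clarity), multi-head self-attention over one sequence looks like:

```python
import numpy as np

def multi_head_attention(x, num_heads, Wq, Wk, Wv, Wo):
    """Multi-head scaled dot-product self-attention.

    x:              (seq_len, d_model) input embeddings
    Wq, Wk, Wv, Wo: (d_model, d_model) projection matrices
    """
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    def split(h):  # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return h.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ Wq), split(x @ Wk), split(x @ Wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)   # (heads, seq, seq)
    # Softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    heads = weights @ v                                   # (heads, seq, d_head)
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ Wo
```

In the actual model we would use `tf.keras.layers.MultiHeadAttention` (plus masking, residual connections, and layer normalization) rather than hand-rolling this.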
Metrics: What constitutes “success?” o What experiments do you plan to run? o For most of our assignments, we have looked at the accuracy of the model. Does the notion of “accuracy” apply for your project, or is some other metric more appropriate? o If you are implementing an existing project, detail what the authors of that paper were hoping to find and how they quantified the results of their model. o If you are doing something new, explain how you will assess your model’s performance. o What are your base, target, and stretch goals?
Success is a blurry notion here. For us, success is producing a reasonably coherent response to a comprehensible input prompt. Accuracy is not the most applicable metric, and honestly there isn't a particularly obvious metric that comes to mind beyond subjective interpretation and evaluation of the responses. We could use cosine similarity to grade how far our outputs deviate from reference responses, but this isn't ideal: an input here has numerous correct responses, as opposed to the single correct answer in standard classification or translation problems.
The most feasible way to evaluate our work is, again, subjective evaluation of the model's outputs. We can point out obvious flaws such as grammatical errors and incoherent sentences, but it is quite difficult to establish a benchmark assessment of the model's progression and accuracy. Cosine distance between response vectors might work, but there are too many valid answers for this to be a particularly reliable method as we see it.
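The cosine-similarity idea mentioned above can be made concrete: represent the model's response and a reference response as vectors (here, simple bag-of-words counts, though an embedding would be a better choice) and score their angular similarity. This is only an illustrative sketch of the metric, not a claim that it solves the many-valid-answers problem.

```python
import math
from collections import Counter

def cosine_similarity(a_tokens, b_tokens):
    """Cosine similarity between two bag-of-words count vectors.

    Returns 1.0 for identical token distributions, 0.0 for disjoint ones.
    """
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0
```

Two perfectly good responses ("sure, sounds fun" vs. "yeah, I'm in") can score near zero here, which is exactly the weakness described above.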
Base goals – Have a working chatbot model
Target goals – Chatbot produces relatively comprehensible and coherent responses
Stretch goals – Chatbot has some memory and can hold a semi-extended conversation
Ethics: Choose 2 of the following bullet points to discuss; not all questions will be relevant to all projects so try to pick questions where there’s interesting engagement with your project. (Remember that there’s not necessarily an ethical/unethical binary; rather, we want to encourage you to think critically about your problem setup.) o What broader societal issues are relevant to your chosen problem space? o Why is Deep Learning a good approach to this problem? o What is your dataset? Are there any concerns about how it was collected, or labeled? Is it representative? What kind of underlying historical or societal biases might it contain? o Who are the major “stakeholders” in this problem, and what are the consequences of mistakes made by your algorithm? o How are you planning to quantify or measure error or success? What implications does your quantification have? o Add your own: if there is an issue about your algorithm you would like to discuss or explain further, feel free to do so. These two questions seemed to be fairly thought-provoking concepts regarding our project.
The primary broader societal issues regarding chatbots are increased connectivity, overreliance on technology, and the replacement of human interaction with digitized substitutes. These are not insignificant issues: if sufficiently accurate and captivating chatbots were created, human interaction might diminish significantly as a result. Widespread chatbot adoption would arguably offer few positive effects for society as a whole. That said, it is an interesting concept to discuss and an interesting enough project topic for us to work on.
Quantifying success in this project will likely be fairly subjective. We will certainly be able to grade it to some degree based on grammatical accuracy of its outputs and their coherency, but at the end of the day, this is not going to be an objective evaluation process. As such, we hope our final commit will have a respectable result, but having not delved particularly deep into the actual project yet, we are uncertain of the ease with which this can be accomplished.
Division of labor: Briefly outline who will be responsible for which part(s) of the project. As there are only two of us in the group, the division should be roughly equal; we'll hammer out the specifics later.
Challenges: What has been the hardest part of the project you’ve encountered so far? Our most significant issue so far has been the lack of clear labels for the decoder to use when evaluating inputs. The inputs obviously go into the encoder, but there isn't an easy way to pass anything into the decoder except an empty sentence, which seems flawed.
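One standard answer to the decoder-input question is teacher forcing: during training, the decoder's input is the target response shifted right with a `<start>` token prepended, and its label is the same response with `<end>` appended, so the model always predicts the next token given the true previous ones. A minimal sketch of that pairing (with hypothetical token ids) might look like:

```python
def make_decoder_pair(target_ids, start_id, end_id, pad_id, max_len):
    """Build (decoder_input, decoder_label) for teacher forcing.

    decoder_input = <start> + target   (what the decoder sees)
    decoder_label = target + <end>     (what it must predict, shifted by one)
    Both are padded/truncated to max_len.
    """
    dec_in = [start_id] + target_ids
    dec_out = target_ids + [end_id]
    pad = lambda seq: (seq + [pad_id] * max_len)[:max_len]
    return pad(dec_in), pad(dec_out)
```

At inference time there is no target, so the decoder starts from just `<start>` and feeds its own predictions back in one token at a time.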
What do you need to dedicate more time to?
Figuring out how to handle saving model weights and interacting with the saved weights easily.
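For the weight-saving question, Keras's built-in `save_weights`/`load_weights` is likely the simplest route: save after training, then rebuild the identical architecture and load. The model below is a stand-in (not our chatbot), just to show the round trip; the filename is our own choice.

```python
import numpy as np
import tensorflow as tf

def build_model():
    """Stand-in model; the real chatbot transformer would be built the same way."""
    return tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
        tf.keras.layers.Dense(4),
    ])

model = build_model()
model.save_weights("chatbot.weights.h5")   # weights only, keyed by layer

restored = build_model()                   # architecture must match exactly
restored.load_weights("chatbot.weights.h5")
```

`model.save(...)` would store the architecture too, but rebuilding the model in code and loading only weights keeps checkpoints small and avoids serialization surprises with custom layers.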
What are you thinking of changing, if anything?
Possibly switching to a different approach, such as using LSTM cells with persistent memory, to avoid some issues we will likely face down the road. We'll see.
Built With
- keras
- python
