Apply your knowledge to build something amazing!
:information_source: Project Overview
Difficulty Level: Intermediate
Estimated Time: 3-4 hours (Part 1) + 2-3 hours (Part 2)
Skills Practiced:
- Natural Language Processing (NLP)
- Text preprocessing and feature extraction
- Machine Learning model training
- JSON data handling
- User interface development with Streamlit
In this project, you will use your knowledge of Natural Language Processing and Machine Learning to create a chatbot. You will need to create a new Google Colab notebook named "P4_Chatbot.ipynb" before coding.
For Part 1, we focus on text preprocessing and model training, using the techniques covered in previous lessons.
```mermaid
graph TD
    A[Start: Download Dataset] --> B[Phase 1: Setup Environment]
    B --> C[Phase 2: Text Preprocessing]
    C --> D[Phase 3: Process Dataset]
    D --> E[Phase 4: Feature Extraction]
    E --> F[Phase 5: Train ML Models]
    F --> G[Phase 6: Test Models]
    G --> H[Phase 7: Build UI]
    H --> I[Part 2: Streamlit Web App]
    style A fill:#f9f,stroke:#333,stroke-width:2px
    style I fill:#9f9,stroke:#333,stroke-width:2px
```
In daily conversation, each sentence and the words we choose carry an intention. Sometimes the sentences are different, but the meaning we want to convey is the same. For example:
Sentence | Intent |
---|---|
Good Morning! | Greeting |
Hi! How are you today? | Greeting |
I am very sorry for troubling you. | Apology |
I apologize for my mistake. | Apology |
The chatbot is then programmed to provide an appropriate response according to the intent, without considering how the sentences are formed. For example:

User Query | Chatbot Response |
---|---|
Good Morning! | Hi Human. How are you today? |
Hello there | Hi Human. How are you today? |
Hi! | Hi Human. How are you today? |
By training a model to recognize the intent behind each sentence, the chatbot can use the predicted intent to provide an appropriate response to the user's query.
In our dataset, the intents and chatbot response are provided in a json format as shown below:
text - the list of user / text inputs
intent - the intent of the user inputs
responses - the list of responses that the chatbot can provide based on identified intent
```json
{
    "intent": "Greeting",
    "text": [
        "Hi",
        "Hi there",
        "Hola",
        "Hello"
    ],
    "responses": [
        "Hi human, please tell me your Alex user",
        "Hello human, please tell me your Alex user",
        "Hola human, please tell me your Alex user"
    ]
}
```
You may learn more about how a chatbot is created using this video: Link to Video
To learn more about the dataset, feel free to visit this link: https://www.kaggle.com/elvinagammed/chatbots-intent-recognition-dataset
Click on this link to download the dataset file in the form of json "intents.json": https://drive.google.com/file/d/1kd1J5KX5v6FEjr6sahivHFY38BwRJ-r-/view?usp=sharing
Click on the download icon to download the dataset:
For Google Colab users, upload your dataset into your colab file.
:warning: Common Pitfall Make sure the "intents.json" file is uploaded to your Colab environment before proceeding. If you skip this step, you'll get a "FileNotFoundError" when trying to read the JSON file!
Make a copy of the template file found here and rename the copied file as "P4: Chatbot".
Run the code here to download the dataset onto your colab file. Wait for the download process to complete before proceeding to the next step.
If the download fails, you may import the dataset "intents.json" into your file manually. For Colab users, run the following code to upload the JSON file:
```python
from google.colab import files
upload = files.upload()
```
Import the necessary libraries which are numpy and pandas.
:bulb: Best Practice Always import your libraries at the beginning of your notebook! This makes it easy to see all dependencies at a glance and helps avoid "NameError" issues later.
:white_check_mark: Milestone One: Environment setup complete! You should now have the dataset loaded and libraries imported.
Please refer to Chapter 10: Text Preprocessing to complete the following code.
Import nltk package
Download punkt tokenizer from nltk.
Import the snowball module from nltk.stem.
Declare snowballStemmer using the imported snowball module.
Within the function text_preprocessing, code out the following steps:
:bulb: Understanding Text Preprocessing Text preprocessing is like cleaning and organizing your words before analysis:
- Tokenization: Breaking sentences into individual words
- Stemming: Reducing words to their root form (e.g., "running" -> "run")
- Cleaning: Removing punctuation and special characters
Test the function text_preprocessing by running the function with the following sentence:
'We all agreed, it was a magnificent evening.'
Expected Output:
we all agre it was a magnific even
:warning: Debugging Tip If your output doesn't match, check:
- Did you convert all words to lowercase?
- Did you remove ALL punctuation (including commas and periods)?
- Is your stemmer initialized correctly?
:white_check_mark: Milestone 2: Text preprocessing function working correctly!
Run the following code to extract the dataset from "intents.json".

```python
# Import the JSON package and extract all the data from the dataset.
import json
with open("intents.json") as f:
    data = json.load(f)
```
Declare multiple lists as shown below. (Remarks: You may copy the following code)
```python
intent_list = []
train_data = []
train_label = []
responses = {}
```
Within the following loop, write the code:
:bulb: Loop Structure The loop provided in the template iterates through each intent in the dataset. For each intent, you need to:
- Process all the example texts
- Store the processed texts and their labels
- Build the responses dictionary
Print out the following values:
The dictionary responses contains all the corresponding replies that the chatbot can provide according to the intent in the user's query. Print out the list of responses for the intent "Thanks" by indexing the dictionary responses.
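A self-contained sketch of how the loop body might fill these lists (the tiny inline dataset and the simplified preprocessing stand-in below are just for illustration; in your notebook you would use the loaded `data` and your own `text_preprocessing`):

```python
# stand-in for the text_preprocessing function built earlier;
# here it only lowercases so the sketch runs on its own
def text_preprocessing(sentence):
    return sentence.lower()

# miniature version of the intents.json structure
data = {"intents": [
    {"intent": "Greeting", "text": ["Hi", "Hello"],
     "responses": ["Hi human"]},
    {"intent": "Thanks", "text": ["Thank you"],
     "responses": ["You're welcome", "Any time"]},
]}

intent_list, train_data, train_label = [], [], []
responses = {}

for intent in data['intents']:
    intent_list.append(intent['intent'])
    # map the intent name to its list of possible replies
    responses[intent['intent']] = intent['responses']
    for text in intent['text']:
        # store each preprocessed sentence together with its label
        train_data.append(text_preprocessing(text))
        train_label.append(intent['intent'])

print(responses["Thanks"])
```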
:white_check_mark: Milestone 3: Dataset loaded and preprocessed successfully!
Please refer to Chapter 10: Text Preprocessing to complete the following code.
Import the class CountVectorizer from Scikit-Learn library.
Declare the vectorizer object using the imported CountVectorizer.
Create a vocabulary for vectorizer by using the function fit with train_data as one of its parameters.
Assign vectorizer.get_feature_names_out() to a new array list_of_words.
Print out list_of_words.
By using the vectorizer, convert train_data into a bag of words train_data_bow.
:bulb: Understanding Bag of Words The Bag of Words (BoW) representation converts text into numbers:
- Each word in the vocabulary gets a position
- Each text is represented as a count of how many times each word appears
- This creates a numerical representation that ML models can understand
Now you can compare train_data with train_data_bow. Print the value at index 1 in train_data, the corresponding row in train_data_bow, and its label in train_label:

```python
print(train_data[1])
print(train_data_bow[1])
print(train_label[1])
```
Note that in the list_of_words, the word "hi" is positioned at index position 52 while the word "there" is positioned at index position 112. You can find out the words at each position with the following code.
```python
print(list_of_words[52])
print(list_of_words[112])
```
:white_check_mark: Milestone 4: Feature extraction complete! Your text data is now in numerical format.
Please refer to Chapter 7: Classification to complete the following code.
:bulb: Why Three Different Models? We're training three different classifiers to compare their performance:
- KNN: Simple and intuitive, but can be slow with large datasets
- Decision Tree: Fast and interpretable, good for understanding decisions
- Naive Bayes: Excellent for text classification, often the best choice for chatbots
Import KNN classifier from Scikit-Learn library.
Declare a KNN classifier clf_knn. Set the value of n_neighbors to 5.
Fit the classifier using train_data_bow and train_label.
Import Decision Tree classifier from Scikit-Learn library.
Declare a Decision Tree classifier clf_dt. Set the value of random_state to 33.
Fit the classifier using train_data_bow and train_label.
Import Multinomial Naive Bayes classifier from Scikit-Learn library
Declare a Naive Bayes classifier clf_nb.
Fit the classifier using train_data_bow and train_label.
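Taken together, the three training steps might look like the sketch below (the toy sentences and labels are invented for illustration; in the project you fit on your real `train_data_bow` and `train_label`):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import MultinomialNB

toy_data = ["hi", "hello there", "good morning",
            "thank you", "thanks a lot", "many thanks"]
toy_label = ["Greeting", "Greeting", "Greeting",
             "Thanks", "Thanks", "Thanks"]

vectorizer = CountVectorizer()
toy_bow = vectorizer.fit_transform(toy_data)

clf_knn = KNeighborsClassifier(n_neighbors=5)     # KNN with 5 neighbours
clf_dt = DecisionTreeClassifier(random_state=33)  # fixed seed for reproducibility
clf_nb = MultinomialNB()                          # Naive Bayes suits count features

# fit all three on the same bag-of-words matrix and labels
for clf in (clf_knn, clf_dt, clf_nb):
    clf.fit(toy_bow, toy_label)
```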
:white_check_mark: Milestone 5: All three models trained successfully!
Please refer to Chapter 10: Text Preprocessing and Chapter 7: Classification to complete the following code.
Test all 3 models using the example sentence "Hello there".

```python
test_sentence = "Hello there"
```
Before doing prediction, remember to:
:warning: Important Processing Steps The test sentence must go through the EXACT same preprocessing as the training data:
- Apply text preprocessing (tokenize, stem, clean)
- Convert to Bag of Words using the SAME vectorizer
- The input to transform() must be a list!
:bulb: Testing Your Models After preprocessing, use each classifier's
predict()
method to see which intent each model predicts. Compare their results!
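The full test pipeline, sketched end-to-end with a stand-in preprocessing step (in your notebook you would reuse your own `text_preprocessing`, `vectorizer`, and the classifiers you already trained):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# toy training set, for illustration only
train_data = ["hi there", "good morning", "thank you", "thanks a lot"]
train_label = ["Greeting", "Greeting", "Thanks", "Thanks"]

vectorizer = CountVectorizer()
train_data_bow = vectorizer.fit_transform(train_data)
clf_nb = MultinomialNB().fit(train_data_bow, train_label)

test_sentence = "Hello there"
processed = test_sentence.lower()       # stand-in for text_preprocessing(test_sentence)
# transform() expects a list of documents, even for a single sentence
test_bow = vectorizer.transform([processed])
print(clf_nb.predict(test_bow))
```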
:white_check_mark: Milestone 6: Models tested and ready for integration!
In this phase, we will develop a simple interface for our chatbot.
Import package random and package datetime from datetime.
```python
import random
from datetime import datetime
```
In general, the flow of the bot_respond function is as follows:
With this flow in mind, complete the code stated in step 2.
Declare a function bot_respond that receives a parameter named user_query. In the function bot_respond:
Use function text_preprocessing to tokenize and stem user_query.
After stemming, transform user_query into a bag of words named user_query_bow. Remember to store user_query as a list before applying the transform function.
From the three classifiers, clf_knn, clf_dt and clf_nb, select and assign clf_nb as clf.
Use the selected classifier clf to predict the intent of user_query_bow. Store the predicted intent in a variable predicted. Note that the predicted result is in the form of a Numpy Array.
Insert the following code into the function. It returns a default response when the chatbot does not recognize the intent behind the user input. (Remarks: The code is already available in the template)

```python
# Return a default response if the chatbot does not know what intent the user_query is about
max_proba = max(clf.predict_proba(user_query_bow)[0])
if max_proba < 0.08 and clf == clf_nb:
    predicted = ['noanswer']
elif max_proba < 0.3 and not clf == clf_nb:
    predicted = ['noanswer']
```
Declare an empty string bot_response.
For each intent, there are several responses the chatbot can choose from. Randomly generate a number chosenResponse in the range from 0 to the number of responses for the intent minus 1. (Remarks: The code is already available in the template)

```python
# Randomly generate a number chosenResponse within the range 0 to (number of responses - 1)
numOfResponses = len(responses[predicted[0]])
chosenResponse = random.randint(0, numOfResponses-1)
```
Based on chosenResponse, select the response from responses and assign it to bot_response. (Remark: The codes are already available in the template)
```python
# Select the response from responses and assign it to bot_response
if predicted[0] == "TimeQuery":
    bot_response = eval(responses[predicted[0]][chosenResponse])
else:
    bot_response = responses[predicted[0]][chosenResponse]
```
Return bot_response.
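Putting the steps above together, here is a self-contained sketch of the whole bot_respond flow (toy data, Naive Bayes only, and a lowercasing stand-in for text_preprocessing; your notebook version uses the real globals built in the earlier phases):

```python
import random
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# toy stand-ins for the notebook's globals
responses = {
    "Greeting": ["Hi human"],
    "Thanks": ["You're welcome", "Any time"],
    "noanswer": ["Sorry, I don't understand"],
}
train_data = ["hi there", "good morning", "thank you", "thanks a lot"]
train_label = ["Greeting", "Greeting", "Thanks", "Thanks"]

vectorizer = CountVectorizer()
train_data_bow = vectorizer.fit_transform(train_data)
clf_nb = MultinomialNB().fit(train_data_bow, train_label)

def bot_respond(user_query):
    user_query = user_query.lower()              # stand-in for text_preprocessing
    user_query_bow = vectorizer.transform([user_query])
    clf = clf_nb
    predicted = clf.predict(user_query_bow)
    # fall back to a default answer when the model is unsure
    if max(clf.predict_proba(user_query_bow)[0]) < 0.08:
        predicted = ['noanswer']
    # pick one of the possible replies for the predicted intent at random
    options = responses[predicted[0]]
    return options[random.randint(0, len(options) - 1)]

print(bot_respond("thank you"))
```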
Create a simple interface that accepts user input. (Remarks: The codes are already available in the template.)
```python
# Simple interface for chatbot
print("This is Alex the chatbot. Say something!!")
while True:
    try:
        bot_input = input("You : ")
        print("Alex :", bot_respond(bot_input))
    except KeyboardInterrupt:
        print("Alex : Thank you for choosing us. See you again soon!!")
        break
```
Now it's time to test out your chatbot! Run the code and try to type in your query:
:bulb: Testing Your Chatbot Try these test phrases:
- Greetings: "Hello", "Hi there", "Good morning"
- Questions: "What time is it?", "What's your name?"
- Thanks: "Thank you", "Thanks a lot"
- Unknown: Try something not in the training data!
:white_check_mark: Milestone 7: Basic chatbot complete and working!
We have successfully implemented a simple chatbot that can identify the intention behind a user query and provide an appropriate answer. However, there are still other features we can add. Using everything you have learnt so far, try to implement a feature that prompts the user for a username. Then modify the function bot_respond to replace the placeholder <HUMAN> with the username.
:bulb: Implementation Hints
- Ask for the username before the main chat loop starts
- Store the username in a variable
- Use the string .replace() method in the bot_respond function
- Test with responses that contain the <HUMAN> placeholder
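One way to sketch the replacement step (the prompt text, function name, and sample response below are just suggestions, not the template's own code):

```python
# a response containing the "<HUMAN>" placeholder, as in the dataset
sample_response = "Hi <HUMAN>, how are you today?"

def personalize(response, username):
    # swap the placeholder for the name the user gave us
    return response.replace("<HUMAN>", username)

# in the real chatbot, collect this once before the chat loop starts:
# username = input("Please tell me your name: ")
username = "Alex"
print(personalize(sample_response, username))
```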
As it stands, the chatbot can only recognize the intents stored in the provided dataset. As a bonus challenge, try to modify intents.json to include more queries with different intentions. One example of a new intent is shown below. (Remarks: make sure that the JSON syntax is correct)
```json
{
    "intent": "FavouriteFood",
    "text": [
        "What is your favourite food ?",
        "What do you like to eat ?",
        "What food do you like ?",
        "What do you eat ?"
    ],
    "responses": [
        "My favourite food is Nasi Lemak",
        "I love Nasi Lemak",
        "I can eat Nasi Lemak everyday"
    ]
}
```
To access intents.json, you may open the file using the left navigation bar as shown below.
The file will be opened up and you may then modify the intents.json file.
After modifications, rerun the codes from phase 2 onwards to renew the chatbot model with the newly modified data.
:bulb: Research Challenge Create a comparison table showing:
- Accuracy of each model (KNN, Decision Tree, Naive Bayes)
- Response time for each model
- Which model works best for your chatbot and why?
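A possible starting point for the comparison (toy data for illustration; swap in your real training data and, ideally, a held-out test split rather than the training set used here):

```python
import time
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

texts = ["hi", "hello there", "good morning", "hey you",
         "thank you", "thanks a lot", "many thanks", "thank you so much"]
labels = ["Greeting"] * 4 + ["Thanks"] * 4

vectorizer = CountVectorizer()
bow = vectorizer.fit_transform(texts)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=3),
    "Decision Tree": DecisionTreeClassifier(random_state=33),
    "Naive Bayes": MultinomialNB(),
}

results = {}
for name, clf in models.items():
    clf.fit(bow, labels)
    start = time.perf_counter()
    predictions = clf.predict(bow)     # note: evaluating on the training set here
    elapsed = time.perf_counter() - start
    results[name] = (accuracy_score(labels, predictions), elapsed)
    print(f"{name}: accuracy={results[name][0]:.2f}, time={elapsed:.4f}s")
```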
Try to implement a simple context system where the chatbot remembers the last intent and can respond accordingly.
In this project, you will use your knowledge of Natural Language Processing and Scikit-Learn to create a chatbot.
For Part 2, we focus on building a UI webpage for our chatbot using the streamlit package.
:bulb: What is Streamlit? Streamlit is a Python library that makes it super easy to create web applications! Think of it as turning your Python code into an interactive website with just a few lines of code. Perfect for showcasing your chatbot to friends and family!
Note: You may skip this section if you have already downloaded the dataset in Part 1.
Click on this link to download the dataset file in the form of json "intents.json": https://drive.google.com/file/d/1kd1J5KX5v6FEjr6sahivHFY38BwRJ-r-/view?usp=sharing
Click on the download icon to download the dataset:
Make sure you have downloaded and installed Visual Studio Code.
Watch this video to set up your VSCode.
On your laptop, create a new folder and name it "P4: Chatbot (Part 2)".
Move your intents.json file into that folder.
Create a new file and name it "chatbot.py".
Open the folder using VSCode.
Install all the libraries needed for this project using the Visual Studio Code command.
:warning: Installation Tips
Make sure you're in the correct directory in your terminal before installing! Use
cd
to navigate to your project folder first.
Install each library by running the following commands in the terminal (note that the sklearn library is installed via the scikit-learn package on pip):

```shell
py -m pip install streamlit
py -m pip install pandas
py -m pip install numpy
py -m pip install nltk
py -m pip install scikit-learn
```
This project will have 2 parts:
- Creating the chatbot model.
- Designing the UI using streamlit.
Phase One: Import Packages
- Import streamlit library as st
- Import pandas library as pd
- Import the package numpy as np
- Import all the libraries needed to perform text processing that we learned in Part 1. Copy and paste the code below in VS Code.
```python
import json
import nltk
from nltk.stem import snowball
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier
import random
from datetime import datetime
```
Phase 2: Declaration of Text Preprocessing Function
- You may copy the code below to declare list and dictionaries.
```python
intent_list = []
train_data = []
train_label = []
responses = {}
list_of_words = []
```
- You may copy the code below to declare the text preprocessing function.

```python
nltk.download('punkt')
snowballStemmer = snowball.SnowballStemmer("english")

def text_preprocessing(sentence):
    # tokenize the sentence
    tokens = nltk.word_tokenize(sentence)
    # keep only alphanumeric tokens; note that removing items from a list
    # while iterating over it skips elements, so we filter instead
    tokens = [token for token in tokens if token.isalnum()]
    stem_tokens = []
    for token in tokens:
        stem_tokens.append(snowballStemmer.stem(token.lower()))
    return " ".join(stem_tokens)
```
:white_check_mark: Milestone 1 (Part 2): Environment set up with all necessary imports!
Phase 3: Feature Extraction and Decision Tree Model
- Declare the vectorizer object using the imported CountVectorizer and declare a Decision Tree classifier clf_dt. Set the value of random_state to 33. You may copy the code below.
```python
# Feature Extraction
vectorizer = CountVectorizer()
# Build NLP Model
clf_dt = DecisionTreeClassifier(random_state=33)
```
Phase 4: Generate response
- Create a function bot_respond that receives a parameter named user_query. The function will preprocess the text, predict the intent, and return a response to the user. You may copy the code below.

```python
def bot_respond(user_query):
    # preprocess what the user says
    user_query = text_preprocessing(user_query)
    user_query_bow = vectorizer.transform([user_query])
    clf = clf_dt
    predicted = clf.predict(user_query_bow)  # predict the intent

    # When the model does not know the intent
    # (Part 2 only uses the Decision Tree, so the Naive Bayes branch
    # from Part 1 is dropped here)
    max_proba = max(clf.predict_proba(user_query_bow)[0])
    if max_proba < 0.3:
        predicted = ['noanswer']

    bot_response = ""
    numOfResponses = len(responses[predicted[0]])
    chosenResponse = random.randint(0, numOfResponses-1)
    if predicted[0] == "TimeQuery":
        bot_response = eval(responses[predicted[0]][chosenResponse])
    else:
        bot_response = responses[predicted[0]][chosenResponse]
    return bot_response
```
:white_check_mark: Milestone 2 (Part 2): Core functions ready!
Phase 5: Function to load model
Create another function load_model() that loads the training data from the intents.json file, extracts the features, and trains the model. We will call this function to process the user input later.
:bulb: Understanding load_model() This function does all the heavy lifting:
- Loads the intents from JSON
- Preprocesses all training data
- Creates the vocabulary (vectorizer)
- Trains the model
It's like preparing your chatbot's brain before it starts chatting!
```python
def load_model():
    # tell Python we are assigning to the module-level list_of_words
    global list_of_words
    # import training data
    with open("intents.json") as f:
        data = json.load(f)
    # load training data
    for intent in data['intents']:
        for text in intent['text']:
            # Save the preprocessed sentences
            preprocessed_text = text_preprocessing(text)
            train_data.append(preprocessed_text)
            # Save the intent label
            train_label.append(intent['intent'])
        intent_list.append(intent['intent'])
        responses[intent['intent']] = intent["responses"]
    # Feature Extraction
    vectorizer.fit(train_data)
    list_of_words = vectorizer.get_feature_names_out()
    train_data_bow = vectorizer.transform(train_data)
    # Train the model
    clf_dt.fit(train_data_bow, train_label)
```
:white_check_mark: Milestone 3 (Part 2): Model loading function complete!
:bulb: Running Streamlit To run your Streamlit app, use this command in the terminal:

```shell
streamlit run chatbot.py
```
Your browser will automatically open with your chatbot!
```python
# "text" comes from a Streamlit text input widget; the label below is an
# assumption, so adjust it to match your template
text = st.text_input("You : ")
if text:
    st.write('Chatbot:')
    with st.spinner('Loading...'):
        st.write(bot_respond(text))
```
:warning: Common Streamlit Issues
- If you get "No module named streamlit", make sure you installed it with pip
- If the page refreshes when you type, that's normal! Streamlit reruns the entire script
- Use st.session_state to maintain conversation history if needed
:white_check_mark: Milestone 4 (Part 2): Congratulations! Your chatbot now has a web interface!
Try to create one more page in the sidebar. You can give the page any name and display information about the programmer of the webpage. You can insert your own image and add any elements that can make your webpage look pretty and interesting.
:bulb: Creative Ideas
- Add a chat history that shows previous conversations
- Include fun animations or GIFs
- Add sound effects when the bot responds
- Create a theme switcher (light/dark mode)
- Add a feedback system where users can rate responses
Before submitting your project, make sure you have completed all the milestones above.
Great job completing this project! You've learned how to combine NLP, Machine Learning, and web development to create a real AI application. Keep experimenting and adding new features to make your chatbot even more impressive! :rocket: