Project 04: Chatbot - Discovery Challenge

Learning Objectives

By completing this project, you will:

Master NLP Text Preprocessing: Tokenization, stemming, and cleaning
Apply Feature Extraction: Convert text to numerical vectors (Bag of Words)
Train Classification Models: KNN, Decision Tree, and Naive Bayes for intent classification
Build Conversational AI: Create a chatbot that understands user intent
Create Web Interface: Build a Streamlit UI for your chatbot

Difficulty Level: Intermediate
Estimated Time: 5-7 hours (Part 1: 3-4h, Part 2: 2-3h)
Prerequisites: Completed Activities 07 (Classification) and 10 (Text Preprocessing)

Business Context

Every sentence we speak carries intent! A chatbot uses NLP to:

Understand the intent behind user messages
Match intents to appropriate responses
Create natural, helpful conversations

User Message	Intent	Bot Response
"Hi there!"	Greeting	"Hello! How can I help you?"
"What time is it?"	TimeQuery	"It's 3:00 PM"
"Thank you!"	Thanks	"You're welcome!"

Getting Started (See Results in 30 Seconds!)

Option 1: Explore the Complete Solution First (Recommended)

Open v1-baseline-100percent.ipynb in Google Colab or Jupyter
Run all cells (Runtime -> Run all)
Observe the complete NLP pipeline: Preprocess -> Vectorize -> Train -> Predict
Test the chatbot interface at the end!

Option 2: Jump Straight to the Challenge

Open project-04-chatbot.ipynb
Run the setup cells to see working examples
Complete TODOs in order to build your own chatbot

What's Already Working

Part 1 - Jupyter Notebook (65% complete):

Data download and JSON loading
NLTK imports and Snowball stemmer setup
Data structure declarations (intent_list, train_data, responses)
CountVectorizer declaration
Classifier imports (KNN, DecisionTree, NaiveBayes)
Bot response framework and UI loop

Part 2 - Streamlit App (files in streamlit/ folder):

Complete chatbot.py with all model functions
Working intents.json dataset
app.py template with UI structure

Your Tasks (8 TODOs to Complete)

Part 1: NLP Pipeline (6 TODOs)

TODO	Task	Difficulty	Estimated Time
1	Complete text_preprocessing function	Medium	15 min
2	Process dataset and fill train_data/train_label	Medium	15 min
3	Create vocabulary and Bag of Words	Easy	10 min
4	Train all 3 classifiers	Easy	10 min
5	Preprocess and predict test sentence	Medium	10 min
6	Complete bot_respond function	Hard	20 min
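
TODO 2 amounts to walking the intents file once and collecting preprocessed sentences, their labels, and the responses per intent. The sketch below assumes each entry has the keys "intent", "text", and "responses"; that layout is an assumption, so check it against your own intents.json. SAMPLE_JSON is made-up data, and preprocess is a stand-in for text_preprocessing that only lowercases.

```python
import json

# Hypothetical sample; the real intents.json may use different keys.
SAMPLE_JSON = """
{"intents": [
  {"intent": "Greeting", "text": ["Hi there", "Hello"],
   "responses": ["Hello! How can I help you?"]},
  {"intent": "Thanks", "text": ["Thank you"],
   "responses": ["You're welcome!"]}
]}
"""

def preprocess(sentence):
    # Stand-in for text_preprocessing: lowercases only, no stemming.
    return sentence.lower()

def build_training_data(raw, preprocess_fn):
    intent_list, train_data, train_label, responses = [], [], [], {}
    for entry in raw["intents"]:
        name = entry["intent"]
        intent_list.append(name)
        responses[name] = entry["responses"]
        # One training row per example sentence, labelled with its intent.
        for sentence in entry["text"]:
            train_data.append(preprocess_fn(sentence))
            train_label.append(name)
    return intent_list, train_data, train_label, responses

intent_list, train_data, train_label, responses = build_training_data(
    json.loads(SAMPLE_JSON), preprocess
)
print(train_data)   # ['hi there', 'hello', 'thank you']
print(train_label)  # ['Greeting', 'Greeting', 'Thanks']
```

Keeping train_data and train_label the same length, in the same order, is what lets the classifiers pair each sentence with its intent later.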

Part 2: Streamlit UI (2 TODOs)

TODO	Task	Difficulty	Estimated Time
7	Create sidebar with page navigation	Medium	15 min
8	Implement Chatbot page with text input	Medium	15 min

Success Criteria

You've successfully completed this project when:

text_preprocessing returns "we all agre it was a magnific even" for the test sentence "We all agreed, it was a magnificent evening."
All 3 classifiers trained without errors
"Hello there" correctly predicts "Greeting" intent
Chatbot responds appropriately to various inputs
Streamlit app runs and displays chat interface
At least one Advanced Challenge attempted

Key Concepts Applied

1. Text Preprocessing Pipeline

python

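import nltk
from nltk.stem import snowball

stemmer = snowball.SnowballStemmer("english")

def text_preprocessing(sentence):
    tokens = nltk.word_tokenize(sentence)  # Tokenize
    stem_tokens = []
    for token in tokens:
        if any(ch.isalnum() for ch in token):  # Skip pure-punctuation tokens
            stem_tokens.append(stemmer.stem(token.lower()))  # Lowercase, then stem
    return " ".join(stem_tokens)  # Join stems into one string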

2. Bag of Words (Feature Extraction)

python

from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer()
vectorizer.fit(train_data)  # Build vocabulary
train_data_bow = vectorizer.transform(train_data)  # Convert to vectors
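
To see what fit and transform compute, here is a minimal pure-Python sketch of the same idea. It is not how CountVectorizer is implemented: the helper names fit_vocabulary and transform are hypothetical, and the sketch splits on whitespace, whereas CountVectorizer lowercases by default, ignores single-character tokens, and returns a sparse matrix.

```python
from collections import Counter

def fit_vocabulary(sentences):
    # Collect every unique word; sorted so each word gets a stable column.
    return sorted({word for s in sentences for word in s.split()})

def transform(sentences, vocabulary):
    # One row per sentence, one column per vocabulary word, values are counts.
    vectors = []
    for s in sentences:
        counts = Counter(s.split())
        vectors.append([counts[word] for word in vocabulary])
    return vectors

train_data = ["hello there", "thank you", "hello hello you"]
vocabulary = fit_vocabulary(train_data)
train_data_bow = transform(train_data, vocabulary)
print(vocabulary)      # ['hello', 'thank', 'there', 'you']
print(train_data_bow)  # [[1, 0, 1, 0], [0, 1, 0, 1], [2, 0, 0, 1]]
```

Word order is lost in this representation: only the counts per vocabulary word survive, which is why it is called a "bag" of words.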

3. Intent Classification

python

# Train multiple classifiers
clf_knn.fit(train_data_bow, train_label)
clf_dt.fit(train_data_bow, train_label)
clf_nb.fit(train_data_bow, train_label)

# Predict intent
predicted = clf_nb.predict(user_query_bow)  # Returns ['Greeting']
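
For intuition about what a classifier does with Bag of Words vectors, the following is a 1-nearest-neighbour sketch in pure Python. It is a simplified stand-in, not scikit-learn's KNN, which considers several neighbours by default; the function name nearest_intent and the toy vectors are made up for illustration.

```python
import math

def nearest_intent(query_vector, train_vectors, train_labels):
    # Return the label of the training vector closest to the query (k = 1).
    best_label, best_distance = None, math.inf
    for vector, label in zip(train_vectors, train_labels):
        distance = math.dist(query_vector, vector)  # Euclidean distance
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label

# Toy count vectors over the vocabulary ['hello', 'thank', 'there', 'you']
train_vectors = [[1, 0, 1, 0], [1, 0, 0, 0], [0, 1, 0, 1]]
train_labels = ["Greeting", "Greeting", "Thanks"]

print(nearest_intent([1, 0, 1, 0], train_vectors, train_labels))  # Greeting
print(nearest_intent([0, 1, 0, 0], train_vectors, train_labels))  # Thanks
```

The query must be vectorized with the same vocabulary as the training data; otherwise the columns no longer line up and the distances are meaningless.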

4. Response Generation

python

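import random

def bot_respond(user_query):
    query = text_preprocessing(user_query)       # 1. Preprocess query
    query_bow = vectorizer.transform([query])    # 2. Convert to bag of words
    predicted = clf_nb.predict(query_bow)        # 3. Predict intent
    # 4. Select a random response for that intent
    return random.choice(responses[predicted[0]])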

Extension Challenges

Personalized Responses
- Store username and replace <HUMAN> placeholder in responses
- Add user memory across conversation
Expand Knowledge Base
- Add new intents to intents.json (e.g., "FavoriteFood", "Weather")
- Retrain model with expanded dataset
Model Comparison
- Calculate accuracy for each classifier
- Create comparison table showing which model works best
Context Awareness
- Remember previous intent
- Provide context-relevant follow-up responses
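
As a starting point for the Personalized Responses and Context Awareness challenges, here is one possible sketch. ChatSession, its attributes, the "friend" fallback, and the "NameQuery" intent are all hypothetical names chosen for illustration; only the <HUMAN> placeholder comes from the dataset described above.

```python
class ChatSession:
    """Hypothetical helper: remembers the user's name and the last intent."""

    def __init__(self):
        self.username = None
        self.previous_intent = None

    def personalize(self, response, intent):
        # Fill the <HUMAN> placeholder, falling back to a neutral word
        # when no name has been stored yet.
        name = self.username or "friend"
        self.previous_intent = intent
        return response.replace("<HUMAN>", name)

session = ChatSession()
print(session.personalize("Hello <HUMAN>!", "Greeting"))  # Hello friend!

session.username = "Sam"
print(session.personalize("Nice to meet you, <HUMAN>.", "NameQuery"))
print(session.previous_intent)  # NameQuery
```

In the Streamlit app the session object would need to survive script reruns, for example by storing it in st.session_state.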

Tips for Success

Run v1 baseline first: See the complete solution to understand the workflow
Test preprocessing separately: Verify each step produces expected output
Bag of Words is key: Understand how text becomes numbers
Random state matters: Use random_state=33 for Decision Tree consistency
Streamlit reruns the entire script: Every widget interaction reruns app.py from top to bottom, so a page refresh after each input is expected

Common Issues & Solutions

Issue: "NameError: name 'stemmer' is not defined"

Solution: Declare the stemmer before calling text_preprocessing, and use the same variable name in both places, e.g. stemmer = snowball.SnowballStemmer("english")

Issue: Preprocessing output doesn't match expected

Solution: Check you're: (1) lowercasing, (2) stemming, (3) removing ALL punctuation
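
One way to narrow this down is to test the lowercasing and punctuation steps on their own, without the stemmer. The helper below is a hypothetical debugging aid that uses only the standard library; because it does not stem, its output still contains "agreed", "magnificent", and "evening" in full.

```python
import string

def strip_punctuation(sentence):
    # Delete every ASCII punctuation character, then lowercase.
    table = str.maketrans("", "", string.punctuation)
    return sentence.translate(table).lower()

cleaned = strip_punctuation("We all agreed, it was a magnificent evening.")
print(cleaned)  # we all agreed it was a magnificent evening
```

If this intermediate string looks right but the final output does not, the problem is most likely in the stemming step or in how the tokens are joined.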

Issue: "KeyError" when getting response

Solution: Ensure predicted intent exists in responses dictionary
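
A defensive lookup avoids the crash altogether. This is a sketch under assumptions: pick_response and the FALLBACK text are hypothetical and not part of the project files.

```python
import random

FALLBACK = "Sorry, I didn't understand that."  # hypothetical fallback text

def pick_response(responses, intent, rng=random):
    # dict.get returns None instead of raising KeyError for unknown intents;
    # an empty response list also falls through to the fallback.
    candidates = responses.get(intent) or [FALLBACK]
    return rng.choice(candidates)

responses = {"Thanks": ["You're welcome!"]}
print(pick_response(responses, "Thanks"))   # You're welcome!
print(pick_response(responses, "Weather"))  # Sorry, I didn't understand that.
```

A fallback hides the symptom, so it is still worth checking that every label in train_label has a matching key in responses.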

Issue: Streamlit "No module named"

Solution: Install with pip install streamlit nltk scikit-learn pandas numpy (the package is named scikit-learn on PyPI, even though it is imported as sklearn)

Expected Outputs

Text Preprocessing Test

text

Input: 'We all agreed, it was a magnificent evening.'
Output: 'we all agre it was a magnific even'

Model Prediction

text

Test: "Hello there"
KNN Prediction: Greeting
Decision Tree Prediction: Greeting
Naive Bayes Prediction: Greeting
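
Agreement on a single test sentence says little about overall quality. For the Model Comparison challenge, accuracy over several labelled sentences gives a fairer picture. The sketch below uses made-up predictions purely for illustration; they are not results from this project, and the accuracy helper is hypothetical (sklearn.metrics.accuracy_score computes the same fraction).

```python
def accuracy(predictions, labels):
    # Fraction of predictions that match the true labels.
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

# Made-up predictions for illustration; not results from the project.
true_labels = ["Greeting", "Thanks", "TimeQuery", "Greeting"]
model_predictions = {
    "KNN": ["Greeting", "Thanks", "Greeting", "Greeting"],
    "Decision Tree": ["Greeting", "Thanks", "TimeQuery", "Greeting"],
    "Naive Bayes": ["Greeting", "Greeting", "TimeQuery", "Thanks"],
}

scores = {
    name: accuracy(preds, true_labels)
    for name, preds in model_predictions.items()
}
# Print a small comparison table, best model first.
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<15}{score:.2f}")
```

Measure accuracy on sentences the models did not see during training; scoring on the training set tends to flatter every model.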

Chatbot Interaction

text

You: Hello there
Alex: Hi human, please tell me your Alex user

You: What time is it?
Alex: It's 15:30:45

You: Thank you
Alex: You're welcome!

File Structure

bash

project-04-chatbot/
├── README.md                       # This file
├── project-04-chatbot.ipynb        # Student template (65-70%)
├── v1-baseline-100percent.ipynb    # Complete solution
├── intents.json                    # Training dataset
└── streamlit/
    ├── chatbot.py                  # Core chatbot functions
    └── app.py                      # Streamlit UI template

Additional Resources

Ready to build your chatbot? Open project-04-chatbot.ipynb and start building!
