Apply your knowledge to build something amazing!
:information_source: Project Overview Difficulty Level: Intermediate
Estimated Time: 2-3 hours
Skills Practiced:
- Data exploration and cleaning
- Feature scaling and preparation
- Linear regression modeling
- Model evaluation and testing
- Python programming with pandas, numpy, and scikit-learn
In this exciting project, you'll become a social media data scientist! :rocket: You will apply your knowledge of Data Preparation and Regression to build a regression model that predicts the amount of engagement an Instagram post has gained.
graph LR
A[Phase 1: Setup & Import] --> B[Phase 2: Data Exploration]
B --> C[Phase 3: Data Preparation]
C --> D[Phase 4: Model Building]
D --> E[Bonus Challenges]
style A fill:#e1f5fe
style B fill:#fff9c4
style C fill:#f3e5f5
style D fill:#e8f5e9
style E fill:#ffe0b2
Before coding, make sure to create a new Google Colab notebook named "P1_InstagramReachPrediction.ipynb" and write your code in it.
Instagram is one of the most popular social media applications today. People use Instagram professionally to promote their businesses, build portfolios, blog, and create various kinds of content. For these people, it is important to know how well their Instagram posts are doing.
One way to measure how successful their Instagram posts are is through the amount of interactions / reach the posts have gained. These interactions can come in the form of:
- Likes
- Comments
- Shares
- Saves
- Profile Visits
- Follows
Based on these values, we can generate a number known as engagement rate to serve as an all-in-one measure of the reach the posts have gained.
The dataset for this project was collected by a data scientist named Aman Kharwal for Instagram reach prediction purposes. It contains information about 99 Instagram posts as well as their engagement rates.
Likes | Comments | Shares | Saves | Profile Visits | Follows | Engagement |
---|---|---|---|---|---|---|
162.0 | 9.0 | 5.0 | 98.0 | 35.0 | 2.0 | 3920.0 |
224.0 | 7.0 | 14.0 | 194.0 | 48.0 | 10.0 | 5394.0 |
131.0 | 11.0 | 1.0 | 41.0 | 62.0 | 12.0 | 4021.0 |
To learn more about the various metrics for measuring an Instagram post's success, you may go through this article.
:bulb: Before You Begin Make sure you have:
- A Google account to use Google Colab
- Basic understanding of Python programming
- Completed lessons on Data Preparation and Regression
Make a copy of the template file found here and rename the copied file as "P1: Instagram Reach Analysis.ipynb".
Run the code here to download the dataset into your Colab environment. Wait for the download to complete before proceeding to the next step.
# Download file
!wget --no-check-certificate 'https://drive.google.com/uc?export=download&id=185l3XDtStnSPdtvenICqoB_--Yqbf7SG' -O 'Instagram.csv'
:warning: Common Issue
If the download fails, check your internet connection and try again. The file should be named 'Instagram.csv' in your Colab environment.
You may start coding here:
Phase 1: Import Dependencies & Data Reading
:information_source: Milestone Checkpoint 1 By the end of this phase, you should have:
- Imported numpy, pandas, matplotlib, seaborn, and wordcloud
- Loaded "Instagram.csv" into instagram_data
Import the necessary libraries to complete the project. These include numpy, pandas, and matplotlib:
# Import basic libraries for data manipulation and visualization
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Run the next set of imports to ensure that all the necessary libraries are installed.
# Import additional visualization libraries
import seaborn as sns
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
:bulb: Best Practice Always import libraries at the beginning of your notebook. This makes it easy to see all dependencies at a glance.
Read the CSV file "Instagram.csv" and save it as instagram_data. When reading the file, set the encoding to latin1. (Remarks: You may copy the following code.)
# Load the Instagram dataset with proper encoding
instagram_data = pd.read_csv("Instagram.csv", encoding='latin1')
:warning: Encoding Alert
The `encoding='latin1'` parameter is crucial here! Without it, you might get encoding errors because the dataset contains special characters.
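If you're ever unsure which encoding a CSV file uses, a common pattern is to try UTF-8 first and fall back to latin1. A minimal sketch:
# Try UTF-8 first; fall back to latin1 if the file contains non-UTF-8 bytes
try:
    instagram_data = pd.read_csv("Instagram.csv", encoding='utf-8')
except UnicodeDecodeError:
    instagram_data = pd.read_csv("Instagram.csv", encoding='latin1')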
Phase 2: Explore the Data
Time to become a data detective! :mag: Let's explore what's in our Instagram dataset.
Please refer to Chapter 5: Data Preparation to complete the following steps.
:information_source: Milestone Checkpoint 2 By the end of this phase, you should have:
- Inspected the first rows and overall structure of the data
- Removed any missing values from instagram_data
- Visualized the most common words and hashtags
- Checked how each feature correlates with Engagement
Print the first 5 rows of instagram_data.
# Display the first 5 rows to understand the data structure
instagram_data.head()
Print more information about instagram_data.
# Get detailed information about the dataset
instagram_data.info()
Check how many missing values are in the instagram_data.
# Check for missing values in each column
instagram_data.isnull().sum()
If there are any missing values in the dataset, remove them.
# Remove rows with missing values
instagram_data = instagram_data.dropna()
:warning: Data Cleaning Alert
Always check your data size before and after removing missing values. You don't want to accidentally delete too much data!
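One way to follow this advice is to record the row count around the dropna() call. A sketch of how the cleaning step above could be written instead:
# Compare dataset size before and after dropping missing values
rows_before = instagram_data.shape[0]
instagram_data = instagram_data.dropna()
rows_after = instagram_data.shape[0]
print(f"Removed {rows_before - rows_after} row(s) with missing values")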
Run the following code to show the most commonly used words in the Instagram posts given in the dataset:
# Most commonly used words in these Instagram posts
text = " ".join(i for i in instagram_data.Caption)
stopwords = set(STOPWORDS)
wordcloud = WordCloud(stopwords=stopwords, background_color="white").generate(text)
plt.style.use('classic')
plt.figure(figsize=(12,10))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
:bulb: Understanding Word Clouds The bigger the word appears, the more frequently it's used in Instagram captions. This helps identify popular topics!
Run the following code to show the most commonly used hashtags in the Instagram posts given in the dataset.
# Most commonly used hashtags in the Instagram posts
text = " ".join(i for i in instagram_data.Hashtags)
stopwords = set(STOPWORDS)
wordcloud = WordCloud(stopwords=stopwords, background_color="white").generate(text)
plt.figure(figsize=(12,10))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
Run the following code to find the relationship between the features and the engagement rate of the Instagram posts.
# Find the relationship between engagement and the other features in the dataset
# numeric_only=True skips text columns like Caption and Hashtags (required in newer pandas)
correlation = instagram_data.corr(numeric_only=True)
print(correlation["Engagement"].sort_values(ascending=False))
:bulb: Correlation Insights Values close to 1 mean strong positive correlation (when one goes up, the other goes up too). Values close to -1 mean strong negative correlation. Values near 0 mean little to no correlation.
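Since seaborn was imported earlier, you can optionally visualize the whole correlation matrix as a heatmap. A short sketch:
# Optional: visualize the correlation matrix as a heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(correlation, annot=True, cmap='coolwarm', fmt=".2f")
plt.title("Feature Correlation Matrix")
plt.show()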
Phase 3: Data Preparation
Now let's prepare our data for machine learning! This is a crucial step that can make or break your model. :dart:
Please refer to Chapter 5: Data Preparation to complete the following steps.
:information_source: Milestone Checkpoint 3 By the end of this phase, you should have:
- Separated the features (x) from the labels (y)
- Scaled the features to the 0-1 range with MinMaxScaler
- Split the data into training and test sets
Run the following code to generate the dataset x and its labels y.
# Generate the dataset (x) and its labels (y)
# x contains the features we'll use to predict engagement
# y contains the engagement values we want to predict
x = np.array(instagram_data[['Likes', 'Comments', 'Shares', 'Profile Visits', 'Follows']])
y = np.array(instagram_data['Engagement'])
:bulb: Understanding Features vs Labels
- Features (x): The information we use to make predictions (likes, comments, etc.)
- Labels (y): What we're trying to predict (engagement rate)
Type in the following code here:
# Import the MinMaxScaler from sklearn
from sklearn.preprocessing import MinMaxScaler
# Create a scaler object
scaler = MinMaxScaler()
# Fit the scaler to our data (learns the min and max values)
scaler.fit(x)
# Transform the data to scale it between 0 and 1
x_scaled = scaler.transform(x)
:warning: Why Scale Data?
Machine learning algorithms work better when all features are on the same scale. Without scaling, features with larger values (like Likes) might dominate features with smaller values (like Comments).
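For intuition, MinMaxScaler (with its default 0-1 range) applies x_scaled = (x - min) / (max - min) to each column. You can verify this with plain numpy; a quick sketch, assuming x and x_scaled are the arrays defined above:
# Reproduce MinMaxScaler's formula manually and compare
manual_scaled = (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))
print(np.allclose(manual_scaled, x_scaled))  # Should print True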
Print the values of x_scaled.
# Check that our data is now scaled between 0 and 1
print("Scaled data sample:")
print(x_scaled[:5])  # Show first 5 rows
Split the dataset x_scaled and its labels y into training and test sets. Set the test size to be 0.33 and the random state to be 42.
# Import train_test_split function
from sklearn.model_selection import train_test_split

# Split the data: 67% for training, 33% for testing
x_train, x_test, y_train, y_test = train_test_split(
    x_scaled, y, test_size=0.33, random_state=42
)
:bulb: Random State Explained Setting `random_state=42` ensures everyone gets the same random split. It's like setting a seed for reproducibility!
Use the shape attribute to check your answer.
print("Dataset shapes:")
print(f"x_train shape: {x_train.shape}")
print(f"x_test shape: {x_test.shape}")
print(f"y_train shape: {y_train.shape}")
print(f"y_test shape: {y_test.shape}")
Expected output:
x_train shape: (66, 5)
x_test shape: (33, 5)
y_train shape: (66,)
y_test shape: (33,)
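Because every run uses random_state=42, the split is reproducible. If you want to convince yourself, you can split a second time and compare; a quick sketch:
# Splitting again with the same random_state yields an identical split
a_train, a_test, b_train, b_test = train_test_split(
    x_scaled, y, test_size=0.33, random_state=42
)
print(np.array_equal(a_train, x_train))  # Should print True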
:warning: Debugging Tip If your shapes don't match, check:
- Did you remove missing values in Phase 2?
- Did you use the correct test_size (0.33)?
- Did you scale the data before splitting?
Phase 4: Instagram Reach Prediction Model
Time to build your AI model! This is where the magic happens. :muscle:
Please refer to Chapter 6: Regression to complete the following steps.
:information_source: Milestone Checkpoint 4 By the end of this phase, you should have:
- Trained a LinearRegression model on the training data
- Evaluated its score on the test data
- Inspected the model's coefficients and intercept
- Tested the model on a custom post
Create and evaluate the Instagram Reach Prediction Model with the following code. (Remarks: Type in your code for this step here.)
# Import LinearRegression from sklearn
from sklearn.linear_model import LinearRegression
# Create a linear regression model
model = LinearRegression()
# Train the model with our training data
model.fit(x_train, y_train)
# Evaluate the model's performance on test data (score() returns the R² value)
accuracy = model.score(x_test, y_test)
print(f"Model Accuracy: {accuracy:.4f}")
Expected output:
Model Accuracy: 0.8461
:bulb: Understanding Model Accuracy For regression, score() returns the R² value. An R² of 0.8461 means our model explains about 84.61% of the variation in engagement. That's pretty good! In real-world projects, anything above 80% is often considered successful.
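For reference, score() on a regression model is the R² (coefficient of determination), so you can reproduce the same number with sklearn's r2_score; a quick sketch:
# score() for regressors is the R² value; verify it explicitly
from sklearn.metrics import r2_score
print(r2_score(y_test, model.predict(x_test)))  # Should match model.score(x_test, y_test)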
Retrieve and print the gradient / slope of the model.
# Get the coefficients (slopes) for each feature
print("Model coefficients:")
print(model.coef_)
Expected output:
[115.67681478 1898.72802154 -394.46756815 -495.96781512 748.19781634]
:bulb: Interpreting Coefficients Each coefficient tells us how much engagement changes when that feature increases by 1 unit:
- Positive values = feature increases engagement
- Negative values = feature decreases engagement
- Larger absolute values = stronger impact
Note: because we scaled the features to the 0-1 range, "1 unit" here means going from a feature's minimum to its maximum value.
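To make these easier to read, you can pair each coefficient with its feature name (the same five columns used to build x); a short sketch:
# Pair each coefficient with its feature name for readability
feature_names = ['Likes', 'Comments', 'Shares', 'Profile Visits', 'Follows']
for name, coef in zip(feature_names, model.coef_):
    print(f"{name}: {coef:.2f}")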
Retrieve and print the y-intercept of the model.
# Get the y-intercept (base engagement when all features are 0)
print(f"Model intercept: {model.intercept_}")
Expected output:
Model intercept: 2358.9711885315516
Test the model with the custom Instagram post data shown below:
:warning: Data Order Alert
Make sure your features are in the correct order: [Likes, Comments, Shares, Profile Visits, Follows]
# Create test data for a new Instagram post
# Note: Order must match our training data features
features = np.array([[282.0, 4.0, 9.0, 165.0, 54.0]])

# Scale the features using our fitted scaler
features_scaled = scaler.transform(features)

# Make a prediction
predicted_engagement = model.predict(features_scaled)
print(f"Predicted engagement: {predicted_engagement[0]:.2f}")
The output should be around 9300 - 9500.
:bulb: Real-World Application You've just built a tool that Instagram influencers could use to predict how well their posts will perform! :tada:
:star2: Extension Challenges
Ready to level up? Here are some bonus challenges to push your skills further!
Advanced Challenge 1: User Interface
Create an interactive tool that anyone can use!
:information_source: Challenge Goal Build a user-friendly interface that allows anyone to predict their Instagram post engagement without knowing how to code.
Prompt the users to key in the following information about their Instagram posts.
# Create an interactive engagement predictor
print("=== Instagram Engagement Predictor ===")
print("Enter your post statistics below:\n")
# Collect user inputs with validation
try:
likes = float(input("Number of Likes: "))
comments = float(input("Number of Comments: "))
shares = float(input("Number of Shares: "))
profile_visits = float(input("Profile Visits from this post: "))
follows = float(input("New Follows from this post: "))
except ValueError:
print("Please enter valid numbers!")
:warning: Input Validation
Always validate user inputs! Real users might enter text instead of numbers, so handle errors gracefully.
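If you want the program to recover instead of stopping, one common pattern is to re-prompt until a valid number is entered. A minimal sketch, using a hypothetical ask_number helper (not part of the original template):
# Hypothetical helper: re-prompt until the user enters a valid number
def ask_number(prompt):
    while True:
        try:
            return float(input(prompt))
        except ValueError:
            print("Please enter a valid number!")

# Example usage for one of the inputs above
likes = ask_number("Number of Likes: ")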
Save all the user input as a numpy array input_data.
# Save all user inputs into a single numpy array
# Note: 'Saves' is not collected because the model was not trained on it
input_data = np.array([[likes, comments, shares, profile_visits, follows]])
Scale the data and make predictions.
# Scale the input data
input_data_scaled = scaler.transform(input_data)

# Make prediction
predicted_engagement = model.predict(input_data_scaled)

# Display result in a user-friendly way
print(f"\nPredicted Engagement: {predicted_engagement[0]:.0f}")
print("Engagement Level: ", end="")
if predicted_engagement[0] > 10000:
    print("Viral potential!")
elif predicted_engagement[0] > 5000:
    print("Great engagement!")
else:
    print("Keep creating!")
Advanced Challenge 2: Model Evaluation
Let's dive deeper into understanding how well our model performs!
:information_source: Challenge Goal Learn to evaluate your model using different metrics to understand its strengths and weaknesses.
Find the mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE) of the model:
# Import metrics from scikit-learn
from sklearn import metrics
# Make predictions on test data
y_pred = model.predict(x_test)
# Calculate different error metrics
mae = metrics.mean_absolute_error(y_test, y_pred)
mse = metrics.mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
# Display results with explanations
print("Model Evaluation Metrics:")
print(f"MAE (Mean Absolute Error): {mae:.2f}")
print(f"MSE (Mean Squared Error): {mse:.2f}")
print(f"RMSE (Root Mean Squared Error): {rmse:.2f}")
Expected output (approximately):
MAE (Mean Absolute Error): 741.67
MSE (Mean Squared Error): 1171979.70
RMSE (Root Mean Squared Error): 1082.59
:bulb: Understanding Error Metrics
- MAE: Average prediction error (in engagement units)
- MSE: Penalizes large errors more heavily
- RMSE: Same units as engagement, easier to interpret
Lower values = better model performance!
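These metrics are simple enough to compute by hand with numpy, which can help demystify what scikit-learn is doing; a quick sketch:
# Compute the same metrics manually with numpy
errors = y_test - y_pred
print("MAE: ", np.mean(np.abs(errors)))
print("MSE: ", np.mean(errors ** 2))
print("RMSE:", np.sqrt(np.mean(errors ** 2)))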
Having trouble? Here are common issues and solutions:
Import Errors
# If you get "No module named 'sklearn'"
!pip install scikit-learn
Shape Mismatch Errors
Re-run Phase 2 and Phase 3 in order: make sure missing values were removed and that x uses exactly the five features listed above.
Low Model Accuracy
Check that you scaled the features with MinMaxScaler and split with test_size=0.33 and random_state=42 so your results match the expected output.
Congratulations! You've successfully built an Instagram Engagement Predictor using machine learning. You've learned:
- How to explore and clean a real-world dataset
- How to scale features with MinMaxScaler
- How to train and evaluate a linear regression model
- How to use a trained model to make predictions on new posts