By the end of this lesson, you will be able to:
- Use the pd.concat() function to combine DataFrames
ℹ️ Data grouping is the process of organizing individual data points into groups for easier analysis.
Data joining is the process of combining data from two or more tables into a single table.
Data grouping helps us:
Example: Imagine you collect sleep data from students of different ages. You can group them by age (6-8 years, 9-11 years, 12-14 years) to see which age group sleeps the most!
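The sleep example above can be sketched in Pandas. This is only an illustration: the ages and hours of sleep are made-up numbers, and the column names (age, hours_of_sleep, age_group) are names chosen for this sketch.

```python
import pandas as pd

# Made-up sleep data for illustration only (not real measurements)
sleep = pd.DataFrame({
    "age": [6, 7, 9, 10, 12, 13],
    "hours_of_sleep": [10.5, 10.0, 9.5, 9.0, 8.5, 8.0],
})

# Sort each student into an age group: 6-8, 9-11, or 12-14 years
sleep["age_group"] = pd.cut(
    sleep["age"],
    bins=[5, 8, 11, 14],  # group edges: above 5 up to 8, above 8 up to 11, above 11 up to 14
    labels=["6-8", "9-11", "12-14"],
)

# Average hours of sleep for each age group
average_sleep = sleep.groupby("age_group", observed=True)["hours_of_sleep"].mean()
print(average_sleep)

# Which age group sleeps the most in this made-up data?
print(average_sleep.idxmax())
```

Here pd.cut() puts each age into a group, and groupby() does the grouping. With real data you would replace the made-up numbers with your own measurements.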
Data joining helps us:
💡 Think of data joining like putting puzzle pieces together - each table has part of the information, and joining them gives us the complete picture!
🔄 Reverse Thinking
Reverse thinking means starting with what you want and working backwards!
Here's how it works:
- Look at your goal - What do you want your final data to look like?
- Work backwards - What steps do you need to get there?
- Break it down - Divide the problem into smaller, easier parts
The figure below shows how reverse thinking helps us plan our data analysis:
🐼 Getting Started with Pandas
Pandas makes joining data fast and easy! Let's learn by example.
Note: Make sure you have Pandas installed and imported:
import pandas as pd
We'll use this student score data to practice:
math_score = {
"students": ["David", "Adam", "Crystal", "Edmund", "Bob"],
"math_score": [99, 87, 68, 53, 42]
}
science_score = {
"students": ["David", "Adam", "Edmund", "Bob", "Crystal"],
"science_score": [86, 78, 70, 51, 50]
}
First, we convert our dictionaries into Pandas DataFrames:
df_math = pd.DataFrame(math_score)
df_science = pd.DataFrame(science_score)
When we join tables side by side, Pandas matches rows using the index. We need to tell Pandas which column to use as the "matching key".
💡 Tip: The index is like a name tag - it helps Pandas know which rows belong together!
We'll use the "students" column as our index:
df_math = df_math.set_index("students")
df_science = df_science.set_index("students")
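To check that the "name tag" is in place, you can look at the index and find a student by name. The short sketch below rebuilds the math table from this lesson so it runs on its own; the variable name crystal_math is just a name chosen for this example.

```python
import pandas as pd

math_score = {
    "students": ["David", "Adam", "Crystal", "Edmund", "Bob"],
    "math_score": [99, 87, 68, 53, 42],
}
df_math = pd.DataFrame(math_score).set_index("students")

# The student names are now the row labels (the index)
print(df_math.index.tolist())

# Look up one student by name instead of by row position
crystal_math = df_math.loc["Crystal", "math_score"]
print(crystal_math)  # 68
```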
Now comes the fun part - joining our tables together!
We use pd.concat() to combine DataFrames:
- [df_math, df_science] is the list of DataFrames to combine
- axis=1 tells Pandas to join them side by side (columns)
Here's the magic:
pd.concat([df_math, df_science], axis=1)
Expected output:
         math_score  science_score
David            99             86
Adam             87             78
Crystal          68             50
Edmund           53             70
Bob              42             51
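Why did we set the index first? The two tables list the students in a different order. The sketch below compares joining without and with the "students" index. The names by_position and by_name are just names chosen for this example.

```python
import pandas as pd

df_math = pd.DataFrame({
    "students": ["David", "Adam", "Crystal", "Edmund", "Bob"],
    "math_score": [99, 87, 68, 53, 42],
})
df_science = pd.DataFrame({
    "students": ["David", "Adam", "Edmund", "Bob", "Crystal"],
    "science_score": [86, 78, 70, 51, 50],
})

# Without set_index, rows are matched by position (0, 1, 2, ...), not by name
by_position = pd.concat([df_math, df_science], axis=1)
# The third row mixes up two students: Crystal's math score sits next to Edmund's science score
print(by_position.iloc[2].tolist())

# With the student names as the index, rows are matched by name
by_name = pd.concat(
    [df_math.set_index("students"), df_science.set_index("students")],
    axis=1,
)
print(by_name.loc["Crystal", "science_score"])  # 50
```

Matching by name keeps each student's scores together even when the tables are in a different order.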
In this lesson, you learned:
- How to use pd.concat() to combine DataFrames
Practice with AI! Try these prompts to explore more:
Try creating your own data about your classmates' favorite subjects and test scores. Then practice grouping and joining the data!
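If you are not sure where to start, here is one possible sketch. The classmates, favorite subjects, and scores are all made up, and the column names are only suggestions. It joins two tables with pd.concat() and then groups the result by favorite subject.

```python
import pandas as pd

# Made-up classmates and values, for practice only
favorites = pd.DataFrame({
    "students": ["Ana", "Ben", "Chloe", "Dev"],
    "favorite_subject": ["Math", "Science", "Math", "Science"],
}).set_index("students")

scores = pd.DataFrame({
    "students": ["Dev", "Chloe", "Ben", "Ana"],
    "test_score": [70, 90, 60, 80],
}).set_index("students")

# Join: line up the two tables by student name
combined = pd.concat([favorites, scores], axis=1)
print(combined)

# Group: average test score for each favorite subject
average_by_subject = combined.groupby("favorite_subject")["test_score"].mean()
print(average_by_subject)
```

Try swapping in your own classmates and subjects, then see how the averages change.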