By the end of this lesson, you will be able to:
- Use the pd.concat() function to combine DataFrames
:information_source: Data grouping is the process of organizing individual data points into groups for easier analysis.
Data joining is the process of combining data from two or more tables into a single table.
Data grouping helps us:
Example: Imagine you collect sleep data from students of different ages. You can group them by age (6-8 years, 9-11 years, 12-14 years) to see which age group sleeps the most!
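Here is a small sketch of how that sleep example could look in Pandas. The numbers are made up for illustration, and the column names (age, hours_of_sleep, age_group) are our own choices, not part of the lesson's data. It uses pd.cut() to sort each age into a group and groupby() to average the sleep in each group:

```python
import pandas as pd

# Made-up sleep data for illustration only (not real measurements)
sleep = pd.DataFrame({
    "age": [6, 7, 9, 10, 12, 14],
    "hours_of_sleep": [10.0, 11.0, 9.0, 10.0, 8.0, 9.0],
})

# Sort each age into one of the three age groups from the example
sleep["age_group"] = pd.cut(
    sleep["age"],
    bins=[5, 8, 11, 14],  # ranges: 6 to 8, 9 to 11, 12 to 14
    labels=["6-8 years", "9-11 years", "12-14 years"],
)

# Average hours of sleep for each age group
average_sleep = sleep.groupby("age_group", observed=True)["hours_of_sleep"].mean()
print(average_sleep)

# Which group sleeps the most in this made-up data?
print(average_sleep.idxmax())
```

With real data you would swap in your own ages and sleep hours; the grouping steps stay the same.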
Data joining helps us:
:bulb: Think of data joining like putting puzzle pieces together - each table has part of the information, and joining them gives us the complete picture!
Reverse Thinking
Reverse thinking means starting with what you want and working backwards!
Here's how it works:
- Look at your goal - What do you want your final data to look like?
- Work backwards - What steps do you need to get there?
- Break it down - Divide the problem into smaller, easier parts
The figure below shows how reverse thinking helps us plan our data analysis:
Getting Started with Pandas
Pandas makes joining data fast and easy! Let's learn by example.
Note: Make sure you have Pandas installed and imported:
import pandas as pd
We'll use this student score data to practice:
math_score = {
    "students": ["David", "Adam", "Crystal", "Edmund", "Bob"],
    "math_score": [99, 87, 68, 53, 42]
}
science_score = {
    "students": ["David", "Adam", "Edmund", "Bob", "Crystal"],
    "science_score": [86, 78, 70, 51, 50]
}
First, we convert our dictionaries into Pandas DataFrames:
df_math = pd.DataFrame(math_score)
df_science = pd.DataFrame(science_score)
When we join tables side by side, pd.concat() lines up the rows by their index. So we need to tell Pandas which column to use as the "matching key".
:bulb: Tip: The index is like a name tag - it helps Pandas know which rows belong together!
We'll use the "students" column as our index:
df_math = df_math.set_index("students")
df_science = df_science.set_index("students")
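To see what set_index() changed, you can peek at the index and look up a row by its name tag. This is just a quick check using the lesson's math data; the variable names index_labels and crystal_math are made up for this sketch:

```python
import pandas as pd

# Same math scores as in the lesson
math_score = {
    "students": ["David", "Adam", "Crystal", "Edmund", "Bob"],
    "math_score": [99, 87, 68, 53, 42],
}
df_math = pd.DataFrame(math_score).set_index("students")

# The student names are now the row labels (the index)
index_labels = list(df_math.index)
print(index_labels)

# Look up one student's score by name with .loc
crystal_math = df_math.loc["Crystal", "math_score"]
print(crystal_math)
```

Because the names are now the index, Pandas can find Crystal's row by name instead of by position.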
Now comes the fun part - joining our tables together!
We use pd.concat() to combine DataFrames:
- [df_math, df_science] is the list of DataFrames to combine
- axis=1 tells Pandas to join them side by side (columns)
Here's the magic:
pd.concat([df_math, df_science], axis=1)
Expected output:
         math_score  science_score
David            99             86
Adam             87             78
Crystal          68             50
Edmund           53             70
Bob              42             51
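Notice that Crystal is listed last in the science data but still gets the right science score (50). That is because pd.concat() matches rows by name, not by position. But what if a student is missing from one table? The sketch below uses a smaller, hypothetical version of the data (Crystal has no science score here) to show two options: the default join="outer" and the optional join="inner".

```python
import pandas as pd

# Hypothetical data: Crystal has no science score in this version
df_math = pd.DataFrame(
    {"math_score": [99, 87, 68]},
    index=["David", "Adam", "Crystal"],
)
df_science = pd.DataFrame(
    {"science_score": [78, 86]},
    index=["Adam", "David"],
)

# Default (join="outer"): keep every student, fill the gaps with NaN
all_students = pd.concat([df_math, df_science], axis=1)
print(all_students)

# join="inner": keep only the students found in both tables
both_tables = pd.concat([df_math, df_science], axis=1, join="inner")
print(both_tables)
```

NaN means "not a number" and is how Pandas marks a missing value. Which option is better depends on whether you want to keep students with incomplete data.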
In this lesson, you learned:
- How to use pd.concat() to combine DataFrames
Practice with AI! Try these prompts to explore more:
Try creating your own data about your classmates' favorite subjects and test scores. Then practice grouping and joining the data!
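If you are not sure where to start, here is one possible starter sketch. The classmates, subjects, and scores are all made up, and the column names are only suggestions, so replace them with your own data. It joins two tables with pd.concat() and then groups the result with groupby():

```python
import pandas as pd

# Made-up classmates, subjects, and scores; replace them with your own
favorites = pd.DataFrame({
    "students": ["Ana", "Ben", "Cara", "Dev"],
    "favorite_subject": ["Math", "Art", "Math", "Art"],
}).set_index("students")
scores = pd.DataFrame({
    "students": ["Ben", "Ana", "Dev", "Cara"],
    "test_score": [70, 90, 80, 80],
}).set_index("students")

# Join: put the two tables side by side, matched by student name
classmates = pd.concat([favorites, scores], axis=1)
print(classmates)

# Group: average test score for each favorite subject
average_by_subject = classmates.groupby("favorite_subject")["test_score"].mean()
print(average_by_subject)
```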