
# Pandas exercises and solutions using Jupyter Notebooks

Below are the solutions to the pandas exercises from the pandas Python lesson at 4Geeks.com; the exercise instructions are available in that lesson.

In [ ]:
import numpy as np
import pandas as pd

np.random.seed(42)


#### Exercise 01

In [2]:
# From list
l = [1, 2, 3, 4, 5, 6]
serie = pd.Series(l)
print(serie)

# From NumPy array
array = np.array([1, 2, 3, 4, 5, 6])
serie = pd.Series(array)
print(serie)

# From dictionary
d = {"A": 1, "B": 2, "C": 3}
serie = pd.Series(d)
print(serie)

0    1
1    2
2    3
3    4
4    5
5    6
dtype: int64
0    1
1    2
2    3
3    4
4    5
5    6
dtype: int64
A    1
B    2
C    3
dtype: int64
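The dictionary case above takes its index from the keys, while the list and array cases get a default integer index. As a minimal sketch (the labels below are hypothetical and not part of the exercise), a list can also be given explicit labels through the `index` argument:

```python
import pandas as pd

# Hypothetical labels, chosen only for illustration
labels = ["a", "b", "c", "d", "e", "f"]
serie = pd.Series([1, 2, 3, 4, 5, 6], index=labels)

print(serie["c"])     # label-based lookup
print(serie.iloc[2])  # position-based lookup of the same element
```

Both lookups reach the same element; one goes through the label and the other through the position.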


#### Exercise 02

In [3]:
# From NumPy array
array = np.random.randint(1, 10, size = (5, 5))
dataframe = pd.DataFrame(array)
dataframe

Out[3]:
   0  1  2  3  4
0  7  4  8  5  7
1  3  7  8  5  4
2  8  8  3  6  5
3  2  8  6  2  5
4  1  6  9  1  3
In [4]:
# From dictionary
d = {
    "A": np.random.randint(10, 100, size = 5),
    "B": np.linspace(1, 10, 5),
    "C": np.random.randn(5)
}
dataframe = pd.DataFrame(d)
dataframe

Out[4]:
    A      B         C
0  64   1.00 -0.600254
1  73   3.25  0.947440
2  12   5.50  0.291034
3  60   7.75 -0.635560
4  16  10.00 -1.021552
In [5]:
# From list of tuples
t = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
dataframe = pd.DataFrame(t)
dataframe

Out[5]:
   0  1  2
0  1  2  3
1  4  5  6
2  7  8  9
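A DataFrame built from a list of tuples gets integer column names by default, as the output above shows. As a small sketch, assuming the hypothetical column names `x`, `y` and `z`, the `columns` argument can name them at construction time:

```python
import pandas as pd

t = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

# Column names are hypothetical; each tuple becomes one row
dataframe = pd.DataFrame(t, columns=["x", "y", "z"])
print(dataframe)
```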

#### Exercise 03

In [6]:
s1 = pd.Series([1, 2, 3, 4, 5])
s2 = pd.Series([4, 5, 6, 7, 8])

# Method 1: dictionary of Series
dataframe = pd.DataFrame({"ser1": s1, "ser2": s2})
# Equivalent here, with the index passed explicitly
dataframe = pd.DataFrame({"ser1": s1, "ser2": s2}, index = s1.index)
dataframe

Out[6]:
   ser1  ser2
0     1     4
1     2     5
2     3     6
3     4     7
4     5     8
In [7]:
# Method 2
dataframe = pd.concat([s1, s2], axis = 1)
dataframe

Out[7]:
   0  1
0  1  4
1  2  5
2  3  6
3  4  7
4  5  8
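With `pd.concat`, unnamed Series produce columns labeled `0` and `1`, as seen above. One possible way to get the `ser1` and `ser2` names in the same call is the `keys` argument; this is a sketch of that variant, not part of the original solution:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5])
s2 = pd.Series([4, 5, 6, 7, 8])

# keys supplies the column labels when concatenating along axis 1
dataframe = pd.concat([s1, s2], axis=1, keys=["ser1", "ser2"])
print(dataframe.columns.tolist())
```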
In [8]:
# Method 3
s1.name = "ser1"
s2.name = "ser2"

dataframe = s1.to_frame().join(s2)
dataframe

Out[8]:
   ser1  ser2
0     1     4
1     2     5
2     3     6
3     4     7
4     5     8

#### Exercise 04

In [9]:
s1 = pd.Series([1, 2, 3, 4, 5])
s2 = pd.Series([4, 5, 6, 7, 8])

# Method 1: Using Pandas function
filtering_results = s1.isin(s2)
indices = s1[filtering_results].index

indices

Out[9]:
Index([3, 4], dtype='int64')
In [10]:
# Method 2: Using NumPy function
indices = np.where(s1.isin(s2))
indices

Out[10]:
(array([3, 4]),)
In [11]:
# Method 3: Using Python
indices = []

for value in s1.values:
    if value in s2.values:
        indices.append(s1[s1 == value].index[0])
indices

Out[11]:
[3, 4]
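The three methods agree here because `s1` has the default integer index, so labels and positions coincide. As an illustrative sketch with a hypothetical string index, the pandas approach returns labels while `np.where` returns positions:

```python
import numpy as np
import pandas as pd

# Hypothetical string index, used only to separate labels from positions
s1 = pd.Series([1, 2, 3, 4, 5], index=list("abcde"))
s2 = pd.Series([4, 5, 6, 7, 8])

mask = s1.isin(s2)
labels = s1[mask].index.tolist()        # index labels of the matches
positions = np.where(mask)[0].tolist()  # integer positions of the matches

print(labels, positions)
```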

#### Exercise 05

In [12]:
s1 = pd.Series([1, 2, 3, 4, 5])
s2 = pd.Series([4, 5, 6, 7, 8])

# Method 1
unique_s1 = s1[~s1.isin(s2)]
unique_s2 = s2[~s2.isin(s1)]

unique_elements = np.concatenate([unique_s1, unique_s2])
unique_elements

Out[12]:
array([1, 2, 3, 6, 7, 8])
In [13]:
# Method 2
concat = pd.concat([s1, s2])
unique_elements = concat[~concat.duplicated(keep = False)].values
unique_elements

Out[13]:
array([1, 2, 3, 6, 7, 8])
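Both methods compute the symmetric difference of the two Series. A third option, sketched here as an alternative rather than as the lesson's intended answer, is NumPy's `np.setxor1d`, which returns the sorted values found in exactly one of its inputs:

```python
import numpy as np
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5])
s2 = pd.Series([4, 5, 6, 7, 8])

# Values present in s1 or s2, but not in both
unique_elements = np.setxor1d(s1, s2)
print(unique_elements)
```

Unlike Method 2, this also discards values that are repeated within a single Series, which makes no difference for these inputs.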

#### Exercise 06

In [14]:
df = pd.DataFrame(np.random.rand(10, 5) * 10, columns = [f"Col {i}" for i in range(5)])
df

Out[14]:
      Col 0     Col 1     Col 2     Col 3     Col 4
0  4.951769  0.343885  9.093204  2.587800  6.625223
1  3.117111  5.200680  5.467103  1.848545  9.695846
2  7.751328  9.394989  8.948274  5.979000  9.218742
3  0.884925  1.959829  0.452273  3.253303  3.886773
4  2.713490  8.287375  3.567533  2.809345  5.426961
5  1.409242  8.021970  0.745506  9.868869  7.722448
6  1.987157  0.055221  8.154614  7.068573  7.290072
7  7.712703  0.740447  3.584657  1.158691  8.631034
8  6.232981  3.308980  0.635584  3.109823  3.251833
9  7.296062  6.375575  8.872127  4.722149  1.195942
In [15]:
df.sort_values("Col 0")

Out[15]:
      Col 0     Col 1     Col 2     Col 3     Col 4
3  0.884925  1.959829  0.452273  3.253303  3.886773
5  1.409242  8.021970  0.745506  9.868869  7.722448
6  1.987157  0.055221  8.154614  7.068573  7.290072
4  2.713490  8.287375  3.567533  2.809345  5.426961
1  3.117111  5.200680  5.467103  1.848545  9.695846
0  4.951769  0.343885  9.093204  2.587800  6.625223
8  6.232981  3.308980  0.635584  3.109823  3.251833
9  7.296062  6.375575  8.872127  4.722149  1.195942
7  7.712703  0.740447  3.584657  1.158691  8.631034
2  7.751328  9.394989  8.948274  5.979000  9.218742
In [16]:
df.sort_values(by = ["Col 2", "Col 4"])

Out[16]:
      Col 0     Col 1     Col 2     Col 3     Col 4
3  0.884925  1.959829  0.452273  3.253303  3.886773
8  6.232981  3.308980  0.635584  3.109823  3.251833
5  1.409242  8.021970  0.745506  9.868869  7.722448
4  2.713490  8.287375  3.567533  2.809345  5.426961
7  7.712703  0.740447  3.584657  1.158691  8.631034
1  3.117111  5.200680  5.467103  1.848545  9.695846
6  1.987157  0.055221  8.154614  7.068573  7.290072
9  7.296062  6.375575  8.872127  4.722149  1.195942
2  7.751328  9.394989  8.948274  5.979000  9.218742
0  4.951769  0.343885  9.093204  2.587800  6.625223
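With random floats the second sort key rarely matters, because ties in the first column are unlikely. The following sketch uses a small hand-made frame (hypothetical values, not the exercise data) to show how the second key breaks ties, and how `ascending` can take one flag per column:

```python
import pandas as pd

# Hypothetical data with repeated values in "Col 0" to force ties
df = pd.DataFrame({
    "Col 0": [2, 1, 2, 1],
    "Col 1": [0.5, 0.1, 0.9, 0.7],
})

# "Col 0" ascending; ties broken by "Col 1" in descending order
sorted_df = df.sort_values(by=["Col 0", "Col 1"], ascending=[True, False])
print(sorted_df)
```

`sort_values` returns a new DataFrame and leaves `df` unchanged, which is why the sorted results above do not carry over between cells.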

#### Exercise 07

In [17]:
df.columns = [f"{i}_column" for i in range(5)]
df

Out[17]:
   0_column  1_column  2_column  3_column  4_column
0  4.951769  0.343885  9.093204  2.587800  6.625223
1  3.117111  5.200680  5.467103  1.848545  9.695846
2  7.751328  9.394989  8.948274  5.979000  9.218742
3  0.884925  1.959829  0.452273  3.253303  3.886773
4  2.713490  8.287375  3.567533  2.809345  5.426961
5  1.409242  8.021970  0.745506  9.868869  7.722448
6  1.987157  0.055221  8.154614  7.068573  7.290072
7  7.712703  0.740447  3.584657  1.158691  8.631034
8  6.232981  3.308980  0.635584  3.109823  3.251833
9  7.296062  6.375575  8.872127  4.722149  1.195942
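Assigning to `df.columns` modifies the DataFrame in place. As an alternative sketch (the small frame below is a hypothetical stand-in for the exercise data), `rename` with a function returns a renamed copy and leaves the original untouched:

```python
import numpy as np
import pandas as pd

# Hypothetical small frame using the same "Col i" naming pattern
df = pd.DataFrame(np.zeros((2, 3)), columns=[f"Col {i}" for i in range(3)])

# The function is applied to every column label
renamed = df.rename(columns=lambda name: name.replace("Col ", "") + "_column")

print(renamed.columns.tolist())
print(df.columns.tolist())
```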

#### Exercise 08

In [18]:
df.index = [f"{i}_row" for i in range(10)]
df

Out[18]:
       0_column  1_column  2_column  3_column  4_column
0_row  4.951769  0.343885  9.093204  2.587800  6.625223
1_row  3.117111  5.200680  5.467103  1.848545  9.695846
2_row  7.751328  9.394989  8.948274  5.979000  9.218742
3_row  0.884925  1.959829  0.452273  3.253303  3.886773
4_row  2.713490  8.287375  3.567533  2.809345  5.426961
5_row  1.409242  8.021970  0.745506  9.868869  7.722448
6_row  1.987157  0.055221  8.154614  7.068573  7.290072
7_row  7.712703  0.740447  3.584657  1.158691  8.631034
8_row  6.232981  3.308980  0.635584  3.109823  3.251833
9_row  7.296062  6.375575  8.872127  4.722149  1.195942
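Once the rows carry string labels, `loc` can select by those labels. A minimal sketch, assuming a small hypothetical frame instead of the random data above, that relabels the index with `rename` and then reads a single cell:

```python
import numpy as np
import pandas as pd

# Hypothetical 3x2 frame with known values
df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["0_column", "1_column"])

# rename(index=...) applies the function to every row label and returns a copy
relabeled = df.rename(index=lambda i: f"{i}_row")

# Label-based selection: row "1_row", column "0_column"
value = relabeled.loc["1_row", "0_column"]
print(relabeled.index.tolist(), value)
```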