Matplotlib is a Python visualization library that provides a variety of tools and functions for creating static, animated and interactive graphics and visualizations. It is one of the most popular and widely used libraries in the Python community.
pyplot is a module of the Matplotlib library that provides a simple and intuitive interface for creating plots. It is the module that machine learning and data science engineers typically use for their graphical representations. The plot types described in this section all rely on it, and the first examples use the following sample data:
import numpy as np
X = np.linspace(0, 10, 100)
y = np.sin(X)
z = np.cos(X)
The line plot represents information as points connected by lines. It is useful for showing the evolution of one or more data series along an axis, typically time.
import matplotlib.pyplot as plt
plt.figure(figsize = (10, 5))
plt.plot(X, y, label = "Sin X")
plt.plot(X, z, label = "Cos X")
plt.title("Line plot")
plt.legend()
plt.show()
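The same figure can also be built with Matplotlib's object-oriented interface, which makes it easier to label the axes explicitly. The following is a minimal sketch, not the only way to do it; the axis label texts are illustrative assumptions that do not appear in the example above.

```python
import numpy as np
import matplotlib.pyplot as plt

# Same sample data as in the example above
X = np.linspace(0, 10, 100)
y = np.sin(X)
z = np.cos(X)

# plt.subplots() returns the figure and a single Axes to draw on
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(X, y, label="Sin X")
ax.plot(X, z, label="Cos X")

# Axis labels (illustrative text chosen for this sketch)
ax.set_xlabel("X")
ax.set_ylabel("Value")
ax.set_title("Line plot")
ax.legend()
```

Calling plt.show() afterwards displays the figure, exactly as in the pyplot version.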
The scatter plot shows individual values of two numerical variables on a Cartesian plane (with two axes). Each point represents one observation.
plt.figure(figsize = (10, 5))
plt.scatter(X, y, label = "Sin X")
plt.title("Scatter plot")
plt.legend()
plt.show()
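Scatter plots are most informative when the two variables are measured per observation instead of being computed from a formula. The sketch below uses hypothetical data (50 noisy points around the line y = 2x, drawn from a seeded generator) purely to illustrate that each marker is one observation; the variable names and the seed are assumptions made for this example.

```python
import numpy as np
import matplotlib.pyplot as plt

# Seeded generator so the sketch is reproducible (the seed is arbitrary)
rng = np.random.default_rng(0)

# Hypothetical observations: 50 points scattered around the line y = 2x
x_obs = rng.uniform(0, 10, size=50)
y_obs = 2 * x_obs + rng.normal(0, 2, size=50)

fig, ax = plt.subplots(figsize=(10, 5))
# Each (x, y) pair becomes one marker; alpha makes overlapping points visible
points = ax.scatter(x_obs, y_obs, alpha=0.7, label="Observations")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Scatter plot")
ax.legend()
```

Calling plt.show() afterwards displays the figure.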
The histogram represents the distribution of a numerical variable by dividing the range of the data into intervals (bins) and showing how many values fall into each interval.
data = np.random.randn(1000)
plt.figure(figsize = (10, 5))
plt.hist(data, bins = 30, alpha = 0.7)
plt.title("Histogram")
plt.show()
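To see what "how many values fall into each interval" means numerically, the counts can be computed without drawing anything. This is a minimal sketch using NumPy's np.histogram, the function plt.hist relies on for binning; the seeded generator is an addition for reproducibility and is not part of the example above.

```python
import numpy as np

# Seeded generator so the numbers are reproducible (the seed is arbitrary)
rng = np.random.default_rng(0)
data = rng.standard_normal(1000)

# counts[i] is the number of values between edges[i] and edges[i + 1]
counts, edges = np.histogram(data, bins=30)

print(len(counts))   # 30 bins
print(len(edges))    # 31 edges, one more than the number of bins
print(counts.sum())  # 1000, since every value falls into exactly one bin
```

The bar heights drawn by plt.hist(data, bins = 30) correspond to these counts.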
The bar plot represents categorical data using rectangular bars whose heights (or lengths, in the case of horizontal bars) are proportional to the values they represent.
labels = ["A", "B", "C", "D"]
values = [10, 20, 15, 30]
plt.figure(figsize = (10, 5))
plt.bar(labels, values)
plt.title("Bar chart")
plt.show()
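For the horizontal variant mentioned above, Matplotlib provides barh, where each value sets the length of a bar, not its height. A short sketch with the same labels and values, using the object-oriented interface (a choice made for this example):

```python
import matplotlib.pyplot as plt

labels = ["A", "B", "C", "D"]
values = [10, 20, 15, 30]

fig, ax = plt.subplots(figsize=(10, 5))
# barh draws horizontal bars: each value sets the bar's length (its width)
bars = ax.barh(labels, values)
ax.set_title("Horizontal bar chart")
```

Calling plt.show() afterwards displays the figure. Horizontal bars tend to be easier to read when the category names are long.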
A pie chart represents data in circular sectors, where each sector corresponds to a category and its size is proportional to the value it represents.
labels = ["A", "B", "C", "D"]
sizes = [215, 130, 245, 210]
plt.figure(figsize = (7, 7))
plt.pie(sizes, labels = labels)
plt.title("Pie chart")
plt.show()
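Since each sector's size is proportional to its value, it can help to print the share of each category on the chart itself. The sketch below computes the percentages by hand and also passes autopct to pie so that Matplotlib annotates each sector; the format string "%1.1f%%" (one decimal place) is a choice made for this example.

```python
import matplotlib.pyplot as plt

labels = ["A", "B", "C", "D"]
sizes = [215, 130, 245, 210]

# Each sector's share of the whole, as a percentage
total = sum(sizes)
percentages = [100 * size / total for size in sizes]

fig, ax = plt.subplots(figsize=(7, 7))
# autopct formats the percentage that Matplotlib computes for each sector
ax.pie(sizes, labels=labels, autopct="%1.1f%%")
ax.set_title("Pie chart")
```

Calling plt.show() afterwards displays the figure with the percentage written inside each sector.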
A boxplot summarizes the distribution of quantitative data through its quartiles and, where present, its outliers.
The ends of the box mark the lower and upper quartiles, while the line inside the box marks the median.
data = np.random.randn(1000)
plt.figure(figsize = (10, 5))
plt.boxplot(data)
plt.title("Boxplot")
plt.show()
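The quantities a boxplot draws can also be computed directly, which makes the figure easier to interpret. This is a minimal sketch using np.percentile, assuming Matplotlib's default whisker rule of 1.5 times the interquartile range; the seeded generator and the variable names are additions made for this example.

```python
import numpy as np

# Seeded generator so the numbers are reproducible (the seed is arbitrary)
rng = np.random.default_rng(0)
data = rng.standard_normal(1000)

# Quartiles: the box spans q1 to q3, and the inner line marks the median
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# With the default settings, whiskers reach the most extreme data points
# within 1.5 * IQR of the box; points beyond these limits count as outliers
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
outliers = data[(data < lower_limit) | (data > upper_limit)]

print(f"Q1={q1:.2f}, median={median:.2f}, Q3={q3:.2f}")
print(f"Outliers: {outliers.size}")
```

Matplotlib may use a slightly different interpolation when computing the quartiles, so the drawn box can differ marginally from these values.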
x = [1, 2, 3, 4], y = [1, 2, 0, 0.5] (★☆☆)
NOTE: You can find the dataset at https://raw.githubusercontent.com/cvazquezlos/machine-learning-prework/main/04-matplotlib/assets/titanic_train.csv