matplotlib

**Matplotlib** is a Python visualization library that provides a variety of tools and functions for creating static, animated and interactive graphics and visualizations. It is one of the most popular and widely used libraries in the Python community.

`pyplot` is a module of the Matplotlib library that provides a simple and intuitive interface for creating plots. It is the module that machine learning engineers and data scientists typically use for their graphical representations. Its key points are:

- **High-level interface**: pyplot offers a set of functions that make it quick to create graphs.
- **Functionality**: It offers a wide variety of functions for bar charts, scatter plots, box plots, and so on.
- **Integration**: It is tightly integrated with environments such as Jupyter Notebook, allowing graphs to be displayed directly within notebooks.

In [1]:

```
import numpy as np
X = np.linspace(0, 10, 100)
y = np.sin(X)
z = np.cos(X)
```

The **line plot** represents information as points connected by lines. It is useful for showing the evolution of one or more data series along an axis, typically time.

In [2]:

```
import matplotlib.pyplot as plt
plt.figure(figsize = (10, 5))
plt.plot(X, y, label = "Sin X")
plt.plot(X, z, label = "Cos X")
plt.title("Line plot")
plt.legend()
plt.show()
```

The **scatter plot** shows individual values of two numerical variables on a Cartesian plane (with two axes). Each point represents one observation.

In [3]:

```
plt.figure(figsize = (10, 5))
plt.scatter(X, y, label = "Sin X")
plt.title("Scatter plot")
plt.legend()
plt.show()
```
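To illustrate the idea that each point is one observation of two variables, here is a minimal sketch with synthetic data. The variable names `hours` and `score`, the seed, and the noisy linear relationship between them are all assumptions made up for this example.

```python
import numpy as np
import matplotlib.pyplot as plt

# Arbitrary seed so the synthetic sample is reproducible
rng = np.random.default_rng(0)

# Two hypothetical numerical variables: the second depends on the first plus noise
hours = rng.uniform(0, 10, 50)
score = 2 * hours + rng.normal(0, 2, 50)

plt.figure(figsize=(10, 5))
plt.scatter(hours, score, label="Observations")  # one point per (hours, score) pair
plt.title("Scatter plot of two numerical variables")
plt.legend()
plt.show()
```

Because the points are not connected, the overall shape of the cloud is what suggests whether the two variables are related.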

The **histogram** represents the distribution of a numerical variable by dividing the range of the data into intervals (bins) and showing how many observations fall into each one. For categorical variables, the analogous chart is a bar plot of the frequency of each category.

In [4]:

```
data = np.random.randn(1000)
plt.figure(figsize = (10, 5))
plt.hist(data, bins = 30, alpha = 0.7)
plt.title("Histogram")
plt.show()
```
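To make the binning step concrete, the following sketch uses `np.histogram` to compute the kind of per-interval counts that a histogram displays. The seed and the names `rng`, `sample`, `counts` and `edges` are assumptions chosen for illustration.

```python
import numpy as np

# Arbitrary seed so the sample is reproducible
rng = np.random.default_rng(42)
sample = rng.standard_normal(1000)

# Split the data range into 30 equal-width intervals and count the values in each
counts, edges = np.histogram(sample, bins=30)

print(len(counts))   # number of bins
print(len(edges))    # number of bin edges, one more than the number of bins
print(counts.sum())  # every observation falls into exactly one bin
```

The heights of the bars in the plot correspond to `counts`, and the positions of the bars to consecutive pairs of values in `edges`.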

The **bar plot** represents categorical data using rectangular bars whose heights (or lengths, in the case of horizontal bars) are proportional to the values they represent.

In [5]:

```
labels = ["A", "B", "C", "D"]
values = [10, 20, 15, 30]
plt.figure(figsize = (10, 5))
plt.bar(labels, values)
plt.title("Bar chart")
plt.show()
```
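The horizontal variant mentioned in the description can be sketched with `plt.barh`, which takes the same categories and values but draws the bars along the horizontal axis. This reuses the example data from the cell above; the title text is just a placeholder.

```python
import matplotlib.pyplot as plt

labels = ["A", "B", "C", "D"]
values = [10, 20, 15, 30]

plt.figure(figsize=(10, 5))
# With barh, the length of each bar (not its height) encodes the value
bars = plt.barh(labels, values)
plt.title("Horizontal bar chart")
plt.show()
```

Horizontal bars tend to be easier to read when the category labels are long.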

A **pie chart** represents data in circular sectors, where each sector corresponds to a category and its size is proportional to the value it represents.

In [6]:

```
labels = ["A", "B", "C", "D"]
sizes = [215, 130, 245, 210]
plt.figure(figsize = (7, 7))
plt.pie(sizes, labels = labels)
plt.title("Pie chart")
plt.show()
```
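The proportionality between sector size and value can be made explicit by computing each category's share of the total. The sketch below does this by hand and, as one possible option, passes `autopct` to `plt.pie` so that the percentages are printed on the chart; the names `total` and `shares` are introduced only for this example.

```python
import matplotlib.pyplot as plt

labels = ["A", "B", "C", "D"]
sizes = [215, 130, 245, 210]

# Each sector's share of the circle is its value divided by the total
total = sum(sizes)
shares = [100 * size / total for size in sizes]
for label, share in zip(labels, shares):
    print(f"{label}: {share:.1f}%")

plt.figure(figsize=(7, 7))
# autopct formats the percentage shown inside each sector (one decimal here)
plt.pie(sizes, labels=labels, autopct="%1.1f%%")
plt.title("Pie chart with percentages")
plt.show()
```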

A **boxplot** summarizes the distribution of quantitative data through its quartiles and, where present, its outliers.

The ends of the box indicate the lower and upper quartiles, while the line inside the box indicates the median.

In [7]:

```
data = np.random.randn(1000)
plt.figure(figsize = (10, 5))
plt.boxplot(data)
plt.title("Boxplot")
plt.show()
```
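The quantities a boxplot summarizes can also be computed directly. The following sketch obtains the quartiles with `np.percentile` and flags as outliers the points lying more than 1.5 times the interquartile range beyond the box, which is the convention Matplotlib applies by default (`whis=1.5`). The seed and the variable names are assumptions for illustration.

```python
import numpy as np

# Arbitrary seed so the sample is reproducible
rng = np.random.default_rng(0)
sample = rng.standard_normal(1000)

# Lower quartile, median and upper quartile: the box edges and the inner line
q1, median, q3 = np.percentile(sample, [25, 50, 75])

# Interquartile range and the limits beyond which points count as outliers
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = sample[(sample < lower_fence) | (sample > upper_fence)]

print(f"Q1 = {q1:.3f}, median = {median:.3f}, Q3 = {q3:.3f}")
print(f"Number of outliers: {len(outliers)}")
```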

`x = [1, 2, 3, 4], y = [1, 2, 0, 0.5]` (★☆☆)

In [ ]:

```
```

In [ ]:

```
```

In [ ]:

```
```

NOTE: You can find the dataset at https://raw.githubusercontent.com/cvazquezlos/machine-learning-prework/main/04-matplotlib/assets/titanic_train.csv

In [ ]:

```
```