12. Visualization with Seaborn
2025. 6. 20. 22:55ㆍPython/Matplotlib
- Seaborn provides an API on top of Matplotlib that offers sane choices for plot style and color defaults, defines simple high-level functions for common statistical plot types, and integrates with the functionality provided by Pandas.
- By convention, Seaborn is often imported as sns.
# In[1]
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
sns.set() # seaborn's method to set its chart style
Exploring Seaborn Plots
- The main idea of Seaborn is that it provides high-level commands to create a variety of plot types useful for statistical data exploration, and even some statistical model fitting.
- All of the following could be done using raw Matplotlib commands, but the Seaborn API is much more convenient.
Histograms, KDE, and Densities
- Often in statistical data visualization, all you want is to plot histograms and joint distributions of variables.
# In[2]
data=np.random.multivariate_normal([0,0],[[5,2],[2,2]],size=2000)
data=pd.DataFrame(data,columns=['x','y'])
for col in 'xy':
    # Overlay one normalized, semi-transparent histogram per column
    plt.hist(data[col],density=True,alpha=0.5)
- Rather than just providing a histogram as a visual output, we can get a smooth estimate of the distribution using kernel density estimation, which Seaborn does with sns.kdeplot.
# In[3]
sns.kdeplot(data=data,fill=True); # fill replaces the shade argument, which recent Seaborn releases no longer accept
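- As a rough sketch of what a kernel density estimate computes, the snippet below builds one by hand with NumPy by averaging Gaussian bumps centered on each sample. The helper gaussian_kde_1d, the sample data, and the Scott's-rule bandwidth are assumptions made for illustration; Seaborn's own bandwidth selection and evaluation grid may differ.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Hypothetical helper: average of Gaussian kernels centered on each sample."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Standardized distance from every grid point to every sample,
    # shape (len(grid), len(samples))
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / bandwidth

rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=500)  # assumed toy data
grid = np.linspace(-6, 6, 601)

# Scott's rule of thumb in one dimension: sigma * n**(-1/5) (an assumption here)
bandwidth = samples.std(ddof=1) * len(samples) ** (-1 / 5)
density = gaussian_kde_1d(samples, grid, bandwidth)

# A density should integrate to roughly 1 (Riemann-sum approximation)
area = float(np.sum(density) * (grid[1] - grid[0]))
print(f"bandwidth={bandwidth:.3f}, area under curve={area:.3f}")
```

- A smaller bandwidth gives a spikier curve that follows individual samples, while a larger one smooths the estimate out.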
- If we pass x and y columns to kdeplot, we instead get a two-dimensional visualization of the joint density.
# In[4]
sns.kdeplot(data=data,x='x',y='y');
- We can see the joint distribution and the marginal distributions together using sns.jointplot, which we'll explore further later in this chapter.
Pair Plots
- When you generalize joint plots to datasets of larger dimensions, you end up with pair plots.
- These are very useful for exploring correlations between multidimensional data, when you'd like to plot all pairs of values against each other.
- We'll demo this with the Iris dataset, which lists measurements of petals and sepals of three Iris species.
# In[5]
iris=sns.load_dataset('iris')
iris.head()
# Out[5]
   sepal_length  sepal_width  petal_length  petal_width  species
0           5.1          3.5           1.4          0.2   setosa
1           4.9          3.0           1.4          0.2   setosa
2           4.7          3.2           1.3          0.2   setosa
3           4.6          3.1           1.5          0.2   setosa
4           5.0          3.6           1.4          0.2   setosa
- Visualizing the multidimensional relationships among the samples is as easy as calling sns.pairplot.
# In[6]
sns.pairplot(iris,hue='species',height=2.5);
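- The correlations that a pair plot lets you read off by eye can also be summarized numerically. This is a minimal sketch that assumes synthetic stand-in data: the frame and its unrelated column z are hypothetical and are not part of the Iris dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical data: x and y share the covariance used in In[2], z is independent noise
xy = rng.multivariate_normal([0, 0], [[5, 2], [2, 2]], size=2000)
frame = pd.DataFrame(xy, columns=['x', 'y'])
frame['z'] = rng.normal(size=len(frame))

# Pearson correlation for every pair of numeric columns
corr = frame.corr()
print(corr.round(2))
```

- Each off-diagonal entry corresponds to one scatter panel in the pair plot: a value near 0 matches a shapeless cloud, and a value far from 0 matches a tilted, elongated one.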
Faceted Histograms
- Sometimes the best way to view data is via histograms of subsets.
- Seaborn's FacetGrid makes this simple. We'll take a look at some data that shows the amount that restaurant staff receive in tips based on various indicator data.
# In[7]
tips=sns.load_dataset('tips')
tips.head()
# Out[7]
   total_bill   tip     sex  smoker  day    time  size
0       16.99  1.01  Female      No  Sun  Dinner     2
1       10.34  1.66    Male      No  Sun  Dinner     3
2       21.01  3.50    Male      No  Sun  Dinner     3
3       23.68  3.31    Male      No  Sun  Dinner     2
4       24.59  3.61  Female      No  Sun  Dinner     4
# In[8]
tips['tip_pct']=100 * tips['tip'] / tips['total_bill']
grid=sns.FacetGrid(tips,row='sex',col='time',margin_titles=True)
grid.map(plt.hist,"tip_pct",bins=np.linspace(0,40,15));
- The faceted chart gives us some quick insights into the dataset: for example, we see that it contains far more data on male servers during the dinner hour than other categories, and typical tip amounts appear to range from approximately 10% to 20%, with some outliers on either end.
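- Insights like these can be cross-checked with a plain groupby, since each facet is just the subset of rows sharing one (sex, time) combination. The sketch below uses a handful of made-up rows shaped like the tips data (toy is hypothetical, not the real dataset) so that it runs without downloading anything.

```python
import pandas as pd

# Hypothetical rows with the same columns used in In[8] (not real data)
toy = pd.DataFrame({
    'total_bill': [20.0, 10.0, 40.0, 25.0, 16.0, 30.0],
    'tip':        [3.0, 1.5, 8.0, 2.5, 2.0, 3.0],
    'sex':        ['Male', 'Female', 'Male', 'Male', 'Female', 'Male'],
    'time':       ['Dinner', 'Lunch', 'Dinner', 'Dinner', 'Dinner', 'Lunch'],
})
toy['tip_pct'] = 100 * toy['tip'] / toy['total_bill']

# One cell per facet: rows of the table are 'sex', columns are 'time'
counts = toy.groupby(['sex', 'time']).size().unstack(fill_value=0)
medians = toy.groupby(['sex', 'time'])['tip_pct'].median().unstack()

print(counts)
print(medians)
```

- Running the same two groupby lines on the real tips frame gives the number of records behind each histogram and a typical tip percentage per facet.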
Categorical Plots
- Categorical plots can be useful for this kind of visualization as well.
- These allow you to view the distribution of a parameter within bins defined by any other parameter.
# In[9]
with sns.axes_style(style='ticks'):
    # Box plot of total bill per day, split by sex
    g=sns.catplot(x='day',y='total_bill',hue='sex',
                  data=tips,kind='box')
    g.set_axis_labels("Day","Total Bill");
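- A box plot condenses each group into a few summary statistics. As a rough sketch of those numbers, the snippet below computes the quartiles and interquartile range per group with Pandas; the toy frame is made up for illustration, and the exact whisker rule Seaborn applies is not reproduced here.

```python
import pandas as pd

# Hypothetical bills for two days (not taken from the tips dataset)
toy = pd.DataFrame({
    'day': ['Thur'] * 4 + ['Fri'] * 4,
    'total_bill': [10.0, 12.0, 14.0, 16.0, 20.0, 22.0, 24.0, 26.0],
})

# Lower quartile, median, and upper quartile for each day
quartiles = (toy.groupby('day')['total_bill']
                .quantile([0.25, 0.5, 0.75])
                .unstack())
quartiles.columns = ['q1', 'median', 'q3']

# The box spans q1..q3, so its height is the interquartile range
quartiles['iqr'] = quartiles['q3'] - quartiles['q1']
print(quartiles)
```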
Joint Distributions
- Similar to the pair plot we saw earlier, we can use sns.jointplot to show the joint distribution between two variables, along with the associated marginal distributions.
# In[10]
with sns.axes_style('white'):
    # Hexagonal binning of the joint distribution, with marginal histograms
    sns.jointplot(x='total_bill',y='tip',data=tips,kind='hex')
- The joint plot can even do some automatic kernel density estimation and regression.
# In[11]
sns.jointplot(x='total_bill',y='tip',data=tips,kind='reg');
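- To get the fitted line as numbers rather than as a drawing, an ordinary least-squares fit can be computed directly with NumPy. This is a minimal sketch on synthetic data: the generated total_bill and tip arrays, and the true slope and intercept behind them, are assumptions for illustration, and Seaborn's regression plot additionally draws a confidence band that is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-in data: tip = 1.0 + 0.1 * total_bill + noise (assumed values)
total_bill = rng.uniform(5, 50, size=200)
tip = 1.0 + 0.1 * total_bill + rng.normal(scale=0.5, size=200)

# Degree-1 polynomial fit is a straight least-squares line;
# coefficients come back highest power first
slope, intercept = np.polyfit(total_bill, tip, deg=1)

# Use the fitted line to predict the tip for a hypothetical 30-unit bill
predicted_tip = slope * 30.0 + intercept
print(f"slope={slope:.3f}, intercept={intercept:.3f}, predicted tip={predicted_tip:.2f}")
```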
Bar Plots
- Time series can be plotted using sns.catplot (called sns.factorplot in older Seaborn releases).
- We'll use the Planets dataset.
# In[12]
planets=sns.load_dataset('planets')
planets.head()
# Out[12]
            method  number  orbital_period   mass  distance  year
0  Radial Velocity       1         269.300   7.10     77.40  2006
1  Radial Velocity       1         874.774   2.21     56.95  2008
2  Radial Velocity       1         763.000   2.60     19.84  2011
3  Radial Velocity       1         326.030  19.40    110.62  2007
4  Radial Velocity       1         516.220  10.50    119.47  2009
# In[13]
with sns.axes_style('white'):
    # Count of discoveries per year, labeling every fifth tick
    g=sns.catplot(x='year',data=planets,aspect=2,
                  kind='count',color='steelblue')
    g.set_xticklabels(step=5)
- We can learn more by looking at the method of discovery of each of these planets.
# In[14]
with sns.axes_style('white'):
    # Same counts, split by discovery method and restricted to 2001-2014
    g=sns.catplot(x='year',data=planets,aspect=4.0,kind='count',
                  hue='method',order=range(2001,2015))
    g.set_ylabels('Number of Planets Discovered')
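- The bar heights in these count plots are simple tallies, so the same numbers can be produced as a table. A minimal sketch, assuming a few made-up rows in place of the real Planets data (the toy frame is hypothetical):

```python
import pandas as pd

# Hypothetical discoveries (not taken from the planets dataset)
toy = pd.DataFrame({
    'method': ['Radial Velocity', 'Transit', 'Transit',
               'Radial Velocity', 'Imaging', 'Transit'],
    'year':   [2008, 2008, 2010, 2010, 2010, 2010],
})

# Total discoveries per year: the heights of the single-color bars
per_year = toy['year'].value_counts().sort_index()

# Discoveries per year and method: the heights of the hue-split bars
per_year_method = pd.crosstab(toy['year'], toy['method'])

print(per_year)
print(per_year_method)
```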
For more information on plotting with Seaborn, refer to the Seaborn API reference.