12. Visualization with Seaborn

2025. 6. 20. 22:55 Python/Matplotlib

  • Seaborn provides an API on top of Matplotlib that offers sane choices for plot style and color defaults, defines simple high-level functions for common statistical plot types, and integrates with the functionality provided by Pandas.
  • By convention, Seaborn is imported as sns.
# In[1]
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd

sns.set()  # apply Seaborn's default chart style (alias of sns.set_theme())

Exploring Seaborn Plots

  • The main idea of Seaborn is that it provides high-level commands to create a variety of plot types useful for statistical data exploration, and even some statistical model fitting.
  • All of the following could be done using raw Matplotlib commands, but the Seaborn API is much more convenient.

Histograms, KDE, and Densities

  • Often in statistical data visualization, all you want is to plot histograms and joint distributions of variables.
# In[2]
data=np.random.multivariate_normal([0,0],[[5,2],[2,2]],size=2000)
data=pd.DataFrame(data,columns=['x','y'])

for col in 'xy':
    plt.hist(data[col],density=True,alpha=0.5)

  • Rather than just providing a histogram as a visual output, we can get a smooth estimate of the distribution using kernel density estimation, which Seaborn does with sns.kdeplot.
# In[3]
sns.kdeplot(data=data,fill=True);  # fill replaces the deprecated shade argument
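  • To make the idea of kernel density estimation concrete, here is a minimal NumPy-only sketch of a one-dimensional Gaussian KDE. The helper name gaussian_kde_1d, the seeded sample, and the Scott's-rule bandwidth are assumptions made for this illustration; it is not Seaborn's internal code, and the curve sns.kdeplot draws may differ in detail.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average one Gaussian bump per sample, evaluated at each grid point."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Scaled distance from every grid point to every sample: shape (grid, samples)
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / bandwidth

rng = np.random.default_rng(0)
# Hypothetical sample with the same variance (5) as column 'x' above
x = rng.normal(loc=0.0, scale=np.sqrt(5), size=2000)

# Scott's rule of thumb for the bandwidth (an assumption for this sketch)
bandwidth = x.std(ddof=1) * len(x) ** (-1 / 5)

grid = np.linspace(x.min() - 3 * bandwidth, x.max() + 3 * bandwidth, 400)
density = gaussian_kde_1d(x, grid, bandwidth)

# Trapezoid-rule area under the curve; a density should integrate to about 1
area = float(np.sum((density[1:] + density[:-1]) / 2 * np.diff(grid)))
print(f"bandwidth={bandwidth:.3f}, area under curve={area:.3f}")
```

  • A smaller bandwidth gives a bumpier curve and a larger one a smoother curve, which is the same trade-off as choosing histogram bin widths.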

  • If we pass x and y columns to kdeplot, we instead get a two-dimensional visualization of the joint density.
# In[4]
sns.kdeplot(data=data,x='x',y='y');

  • We can see the joint distribution and the marginal distribution together using sns.jointplot, which we'll explore further later in this chapter.

Pair Plots

  • When you generalize joint plots to datasets of larger dimensions, you end up with pair plots.
  • These are very useful for exploring correlations between multidimensional data, when you'd like to plot all pairs of values against each other.
  • We'll demo this with the Iris dataset, which lists measurements of petals and sepals of three Iris species.
# In[5]
iris=sns.load_dataset('iris')
iris.head()
# Out[5]
  sepal_length    sepal_width    petal_length    petal_width    species
0           5.1            3.5             1.4            0.2     setosa
1           4.9            3.0             1.4            0.2     setosa
2           4.7            3.2             1.3            0.2     setosa
3           4.6            3.1             1.5            0.2     setosa
4           5.0            3.6             1.4            0.2     setosa
  • Visualizing the multidimensional relationships among the samples is as easy as calling sns.pairplot.
# In[6]
sns.pairplot(iris,hue='species',height=2.5);
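  • A pair plot shows correlations visually; a numeric companion is the pairwise correlation matrix from pandas. The sketch below uses hypothetical synthetic columns (not the real Iris measurements) so that it runs without downloading a dataset; the column names are borrowed only for readability.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Hypothetical data: petal_width is built to track petal_length,
# while sepal_width is drawn independently of both
petal_length = rng.normal(4.0, 1.5, n)
toy = pd.DataFrame({
    "petal_length": petal_length,
    "petal_width": 0.4 * petal_length + rng.normal(0, 0.2, n),
    "sepal_width": rng.normal(3.0, 0.4, n),
})

# Pearson correlation for every pair of columns
corr = toy.corr()
print(corr.round(2))
```

  • Each off-diagonal entry summarizes one scatter panel of the pair plot in a single number: values near 1 or -1 correspond to tight diagonal clouds, values near 0 to shapeless ones.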

Faceted Histograms

  • Sometimes the best way to view data is via histograms of subsets.
  • Seaborn's FacetGrid makes this simple. We'll take a look at some data that shows the amount that restaurant staff receive in tips based on various indicator data.
# In[7]
tips=sns.load_dataset('tips')
tips.head()
# Out[7]
  total_bill     tip       sex    smoker    day      time    size
0       16.99    1.01    Female        No    Sun    Dinner       2
1       10.34    1.66      Male        No    Sun    Dinner       3
2       21.01    3.50      Male        No    Sun    Dinner       3
3       23.68    3.31      Male        No    Sun    Dinner       2
4       24.59    3.61    Female        No    Sun    Dinner       4

# In[8]
tips['tip_pct']=100 * tips['tip'] / tips['total_bill']

grid=sns.FacetGrid(tips,row='sex',col='time',margin_titles=True)
grid.map(plt.hist,"tip_pct",bins=np.linspace(0,40,15));

  • The faceted chart gives us some quick insights into the dataset: for example, we see that it contains far more data on male servers during the dinner hour than other categories, and typical tip amounts appear to range from approximately 10% to 20%, with some outliers on either end.
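  • Observations like these can be checked numerically, because each facet panel corresponds to one group in a pandas groupby. The sketch below uses a small hypothetical table with the same column names as tips (the values are invented for illustration) and reports, per panel, the number of records and the median tip percentage.

```python
import pandas as pd

# Hypothetical stand-in rows; only the column names match the tips table
toy = pd.DataFrame({
    "sex": ["Male", "Male", "Female", "Male", "Female", "Male"],
    "time": ["Dinner", "Dinner", "Dinner", "Lunch", "Lunch", "Dinner"],
    "total_bill": [20.0, 35.0, 18.0, 12.0, 15.0, 40.0],
    "tip": [3.0, 5.25, 2.7, 2.4, 2.25, 4.0],
})
toy["tip_pct"] = 100 * toy["tip"] / toy["total_bill"]

# One row per (sex, time) facet: record count and median tip percentage
summary = toy.groupby(["sex", "time"])["tip_pct"].agg(["count", "median"])
print(summary)
```

  • Running the same groupby on the real tips DataFrame would give the counts behind each histogram panel.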

Categorical Plots

  • Categorical plots can be useful for this kind of visualization as well.
  • These allow you to view the distribution of a parameter within bins defined by any other parameter.
# In[9]
with sns.axes_style(style='ticks'):
    g=sns.catplot(x='day',y='total_bill',hue='sex',
                  data=tips,kind='box')
    g.set_axis_labels("Day","Total Bill");
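  • For reference, the numbers a box plot encodes can be computed directly with pandas. This is a sketch under the assumption that the default whisker rule (1.5 times the interquartile range) is in effect; the helper box_stats and the data values are hypothetical.

```python
import pandas as pd

# Hypothetical bills; column names mirror the tips table
toy = pd.DataFrame({
    "day": ["Sat"] * 5 + ["Sun"] * 5,
    "total_bill": [10.0, 12.0, 14.0, 16.0, 18.0,
                   20.0, 22.0, 24.0, 26.0, 60.0],
})

def box_stats(values):
    """Quartiles, whisker ends, and outlier count for one box."""
    q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    # Points farther than 1.5 * IQR from the box are drawn as outliers
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = values[(values >= low) & (values <= high)]
    return pd.Series({
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": inside.min(), "whisker_high": inside.max(),
        "n_outliers": int(((values < low) | (values > high)).sum()),
    })

# One row of statistics per day, i.e. per box in the plot
stats = pd.DataFrame(
    {day: box_stats(group) for day, group in toy.groupby("day")["total_bill"]}
).T
print(stats)
```

  • In this toy data the 60.0 bill on Sunday falls outside the whisker range, so it would appear as a separate point above the box.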

Joint Distributions

  • Similar to the pair plot we saw earlier, we can use sns.jointplot to show the joint distribution of two variables, along with their marginal distributions.
# In[10]
with sns.axes_style('white'):
    sns.jointplot(x='total_bill',y='tip',data=tips,kind='hex')

  • The joint plot can even do some automatic kernel density estimation and regression.
# In[11]
sns.jointplot(x='total_bill',y='tip',data=tips,kind='reg');
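  • The straight line in a regression plot is a least-squares fit. A minimal sketch of that fit with NumPy follows; the data points are hypothetical, and the shaded confidence band that Seaborn draws around the line is not reproduced here.

```python
import numpy as np

# Hypothetical (total_bill, tip) pairs, invented for illustration
total_bill = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
tip = np.array([1.5, 3.1, 4.4, 6.2, 7.3])

# Degree-1 polynomial fit: tip is approximately slope * total_bill + intercept
slope, intercept = np.polyfit(total_bill, tip, deg=1)

# Predicted tip for a bill of 25
predicted = slope * 25.0 + intercept
print(f"slope={slope:.3f}, intercept={intercept:.3f}, tip at 25 = {predicted:.2f}")
```

  • Here the slope of about 0.147 reads as roughly 15 cents of tip per extra dollar on the bill.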

Bar Plots

  • Time series can be plotted as bar charts using sns.catplot (called sns.factorplot before Seaborn 0.9).
  • We'll use the Planets dataset.
# In[12]
planets=sns.load_dataset('planets')
planets.head()
# Out[12]
             method    number    orbital_period     mass    distance    year
0    Radial Velocity         1           269.300     7.10       77.40    2006
1    Radial Velocity         1           874.774     2.21       56.95    2008
2    Radial Velocity         1           763.000     2.60       19.84    2011
3    Radial Velocity         1           326.030    19.40      110.62    2007
4    Radial Velocity         1           516.220    10.50      119.47    2009

# In[13]
with sns.axes_style('white'):
    g=sns.catplot(x='year',data=planets,aspect=2,
                  kind='count',color='steelblue')
    g.set_xticklabels(step=5)

  • We can learn more by looking at the method of discovery of each of these planets.
# In[14]
with sns.axes_style('white'):
    g=sns.catplot(x='year',data=planets,aspect=4.0,kind='count',
                  hue='method',order=range(2001,2015))
    g.set_ylabels('Number of Planets Discovered')
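  • The bar heights in a count plot with hue are simply a cross-tabulation of the two columns. A small sketch with pandas follows; the rows are hypothetical stand-ins with the same column names as the planets table.

```python
import pandas as pd

# Hypothetical discoveries; only the column names match the planets table
toy = pd.DataFrame({
    "method": ["Radial Velocity", "Transit", "Transit",
               "Radial Velocity", "Transit", "Imaging"],
    "year": [2009, 2009, 2010, 2010, 2010, 2010],
})

# Rows are years, columns are methods, cells are discovery counts
counts = pd.crosstab(toy["year"], toy["method"])
print(counts)
```

  • Each cell of the resulting table corresponds to one bar: the row gives its position on the x axis and the column gives its color.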


For more information on plotting with Seaborn, refer to the Seaborn API reference.