9. Pivot Tables

2025. 6. 19. 00:16ㆍPython/Pandas

The pivot table takes simple column-wise data as input, and groups the entries into a two-dimensional table that provides a multidimensional summarization of the data.
Pivot tables as essentially a multidimensional version of groupby aggregation.
That is, you split-apply-combine, but both the split and the combine happen accross not a one-dimensional index, but across a two-dimensional grid.

Motivating Pivot Tables

We'll use the database of passengers on the Titanic, available through the Seaborn library.

# In[1]
titanic=sns.load_dataset('titanic')
titanic.head()

# Out[1]
  survived    pclass       sex     age    sibsp    parch       fare    embarked    class      who    adult_male    deck    embark_town    alive    alone
0         0         3      male    22.0        1        0     7.2500           S    Third      man          True     NaN    Southampton       no    False
1         1         1    female    38.0        1        0    71.2833           C    First    woman         False       C      Cherbourg      yes    False
2         1         3    female    26.0        0        0     7.9250           S    Third    woman         False     NaN    Southampton      yes     True
3         1         1    female    35.0        1        0    53.1000           S    First    woman         False       C    Southampton      yes    False
4         0         3      male    35.0        0        0     8.0500           S    Third      man          True     NaN    Southampton       no     True

Pivot Tables by Hand

We might begin by grouping according to sex, survival status, or combination thereof.

# In[2]
titanic.groupby('sex')[['survived']].mean() # make a DataFrame
"""titanic.groupby('sex')['survived'].mean() return Series"""

# Out[2]
        survived
sex    
female    0.742038
male    0.188908

# In[3]
titanic.groupby(['sex','class'])['survived'].aggregate('mean').unstack()

# Out[3]
class    First        Second        Third
sex            
female    0.968085    0.921053    0.500000
male    0.368852    0.157407    0.135447

The above code gives us a better idea of how both sex and class affected survival, but the code is starting to look a bit garbled(=confused).
The long string of code is not particularly easy to read or use.
This two-dimensional groupby is common enough that Pandas includes a convenience routine, pivot_table, which succinctly(간단명료한) handles this type of multidimensional aggregation.

Pivot Table Syntax

Here is the equivalent to the preceding operation using the DataFrame.pivot_table method

# In[4]
titanic.pivot_table('survived',index='sex',columns='class',aggfunc='mean')

# Out[4]
class    First        Second        Third
sex            
female    0.968085    0.921053    0.500000
male    0.368852    0.157407    0.135447

Multilevel Pivot Tables

Just as in a groupby, the grouping in pivot tables can be specified with multiple levels and via a number of options.
We'll bin the age using the pd.cut function.

# In[5]
age=pd.cut(titanic['age'],[0,18,80])
titanic.pivot_table('survived',['sex',age],'class')

# Out[5]
class            First        Second        Third
sex       age            
female    (0, 18]    0.909091    1.000000    0.511628
       (18, 80]    0.972973    0.900000    0.423729
male    (0, 18]    0.800000    0.600000    0.215686
       (18, 80]    0.375000    0.071429    0.133663

We can apply the same strategy when working with the columns as well; let's add info on the fare paid, using pd.qcut to automatically compute quantiles.

# In[6]
fare=pd.qcut(titanic['fare'],2)
titanic.pivot_table('survived',['sex',age],[fare,'class'])

# Out[6]
fare           (-0.001, 14.454]                      (14.454, 512.329]
class           First      Second       Third       First      Second       Third
sex        age                        
female    (0, 18]     NaN    1.000000    0.714286    0.909091    1.000000    0.318182
       (18, 80]     NaN    0.880000    0.444444    0.972973    0.914286    0.391304
male    (0, 18]     NaN    0.000000    0.260870    0.800000    0.818182    0.178571
       (18, 80]     0.0    0.098039    0.125000    0.391304    0.030303    0.192308

The result is a four-dimensional aggregation with hierarchical indices, shown in a grid demonstrating the relationship between the values.

If you don't know about the pd.cut and pd.qcut, please reference this url :
Difference of pd.cut() and pd.qcut()

Additional Pivot Table Options

The full call signature of the DataFrame.pivot_table method is as follows.

DataFrame.pivot_table(data, values=None, index=None, columns=None, 
aggfunc='mean', fill_value=None, margins=False, dropna=True, 
margins_name='All', observed=False, sort=True)

fill_value and dropna have to do with missing data and are fairly straightforward.

The aggfunc keyword controls what type of aggregation is applied, which is mean by default.
As with groupby, the aggregation specification can be a string representing one of several common choices like sum , mean, count etc.
Or a function that implements an aggregation like np.sum(), min(), sum() etc.
Additionally, it can be specified as a dictionary mapping a column to any of the desired options.

# In[7]
titanic.pivot_table(index='sex',columns='class',
aggfunc={'survived':sum,'fare':'mean'})

# Out[7]
        fare                                survived
class    First        Second        Third        First    Second    Third
sex                        
female    106.125798    21.970121    16.118810    91        70        72
male     67.226127    19.741782    12.661633    45        17        47

We've omitted the values keyword here; when specifying a mapping for aggfunc, this is determined automatically.
At times it's useful to compute totals along each grouping. This can be done via the margins keyword.

# In[8]
titanic.pivot_table('survived',index='sex',columns='class',margins=True)

# Out[8]
class    First        Second        Third        All
sex                
female    0.968085    0.921053    0.500000    0.742038
male    0.368852    0.157407    0.135447    0.188908
All        0.629630    0.472826    0.242363    0.383838

The margin label can be specified with the margins_name keyword; it defaults to "All"

저작자표시 (새창열림)

'Python > Pandas' 카테고리의 다른 글

11. Working with Time Series (0)	2025.06.19
10. Vectorized String Operations (0)	2025.06.19
8. Aggregation and Grouping (0)	2025.06.19
7. Combining Datasets: merge and join (0)	2025.06.19
6. Combining Datasets: concat and append (0)	2025.06.19

노정훈

노정훈

태그

최근글

댓글

공지사항

아카이브

Motivating Pivot Tables

Pivot Tables by Hand

Pivot Table Syntax

Multilevel Pivot Tables

Additional Pivot Table Options

'Python > Pandas' 카테고리의 다른 글

관련글

티스토리툴바