9. Pivot Tables
2025. 6. 19. 00:16ㆍPython/Pandas
- The pivot table takes simple column-wise data as input, and groups the entries into a two-dimensional table that provides a multidimensional summarization of the data.
- Pivot tables as essentially a multidimensional version of groupby aggregation.
- That is, you split-apply-combine, but both the split and the combine happen accross not a one-dimensional index, but across a two-dimensional grid.
Motivating Pivot Tables
- We'll use the database of passengers on the Titanic, available through the Seaborn library.
# In[1]
titanic=sns.load_dataset('titanic')
titanic.head()
# Out[1]
survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_town alive alone
0 0 3 male 22.0 1 0 7.2500 S Third man True NaN Southampton no False
1 1 1 female 38.0 1 0 71.2833 C First woman False C Cherbourg yes False
2 1 3 female 26.0 0 0 7.9250 S Third woman False NaN Southampton yes True
3 1 1 female 35.0 1 0 53.1000 S First woman False C Southampton yes False
4 0 3 male 35.0 0 0 8.0500 S Third man True NaN Southampton no True
Pivot Tables by Hand
- We might begin by grouping according to sex, survival status, or combination thereof.
# In[2]
titanic.groupby('sex')[['survived']].mean() # make a DataFrame
"""titanic.groupby('sex')['survived'].mean() return Series"""
# Out[2]
survived
sex
female 0.742038
male 0.188908
# In[3]
titanic.groupby(['sex','class'])['survived'].aggregate('mean').unstack()
# Out[3]
class First Second Third
sex
female 0.968085 0.921053 0.500000
male 0.368852 0.157407 0.135447
- The above code gives us a better idea of how both sex and class affected survival, but the code is starting to look a bit garbled(=confused).
- The long string of code is not particularly easy to read or use.
- This two-dimensional groupby is common enough that Pandas includes a convenience routine,
pivot_table
, which succinctly(간단명료한) handles this type of multidimensional aggregation.
Pivot Table Syntax
- Here is the equivalent to the preceding operation using the
DataFrame.pivot_table
method
# In[4]
titanic.pivot_table('survived',index='sex',columns='class',aggfunc='mean')
# Out[4]
class First Second Third
sex
female 0.968085 0.921053 0.500000
male 0.368852 0.157407 0.135447
Multilevel Pivot Tables
- Just as in a groupby, the grouping in pivot tables can be specified with multiple levels and via a number of options.
- We'll bin the age using the
pd.cut
function.
# In[5]
age=pd.cut(titanic['age'],[0,18,80])
titanic.pivot_table('survived',['sex',age],'class')
# Out[5]
class First Second Third
sex age
female (0, 18] 0.909091 1.000000 0.511628
(18, 80] 0.972973 0.900000 0.423729
male (0, 18] 0.800000 0.600000 0.215686
(18, 80] 0.375000 0.071429 0.133663
- We can apply the same strategy when working with the columns as well; let's add info on the fare paid, using
pd.qcut
to automatically compute quantiles.
# In[6]
fare=pd.qcut(titanic['fare'],2)
titanic.pivot_table('survived',['sex',age],[fare,'class'])
# Out[6]
fare (-0.001, 14.454] (14.454, 512.329]
class First Second Third First Second Third
sex age
female (0, 18] NaN 1.000000 0.714286 0.909091 1.000000 0.318182
(18, 80] NaN 0.880000 0.444444 0.972973 0.914286 0.391304
male (0, 18] NaN 0.000000 0.260870 0.800000 0.818182 0.178571
(18, 80] 0.0 0.098039 0.125000 0.391304 0.030303 0.192308
- The result is a four-dimensional aggregation with hierarchical indices, shown in a grid demonstrating the relationship between the values.
If you don't know about the pd.cut
and pd.qcut
, please reference this url :
Difference of pd.cut() and pd.qcut()
Additional Pivot Table Options
- The full call signature of the
DataFrame.pivot_table
method is as follows.
DataFrame.pivot_table(data, values=None, index=None, columns=None,
aggfunc='mean', fill_value=None, margins=False, dropna=True,
margins_name='All', observed=False, sort=True)
fill_value
anddropna
have to do with missing data and are fairly straightforward.
- The
aggfunc
keyword controls what type of aggregation is applied, which is mean by default. - As with groupby, the aggregation specification can be a string representing one of several common choices like
sum
,mean
,count
etc. - Or a function that implements an aggregation like
np.sum()
,min()
,sum()
etc. - Additionally, it can be specified as a dictionary mapping a column to any of the desired options.
# In[7]
titanic.pivot_table(index='sex',columns='class',
aggfunc={'survived':sum,'fare':'mean'})
# Out[7]
fare survived
class First Second Third First Second Third
sex
female 106.125798 21.970121 16.118810 91 70 72
male 67.226127 19.741782 12.661633 45 17 47
- We've omitted the
values
keyword here; when specifying a mapping foraggfunc
, this is determined automatically. - At times it's useful to compute totals along each grouping. This can be done via the
margins
keyword.
# In[8]
titanic.pivot_table('survived',index='sex',columns='class',margins=True)
# Out[8]
class First Second Third All
sex
female 0.968085 0.921053 0.500000 0.742038
male 0.368852 0.157407 0.135447 0.188908
All 0.629630 0.472826 0.242363 0.383838
- The margin label can be specified with the
margins_name
keyword; it defaults to "All"
'Python > Pandas' 카테고리의 다른 글
11. Working with Time Series (0) | 2025.06.19 |
---|---|
10. Vectorized String Operations (0) | 2025.06.19 |
8. Aggregation and Grouping (0) | 2025.06.19 |
7. Combining Datasets: merge and join (0) | 2025.06.19 |
6. Combining Datasets: concat and append (0) | 2025.06.19 |