3. Operating on Data in Pandas

njh1008 2025. 6. 19. 00:03

Pandas inherits much of its functionality from NumPy, including ufuncs (universal functions).
Unary ufuncs preserve index and column labels in their output, and for binary ufuncs Pandas automatically aligns the indices of the objects passed in.
This means that keeping the context of data and combining data from different sources, both potentially error-prone tasks with raw NumPy arrays, become essentially foolproof with Pandas.

Ufuncs: Index Preservation

Because Pandas is designed to work with NumPy, any NumPy ufunc will work on Pandas Series and DataFrame objects.

# In[1]
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
ser = pd.Series(rng.integers(0, 10, 4))
ser

# Out[1]
0    0
1    7
2    6
3    4
dtype: int64

# In[2]
df = pd.DataFrame(rng.integers(0, 10, (3, 4)), columns=['A', 'B', 'C', 'D'])
df

# Out[2]
    A    B    C    D
0    4    8    0    6
1    2    0    5    9
2    7    7    7    7

If we apply a NumPy ufunc to either of these objects, the result is another Pandas object with the indices preserved.

# In[3]
np.exp(ser)

# Out[3]
0       1.000000
1    1096.633158
2     403.428793
3      54.598150
dtype: float64

# In[4]
np.sin(df*np.pi/4)

# Out[4]
              A             B         C         D
0  1.224647e-16 -2.449294e-16  0.000000 -1.000000
1  1.000000e+00  0.000000e+00 -0.707107  0.707107
2 -7.071068e-01 -7.071068e-01 -0.707107 -0.707107
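
As a further illustration of index preservation, here is a minimal sketch using a Series with string labels rather than the default integer index. The labels and values are made up for this example and do not come from the notes above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a Series with string labels instead of 0..n-1
values = pd.Series([0.0, 1.0, 4.0, 9.0], index=['a', 'b', 'c', 'd'])

# Applying a NumPy ufunc returns a new Series that carries the same labels
roots = np.sqrt(values)

print(roots)
print(roots.index.equals(values.index))  # compares the output labels with the input labels
```

The point of the sketch is that the ufunc only touches the values; the labels travel with the result unchanged.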

Ufuncs: Index Alignment

For binary operations on two Series or DataFrame objects, Pandas aligns indices in the process of performing the operation.

Index Alignment in Series

# In[5]
# Area in km^2 and population for two overlapping sets of US states
area = pd.Series({'Alaska': 1723337, 'Texas': 695662, 'California': 423967}, name='area')
population = pd.Series({'California': 39538223, 'Texas': 29145505, 'Florida': 21538187}, name='population')
population/area

# Out[5]
Alaska              NaN
California    93.257784
Florida             NaN
Texas         41.896072
dtype: float64

# In[6]
area.index.union(population.index)

# Out[6]
Index(['Alaska', 'California', 'Florida', 'Texas'], dtype='object')

The resulting Series contains the union of the indices of the two inputs. Any item for which one or the other does not have an entry is marked with NaN ("Not a Number"), which is how Pandas marks missing data.
Index matching works this way for any of Python's built-in arithmetic expressions; any missing values are marked by NaN.

# In[7]
A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[1, 2, 3])
A+B

# Out[7]
0    NaN
1    5.0
2    9.0
3    NaN
dtype: float64
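
To make the alignment rule concrete, the following sketch repeats the two Series from In[7] and inspects the result: its index is the union of the inputs, and NaN appears exactly at the labels found in only one operand. The variable names `total`, `unmatched`, and `matched` are introduced here for illustration.

```python
import pandas as pd

A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[1, 2, 3])

total = A + B

# The result is indexed by the union of both input indices
same_index = total.index.equals(A.index.union(B.index))

# Labels present in only one operand come out as NaN
unmatched = total[total.isna()].index.tolist()

# Dropping NaN leaves only the labels the two Series share
matched = total.dropna()

print(same_index, unmatched)
print(matched)
```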

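If NaN is not the desired result, use the corresponding object method in place of the operator and pass a fill_value. Calling A.add(B) is equivalent to A + B, but the method accepts an explicit fill value for elements that are missing from A or B.

# In[8]
A.add(B, fill_value=0)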

# Out[8]
0    2.0
1    5.0
2    9.0
3    5.0
dtype: float64
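
One detail worth noting: fill_value is substituted only where exactly one operand is missing. If both operands are missing at a label, the result there stays NaN. The sketch below uses made-up values to show this.

```python
import numpy as np
import pandas as pd

# Hypothetical data: label 1 is NaN in both Series
A = pd.Series([1.0, np.nan], index=[0, 1])
B = pd.Series([np.nan, 2.0], index=[1, 2])

result = A.add(B, fill_value=0)

# label 0: only A has a value, so B is filled with 0
# label 1: missing on both sides, so the result stays NaN
# label 2: only B has a value, so A is filled with 0
print(result)
```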

Index Alignment in DataFrames

A similar type of alignment takes place for both columns and indices when performing operations on DataFrame objects.

# In[9]
A = pd.DataFrame(rng.integers(0, 20, (2, 2)), columns=['a', 'b'])
A

# Out[9]
     a    b
0    10    2
1    16    9

# In[10]
B = pd.DataFrame(rng.integers(0, 10, (3, 3)), columns=['b', 'a', 'c'])
B

# Out[10]
    b    a    c
0    5    3    1
1    9    7    6
2    4    8    5

# In[11]
A+B

# Out[11]
       a       b      c
0    13.0     7.0    NaN
1    23.0    18.0    NaN
2     NaN     NaN    NaN

Indices are aligned correctly irrespective of their order in the two objects, and the indices in the result are sorted.
As with Series, we can use the object's arithmetic method and pass any desired fill_value to be used in place of missing entries. Here we fill with the mean of all values in A.

# In[12]
A.add(B, fill_value=A.values.mean())  # A.values.mean() = 9.25

# Out[12]
        a        b        c
0    13.00     7.00    10.25
1    23.00    18.00    15.25
2    17.25    13.25    14.25
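
The alignment step that precedes DataFrame arithmetic can also be made explicit. The sketch below rebuilds A and B from the values shown in Out[9] and Out[10] and uses DataFrame.align to produce the two reindexed frames; this is offered as an illustration of what alignment means, not as a description of how Pandas implements the operators internally.

```python
import pandas as pd

# Values copied from Out[9] and Out[10] so the example does not depend on rng state
A = pd.DataFrame([[10, 2], [16, 9]], columns=['a', 'b'])
B = pd.DataFrame([[5, 3, 1], [9, 7, 6], [4, 8, 5]], columns=['b', 'a', 'c'])

# Reindex both frames onto the union of their row and column labels
A_aligned, B_aligned = A.align(B, join='outer')

print(A_aligned)   # gains row 2 and column 'c', filled with NaN
print(B_aligned)   # same data as B, with columns reordered to a, b, c

# Adding the aligned frames elementwise gives the same frame as A + B
print((A_aligned + B_aligned).equals(A + B))
```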

Mapping between Python operators and Pandas methods

Python operator	Pandas method(s)
+	`add`
-	`sub`, `subtract`
*	`mul`, `multiply`
/	`truediv`, `div`, `divide`
//	`floordiv`
%	`mod`
**	`pow`
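
The mapping can be checked directly. The sketch below uses two small Series with identical labels (made-up values) and compares each operator against the first method listed for it in the table.

```python
import pandas as pd

# Hypothetical data with matching labels, so no NaN is introduced by alignment
x = pd.Series([7, 8, 9], index=['p', 'q', 'r'])
y = pd.Series([2, 3, 4], index=['p', 'q', 'r'])

# Each entry compares an operator with its method equivalent
checks = {
    '+': (x + y).equals(x.add(y)),
    '-': (x - y).equals(x.sub(y)),
    '*': (x * y).equals(x.mul(y)),
    '/': (x / y).equals(x.truediv(y)),
    '//': (x // y).equals(x.floordiv(y)),
    '%': (x % y).equals(x.mod(y)),
    '**': (x ** y).equals(x.pow(y)),
}
print(checks)
```

The methods are mainly useful when an extra argument is needed, such as fill_value or axis, which the operators cannot accept.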

Ufuncs: Operations Between DataFrames and Series

# In[13]
A = rng.integers(10, size=(3, 4))
A

# Out[13]
array([[5, 4, 4, 2],
       [0, 5, 8, 0],
       [8, 8, 2, 6]])

# In[14]
A-A[0]

# Out[14]
array([[ 0,  0,  0,  0],
       [-5,  1,  4, -2],
       [ 3,  4, -2,  4]])

According to NumPy's broadcasting rules, subtraction between a two-dimensional array and one of its rows is applied row-wise: the row is subtracted from every row of the array.
Operations between a DataFrame and a Series follow the same convention and operate row-wise by default.

# In[15]
df = pd.DataFrame(A, columns=['Q', 'R', 'S', 'T'])
df - df.iloc[0]

# Out[15]
     Q     R     S     T
0     0     0     0     0
1    -5     1     4    -2
2     3     4    -2     4

If you would instead like to operate column-wise, you can use the object methods listed above and specify the axis keyword.

# In[16]
df.subtract(df['R'], axis=0)

# Out[16]
     Q    R    S    T
0    1    0    0   -2
1   -5    0    3   -5
2    0    0   -6   -2
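
The effect of the axis keyword is easiest to see on a tiny frame. The sketch below uses made-up values; the names `by_row` and `centered` are introduced for illustration. With the default axis the Series labels are matched against the column labels, while axis=0 matches them against the row labels.

```python
import pandas as pd

# Hypothetical 2x3 frame
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['x', 'y', 'z'])

# Default (axis='columns'): the Series index is matched against column labels,
# so the first row is subtracted from every row
by_row = df.sub(df.iloc[0], axis='columns')

# axis=0: the Series index is matched against row labels,
# so each row has its own mean subtracted
row_means = df.mean(axis=1)
centered = df.sub(row_means, axis=0)

print(by_row)
print(centered)
```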

DataFrame/Series operations will automatically align indices between the two elements.

# In[17]
halfrow = df.iloc[0, ::2]
halfrow

# Out[17]
Q    5
S    4
Name: 0, dtype: int64

# In[18]
df-halfrow

# Out[18]
     Q    R    S    T
0  0.0  NaN  0.0  NaN
1 -5.0  NaN  4.0  NaN
2  3.0  NaN -2.0  NaN

Because indices and columns are preserved and aligned, operations on data in Pandas always maintain the data context. This prevents the common errors that can arise when working with heterogeneous or misaligned data in raw NumPy arrays.
