
1. Introducing Pandas Object

njh1008 2025. 6. 18. 23:58

There are three fundamental Pandas data structures: Series, DataFrame, and Index.

# In[1]
import numpy as np 
import pandas as pd

Pandas Series Object

A Pandas Series is a one-dimensional array of indexed data.

# In[2]
data=pd.Series([0.25,0.5,0.75,1.0])
data

# Out[2]
0    0.25
1    0.50
2    0.75
3    1.00
dtype: float64

A Series wraps both a sequence of values and an explicit sequence of indices, which we can access with the values and index attributes.

# In[3]
print(data.values)
print(data.index)

# Out[3]
[0.25 0.5  0.75 1.  ]
RangeIndex(start=0, stop=4, step=1)
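Here is a quick check of what these two attributes return. This is a minimal sketch that assumes a reasonably recent pandas. values is a plain Numpy ndarray, and the default index is a RangeIndex. The to_numpy() method is the more explicit spelling that the pandas documentation recommends over values.

```python
import numpy as np
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0])

# .values hands back the underlying Numpy array
print(type(data.values))              # <class 'numpy.ndarray'>

# .to_numpy() returns the same values as an ndarray
print(data.to_numpy())

# With no index given, pandas builds a RangeIndex 0..n-1
print(isinstance(data.index, pd.RangeIndex))  # True
print(list(data.index))                       # [0, 1, 2, 3]
```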

Like with a Numpy array, data can be accessed by the associated index via the familiar Python square-bracket notation.

# In[4]
print(data[1])
print(data[1:3])

# Out[4]
0.5
1    0.50
2    0.75
dtype: float64

The Pandas Series is much more general and flexible than the one-dimensional Numpy array that it emulates.

Series as Generalized Numpy array

A Numpy array has an implicitly defined integer index used to access the values.
A Pandas Series has an explicitly defined index associated with the values.
This explicit index definition gives the Series object additional capabilities. For example, the index need not be an integer but can consist of values of any desired type, such as strings.

# In[5]
data=pd.Series([0.25,0.5,0.75,1.0],index=['b','a','d','c'])
data

# Out[5]
b    0.25
a    0.50
d    0.75
c    1.00
dtype: float64

# In[6]
data['b']

# Out[6]
0.25
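The explicit index does not have to use strings. It can even use integers that are out of order. The sketch below uses the arbitrary labels [2, 5, 3, 7]. With that index, square brackets look up labels, while .iloc still counts positions. This is exactly the explicit/implicit distinction described above. The sketch assumes a recent pandas, where integer keys on an integer index are treated as labels.

```python
import pandas as pd

# Arbitrary, non-contiguous integer labels
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=[2, 5, 3, 7])

# On an integer index, square brackets look up the label, not the position
print(data[5])       # 0.5
print(data[2])       # 0.25  (label 2 is the first entry)

# .iloc uses the implicit, position-based index instead
print(data.iloc[2])  # 0.75  (third entry)
```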

Series as Specialized Dictionary

A dictionary is a structure that maps arbitrary keys to a set of arbitrary values.
A Series is a structure that maps typed keys to a set of typed values.
Just as the type-specific compiled code behind a Numpy array makes it more efficient than a Python list for certain operations, the type information of a Pandas Series makes it more efficient than a Python dictionary for certain operations.

# In[7]
population_dict={'California':39538223,'Texas':29145505,'Florida':21538187,'New York':20201249,'Pennsylvania':13002700}
population=pd.Series(population_dict)
population

# Out[7]
California      39538223
Texas           29145505
Florida         21538187
New York        20201249
Pennsylvania    13002700
dtype: int64

# In[8]
population['California']

# Out[8]
39538223
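A Series behaves like a dictionary keyed by its index, so several familiar dict idioms also work on it. Here is a minimal sketch using the population data from # In[7]:

```python
import pandas as pd

population = pd.Series({'California': 39538223, 'Texas': 29145505,
                        'Florida': 21538187, 'New York': 20201249,
                        'Pennsylvania': 13002700})

# "in" tests the index labels, just like dict keys
print('Texas' in population)         # True
print(39538223 in population)        # False: values are not searched

# keys() returns the index; items() yields (label, value) pairs
print(list(population.keys())[:2])   # ['California', 'Texas']
big = [state for state, count in population.items() if count > 25_000_000]
print(big)                           # ['California', 'Texas']
```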

Unlike a dictionary, though, the Series also supports array-style operations such as slicing.

# In[9]
population['California':'Florida']

# Out[9]
California    39538223
Texas         29145505
Florida       21538187
dtype: int64
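Notice one detail in # Out[9]. A slice by explicit labels includes the final label ('Florida'). A slice by position excludes its end, as it does for a Python list. The sketch below uses .iloc for the positional slice.

```python
import pandas as pd

population = pd.Series({'California': 39538223, 'Texas': 29145505,
                        'Florida': 21538187, 'New York': 20201249,
                        'Pennsylvania': 13002700})

# Label-based slice: both endpoints are included
by_label = population['California':'Florida']

# Position-based slice: the stop position 2 is excluded
by_position = population.iloc[0:2]

print(list(by_label.index))     # ['California', 'Texas', 'Florida']
print(list(by_position.index))  # ['California', 'Texas']
```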

Constructing Series Objects

A Pandas Series is constructed with pd.Series(data, index=index).
Here index is an optional argument, and data can be one of many entities.
For example, data can be a list or a Numpy array:

# In[10]
pd.Series([2,4,6])

# Out[10]
0    2
1    4
2    6
dtype: int64
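The same constructor also accepts a Numpy array together with an explicit index. In this sketch the labels 'x', 'y', 'z' are arbitrary. pandas keeps the array's dtype. If the index length does not match the data length, pandas rejects it with a ValueError.

```python
import numpy as np
import pandas as pd

arr = np.array([2, 4, 6], dtype=np.int32)

# One label per value; the int32 dtype of the array is preserved
s = pd.Series(arr, index=['x', 'y', 'z'])
print(s.dtype)   # int32
print(s['y'])    # 4

# Three values but only two labels: pandas refuses to guess
try:
    pd.Series([2, 4, 6], index=['x', 'y'])
except ValueError as err:
    print('ValueError:', err)
```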

data can also be a scalar, which is repeated to fill the specified index:

# In[11]
pd.Series(5,index=[100,200,300])

# Out[11]
100    5
200    5
300    5
dtype: int64

Or it can be a dictionary, in which case the index defaults to the dictionary keys:

# In[12]
pd.Series({2:'a',1:'b',3:'c'})

# Out[12]
2    a
1    b
3    c
dtype: object

The index can be explicitly set to control the order or the subset of keys used.

# In[13]
pd.Series({2:'a',1:'b',3:'c'},index=[1,2])

# Out[13]
1    b
2    a
dtype: object
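The explicit index can also name labels that are not keys of the dictionary. In this sketch, label 4 is such a label (chosen arbitrarily). pandas fills its slot with a missing value rather than raising an error.

```python
import pandas as pd

d = {2: 'a', 1: 'b', 3: 'c'}

# Key 3 is dropped because it is not in the index;
# label 4 has no matching key, so its value is missing
s = pd.Series(d, index=[1, 2, 4])
print(s)
print(s.isna().tolist())   # [False, False, True]
```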

Pandas DataFrame Object

DataFrame as Generalized Numpy Array

If a Series is an analog of a one-dimensional array with explicit indices, a DataFrame is an analog of a two-dimensional array with explicit row and column indices.

# In[14]
area_dict={'California':423967,'Texas':695662,'Florida':170312,'New York':141297,'Pennsylvania':119280}
area=pd.Series(area_dict)
area

# Out[14]
California      423967
Texas           695662
Florida         170312
New York        141297
Pennsylvania    119280
dtype: int64

# In[15]
states=pd.DataFrame({'population':population,'area':area})
states

# Out[15]
              population    area
California      39538223  423967
Texas           29145505  695662
Florida         21538187  170312
New York        20201249  141297
Pennsylvania    13002700  119280

Like the Series object, the DataFrame has an index attribute that gives access to the index labels.

# In[16]
states.index

# Out[16]
Index(['California', 'Texas', 'Florida', 'New York', 'Pennsylvania'], dtype='object')

Additionally, the DataFrame has a columns attribute, which is an Index object holding the column labels.

# In[17]
states.columns

# Out[17]
Index(['population', 'area'], dtype='object')
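Together, the index and columns describe a two-dimensional grid, which is why the DataFrame counts as a generalized two-dimensional array. Here is a minimal sketch that rebuilds states from # In[15]:

```python
import pandas as pd

population = pd.Series({'California': 39538223, 'Texas': 29145505,
                        'Florida': 21538187, 'New York': 20201249,
                        'Pennsylvania': 13002700})
area = pd.Series({'California': 423967, 'Texas': 695662,
                  'Florida': 170312, 'New York': 141297,
                  'Pennsylvania': 119280})
states = pd.DataFrame({'population': population, 'area': area})

# (number of row labels, number of column labels)
print(states.shape)               # (5, 2)

# The values as a plain 2-D Numpy array, without the labels
print(states.to_numpy().shape)    # (5, 2)

# Each Index can itself be indexed by position
print(states.index[0], states.columns[1])   # California area
```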

DataFrame as Specialized Dictionary

Where a dictionary maps a key to a value, a DataFrame maps a column name to a Series of column data.

# In[18]
states['area']

# Out[18]
California      423967
Texas           695662
Florida         170312
New York        141297
Pennsylvania    119280
Name: area, dtype: int64
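The dictionary-style lookup works over column labels, so a row label such as 'California' is not a valid key for square brackets on a DataFrame. The sketch below shows both behaviours. It uses only two of the states, for brevity.

```python
import pandas as pd

states = pd.DataFrame({
    'population': pd.Series({'California': 39538223, 'Texas': 29145505}),
    'area': pd.Series({'California': 423967, 'Texas': 695662}),
})

# A column comes back as a Series that carries the column name
col = states['area']
print(type(col).__name__, col.name)   # Series area

# Row labels are not keys of the DataFrame "dictionary"
try:
    states['California']
except KeyError:
    print("'California' is a row label, not a column")
```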

Constructing DataFrame Object

From a single Series object

A DataFrame is a collection of Series objects, and a single-column DataFrame can be constructed from a single Series.

# In[19]
pd.DataFrame(population,columns=['population'])

# Out[19]
              population
California      39538223
Texas           29145505
Florida         21538187
New York        20201249
Pennsylvania    13002700
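The Series itself offers two equivalent routes, sketched below. to_frame() turns a Series into a one-column DataFrame. A Series that carries a name passes that name on as its column label.

```python
import pandas as pd

population = pd.Series({'California': 39538223, 'Texas': 29145505,
                        'Florida': 21538187, 'New York': 20201249,
                        'Pennsylvania': 13002700})

# to_frame() wraps the Series in a single-column DataFrame
df1 = population.to_frame(name='population')

# rename() with a scalar sets the Series name, which becomes the column label
df2 = pd.DataFrame(population.rename('population'))

print(df1.columns.tolist())   # ['population']
print(df1.equals(df2))        # True
```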

From a list of dicts

# In[20]
data=[{'a':i,'b':2*i} for i in range(3)]
pd.DataFrame(data)

# Out[20]
   a  b
0  0  0
1  1  2
2  2  4

If some keys are missing from some of the dictionaries, Pandas will fill them in with NaN (Not a Number) values.

# In[21]
pd.DataFrame([{'a':1,'b':2},{'b':3,'c':4}])

# Out[21]
     a  b    c
0  1.0  2  NaN
1  NaN  3  4.0
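This fill has a side effect. NaN is a floating-point value, so any column that receives one is upcast to float64. That is why a and c print as 1.0 and 4.0 in # Out[21]. Column b appears in every dictionary, so it stays an integer column.

```python
import pandas as pd

df = pd.DataFrame([{'a': 1, 'b': 2}, {'b': 3, 'c': 4}])

# Columns with a missing entry become float64; complete ones keep int64
print(df.dtypes)

# Count the filled-in NaN values per column
print(df.isna().sum().tolist())   # [1, 0, 1]
```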

From a dictionary of Series objects

A DataFrame can be constructed from a dictionary of Series objects, as we saw earlier in # In[15].
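The dictionary-of-Series route adds index alignment: rows are matched by label, not by position. The sketch below uses deliberately incomplete subsets of the data above, leaving out area for Texas and population for Florida. The result has a row for every label that appears in either Series, with NaN wherever one side has no value.

```python
import pandas as pd

# Deliberately incomplete subsets of the population and area data
population = pd.Series({'California': 39538223, 'Texas': 29145505})
area = pd.Series({'California': 423967, 'Florida': 170312})

df = pd.DataFrame({'population': population, 'area': area})
print(df)

# Every label from either Series gets a row
print(sorted(df.index))                    # ['California', 'Florida', 'Texas']
# Values are matched by label; gaps become NaN
print(df.loc['Texas', 'area'])             # nan
print(df.loc['Florida', 'population'])     # nan
```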

From a two-dimensional Numpy array

Given a two-dimensional array of data, we can create a DataFrame with any specified column and index names.
If omitted, an integer index will be used for each.

# In[22]
pd.DataFrame(np.random.rand(3,2),columns=['foo','bar'],index=['a','b','c'])

# Out[22]
        foo       bar
a  0.466496  0.888614
b  0.228347  0.613272
c  0.912784  0.961023
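When the column and index names are omitted, both axes fall back to integer labels. The sketch below shows this with a small array of zeros (the shape (3, 2) is arbitrary).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((3, 2)))

# Default labels: integers 0..n-1 on both axes
print(df.columns.tolist())   # [0, 1]
print(df.index.tolist())     # [0, 1, 2]
```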

From a Numpy structured array

A Pandas DataFrame operates much like a structured array, and can be created directly from one.

# In[23]
A=np.zeros(3,dtype=[('A','i8'),('B','f8')])
A

# Out[23]
array([(0, 0.), (0, 0.), (0, 0.)], dtype=[('A', '<i8'), ('B', '<f8')])

# In[24]
pd.DataFrame(A)

# Out[24]
   A    B
0  0  0.0
1  0  0.0
2  0  0.0
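The fields in # Out[24] are all zeros, which makes it hard to see how the mapping works. The sketch below fills a structured array with some made-up records. The field names become column labels, and each field's Numpy type becomes the column's dtype.

```python
import numpy as np
import pandas as pd

# Made-up records with an integer field 'A' and a float field 'B'
records = np.array([(1, 0.5), (2, 1.5), (3, 2.5)],
                   dtype=[('A', 'i8'), ('B', 'f8')])
df = pd.DataFrame(records)

print(df.columns.tolist())                # ['A', 'B']
print(df.dtypes.astype(str).tolist())     # ['int64', 'float64']
print(df['A'].sum())                      # 6
```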

Pandas Index Object

The Series and DataFrame objects both contain an explicit index that lets you reference and modify data.
The Index object is an interesting structure in itself, and it can be thought of either as an immutable array or as an ordered set.

# In[25]
ind=pd.Index([2,3,5,7,11])
ind

# Out[25]
Index([2, 3, 5, 7, 11], dtype='int64')

Index as Immutable array

The Index in many ways operates like an array.

# In[26]
print(ind[1])
print(ind[::2])
print(ind.size, ind.shape, ind.ndim, ind.dtype)

# Out[26]
3
Index([2, 5, 11], dtype='int64')
5 (5,) 1 int64

One difference between Index objects and Numpy arrays is that Index objects are immutable.
That is, their elements cannot be modified via the normal means, such as item assignment.
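Here is a minimal sketch of what "cannot be modified" means in practice. Item assignment raises a TypeError. Operations that look like changes, such as arithmetic, return a new Index and leave the original alone. This immutability is what makes it safe to share one index among several DataFrames and arrays.

```python
import pandas as pd

ind = pd.Index([2, 3, 5, 7, 11])

# Item assignment is rejected outright
try:
    ind[1] = 0
except TypeError as err:
    print('TypeError:', err)

# Arithmetic builds a brand-new Index instead
shifted = ind + 1
print(shifted.tolist())   # [3, 4, 6, 8, 12]
print(ind.tolist())       # [2, 3, 5, 7, 11] (unchanged)
```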

Index as Ordered Set

The Index object follows many of the conventions used by Python's built-in set data structure, so that unions, intersections, differences, and other combinations can be computed in a familiar way.

# In[27]
indA=pd.Index([1,3,5,7,9])
indB=pd.Index([2,3,5,7,11])

# In[28]
print(indA.intersection(indB))
print(indA.union(indB))
print(indA.symmetric_difference(indB))

# Out[28]
Index([3, 5, 7], dtype='int64')
Index([1, 2, 3, 5, 7, 9, 11], dtype='int64')
Index([1, 2, 9, 11], dtype='int64')
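Set difference follows the same pattern. The sketch below compares Index.difference with Python's own set subtraction to show that the results agree. The set result is sorted here, since a set has no order.

```python
import pandas as pd

indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])

# Elements of indA that are not in indB
diff = indA.difference(indB)
print(diff.tolist())                     # [1, 9]

# Same answer with built-in sets
print(sorted(set(indA) - set(indB)))     # [1, 9]
```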

노정훈
