[python] Data preprocessing - type conversion: dtypes, astype(), to_datetime()

by 스닝 2022. 7. 26.

Type conversion

  • Creating the data
import pandas as pd

# Every value is a string, so each column starts out with the object dtype.
# 판매일 = sale date, 판매량 = units sold, 방문자수 = visitor count, 기온 = temperature
df = pd.DataFrame({'판매일' : ['5/11/21', '5/12/21', '5/13/21', '5/14/21', '5/15/21'],
                   '판매량' : ['10', '15', '20', '25', '30'],
                   '방문자수' : ['10', '-', '17', '23', '25'],
                   '기온' : ['24.1', '24.3', '24.8', '25', '25.4']})
df

  • dtypes : check the data type of each column
df.dtypes

판매일 object
판매량 object
방문자수 object
기온 object
dtype: object

# 판매량 still holds strings, so adding an integer raises a TypeError.
df['판매량 보정'] = df['판매량'] + 1
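The failure above can be reproduced on its own. This is a minimal sketch, assuming a small stand-in Series named `sales` (a hypothetical name, not from the post) that stores numbers as strings the way 판매량 does:

```python
import pandas as pd

# Hypothetical stand-in for the 판매량 column: numbers stored as strings.
sales = pd.Series(['10', '15', '20'], dtype=object)

try:
    sales + 1                      # str + int is not defined
    add_failed = False
except TypeError:
    add_failed = True

# After an explicit conversion the same arithmetic works.
adjusted = sales.astype('int64') + 1

print(add_failed)
print(adjusted.tolist())
```

The object dtype is requested explicitly here so the sketch behaves the same regardless of how a given pandas version infers string columns.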

  • astype(type) : convert every column of the DataFrame to the given type at once
  • astype({'column' : 'type'}) : convert only the specified columns

-Problem : convert 판매량 to an integer type

# astype() returns a new DataFrame; without assignment, df itself is unchanged.
df.astype({'판매량' : 'int'})

df.dtypes

판매일 object
판매량 object
방문자수 object
기온 object
dtype: object

# Assign the result back to df to keep the converted column.
df = df.astype({'판매량' : 'int'})
df.dtypes

판매일 object
판매량 int64
방문자수 object
기온 object
dtype: object

# 판매량 is now an integer column, so the addition succeeds.
df['판매량 보정'] = df['판매량'] + 1
df
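The difference between calling astype() and assigning its result can be checked directly. The sketch below uses a hypothetical two-row frame with an English column name `qty`; neither name comes from the post:

```python
import pandas as pd

# Hypothetical frame: a single string-valued column.
frame = pd.DataFrame({'qty': ['10', '15']}, dtype=object)

# astype() builds a new DataFrame and leaves `frame` untouched.
converted = frame.astype({'qty': 'int64'})

print(frame['qty'].dtype)               # still object
print(converted['qty'].dtype)           # int64
print((converted['qty'] + 1).tolist())
```

Keeping the result under a separate name, as here, or reassigning it to the same variable, as the post does, are both reasonable; the only thing that does not work is discarding the return value.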

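-Problem : convert 방문자수 to a numeric type

# Raises ValueError: the '-' entry cannot be parsed as an integer.
df.astype({'방문자수' : 'int'})

# Also raises ValueError: to_numeric() rejects unparseable values by default.
pd.to_numeric(df['방문자수'])

# errors='coerce' turns unparseable values into NaN instead of raising.
pd.to_numeric(df['방문자수'], errors = 'coerce')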

0 10.0
1 NaN
2 17.0
3 23.0
4 25.0
Name: 방문자수, dtype: float64
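The two behaviours of to_numeric() can be compared side by side. A small sketch, assuming a hypothetical Series `visitors` that contains one '-' placeholder like the 방문자수 column:

```python
import pandas as pd

# Hypothetical stand-in for 방문자수, including one unparseable entry.
visitors = pd.Series(['10', '-', '17'], dtype=object)

# Default behaviour (errors='raise'): the '-' entry stops the conversion.
try:
    pd.to_numeric(visitors)
    raised = False
except ValueError:
    raised = True

# errors='coerce': the '-' entry becomes NaN and the rest are converted.
coerced = pd.to_numeric(visitors, errors='coerce')

print(raised)
print(coerced.isna().tolist())
print(coerced.dtype)
```

The result is float64 rather than an integer type because NaN is a floating-point value.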

df.dtypes

판매일 object
판매량 int64
방문자수 object
기온 object
판매량 보정 int64
dtype: object

# Assign the coerced result back so the column actually changes.
df['방문자수'] = pd.to_numeric(df['방문자수'], errors = 'coerce')
df.dtypes

판매일 object
판매량 int64
방문자수 float64
기온 object
판매량 보정 int64
dtype: object

df

# Raises an error: NaN has no integer representation, so it must be filled first.
df = df.astype({'방문자수' : 'int'})

# Replace NaN with 0 (this fills missing values in every column of df).
df.fillna(0, inplace = True)
df

df = df.astype({'방문자수' : 'int'})
df.dtypes

판매일 object
판매량 int64
방문자수 int64
기온 object
판매량 보정 int64
dtype: object

df
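The three steps used for 방문자수 (coerce, fill, cast) can be gathered into one place. This is only a sketch: `to_int_column` and its `fill_value` parameter are hypothetical names introduced here, and filling with 0 is the post's choice rather than a general recommendation, since it changes sums and averages.

```python
import pandas as pd

def to_int_column(series, fill_value=0):
    """Coerce a string column to int64, replacing unparseable entries."""
    numeric = pd.to_numeric(series, errors='coerce')   # '-' becomes NaN
    return numeric.fillna(fill_value).astype('int64')  # NaN -> fill_value -> int

raw = pd.Series(['10', '-', '17'], dtype=object)

# Casting while NaN is still present fails, which is why fillna() comes first.
try:
    pd.to_numeric(raw, errors='coerce').astype('int64')
    cast_failed = False
except ValueError:
    cast_failed = True

cleaned = to_int_column(raw)
print(cast_failed)
print(cleaned.tolist())
```

Unlike df.fillna(0, inplace = True), this fills only the one column being converted and leaves missing values elsewhere in the frame alone.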

  • to_datetime(param, format="") : convert the given argument to datetime

-Problem : convert 판매일 to datetime

# %m/%d/%y matches month/day/two-digit year, e.g. '5/11/21'.
df['판매일'] = pd.to_datetime(df['판매일'], format="%m/%d/%y")
df

df.dtypes

판매일 datetime64[ns]
판매량 int64
방문자수 int64
기온 object
판매량 보정 int64
dtype: object
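Once the column is a datetime, its parts are available through the `.dt` accessor. The sketch below uses hypothetical stand-in Series (`dates`, `temps`) rather than the post's df. It also shows one possible way to finish the last remaining object column, 기온, with astype(); the post itself stops before that step.

```python
import pandas as pd

# Hypothetical stand-ins for the 판매일 and 기온 columns.
dates = pd.Series(['5/11/21', '5/12/21'], dtype=object)
temps = pd.Series(['24.1', '25'], dtype=object)

# %m = month, %d = day, %y = two-digit year.
parsed = pd.to_datetime(dates, format='%m/%d/%y')

print(parsed.dt.year.tolist())
print(parsed.dt.month.tolist())
print(parsed.dt.day.tolist())

# Decimal strings contain no placeholders here, so a plain cast is enough.
temps_float = temps.astype('float64')
print(temps_float.tolist())
```

Passing an explicit format avoids ambiguity between month-first and day-first dates such as '5/11/21'.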
