
[python] Pandas Profiling (pandas-profiling)

by 스닝 2022. 8. 5.

Pandas Profiling

To get good machine learning results, you first need to understand the nature of your data. This means examining the distribution of values, the relationships between variables, and whether missing values such as nulls are present. This process of getting to know the data is called EDA (Exploratory Data Analysis). To cut down the time EDA takes, the pandas-profiling package lets you inspect a wide range of summary statistics with just a few lines of code.
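
For comparison, the checks mentioned above can be done by hand with plain pandas. The following is only a minimal sketch: the tiny DataFrame and its column names are made up for illustration and are not part of the original post.

```python
import pandas as pd

# Hypothetical toy data, invented for this illustration only
df = pd.DataFrame({
    "length": [5.1, 4.9, None, 6.3, 5.8],
    "width": [3.5, 3.0, 3.2, None, 2.7],
    "species": ["a", "a", "b", "b", "b"],
})

summary = df.describe()                 # distribution of the numeric columns
missing = df.isnull().sum()             # number of missing values per column
corr = df[["length", "width"]].corr()   # pairwise correlation between variables
counts = df["species"].value_counts()   # frequency of each category

print(summary)
print(missing)
print(corr)
print(counts)
```

Each of these calls covers one piece of EDA; a profiling report gathers the same kind of information into a single document.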

  1. Install the package with pip

pip install -U pandas-profiling

  2. Load the data
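import numpy as np
import pandas as pd
from pandas_profiling import ProfileReport  # importing the package also adds df.profile_report()
import seaborn as sns

# Load the iris sample dataset that seaborn provides, as a DataFrame
df = sns.load_dataset('iris')
df.head()  # preview the first five rows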

  3. Generate the profile report
profile = df.profile_report()  # store the profiling report in `profile`
profile  # view the report
# Save the profile report as an HTML file
profile.to_file('./pr_report.html')
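
The report can also be built through the `ProfileReport` class directly, which makes options such as a title explicit. The helper below is a sketch rather than the post's own method: `build_report` is a hypothetical name, `minimal=True` is an optional setting that skips the more expensive computations, and the fallback import assumes the package's newer distribution name, `ydata-profiling`, may be the one installed.

```python
import pandas as pd


def build_report(df: pd.DataFrame, title: str = "Iris profile", minimal: bool = True):
    """Build a profile report for df (hypothetical helper, not from the post)."""
    # Import lazily so this module loads even before the package is installed.
    try:
        from pandas_profiling import ProfileReport
    except ImportError:
        # Assumption: the renamed successor package is installed instead.
        from ydata_profiling import ProfileReport
    return ProfileReport(df, title=title, minimal=minimal)


# Example usage, assuming df is the iris DataFrame loaded earlier:
# report = build_report(df)
# report.to_file('./pr_report_minimal.html')
```

Minimal mode is mainly useful for large datasets; for a small dataset like iris the full report is quick to generate.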

Starting from Overview, the report lets you browse the Variables, Interactions, Correlations, Missing values, and Sample sections.

  4. Explore the report
