[Pandas] 05. Applying Functions to a DataFrame

Updated: November 24, 2021

import pandas as pd
import numpy as np

1. Applying Functions to a DataFrame

A DataFrame provides methods that apply a function across the whole DataFrame, so you don't have to write a for-loop yourself.
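To see what apply replaces, here is a minimal sketch on a small made-up DataFrame. It computes each column's range (max - min) first with an explicit for-loop and then with apply, and the two results match. The column values and the range calculation are only for illustration.

```python
import pandas as pd

# Small made-up DataFrame so the result is easy to check by hand
df = pd.DataFrame({"col 1": [1, 5, 3], "col 2": [10, 20, 40]})

# Explicit for-loop: visit each column and compute its range (max - min)
ranges_loop = {}
for name in df.columns:
    column = df[name]
    ranges_loop[name] = column.max() - column.min()
ranges_loop = pd.Series(ranges_loop)

# Same computation with apply: pandas passes each column to the function for us
ranges_apply = df.apply(lambda column: column.max() - column.min())

print(ranges_apply.tolist())  # [4, 30]
```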

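df.apply(func, axis=0)

With axis=0 (the default), func receives each column as a Series; with axis=1, it receives each row.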
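A minimal sketch of the axis argument on a small made-up DataFrame: printing s.name shows which labels the function receives (column names for axis=0, index labels for axis=1), and the sums show that the result has one entry per column or per row accordingly.

```python
import pandas as pd

# Small made-up DataFrame with the same kind of labels as the post
df = pd.DataFrame({"col 1": [1, 2], "col 2": [10, 20]}, index=["a", "b"])

# axis=0 (default): each call receives one column, labelled by its column name
print(df.apply(lambda s: s.name).tolist())          # ['col 1', 'col 2']
col_sums = df.apply(lambda s: s.sum())               # one result per column

# axis=1: each call receives one row, labelled by its index label
print(df.apply(lambda s: s.name, axis=1).tolist())  # ['a', 'b']
row_sums = df.apply(lambda s: s.sum(), axis=1)       # one result per row

print(col_sums.tolist(), row_sums.tolist())          # [3, 30] [11, 22]
```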

data = np.random.randint(1,100,(6,5))

df = pd.DataFrame(data,index = list('abcdef'), columns=['col ' + str(i) for i in range(1,6)])

df

	col 1	col 2	col 3	col 4	col 5
a	31	83	63	98	18
b	24	96	35	30	99
c	80	63	15	90	12
d	81	71	20	25	29
e	20	98	25	88	20
f	31	7	60	39	31

df.apply(np.cumprod)

	col 1	col 2	col 3	col 4	col 5
a	31	83	63	98	18
b	744	7968	2205	2940	1782
c	59520	501984	33075	264600	21384
d	4821120	35640864	661500	6615000	620136
e	96422400	3492804672	16537500	582120000	12402720
f	2989094400	24449632704	992250000	22702680000	384484320

df.apply(np.cumprod,axis=1)

	col 1	col 2	col 3	col 4	col 5
a	31	2573	162099	15885702	285942636
b	24	2304	80640	2419200	239500800
c	80	5040	75600	6804000	81648000
d	81	5751	115020	2875500	83389500
e	20	1960	49000	4312000	86240000
f	31	217	13020	507780	15741180

df.apply(lambda x: x.mean() - x.std())

col 1    16.295390
col 2    36.059128
col 3    15.725891
col 4    27.969980
col 5     2.612740
dtype: float64
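Because x.mean() and x.std() are already per-column reductions, the lambda above can be cross-checked against the vectorized expression df.mean() - df.std(). This sketch copies col 1 and col 2 from the table above; pandas' std uses the sample standard deviation (ddof=1) in both forms, so the values should match the output shown (16.295390 and 36.059128).

```python
import numpy as np
import pandas as pd

# col 1 and col 2 copied from the random table above
df = pd.DataFrame({"col 1": [31, 24, 80, 81, 20, 31],
                   "col 2": [83, 96, 63, 71, 98, 7]},
                  index=list("abcdef"))

# apply: the lambda receives one column at a time
via_apply = df.apply(lambda x: x.mean() - x.std())

# Vectorized: DataFrame.mean() and DataFrame.std() already reduce per column
vectorized = df.mean() - df.std()

print(np.allclose(via_apply, vectorized))  # True
```

For simple reductions like this, the vectorized form is usually shorter and faster; apply is most useful when the per-column logic has no built-in equivalent.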

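Besides a whole DataFrame, you can also apply a function to a single column.

Call it as df[col_name].apply(func). A column is a Series, and Series.apply applies func to each value of the column; unlike DataFrame.apply, it takes no axis argument.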
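A minimal sketch of Series.apply with an ordinary Python function. The helper label and its threshold parameter are hypothetical, chosen only to show that the function sees one value at a time and that extra positional arguments can be passed through args. The values are taken from col 1 of the table above.

```python
import pandas as pd

# A few values from col 1 of the float table above
s = pd.Series([14.053738, 4.833044, 10.398943], index=["a", "d", "f"], name="col 1")

# Hypothetical helper: label each value relative to a threshold
def label(value, threshold):
    return "high" if value >= threshold else "low"

# Series.apply calls label once per value; args supplies the extra positional argument
labels = s.apply(label, args=(10,))
print(labels.tolist())  # ['high', 'low', 'high']
```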

data = np.random.rand(6,5) * 20

df = pd.DataFrame(data,index = list('abcdef'), columns=['col ' + str(i) for i in range(1,6)])

df

	col 1	col 2	col 3	col 4	col 5
a	14.053738	4.185958	8.462147	9.553656	12.509743
b	18.348219	19.926258	11.843050	17.556651	7.280544
c	19.710057	7.822683	10.415486	9.005349	11.622472
d	4.833044	2.655133	16.890674	6.622639	17.183202
e	1.491357	5.968906	7.721816	9.442697	6.192048
f	10.398943	17.934551	17.050292	5.622842	10.727492

df['col 1'].apply(np.trunc)

a    14.0
b    18.0
c    19.0
d     4.0
e     1.0
f    10.0
Name: col 1, dtype: float64

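The Series method map replaces each value of a Series with another value, or applies a function to each value.

Note that this map is a Series method, so call it on a single column (for example df[col_name].map(...)) rather than on the whole DataFrame.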
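A minimal sketch of calling map on one column of a DataFrame and storing the result as a new column. The DataFrame and its column names are made up for illustration.

```python
import pandas as pd

# Made-up DataFrame; the column names are only for illustration
df = pd.DataFrame({"animal": ["dog", "cat", "eagle"], "legs": [4, 4, 2]})

# map is called on one column (a Series), and the result is stored as a new column
df["plural"] = df["animal"].map(lambda a: a + "s")
print(df["plural"].tolist())  # ['dogs', 'cats', 'eagles']
```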

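Series.map(arg, na_action=None)

arg can be a dict (or Series) that maps old values to new ones, or a function applied to each value. With na_action='ignore', missing values (NaN) are passed through unchanged instead of being handed to the mapping.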
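A minimal sketch of two details of map on a made-up Series with one missing value: values that are not keys of the dict become NaN, and na_action='ignore' keeps NaN away from a function (here the built-in len) that would otherwise fail on it.

```python
import numpy as np
import pandas as pd

s = pd.Series(["dog", "cat", np.nan, "rabbit"])

# With a dict, values that are not keys of the dict become NaN
renamed = s.map({"dog": "puppy", "cat": "kitten"})
print(renamed.tolist())  # ['puppy', 'kitten', nan, nan]

# Without na_action, len() would receive the float NaN and raise TypeError
try:
    s.map(len)
    failed_without_ignore = False
except TypeError:
    failed_without_ignore = True

# With na_action='ignore', NaN is skipped and stays NaN in the result
lengths = s.map(len, na_action="ignore")
print(lengths.tolist())  # [3.0, 3.0, nan, 6.0]
```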

s = pd.Series(["dog","cat","eagle","rabbit"])

s

0       dog
1       cat
2     eagle
3    rabbit
dtype: object

s.map({animal: animal+"s" for animal in s})

0       dogs
1       cats
2     eagles
3    rabbits
dtype: object

s = pd.Series(np.random.uniform(10,50,(5,)))

s

0    25.981592
1    20.995231
2    35.596201
3    43.628246
4    24.256641
dtype: float64

s.map(np.trunc)

0    25.0
1    20.0
2    35.0
3    43.0
4    24.0
dtype: float64

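Whereas apply works on whole rows or columns, applymap works on each individual element of a DataFrame.

Depending on the function, apply and applymap can produce the same result (np.trunc, for example, gives the same values either way), but they work differently, so keep the distinction in mind.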
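A minimal sketch where the difference is visible, using a made-up DataFrame of strings and the built-in len. With apply, len receives a whole column and returns the number of rows; applied elementwise, len receives each string and returns its length. The sketch assumes pandas 2.1+ exposes the elementwise method as DataFrame.map and falls back to applymap on older versions.

```python
import pandas as pd

# Made-up DataFrame of strings
df = pd.DataFrame({"a": ["x", "yy", "zzz"], "b": ["pp", "q", "rrrr"]})

# apply: len receives a whole column (Series) -> number of rows per column
per_column = df.apply(len)
print(per_column.tolist())  # [3, 3]

# Elementwise: len receives each string -> length of every cell.
# Use DataFrame.map on pandas >= 2.1, applymap on older versions.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
per_cell = elementwise(len)
print(per_cell["a"].tolist(), per_cell["b"].tolist())  # [1, 2, 3] [2, 1, 4]
```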

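df.applymap(func, na_action=None)

In pandas 2.1 and later, applymap is deprecated and the same operation is available as DataFrame.map.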
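A minimal sketch of na_action on a made-up DataFrame with one NaN. The helper to_text is hypothetical; it formats each number with no decimal places. By default the NaN is passed to the function (producing the string 'nan'), while na_action='ignore' leaves that cell as NaN. As above, this assumes DataFrame.map on pandas 2.1+ and applymap on older versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col 1": [1.7, np.nan], "col 2": [-2.4, 3.2]})

# DataFrame.map on pandas >= 2.1, applymap on older versions (same behaviour)
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap

def to_text(v):
    # Hypothetical helper: format a number with no decimal places
    return f"{v:.0f}"

# Default na_action=None: NaN is passed to the function, giving the string 'nan'
all_cells = elementwise(to_text)

# na_action='ignore': NaN cells are skipped and stay NaN
skip_nan = elementwise(to_text, na_action="ignore")

print(all_cells.loc[1, "col 1"])          # nan  (a string)
print(pd.isna(skip_nan.loc[1, "col 1"]))  # True
```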

data = np.random.rand(6,5) * 20

df = pd.DataFrame(data,index = list('abcdef'), columns=['col ' + str(i) for i in range(1,6)])

df

	col 1	col 2	col 3	col 4	col 5
a	5.357879	9.412591	13.617003	12.760888	0.950344
b	1.235938	6.882730	4.173534	18.967401	5.843447
c	0.450929	7.948630	4.506180	16.463640	3.449097
d	13.320694	18.387282	3.808073	1.867454	13.546942
e	1.406042	6.796554	9.490889	7.089272	5.998784
f	19.606024	0.927500	12.849440	16.924449	1.271375

df.applymap(np.trunc)

	col 1	col 2	col 3	col 4	col 5
a	5.0	9.0	13.0	12.0	0.0
b	1.0	6.0	4.0	18.0	5.0
c	0.0	7.0	4.0	16.0	3.0
d	13.0	18.0	3.0	1.0	13.0
e	1.0	6.0	9.0	7.0	5.0
f	19.0	0.0	12.0	16.0	1.0
