Let say we have the following dataframe

```
In [149]: df
Out[149]:
revenue profit
Level1 Level2 Company
g1 sg1 1st 10 5
2nd 11 3
sg2 3rd 14 7
g2 sg3 3rd 14 7
1st 10 5
g3 NaN 2nd 11 3
3rd 14 7
```

In case we want to calculate weighted average it is quite straiforward with apply function:
```
In [153]: def w_avg(group):
.....: d = group['profit']
.....: w = group['revenue']
.....: return (d * w).sum() / w.sum()
.....:
In [154]: df.groupby(level='Level2').apply(w_avg)
Out[154]:
Level2
sg1 3
sg2 7
sg3 6
dtype: int64
```

ok simple, but what if we want to make multi aggregation - aggregate each column with each own funtion. The way to do this in pandas is using agg function like this:
```
In [155]: df.groupby(level='Level2').agg({'profit': np.mean, 'revenue': np.sum})
Out[155]:
profit revenue
Level2
sg1 4 21
sg2 7 14
sg3 6 24
```

What if we want to pass our weighted average as aggregation function to one of the columns? Here is a way to do this:
```
In [156]: import functools
In [157]: def wavg(g, df, weight_series):
.....: w = df.ix[g.index][weight_series]
.....: return (g * w).sum() / w.sum()
.....:
In [160]: fnc = functools.partial(wavg, df=df, weight_series='revenue')
In [161]: df.groupby(level='Level2').agg({'revenue': np.sum, 'profit': fnc})
Out[161]:
profit revenue
Level2
sg1 3 21
sg2 7 14
sg3 6 24
```

Hi.. can you explain a bit more your last screen? Why you can not use directly 'profit': wavg? What functools is used for?

ReplyDeleteThis is because function you pass to agg operates over a single series. We obtain such a function using partial application over wavg. So after [160] we have fnc as a function of a single argument and can pass it to agg. You can avoid using functools if you hardcode your dataframe and a column to weight on into your version of wavg function. The code above is a flexible version :)

DeleteI tried this on a a dataframe with ~ 120000 rows and ~ 10000 groups. And I found that this timed to about 2 mins per loop as compared with a plain `np.mean` restricting to a single `series` object. I commented on [this](https://github.com/pandas-dev/pandas/issues/10030) referencing your post in search of a faster answer. Thanks.

ReplyDelete