Multicolumn aggregation in pandas using function over multiple columns (weighted average example)

Here is how to perform multicolumn aggregation over a dataframe with user defined functions that depends on data in other dataframe columns. Code below is an example of weighted average implementation.


Covert datetime to unixtime and back

Very often I need to convert datetime into unix timestamp to pass datetime values into javascript json response. One way to do this using standard library is:


Running Apache Spark step from Python on AWS EMR

Here is how to send Apache Spark step with from Python script with Boto3 on Amazon Elastic Map Reduce cluster. I've recently needed to do this, and I spent some time figuring it out.