Published on March 11, 2018 by

PyData New York City 2017

Apache Spark has become a popular and successful way for Python programmers to parallelize and scale up data processing. However, it’s not well integrated with popular Python tools such as Pandas, and using Pandas with PySpark often results in poor performance. In this talk, we will demonstrate how we improve PySpark performance with Apache Arrow.
