Lazy Fat Pandas and SCIRPy

Pandas is widely used in data science and analytics for its simplicity, but it struggles with datasets that exceed available memory. Existing scalable frameworks such as Dask, Modin, and Pandas on Spark require users to rewrite their code, which makes adoption difficult. To address this, we present Lazy Fat Pandas (LaFP), an optimization framework that enables seamless scalability while preserving the familiar Pandas API. LaFP combines static program analysis with lazy evaluation to reduce memory usage and execution time. With minimal code modifications, users can run their programs on any of several backend engines (Pandas, Dask, Modin, and Pandas on Spark). Performance evaluations demonstrate that LaFP not only outperforms Pandas but also delivers significant improvements over direct use of the scalable frameworks.

LaFP comprises two modules: SCIRPy, a rewriter that applies static optimizations to restructure Pandas programs, and a runtime API based on lazy evaluation. At run time, LaFP dynamically builds a task graph that represents the dataframe operations, optimizes that graph, and then executes it on the chosen backend.
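
The build, optimize, execute cycle described above can be illustrated with a small toy sketch. This is not LaFP's API or its optimizer: the `LazyFrame` and `Node` classes, the `optimize` and `execute` functions, and the filter-fusion rule are all hypothetical names and choices made for illustration, and plain Python lists of dicts stand in for dataframes so the example has no dependencies.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class Node:
    """One operation in the task graph; `parent` is its input node."""
    op: str                          # "source", "filter", or "select"
    arg: Any                         # rows, a predicate, or column names
    parent: Optional["Node"] = None


class LazyFrame:
    """Hypothetical toy class: records operations instead of running them."""

    def __init__(self, node: Node):
        self.node = node

    @classmethod
    def from_rows(cls, rows):
        return cls(Node("source", list(rows)))

    def filter(self, predicate: Callable[[dict], bool]) -> "LazyFrame":
        # Nothing is evaluated here; we only extend the graph.
        return LazyFrame(Node("filter", predicate, self.node))

    def select(self, columns) -> "LazyFrame":
        return LazyFrame(Node("select", tuple(columns), self.node))

    def compute(self):
        # Evaluation is deferred until the result is actually requested.
        return execute(optimize(self.node))


def optimize(node: Node) -> Node:
    """Illustrative rule: fuse adjacent filters so rows are scanned once."""
    if node.parent is None:
        return node
    parent = optimize(node.parent)
    if node.op == "filter" and parent.op == "filter":
        outer, inner = node.arg, parent.arg
        return Node("filter", lambda row: inner(row) and outer(row), parent.parent)
    return Node(node.op, node.arg, parent)


def execute(node: Node):
    """Stand-in for a backend: walks the graph from the source upward."""
    if node.op == "source":
        return node.arg
    rows = execute(node.parent)
    if node.op == "filter":
        return [row for row in rows if node.arg(row)]
    if node.op == "select":
        return [{col: row[col] for col in node.arg} for row in rows]
    raise ValueError(f"unknown op: {node.op}")


rows = [
    {"city": "Pune", "temp": 31, "humidity": 40},
    {"city": "Delhi", "temp": 38, "humidity": 25},
    {"city": "Mumbai", "temp": 33, "humidity": 70},
]
lazy = (
    LazyFrame.from_rows(rows)
    .filter(lambda r: r["temp"] > 32)
    .filter(lambda r: r["humidity"] < 50)
    .select(["city"])
)
result = lazy.compute()  # the two filters run as a single fused pass
```

Because the whole chain is known before anything runs, the optimizer can rewrite it (here, two filter nodes become one) without the user changing how the program is written. A real system would apply richer rewrites and hand the optimized graph to an actual backend engine.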
