How to delete multiple pandas (python) dataframes from memory to save RAM?

Tags: Python, Memory Management, Pandas, Dataframe, RAM

Python Problem Overview


I have a lot of dataframes created as part of preprocessing. Since I have only 6 GB of RAM, I want to delete all the unnecessary dataframes from RAM to avoid running out of memory when running GridSearchCV in scikit-learn.

  1. Is there a function that lists only the dataframes currently loaded in memory?

I tried dir(), but it returns a lot of objects other than dataframes.

  2. I created a list of dataframes to delete

    del_df=[Gender_dummies, capsule_trans, col, concat_df_list, coup_CAPSULE_dummies]

and ran

    for i in del_df: del (i)

But this is not deleting the dataframes. Deleting the dataframes individually, as below, does delete them from memory.

    del Gender_dummies
    del col

Python Solutions


Solution 1 - Python

The del statement does not delete an instance; it merely deletes a name.

When you do del i, you are deleting just the name i. The instance is still bound to some other name, so it won't be garbage-collected.

If you want to release memory, your dataframes have to be garbage-collected, i.e. you have to delete all references to them.
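A minimal sketch of why the loop in the question has no effect, using a weak reference to watch one object. Frame is a hypothetical stand-in class, used only so the snippet runs without pandas; names and references work the same way for any Python object, and the immediate release at the end assumes CPython's reference counting.

```python
import gc
import weakref


class Frame:
    """Hypothetical stand-in for a large DataFrame."""


a = Frame()
b = Frame()
probe = weakref.ref(a)   # observes the object without keeping it alive

del_df = [a, b]
for i in del_df:
    del i                # unbinds only the loop variable i

# The name a and the list slot del_df[0] still refer to the object.
alive_after_loop = probe() is not None

del a, b                 # remove the original names
# The list still holds a reference.
alive_after_names = probe() is not None

del del_df               # drop the last reference
gc.collect()             # not required in CPython for this case; harmless elsewhere
freed = probe() is None
```

Here alive_after_loop and alive_after_names are both True, and freed becomes True only once the list is gone as well.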

If you created your dataframes dynamically in a list, then removing that list will trigger garbage collection.

>>> lst = [pd.DataFrame(), pd.DataFrame(), pd.DataFrame()]
>>> del lst     # memory is released

If you created some variables, you have to delete them all.

>>> a, b, c = pd.DataFrame(), pd.DataFrame(), pd.DataFrame()
>>> lst = [a, b, c]
>>> del a, b, c # dfs are still in the list
>>> del lst     # memory is released now
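For the first part of the question (listing only the dataframes) and for deleting several names at once, one possible approach is to filter a namespace dictionary by type and remove the matching names. The helpers names_bound_to and delete_names are hypothetical names introduced for this sketch, not pandas functions, and Frame is a stand-in for pd.DataFrame so the snippet has no dependencies.

```python
def names_bound_to(namespace, cls):
    """Return, sorted, the names in `namespace` whose values are instances of `cls`."""
    return sorted(name for name, value in namespace.items() if isinstance(value, cls))


def delete_names(namespace, names):
    """Remove each name from `namespace`, ignoring names that are already gone."""
    for name in names:
        namespace.pop(name, None)


class Frame:
    """Hypothetical stand-in for pd.DataFrame."""


# A plain dict plays the role of globals() in this demo.
namespace = {"train_df": Frame(), "dummies_df": Frame(), "n_rows": 10}

found = names_bound_to(namespace, Frame)   # ['dummies_df', 'train_df']
delete_names(namespace, found)
remaining = sorted(namespace)              # ['n_rows']
```

In a script, the equivalent calls would be names_bound_to(globals(), pd.DataFrame) followed by delete_names(globals(), ...). As explained above, this only removes names: a list, dict or other variable that still holds the dataframes keeps them alive.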

Solution 2 - Python

In Python, automatic garbage collection deallocates objects that are no longer referenced (a pandas DataFrame is just another object as far as Python is concerned). The garbage collection strategy can be tuned, but doing so requires significant learning.

You can manually trigger the garbage collection using

import gc
gc.collect()

But frequent calls to the garbage collector are discouraged, as collection is a costly operation and may affect performance.
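As a rough illustration of when a manual gc.collect() makes a difference: in CPython, objects that are part of a reference cycle are not freed when their names are deleted, and they stay around until the cycle collector runs. Node is a hypothetical class made up for this sketch.

```python
import gc
import weakref


class Node:
    """Hypothetical object that can point at another Node."""

    def __init__(self):
        self.partner = None


gc.collect()                       # start from a clean state

x, y = Node(), Node()
x.partner, y.partner = y, x        # x and y now refer to each other
probe = weakref.ref(x)

del x, y                           # no names left, but the cycle keeps both objects alive
alive_before = probe() is not None

found = gc.collect()               # returns the number of unreachable objects it found
alive_after = probe() is not None
```

In this sketch alive_before is True and alive_after is False. Objects that are not part of a cycle are released as soon as their last reference disappears, without any call to gc.collect().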

Solution 3 - Python

This deletes the dataframes and releases the RAM they occupy:

import gc
import pandas as pd

del [[df_1,df_2]]     # same effect as: del df_1, df_2
gc.collect()
df_1=pd.DataFrame()
df_2=pd.DataFrame()

In the statements above, del first removes the names df_1 and df_2, so the dataframes are no longer available to Python through those names. gc.collect() then collects whatever has become unreachable, and finally both names are explicitly rebound to empty dataframes. The memory is released only if nothing else still refers to the original dataframes.
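The nested brackets in the del statement are just a target list, so the statement unbinds both names in one go. A small sketch to check this, with Frame as a hypothetical stand-in for pd.DataFrame:

```python
class Frame:
    """Hypothetical stand-in for pd.DataFrame."""


df_1, df_2 = Frame(), Frame()

del [[df_1, df_2]]        # same effect as: del df_1, df_2

try:
    df_1                  # the name no longer exists
    unbound = False
except NameError:
    unbound = True

# Rebinding gives later code a valid (empty) object under the old names.
df_1, df_2 = Frame(), Frame()
```

Rebinding is optional; it only matters if later code still expects df_1 and df_2 to exist.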

More on how the garbage collector works is explained at https://stackify.com/python-garbage-collection/

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | GeorgeOfTheRF | View Question on Stackoverflow
Solution 1 - Python | pacholik | View Answer on Stackoverflow
Solution 2 - Python | shanmuga | View Answer on Stackoverflow
Solution 3 - Python | hardi | View Answer on Stackoverflow