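Use [`str.replace`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.replace.html). The pattern is a regular expression, so pass `regex=True` explicitly; pandas 2.0 and later treat the pattern as a literal string by default:

    df.columns = df.columns.str.replace(r"[()]", "_", regex=True)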

Sample:

    import pandas as pd

    df = pd.DataFrame({'(A)':[1,2,3],
                       '(B)':[4,5,6],
                       'C)':[7,8,9]})

    print(df)
       (A)  (B)  C)
    0    1    4   7
    1    2    5   8
    2    3    6   9
    
    df.columns = df.columns.str.replace(r"[()]", "_", regex=True)
    print(df)
       _A_  _B_  C_
    0    1    4   7
    1    2    5   8
    2    3    6   9

Older pandas versions don't work with the accepted answer above. Something like this is needed (using `re.sub`, because the built-in `str.replace` matches `[()]` as literal text rather than as a pattern):

    import re
    df.columns = [re.sub(r"[()]", "_", c) for c in df.columns]
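Python's built-in `str.replace` and `re.sub` behave differently for a pattern like `[()]`. Here is a minimal sketch of the difference, using a hypothetical list of names in place of `list(df.columns)`:

```python
import re

# Hypothetical column names standing in for list(df.columns)
columns = ["(A)", "(B)", "C)"]

# str.replace searches for the literal text "[()]", which never occurs here,
# so the names come back unchanged.
literal = [c.replace("[()]", "_") for c in columns]

# re.sub treats "[()]" as a character class and replaces each "(" or ")".
pattern = [re.sub(r"[()]", "_", c) for c in columns]

print(literal)  # ['(A)', '(B)', 'C)']
print(pattern)  # ['_A_', '_B_', 'C_']
```

Assigning the second list back to `df.columns` should give the same result as the `.str.replace` call in the first answer.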

The square brackets define a character class: a set of characters, any one of which may match at that position. For example:

    r"[Nn]ational"

matches both "National" and "national", i.e. the first character may be N or n.
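The character-class behaviour can be checked with the standard `re` module. This is only a sketch; the sample sentence is made up for illustration:

```python
import re

# Made-up sample text containing both spellings
text = "The National Gallery is a national museum."

# [Nn] matches exactly one character: either "N" or "n".
matches = re.findall(r"[Nn]ational", text)
print(matches)  # ['National', 'national']

# The same idea applied to the column-name pattern: "(" or ")" becomes "_".
cleaned = re.sub(r"[()]", "_", "(A)")
print(cleaned)  # _A_
```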

Is there any difference between using `typing.Any` as opposed to `object` in typing? For example:

    def get_item(L: list, i: int) -> typing.Any:
        return L[i]

Compared to:

    def get_item(L: list, i: int) -> object:
        return L[i]

typing.Any vs object?

Using IntelliJ IDEA 15, I get constant and annoying documentation popups whenever my mouse is anywhere in the code window for a decompiled class (from a 3rd party jar). It pops up docs for whatever variable, method, class or other element happens to be near my mouse. If my mouse is not near any line of code, it pops up docs for the current class file, so basically I can't browse code unless I move my mouse to another window.

It only happens with decompiled classes, not my normal code. How do I stop these?

How to stop annoying documentation popups in IntelliJ IDEA

I have data frames with column names (coming from .csv files) containing `(` and `)` and I'd like to replace them with `_`.

How can I do that in place for all columns?

Pandas replace a character in all column names

I have 2 `DataFrame`s:

[![Source data][1]][1]

I need union like this:

[![enter image description here][2]][2]

The `unionAll` function doesn't work because the number and the names of the columns differ.

How can I do this?

  [1]: http://i.stack.imgur.com/L4qs0.png
  [2]: http://i.stack.imgur.com/mdICY.png

How to perform union on two DataFrames with different amounts of columns in spark?

I've trained 3 models and am now running code that loads each of the 3 checkpoints in sequence and runs predictions using them. I'm using the GPU.

When the first model is loaded, it pre-allocates the entire GPU memory (which I want for working through the first batch of data). But it doesn't release the memory when it's finished. When the second model is loaded, using both `tf.reset_default_graph()` and `with tf.Graph().as_default()`, the GPU memory is still fully consumed by the first model, and the second model is then starved of memory.

Is there a way to resolve this, other than using Python subprocesses or multiprocessing to work around the problem (the only solution I've found via Google searches)?

Clearing Tensorflow GPU memory after model execution

I recently discovered the pandas ["assign" method][1], which I find very elegant.
My issue is that the name of the new column is passed as a keyword argument, so it cannot contain spaces or dashes.

    df = DataFrame({'A': range(1, 11), 'B': np.random.randn(10)})
    df.assign(ln_A = lambda x: np.log(x.A))
            A         B      ln_A
    0   1  0.426905  0.000000
    1   2 -0.780949  0.693147
    2   3 -0.418711  1.098612
    3   4 -0.269708  1.386294
    4   5 -0.274002  1.609438
    5   6 -0.500792  1.791759
    6   7  1.649697  1.945910
    7   8 -1.495604  2.079442
    8   9  0.549296  2.197225
    9  10 -0.758542  2.302585

But what if I want to name the new column "ln(A)", for example?
E.g.

    df.assign(ln(A) = lambda x: np.log(x.A))
    df.assign("ln(A)" = lambda x: np.log(x.A))


    File "<ipython-input-7-de0da86dce68>", line 1
    df.assign(ln(A) = lambda x: np.log(x.A))
    SyntaxError: keyword can't be an expression

I know I could rename the column right after the .assign call, but I want to understand more about this method and its syntax.

  [1]: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.assign.html

pandas assign with new column name as string

I am working on a project for one of my lectures and I need to download the package psycopg2 in order to work with the postgresql database in use. Unfortunately, when I try to pip install psycopg2 the following error pops up:

    ld: library not found for -lssl
    clang: error: linker command failed with exit code 1 (use -v to see invocation)
    error: command '/usr/bin/clang' failed with exit status 1
    ld: library not found for -lssl
    clang: error: linker command failed with exit code 1 (use -v to see invocation)
    error: command '/usr/bin/clang' failed with exit status 1

Does anyone know why this is happening? Thanks in advance!

Can't install psycopg2 package through pip install on MacOS

I want to replace all strings that contain a specific substring. So for example if I have this dataframe:

    import pandas as pd
    df = pd.DataFrame({'name': ['Bob', 'Jane', 'Alice'],
                       'sport': ['tennis', 'football', 'basketball']})

I could replace football with the string 'ball sport' like this:

    df.replace({'sport': {'football': 'ball sport'}})

What I want though is to replace everything that contains `ball` (in this case `football` and `basketball`) with 'ball sport'. Something like this:

    df.replace({'sport': {'[strings that contain ball]': 'ball sport'}})
   

Replace whole string if it contains substring in pandas

I am querying a SQL database and I want to use pandas to process the data. However, I am not sure how to move the data. Below is my input and output.  

    import pyodbc
    import pandas
    from pandas import DataFrame

    cnxn = pyodbc.connect(r'DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=C:\users\bartogre\desktop\CorpRentalPivot1.accdb;UID="";PWD="";')
    crsr = cnxn.cursor()
    for table_name in crsr.tables(tableType='TABLE'):
        print(table_name)
    cursor = cnxn.cursor()
    sql = "Select sum(CYTM), sum(PYTM), BRAND From data Group By BRAND"
    cursor.execute(sql)
    for data in cursor.fetchall():
        print(data)

_____________

    ('C:\\users\\bartogre\\desktop\\CorpRentalPivot1.accdb', None, 'Data', 'TABLE', None)
    ('C:\\users\\bartogre\\desktop\\CorpRentalPivot1.accdb', None, 'SFDB', 'TABLE', None)
    (Decimal('78071898.71'), Decimal('82192672.29'), 'A')
    (Decimal('12120663.79'), Decimal('13278814.52'), 'B')



Read data from pyodbc to pandas

I would like to do supervised learning.

So far I know how to do supervised learning using all features.

However, I would also like to run an experiment with the K best features.

I read the documentation and found that scikit-learn has a `SelectKBest` class.

Unfortunately, I am not sure how to create a new dataframe after finding those best features.

Let's assume I would like to run an experiment with the 5 best features:

    from sklearn.feature_selection import SelectKBest, f_classif
    select_k_best_classifier = SelectKBest(score_func=f_classif, k=5).fit_transform(features_dataframe, targeted_class)

Now if I add the next line:

    dataframe = pd.DataFrame(select_k_best_classifier)

I get a new dataframe without feature names (only column indexes from 0 to 4).

I should replace it with:

    dataframe = pd.DataFrame(fit_transformed_features, columns=features_names)

My question is: how do I create the `features_names` list?

I know that I should use:

    select_k_best_classifier.get_support()

which returns an array of boolean values.

A `True` value in the array marks the position of a selected column.

How should I use this boolean array with the list of all feature names, which I can get via:

    feature_names = list(features_dataframe.columns.values)

The easiest way for getting feature names after running SelectKBest in Scikit Learn

I have 2 Data Frames, one named USERS and another named EXCLUDE. Both of them have a field named "email".

Basically, I want to remove every row in USERS that has an email contained in EXCLUDE.

How can I do it?


Content Type	Original Author	Original Content on Stackoverflow
Question	Cedric H.	View Question on Stackoverflow
Solution 1 - Python	jezrael	View Answer on Stackoverflow
Solution 2 - Python	JamesR	View Answer on Stackoverflow
Solution 3 - Python	agbalutemi	View Answer on Stackoverflow
