You can use `replace` and pass the strings to find and replace as dictionary keys and values:

    df.replace({'\n': '<br>'}, regex=True)

For example:

    >>> df = pd.DataFrame({'a': ['1\n', '2\n', '3'], 'b': ['4\n', '5', '6\n']})
    >>> df
       a    b
    0  1\n  4\n
    1  2\n  5
    2  3    6\n

    >>> df.replace({'\n': '<br>'}, regex=True)
       a      b
    0  1<br>  4<br>
    1  2<br>  5
    2  3      6<br>
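To see why `regex=True` matters, here is a minimal sketch using the same sample frame (the names `unchanged` and `replaced` are introduced here for illustration only). Without `regex=True`, `replace` only substitutes cells whose entire value equals the search string, so a newline embedded in a longer string is left alone.

```python
import pandas as pd

df = pd.DataFrame({"a": ["1\n", "2\n", "3"], "b": ["4\n", "5", "6\n"]})

# Whole-value matching: no cell is exactly "\n", so nothing is replaced.
unchanged = df.replace("\n", "<br>")

# Regex matching: the pattern is searched for inside every string cell.
replaced = df.replace({"\n": "<br>"}, regex=True)

print(unchanged.equals(df))
print(replaced)
```

Because the dictionary keys are treated as regular expressions here, a search string containing characters such as `.` or `(` would probably need to be passed through `re.escape` first.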

It seems pandas has changed its API to avoid ambiguity when handling regex. Now you should use:

    df.replace({'\n': '<br>'}, regex=True)

For example:

    >>> df = pd.DataFrame({'a': ['1\n', '2\n', '3'], 'b': ['4\n', '5', '6\n']})
    >>> df
       a    b
    0  1\n  4\n
    1  2\n  5
    2  3    6\n

    >>> df.replace({'\n': '<br>'}, regex=True)
       a      b
    0  1<br>  4<br>
    1  2<br>  5
    2  3      6<br>

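You can iterate over all columns and use the method `str.replace`:

    for col in df.columns:
        df[col] = df[col].str.replace('\n', '<br>')

In pandas versions before 2.0, `str.replace` treats the pattern as a regex by default. From 2.0 onward the default is `regex=False`, so the pattern is matched literally unless you pass `regex=True`.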
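If the frame also holds numeric columns, calling `.str` on them raises an error. A possible sketch, assuming a hypothetical frame with one text column `a` and one integer column `n`, restricts the loop to string columns and passes `regex=False` explicitly so the result does not depend on the pandas version:

```python
import pandas as pd
from pandas.api.types import is_string_dtype

# Hypothetical sample data: one text column and one numeric column.
df = pd.DataFrame({"a": ["1\n", "2\n", "3"], "n": [1, 2, 3]})

for col in df.columns:
    # Skip columns that do not hold strings; .str would fail on them.
    if is_string_dtype(df[col]):
        # regex=False makes this a plain substring replacement.
        df[col] = df[col].str.replace("\n", "<br>", regex=False)

print(df)
```

This assumes the text columns contain only strings; columns mixing strings with missing values may need separate handling.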

As written, `''.join` removes all whitespace (newlines, spaces and tabs) from each value. Change the string before `.join` to specify a replacement, for example `' '.join` to collapse each run of whitespace into a single space.

    df['columnname'] = [''.join(c.split()) for c in df['columnname'].astype(str)]
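The difference between the two join strings can be illustrated with a short sketch. The sample values and the names `stripped` and `collapsed` are made up for this example.

```python
import pandas as pd

# Hypothetical sample values containing newlines, tabs and repeated spaces.
df = pd.DataFrame({"columnname": ["first\nline  two", " padded\ttext "]})

# str.split() with no arguments splits on any run of whitespace.
stripped = ["".join(c.split()) for c in df["columnname"].astype(str)]
collapsed = [" ".join(c.split()) for c in df["columnname"].astype(str)]

print(stripped)   # ['firstlinetwo', 'paddedtext']
print(collapsed)  # ['first line two', 'padded text']
```

Note that either variant touches ordinary spaces as well as newlines, so it is not a drop-in substitute for replacing only `\n`.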

With **only** the parent `div` and the child `img` elements as demonstrated below, how do I vertically *and* horizontally center the `img` element while **explicitly not** defining the `height` of the parent `div`?

    <div class="do_not_define_height">
     <img alt="No, he'll be an engineer." src="theknack.png" />
    </div>

I'm not too familiar with flexbox, so I'm okay if flexbox itself fills up the full height, but not any other unrelated properties.

CSS flexbox vertically/horizontally center image WITHOUT explicitly defining parent height

How can I trim the leading or trailing characters from a string in Java?

For example, the slash character "/" - I'm not interested in spaces, and am looking to trim either leading or trailing characters at different times.



Trim leading or trailing characters from a string?

I have a pandas dataframe with about 20 columns.

It is possible to replace all occurrences of a string (here a newline) by manually writing all column names:

    df['columnname1'] = df['columnname1'].str.replace("\n","<br>")
    df['columnname2'] = df['columnname2'].str.replace("\n","<br>")
    df['columnname3'] = df['columnname3'].str.replace("\n","<br>")
    ...
    df['columnname20'] = df['columnname20'].str.replace("\n","<br>")

This unfortunately does not work:

    df = df.replace("\n","<br>")



Is there any other, more elegant solution?

Replace all occurrences of a string in a pandas dataframe (Python)

I have a large dataframe (several million rows).

I want to be able to do a groupby operation on it, but just grouping by arbitrary consecutive (preferably equal-sized) subsets of rows, rather than using any particular property of the individual rows to decide which group they go to.

The use case: I want to apply a function to each row via a parallel map in IPython. It doesn't matter which rows go to which back-end engine, as the function calculates a result based on one row at a time. (Conceptually at least; in reality it's vectorized.)

I've come up with something like this:

    # Generate a number from 0-9 for each row, indicating which tenth of the DF it belongs to
    max_idx = dataframe.index.max()
    tenths = ((10 * dataframe.index) / (1 + max_idx)).astype(np.uint32)
    
    # Use this value to perform a groupby, yielding 10 consecutive chunks
    groups = [g[1] for g in dataframe.groupby(tenths)]
    
    # Process chunks in parallel
    results = dview.map_sync(my_function, groups)

But this seems very long-winded, and doesn't guarantee equal-sized chunks, especially if the index is sparse or non-integer.

Any suggestions for a better way?

Thanks!

How to iterate over consecutive chunks of Pandas dataframe efficiently
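For the chunking question above, one possible approach is to slice by position with `iloc` instead of deriving group labels from the index. This is only a sketch: the helper `iter_chunks` and the sample frame are hypothetical, and the parallel `map_sync` step from the question is left out.

```python
import pandas as pd


def iter_chunks(frame, n_chunks):
    """Yield n_chunks consecutive slices whose sizes differ by at most one row."""
    n = len(frame)
    for k in range(n_chunks):
        start = k * n // n_chunks
        stop = (k + 1) * n // n_chunks
        # iloc slices by position, so the index may be sparse or non-integer.
        yield frame.iloc[start:stop]


# Hypothetical frame with a non-integer index.
dataframe = pd.DataFrame({"x": range(23)}, index=[f"r{i}" for i in range(23)])

groups = list(iter_chunks(dataframe, 10))
sizes = [len(g) for g in groups]
print(sizes)
```

Because the boundaries depend only on row positions, the chunk sizes stay within one row of each other regardless of what the index looks like.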

I am having trouble using `json.loads` to convert to a dict object and I can't figure out what I'm doing wrong. The exact error I get running this is

    ValueError: Expecting property name: line 1 column 2 (char 1)

Here is my code:

    from kafka.client import KafkaClient
    from kafka.consumer import SimpleConsumer
    from kafka.producer import SimpleProducer, KeyedProducer
    import pymongo
    from pymongo import MongoClient
    import json

    c = MongoClient("54.210.157.57")
    db = c.test_database3
    collection = db.tweet_col

    kafka = KafkaClient("54.210.157.57:9092")

    consumer = SimpleConsumer(kafka,"myconsumer","test")
    for tweet in consumer:
        print tweet.message.value
        jsonTweet=json.loads({u'favorited': False, u'contributors': None})
        collection.insert(jsonTweet)

I'm pretty sure that the error is occurring at the second-to-last line

    jsonTweet=json.loads({u'favorited': False, u'contributors': None})

but I do not know what to do to fix it. Any advice would be appreciated.


JSON ValueError: Expecting property name: line 1 column 2 (char 1)

I have a list 

    [[12, 6], [12, 0], [0, 6], [12, 0], [12, 0], [6, 0], [12, 6], [0, 6], [12, 0], [0, 6], [0, 6], [12, 0], [0, 6], [6, 0], [6, 0], [12, 0], [6, 0], [12, 0], [12, 0], [0, 6], [0, 6], [12, 6], [6, 0], [6, 0], [12, 6], [12, 0], [12, 0], [0, 6], [6, 0], [12, 6], [12, 6], [12, 6], [12, 0], [12, 0], [12, 0], [12, 0], [12, 6], [12, 0], [12, 0], [12, 6], [0, 6], [0, 6], [6, 0], [12, 6], [12, 6], [12, 6], [12, 6], [12, 6], [12, 0], [0, 6], [6, 0], [12, 0], [0, 6], [12, 6], [12, 6], [0, 6], [12, 0], [6, 0], [6, 0], [12, 6], [12, 0], [0, 6], [12, 0], [12, 0], [12, 0], [6, 0], [12, 6], [12, 6], [12, 6], [12, 6], [0, 6], [12, 0], [12, 6], [0, 6], [0, 6], [12, 0], [0, 6], [12, 6], [6, 0], [12, 6], [12, 6], [12, 0], [12, 0], [12, 6], [0, 6], [6, 0], [12, 0], [6, 0], [12, 0], [12, 0], [12, 6], [12, 0], [6, 0], [12, 6], [6, 0], [12, 0], [6, 0], [12, 0], [6, 0], [6, 0]]

I want to count the frequency of each element in this list. 
Something like 

    freq[[12,6]] = 40
In R this can be obtained with the `table` function. Is there anything similar in python3?



python equivalent of R table
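For the frequency-count question above, a minimal sketch with `collections.Counter` from the standard library. The short `pairs` list is a made-up sample, not the full list from the question, and the inner lists are converted to tuples because lists cannot be used as dictionary keys.

```python
from collections import Counter

# Hypothetical short sample in the same shape as the list in the question.
pairs = [[12, 6], [12, 0], [0, 6], [12, 0], [6, 0]]

# Lists are unhashable, so each pair becomes a tuple before counting.
freq = Counter(tuple(p) for p in pairs)

print(freq[(12, 0)])  # 2
print(freq.most_common(1))
```

Lookups then use tuples, for example `freq[(12, 6)]`, rather than `freq[[12, 6]]`.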

I have the following bs4 object listing:


    >>> listing
    <div class="listingHeader">
    <h2>
    ....


    >>> type(listing)
    <class 'bs4.element.Tag'>

I want to extract the raw HTML as a string. I've tried:

    >>> a = listing.contents
    >>> type(a)
    <type 'list'>

So this does not work. How can I do this?

How to get HTML from a beautiful soup object

I have access to NumPy and SciPy and want to create a simple FFT of a data set. I have two lists, one that is `y` values and the other is timestamps for those `y` values. 

What is the simplest way to feed these lists into a SciPy or NumPy method and plot the resulting FFT?

I have looked up examples, but they all rely on creating a set of fake data with a certain number of data points, a known frequency, and so on, and don't really show how to do it with just a set of data and the corresponding timestamps.

I have tried the following example:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.fftpack import fft

    # Number of sample points
    N = 600

    # Sample spacing
    T = 1.0 / 800.0
    x = np.linspace(0.0, N*T, N)
    y = np.sin(50.0 * 2.0*np.pi*x) + 0.5*np.sin(80.0 * 2.0*np.pi*x)
    yf = fft(y)
    # Integer division: N/2 is a float in Python 3 and cannot be a count or slice bound
    xf = np.linspace(0.0, 1.0/(2.0*T), N//2)
    plt.plot(xf, 2.0/N * np.abs(yf[0:N//2]))
    plt.grid()
    plt.show()

But when I change the argument of `fft` to my data set and plot it, I get extremely odd results, and it appears the scaling for the frequency may be off. I am unsure.

Here is a pastebin of the data I am attempting to FFT

http://pastebin.com/0WhjjMkb
http://pastebin.com/ksM4FvZS

When I use `fft()` on the whole thing it just has a huge spike at zero and nothing else.

Here is my code:

    ## Perform FFT with SciPy
    signalFFT = fft(yInterp)

    ## Get power spectral density
    signalPSD = np.abs(signalFFT) ** 2

    ## Get frequencies corresponding to signal PSD
    fftFreq = fftfreq(len(signalPSD), spacing)

    ## Get positive half of frequencies
    i = fftFreq > 0

    ## Plot the PSD in dB against the positive frequencies
    plt.figure(figsize=(8, 4))
    plt.plot(fftFreq[i], 10*np.log10(signalPSD[i]))
    #plt.xlim(0, 100)
    plt.xlabel('Frequency [Hz]')
    plt.ylabel('PSD [dB]')

Spacing is just equal to `xInterp[1]-xInterp[0]`.


Plotting a fast Fourier transform in Python

Have output from `sed`:

    http://sitename.com/galleries/83450
    72-profile

Those two strings should be merged into one and separated with space like:

    http://sitename.com/galleries/83450 72-profile

Two strings are piped to `tr` in order to replace newline with space:

    tr '\n' ' '

And it's not working, the result is the same as input.

Indicating space with ASCII code `'\032'` results in replacing `\n` with non-printable characters.

What's wrong? I'm using Git Bash on Windows.

Using tr to replace newline with space

Suppose I have the string 'abbc' and I want to replace:

- ab -> bc
- bc -> ab

If I try two replaces the result is not what I want:

    echo 'abbc' | sed 's/ab/bc/g;s/bc/ab/g'
    abab

So what sed command can I use to replace like below?

    echo abbc | sed SED_COMMAND
    bcab


**EDIT**:
Actually the text could have more than 2 patterns and I don't know how many replaces I will need. Since there was an answer saying that `sed` is a stream editor and its replaces are greedy, I think that I will need to use some scripting language for that.


How to swap text based on patterns at once with sed?

Let's say that I have the following code:

    String word1 = "bar";
    String word2 = "foo";
    String story = "Once upon a time, there was a foo and a bar.";
    story = story.replace("foo", word1);
    story = story.replace("bar", word2);

After this code runs, the value of `story` will be `"Once upon a time, there was a foo and a foo."`

A similar issue occurs if I replace them in the opposite order:

    String word1 = "bar";
    String word2 = "foo";
    String story = "Once upon a time, there was a foo and a bar.";
    story = story.replace("bar", word2);
    story = story.replace("foo", word1);

The value of `story` will be `"Once upon a time, there was a bar and a bar."`

My goal is to turn `story` into `"Once upon a time, there was a bar and a foo."` How could I accomplish that?

How can I replace two strings in a way that one does not end up replacing the other?

I have a pandas dataframe `df` as illustrated below:

    BrandName Specialty
    A          H
    B          I
    ABC        J
    D          K
    AB         L

I want to replace `'ABC'` and `'AB'` in column `BrandName` with `'A'`.
Can someone help with this?


Replacing few values in a pandas dataframe column with another value
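For the question above, a short sketch using `Series.replace` with a list of values to match. The frame is rebuilt from the table in the question. Unlike the regex examples earlier on this page, this matches whole cell values, so `'A'` and `'B'` themselves are untouched.

```python
import pandas as pd

df = pd.DataFrame({
    "BrandName": ["A", "B", "ABC", "D", "AB"],
    "Specialty": ["H", "I", "J", "K", "L"],
})

# Each cell equal to 'ABC' or 'AB' is replaced by 'A'.
df["BrandName"] = df["BrandName"].replace(["ABC", "AB"], "A")

print(df["BrandName"].tolist())  # ['A', 'B', 'A', 'D', 'A']
```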

Is there some way to replace all negative numbers in a DataFrame with zeros?

How to replace negative numbers in Pandas Data Frame by zero

I have the following DataFrame:

    In [1]:
    df = pd.DataFrame({'a': [1, 2, 3],
                       'b': [2, 3, 4],
                       'c': ['dd', 'ee', 'ff'],
                       'd': [5, 9, 1]})

    df
    Out [1]:
       a  b   c  d
    0  1  2  dd  5
    1  2  3  ee  9
    2  3  4  ff  1



I would like to add a column `'e'` which is the sum of columns `'a'`, `'b'` and `'d'`.

Going across forums, I thought something like this would work:

    df['e'] = df[['a', 'b', 'd']].map(sum)

But it didn't.

I would like to know the appropriate operation with the list of columns `['a', 'b', 'd']` and `df` as inputs.

Pandas: sum DataFrame rows for given columns
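For the row-sum question above, one way that should work is to select the columns and call `sum(axis=1)`, which adds the values across each row. The frame is the one from the question. The name `cols` is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3],
                   "b": [2, 3, 4],
                   "c": ["dd", "ee", "ff"],
                   "d": [5, 9, 1]})

cols = ["a", "b", "d"]

# axis=1 sums across the selected columns, producing one value per row.
df["e"] = df[cols].sum(axis=1)

print(df["e"].tolist())  # [8, 14, 8]
```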

I've been very confused about how Python axes are defined, and whether they refer to a DataFrame's rows or columns. Consider the code below:

    >>> df = pd.DataFrame([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]], columns=["col1", "col2", "col3", "col4"])
    >>> df
       col1  col2  col3  col4
    0     1     1     1     1
    1     2     2     2     2
    2     3     3     3     3

So if we call `df.mean(axis=1)`, we'll get a mean across the rows:

    >>> df.mean(axis=1)
    0    1
    1    2
    2    3

However, if we call `df.drop(name, axis=1)`, we actually **drop a column**, not a row:

    >>> df.drop("col4", axis=1)
       col1  col2  col3
    0     1     1     1
    1     2     2     2
    2     3     3     3

Can someone help me understand what is meant by an "axis" in pandas/numpy/scipy?

A side note, `DataFrame.mean` just might be defined wrong. It says in the documentation for [`DataFrame.mean`][1] that `axis=1` is supposed to mean a mean over the columns, not the rows...

  [1]: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.mean.html

Ambiguity in Pandas Dataframe / Numpy Array "axis" definition

I've read something about a Python 2 limitation with respect to Pandas' to_csv( ... etc ...).  Have I hit it? I'm on Python 2.7.3

This turns out trash characters for ≥ and - when they appear in strings. Aside from that the export is perfect.

    df.to_csv("file.csv", encoding="utf-8")

Is there any workaround?

df.head() is this:


    demography  Adults ≥49 yrs  Adults 18−49 yrs at high risk||  \
    state                                                           
    Alabama                 32.7                             38.6   
    Alaska                  31.2                             33.2   
    Arizona                 22.9                             38.8   
    Arkansas                31.2                             34.0   
    California              29.8                             38.8  

csv output is this

	state,	Adults â‰¥49 yrs,	Adults 18âˆ’49 yrs at high risk||
    0,	Alabama,	32.7,	38.6
    1,	Alaska,	31.2,	33.2
    2,	Arizona,	22.9,	38.8
    3,	Arkansas,31.2,  34
    4,	California,29.8, 38.8


The whole code is this:

    import pandas
    import xlrd
    import csv
    import json

    df = pandas.DataFrame()
    dy = pandas.DataFrame()
    # first merge all this xls together


    workbook = xlrd.open_workbook('csv_merger/vaccoverage.xls')
    worksheets = workbook.sheet_names()


    for i in range(3,len(worksheets)):
        dy = pandas.io.excel.read_excel(workbook, i, engine='xlrd', index=None)
        i = i+1
        df = df.append(dy)

    df.index.name = "index"

    df.columns = ['demography', 'area','state', 'month', 'rate', 'moe']

    #Then just grab month = 'May'

    may_mask = df['month'] == "May"
    may_df = (df[may_mask])

    #then delete some columns we dont need

    may_df = may_df.drop('area', 1)
    may_df = may_df.drop('month', 1)
    may_df = may_df.drop('moe', 1)


    print may_df.dtypes #uh oh, it sees 'rate' as type 'object', not 'float'.  Better change that.

    may_df = may_df.convert_objects('rate', convert_numeric=True)

    print may_df.dtypes #that's better

    res = may_df.pivot_table('rate', 'state', 'demography')
    print res.head()


    #and this is going to spit out an array of Objects, each Object a state containing its demographics
    res.reset_index().to_json("thejson.json", orient='records')
    #and a .csv for good measure
    res.reset_index().to_csv("thecsv.csv", orient='records', encoding="utf-8")

Pandas df.to_csv("file.csv" encode="utf-8") still gives trash characters for minus sign

I want to create a new column in Pandas using a string sliced for another column in the dataframe.

For example.

    Sample  Value  New_sample
    AAB     23     A
    BAB     25     B


Where `New_sample` is a new column formed from a simple `[:1]` slice of `Sample`

I've tried a number of things to no avail - I feel I'm missing something simple.

What's the most efficient way of doing this?

Pandas make new column from string slice of another column
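For the slicing question above, the `.str` accessor supports Python slice syntax, so a sketch along these lines should produce the desired column. The frame is rebuilt from the sample table in the question.

```python
import pandas as pd

df = pd.DataFrame({"Sample": ["AAB", "BAB"], "Value": [23, 25]})

# .str[:1] applies the slice [:1] to every string in the column.
df["New_sample"] = df["Sample"].str[:1]

print(df["New_sample"].tolist())  # ['A', 'B']
```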

According to [this documentation](http://pandas.pydata.org/pandas-docs/stable/comparison_with_sql.html#left-outer-join) I can only make a join between fields having the same name.

Do you know if it's possible to join two DataFrames on a field having different names?

The equivalent in SQL would be:

    SELECT *
    FROM df1
    LEFT OUTER JOIN df2
      ON df1.id_key = df2.fk_key

Pandas: join DataFrames on field with different names?
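For the join question above, `pd.merge` accepts `left_on` and `right_on` for differently named key columns. A minimal sketch mirroring the SQL statement, with made-up sample rows and hypothetical value columns `left_val` and `right_val`:

```python
import pandas as pd

# Hypothetical sample data; only the key column names come from the question.
df1 = pd.DataFrame({"id_key": [1, 2, 3], "left_val": ["a", "b", "c"]})
df2 = pd.DataFrame({"fk_key": [1, 3], "right_val": ["x", "z"]})

# how="left" keeps every row of df1, like LEFT OUTER JOIN.
merged = pd.merge(df1, df2, how="left", left_on="id_key", right_on="fk_key")

print(merged)
```

Both key columns appear in the result; rows of `df1` without a match get missing values in the columns coming from `df2`.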

I have a pandas DataFrame in which the first column is `"user_id"` and the rest of the columns are tags (`"Tag_0"` to `"Tag_122"`).

I have the data in the following format:

    UserId	Tag_0	Tag_1
    7867688	0	5
    7867688	0	3
    7867688	3	0
    7867688	3.5	3.5
    7867688	4	4
    7867688	3.5	0

My aim is to achieve `Sum(Tag)/Count(NonZero(Tags))` for each user_id

`df.groupby('user_id').sum()` gives me `sum(tag)`; however, I am clueless about counting non-zero values.

Is it possible to achieve `Sum(Tag)/Count(NonZero(Tags))` in one command?

In MySQL I could achieve this as follows:-

    select user_id, sum(tag)/count(nullif(tag,0)) from table group by 1

Any help shall be appreciated.



Counting non zero values in each column of a DataFrame in python
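For the question above, one possible sketch computes the per-user sums and the per-user counts of non-zero entries separately and then divides them. It assumes the key column is named `user_id` (the sample table in the question labels it `UserId`) and uses only the two tag columns shown there.

```python
import pandas as pd

df = pd.DataFrame({
    "user_id": [7867688] * 6,
    "Tag_0": [0, 0, 3, 3.5, 4, 3.5],
    "Tag_1": [5, 3, 0, 3.5, 4, 0],
})

tags = df.drop(columns="user_id")

# Sum of each tag column per user.
sums = tags.groupby(df["user_id"]).sum()

# ne(0) marks non-zero cells as True; summing booleans counts them per user.
nonzero_counts = tags.ne(0).groupby(df["user_id"]).sum()

result = sums / nonzero_counts
print(result)
```

A user whose tag column is entirely zero would produce a division of zero by zero, which pandas reports as `NaN`.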

I have a Pandas series sf:

    email
    email1@email.com    [1.0, 0.0, 0.0]
    email2@email.com    [2.0, 0.0, 0.0]
    email3@email.com    [1.0, 0.0, 0.0]
    email4@email.com    [4.0, 0.0, 0.0]
    email5@email.com    [1.0, 0.0, 3.0]
    email6@email.com    [1.0, 5.0, 0.0]

And I would like to transform it to the following DataFrame:

    index | email             | list
    _____________________________________________
    0     | email1@email.com  | [1.0, 0.0, 0.0]
    1     | email2@email.com  | [2.0, 0.0, 0.0]
    2     | email3@email.com  | [1.0, 0.0, 0.0]
    3     | email4@email.com  | [4.0, 0.0, 0.0]
    4     | email5@email.com  | [1.0, 0.0, 3.0]
    5     | email6@email.com  | [1.0, 5.0, 0.0]

I found a way to do it, but I doubt it's the most efficient one:

    df1 = pd.DataFrame(data=sf.index, columns=['email'])
    df2 = pd.DataFrame(data=sf.values, columns=['list'])
    df = pd.merge(df1, df2, left_index=True, right_index=True)
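As a possible shorter alternative to the merge above, `Series.reset_index` can turn the index into a regular column in one step. This sketch assumes the index of `sf` is named `email`, as the printed series suggests, and uses only two made-up rows.

```python
import pandas as pd

# Hypothetical two-row version of the series from the question.
sf = pd.Series(
    [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0]],
    index=pd.Index(["email1@email.com", "email2@email.com"], name="email"),
)

# The index becomes the 'email' column; name= labels the values column.
df = sf.reset_index(name="list")

print(list(df.columns))  # ['email', 'list']
```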

Content Type	Original Author	Original Content on Stackoverflow
Question	nauti	View Question on Stackoverflow
Solution 1 - Python	Alex Riley	View Answer on Stackoverflow
Solution 2 - Python	Yichuan Wang	View Answer on Stackoverflow
Solution 3 - Python	Mykola Zotko	View Answer on Stackoverflow
Solution 4 - Python	Jasper Kinoti	View Answer on Stackoverflow
