How about:

    df['new_col'] = range(1, len(df) + 1)

Alternatively, if you want the index itself to be the new sequence (it starts at 0 by default) and the original index stored as a column:

    df = df.reset_index()
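
As a rough sketch of the two suggestions above on a small, made-up frame (the column name `value` and the index values are assumptions chosen for illustration):

```python
import pandas as pd

# Hypothetical frame whose index is not in increasing order.
df = pd.DataFrame({"value": [10, 20, 30]}, index=[171, 174, 173])

# Option 1: keep the existing index and add a 1..N column.
with_rank = df.copy()
with_rank["new_col"] = range(1, len(with_rank) + 1)

# Option 2: move the old index into a column named "index";
# the new index is the default RangeIndex, which starts at 0.
reset = df.reset_index()

print(with_rank)
print(reset)
```

If the new index should start at 1 rather than 0, assigning `reset.index = range(1, len(reset) + 1)` afterwards is one way to do it.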

I stumbled on this question while trying to do the same thing (I think). Here is how I did it:

    df['index_col'] = df.index

You can then sort on the new index column, if you like.

How about this:

    import pandas as pd

    idx = pd.Index([171, 174, 173])
    df = pd.DataFrame(index=idx, data=[1, 2, 3])
    print(df)
    
It gives me:
    
         0
    171  1
    174  2
    173  3

Is this what you are looking for?


The way to do that would be this:

Resetting the index:

    df.reset_index(drop=True, inplace=True)

Sorting an index:

    df.sort_index(inplace=True)

Setting a new index from a column:

    df.set_index('column_name', inplace=True)

Setting a new index from a range:

    df.index = range(1, 31)  # a range starting at 1 and ending at 30, with a step of 1

Sorting a dataframe based on column value:

    df.sort_values(by='column_name', inplace=True)

Reassigning the variable works as well:

    df = df.reset_index(drop=True)
    df = df.sort_index()
    df = df.set_index('column_name')
    df.index = range(1, 31)  # a range starting at 1 and ending at 30, with a step of 1
    df = df.sort_values(by='column_name')
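
The non-inplace forms also chain together. A minimal sketch, assuming a toy frame with a hypothetical `score` column (neither the column name nor the data come from the question):

```python
import pandas as pd

# Made-up data with an unordered index, similar in shape to the question's.
df = pd.DataFrame({"score": [3, 1, 2]}, index=[171, 174, 173])

result = (
    df.sort_values(by="score")   # order rows by column value
      .reset_index(drop=True)    # discard the old index labels
)
result.index = range(1, len(result) + 1)  # relabel the rows 1..N

print(result)
```

Using `len(result)` instead of a hard-coded 31 keeps the range in step with the number of rows.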

I read a thoughtful [series of blog posts][1] about the new `<system_error>` header in C++11. It says that the header defines an `error_code` class that represents a specific error value returned by an operation (such as a system call). It also says that the header defines a `system_error` class, which is an exception class (it inherits from `runtime_error`) and is used to wrap `error_code`s.

What I want to know is how to actually convert a system error from `errno` into a `system_error` so I can throw it.  For example, the POSIX `open` function reports errors by returning -1 and setting `errno`, so if I want to throw an exception how should I complete the code below?

    void x()
    {
        int fd = open("foo", O_RDWR);
        if (fd == -1)
        {
            throw /* need some code here to make a std::system_error from errno */;
        }
    }

I randomly tried:

    errno = ENOENT;
    throw std::system_error();

but the resulting exception returns no information when `what()` is called.

I know I could do `throw errno;` but I want to do it the right way, using the new `<system_error>` header.

There is a constructor for `system_error` that takes a single `error_code` as its argument, so if I can just convert `errno` to `error_code` then the rest should be obvious.

This seems like a really basic thing, so I don't know why I can't find a good tutorial on it.

I am using gcc 4.4.5 on an ARM processor, if that matters.

  [1]: http://blog.think-async.com/2010/04/system-error-support-in-c0x-part-1.html

How to convert errno to exception using <system_error>

I read that to suppress the newline after a print statement you can put a comma after the text. The example [here][1] looks like Python 2. **How can it be done in Python 3?**

For example:

    for item in [1,2,3,4]:
        print(item, " ")

What needs to change so that it prints them on the same line?
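
One possible approach, sketched here rather than quoted from any answer, relies on the `end` and `sep` keyword arguments of the Python 3 `print()` function. The variable names are illustrative only:

```python
items = [1, 2, 3, 4]

# end=" " replaces the default trailing newline with a space.
for item in items:
    print(item, end=" ")
print()  # finish the line once the loop is done

# Unpacking the list lets print() place the separator between the values.
print(*items)

# Building the line first is another option when the text is needed later.
line = " ".join(str(item) for item in items)
print(line)
```

The first form leaves a trailing space at the end of the line; the other two do not.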


  [1]: https://stackoverflow.com/a/4390955/1343005

How can I suppress the newline after a print statement?

The index that I have in the dataframe (with 30 rows) is of the form:

    Int64Index([171, 174,173, 172, 199..............
            ....175, 200])

The index is not strictly increasing because the data frame is the output of a sort().
I want to add a column which is the series:
    
    [1, 2, 3, 4, 5......................., 30]

How should I go about doing that?

Pandas (python): How to add column to dataframe for index?

I have a dataframe with columns `A`,`B`. I need to create a column `C` such that for every record / row:

`C = max(A, B)`.

How should I go about doing this?
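
A minimal sketch of one way to do this, assuming numeric columns and made-up values: select the columns of interest and take the row-wise maximum with `max(axis=1)`.

```python
import pandas as pd

# Hypothetical data; only the column names A and B come from the question.
df = pd.DataFrame({"A": [1, 5, 3], "B": [4, 2, 3]})

# axis=1 takes the maximum across the selected columns for each row.
df["C"] = df[["A", "B"]].max(axis=1)

print(df)
```

The same expression extends to more than two columns by adding names to the selection list.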

Find the max of two or more columns with pandas

Since Python strings are immutable, I was wondering how to concatenate strings more efficiently.

I can write it like this:

```py
s += stringfromelsewhere
```

or like this:

```py
s = []

s.append(somestring)
    
# later
    
s = ''.join(s)
```

While writing this question, I found a good article talking about the topic.

http://www.skymind.com/~ocrow/python_string/

But it's written for Python 2.x, so the question is: did something change in Python 3?
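
For reference, the options being compared can be written out as follows. This is only a sketch with placeholder strings; it makes no claim about which is fastest, since that depends on the interpreter and the workload and is best checked with `timeit`.

```python
import io

pieces = ["alpha", "beta", "gamma"]  # stand-ins for strings arriving from elsewhere

# Option 1: repeated += on a string.
appended = ""
for piece in pieces:
    appended += piece

# Option 2: collect the pieces in a list and join once at the end.
joined = "".join(pieces)

# Option 3: write into an in-memory text buffer and read it back.
buffer = io.StringIO()
for piece in pieces:
    buffer.write(piece)
buffered = buffer.getvalue()

print(appended, joined, buffered)
```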

Which is the preferred way to concatenate a string in Python?

I don't understand the following from [pep-0404][1]:

> In Python 3, implicit relative imports within packages are no longer
> available - only absolute imports and explicit relative imports are
> supported. In addition, star imports (e.g. from x import *) are only
> permitted in module level code.

What is a relative import?
In what other places was a star import allowed in Python 2?
Please explain with examples.

  [1]: http://www.python.org/dev/peps/pep-0404/

Changes in import statement python3

Could someone explain to me the meaning of `@classmethod` and `@staticmethod` in python? I need to know the difference and the meaning. 

As far as I understand, `@classmethod` tells a class that it's a method which should be inherited into subclasses, or... something. However, what's the point of that? Why not just define the class method without adding `@classmethod` or `@staticmethod` or any `@` definitions?

**tl;dr:** *when* should I use them, *why* should I use them, and *how* should I use them?
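
To make the distinction concrete, here is a small illustrative sketch. The `Date` and `LoggedDate` classes and their methods are hypothetical examples, not part of the question:

```python
class Date:
    def __init__(self, day, month, year):
        self.day = day
        self.month = month
        self.year = year

    @classmethod
    def from_string(cls, text):
        # cls is the class the method was called on, so subclasses
        # get instances of themselves from this alternative constructor.
        day, month, year = map(int, text.split("-"))
        return cls(day, month, year)

    @staticmethod
    def is_valid(text):
        # No self or cls: a plain function that lives in the class namespace.
        day, month, year = map(int, text.split("-"))
        return 1 <= day <= 31 and 1 <= month <= 12


class LoggedDate(Date):
    pass


d = LoggedDate.from_string("11-09-2012")
print(type(d).__name__, d.day, d.month, d.year)
print(Date.is_valid("40-09-2012"))
```

A class method receives the class as its first argument, which is why `LoggedDate.from_string` builds a `LoggedDate`; a static method receives nothing implicitly.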


Meaning of @classmethod and @staticmethod for beginner?

I'm just starting to learn Flask, and I am trying to create a form which will allow a **POST** method.

Here's my method:

    @app.route('/template', methods=['GET', 'POST'])
    def template():
        if request.method == 'POST':
            return("Hello")
        return render_template('index.html')

And my `index.html`:

<!-- language: lang-html -->

    <html>

    <head>
      <title> Title </title>
    </head>

    <body>
      Enter Python to execute:
      <form action="/" method="post">
        <input type="text" name="expression" />
        <input type="submit" value="Execute" />
      </form>
    </body>

    </html>

Loading the form (rendering it when it receives **GET**) works fine. When I click on the **submit** button however, I get a `POST 405 error Method Not Allowed`.

Why isn't it displaying **"Hello"**?

Flask - POST Error 405 Method Not Allowed

Imagine a table with multiple columns, say, `id, a, b, c, d, e`. I usually select by `id`; however, there are multiple queries in the client app that use various conditions over subsets of the columns.

When MySQL executes a query on a single table with multiple WHERE conditions on multiple columns, can it really make use of indexes created on different columns? Or is the only way to make it fast to create multi-column indexes for all possible queries?


Can MySQL use multiple indexes for a single query?

Here is the query:

    SELECT * FROM table WHERE accountid = 1 ORDER BY logindate DESC LIMIT 1

Now if I added an index with multiple columns on the fields:

    INDEX(accountid,logindate)

Would MySQL take advantage of this multiple column index? Or would it not use it because one field is in the where clause and the other is in an order statement? Or does it not matter as long as I use the fields in the order of the multiple column index?

Understanding multiple column indexes in MySQL query

In R when you need to retrieve a column index based on the name of the column you could do

    idx <- which(names(my_data)==my_colum_name)

Is there a way to do the same with pandas dataframes?
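
One possible equivalent, sketched on a made-up frame, is `Index.get_loc` on `df.columns`:

```python
import pandas as pd

# Hypothetical frame; the column names are for illustration only.
df = pd.DataFrame({"a": [1], "b": [2], "c": [3]})

# get_loc returns the zero-based position of a label in the column index.
idx = df.columns.get_loc("b")

print(idx)
```

Note that positions are zero-based in pandas, whereas `which()` in R is one-based.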

Get column index from column name in python pandas

Using:

    set -o nounset

1) Having an indexed array like:

       myArray=( "red" "black" "blue" )

    What is the shortest way to check if element 1 is set?<br>
I sometimes use the following:

       test "${#myArray[@]}" -gt "1" && echo "1 exists" || echo "1 doesn't exist"

    I would like to know if there's a preferred one.

2) How to deal with non-consecutive indexes?

       myArray=()
       myArray[12]="red"
       myArray[51]="black"
       myArray[129]="blue"

    How to quickly check that `51` is already set, for example?

3) How to deal with associative arrays?

       declare -A myArray
       myArray["key1"]="red"
       myArray["key2"]="black"
       myArray["key3"]="blue"

    How to quickly check that `key2` is already used, for example?

Easiest way to check for an index or a key in an array?

I'm a little bit confused about how to get the index of a selected option from an HTML `<select>` item.

On [this][1] page there are two methods described. However, both are always returning `-1`. Here is my jQuery code:

    $(document).ready(function(){
        $("#dropDownMenuKategorie").change(function(){
            alert($("#dropDownMenuKategorie option:selected").index());
            alert($("select[name='dropDownMenuKategorie'] option:selected").index());
        });
    });

and in html

    (...)
    <select id="dropDownMenuKategorie">
        <option value="gastronomie">Gastronomie</option>
        <option value="finanzen">Finanzen</option>
        <option value="lebensmittel">Lebensmittel</option>
        <option value="gewerbe">Gewerbe</option>
        <option value="shopping">Shopping</option>
        <option value="bildung">Bildung</option>
    </select>
    (...)

Why this behavior?  Is there any chance that the `select` is not "ready" at the moment of assigning its `change()` method? Additionally, changing `.index()` to `.val()` is returning the right value, so that's what confuses me even more.

  [1]: http://www.theextremewebdesigns.com/blog/jquery-get-selected-index-jquery-get-selected-option-index-2-ways/

Get index of selected option with jQuery

In R, I have an operation which creates some `Inf` values when I transform a dataframe.  

I would like to turn these `Inf` values into `NA` values.  The code I have is slow for large data, is there a faster way of doing this? 

Say I have the following dataframe: 

    dat <- data.frame(a=c(1, Inf), b=c(Inf, 3), d=c("a","b"))

The following works in a single case: 

     dat[,1][is.infinite(dat[,1])] = NA

So I generalized it with following loop

    cf_DFinf2NA <- function(x)
    {
    	for (i in 1:ncol(x)){
    		  x[,i][is.infinite(x[,i])] = NA
    	}
    	return(x)
    }

But I don't think that this is really using the power of R.

Cleaning `Inf` values from an R dataframe

I have the following data frame in IPython, where each row is a single stock:

    In [261]: bdata
    Out[261]:
    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 21210 entries, 0 to 21209
    Data columns:
    BloombergTicker      21206  non-null values
    Company              21210  non-null values
    Country              21210  non-null values
    MarketCap            21210  non-null values
    PriceReturn          21210  non-null values
    SEDOL                21210  non-null values
    yearmonth            21210  non-null values
    dtypes: float64(2), int64(1), object(4)


I want to apply a groupby operation that computes cap-weighted average return across everything, per each date in the "yearmonth" column.

This works as expected:

    In [262]: bdata.groupby("yearmonth").apply(lambda x: (x["PriceReturn"]*x["MarketCap"]/x["MarketCap"].sum()).sum())
    Out[262]:
    yearmonth
    201204      -0.109444
    201205      -0.290546

But then I want to sort of "broadcast" these values back to the indices in the original data frame, and save them as constant columns where the dates match.

    In [263]: dateGrps = bdata.groupby("yearmonth")

    In [264]: dateGrps["MarketReturn"] = dateGrps.apply(lambda x: (x["PriceReturn"]*x["MarketCap"]/x["MarketCap"].sum()).sum())
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    /mnt/bos-devrnd04/usr6/home/espears/ws/Research/Projects/python-util/src/util/<ipython-input-264-4a68c8782426> in <module>()
    ----> 1 dateGrps["MarketReturn"] = dateGrps.apply(lambda x: (x["PriceReturn"]*x["MarketCap"]/x["MarketCap"].sum()).sum())

    TypeError: 'DataFrameGroupBy' object does not support item assignment

I realize this naive assignment should not work. But what is the "right" Pandas idiom for assigning the result of a groupby operation into a new column on the parent dataframe?

In the end, I want a column called "MarketReturn" that will be a repeated constant value for all indices that have matching date with the output of the groupby operation.


One hack to achieve this would be the following:

    marketRetsByDate = dateGrps.apply(lambda x: (x["PriceReturn"]*x["MarketCap"]/x["MarketCap"].sum()).sum())

    bdata["MarketReturn"] = np.nan

    for elem in marketRetsByDate.index.values:
        bdata.loc[bdata["yearmonth"] == elem, "MarketReturn"] = marketRetsByDate.loc[elem]

But this is slow, bad, and unPythonic.
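
One idiom that may fit here is `groupby(...).transform`, which returns a result aligned with the original rows. The sketch below uses made-up numbers; only the column names come from the question, and it computes the weighted average as a ratio of two group sums.

```python
import pandas as pd

# Hypothetical data with the question's column names.
bdata = pd.DataFrame({
    "yearmonth": [201204, 201204, 201205, 201205],
    "PriceReturn": [0.10, -0.20, 0.05, 0.15],
    "MarketCap": [100.0, 300.0, 200.0, 200.0],
})

weighted = bdata["PriceReturn"] * bdata["MarketCap"]
by_month = bdata["yearmonth"]

# transform("sum") broadcasts each group's sum back to every row of that group,
# so the division yields the cap-weighted return repeated per date.
bdata["MarketReturn"] = (
    weighted.groupby(by_month).transform("sum")
    / bdata["MarketCap"].groupby(by_month).transform("sum")
)

print(bdata)
```

Because the result has the same index as `bdata`, it can be assigned directly as a new column without an explicit loop.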

Python Pandas How to assign groupby operation results back to columns in parent dataframe?

I'm familiar with being able to extract columns from an R data frame (or matrix) like so:

    df.2 <- df[, c("name1", "name2", "name3")]

But can one use a `!` or other tool to select *all but those listed columns*?

For background, I have a data frame with quite a few column vectors and I'd like to avoid:

- Typing out the majority of the names when I could just remove a minority
- Using the much shorter `df.2 <- df[, c(1,3,5)]` because when my .csv file changes, my code goes to heck since the numbering isn't the same anymore. I'm new to R and think I've learned the hard way not to use number vectors for larger df's that might change.

I tried:

    df.2 <- df[, !c("name1", "name2", "name3")]
    df.2 <- df[, !=c("name1", "name2", "name3")]

And just as I was typing this, found out that this works:

    df.2 <- df[, !names(df) %in% c("name1", "name2", "name3")]

Is there a better way than this last one?

Selecting columns in R data frame based on those *not* in a vector

I have a dataframe generated with Python's pandas package. How can I generate a heatmap from a pandas DataFrame?


    import numpy as np
    from pandas import DataFrame

    Index = ['aaa', 'bbb', 'ccc', 'ddd', 'eee']
    Cols = ['A', 'B', 'C', 'D']
    df = DataFrame(abs(np.random.randn(5, 4)), index=Index, columns=Cols)

    >>> df
              A         B         C         D
    aaa  2.431645  1.248688  0.267648  0.613826
    bbb  0.809296  1.671020  1.564420  0.347662
    ccc  1.501939  1.126518  0.702019  1.596048
    ddd  0.137160  0.147368  1.504663  0.202822
    eee  0.134540  3.708104  0.309097  1.641090
    >>>

Making heatmap from pandas DataFrame

I want to apply a function with arguments to a series in python pandas:

    x = my_series.apply(my_function, more_arguments_1)
    y = my_series.apply(my_function, more_arguments_2)
    ...

The [documentation][1] describes support for an apply method, but it doesn't accept any arguments.  Is there a different method that accepts arguments?  Alternatively, am I missing a simple workaround?

**Update (October 2017):**  Since this question was originally asked, pandas `apply()` has been updated to handle positional and keyword arguments, and the documentation link above now reflects that and shows how to include either type of argument.
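
As a brief sketch of what the update describes, `Series.apply` takes extra positional arguments through `args` and forwards additional keyword arguments to the function. The function `scale_and_shift` and its parameters are hypothetical:

```python
import pandas as pd


def scale_and_shift(value, factor, offset=0):
    # Hypothetical function taking one extra positional and one keyword argument.
    return value * factor + offset


my_series = pd.Series([1, 2, 3])

# Extra positional arguments go in the args tuple.
x = my_series.apply(scale_and_shift, args=(10,))

# Extra keyword arguments are passed straight through to the function.
y = my_series.apply(scale_and_shift, args=(10,), offset=1)

print(x.tolist(), y.tolist())
```

On versions without this support, wrapping the call in a `lambda` or `functools.partial` is a common workaround.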

  [1]: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.apply.html

python pandas: apply a function with arguments to a series

I'm trying to read a fairly large CSV file with Pandas and split it up into two random chunks, one containing 10% of the data and the other 90%.

Here's my current attempt:

    rows = data.index
    row_count = len(rows)
    random.shuffle(list(rows))
    
    data.reindex(rows)
    
    training_data = data[row_count // 10:]
    testing_data = data[:row_count // 10]

For some reason, `sklearn` throws this error when I try to use one of these resulting DataFrame objects inside of a SVM classifier:

    IndexError: each subindex must be either a slice, an integer, Ellipsis, or newaxis

I think I'm doing it wrong. Is there a better way to do this?
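
One possible approach, sketched with made-up data, is `DataFrame.sample` for the 10% chunk and `drop` for the remainder. It assumes the index labels are unique; the column names and `random_state` value are illustrative:

```python
import pandas as pd

# Hypothetical stand-in for the CSV contents.
data = pd.DataFrame({"feature": range(100), "label": [i % 2 for i in range(100)]})

# Draw a random 10% of the rows; random_state makes the draw repeatable.
testing_data = data.sample(frac=0.1, random_state=0)

# Everything that was not sampled forms the other 90%.
training_data = data.drop(testing_data.index)

print(len(testing_data), len(training_data))
```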

Pandas: Sampling a DataFrame

One last newbie pandas question for the day:  How do I generate a table for a single Series?

For example:

    my_series = pandas.Series([1,2,2,3,3,3])
    pandas.magical_frequency_function( my_series )

    >> {
         1 : 1,
         2 : 2, 
         3 : 3
       }

Lots of googling has led me to Series.describe() and pandas.crosstabs, but neither of these does quite what I need: one variable, counts by categories.  Oh, and it'd be nice if it worked for different data types: strings, ints, etc.
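
A sketch of one method that appears to match this description is `Series.value_counts`, which counts occurrences of each distinct value:

```python
import pandas as pd

my_series = pd.Series([1, 2, 2, 3, 3, 3])

# value_counts returns a Series indexed by the distinct values.
counts = my_series.value_counts()

# to_dict gives the mapping form shown in the question.
as_dict = counts.to_dict()

print(as_dict)
```

The same call works on a Series of strings or other hashable values.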



| Content Type | Original Author | Original Content on Stackoverflow |
|---|---|---|
| Question | Navneet | View Question on Stackoverflow |
| Solution 1 - Python | Chang She | View Answer on Stackoverflow |
| Solution 2 - Python | user1225054 | View Answer on Stackoverflow |
| Solution 3 - Python | nitin | View Answer on Stackoverflow |
| Solution 4 - Python | XiB | View Answer on Stackoverflow |
