Most [sklearn][1] objects work with `pandas` dataframes just fine. Would something like this work for you?

    import pandas as pd
    import numpy as np
    from sklearn.decomposition import PCA
    
    df = pd.DataFrame(data=np.random.normal(0, 1, (20, 10)))
    
    pca = PCA(n_components=5)
    pca.fit(df)

You can access the components themselves with

    pca.components_ 


  [1]: http://scikit-learn.org/stable/

    import pandas
    from sklearn.decomposition import PCA
    import numpy
    import matplotlib.pyplot as plot
    
    df = pandas.DataFrame(data=numpy.random.normal(0, 1, (20, 10)))
    
    # Standardize the data before fitting: PCA centers the data itself but does not rescale it
    df_normalized = (df - df.mean()) / df.std()
    pca = PCA(n_components=df.shape[1])
    pca.fit(df_normalized)
    
    # Reformat and view results
    loadings = pandas.DataFrame(pca.components_.T,
                                columns=['PC%s' % i for i in range(len(df_normalized.columns))],
                                index=df.columns)
    print(loadings)

    plot.plot(pca.explained_variance_ratio_)
    plot.ylabel('Explained Variance')
    plot.xlabel('Components')
    plot.show()

I am trying to include a php file in a page via

    require_once('http://localhost/web/a.php');

I am getting an error:

    Warning: require_once(): http:// wrapper is disabled in the server configuration by allow_url_include=0

I changed `allow_url_include=1` in the php.ini and that worked, but I don't think that everybody will let me change their php.ini file.

So, is there any way to accomplish this?

Warning: require_once(): http:// wrapper is disabled in the server configuration by allow_url_include=0

I had to do a presentation yesterday, and as part of the presentation, I used Eclipse to show some code. Many of my coworkers in the room could not read the text and asked me to increase the size of the text for ALL files, not just Java files or XML files.

But it wasn't immediately obvious from the available options how to do this. I went to menu *Window* → *Preferences* and typed *font* in the search input. This filtered the options to *General* → *Appearance* → *Colors and Fonts*. From here, I could see an option to change the font in Java files, but I didn't know how to change the font globally.

I'm using Eclipse v4.3 Service Release 1 (Kepler) on Windows.

This is similar to the Stack Overflow question *[How can I change font size in Eclipse for Java text editors?][1]*.

  [1]: https://stackoverflow.com/questions/4922305



How can I change font size in Eclipse for ALL text editors?

How can I calculate Principal Components Analysis from data in a pandas dataframe?



Principal components analysis using pandas dataframe



I am a newbie to Python and need to convert a list to a dictionary. I know that we can convert a list of tuples to a dictionary.

This is the input list: 

    L = [1,term1, 3, term2, x, term3,... z, termN]

and I want to convert this list to a list of tuples (or directly to a dictionary) like this:

    [(1, term1), (3, term2), (x, term3), ...(z, termN)]

How can we do that easily in Python?

How to convert a list to a list of tuples?
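One common way to pair up consecutive items is to zip the even-indexed slice of the list with the odd-indexed slice. This is a minimal sketch, assuming the list has an even number of items; the sample values are made up for illustration and stand in for the `1, term1, 3, term2, ...` pattern in the question.

```python
# Hypothetical sample data standing in for [1, term1, 3, term2, ...]
L = [1, "term1", 3, "term2", 7, "term3"]

# L[::2] holds items 0, 2, 4, ... and L[1::2] holds items 1, 3, 5, ...
pairs = list(zip(L[::2], L[1::2]))

# A list of 2-tuples converts directly to a dictionary
as_dict = dict(pairs)

print(pairs)
print(as_dict)
```

If the list had an odd number of items, `zip` would silently drop the last one, so it may be worth checking `len(L) % 2 == 0` first.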

I'm using the `nltk` library's `movie_reviews` corpus, which contains a large number of documents. My task is to measure the predictive performance on these reviews with and without pre-processing of the data. The problem is that the lists `documents` and `documents2` hold the same documents, and I need to shuffle them so that both lists end up in the same order. I cannot shuffle them separately, because each shuffle produces a different order. I need to shuffle them together because I compare them at the end, and the comparison depends on the order. I'm using Python 2.7.

Example (in reality the strings are tokenized, but that is not relevant):

    documents = [(['plot : two teen couples go to a church party , '], 'neg'),
                 (['drink and then drive . '], 'pos'),
                 (['they get into an accident . '], 'neg'),
                 (['one of the guys dies'], 'neg')]
    
    documents2 = [(['plot two teen couples church party'], 'neg'),
                  (['drink then drive . '], 'pos'),
                  (['they get accident . '], 'neg'),
                  (['one guys dies'], 'neg')]

And I need to get this result after shuffling both lists:

    documents = [(['one of the guys dies'], 'neg'),
                 (['they get into an accident . '], 'neg'),
                 (['drink and then drive . '], 'pos'),
                 (['plot : two teen couples go to a church party , '], 'neg')]
    
    documents2 = [(['one guys dies'], 'neg'),
                  (['they get accident . '], 'neg'),
                  (['drink then drive . '], 'pos'),
                  (['plot two teen couples church party'], 'neg')]


I have this code:

    import random
    import nltk
    from nltk.corpus import movie_reviews, stopwords

    def cleanDoc(doc):
        # Lowercase, drop stopwords and tokens of 2 characters or fewer, then stem
        stopset = set(stopwords.words('english'))
        stemmer = nltk.PorterStemmer()
        clean = [token.lower() for token in doc if token.lower() not in stopset and len(token) > 2]
        final = [stemmer.stem(word) for word in clean]
        return final
    
    documents = [(list(movie_reviews.words(fileid)), category)
                 for category in movie_reviews.categories()
                 for fileid in movie_reviews.fileids(category)]
    
    documents2 = [(list(cleanDoc(movie_reviews.words(fileid))), category)
                 for category in movie_reviews.categories()
                 for fileid in movie_reviews.fileids(category)]
    
    # random.shuffle(...)  <- here I need to shuffle documents and documents2 in the same order, somehow

Shuffle two lists at once with the same order
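One way to keep two lists aligned is to zip them into pairs, shuffle the pairs once, and unzip them again. The sketch below assumes both lists have the same length; `shuffle_together` and the short sample documents are hypothetical names and data chosen for illustration, not part of the question's code.

```python
import random

def shuffle_together(first, second, rng=random):
    """Return shuffled copies of two equal-length lists, keeping items paired."""
    if len(first) != len(second):
        raise ValueError("both lists must have the same length")
    paired = list(zip(first, second))
    rng.shuffle(paired)          # one shuffle, applied to the pairs
    if not paired:
        return [], []
    shuffled_first, shuffled_second = zip(*paired)
    return list(shuffled_first), list(shuffled_second)

# Made-up stand-ins for the raw and the cleaned documents
documents = [("a long review", "neg"), ("another review", "pos"), ("third one", "neg")]
documents2 = [("long review", "neg"), ("another", "pos"), ("third", "neg")]

documents, documents2 = shuffle_together(documents, documents2)
```

Passing `rng=random.Random(42)` (or any other seed) should make the order repeatable between runs, which may help when comparing results.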

IPython Notebook comes with [`nbconvert`][1], which can _export_ notebooks to other formats. But how do I convert text in the opposite direction? I ask because I already have materials, and a good workflow, in a different format, but I would like to take advantage of Notebook's interactive environment.

A likely solution: A notebook can be created by importing a `.py` file, and the documentation states that when `nbconvert` exports a notebook as a python script, it embeds directives in comments that can be used to recreate the notebook. But the information comes with [a disclaimer][2] about the limitations of this method, and the accepted format is not documented anywhere that I could find. (A sample is shown, oddly enough, in the section describing notebook's [JSON format][3]). Can anyone provide more information, or a better alternative?

**Edit (1 March 2016):** The accepted answer no longer works, because for some reason this input format is not supported by version 4 of the Notebook API.  **I have added [a self-answer][4]** showing how to import a notebook with the current (v4) API. (I am not un-accepting the current answer, since it solved my problem at the time and pointed me to the resources I used in my self-answer.)


  [1]: http://ipython.org/ipython-doc/2/notebook/nbconvert.html
  [2]: http://ipython.org/ipython-doc/2/notebook/notebook.html#importing-py-files
  [3]: http://ipython.org/ipython-doc/2/notebook/nbconvert.html#notebook-json-file-format
  [4]: https://stackoverflow.com/a/35720002/699305

Converting to (not from) ipython Notebook format
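Since a notebook file is JSON, one possible route is to build that JSON directly from the source text. The sketch below uses only the standard library and assumes the version 4 layout as I understand it (a top-level `cells` list, `metadata`, `nbformat` and `nbformat_minor`). The `# %%` cell marker and the `script_to_notebook` helper are my own assumptions for illustration; the `nbformat` package is likely the more robust way to build and validate a notebook.

```python
import json

def script_to_notebook(source, marker="# %%"):
    """Build a version-4 notebook dict from script text split on marker lines."""
    chunks, current = [], []
    for line in source.splitlines():
        if line.strip() == marker:   # a marker line starts a new cell
            chunks.append(current)
            current = []
        else:
            current.append(line)
    chunks.append(current)

    cells = []
    for chunk in chunks:
        text = "\n".join(chunk).strip("\n")
        if text:                     # skip empty cells
            cells.append({
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": text,
            })
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}

script = "x = 1\n# %%\nprint(x + 1)\n"
notebook = script_to_notebook(script)
notebook_json = json.dumps(notebook, indent=1)
```

Writing `notebook_json` to a file with an `.ipynb` extension should give something the Notebook can open, though I would treat that as something to verify against your installed version.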

I am writing a program that accepts an input from the user.

    # Note: Python 2.7 users should use `raw_input`, the equivalent of 3.X's `input`
    age = int(input("Please enter your age: "))
    if age >= 18:
        print("You are able to vote in the United States!")
    else:
        print("You are not able to vote in the United States.")

The program works as expected as long as the user enters meaningful data.

<!-- language: lang-none -->

    C:\Python\Projects> canyouvote.py
    Please enter your age: 23
    You are able to vote in the United States!

But it fails if the user enters invalid data:

<!-- language: lang-none -->

    C:\Python\Projects> canyouvote.py
    Please enter your age: dickety six
    Traceback (most recent call last):
      File "canyouvote.py", line 1, in <module>
        age = int(input("Please enter your age: "))
    ValueError: invalid literal for int() with base 10: 'dickety six'

Instead of crashing, I would like the program to ask for the input again. Like this:

<!-- language: lang-none -->

    C:\Python\Projects> canyouvote.py
    Please enter your age: dickety six
    Sorry, I didn't understand that.
    Please enter your age: 26
    You are able to vote in the United States!

How can I make the program ask for valid inputs instead of crashing when nonsensical data is entered?

How can I reject values like `-1`, which is a valid `int`, but nonsensical in this context?

Asking the user for input until they give a valid response
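A common pattern for this is a `while True` loop that catches the `ValueError`, checks the range, and only returns once both checks pass. The sketch below is one possible shape, not the only one. The `read` and `write` parameters are an assumption added so the loop can be exercised with scripted answers instead of a real keyboard, and the wording of the negative-age message is made up.

```python
def ask_for_age(read=input, write=print):
    """Keep prompting until the user enters a non-negative whole number."""
    while True:
        raw = read("Please enter your age: ")
        try:
            age = int(raw)
        except ValueError:
            # Not a whole number at all, e.g. "dickety six"
            write("Sorry, I didn't understand that.")
            continue
        if age < 0:
            # A valid int, but nonsensical as an age
            write("Sorry, your age cannot be negative.")
            continue
        return age

def voting_message(age):
    if age >= 18:
        return "You are able to vote in the United States!"
    return "You are not able to vote in the United States."

# Demo with scripted answers standing in for keyboard input
answers = iter(["dickety six", "-1", "26"])
age = ask_for_age(read=lambda prompt: next(answers))
print(voting_message(age))
```

In a real script, calling `ask_for_age()` with no arguments reads from the keyboard through `input`.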

I'm confused about the rules Pandas uses when deciding that a selection from a dataframe is a copy of the original dataframe, or a view on the original.

If I have, for example,

    df = pd.DataFrame(np.random.randn(8,8), columns=list('ABCDEFGH'), index=range(1,9))

I understand that a `query` returns a copy so that something like

    foo = df.query('2 < index <= 5')
    foo.loc[:,'E'] = 40

will have no effect on the original dataframe, `df`. I also understand that scalar or named slices return a view, so that assignments to these, such as 

    df.iloc[3] = 70

or 

    df.ix[1,'B':'E'] = 222

will change `df`. But I'm lost when it comes to more complicated cases. For example, 

    df[df.C <= df.B] = 7654321

changes `df`, but

    df[df.C <= df.B].ix[:,'B':'E']

does not.

Is there a simple rule that Pandas is using that I'm just missing? What's going on in these specific cases; and in particular, how do I change all values (or a subset of values) in a dataframe that satisfy a particular query (as I'm attempting to do in the last example above)?

---

Note: This is not the same as [this question][1]; and I have read [the documentation][2], but am not enlightened by it. I've also read through the "Related" questions on this topic, but I'm still missing the simple rule Pandas is using, and how I'd apply it to, for example, modify the values (or a subset of values) in a dataframe that satisfy a particular query.


  [1]: https://stackoverflow.com/q/17960511/656912
  [2]: http://pandas.pydata.org/pandas-docs/dev/indexing.html#returning-a-view-versus-a-copy

What rules does Pandas use to generate a view vs a copy?
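For the practical part of the question (changing a subset of columns on the rows that satisfy a condition), the pattern usually recommended is a single `.loc` call that takes the row mask and the column slice together. The sketch below does not claim to state the rule pandas uses internally. It only contrasts chained indexing with one indexing operation, on made-up data arranged so that two rows satisfy `C <= B`. It uses `.loc` rather than `.ix`, which was removed in pandas 1.0.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(64).reshape(8, 8).astype(float),
                  columns=list("ABCDEFGH"), index=range(1, 9))
df.loc[[2, 5], "C"] = -1.0        # force rows 2 and 5 to satisfy C <= B

mask = df["C"] <= df["B"]

# Chained form: df[mask] is a new object built from a boolean selection,
# so an assignment on it would land on a temporary and leave df unchanged.
# df[mask].loc[:, "B":"E"] = 7654321

# Single indexing operation: rows and columns selected in one .loc call
df.loc[mask, "B":"E"] = 7654321

print(df)
```

Rows 2 and 5 end up with `7654321` in columns `B` through `E`, while the other rows and columns keep their original values.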

I'm trying to replace the values in one column of a dataframe. The column ('female') only contains the values 'female' and 'male'. 

I have tried the following:

    w['female']['female']='1'
    w['female']['male']='0' 

But I get back exactly the same values as before.

I would ideally like to get some output which resembles the following loop element-wise.

    if w['female'] =='female':
        w['female'] = '1'
    else:
        w['female'] = '0'

I've looked through the gotchas documentation (http://pandas.pydata.org/pandas-docs/stable/gotchas.html) but cannot figure out why nothing happens.

Any help will be appreciated.

Replacing column values in a pandas DataFrame
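One way to get the element-wise behaviour described above is `Series.map` with a dictionary. In the question's attempt, `w['female']['female']` looks up the index label `'female'` in the column rather than rows whose value is `'female'`. The sketch below uses a small made-up frame and writes integers `1` and `0`; swap in the strings `'1'` and `'0'` if that is what you need.

```python
import pandas as pd

# Hypothetical stand-in for the real dataframe
w = pd.DataFrame({"female": ["female", "male", "male", "female"]})

# map looks each value up in the dictionary; a value that is missing
# from the dictionary would become NaN
w["female"] = w["female"].map({"female": 1, "male": 0})

print(w)
```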

I have a pandas dataframe with a column named 'City, State, Country'. I want to separate this column into three new columns, 'City', 'State' and 'Country'.

    0                 HUN
    1                 ESP
    2                 GBR
    3                 ESP
    4                 FRA
    5             ID, USA
    6             GA, USA
    7    Hoboken, NJ, USA
    8             NJ, USA
    9                 AUS

Splitting the column into three columns is trivial enough:

    location_df = df['City, State, Country'].apply(lambda x: pd.Series(x.split(',')))

However, this creates left-aligned data:

       0        1    2
    0  HUN      NaN  NaN
    1  ESP      NaN  NaN
    2  GBR      NaN  NaN
    3  ESP      NaN  NaN
    4  FRA      NaN  NaN
    5  ID       USA  NaN
    6  GA       USA  NaN
    7  Hoboken  NJ   USA
    8  NJ       USA  NaN
    9  AUS      NaN  NaN

How would one go about creating the new columns with the data right-aligned? Would I need to iterate through every row, count the number of commas and handle the contents individually?

Pandas Dataframe: split column into multiple columns, right-align inconsistent cell entries
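One way to avoid counting commas row by row is to split each entry and pad the result on the left, so the last part always lands in the last column. This is a minimal sketch on three of the sample rows, assuming no entry has more than three comma-separated parts; `right_align` is a hypothetical helper name.

```python
import pandas as pd

df = pd.DataFrame({"City, State, Country": ["HUN", "ID, USA", "Hoboken, NJ, USA"]})

def right_align(text, width=3):
    """Split on commas and left-pad with None so the last part fills the last slot."""
    parts = [part.strip() for part in text.split(",")]
    return [None] * (width - len(parts)) + parts

location_df = pd.DataFrame(
    df["City, State, Country"].map(right_align).tolist(),
    columns=["City", "State", "Country"],
    index=df.index,
)

print(location_df)
```

The helper is applied once per row, so for a very large column a vectorized string method might be faster; this version favours readability.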

I have a pandas data frame that looks like this (it's a pretty big one):

               date      exer exp     ifor         mat  
    1092  2014-03-17  American   M  528.205  2014-04-19 
    1093  2014-03-17  American   M  528.205  2014-04-19 
    1094  2014-03-17  American   M  528.205  2014-04-19 
    1095  2014-03-17  American   M  528.205  2014-04-19    
    1096  2014-03-17  American   M  528.205  2014-05-17 

Now I would like to iterate row by row. As I go through each row, the value of `ifor` can change depending on some conditions, and I need to look up another dataframe.

How do I update the values as I iterate? I tried a few things, but none of them worked.

    for i, row in df.iterrows():
        if <something>:
            row['ifor'] = x
        else:
            row['ifor'] = y
    
        df.ix[i]['ifor'] = x

None of these approaches seem to work. I don't see the values updated in the dataframe.

Update a dataframe in pandas while iterating row by row
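The `row` yielded by `iterrows` is a copy of the row's data, so assigning to it does not reach the dataframe. One approach is to read from `row` but write back through `df.at[i, column]`. The sketch below uses a made-up three-row frame, and the condition on `exer` is a hypothetical placeholder for the real lookup logic.

```python
import pandas as pd

df = pd.DataFrame({"exer": ["American", "European", "American"],
                   "ifor": [528.205, 528.205, 528.205]},
                  index=[1092, 1093, 1094])

for i, row in df.iterrows():
    # Read values from the row copy, write results back through df.at
    if row["exer"] == "American":        # hypothetical condition
        df.at[i, "ifor"] = row["ifor"] * 2
    else:
        df.at[i, "ifor"] = 0.0

print(df)
```

For a large frame, a vectorized assignment such as `df.loc[mask, 'ifor'] = ...` is usually much faster than a row loop, if the condition can be expressed as a mask.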

This is obviously simple, but as a numpy newbie I'm getting stuck.

I have a CSV file that contains 3 columns, the State, the Office ID, and the Sales for that office.

I want to calculate the percentage of sales per office in a given state (total of all percentages in each state is 100%).



    df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
                       'office_id': list(range(1, 7)) * 2,
                       'sales': [np.random.randint(100000, 999999)
                                 for _ in range(12)]})

    df.groupby(['state', 'office_id']).agg({'sales': 'sum'})


This returns:

                      sales
    state office_id        
    AZ    2          839507
          4          373917
          6          347225
    CA    1          798585
          3          890850
          5          454423
    CO    1          819975
          3          202969
          5          614011
    WA    2          163942
          4          369858
          6          959285

I can't seem to figure out how to "reach up" to the `state` level of the `groupby` to total up the `sales` for the entire `state` to calculate the fraction.

Pandas percentage of total with groupby
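One way to "reach up" to the state level is to group the per-office totals again by the `state` index level and use `transform('sum')`, which returns the state total aligned to every office row. The sketch below uses fixed, made-up sales figures instead of random ones so the result is reproducible.

```python
import pandas as pd

df = pd.DataFrame({"state": ["CA", "WA", "CO", "AZ"] * 3,
                   "office_id": list(range(1, 7)) * 2,
                   "sales": [100, 200, 300, 400, 500, 600,
                             700, 800, 900, 1000, 1100, 1200]})

# Sales per (state, office) pair
office_sales = df.groupby(["state", "office_id"])["sales"].sum()

# State totals, repeated on every office row of that state
state_totals = office_sales.groupby(level="state").transform("sum")

pct = 100 * office_sales / state_totals

print(pct)
```

Within each state, the percentages in `pct` add up to 100.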

How can I get the eigenvalues and eigenvectors of the PCA application?


    from sklearn.decomposition import PCA
    clf=PCA(0.98,whiten=True)      # conserve 98% of the variance
    X_train=clf.fit_transform(X_train)
    X_test=clf.transform(X_test)

I can't find it in [docs][1].

1. I am "not" able to comprehend the different results here.

**Edit**:

    def pca_code(data):
        # raw implementation
        var_per=.98
        data-=np.mean(data, axis=0)
        data/=np.std(data, axis=0)
        cov_mat=np.cov(data, rowvar=False)
        evals, evecs = np.linalg.eigh(cov_mat)
        idx = np.argsort(evals)[::-1]
        evecs = evecs[:,idx]
        evals = evals[idx]
        variance_retained=np.cumsum(evals)/np.sum(evals)
        index=np.argmax(variance_retained>=var_per)
        evecs = evecs[:,:index+1]
        reduced_data=np.dot(evecs.T, data.T).T
        print(evals)
        print("_"*30)
        print(evecs)
        print("_"*30)
        # using scikit-learn's PCA
        clf=PCA(var_per)
        X_train=data.T
        X_train=clf.fit_transform(X_train)
        print(clf.explained_variance_)
        print("_"*30)
        print(clf.components_)
        print("__"*30)


  [1]: http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html#sklearn.decomposition.PCA.get_precision


2. I wish to obtain all the eigenvalues and eigenvectors instead of just the reduced set with the convergence condition.

Obtain eigen values and vectors from sklearn PCA
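For reference, the full set of eigenvalues and eigenvectors can be computed from the covariance matrix with NumPy alone, as in the minimal sketch below on made-up data. As far as I know, scikit-learn's `PCA` exposes the same quantities as `explained_variance_` (eigenvalues) and the rows of `components_` (eigenvectors, up to sign), and leaving `n_components` unset keeps all of them. Note that `fit` expects rows to be samples and columns to be features, so passing `data.T` fits a different problem than the raw implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))             # rows are samples, columns are features

X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)   # uses the n - 1 denominator

# eigh returns eigenvalues in ascending order for a symmetric matrix
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Reorder so the largest eigenvalue comes first
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]    # one eigenvector per column

print(eigenvalues)
print(eigenvectors)
```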

I have performed a PCA over my original dataset and, from the compressed dataset transformed by the PCA, I have selected the number of PCs I want to keep (they explain almost 94% of the variance). Now I am struggling to identify the original features that are important in the reduced dataset.
How do I find out which features are important and which are not among the remaining principal components after the dimension reduction?
Here is my code:


    from sklearn.decomposition import PCA
    pca = PCA(n_components=8)
    pca.fit(scaledDataset)
    projection = pca.transform(scaledDataset)

Furthermore, I also tried to run a clustering algorithm on the reduced dataset, but to my surprise the score is lower than on the original dataset. How is that possible?
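One rough heuristic for the first part is to look at the absolute weights in each component: the feature with the largest absolute weight contributes most to that component. The sketch below is not a definitive measure of importance. It uses NumPy's SVD on made-up data in place of `pca.components_`, with hypothetical feature names, and one feature is deliberately given a larger variance so the outcome is predictable.

```python
import numpy as np

feature_names = ["f0", "f1", "f2", "f3"]    # hypothetical feature names
rng = np.random.default_rng(1)
data = rng.normal(size=(100, 4))
data[:, 2] *= 5.0                           # make f2 dominate the variance
data -= data.mean(axis=0)                   # center, as PCA does

# Rows of vt play the role of pca.components_ (n_components x n_features)
_, _, vt = np.linalg.svd(data, full_matrices=False)
components = vt[:2]                         # keep the first two components

for k, weights in enumerate(components):
    top = int(np.argmax(np.abs(weights)))
    print("PC%d is weighted most heavily on %s" % (k, feature_names[top]))
```

With a fitted scikit-learn model, the same loop could run over `pca.components_` directly.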

| Content Type | Original Author | Original Content on Stackoverflow |
|---|---|---|
| Question | user3362813 | View Question on Stackoverflow |
| Solution 1 - Python | Akavall | View Answer on Stackoverflow |
| Solution 2 - Python | NL23codes | View Answer on Stackoverflow |
