Use `value_counts` with `normalize=True`:

    df['gender'].value_counts(normalize=True) * 100

The result is a fraction in the range (0, 1]; multiplying by 100 turns it into a percentage.
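
A self-contained sketch with made-up data (including an `Other` category and one missing value). By default `value_counts` drops missing values, so the percentages are relative to the non-missing rows; passing `dropna=False` counts the missing value as its own category and uses all rows as the total:

```python
import pandas as pd

# Illustrative data: three categories plus one missing value.
df = pd.DataFrame({'gender': ['M', 'F', 'Other', 'F', None]})

# Default: the missing value is dropped, so percentages are out of 4 rows.
pct = df['gender'].value_counts(normalize=True) * 100

# dropna=False: the missing value becomes its own row and the total is 5 rows.
pct_all = df['gender'].value_counts(normalize=True, dropna=False) * 100

print(pct.round(2))
print(pct_all.round(2))
```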

If you only need the percentages of the `M` and `F` values within the `gender` column, you can also divide `value_counts()` by `count()`:

    import pandas as pd

    df = pd.DataFrame({'gender': ['M', 'M', 'F', 'F', 'F']})
    # Percentage of each value among the non-missing entries
    (df['gender'].value_counts() / df['gender'].count()) * 100

Result:

    F    60.0
    M    40.0
    Name: gender, dtype: float64

 ---
Or, using `groupby`:

    (df.groupby('gender').size() / df['gender'].count()) * 100

Say there are 200 values, of which 120 are categorized as `M` and 80 as `F`.

**1)**

    df['gender'].value_counts()

    # output:
    # M    120
    # F     80

**2)**

    df['gender'].value_counts(normalize=True)

    # output:
    # M    0.6
    # F    0.4

**3)**

    df['gender'].value_counts(normalize=True) * 100  # converts the output to percentages

    # output:
    # M    60.0
    # F    40.0
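
A runnable version of the scenario above, building a made-up column with exactly 120 `M` and 80 `F` values so the three results can be checked directly:

```python
import pandas as pd

# 200 values: 120 'M' and 80 'F', as in the scenario above.
df = pd.DataFrame({'gender': ['M'] * 120 + ['F'] * 80})

counts = df['gender'].value_counts()                   # raw counts
fractions = df['gender'].value_counts(normalize=True)  # fractions of the total
percentages = fractions * 100                          # fractions scaled to %

print(counts)
print(percentages)
```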
      




Find the percentage of each target class to check whether the data is imbalanced:

    g = data[Target_col_Y]
    df = pd.concat([g.value_counts(),
                    g.value_counts(normalize=True).mul(100)],
                   axis=1, keys=('counts', 'percentage'))

    print(df)

       counts  percentage
    0   36548   88.734583
    1    4640   11.265417

Find the largest gap in the `percentage` column to see how large the imbalance is:

    df1 = df.diff(periods=1, axis=0)
    # value_counts sorts in descending order, so the raw differences are <= 0; take abs()
    difvalue = df1['percentage'].abs().max()
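
The two snippets above assume a `data` frame and a target column name `Target_col_Y` defined elsewhere. Here is a self-contained sketch on a hypothetical 0/1 column named `target` (90 rows of class 0, 10 of class 1); the gap is taken as an absolute value because `value_counts` sorts in descending order, which makes the raw row-to-row differences negative:

```python
import pandas as pd

# Hypothetical binary target: 90 rows of class 0, 10 rows of class 1.
data = pd.DataFrame({'target': [0] * 90 + [1] * 10})

g = data['target']
summary = pd.concat(
    [g.value_counts(), g.value_counts(normalize=True).mul(100)],
    axis=1,
    keys=('counts', 'percentage'),
)

# Largest absolute gap between consecutive classes (rows sorted by frequency).
gap = summary['percentage'].diff().abs().max()

print(summary)
print(f'gap: {gap:.2f} percentage points')
```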


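If the column is already encoded as 0/1 (here male = 0 and female = 1), its mean is the fraction of 1s, so both percentages follow directly:

    print('Gender Male (0): {}%'.format(round(100 - df['Gender'].mean() * 100, 2)))
    print('Gender Female (1): {}%'.format(round(df['Gender'].mean() * 100, 2)))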

I'm trying to add Firebase Cloud Storage to my app. Below is the app's `build.gradle`, but the build fails with:

    Failed to resolve: com.google.firebase:firebase-core:16.0.1

Why? There is no `firebase-core` in the dependencies at all.

    apply plugin: 'com.android.application'
    
    android {
        compileSdkVersion 27
        defaultConfig {
            applicationId "com.louise.udacity.mydict"
            minSdkVersion 15
            targetSdkVersion 27
            versionCode 1
            versionName "1.0"
            testInstrumentationRunner "android.support.test.runner.AndroidJUnitRunner"
        }
        buildTypes {
            release {
                minifyEnabled false
                proguardFiles getDefaultProguardFile('proguard-android.txt'), 'proguard-rules.pro'
            }
        }
    }
    
    dependencies {
        implementation fileTree(dir: 'libs', include: ['*.jar'])
        implementation 'com.android.support:appcompat-v7:27.1.1'
        implementation 'com.android.support.constraint:constraint-layout:1.1.0'
        implementation 'com.google.firebase:firebase-storage:16.0.1'
        implementation 'com.google.firebase:firebase-auth:16.0.1'
        testImplementation 'junit:junit:4.12'
        androidTestImplementation 'com.android.support.test:runner:1.0.2'
        androidTestImplementation 'com.android.support.test.espresso:espresso-core:3.0.2'
    
        implementation 'com.google.cloud:google-cloud-storage:1.31.0'
        implementation 'com.firebase:firebase-jobdispatcher:0.8.5'
    }
    
    apply plugin: 'com.google.gms.google-services'

Failed to resolve: com.google.firebase:firebase-core:16.0.1

I am new to React and am working on an app. It was running fine, but after running `npm run build` I get the error "You need to enable JavaScript to run this app." I have made changes in the `server.js` file and even set `"homepage": "./"`, but that did not solve my issue.

I've also checked, by running a Laravel project, that JavaScript is enabled in the browser, and I tried different browsers.

Can someone please help me overcome this error?


  

I am getting error in console "You need to enable JavaScript to run this app." reactjs

I want to get the percentage of a particular value in a df column. Say I have a df with columns (col1, col2, col3, gender), where the gender column has values of M, F, or Other. I want to get the percentage of M, F, and Other values in the df.

I have tried this, which gives me the number of M, F, and Other instances, but I want these as a percentage of the total number of values in the df.

    df.groupby('gender').size()

Can someone help?

Pandas get frequency of item occurrences in a column as percentage


This is a self-answered post. A common problem is to randomly generate dates between a given start and end date. 

There are two cases to consider:

1. random dates with a time component, and 
2. random dates without time

For example, given some start date `2015-01-01` and an end date `2018-01-01`, how can I sample N random dates between this range using pandas?
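
One possible sketch (not necessarily the approach taken in the self-answer): draw random integer offsets with NumPy and add them to the start date, counted in seconds when a time component is wanted and in whole days when it is not. The helper `random_dates` and its parameters are hypothetical names, and the end date is treated as exclusive.

```python
import numpy as np
import pandas as pd


def random_dates(start, end, n, seed=None, with_time=True):
    """Return n random timestamps in [start, end) as a DatetimeIndex."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    rng = np.random.default_rng(seed)
    # Second resolution keeps a time component; day resolution gives midnight dates.
    unit = 's' if with_time else 'D'
    span = int((end - start) / pd.Timedelta(1, unit=unit))
    offsets = rng.integers(0, span, size=n)  # upper bound is exclusive
    return start + pd.to_timedelta(offsets, unit=unit)


with_time = random_dates('2015-01-01', '2018-01-01', 5, seed=0)
dates_only = random_dates('2015-01-01', '2018-01-01', 5, seed=0, with_time=False)
print(with_time)
print(dates_only)
```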


Generating random dates within a given range in pandas

Python 3.7 was released a while ago, and I wanted to test some of the fancy new `dataclass`+typing features. Getting hints to work right is easy enough, with both native types and those from the `typing` module:

    >>> import dataclasses
    >>> import typing as ty
    >>> 
    ... @dataclasses.dataclass
    ... class Structure:
    ...     a_str: str
    ...     a_str_list: ty.List[str]
    ...
    >>> my_struct = Structure(a_str='test', a_str_list=['t', 'e', 's', 't'])
    >>> my_struct.a_str_list[0].  # IDE suggests all the string methods :)

But one other thing that I wanted to try was forcing the type hints as conditions during runtime, i.e. it should not be possible for a `dataclass` with incorrect types to exist. It can be implemented nicely with [`__post_init__`](https://www.python.org/dev/peps/pep-0557/#post-init-processing):

    >>> @dataclasses.dataclass
    ... class Structure:
    ...     a_str: str
    ...     a_str_list: ty.List[str]
    ...     
    ...     def validate(self):
    ...         ret = True
    ...         for field_name, field_def in self.__dataclass_fields__.items():
    ...             actual_type = type(getattr(self, field_name))
    ...             if actual_type != field_def.type:
    ...                 print(f"\t{field_name}: '{actual_type}' instead of '{field_def.type}'")
    ...                 ret = False
    ...         return ret
    ...     
    ...     def __post_init__(self):
    ...         if not self.validate():
    ...             raise ValueError('Wrong types')

This kind of `validate` function works for native types and custom classes, but not those specified by the `typing` module:

    >>> my_struct = Structure(a_str='test', a_str_list=['t', 'e', 's', 't'])
    Traceback (most recent call last):
        a_str_list: '<class 'list'>' instead of 'typing.List[str]'
    ValueError: Wrong types

Is there a better approach to validate an untyped list with a `typing`-typed one? Preferably one that doesn't include checking the types of all elements in any `list`, `dict`, `tuple`, or `set` that is a `dataclass`' attribute.

---

Revisiting this question after a couple of years, I've now moved to use [`pydantic`](https://pydantic-docs.helpmanual.io/) in cases where I want to validate classes that I'd normally just define a dataclass for. I'll leave my mark with the currently accepted answer though, since it correctly answers the original question and has outstanding educational value. 

Validating detailed types in python dataclasses

I am trying to make a Python package which I want to install locally using `pip install .`. The package name is listed in `pip freeze`, but `import <package>` results in the error `No module named <package>`. Also, the site-packages folder only contains a dist-info folder. `find_packages()` is able to find packages. What am I missing?


    import io
    import os
    import sys
    from shutil import rmtree
    
    from setuptools import find_packages, setup, Command
    
    # Package meta-data.
    NAME = '<package>'
    DESCRIPTION = 'description'
    URL = ''
    EMAIL = 'email'
    AUTHOR = 'name'
    
    # What packages are required for this module to be executed?
    REQUIRED = [
        # 'requests', 'maya', 'records',
    ]
    
    # The rest you shouldn't have to touch too much :)
    # ------------------------------------------------
    # Except, perhaps the License and Trove Classifiers!
    # If you do change the License, remember to change the Trove Classifier for that!
    
    here = os.path.abspath(os.path.dirname(__file__))
    
    
    
    # Where the magic happens:
    setup(
        name=NAME,
        #version=about['__version__'],
        description=DESCRIPTION,
        # long_description=long_description,
        author=AUTHOR,
        author_email=EMAIL,
        url=URL,
        packages=find_packages(),
        # If your package is a single module, use this instead of 'packages':
        # py_modules=['mypackage'],
    
        # entry_points={
        #     'console_scripts': ['mycli=mymodule:cli'],
        # },
        install_requires=REQUIRED,
        include_package_data=True,
        license='MIT',
        classifiers=[
            # Trove classifiers
            # Full list: https://pypi.python.org/pypi?%3Aaction=list_classifiers
            'License :: OSI Approved :: MIT License',
            'Programming Language :: Python',
            'Programming Language :: Python :: 2.6',
            'Programming Language :: Python :: 2.7',
            'Programming Language :: Python :: 3',
            'Programming Language :: Python :: 3.3',
            'Programming Language :: Python :: 3.4',
            'Programming Language :: Python :: 3.5',
            'Programming Language :: Python :: 3.6',
            'Programming Language :: Python :: Implementation :: CPython',
            'Programming Language :: Python :: Implementation :: PyPy'
        ],
    
    )



pip install . creates only the dist-info not the package

I have a flattened dictionary which I want to make into a nested one, of the form

    flat = {'X_a_one': 10,
            'X_a_two': 20, 
            'X_b_one': 10,
            'X_b_two': 20, 
            'Y_a_one': 10,
            'Y_a_two': 20,
            'Y_b_one': 10,
            'Y_b_two': 20}

I want to convert it to the form

    nested = {'X': {'a': {'one': 10,
                          'two': 20}, 
                    'b': {'one': 10,
                          'two': 20}}, 
              'Y': {'a': {'one': 10,
                          'two': 20},
                    'b': {'one': 10,
                          'two': 20}}}

The structure of the flat dictionary is such that there should not be any problems with ambiguities. I want it to work for dictionaries of arbitrary depth, but performance is not really an issue. I've seen lots of methods for flattening a nested dictionary, but basically none for nesting a flattened dictionary. The values stored in the dictionary are either scalars or strings, never iterables.

So far I have got something which can take the input 

    test_dict = {'X_a_one': '10',
                 'X_b_one': '10',
                 'X_c_one': '10'}


to the output

    test_out = {'X': {'a_one': '10', 
                      'b_one': '10', 
                      'c_one': '10'}}

using the code

    def nest_once(inp_dict):
        out = {}
        if isinstance(inp_dict, dict):
            for key, val in inp_dict.items():
                if '_' in key:
                    head, tail = key.split('_', 1)

                    if head not in out.keys():
                        out[head] = {tail: val}
                    else:
                        out[head].update({tail: val})
                else:
                    out[key] = val
        return out
 
    test_out = nest_once(test_dict)

But I'm having trouble working out how to make this into something which recursively creates all levels of the dictionary.

Any help would be appreciated!

(As for why I want to do this: I have a file whose structure is equivalent to a nested dict, and I want to store this file's contents in the attributes dictionary of a NetCDF file and retrieve it later. However, NetCDF only allows you to put flat dictionaries as the attributes, so I want to unflatten the dictionary I previously stored in the NetCDF file.)
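
A minimal sketch of the full unflattening, assuming `_` never occurs inside a key component and no key is a prefix of another (e.g. no `X_a` alongside `X_a_one`), which matches the "no ambiguities" condition above. It walks down each key path with `dict.setdefault` in a loop rather than by recursion; `unflatten` is a hypothetical name.

```python
def unflatten(flat, sep='_'):
    """Rebuild a nested dict from keys such as 'X_a_one'."""
    nested = {}
    for key, value in flat.items():
        *parents, leaf = key.split(sep)
        node = nested
        for part in parents:
            # Create the intermediate dict on first use, then descend into it.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


flat = {'X_a_one': 10, 'X_a_two': 20, 'X_b_one': 10, 'X_b_two': 20,
        'Y_a_one': 10, 'Y_a_two': 20, 'Y_b_one': 10, 'Y_b_two': 20}
nested = unflatten(flat)
print(nested)
```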

Creating a nested dictionary from a flattened dictionary

These are my two dataframes saved in two variables:

    > print(df.head())
    >
              club_name  tr_jan  tr_dec  year
        0  ADO Den Haag    1368    1422  2010
        1  ADO Den Haag    1455    1477  2011
        2  ADO Den Haag    1461    1443  2012
        3  ADO Den Haag    1437    1383  2013
        4  ADO Den Haag    1386    1422  2014
    > print(ranking_df.head())
    >
               club_name  ranking  year
        0    ADO Den Haag    12    2010
        1    ADO Den Haag    13    2011
        2    ADO Den Haag    11    2012
        3    ADO Den Haag    14    2013
        4    ADO Den Haag    17    2014

I'm trying to merge these two using this code:

    new_df = df.merge(ranking_df, on=['club_name', 'year'], how='left')

The `how='left'` is added because I have fewer data points in my `ranking_df` than in my standard `df`.

The expected behaviour is as such:

    > print(new_df.head())
    >

          club_name  tr_jan  tr_dec  year    ranking
    0  ADO Den Haag    1368    1422  2010    12
    1  ADO Den Haag    1455    1477  2011    13
    2  ADO Den Haag    1461    1443  2012    11
    3  ADO Den Haag    1437    1383  2013    14
    4  ADO Den Haag    1386    1422  2014    17

But I get this error:

> ValueError: You are trying to merge on object and int64 columns. If
> you wish to proceed you should use pd.concat

But I do not wish to use concat since I want to merge the trees not just add them on.

Another behaviour that seems odd to me is that my code works if I save the first df to .csv and then load that .csv into a dataframe.

The code for that:

    df = pd.DataFrame(data_points, columns=['club_name', 'tr_jan', 'tr_dec', 'year'])
    df.to_csv('preliminary.csv')

    df = pd.read_csv('preliminary.csv', index_col=0)
    
    ranking_df = pd.DataFrame(rankings, columns=['club_name', 'ranking', 'year'])
    
    new_df = df.merge(ranking_df, on=['club_name', 'year'], how='left')

I think it has to do with the `index_col=0` parameter, but I have no idea how to fix it without saving the file. It doesn't matter much, but it is an annoyance that I have to do that.
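
The error suggests that the `year` key has dtype `object` (for example, strings) in `df` but `int64` in `ranking_df`. `read_csv` infers numeric dtypes again when loading, which would explain why the CSV round trip appears to fix it; `index_col=0` most likely only restores the saved index. A sketch under that assumption, with two made-up rows taken from the printed heads:

```python
import pandas as pd

# Assumed cause: 'year' holds strings (object dtype) in df but integers in ranking_df.
df = pd.DataFrame({
    'club_name': ['ADO Den Haag', 'ADO Den Haag'],
    'tr_jan': [1368, 1455],
    'tr_dec': [1422, 1477],
    'year': ['2010', '2011'],
})
ranking_df = pd.DataFrame({
    'club_name': ['ADO Den Haag', 'ADO Den Haag'],
    'ranking': [12, 13],
    'year': [2010, 2011],
})

print(df.dtypes)  # inspect the key dtypes before merging

# Give the key column an integer dtype, then merge as before (no CSV round trip).
df['year'] = df['year'].astype('int64')
new_df = df.merge(ranking_df, on=['club_name', 'year'], how='left')
print(new_df)
```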

Trying to merge 2 dataframes but get ValueError

Is there a way to convert a Spark DataFrame (not an RDD) to a pandas DataFrame?

I tried the following:

    var some_df = Seq(
     ("A", "no"),
     ("B", "yes"),
     ("B", "yes"),
     ("B", "no")

     ).toDF(
    "user_id", "phone_number")

Code:
     
    %pyspark
    pandas_df = some_df.toPandas()

Error:

     NameError: name &#39;some_df&#39; is not defined

Any suggestions?



Convert a spark DataFrame to pandas DF

    import pandas as pd
    df = pd.read_csv('https://query.data.world/s/Hfu_PsEuD1Z_yJHmGaxWTxvkz7W_b0')
    percent= 100*(len(df.loc[:,df.isnull().sum(axis=0)>=1 ].index) / len(df.index))
    print(round(percent,2))

input is https://query.data.world/s/Hfu_PsEuD1Z_yJHmGaxWTxvkz7W_b0

and the output should be

    Ord_id                 0.00
    Prod_id                0.00
    Ship_id                0.00
    Cust_id                0.00
    Sales                  0.24
    Discount               0.65
    Order_Quantity         0.65
    Profit                 0.65
    Shipping_Cost          0.65
    Product_Base_Margin    1.30
    dtype: float64
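
The snippet above counts rows of a subset of columns instead of computing one percentage per column. `isnull()` returns a boolean frame whose column-wise mean is the fraction of missing cells, so a per-column version could look like the sketch below. It uses a small made-up frame; the values for the linked dataset have not been reproduced here.

```python
import numpy as np
import pandas as pd

# Small made-up frame; with the real data, load it with the pd.read_csv(...) call above.
df = pd.DataFrame({
    'Ord_id': [1, 2, 3, 4],
    'Sales': [10.0, np.nan, 30.0, 40.0],
    'Profit': [np.nan, np.nan, 5.0, 6.0],
})

# True counts as 1 and False as 0, so the mean per column is the missing fraction.
percent_missing = (df.isnull().mean() * 100).round(2)
print(percent_missing)
```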

Find out the percentage of missing values in each column in the given dataset

How can I modify the size of the output image of the function `pandas.DataFrame.plot`?

I tried:

`plt.figure (figsize=(10,5))`

and

`%matplotlib notebook`

but none of them work.
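
`DataFrame.plot` accepts a `figsize=(width, height)` argument in inches. A separate `plt.figure(figsize=...)` call likely has no effect because `DataFrame.plot` creates its own new figure when no `ax` is passed. The sketch below shows both passing `figsize` and drawing on a pre-sized axes via `ax=`; the data is made up, and the `Agg` backend is selected only so it runs without a display (it is unnecessary in a notebook).

```python
import matplotlib
matplotlib.use('Agg')  # headless backend, only so this sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'a': [1, 3, 2, 4], 'b': [2, 1, 4, 3]})

# Option 1: pass figsize=(width, height) in inches straight to DataFrame.plot.
ax = df.plot(figsize=(10, 5))
fig = ax.get_figure()

# Option 2: create a sized figure/axes first and tell DataFrame.plot to draw on it.
fig2, ax2 = plt.subplots(figsize=(12, 4))
df.plot(ax=ax2)

print(fig.get_size_inches(), fig2.get_size_inches())
```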

How to increase image size of pandas.DataFrame.plot

I have a function, which returns a dictionary like this:

    {'truth': 185.179993, 'day1': 197.22307753038834, 'day2': 197.26118010160317, 'day3': 197.19846975345905, 'day4': 197.1490578795196, 'day5': 197.37179265011116}

I am trying to append this dictionary to a dataframe like so:

    output = pd.DataFrame()
    output.append(dictionary, ignore_index=True)
    print(output.head())

Unfortunately, the printing of the dataframe results in an empty dataframe. Any ideas?
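
`DataFrame.append` never modified the frame in place: it returned a new DataFrame, which the snippet above discards, so `output` stays empty. The method was also deprecated in pandas 1.4 and removed in 2.0. A sketch of two common replacements, using the dictionary from the question:

```python
import pandas as pd

dictionary = {'truth': 185.179993, 'day1': 197.22307753038834,
              'day2': 197.26118010160317, 'day3': 197.19846975345905,
              'day4': 197.1490578795196, 'day5': 197.37179265011116}

# Preferred: collect the dicts in a list (e.g. rows.append(result) in a loop)
# and build the DataFrame once at the end.
rows = [dictionary]
output = pd.DataFrame(rows)
print(output.head())

# To add a row to an existing, non-empty frame, concatenate and reassign.
output = pd.concat([output, pd.DataFrame([dictionary])], ignore_index=True)
print(output.head())
```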

append dictionary to data frame

I am trying out Seaborn to make my plot visually better than with matplotlib. I have a dataset with a column 'Year', which I want to plot on the X-axis, and four columns, say A, B, C and D, which I want on the Y-axis as differently coloured lines. I was trying to do this using the `sns.lineplot` method, but it allows only one variable on the X-axis and one on the Y-axis. I tried this:

    sns.lineplot(data_preproc['Year'],data_preproc['A'], err_style=None)
    sns.lineplot(data_preproc['Year'],data_preproc['B'], err_style=None)
    sns.lineplot(data_preproc['Year'],data_preproc['C'], err_style=None)
    sns.lineplot(data_preproc['Year'],data_preproc['D'], err_style=None)

But this way I don't get a legend in the plot showing which coloured line corresponds to which column. I tried checking the documentation but couldn't find a proper way to do this.

How do I create a multiline plot using seaborn?

I have a dataframe `df` as follows:

    | name  | coverage |
    |-------|----------|
    | Jason | 25.1     |

I want to convert it to a dictionary. I used the following command in `pandas`:

    dict=df.to_dict()

The output of `dict` gave me the following:

    {'coverage': {0: 25.1}, 'name': {0: 'Jason'}}

I do not want the `0` in my output. I believe it is captured because of the index of my dataframe `df`. What can I do to eliminate the `0` from my output? (I do not want the index to be captured.) Expected output:

    {'coverage': 25.1, 'name': 'Jason'}
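
`to_dict()` defaults to `orient='dict'`, which maps each column to an `{index: value}` dictionary; that is where the `0` comes from. With `orient='records'` you get one plain dictionary per row. A sketch on a frame rebuilt from the table above; taking `[0]` assumes the frame has a single row, and the result is not named `dict` so the built-in is not shadowed:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Jason'], 'coverage': [25.1]})

# orient='records' gives a list with one dict per row and no index labels.
records = df.to_dict(orient='records')  # [{'name': 'Jason', 'coverage': 25.1}]
result = records[0]                     # single-row frame: take the only record

print(result)
```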



How to convert dataframe to dictionary in pandas WITHOUT index

I have the following DataFrame where one of the columns is an object (list type cell):

    df=pd.DataFrame({'A':[1,2],'B':[[1,2],[1,2]]})
    df
    Out[458]: 
       A       B
    0  1  [1, 2]
    1  2  [1, 2]

My expected output is: 

       A  B
    0  1  1
    1  1  2
    3  2  1
    4  2  2


What should I do to achieve this?
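
Since pandas 0.25, `DataFrame.explode` does this directly: it emits one row per list element and repeats the other columns and the original index label. A minimal sketch for the single-column case; `reset_index(drop=True)` renumbers the rows, and the `astype` call assumes every list element is an integer. pandas 1.3 and later also accept a list of columns, e.g. `df.explode(['B', 'C'])`, provided the lists in each row have matching lengths.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [1, 2]]})

# One row per list element; column A is repeated for each element of B.
out = df.explode('B').reset_index(drop=True)
# explode leaves B with object dtype, so convert it back to integers.
out['B'] = out['B'].astype('int64')

print(out)
```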

---

Related question 

https://stackoverflow.com/questions/27263805/pandas-when-cell-contents-are-lists-create-a-row-for-each-element-in-the-list

Good question and answers, but they only handle one column of lists. (In my answer, the self-defined function works for multiple columns. Also, the accepted answer uses `apply`, the most time-consuming option, which is not recommended; see https://stackoverflow.com/questions/54432583/when-should-i-ever-want-to-use-pandas-apply-in-my-code for more info.)

| Content Type | Original Author |
|---|---|
| Question | SANM2009 |
| Solution 1 - Python | cs95 |
| Solution 2 - Python | niraj |
| Solution 3 - Python | Rohith Gunda |
| Solution 4 - Python | Ayyasamy |
| Solution 5 - Python | Harshal SG |
