If you only want to read the first 999,999 (non-header) rows:

    read_csv(..., nrows=999999)

If you only want to read rows 1,000,000 ... 1,999,999:

    read_csv(..., skiprows=1000000, nrows=999999)

***nrows*** : int, default None
Number of rows of file to read. Useful for reading pieces of large files.

***skiprows*** : list-like or integer
Row numbers to skip (0-indexed) or number of rows to skip (int) at the start of the file

For large files, you'll probably also want to use chunksize:

***chunksize*** : int, default None
Return TextFileReader object for iteration


[pandas.io.parsers.read_csv documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html)
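The three parameters above can be tried on a small scale. The following is a minimal sketch, assuming a tiny in-memory CSV (the `csv_text` data is made up for illustration) in place of a large file on disk. Note that an integer `skiprows` also skips the header line, so the sketch passes a range to keep it.

```python
import io
import pandas as pd

# A small in-memory CSV stands in for the large file (illustrative data only).
csv_text = "id,value\n" + "\n".join(f"{i},{i * 10}" for i in range(10))

# Read only the first 3 data rows; the header row is not counted by nrows.
first = pd.read_csv(io.StringIO(csv_text), nrows=3)

# Read the next 3 data rows: skip file lines 1..3 (the first 3 data rows)
# but keep line 0, the header, by passing a range instead of a plain integer.
middle = pd.read_csv(io.StringIO(csv_text), skiprows=range(1, 4), nrows=3)

# Iterate over the file in chunks of 4 rows instead of loading it all at once.
chunk_sizes = [len(chunk) for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4)]
```

With `chunksize`, each iteration yields an ordinary DataFrame, so only one chunk needs to fit in memory at a time.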

I have a large table with 10 million records.

What is the best way to run this query?

       Delete LargeTable where readTime < dateadd(MONTH,-7,GETDATE())

How to delete large data of table in SQL without log?

What is the difference between up-casting and down-casting with respect to class variable?

For example, in the following program the class Animal contains only one method but the class Dog contains two. How do we cast the Dog variable to an Animal variable?

Once the cast is done, how can we call Dog's other method through the Animal variable?


    class Animal 
    { 
        public void callme()
        {
            System.out.println("In callme of Animal");
        }
    }
    
        
    class Dog extends Animal 
    { 
        public void callme()
        {
            System.out.println("In callme of Dog");
        }

        public void callme2()
        {
            System.out.println("In callme2 of Dog");
        }
    }
            
    public class UseAnimlas 
    {
        public static void main (String [] args) 
        {
            Dog d = new Dog();      
            Animal a = (Animal)d;
            d.callme();
            a.callme();
            ((Dog) a).callme2();
        }
    }

    

What is the difference between up-casting and down-casting with respect to class variable

I have a very large data set and I can't afford to read the entire data set in. So I'm thinking of reading only one chunk of it to train on, but I have no idea how to do it. Any thoughts will be appreciated.

Python Pandas: How to read only first n rows of CSV files in?



I was playing around with timeit and noticed that doing a simple list comprehension over a small string took longer than doing the same operation on a list of small single-character strings. Any explanation? It takes almost 1.35 times as long.

    >>> from timeit import timeit
    >>> timeit("[x for x in 'abc']")
    2.0691067844831528
    >>> timeit("[x for x in ['a', 'b', 'c']]")
    1.5286479570345861

What's happening on a lower level that's causing this?

Why is it slower to iterate over a small string than a small list?

I'm a C coder developing something in Python. I know how to do the following in C (and hence in C-like logic applied to Python), but I'm wondering what the "Pythonic" way of doing it is.

I have a dictionary d, and I'd like to operate on a subset of the items: only those whose key (a string) contains a specific substring.

i.e. the C logic would be:

    for key in d:
        if filter_string in key:
            pass  # do something
        else:
            continue  # do nothing

I'm imagining the Python version would be something like

    filtered_dict = crazy_python_syntax(d, substring)
    for key,value in filtered_dict.iteritems():
        # do something


I've found a lot of posts on here regarding filtering dictionaries, but couldn't find one which involved exactly this.

My dictionary is not nested and I'm using Python 2.7.



filter items in a python dictionary where keys contain a specific string
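One common way to express this kind of filter is a dictionary comprehension. The sketch below is only an illustration: the helper name `filter_by_key_substring` and the sample dictionary are hypothetical, and it is written for Python 3 (on Python 2.7, `iteritems()` could replace `items()` to avoid building an intermediate list).

```python
def filter_by_key_substring(d, substring):
    """Return a new dict holding only the items whose key contains substring."""
    return {key: value for key, value in d.items() if substring in key}


# Hypothetical sample data for illustration
d = {"apple_red": 1, "apple_green": 2, "banana": 3}

filtered = filter_by_key_substring(d, "apple")
for key, value in filtered.items():
    print(key, value)  # do something with each matching item
```

If the filtered dictionary is only needed once, the condition can also go straight into the loop over `d.items()`, which avoids building a second dictionary.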

I am currently using python 2.7 and trying to open an Excel sheet.
When using the code below:
    
    import os
    from win32com.client import Dispatch
    
    xlApp = win32com.client.Dispatch("Excel.Application")
    xlApp.Visible = True
    # Open the file we want in Excel
    workbook = xlApp.Workbooks.Open('example.xls')
    
I get this error:
    
> ImportError: No module named win32com.client

Could the error be caused by my using a 64-bit Windows machine?

ImportError: No module named win32com.client

I'm new to this area, so I have a question. I recently started working with Python and Django. I installed PyCharm Community Edition as my IDE, but I'm unable to create a Django project.

I looked for some tutorials, and they show an option to select a "project type", but in the latest version this option is missing. Can someone tell me how to do this?


How to set up a Django project in PyCharm

I am trying to upload an image through the admin page, but it keeps saying:

    [Errno 13] Permission denied: '/path/to/my/site/media/userfolder/2014/05/26'

The folders ``userfolder/2014/05/26`` are created dynamically while uploading.

In the traceback, I found that the error occurs in /usr/lib64/python2.6/os.py, line 157, while calling

    mkdir(name, mode)

meaning that it cannot create any folder because it does not have permission to do so.

The server runs OpenSuse. In httpd.conf, I have this:

    <Directory /path/to/my/site/media>
       Order allow,deny
       Allow from all
    </Directory>

Do I have to chmod or chown something? 


OSError - Errno 13 Permission denied

This is my DataFrame, which should be repeated 5 times:

    >>> x = pd.DataFrame({'a':1,'b':2}, index = range(1))
    >>> x
       a  b
    0  1  2

I want to have the result like this:

    &gt;&gt;&gt; x.append(x).append(x).append(x)
       a  b
    0  1  2
    0  1  2
    0  1  2
    0  1  2

But there must be a smarter way than appending 4 times. Actually the DataFrame I’m working on should be repeated 50 times.

I haven't found anything practical, including things like `np.repeat`, which just doesn't work on a DataFrame.

Could anyone help?



How to repeat a Pandas DataFrame?
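One possible approach to the question above is `pd.concat` on a list holding the same frame several times. This is a minimal sketch using the DataFrame from the question; the repeat count of 5 and the choice to renumber the index with `ignore_index=True` are assumptions.

```python
import pandas as pd

x = pd.DataFrame({"a": 1, "b": 2}, index=range(1))

# Build a list holding the same frame 5 times and concatenate it in one call.
# ignore_index=True renumbers the rows 0..4 instead of repeating index 0.
repeated = pd.concat([x] * 5, ignore_index=True)
```

Dropping `ignore_index=True` keeps the original index label on every copy, which matches the output shown in the question.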

Is there a way to conveniently merge two data frames side by side?

Both data frames have 30 rows but a different number of columns: say, df1 has 20 columns and df2 has 40 columns.

How can I easily get a new data frame of 30 rows and 60 columns?


    df3 = pd.someSpecialMergeFunct(df1, df2)

Or maybe there is some special parameter in append:

    df3 = pd.append(df1, df2, left_index=False, right_index=false, how='left')


PS: If possible, I hope duplicated column names could be resolved automatically.

thanks!

How to merge two dataframes side-by-side?
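As a sketch of one way to do this, `pd.concat` with `axis=1` places frames next to each other, and `DataFrame.join` with suffixes renames overlapping column names. The small frames below are made-up stand-ins for the 30-row frames in the question, and the suffix strings are arbitrary choices.

```python
import pandas as pd

# Small stand-ins for the two frames; both share the default 0..2 index.
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df2 = pd.DataFrame({"b": [7, 8, 9], "c": [10, 11, 12]})

# axis=1 places the frames next to each other, aligning rows on the index.
# Duplicate column names are kept as they are.
side_by_side = pd.concat([df1, df2], axis=1)

# join() also aligns on the index; the suffixes rename only the overlapping columns.
joined = df1.join(df2, lsuffix="_left", rsuffix="_right")
```

Both calls align rows by index label rather than by position, so frames with different indexes may need `reset_index(drop=True)` first.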

I am trying to write a paper in an IPython notebook, but have run into some issues with the display format. Say I have the following dataframe `df`. Is there any way to format `var1` and `var2` as 2-digit decimals and `var3` as a percentage?

            var1      var2      var3
    id
    0   1.458315  1.500092 -0.005709
    1   1.576704  1.608445 -0.005122
    2   1.629253  1.652577 -0.004754
    3   1.669331  1.685456 -0.003525
    4   1.705139  1.712096 -0.003134
    5   1.740447  1.741961 -0.001223
    6   1.775980  1.770801 -0.001723
    7   1.812037  1.799327 -0.002013
    8   1.853130  1.822982 -0.001396
    9   1.943985  1.868401  0.005732

The numbers in `var3` have not been multiplied by 100, e.g. -0.0057 = -0.57%.

Format certain floating dataframe columns into percentage in pandas

If the dataframe looks like:

    Store,Dept,Date,Weekly_Sales,IsHoliday
    1,1,2010-02-05,24924.5,FALSE
    1,1,2010-02-12,46039.49,TRUE
    1,1,2010-02-19,41595.55,FALSE
    1,1,2010-02-26,19403.54,FALSE
    1,1,2010-03-05,21827.9,FALSE
    1,1,2010-03-12,21043.39,FALSE
    1,1,2010-03-19,22136.64,FALSE
    1,1,2010-03-26,26229.21,FALSE
    1,1,2010-04-02,57258.43,FALSE

I want to duplicate the rows where `IsHoliday` is TRUE. I can do:

    is_hol = df['IsHoliday'] == True
    df_try = df[is_hol]
    df=df.append(df_try*10)

But is there a better way to do this? I need to duplicate the holiday rows 5 times, and with the approach above I have to append 5 times.

Python Pandas replicate rows in dataframe

With the nice indexing methods in Pandas I have no problems extracting data in various ways. On the other hand I am still confused about how to change data in an existing DataFrame. 

In the following code I have two DataFrames and my goal is to update values in a specific row in the first df from values of the second df. How can I achieve this?


    import pandas as pd
    df = pd.DataFrame({'filename' :  ['test0.dat', 'test2.dat'], 
                                      'm': [12, 13], 'n' : [None, None]})
    df2 = pd.DataFrame({'filename' :  'test2.dat', 'n':16}, index=[0])

    # this overwrites the first row but we want to update the second
    # df.update(df2)

    # this does not update anything
    df.loc[df.filename == 'test2.dat'].update(df2)

    print(df)

gives 
    
       filename   m     n
    0  test0.dat  12  None
    1  test2.dat  13  None
    
    [2 rows x 3 columns]
    
but how can I achieve this:

        filename   m     n
    0  test0.dat  12  None
    1  test2.dat  13  16
    
    [2 rows x 3 columns]

How to update values in a specific row in a Python Pandas DataFrame?

I would like to import product descriptions that need to be logically broken up by things like description, dimensions, finishes, etc. How can I insert a line break so that it shows up when I import the file?

Adding a newline character within a cell (CSV)
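In the CSV format, a field may contain line breaks as long as the whole field is wrapped in double quotes. The sketch below shows this with Python's standard `csv` module, which adds the quotes automatically; the product data is made up for illustration, and whether the importing application honors quoted line breaks is an assumption that depends on that application.

```python
import csv
import io

# Hypothetical product row: the second cell contains two line breaks.
rows = [
    ["name", "details"],
    ["Oak table", "Description: solid oak\nDimensions: 120 x 80 cm\nFinishes: natural, walnut"],
]

buffer = io.StringIO()
# The writer wraps any field containing a newline or comma in double quotes.
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()

# Reading the text back returns the multi-line cell unchanged.
parsed = list(csv.reader(io.StringIO(csv_text, newline="")))
```

When writing to a real file instead of a `StringIO`, the `csv` documentation recommends opening it with `newline=""` so that embedded line breaks are not altered.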

I have a CSV file with about 2000 records. 

Each record has a string, and a category to it:

```none
This is the first line,Line1
This is the second line,Line2
This is the third line,Line3
```

I need to read this file into a list that looks like this:

```python
data = [('This is the first line', 'Line1'),
        ('This is the second line', 'Line2'),
        ('This is the third line', 'Line3')]
```

How can I import this CSV into the list I need using Python?

Python import csv to list
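A minimal sketch of one way to get that list with the standard `csv` module follows. To keep the example self-contained, an in-memory string with the three sample records stands in for the real file; with a file on disk, `open("file.csv", newline="")` (the file name is hypothetical) would take the place of the `StringIO` object.

```python
import csv
import io

# In-memory stand-in for the CSV file, using the sample records from the question.
csv_text = (
    "This is the first line,Line1\n"
    "This is the second line,Line2\n"
    "This is the third line,Line3\n"
)

with io.StringIO(csv_text, newline="") as f:
    # csv.reader yields each record as a list of strings; convert each to a tuple.
    data = [tuple(row) for row in csv.reader(f)]
```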

I'm trying to output some data to a .csv file. It does write to the file, but it isn't separating the data into different columns and seems to be outputting the data incorrectly.

        ofstream Morison_File ("linear_wave_loading.csv");         //Opening file to print info to
        Morison_File << "Time Force(N/m)" << endl;          //Headings for file
        for (t = 0; t <= 20; t++) {
          u = sin(omega * t);
          du = cos(omega * t); 
          F = (0.5 * rho * C_d * D * u * fabs(u)) + rho * Area * C_m * du; 
          
          cout << "t = " << t << "\t\tF = " << F << endl;
          Morison_File << t;                                 //Printing to file
          Morison_File << F;

        }
     
         Morison_File.close();

Time and Force(N/m) are in columns A and B respectively, but the t and F values are all printing on the first row.

What is the syntax to separate them to print t into column A and F into column B?

Writing .csv files from C++

I've read something about a Python 2 limitation with respect to Pandas' to_csv( ... etc ...). Have I hit it? I'm on Python 2.7.3.

This turns out trash characters for ≥ and − when they appear in strings. Aside from that, the export is perfect.

    df.to_csv("file.csv", encoding="utf-8") 

Is there any workaround?

df.head() is this:


    demography  Adults ≥49 yrs  Adults 18−49 yrs at high risk||  \
    state                                                           
    Alabama                 32.7                             38.6   
    Alaska                  31.2                             33.2   
    Arizona                 22.9                             38.8   
    Arkansas                31.2                             34.0   
    California              29.8                             38.8  

csv output is this

    state, Adults â‰¥49 yrs, Adults 18âˆ’49 yrs at high risk||
    0, Alabama, 32.7, 38.6
    1, Alaska, 31.2, 33.2
    2, Arizona, 22.9, 38.8
    3, Arkansas, 31.2, 34
    4, California, 29.8, 38.8


the whole code is this:  

    import pandas
    import xlrd
    import csv
    import json

    df = pandas.DataFrame()
    dy = pandas.DataFrame()
    # first merge all this xls together


    workbook = xlrd.open_workbook('csv_merger/vaccoverage.xls')
    worksheets = workbook.sheet_names()


    for i in range(3,len(worksheets)):
        dy = pandas.io.excel.read_excel(workbook, i, engine='xlrd', index=None)
        i = i+1
        df = df.append(dy)
    
    df.index.name = "index"

    df.columns = ['demography', 'area','state', 'month', 'rate', 'moe']

    #Then just grab month = 'May'

    may_mask = df['month'] == "May"
    may_df = (df[may_mask])

    #then delete some columns we dont need

    may_df = may_df.drop('area', 1)
    may_df = may_df.drop('month', 1)
    may_df = may_df.drop('moe', 1)


    print may_df.dtypes #uh oh, it sees 'rate' as type 'object', not 'float'.  Better change that.

    may_df = may_df.convert_objects('rate', convert_numeric=True)

    print may_df.dtypes #that's better

    res = may_df.pivot_table('rate', 'state', 'demography')
    print res.head()


    #and this is going to spit out an array of Objects, each Object a state containing its demographics
    res.reset_index().to_json("thejson.json", orient='records')
    #and a .csv for good measure
    res.reset_index().to_csv("thecsv.csv", orient='records', encoding="utf-8")

Pandas df.to_csv("file.csv" encode="utf-8") still gives trash characters for minus sign

I am trying to read a large CSV file (approx. 6 GB) in pandas and I am getting a memory error:

    MemoryError                               Traceback (most recent call last)
    <ipython-input-58-67a72687871b> in <module>()
    ----> 1 data=pd.read_csv('aphro.csv',sep=';')

    ...
    
    MemoryError: 

Any help on this?



How do I read a large csv file with pandas?

I need to read and write data to/from a text file, but I haven't been able to figure out how.

I found this sample code in the Swift iBook, but I still don't know how to write or read data.

    import Cocoa
    
    class DataImporter
    {
    	/*
    	DataImporter is a class to import data from an external file.
    	The class is assumed to take a non-trivial amount of time to initialize.
    	*/
    	var fileName = "data.txt"
    	// the DataImporter class would provide data importing functionality here
    }
    
    class DataManager
    {
    	@lazy var importer = DataImporter()
    	var data = String[]()
    	// the DataManager class would provide data management functionality here
    }
    
    let manager = DataManager()
    manager.data += "Some data"
    manager.data += "Some more data"
    // the DataImporter instance for the importer property has not yet been created
    
    println(manager.importer.fileName)
    // the DataImporter instance for the importer property has now been created
    // prints "data.txt"
    
    
    
    var str = "Hello World in Swift Language."

Read and write a String from text file

I want to have a place to store my image files to use in my Java project (a really simple class that just loads an image onto a panel). I have looked everywhere and cannot find how to do this. How do I do this?

I have tried adding a new folder to the project, adding a new class folder to the project, and adding a new source folder to the project. No matter what I do, I always get an `IOException`. The folders always say they are on the build path, so I'm not sure what to do.

    import java.awt.Color;
    import java.awt.Dimension;
    import java.awt.Graphics;
    import java.awt.image.BufferedImage;
    import java.io.File;
    import java.io.IOException;
    import javax.imageio.ImageIO;
    import javax.swing.JFrame;
    import javax.swing.JPanel;

    public class PracticeFrame extends JFrame{
	
	private static BufferedImage image;
	Thread thread;
	
	public PracticeFrame() {
		super();
		setPreferredSize(new Dimension(640,480));
		setResizable(false);
		setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
		pack();
		setVisible(true);
	}
	
	public static void main (String[] args) {
		PracticeFrame pframe = new PracticeFrame();
		try {
			image = ImageIO.read(new File("/islands.png"));
		} catch (IOException e) {
			e.printStackTrace();
		}
		
		JPanel panel = new JPanel() {
			@Override
			protected void  paintComponent(Graphics g) {
				super.paintComponent(g);
				g.drawImage(image,0,0,null);
			}
		};
		
		panel.setBackground(Color.BLUE);
		panel.repaint();
		pframe.add(panel);
		
		
	}
	
	
    }

EDIT: Something that worked for me, and I have no idea why, was adding the `main/res/` folder as a class folder and then removing it. I ran it while `/main/res/` was part of the build path as a class folder and it still didn't work. When I added it, I got a popup that told me something about exclusion filters. But when I removed the folder from the libraries in the build path and changed my file path to:

    image = ImageIO.read(new File("src/main/res/islands.png"));

I at least stopped getting the `IOException`. I must not be adding the image to the panel correctly, because it's not showing up, but at least it found the file (I think).

How do I add a resources folder to my Java project in Eclipse

I am writing a piece of code:

    OutputStream outputStream = new FileOutputStream(createdFile);
    GZIPOutputStream gzipOutputStream = new GZIPOutputStream(outputStream);
    BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(gzipOutputStream));

Do I need to close every stream or writer like the following?

    gzipOutputStream.close();
    bw.close();
    outputStream.close();

Or will just closing the last stream be fine?

    bw.close();


Is it necessary to close each nested OutputStream and Writer separately?

For some reason I keep getting `java.nio.file.AccessDeniedException` every time I try to write to a folder on my computer using a java webapp on Tomcat. This folder has permissions set to full control for everyone on my computer (Windows). Does anybody know why I get this exception?


Here's my code:


    public void saveDocument(String name, String siteID, byte doc[]) {
        try {
            Path path = Paths.get(rootDirectory + siteID);
            if (Files.exists(path)) {
                System.out.println("Exists: " + path.toString());
                Files.write(path, doc);
            } else {
                System.out.println("Doesn't exist");
                throw new Exception("Directory for Site with ID " + siteID + " doesn't exist");
            }
        } catch (FileSystemException e) {
            System.out.println("Exception: " + e);
            e.printStackTrace();
        } catch (IOException e ) {
            System.out.println("Exception: " + e);
            e.printStackTrace();
        } catch (Exception e) {
            System.out.println("Exception: " + e);
            e.printStackTrace();
        }
    }


And here is the error:

    Exception: java.nio.file.AccessDeniedException: C:\safesite_documents\site1
    java.nio.file.AccessDeniedException: C:\safesite_documents\site1
        at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:83)
        at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
        at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:102)
        at sun.nio.fs.WindowsFileSystemProvider.newByteChannel(WindowsFileSystemProvider.java:230)
        at java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:430)
        at java.nio.file.Files.newOutputStream(Files.java:172)
        at java.nio.file.Files.write(Files.java:3092)


Possible reason why: [See my post on Super User about how I can't uncheck 'Read Only' for any of my folders on Windows 7, even though the folders don't appear to be read-only to anything but Java.][1]


  [1]: https://superuser.com/questions/881348/cant-unset-read-only-only-applies-to-files-in-folder-in-windows-7

Getting "java.nio.file.AccessDeniedException" when trying to write to a folder

I'm using [Requests][1] to upload a PDF to an API. It is stored as "response" below. I'm trying to write that out to Excel.

    import requests

    files = {'f': ('1.pdf', open('1.pdf', 'rb'))}
    response = requests.post("https://pdftables.com/api?&format=xlsx-single",files=files)
    response.raise_for_status() # ensure we notice bad responses
    file = open("out.xls", "w")
    file.write(response)
    file.close()

I'm getting the error:

    file.write(response)
    TypeError: expected a character buffer object


  [1]: http://docs.python-requests.org/en/latest/api/

Content Type	Original Author	Original Content on Stackoverflow
Question	bensw	View Question on Stackoverflow
Solution 1 - Python	smci	View Answer on Stackoverflow
