Remove Top Line of Text File with PowerShell

Powershell

Powershell Problem Overview


I am trying to just remove the first line of about 5000 text files before importing them.

I am still very new to PowerShell so not sure what to search for or how to approach this. My current concept using pseudo-code:

set-content file (get-content unless line contains amount)

However, I can't seem to figure out how to do something like contains.

Powershell Solutions


Solution 1 - Powershell

While I really admire the answer from @hoge both for a very concise technique and a wrapper function to generalize it and I encourage upvotes for it, I am compelled to comment on the other two answers that use temp files (it gnaws at me like fingernails on a chalkboard!).

Assuming the file is not huge, you can force the pipeline to operate in discrete sections--thereby obviating the need for a temp file--with judicious use of parentheses:

(Get-Content $file | Select-Object -Skip 1) | Set-Content $file

... or in short form:

(gc $file | select -Skip 1) | sc $file

Solution 2 - Powershell

It is not the most efficient in the world, but this should work:

get-content $file |
    select -Skip 1 |
    set-content "$file-temp"
move "$file-temp" $file -Force

Solution 3 - Powershell

Using variable notation, you can do it without a temporary file:

${C:\file.txt} = ${C:\file.txt} | select -skip 1

function Remove-Topline ( [string[]]$path, [int]$skip=1 ) {
  if ( -not (Test-Path $path -PathType Leaf) ) {
    throw "invalid filename"
  }

  ls $path |
    % { iex "`${$($_.fullname)} = `${$($_.fullname)} | select -skip $skip" }
}

Solution 4 - Powershell

I just had to do the same task, and gc | select ... | sc took over 4 GB of RAM on my machine while reading a 1.6 GB file. It didn't finish for at least 20 minutes after reading the whole file in (as reported by Read Bytes in Process Explorer), at which point I had to kill it.

My solution was to use a more .NET approach: StreamReader + StreamWriter. See this answer for a great answer discussing the perf: https://stackoverflow.com/questions/2159763/in-powershell-whats-the-most-efficient-way-to-split-a-large-text-file-by-recor

Below is my solution. Yes, it uses a temporary file, but in my case, it didn't matter (it was a freaking huge SQL table creation and insert statements file):

PS> (measure-command{
    $i = 0
    $ins = New-Object System.IO.StreamReader "in/file/pa.th"
    $outs = New-Object System.IO.StreamWriter "out/file/pa.th"
    while( !$ins.EndOfStream ) {
        $line = $ins.ReadLine();
        if( $i -ne 0 ) {
            $outs.WriteLine($line);
        }
        $i = $i+1;
    }
    $outs.Close();
    $ins.Close();
}).TotalSeconds

It returned:

188.1224443

Solution 5 - Powershell

Inspired by AASoft's answer, I went out to improve it a bit more:

  1. Avoid the loop variable $i and the comparison with 0 in every loop
  2. Wrap the execution into a try..finally block to always close the files in use
  3. Make the solution work for an arbitrary number of lines to remove from the beginning of the file
  4. Use a variable $p to reference the current directory

These changes lead to the following code:

$p = (Get-Location).Path

(Measure-Command {
    # Number of lines to skip
    $skip = 1
    $ins = New-Object System.IO.StreamReader ($p + "\test.log")
    $outs = New-Object System.IO.StreamWriter ($p + "\test-1.log")
    try {
        # Skip the first N lines, but allow for fewer than N, as well
        for( $s = 1; $s -le $skip -and !$ins.EndOfStream; $s++ ) {
            $ins.ReadLine()
        }
        while( !$ins.EndOfStream ) {
            $outs.WriteLine( $ins.ReadLine() )
        }
    }
    finally {
        $outs.Close()
        $ins.Close()
    }
}).TotalSeconds

The first change brought the processing time for my 60 MB file down from 5.3s to 4s. The rest of the changes is more cosmetic.

Solution 6 - Powershell

$x = get-content $file
$x[1..$x.count] | set-content $file

Just that much. Long boring explanation follows. Get-content returns an array. We can "index into" array variables, as demonstrated in this and other Scripting Guys posts.

For example, if we define an array variable like this,

$array = @("first item","second item","third item")

so $array returns

first item
second item
third item

then we can "index into" that array to retrieve only its 1st element

$array[0]

or only its 2nd

$array[1]

or a range of index values from the 2nd through the last.

$array[1..$array.count]

Solution 7 - Powershell

I just learned from a website:

Get-ChildItem *.txt | ForEach-Object { (get-Content $_) | Where-Object {(1) -notcontains $_.ReadCount } | Set-Content -path $_ }

Or you can use the aliases to make it short, like:

gci *.txt | % { (gc $_) | ? { (1) -notcontains $_.ReadCount } | sc -path $_ }

Solution 8 - Powershell

Another approach to remove the first line from file, using multiple assignment technique. Refer Link

 $firstLine, $restOfDocument = Get-Content -Path $filename 
 $modifiedContent = $restOfDocument 
 $modifiedContent | Out-String | Set-Content $filename

Solution 9 - Powershell

skip` didn't work, so my workaround is

$LinesCount = $(get-content $file).Count
get-content $file |
    select -Last $($LinesCount-1) | 
    set-content "$file-temp"
move "$file-temp" $file -Force

Solution 10 - Powershell

For smaller files you could use this:

& C:\windows\system32\more +1 oldfile.csv > newfile.csv | out-null

... but it's not very effective at processing my example file of 16MB. It doesn't seem to terminate and release the lock on newfile.csv.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionBuddy LindseyView Question on Stackoverflow
Solution 1 - PowershellMichael SorensView Answer on Stackoverflow
Solution 2 - PowershellRichard BergView Answer on Stackoverflow
Solution 3 - PowershellhogeView Answer on Stackoverflow
Solution 4 - PowershellAASoftView Answer on Stackoverflow
Solution 5 - PowershellOliverView Answer on Stackoverflow
Solution 6 - PowershellnoamView Answer on Stackoverflow
Solution 7 - PowershellLuke DuView Answer on Stackoverflow
Solution 8 - PowershellVenkataraman RView Answer on Stackoverflow
Solution 9 - PowershellMariusz BiesiekierskiView Answer on Stackoverflow
Solution 10 - PowershelldanielmbarlowView Answer on Stackoverflow