Find and replace text in a 47 GB file


Command Line Problem Overview


I have to do some find-and-replace tasks on a rather big file, about 47 GB in size.

Does anybody know how to do this? I tried tools like TextCrawler, EditPad Lite and more, but nothing supports a file this large.

I'm assuming this can be done via the command line.

Do you have an idea how this can be accomplished?

Command Line Solutions


Solution 1 - Command Line

Sed (stream editor for filtering and transforming text) is your friend.

sed -i 's/old text/new text/g' file

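sed processes the file as a stream, one line at a time, in a single pass, so its memory use does not grow with the file size as long as individual lines are of reasonable length. Note that -i does not truly modify the file in place: sed writes the result to a temporary file and then replaces the original, so you need enough free disk space for a second copy.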
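As a cautious variant of the command above, here is a minimal sketch that keeps a backup copy while editing. The file name sample.txt and its contents are made up for illustration. The -i.bak form, with the suffix attached directly to -i, is accepted by both GNU and BSD sed.

```shell
# Work in a throwaway directory; sample.txt is a hypothetical stand-in for the big file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'old text here\nnothing to change\nmore old text\n' > sample.txt

# -i.bak rewrites sample.txt and keeps the original content as sample.txt.bak.
sed -i.bak 's/old text/new text/g' sample.txt

cat sample.txt
```

On a file of 47 GB the backup is a full second copy, so check free disk space first and delete the .bak file once the result looks right.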

Solution 2 - Command Line

I use FART - Find And Replace Text by Lionello Lunesu.

It works very well on Windows 7 x64.

You can find and replace the text using this command:

fart -c big_filename.txt "find_this_text" "replace_to_this"

The project is available on GitHub.

Solution 3 - Command Line

On Unix or Mac:

sed 's/oldstring/newstring/g' oldfile.txt > newfile.txt

Fast and easy. The original file is left untouched, so you can check newfile.txt before deleting oldfile.txt.

Solution 4 - Command Line

I solved the problem by first using split to break the large file into smaller pieces of 100 MB each.
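The answer does not say what was done with the pieces afterwards. One possible workflow, sketched here under that assumption, is to split on line boundaries, run the replacement on each piece, and join the results. The names big.txt, chunk_ and big_replaced.txt are hypothetical, and the tiny sample uses 2 lines per piece instead of 100 MB.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
# big.txt is a hypothetical stand-in for the large file.
printf 'old 1\nkeep 2\nold 3\nkeep 4\nold 5\n' > big.txt

# Split on line boundaries (2 lines per piece here) so that no line, and
# therefore no match within a line, is cut in half.
# Pieces are named chunk_aa, chunk_ab, chunk_ac, ...
split -l 2 big.txt chunk_

# Run the replacement on each piece, writing a .out file next to it.
for piece in chunk_??; do
    sed 's/old/new/g' "$piece" > "$piece.out"
done

# The shell expands the glob in sorted order, which matches split's naming order.
cat chunk_??.out > big_replaced.txt
cat big_replaced.txt
```

Splitting by raw byte count (split -b) can cut a line, and a match, in two. GNU split also has -C 100M, which caps each piece at 100 MB while still ending it on a line boundary.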

Solution 5 - Command Line

If you are using a Unix-like system, you can use cat | sed to do this:

cat hosted_domains.txt | sed s/com/net/g

This example replaces com with net in a list of domain names. You can then redirect the output to a file.
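Note that the pattern above matches com anywhere on a line, not only at the end of a domain. The following sketch, using made-up sample data in place of hosted_domains.txt, contrasts it with a pattern anchored to a trailing .com. It also lets sed read the file directly, which makes cat unnecessary. The output file names are hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
# Hypothetical sample data standing in for hosted_domains.txt.
printf 'example.com\ncompany.com\nwelcome.org\n' > hosted_domains.txt

# Unanchored: every "com" is replaced, including those inside names.
sed 's/com/net/g' hosted_domains.txt > all_replaced.txt

# Anchored: only a ".com" at the end of a line is replaced.
sed 's/\.com$/.net/' hosted_domains.txt > tld_replaced.txt

cat all_replaced.txt tld_replaced.txt
```

With the unanchored pattern, company.com becomes netpany.net and welcome.org becomes welnete.org. The anchored pattern changes only the trailing .com.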

Solution 6 - Command Line

For me, none of the tools suggested here worked well. TextCrawler ate all my computer's memory, sed didn't work at all, EditPad complained about memory...

The solution is: create your own script in Python, Perl or even C++.

Or use the tool PowerGREP, which is the easiest and fastest option.

I haven't tried FART; it's command line only and maybe not very friendly.
Some hex editors, such as UltraEdit, also work well.

Solution 7 - Command Line

I used

sed 's/[nN]//g' oldfile.fasta > newfile.fasta

to remove all instances of n and N from my 7 GB file.

If I omitted the > newfile.fasta redirection, it took ages because sed printed every line of the file to the terminal.

With > newfile.fasta, it ran in a matter of seconds on an Ubuntu server.

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | Shrayas | View Question on Stack Overflow
Solution 1 - Command Line | Ryan | View Answer on Stack Overflow
Solution 2 - Command Line | JorgeKlemm | View Answer on Stack Overflow
Solution 3 - Command Line | Ignacio Carvajal | View Answer on Stack Overflow
Solution 4 - Command Line | Antonio Vandré P F Gomes | View Answer on Stack Overflow
Solution 5 - Command Line | Devraj | View Answer on Stack Overflow
Solution 6 - Command Line | skan | View Answer on Stack Overflow
Solution 7 - Command Line | Julian | View Answer on Stack Overflow