Notes on how to use the find command to find and replace text.
Index
- Delete everything up to the first occurrence of a regexp
- Keep everything between <START> and <END>
- Find and replace over multiple lines with sed
- Find and replace text in multiple files
Delete everything up to the first occurrence of a regexp
find . -type f -name "*.html" | sed -e 's/.*/"&"/' | xargs sed -n '/<div id="title"/,$p' -i.bak
The magic is in the last sed command (from http://www.linuxquestions.org/questions/linux-newbie-8/how-to-use-sed-to-delete-all-lines-before-the-first-match-of-a-pattern-802069/):
sed -n '/sweet/,$p' filename
-n don’t print lines by default
/sweet/,$ restricts the following command, p, to the range from the first occurrence of ‘sweet’ to ‘$’ (the end of the file)
p print
Given a file named filename containing:
whatever
whatever else
sweet and some
other stuff
You’d end up with:
sweet and some
other stuff
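The example above can be reproduced in a scratch directory. This is a minimal sketch: the directory comes from mktemp and the variable names are only illustrative.

```shell
# Recreate the four-line sample file in a throwaway directory.
workdir=$(mktemp -d)
printf '%s\n' 'whatever' 'whatever else' 'sweet and some' 'other stuff' > "$workdir/filename"

# -n turns off automatic printing; p prints only the lines in the
# range that starts at the first /sweet/ match and ends at the last line ($).
result=$(sed -n '/sweet/,$p' "$workdir/filename")
printf '%s\n' "$result"
```

Running it without -i first, as here, is a cheap way to check the range before editing files in place.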
Keep everything between <START> and <END>
I.e. delete everything outside <START> and <END>:
find . -type f -name "computing.1.html" | sed -e 's/.*/"&"/' | xargs sed '/<START>/,/<END>/ !d' -i.bak
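A small sketch of what the range-plus-!d expression does to a single file, assuming a made-up page.html with one <START>/<END> pair. Note that sed works on whole lines, so the lines containing the markers are kept as well.

```shell
# Build a sample file with content before, between and after the markers.
workdir=$(mktemp -d)
printf '%s\n' 'header' '<START>' 'body line' '<END>' 'footer' > "$workdir/page.html"

# Delete (d) every line that is NOT (!) inside the /<START>/,/<END>/ range.
# -i.bak edits in place and keeps the original as page.html.bak.
sed -i.bak '/<START>/,/<END>/!d' "$workdir/page.html"

kept=$(cat "$workdir/page.html")
backup_lines=$(wc -l < "$workdir/page.html.bak" | tr -d ' ')
printf '%s\n' "$kept"
```

Here the edited file holds the three lines from <START> to <END>, and the .bak copy still has all five original lines.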
Find and replace over multiple lines with sed
Source: http://austinmatzko.com/2008/04/26/sed-multi-line-search-and-replace/
Replace <OLD> with <NEW>:
sed -n '1h;1!H;${;g;s/<OLD>/<NEW>/g;p;}'
-n suppresses automatic printing of the pattern space.
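1h copies the first line into the hold space.
1!H appends every other line to the hold space.
$ restricts the commands in braces to the last line.
g copies the hold space to the pattern space.
s/<OLD>/<NEW>/g replaces every <OLD> with <NEW> in the pattern space, which now holds the whole file.
p prints the result.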
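To see the hold-space technique match across a line break, here is a sketch on a made-up four-line file. The file name and the search text are assumptions for illustration; \n in the pattern stands for the newline between two original lines.

```shell
# Sample input where the text to replace spans two lines.
workdir=$(mktemp -d)
printf '%s\n' 'alpha' 'first half' 'second half' 'omega' > "$workdir/input.txt"

# Collect the whole file in the hold space, then on the last line
# pull it back, substitute across the embedded newline, and print once.
joined=$(sed -n '1h;1!H;${;g;s/first half\nsecond half/both halves/g;p;}' "$workdir/input.txt")
printf '%s\n' "$joined"
```

Because the whole file is held in memory before the substitution runs, this approach is probably best kept for files of modest size.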
Find and replace text in multiple files
As always, back up all files before doing this. In the regexp, remember to escape the necessary characters, e.g. slashes.
Use the find command to list all files matching the criteria and pipe through xargs to perl to find and replace the text with a regular expression (regexp) match. This example will replace oldtext with newtext in all .html files:
find . -name '*.html' | xargs perl -pi -e 's/oldtext/newtext/g'
Dealing with spaces in the filename
The following examples are more complex, but they do handle spaces in the filename (the sed line adds double quotes around the filenames). Files that have their text replaced are backed up with a .bak extension.
Find .html files that contain myDiv412 and delete the matching line (whole line match):
find ./ -name "*.*html" -exec grep myDiv412 -l {} \; | sed -e 's/.*/"&"/' | xargs sed -e '/^the line to match$/d' -i.bak
Find .html files that contain myDiv412 and delete the matching line (partial line match):
find ./ -name "*.*html" -exec grep myDiv412 -l {} \; | sed -e 's/.*/"&"/' | xargs sed -e '/^.*part of the line to match.*/d' -i.bak
Find .html files that contain myDiv412 and replace all instances of foo with bar. Note: Must be tested.
find ./ -name "*.*html" -exec grep myDiv412 -l {} \; | sed -e 's/.*/"&"/' | xargs sed -e 's/foo/bar/g' -i.bak
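An alternative sketch that sidesteps filename quoting entirely: find can use one -exec as a test and a second -exec as the action, so each filename is passed to sed as a single argument whatever characters it contains. The sample files below are hypothetical.

```shell
# One file with a space in its name that contains myDiv412, one that does not.
workdir=$(mktemp -d)
printf '%s\n' '<div id="myDiv412">foo</div>' > "$workdir/has space.html"
printf '%s\n' '<div id="other">foo</div>' > "$workdir/untouched.html"

# The first -exec succeeds only when grep finds myDiv412 (-q prints nothing);
# the second -exec runs only for those files and keeps a .bak copy.
find "$workdir" -name '*.*html' \
  -exec grep -q myDiv412 {} \; \
  -exec sed -i.bak 's/foo/bar/g' {} \;

changed=$(cat "$workdir/has space.html")
unchanged=$(cat "$workdir/untouched.html")
printf '%s\n' "$changed" "$unchanged"
```

This runs grep and sed once per file, so it is slower than the xargs pipelines on large trees, but it does not depend on how xargs parses quotes.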