Re: Cut (change in question)



On Nov 30, 9:57 am, Ed Morton <mor...@xxxxxxxxxxxxxx> wrote:
On 11/29/2007 10:44 PM, sant...@xxxxxxxxx wrote:



On Nov 29, 8:49 am, Ed Morton <mor...@xxxxxxxxxxxxxx> wrote:

On 11/28/2007 12:23 AM, sant...@xxxxxxxxx wrote:

Sorry I have posted this before but I have a slight change in the
question.

I have an HTML file. The entire script is on one line. The
following is the script.

<table><tbody><tr><td class="r">Chapter 1: ................................................. </td></tr></tbody></table><p Hare Krishna ......................</p> <p Hare Rama ......................</p>

where .............. is variable text

In the above script I want to delete the text

<table><tbody><tr><td class="r">Chapter 1: ................................................. </td></tr></tbody></table>

where ........ represents variable content.

I have 100 files named 1.htm to 100.htm.

How can I do this using Unix commands rather than selecting the text
and deleting it by hand?

Depending on whether or not "<table>" or "</table>" can occur multiple times on a
line, this may be all you need:

for file in *.htm
do
    sed 's:<table>.*</table>::' "$file" > tmp &&
    mv tmp "$file"
done
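
The reason repeated tags matter is that ".*" is greedy: it matches as much as it can, so with two tables on one line the command removes everything from the first "<table>" to the last "</table>". A minimal illustration, using a made-up sample line rather than the real files:

```shell
# Made-up sample line with two tables (not taken from the original files).
line='<table>A</table><p>keep me</p><table>B</table><p>end</p>'

# Greedy .* spans from the first <table> to the last </table>,
# so the paragraph between the two tables is deleted as well.
result=$(printf '%s\n' "$line" | sed 's:<table>.*</table>::')
printf '%s\n' "$result"
```

Only the trailing paragraph survives here, which is fine when each line holds a single table and destructive otherwise.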

Regards,

Ed.

"<table>" or "</table>" can occur multiple times on a line. What can
be done in that case?

Use all of the unique text you mentioned and replace the chain of periods with ".*":

sed 's#<table><tbody><tr><td class="r">Chapter 1:.*</td></tr></tbody></table>##'
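
Dropped into the earlier loop, that might look like the sketch below. The function name strip_chapter_table and the per-file ".tmp" name are illustrative assumptions, not part of the original suggestion. "#" is used as the sed delimiter because the pattern itself contains both "/" and ":".

```shell
# Hypothetical helper: remove the "Chapter 1" table from each file given.
strip_chapter_table() {
    for file in "$@"
    do
        # Write to a temporary file first; replace the original
        # only if sed exited successfully.
        sed 's#<table><tbody><tr><td class="r">Chapter 1:.*</td></tr></tbody></table>##' "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
    done
}

# Usage (assumed): strip_chapter_table *.htm
```

Trying it on a copy of one file before running it over all 100 is a sensible precaution, since the files are overwritten in place.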

If the text on either side of the ".*" can appear elsewhere on the same line,
then it's a harder problem that needs a different approach.

Ed.

Thank you.