Re: Sed: removing XML headers
- From: bruce_phipps@xxxxxxxxxxx
- Date: 29 Mar 2007 02:31:39 -0700
On 28 Mar, 19:08, Janis Papanagnou <Janis_Papanag...@xxxxxxxxxxx>
wrote:
bruce_phi...@xxxxxxxxxxx wrote:
I am trying to concatenate several XML files (test01.xml, test02.xml,
test03.xml) into a single XML file.
cat test*.xml > out.xml
concatenates the files into one big file.
But the resulting XML file is invalid due to having several XML header
and DOCTYPE tags within the document:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
If it is always the first two lines that you have to skip, you may use
awk 'FNR==NR||FNR>2' test*.xml
If you want to match against the specific patterns (assuming that
no XML header spans more than one line)
awk 'FNR==NR||!(/<!DOCTYPE/||/<\?xml version/)' test*.xml
The FNR==NR part is true only while awk reads the first file, which ensures
that one (the first) header remains included.
Janis
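To see what the FNR==NR test does, here is a small sketch that can be run on its own. The temporary directory, the file contents and the shortened DOCTYPE are made up for illustration; only the awk one-liner comes from the post above.

```shell
# Hypothetical sample files, created only so the sketch is self-contained.
dir=$(mktemp -d)
printf '%s\n' '<?xml version="1.0" encoding="UTF-8"?>' '<!DOCTYPE html>' '<a/>' > "$dir/test01.xml"
printf '%s\n' '<?xml version="1.0" encoding="UTF-8"?>' '<!DOCTYPE html>' '<b/>' > "$dir/test02.xml"

# NR counts lines across all input files; FNR restarts at 1 for each file.
# FNR==NR is therefore true only for lines of the first file, which is
# printed whole. For every later file only lines 3 and up (FNR>2) pass.
awk 'FNR==NR || FNR>2' "$dir"/test*.xml > "$dir/out.xml"

cat "$dir/out.xml"
```

With these two sample files the output should contain the header lines once, followed by the body lines of both files.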
How can I use sed to remove the XML headers within the output file?
Why sed?
Thanks for all the replies.
The problem seems to be that the XML is not line-based. It all wraps
into one continuous stream.
I think sed is line-based.
So maybe I should consider other alternatives...
Bruce
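If each file really is one continuous line, one possible approach is to delete the declarations by pattern instead of by line number. The following is only a sketch under some assumptions: the sample files and the temporary directory are invented, each declaration is assumed to sit entirely on one line, and the DOCTYPE is assumed to contain no ">" before its closing one (so no internal subset).

```shell
# Hypothetical single-line sample files, created only so the sketch runs.
dir=$(mktemp -d)
hdr='<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">'
printf '%s\n' "$hdr<html><body>one</body></html>" > "$dir/test01.xml"
printf '%s\n' "$hdr<html><body>two</body></html>" > "$dir/test02.xml"

# Copy the first file unchanged so that one header survives; for every
# later file, strip the XML declaration and the DOCTYPE wherever they
# occur on a line. In a basic regular expression "?" is a literal
# character, so "<?xml[^>]*?>" matches "<?xml ... ?>".
first=1
for f in "$dir"/test*.xml; do
  if [ "$first" -eq 1 ]; then
    cat "$f"
    first=0
  else
    sed -e 's/<?xml[^>]*?>//g' -e 's/<!DOCTYPE[^>]*>//g' "$f"
  fi
done > "$dir/out.xml"

cat "$dir/out.xml"
```

sed still reads the input line by line here, but the substitution works inside a line, so it does not matter that the header and the content share one line. A declaration that is split across two lines would not be matched by this sketch.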