Re: Extracting html links from text
- From: pk <pk@xxxxxxxxxx>
- Date: Tue, 24 Feb 2009 22:24:58 +0100
On Tuesday 24 February 2009 21:55, admin@xxxxxxxxxx wrote:
<span class="Apple-style-span" style="font-size:small;">: </span></
span></span></span><span style="font-style: itali
c;"><span style="font-weight: bold;"><a href="http://www.4shared.com/
file/66558677/7c74f732
/N.html" target="_self"><span class="Apple-style-span" style="font-
family:verdana;"><span class="Apple-style-span" style="font-
size:small;">DOWNLOAD</span></span></a><span class="Apple-style-span"
style="font-family:verdana;"><span class="Apple-style-span"
style="font-size:
small;"><br /><br />4)A </span></span></span></span><span style="font-
style: italic;"><span style="font-weight: bold;"><span class="Apple-
style-span" style="font-family:verdana;"><span class="Apple-style-
span" style="font-size:small;">English<br />Movie</span></span></
span></span><span class="Apple-style-span" style="font-
family:verdana;"><span class="Apple-style-span" style="font-size:
small;">: </span></span><span style="font-style: italic;"><span
style="font-weight: bold;"><span class="Apple-style-span" style="font-
family:verdana;"><span class="Apple-style-span" style="font-
size:small;">B<br />English: </span></span><a href="http://www.
4shared.com/file/28821701
/7c575b14/B.html" target="_self"><span class="Apple-style-span"
style="font-family:verdana;"><span class="Apple-style-span"
style="font-size:small;">DOWNLOAD</span></span></a><span class="Apple-
style-span" style="font-family:verdana;"><span class="Apple-style-
span" style="font-size:small;"><br /><br />5)E<br />Movie : P<br
/>English : </span></span><a href="http://www.4shared.com/
file/67065459/e90e54d3/E.html" target="_self"><span class="Apple-style-
span" style="font-family:verdana;"><span class="Apple-style-span"
style="font-size:small;">DOWNLOAD</span></span>
I need to extract all HTML links that start with http and end with
html. A link can span multiple lines. The expected output is:
http://www.4shared.com/file/66558677/7c74f732/N.html
http://www.4share...
Thanx
-Ad
This GNU awk one-liner works on your sample input:
gawk -F\" -v RS="<a href=\"" 'NR>1{gsub(/[[:space:]]/,""); print $1}' file
but if the input is any more complicated, take a look at XMLgawk.
Ed.
We don't have gawk installed... anything that works with plain
awk, perl, sed, etc. would be fine.
You can combine Perl's non-greedy quantifiers with its ability to return
the captured subexpressions of a match:
perl -n0e '@m = m|<a href="(http://.*?\.html)"|gs; s/\s+//g for @m; print "$_\n" for @m'
.
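POSIX leaves the behaviour of a multi-character RS unspecified, so the RS trick in the gawk one-liner is not portable to other awks. Below is a minimal sketch for a plain POSIX awk, assuming the input fits in memory and that the URLs contain no whitespace of their own (any whitespace found inside a matched URL is treated as line wrapping and removed). The function name extract_links is hypothetical.

```shell
# extract_links: hypothetical helper name; a sketch for POSIX awk (no gawk
# extensions such as a multi-character RS). Reads the named files, or stdin.
extract_links() {
    awk '
    # Keep the whole input in one string, newlines included, so a link
    # that was wrapped across several lines can be matched in one piece.
    { buf = buf $0 "\n" }
    END {
        # [^"]* cannot cross a double quote, so each match is one href value
        # that starts with http and ends with .html.
        while (match(buf, /<a[ \t\r\n]+href="http[^"]*\.html"/)) {
            link = substr(buf, RSTART, RLENGTH)
            buf = substr(buf, RSTART + RLENGTH)  # resume after this link
            sub(/^<a[ \t\r\n]+href="/, "", link) # strip the tag prefix
            sub(/"$/, "", link)                   # strip the closing quote
            gsub(/[ \t\r\n]/, "", link)           # undo the line wrapping
            print link
        }
    }' "$@"
}
```

Usage: extract_links file, which prints one URL per line. Matching [^"]* rather than a non-greedy .* keeps each match inside a single href value, which matters because POSIX regular expressions have no non-greedy quantifiers.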
- Follow-Ups:
- Re: Extracting html links from text
- From: admin
- Re: Extracting html links from text
- References:
- Extracting html links from text
- From: admin
- Re: Extracting html links from text
- From: Ed Morton
- Re: Extracting html links from text
- From: admin
- Extracting html links from text