Re: how to text edit from html format page

From: Adam Price (adam+usenet_at_pappnase.co.uk)
Date: 07/30/03


Date: Wed, 30 Jul 2003 05:36:15 +0100

In news:ae45506e.0307291230.72a825b4@posting.google.com,
steverourke <steve750813@yahoo.com> typed:
> Does anyone know how to edit the text of an HTML page using
> tools such as sed, awk, or perl?
> For example, on finance.yahoo.com I look up a financial statement,
> and then I want to extract only the numbers that I want to put
> into my database.
>
> In your experience, is there any way to do this using
> text-editing tools with regular expressions? It would be
> great to get a response on how to do it, but I would also be
> glad just to learn about possible approaches.
> Thank you

The easiest way is to use a text-mode browser such as links or lynx
to render the HTML as plain text, then work on that output with
sed, awk, or perl.
HTH
Adam
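
For anyone who wants to try the "convert to plain text first, then
pick out the numbers" idea without a text-mode browser, here is a
minimal sketch in Python using only the standard library. It is an
illustration, not what the poster above ran: the sample HTML, the
names TextExtractor, html_to_text and extract_numbers, and the number
pattern are all assumptions made up for this example, and real pages
will likely need a pattern tuned to their layout. On the command
line, lynx -dump gives a comparable plain-text rendering.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML document, one chunk per text node."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside script/style.
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the text content of html, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# Hypothetical pattern: optional minus sign, digits with optional
# thousands separators, optional decimal part.
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_numbers(text):
    """Return every number found in text as a float."""
    return [float(m.replace(",", "")) for m in NUMBER.findall(text)]


# Made-up sample standing in for a downloaded financial statement page.
SAMPLE_HTML = """
<html><body><table>
<tr><td>Total Revenue</td><td>1,234.5</td></tr>
<tr><td>Net Income</td><td>-67</td></tr>
</table></body></html>
"""

text = html_to_text(SAMPLE_HTML)
numbers = extract_numbers(text)
print(text)
print(numbers)
```

The two steps are kept separate on purpose: once the markup is gone,
the plain text can just as well be fed to grep, sed, or awk instead
of the regular expression used here.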