Re: How to wget download all PDF files larger than 100 Kbytes

From: Alan Connor (zzzzzz_at_xxx.yyy)
Date: 05/06/04

  • Next message: William Park: "Re: How to wget download all PDF files larger than 100 Kbytes"
    Date: Thu, 06 May 2004 20:17:43 GMT
    
    

    On 6 May 2004 12:34:47 -0700, Orak Listalavostok <oraklistal@yahoo.com> wrote:
    >
    >
    > How do I get GNU web get (wget) to download all the PDFs
    > (potentially thousands) on a stated web page but ignore
    > any PDF smaller than a given size?
    >
    > I read the fine manual (wget -help), soon arriving with:
    > % wget -prA.pdf http://foo.bar.com
    >
    > Which means (roughly): Copy all the PDF files (A.pdf) from the
    > specified web page (p), recursively (r) to the default 5 levels.
    >
    > But how do I eliminate the copying of files smaller than
    > a certain size; that is, how do I tell wget to ignore PDF
    > files of (say) 100 Kbytes or smaller?
    >
    > Orak

    wget doesn't seem to be able to do this unaided. But if you download the
    web page with the PDF links and extract them into a file, you can check the
    size of each file without downloading it, like this:

    $ wget --spider http://home.earthlink.net/~alanconnor/elrav1/er1.tar.gz
    --13:06:51-- http://home.earthlink.net/%7Ealanconnor/elrav1/er1.tar.gz
               => `er1.tar.gz'
    Resolving home.earthlink.net... done.
    Connecting to home.earthlink.net[207.217.98.29]:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 21,736 [application/x-tar]
    200 OK

    Notice the "Length: ..." line? Feed the list of links to wget --spider with
    the -i file option, parse out the URLs with the sizes you want, and feed THAT
    list to wget.

    Try comp.unix.shell for help writing the script you'll need.

    Not really hard with sed and/or awk.
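
    A rough sketch of such a script, not a tested solution. The function
    names and the file names links.txt and big.txt are made up for the
    example. It assumes the links are absolute URLs in double-quoted href
    attributes, that your grep supports -o (GNU and BSD grep do), and that
    wget --spider prints a "--time-- URL" line followed by a "Length:" line
    as in the output above.

```shell
#!/bin/sh

# extract_pdf_links (hypothetical helper): read HTML on stdin and print
# every double-quoted href value that ends in .pdf, one per line.
# Relative links are printed as-is and would still need the site prefix.
extract_pdf_links() {
    grep -oi 'href="[^"]*\.pdf"' | sed 's/^[^"]*"//; s/"$//'
}

# filter_big (hypothetical helper): read `wget --spider` output on stdin
# and print each URL whose reported Length is greater than the given
# number of bytes (default 102400, i.e. 100 Kbytes).
filter_big() {
    awk -v min="${1:-102400}" '
        # "--13:06:51-- http://host/file.pdf" starts a new request;
        # the URL is the last field on that line.
        $1 ~ /^--/ { url = $NF }
        # "Length: 21,736 [application/x-tar]": drop the commas and compare.
        # A non-numeric length such as "unspecified" counts as 0.
        $1 == "Length:" {
            size = $2
            gsub(/,/, "", size)
            if (size + 0 > min + 0) print url
        }
    '
}

# Intended use (wget writes its log to stderr, hence the 2>&1):
#   wget -q -O - http://foo.bar.com | extract_pdf_links > links.txt
#   wget --spider -i links.txt 2>&1 | filter_big 102400 > big.txt
#   wget -i big.txt
```

    Servers that don't report a length are skipped, so check for those
    by hand if it matters.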

    Perhaps there is another web-tool that will do the job, but I'm not aware of it.

    AC

    -- 
    Pass-List -----> Block-List ----> Challenge-Response
    The key to taking control of your mailbox.  Design Parameters:
    http://tinyurl.com/2t5kp ||   http://tinyurl.com/3c3ag
    Challenge-Response links -- http://tinyurl.com/yrfjb
    
