Re: processing a large textfile
- From: Janis <janis_papanagnou@xxxxxxxxxxx>
- Date: Wed, 04 Jul 2007 00:52:10 -0700
On 4 Jul., 04:29, "Dave" <dmehle...@xxxxxxxxxx> wrote:
Hello,
Thanks for your reply. The below is giving me context errors when I put
it in.
Fine. Then apply the Context Specific Problem Solution Process [TM].
Seriously, you should be more concrete if you have any problems and
post at least context information about any error you may have got.
And please stop top-posting.
Currently I don't need the regex part. As for sorting, I want to sort the
domains alphabetically, not the IPs.
Then apply the sort command to the file with domain information.
If it wasn't apparent, yourfile.ips was just an arbitrarily chosen
name, which you should change to the name of your own file, and
sortedfile.ips was chosen as a different name so that you won't
accidentally overwrite any original file (if you intend to apply that
in other contexts as well).
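
Applied to the domain file, the advice above might look like the following sketch. The file names yourfile.domains and sortedfile.domains and the sample lines are placeholders, not the poster's real data, and LC_ALL=C is an added assumption to get plain byte-order sorting regardless of locale.

```shell
# Work in a scratch directory so that no real file is touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample of the domain list (contains one duplicate).
printf '%s\n' example.com host.example.com example2.com \
              example.com host2.example.com > yourfile.domains

# Sort and drop duplicate lines; the result goes to a differently
# named file, so the input is never overwritten.
LC_ALL=C sort -u yourfile.domains > sortedfile.domains

cat sortedfile.domains
```

Writing to a second file keeps the unsorted original around for comparison until the result has been checked.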
Janis
Thanks.
Dave.
"Janis Papanagnou" <Janis_Papanag...@xxxxxxxxxxx> wrote in message
news:f6e5o0$qjj$1@xxxxxxxxxxxx
Dave wrote:
Hello,
Thanks for your reply. I've solved one issue I raised. There were
lines starting with equals signs; I used grep '^=' to find them, then
went into vi, manually removed all other lines, and saved the result
under a different name. There was probably a faster way of doing it,
but there were only 7 lines I needed so it wasn't difficult.
My file now has either a domain name, a host.domain name, or an IP
address on each line, and I need to split those. Some sample output; this
is not what is actually in the file, but it should work for illustrative
purposes:
example.com
192.168.3.32
host.example.com
example2.com
example.com
222.234.333.2
host2.example.com
etc. The domains are not ordered at all. I want to order them
alphabetically, and move any lines with IP addresses out of that file
and into another file.
Thanks.
Dave.
The following should separate the three components into three files...
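awk '
  # lines starting with "=" go to <file>.regex
  /^=/ { print > (FILENAME ".regex") ; next }
  # lines containing a dotted quad go to <file>.ips
  /[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/ { print > (FILENAME ".ips") ; next }
  # everything else is treated as a domain
  { print > (FILENAME ".domains") }
' yourfile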
You can of course also use a couple of grep calls, instead of the one
awk call. To sort and remove duplicates use, e.g.
sort -u yourfile.ips >sortedfile.ips
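
A rough sketch of the grep variant mentioned above, assuming the same three output names as the awk program. The input name yourfile, the sample lines and the '=' entry are made up for illustration. Lines starting with '=' are filtered out first so that they end up only in the .regex file, mirroring the "next" in the awk version.

```shell
# Work in a scratch directory so that no real file is touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample input; the "=" line is an invented placeholder.
printf '%s\n' '=some.url/pattern' example.com 192.168.3.32 \
              host.example.com example.com > yourfile

# Same dotted-quad pattern as in the awk program (extended regex).
ip='[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+'

# 1. lines starting with "="
grep '^=' yourfile > yourfile.regex
# 2. remaining lines that contain a dotted quad
grep -v '^=' yourfile | grep -E "$ip" > yourfile.ips
# 3. remaining lines that do not
grep -v '^=' yourfile | grep -E -v "$ip" > yourfile.domains

cat yourfile.regex yourfile.ips yourfile.domains
```

This reads the input three times where awk reads it once, which should hardly matter for a file of some 15000 lines.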
Janis
"Janis Papanagnou" <Janis_Papanag...@xxxxxxxxxxx> wrote in message
news:f6e1vs$l3v$1@xxxxxxxxxxxx
Dave wrote:
Hello,
I've got a rather large text file (15000+ lines) that I need to
process. This is a squid blacklist of URLs, IPs and domain lists. I
need to split it into several files: one for URL rejections (lines
beginning with an equals sign '='), the next for IP addresses such as
192.168.0.0 (though they might not all start with a one), and the third
for domain lists. I want to sort the domains alphabetically and
eliminate any duplicate entries; from the error I'm getting there are
something like 500+ duplicated lines, and manually editing this file
would not be fun. I want to use something like shell or perl, or
anything but manual editing, to pull this off.
Basically, any line that begins with an = sign gets cut from the
master file and put in another file named after the master file with
.regex appended. Any IPs get cut and placed in another file, basename.ips,
and the remaining master file is sorted alphabetically and duplicate
lines removed, but its name is unchanged.
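
The last requirement (sorted, duplicates removed, name unchanged) could be sketched with the -o option of sort, which POSIX allows to name one of the input files. The name masterfile and the sample lines are placeholders, and LC_ALL=C is an assumption to get locale-independent ordering.

```shell
# Work in a scratch directory so that no real file is touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical remainder of the master file after the "=" lines and
# the IP lines have been moved out.
printf '%s\n' example2.com example.com host.example.com \
              example.com > masterfile

# -o writes the result back under the same name. A plain
# "sort -u masterfile > masterfile" would not work, because the shell
# truncates the file before sort gets to read it.
LC_ALL=C sort -u -o masterfile masterfile

cat masterfile
```

Since this overwrites the input, it seems wise to keep a copy of the original until the result has been checked.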
Six lines of an example file would be clearer than so much text. The
only part I think I have understood is this one:
awk '/^=/ { print > (FILENAME ".regex") }'
Though I am not even sure about that without knowing the exact input
file structure.
You may be able to implement the other two cases yourself; otherwise,
give some example data so that we can support you further.
Janis
Suggestions welcome.
Thanks.
Dave.