Re: Sorting speedy
- From: "news.t-online.de" <suregeonde@xxxxxxxx>
- Date: Thu, 03 Aug 2006 07:41:46 +0200
Vassilis wrote:
> Hi,
>
> news.t-online.de wrote:
>> Now we have to compare these two sets of files
>> and confirm whether the functionality in regard
>> to X and pre Y related code creates the same data.
>> We tried it in SQL, but this is too slow,
>> so we export the data. The data is presumably
>> identical, but the rows of a file in one set
>> are not in the same order as the rows of the
>> corresponding file in the other set, so we cannot
>> use diff or cmp.
>> When we sort a file and the corresponding file in
>> the other set, we can.
>
> From what I gather, your problem is the different order of data
> in the rows. Why don't you try remedying this?
> If you know the order of the rows for the two versions,
> you can transform the rows of the files in set X on the fly
> and compare them against the files in set Y.
> This can easily be done with a tool like awk or perl.
> I would be reluctant to sort up front.
> Hope this helps.

Exactly, this is the problem: it's the order.
For an 8 GB file with 40,000,000 records, there
would be two arrays in memory, each larger than 10 GB.
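By contrast, the sort-then-compare route from the original question does not need to hold either file in memory. Sort implementations such as GNU sort merge through temporary files when the input is larger than RAM. The following is a minimal sketch in POSIX sh under these assumptions: mktemp(1) is available, and byte-wise (LC_ALL=C) ordering is acceptable for both files. The name compare_sorted is a hypothetical helper, not something from the thread.

```shell
# compare_sorted FILE1 FILE2   (hypothetical helper name)
# Sorts both files with the same byte-wise collation and compares the
# sorted copies. Exit status 0 means both files contain the same lines
# (same number of duplicates), possibly in a different order.
compare_sorted() {
    tmp1=$(mktemp) || return 2
    tmp2=$(mktemp) || { rm -f "$tmp1"; return 2; }
    # LC_ALL=C gives one byte-wise order for both files and is usually fastest.
    if LC_ALL=C sort "$1" > "$tmp1" &&
       LC_ALL=C sort "$2" > "$tmp2" &&
       cmp -s "$tmp1" "$tmp2"
    then
        status=0
    else
        status=$?     # 1 from cmp -s means the sorted files differ
    fi
    rm -f "$tmp1" "$tmp2"
    return "$status"
}
```

Usage might look like `compare_sorted setX/file1 setY/file1 && echo "same content"`. To see which lines differ, `LC_ALL=C comm -3` on the two sorted copies lists the lines that occur in only one of them.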
In the example program below, one could also read a line from file1,
then a line from file2. If the two lines match, we store neither;
if a line is already in the other file's array, we delete it from
that array instead of inserting it into its own.
This would save memory, but only when the files are already in
roughly the same order.
BEGIN {
    # Load every line of file1 as a key of FILE1; n1 counts lines read.
    while ((getline line < "file1") > 0) {
        n1++
        if (line in FILE1)
            print "Duplicate in file1: " line
        FILE1[line] = 1        # or the line number, or whatever
    }
    close("file1")

    # Same for file2.
    while ((getline line < "file2") > 0) {
        n2++
        if (line in FILE2)
            print "Duplicate in file2: " line
        FILE2[line] = 1        # or the line number, or whatever
    }
    close("file2")

    if (n1 != n2)
        print "numbers of read lines not identical: " (n1 + 0) " vs " (n2 + 0)

    # Report lines that occur in only one of the two files.
    for (i in FILE1)
        if (!(i in FILE2))
            print i " not found in file2"
    for (i in FILE2)
        if (!(i in FILE1))
            print i " not found in file1"
}
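The interleaved variant described before the program could be sketched as follows. This is a minimal sketch, not code from the post. The function name compare_interleaved and the output format are illustrative assumptions. Instead of two sets, it keeps a single hypothetical balance array D, where D[line] is the number of copies seen in file1 minus the number seen in file2. Because a key is deleted as soon as its balance returns to zero, duplicates are compared by count rather than just flagged. If the files are in completely different orders, this degrades toward holding most lines in memory, just like the program above.

```shell
# compare_interleaved FILE1 FILE2   (hypothetical helper name)
# Reads FILE1 and FILE2 alternately, one line at a time, and keeps only
# lines not yet matched by the other file:
#   D[line] = (copies seen in FILE1) - (copies seen in FILE2)
# Exit status: 0 = same lines and counts, 1 = differences, 2 = read error.
compare_interleaved() {
    # File names go through the environment, so awk -v escape processing
    # cannot mangle them.
    F1=$1 F2=$2 awk '
    BEGIN {
        f1 = ENVIRON["F1"]; f2 = ENVIRON["F2"]
        more1 = more2 = 1
        while (more1 || more2) {
            if (more1) { r = (getline l1 < f1); if (r < 0) exit 2; more1 = (r > 0) }
            if (more2) { r = (getline l2 < f2); if (r < 0) exit 2; more2 = (r > 0) }
            # Identical lines at the same position: store nothing.
            # Concatenating "" forces a string (not numeric) comparison.
            if (more1 && more2 && (l1 "") == (l2 ""))
                continue
            # Otherwise update the balance; drop keys that are back to 0.
            if (more1 && ++D[l1] == 0) delete D[l1]
            if (more2 && --D[l2] == 0) delete D[l2]
        }
        close(f1); close(f2)
        status = 0
        for (k in D) {
            if (D[k] > 0) print "only in file1 (" D[k] "x): " k
            else          print "only in file2 (" (0 - D[k]) "x): " k
            status = 1
        }
        exit status
    }'
}
```

For example, `compare_interleaved setX/file1 setY/file1` prints nothing and exits with 0 when the two files hold the same lines. The order of the reported lines is unspecified, because it follows awk's `for (k in D)` iteration.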