Re: Sorting and then removing sort-by cols from a fixed-width flat file



Hey Logan,

Thanks for your inputs. I have some doubts and concerns regarding them.
I have answered your questions (comments) below:

Logan Shaw wrote:
aditya.chaudhary@xxxxxxxxx wrote:
Basically I would be merging 3 flat files and wanted to sort and group
their records so that they can be transmitted in a specified format.

Transmitted? I thought you were just merging them.

I will first merge them, then do the sorting, then remove the cols
which were added just to make sorting easy. Finally I have to ftp
this file.

But
the problem is the group by col is situated at different position in
each of the 3 flat files.

Does the "group by col" mean the column which contains the sort key?

No. 'Group by col' means that there is a col existing in the records
which can be used for sorting the data along with Record_Type, but the
issue is that it sits at a different position in the records of each
of the 3 flat files. If it had been at, say, positions 15-18 in every
file, then I could have used it for sorting.


So I thought of adding 2 common cols in the
beginning of each of the 3 files and then sort the records using the
same 2 cols so that they can be grouped and then remove those cols
after sorting operation as they are not required to be present in the
final file.

That's one way to do it. The other way to do it is to do all that
work in your comparison function, so that the information is never
added to any files but is temporarily created only when you are
comparing two elements.

I don't get what you meant by this. Maybe because I'm not familiar with
many shell scripting techniques.
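
In shell terms, the closest equivalent is probably a pipeline that builds the sort columns on the fly, sorts, and strips them again, so no intermediate file with extra columns is ever written. A minimal sketch, assuming hypothetical file names (a.dat, b.dat) and a hypothetical 4-character key at columns 1-4 in one file and 5-8 in the other; the real offsets are not given in this thread:

```shell
#!/bin/sh
# Hypothetical sample data: the 4-character group key sits at
# columns 1-4 in a.dat but at columns 5-8 in b.dat.
tmp=$(mktemp -d)
printf '%s\n' '0002AAAA' '0001BBBB' > "$tmp/a.dat"
printf '%s\n' 'CCCC0003' 'DDDD0001' > "$tmp/b.dat"

# Prefix each record with its key, taken from the position that
# applies to that file.
decorate() {  # usage: decorate START LENGTH FILE
    awk -v s="$1" -v n="$2" '{ print substr($0, s, n) $0 }' "$3"
}

# Decorate, sort byte-wise (the prefix is leftmost, so it dominates the
# ordering and ties fall back to the rest of the record), then cut the
# 4-character prefix off again.
result=$(
    { decorate 1 4 "$tmp/a.dat"; decorate 5 4 "$tmp/b.dat"; } |
    LC_ALL=C sort |
    cut -c5-
)
printf '%s\n' "$result"
rm -rf "$tmp"
```

The temporary key exists only inside the pipeline, which is the point of the suggestion: nothing has to be added to, or later removed from, the files on disk.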

I would be creating a shell script to first merge the 3
files, then sorting them and then removing the cols.

Merging has a specific meaning when you are talking about sorting.
I believe what you are saying is that you will first convert all
three files into a common format, then sort them, then remove
the extra columns.

Yes. Common format is already defined - it has to be fixed-width file.
So the 3 files have to be merged and then sorted so that data appears
in some 'order by' fashion when it goes to the user.

The 2 cols are Request_Id (string of rec length 15, i.e. position 1-15
in file) and Serial_Number (string of rec length 5, i.e. position 16-20
in file).

So I need to know:
1) how to write the 'sort' command of Unix so that I can use both these
cols for sorting the recs of the file.

Sorting by character column numbers is generally not the easiest thing
in Unix, at least if you are using the "sort" command. The "sort"
command expects a field separator character rather than using fixed-
length fields. There may be a way to "trick" it into using a fixed
range of columns, but it's much easier to use some character, like
":" as a separator. Then you can do

sort -t: +0 +1

in order to sort by the 1st and 2nd colon-delimited fields. Of course
you can use any character instead of ":" as long as that character
does not occur within your sort keys.

I cannot use a delimiter for just 2 new cols. The file is in
fixed-width format and I have to format such a file. So kindly suggest
the sort syntax for fixed width.
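
For reference, the POSIX `-k` option of sort accepts character offsets within a field (`-kFIELD.CHAR`), which can be used for fixed-width keys. A minimal sketch, assuming the records contain no tab character, so that naming tab as the separator makes the whole line count as field 1; the sample records are made up:

```shell
#!/bin/sh
# Hypothetical records: Request_Id in columns 1-15, Serial_Number in
# columns 16-20, followed by the rest of the record.
sorted=$(
    printf '%s\n' \
        'REQ00000000000200001third' \
        'REQ00000000000100002second' \
        'REQ00000000000100001first' |
    # -t <tab>: no tab occurs in the data, so each line is one field.
    # -k1.1,1.15  : first key, characters 1-15 (Request_Id)
    # -k1.16,1.20 : second key, characters 16-20 (Serial_Number)
    LC_ALL=C sort -t "$(printf '\t')" -k1.1,1.15 -k1.16,1.20
)
printf '%s\n' "$sorted"
```

LC_ALL=C is there only to get plain byte-order comparison; drop it if locale-aware ordering is wanted.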

2) the remove command or process so that I can drop/remove these 2 cols
after all the recs have been sorted out.

Removing the first 20 characters from every line of a file is easy.
It's just as easy as this:

sed -e 's/.\{20\}//'

That matches the pattern

.\{20\}

which is 20 of any character ("." stands for any character) and
replaces the pattern it matches with the empty string.
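
A quick way to check this on a single made-up record, alongside `cut -c21-`, which gives the same result for lines of at least 20 characters:

```shell
#!/bin/sh
# Hypothetical sorted record: 20 characters of sort columns, then the data.
line='REQ00000000000100001original record'

# sed: delete the first 20 characters of the line.
with_sed=$(printf '%s\n' "$line" | sed -e 's/.\{20\}//')
# cut: keep everything from character 21 onwards.
with_cut=$(printf '%s\n' "$line" | cut -c21-)

printf '%s\n' "$with_sed" "$with_cut"
```

The two differ on lines shorter than 20 characters: sed leaves such a line unchanged because the pattern does not match, while cut prints an empty line.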

Thanks for this solution. I will try it and let you know whether it
works.


However, above I recommended that you use a delimiter instead of
fixed-width fields. In that case, to remove the first two
colon-delimited fields, you would instead want to use the cut
command:

cut -d: -f3-

That prints field 3 and following ("3-") with ":" as the delimiter.
Note that this is 1-based indexing, whereas the sort command (at
least in the syntax I gave -- it accepts more than one syntax)
uses 0-based indexing.
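
The `+0 +1` form is the older zero-based key syntax, which current POSIX no longer specifies; the one-based `-k` option is the portable spelling. A small sketch of the delimiter-based variant written with `-k`, using made-up sample records:

```shell
#!/bin/sh
# Hypothetical colon-delimited records: two sort keys, then the data.
out=$(
    printf '%s\n' 'B:2:gamma' 'A:9:alpha' 'B:1:beta' |
    # Sort by field 1 only, then by field 2 only (both 1-based).
    LC_ALL=C sort -t: -k1,1 -k2,2 |
    # Drop the two key fields, keeping field 3 and everything after it.
    cut -d: -f3-
)
printf '%s\n' "$out"
```

With `-k`, both sort and cut count fields from 1, so the indexing mismatch noted above goes away.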

- Logan
