Re: system crash produces empty files if a copy was in place




scott_doyland@xxxxxxxxxxxxxxx wrote:
Jurjen Oskam wrote:
On 2006-03-15, scott_doyland@xxxxxxxxxxxxxxx <scott_doyland@xxxxxxxxxxxxxxx> wrote:

Has anyone else come across this?

Yes, it seems pretty normal.

2. I simulate a system crash/power loss by powering off the server
'immediately' from the HMC while the copy of the file is in progress
(i.e. I don't do a shutdown but a power off).
[...]
4. The server (partition) goes down, and when I bring it back up the
file I was copying to is empty, i.e. an 'ls -l' shows the file with '0'
size and a cat/more of the file shows it is empty.

This is expected. The SAN didn't lose data, not even if some data was
cached anywhere on the SAN (it would be a pretty worthless SAN if it lost
cached data when a host crashes). What happened was that when the host
came back on, it noticed that its filesystems were not properly unmounted.
It rolls back transactions that were going on at the time of the crash,
thereby guaranteeing filesystem consistency. Note that this is not even
nearly unique to AIX.

The reason I am doing this test is that we are looking at PPRC'ing
across our two SANs,

If I'm not mistaken, PPRC works below the filesystem, at the block
level. It results in a remote LUN having exactly the same contents as
a local LUN.

This alone is not enough to achieve high availability. In addition to
duplicating storage devices, you'd also need to duplicate application
state.
--
Jurjen Oskam

Hi Jurjen,

Firstly, I now know this happens on all disks, not just the SAN, so...

I understand your post. So I set up a script like this:


while true
do
    # Copy every file named in /tmp/lots from /test to /test1, forever
    while read -r LINE
    do
        cp "/test/$LINE" /test1/
    done < /tmp/lots
done
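One way to test the syncd explanation given in the reply is to flush after every copy. A minimal sketch, assuming the same layout as the script above (/tmp/lots lists the file names, /test is the source and /test1 the destination). The function name copy_pass and its argument order are hypothetical. With a sync after each cp, a file that has finished copying should already be on its way to disk when the next copy is interrupted. POSIX allows sync to return before the writes have finished, so this narrows the window for lost data rather than closing it.

```shell
#!/bin/sh
# copy_pass LIST SRC DST  (hypothetical helper)
# One pass of the test loop: copy every file named in LIST from SRC to
# DST, calling sync after each copy instead of waiting for syncd.
copy_pass() {
    list=$1 src=$2 dst=$3
    while read -r name
    do
        cp "$src/$name" "$dst/" || return 1
        # Schedule dirty buffers to be written now. POSIX lets sync
        # return before the writes complete, so the window for data
        # loss shrinks but does not disappear.
        sync
    done < "$list"
}
```

To repeat the original test with this change, run it in an endless loop:

    while true; do copy_pass /tmp/lots /test /test1; done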


In /tmp/lots I have the names of two files, say a and b, and
/test and /test1 are filesystems.

One file is 800MB and one is 200MB. So I run the script and let it
copy both files. Then, when it starts to copy the first (larger) one
again, which takes about 60 seconds or a bit longer, I power off the
server. On a reboot I'd expect the second file at least to be there,
but what I have found is that one file had some data and the other
was empty.

So it seems the FS is rolling back to a time it was consistent.

But surely the FS log should handle some of this, and we shouldn't
lose this much data. Won't the FS be consistent just before the crash?
AIX knows the data is written to the disk up to this point.

Questions, questions.

TIA,
Scott

syncd flushes the data to disk once every 60 seconds by default. Increase
its frequency if you so desire.
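A rough back-of-envelope sketch of why a 60-second flush interval can cost most of a file. It uses the figures from the thread (an 800MB file copied in about 60 seconds) to estimate how much written data could still be only in memory at the moment of power-off, both for the default interval and for a shorter one. It assumes a steady copy rate and no flushing between syncd runs. The system may write some pages out earlier on its own, so treat the result as an upper bound. The helper at_risk_mb and the 10-second interval are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Back-of-envelope estimate only: if dirty pages reach disk only when
# syncd runs, data written during the last interval may still be in
# memory when the power is cut.
FILE_MB=800     # size of the larger file, from the thread
COPY_SECS=60    # approximate time to copy it, from the thread

# Integer copy rate in MB/s (800 / 60 rounds down to 13).
RATE_MB_PER_SEC=$((FILE_MB / COPY_SECS))

# at_risk_mb INTERVAL  (hypothetical helper): worst-case MB written since
# the last flush, assuming a steady rate and no flushing between syncd runs.
at_risk_mb() {
    echo $((RATE_MB_PER_SEC * $1))
}

# 60s is the default from the reply; 10s is an arbitrary shorter example.
for interval in 60 10
do
    echo "flush every ${interval}s: up to $(at_risk_mb "$interval") MB unflushed"
done
```

This prints "flush every 60s: up to 780 MB unflushed" and "flush every 10s: up to 130 MB unflushed". At the default interval, nearly the whole 800MB file fits inside one flush window, which is consistent with files coming back empty or partly filled.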
