Re: gjournal: journaled slices vs. journaled partitions



Hello,
I built a similar setup last weekend on a new home server with two
500GB drives. I didn't want to use gmirror alone and have the full drives
rebuild after every power failure or reset. I was told that putting
bsdlabels on a gjournal provider wasn't a good idea, but I have yet to get
an answer as to why. I went with this setup anyway and ran some reset tests
to see what happens on reboot, and everything always went fine.

While building this setup I hit one big problem. If the root filesystem (/)
was on a gjournal provider, an unclean shutdown while data was being written
to the disk rendered the system completely unbootable. I got this message:

GEOM_MIRROR: Device mirror/gm launched (2/2)
GEOM_JOURNAL: Journal 3672855181: mirror/gma contains data.
GEOM_JOURNAL: Journal 3672855181: mirror/gma contains journal.
GEOM_JOURNAL: Journal 3868799910: mirror/gmd contains data.
GEOM_JOURNAL: Journal 3868799910: mirror/gmd contains journal.
GEOM_JOURNAL: Journal mirror/gmd consistent.
Trying to mount root from ufs:/dev/mirror/gm.journal

Manual root filesystem specification:
<fstype>:<device> Mount <device> using filesystem <fstype>
eg. ufs:da0s1a
? List valid disk boot devices
<empty line> Abort manual input


mountroot> ?

List of GEOM managed disk devices:
mirror/gmd.journal mirror/gmd mirror/gmc mirror/gma mirror/gm ad10s1c
ad10s1b ad8s1c ad8s1b ad10s2 ad10s1 ad8s1 ad10 ad8 acd0


As you can see, "mirror/gm.journala" is absent from the proposed list of
disk devices to boot from. As Ivan Voras (whom I contacted about this
problem) and I found, the GEOM_JOURNAL thread that is supposed to mark the
journal consistent takes too long to do so for the root filesystem's
provider, and the kernel tries to mount a device that doesn't exist yet. A
bug report has been opened about this problem. For my final setup I decided
to put the root filesystem on a separate mirrored slice of 1GB. Since this
slice isn't written to often, few rebuilds should occur after a power
failure. I ran my "power failure" test by hitting the reset button while
writing data to this filesystem, and the rebuild of 1GB doesn't take long
(at most 20-30 seconds).

Now I have a question: why wasn't the "load" algorithm recommended? Is it
fixed in 7.0-RELEASE-p5?

Here is my complete setup, which booted correctly every time I ran my reset
tests while writing data to each filesystem. The 2GB gjournal provider sits
directly on the mirror provider for all mirrored filesystems except the root
one, and I put my bsdlabels on the gjournal provider instead of creating a
journal for every filesystem.


[root@headless ~]# cat /etc/fstab
# Device Mountpoint FStype Options Dump Pass#
/dev/ad10s1b none swap sw 0 0
/dev/ad8s1b none swap sw 0 0
/dev/mirror/root / ufs rw 1 1
/dev/ufs/usr /usr ufs rw,async 2 2
/dev/ufs/var /var ufs rw,async 2 2
/dev/ufs/tmp /tmp ufs rw,async 2 2
/dev/ufs/home /home ufs rw,async 2 2
/dev/ufs/data /mnt/data ufs rw,async 2 2
/dev/acd0 /cdrom cd9660 ro,noauto 0 0


[root@headless ~]# mount
/dev/mirror/root on / (ufs, local, soft-updates)
devfs on /dev (devfs, local)
/dev/ufs/usr on /usr (ufs, asynchronous, local, gjournal)
/dev/ufs/var on /var (ufs, asynchronous, local, gjournal)
/dev/ufs/tmp on /tmp (ufs, asynchronous, local, gjournal)
/dev/ufs/home on /home (ufs, asynchronous, local, acls, gjournal)
/dev/ufs/data on /mnt/data (ufs, asynchronous, local, acls, gjournal)


[root@headless ~]# glabel status
Name Status Components
ufs/usr N/A mirror/data.journald
ufs/var N/A mirror/data.journale
ufs/tmp N/A mirror/data.journalf
ufs/home N/A mirror/data.journalg
ufs/data N/A mirror/data.journalh


[root@headless ~]# gjournal list
Geom name: gjournal 372943514
ID: 372943514
Providers:
1. Name: mirror/data.journal
Mediasize: 495810966528 (462G)
Sectorsize: 512
Mode: r5w5e11
Consumers:
1. Name: mirror/data
Mediasize: 497958450688 (464G)
Sectorsize: 512
Mode: r1w1e1
Jend: 497958450176
Jstart: 495810966528
Role: Data,Journal
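
The Jstart and Jend values above can be checked against the "2GB journal"
figure with a little arithmetic. This is only a sketch: the numbers are
copied from the gjournal list output, the variable names are made up for
illustration, and it assumes a shell with 64-bit arithmetic. Reading the
bytes left after Jend as the provider's last sector (where GEOM classes
usually keep their metadata) is an assumption, not something the output
states.

```shell
#!/bin/sh
# Values copied from the "gjournal list" output (all in bytes).
jstart=495810966528        # Jstart: where the journal area begins
jend=497958450176          # Jend: where the journal area ends
mirror_size=497958450688   # Mediasize of the consumer mirror/data

# Size of the journal area, in bytes and in GiB.
journal_bytes=$((jend - jstart))
journal_gib=$((journal_bytes / 1024 / 1024 / 1024))

# Bytes between the end of the journal and the end of the mirror provider.
tail_bytes=$((mirror_size - jend))

echo "journal: ${journal_bytes} bytes (${journal_gib} GiB)"
echo "data area: ${jstart} bytes"
echo "left after Jend: ${tail_bytes} bytes"
```

Note that Jstart equals the Mediasize of mirror/data.journal, which fits
the journal sitting at the end of the mirror, right after the data area.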


[root@headless ~]# gmirror list
Geom name: data
State: COMPLETE
Components: 2
Balance: split
Slice: 4096
Flags: NOFAILSYNC
GenID: 0
SyncID: 1
ID: 990032118
Providers:
1. Name: mirror/data
Mediasize: 497958450688 (464G)
Sectorsize: 512
Mode: r1w1e1
Consumers:
1. Name: ad8s2
Mediasize: 497958451200 (464G)
Sectorsize: 512
Mode: r1w1e1
State: ACTIVE
Priority: 0
Flags: HARDCODED
GenID: 0
SyncID: 1
ID: 235591066
2. Name: ad10s2
Mediasize: 497958451200 (464G)
Sectorsize: 512
Mode: r1w1e1
State: ACTIVE
Priority: 0
Flags: HARDCODED
GenID: 0
SyncID: 1
ID: 2007880058

Geom name: root
State: COMPLETE
Components: 2
Balance: split
Slice: 4096
Flags: NONE
GenID: 0
SyncID: 1
ID: 4098555256
Providers:
1. Name: mirror/root
Mediasize: 1073022976 (1.0G)
Sectorsize: 512
Mode: r1w1e1
Consumers:
1. Name: ad8s1a
Mediasize: 1073023488 (1.0G)
Sectorsize: 512
Mode: r1w1e1
State: ACTIVE
Priority: 0
Flags: HARDCODED
GenID: 0
SyncID: 1
ID: 3394521634
2. Name: ad10s1a
Mediasize: 1073023488 (1.0G)
Sectorsize: 512
Mode: r1w1e1
State: ACTIVE
Priority: 0
Flags: HARDCODED
GenID: 0
SyncID: 1
ID: 3774466459


Gabriel


2008/11/4 Volodymyr Kostyrko <c.kworr@xxxxxxxxx>

Carl wrote:

Volodymyr Kostyrko wrote:

I have some setups where gjournal was put on the device rather than on a
partition, i.e.:

[umgah] ~> gmirror status
Name Status Components
mirror/umgah0 COMPLETE ad0
ad1
[umgah] ~> gjournal status
Name Status Components
mirror/umgah0.journal N/A mirror/umgah0
[umgah] ~> glabel status
Name Status Components
ufs/umgah0root N/A mirror/umgah0.journala
label/umgah0swap N/A mirror/umgah0.journalb
ufs/umgah0usr N/A mirror/umgah0.journald
ufs/umgah0var N/A mirror/umgah0.journale


Does the above suggest that you've ended up with individual journal
providers for each partition anyway? If so, where are they and have you
really achieved anything functionally different? Are they at the end of
their individually associated partitions or all together somewhere else? Has
the ill-advised journaled small partition issue been successfully overcome
through what you've done?


First, there is only one journal, for /dev/mirror/umgah0, and it is named
/dev/mirror/umgah0.journal. Everything else is just bsdlabel partitions;
there are four of them.


[umgah] ~> mount
/dev/ufs/umgah0root on / (ufs, asynchronous, local, noatime, gjournal)
devfs on /dev (devfs, local)
/dev/md0 on /tmp (ufs, asynchronous, local)
/dev/ufs/umgah0var on /var (ufs, asynchronous, local, noatime, gjournal)
/dev/ufs/umgah0usr on /usr (ufs, asynchronous, local, noatime, gjournal)
devfs on /var/named/dev (devfs, local)

And yes, mirror autosynchronization is turned off, gjournal takes care of
that too.

It's not stated in the manual, but gjournal is typically transparent for any
type of access. Only when a UFS file system is marked as journaled can
metadata writes be distinguished from data writes. Without that, gjournal
does literally nothing.


And what does this mean for your swap partition?


Just nothing, it's just swap. It can't be journaled.

Laszlo Nagy wrote earlier:

Another tricky question: why would you journal a SWAP partition?


Volodymyr, does your assertion that gjournal does nothing when a file
system is not UFS mean that there is no penalty with regard to your swap
partition despite the existence of "mirror/umgah0.journalb"?


I haven't seen any performance decrease in this configuration. And according
to the manual and articles about gjournal it should work this way.

Any chance you'd like to share your command sequence for constructing your
gmirror'd and gjournal'd filesystem, Volodymyr? :-)


If we have two disks (ad0, ad1) it should look like this:

gmirror label -b load -n umgah0 ad1

This puts the whole drive under gmirror without synchronization (we don't
need it; the journal will take care of any discrepancies) and with the load
balance algorithm (load was fixed not so long ago in stable and should be
fine to use).

gjournal label mirror/umgah0

This creates a journal on top of our gmirror. It eats 1G from the end of
the disk and gives us the rest to use.

bsdlabel -wB mirror/umgah0.journal

This writes a standard bsdlabel to the disk and makes it bootable. After
that we have a single partition 'a'.

<spam>
Yes, no fdisk. I don't think this old piece of rough junk is ever needed on
a machine running only FreeBSD. It just takes space, requires compatibility
with forgotten-and-abandoned standards, and gives nothing more. Is your
server dual-booting Windows or Linux? That is the only case you need fdisk
for.
</spam>

bsdlabel -e mirror/umgah0.journal

Now we split our journal into several partitions. I did it this way:

# /dev/mirror/umgah0.journal:
8 partitions:
# size offset fstype [fsize bsize bps/cpg]
a: 524288 16 4.2BSD
b: 16777216 * swap
c: 779325614 0 unused 0 0 # "raw" part, don't edit
d: 33554432 * 4.2BSD
e: * * 4.2BSD
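
The sizes in the label above are given in sectors, so a quick conversion
helps when reading them. A minimal sketch, assuming 512-byte sectors; the
helper name sectors_to_mib is made up for illustration and is not part of
bsdlabel:

```shell
#!/bin/sh
# Convert a partition size in 512-byte sectors to MiB.
# Assumes a shell with 64-bit arithmetic.
sectors_to_mib() {
    echo $(( $1 * 512 / 1024 / 1024 ))
}

sectors_to_mib 524288      # partition a
sectors_to_mib 16777216    # partition b (swap)
sectors_to_mib 33554432    # partition d
```

With these numbers, partition a is 256 MiB, b is 8 GiB and d is 16 GiB;
partition e takes whatever is left.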

After that we can format these filesystems:

newfs -J -L umgah0root /dev/mirror/umgah0.journala
newfs -J -L umgah0var /dev/mirror/umgah0.journald
newfs -J -L umgah0usr /dev/mirror/umgah0.journale

And label the swap:

glabel label umgah0swap /dev/mirror/umgah0.journalb

You can skip all this glabel stuff; I just prefer to keep fstab as slim as
possible.
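
For reference, the steps above can be collected into one dry-run script.
This is only a sketch: the commands are the ones given in this message, but
the run() and build_plan() helpers and the RUN variable are made up for
illustration. By default it only prints the commands. Setting RUN=1 would
execute them, which destroys data on the target disk and only makes sense
on a FreeBSD system with the GEOM classes loaded.

```shell
#!/bin/sh
# Dry-run sketch: print the commands from this message in order.
# RUN=1 executes them instead (destructive, FreeBSD with GEOM only).
MIRROR=umgah0
DISK=ad1
JOURNAL="mirror/${MIRROR}.journal"

# Hypothetical helper: echo the command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

build_plan() {
    run gmirror label -b load -n "$MIRROR" "$DISK"
    run gjournal label "mirror/${MIRROR}"
    run bsdlabel -wB "$JOURNAL"
    # "bsdlabel -e" is interactive: partition the journal provider by hand
    # at this point, before creating the filesystems.
    run newfs -J -L "${MIRROR}root" "/dev/${JOURNAL}a"
    run newfs -J -L "${MIRROR}var" "/dev/${JOURNAL}d"
    run newfs -J -L "${MIRROR}usr" "/dev/${JOURNAL}e"
    run glabel label "${MIRROR}swap" "/dev/${JOURNAL}b"
}

build_plan
```

Printing first makes it easy to review the device names before anything is
written to disk.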

<fstab>
/dev/label/umgah0swap none swap sw 0 0

md /tmp mfs rw,-s1024m,-S,-oasync 0 0

/dev/ufs/umgah0root / ufs rw,async,noatime 0 1
/dev/ufs/umgah0var /var ufs rw,async,noatime 0 2
/dev/ufs/umgah0usr /usr ufs rw,async,noatime 0 2
</fstab>

There is a lot more to describe here, from moving the system to the newly
created partitions to inserting the first disk into the gmirror and
rebuilding it. All these issues are described in the handbook or in other
articles found on the net.


--
Sphinx of black quartz judge my vow.

_______________________________________________
freebsd-questions@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "freebsd-questions-unsubscribe@xxxxxxxxxxx"




--
Gabriel Lavoie
glavoie@xxxxxxxxx