Re: file locking.



On Thu, 16 Aug 2007, John Baldwin wrote:

On Thursday 16 August 2007 04:18:51 pm Jeff Roberson wrote:
On Thu, 16 Aug 2007, John Baldwin wrote:

On Thursday 16 August 2007 03:01:18 am Jeff Roberson wrote:
I have been looking at file locking for 8.0 and have come up with a way
to
completely eliminate the file lock, reduce the size of struct file by
~4 pointers (22%), remove the global list of files and associated lock,
and restrict the scope of unp_gc while removing several race conditions.


Thanks for the review.

I like finit(). I would maybe change socketpair() to pass so1 and so2 to
finit() rather than setting f_data twice.

I'm not sure what you mean?

In socketpair() the new code does this:

fp1->f_data = so1;
...
fp2->f_data = so2;
...
finit(fp1, ..., fp1->f_data, ...);
finit(fp2, ..., fp2->f_data, ...);

It might be cleaner to do this:

...
...
finit(fp1, ..., so1, ...);
finit(fp2, ..., so2, ...);

I did not want to have one file pointing at another without an initialized f_data field. However, I guess the underlying sockets are already setup so this may not be important. The code did go to some effort to setup f_data early before as well so I didn't want to change that.


Did you consider using refcount_* for f_count rather than using the atomic
operations directly?

Yes, I may change it. I was investigating some schemes where it may not
have been sufficient but I think it will work fine for what I've settled
on.

I also think I'll wrap some atomics for manipulating f_type in macros.

That would be good.

It appears you never call unp_discard() and thus 'closef()' on the sockets
in
unp_gc() now. Perhaps the fdrop()'s in the end of the loop should be
unp_discard()'s instead?

unp_discard() happens as a side-effect of sorflush().

So, in the old code there's a really big comment about how it makes sure to
only do closef() (via unp_discard()) once but does a sorflush() for each
f_msgcount. Was that comment no longer true?

The comment actually says:

*
* It is incorrect to simply unp_discard each entry for f_msgcount
* times

What we do is grab an extra ref to each struct file that is dead and then explicitly sorflush() them. This closes all of the references held by that socket, which would free any unreferenced non-unp descriptors. However, we want to prevent the algorithm from recursing in on itself so we hold the extra file ref for unp sockets that would be closed. Then when we loop releasing this one last ref at the end the actually fo_close will be called.

This portion of the algorithm is not significantly different from before. I just introduced an extra flag so I could remove the race from dropping the lock inbetween operations and get an accurate count of how big the array needs to be.


Also, it's a bit of a shame to lose 'show files' from ddb.

Yes, I can re-implement that using the same technique as sysctl kern.file.
What's more troubling is the continued erosion of support for libkvm as it
uses filelist. I don't think libkvm is a strong enough case to keep
filelist around. I guess I will have to hack it to work similarly to
sysctl as well.

Do we have an official stance on libkvm? Now that we have sysctl for
run-time it's only useful for crashdump debugging. Really in most cases
it could be replaced with a reasonable set of gcc scripts.

s/gcc/gdb/. At work we do mostly post-mortem analysis, so having working
libkvm is still very important for us. xref the way I just fixed netstat to
work again on coredumps recently. Breaking fstat on coredumps would probably
be very annoying. libkvm can always use the same algo as the sysctl if
necessary though.

Yes, I'll do that.


How much overhead is the filelist if it is only used in file
creation/destruction?

Well shaving off two pointers gets us into cacheline size for struct file which has some perf improvement. Furthermore, having a global lock in open/close will impact even single threaded workloads on larger machines. Truthfully I haven't seen contention on this yet but I suspect it's just because I haven't tried the right workload. :-) If we can do it without hindering other areas, which we can, I think it's a good idea. I included it because it was easy to do so after the gc changed.

fwiw this is about a 2% improvement in peak throughput on my mysql test on the 8way with better scaling at very high numbers of threads. At peak throughput there was no measured contention on this lock. That improvement is purely from fewer atomics and smaller struct file.

Kris created a synthetic benchmark that had over 50% improvement in throughput with lots of threads contending on the same descriptor. single-threaded performance is probably much better with this as well, although less dramaticaly so.


--
John Baldwin

_______________________________________________
freebsd-arch@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-arch
To unsubscribe, send any mail to "freebsd-arch-unsubscribe@xxxxxxxxxxx"



Relevant Pages

  • Re: [PATCH] MPSAFE/LOOKUP_SHARED cd9660
    ... With this patch I'm unable to kldunload libiconv.ko once it is ... John Baldwin ... but without success. ... lock order reversal: ...
    (freebsd-current)
  • Re: [PATCH] MPSAFE/LOOKUP_SHARED cd9660
    ... With this patch I'm unable to kldunload libiconv.ko once it is ... getting unloaded after 'foo' gets unloaded and using a destroyed lock ... John Baldwin ... Do you get the same messages w/o the patch with the same ISO image / ...
    (freebsd-current)
  • Re: [PATCH] MPSAFE/LOOKUP_SHARED cd9660
    ... With this patch I'm unable to kldunload libiconv.ko once it is ... John Baldwin ... lock order reversal: ... Do you get the same messages w/o the patch with the same ISO image ...
    (freebsd-current)
  • Re: [PATCH] MPSAFE/LOOKUP_SHARED cd9660
    ... With this patch I'm unable to kldunload libiconv.ko once it is ... John Baldwin ... lock order reversal: ... Do you get the same messages w/o the patch with the same ISO image / ...
    (freebsd-current)
  • Re: [PATCH] MPSAFE/LOOKUP_SHARED cd9660
    ... With this patch I'm unable to kldunload libiconv.ko once it is ... John Baldwin ... but without success. ... lock order reversal: ...
    (freebsd-current)