Re: parallel computing - to use NFS, or not
- From: Paul <nospam@xxxxxxxxxx>
- Date: Sat, 09 Jun 2012 01:57:41 -0400
Joshua Maurice wrote:
So, I put my money where my mouth was, and I'm looking into the best
way to write a distributed build system. Let's suppose that I'm trying
to distribute an existing Maven build (because I am). The build is
broken up into separate pieces in a DAG, like any other build. Any
build step can use any of its dependencies (or transitive
dependencies).
So, naively, I want to have a bunch of agents running on a bunch of
computers on a local network with access to some shared NFS where the
build takes place, with a single one coordinating it all by assigning
jobs to the agents. (I don't care at the moment about redundancy,
failover, etc. Let's just get it working. It doesn't need high
availability.) So, naively, when an agent is told to run a job, it
needs to guarantee that the results of all dependency jobs (the files
on NFS) are visible, then do that job (which is basically writing
files to NFS), then do whatever is needed to guarantee that the files
will be visible to dependent jobs on different computers.
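On the coordinator side, deciding which jobs may be handed out is a topological sort of the DAG. Below is a minimal sketch. It assumes the graph is available as a plain dict from each job to the set of jobs it directly depends on. The function name `ready_waves` and the job names are hypothetical. It groups jobs into waves with Kahn's algorithm. Waiting on direct dependencies is enough, because each of those has already waited on its own, which covers the transitive ones.

```python
def ready_waves(deps):
    """Group jobs into waves; each job depends only on jobs in earlier waves.

    Hypothetical helper. deps maps a job name to the set of jobs it directly
    depends on; every dependency must also appear as a key.
    Raises ValueError if the graph has a cycle (i.e. is not a DAG).
    """
    # Number of unfinished direct dependencies per job.
    remaining = {job: len(d) for job, d in deps.items()}
    # Reverse edges: for each job, the jobs that are waiting on it.
    dependents = {job: [] for job in deps}
    for job, d in deps.items():
        for dep in d:
            dependents[dep].append(job)

    # Jobs with no dependencies can start immediately.
    wave = sorted(job for job, n in remaining.items() if n == 0)
    waves = []
    scheduled = 0
    while wave:
        waves.append(wave)
        scheduled += len(wave)
        next_wave = []
        for job in wave:
            # "Finishing" this job releases the jobs that depend on it.
            for child in dependents[job]:
                remaining[child] -= 1
                if remaining[child] == 0:
                    next_wave.append(child)
        wave = sorted(next_wave)
    if scheduled != len(deps):
        raise ValueError("dependency graph contains a cycle")
    return waves
```

For example, `ready_waves({"core": set(), "util": {"core"}, "app": {"core", "util"}})` gives `[["core"], ["util"], ["app"]]`. A real coordinator would more likely dispatch each job as soon as its own dependencies finish, rather than waiting for a whole wave. The waves just make the ordering easy to see.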
Am I barking up the wrong tree? Would I be better off copying the
files myself to each agent computer? That seems like a lot of work,
especially to get it right. This seems like a job for NFS - I think.
So, let's go with the NFS approach. Is there a way to implement the
NFS equivalent of a mutex lock and unlock? I understand "close-to-open
consistency", but I don't think it helps me here. Each job
writes lots of files.
Would something like the following work?
An agent runs a job by:
1- Get the job details from the coordinator over a TCP socket.
2- Busy-loop or sleep-loop, opening and closing a sentry file for
each dependency job, until every dependency's sentry file contains
"done".
3- Do the job, writing its output files to NFS. This will probably be
done by other processes, e.g. gcc, javac, etc.
4- Wait for those processes to finish and exit; specifically, wait
for all of the job's files to be closed.
5- Call sync().
6- Open the sentry file for this job, write "done", and close it.
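The agent side of steps 2 through 6 can be sketched roughly as follows. This is a minimal sketch, not a tested NFS recipe. It makes a few assumptions. Sentry files are named `<job>.done` in one shared directory (a hypothetical naming scheme). The job's output paths are known. Each output is fsync()ed individually instead of calling a system-wide sync(). The sentry is written to a temporary name and renamed into place, so a poller never reads a half-written file; that rename is an addition, not part of the original step 6. Whether the NFS client and server preserve this ordering across machines is exactly the open question here. The sketch only shows the ordering on the writer.

```python
import os
import time


def sentry_path(shared_dir, job):
    # Hypothetical naming scheme: one "<job>.done" file per job.
    return os.path.join(shared_dir, job + ".done")


def is_done(shared_dir, job):
    # Open, read, and close on every poll. Under close-to-open consistency,
    # a fresh open() is what makes an NFS client revalidate with the server.
    try:
        with open(sentry_path(shared_dir, job)) as f:
            return f.read().strip() == "done"
    except FileNotFoundError:
        return False


def wait_for_dependencies(shared_dir, deps, poll_seconds=0.5, timeout=None):
    # Step 2: sleep-loop until every dependency's sentry reads "done".
    start = time.monotonic()
    pending = set(deps)
    while True:
        pending = {d for d in pending if not is_done(shared_dir, d)}
        if not pending:
            return
        if timeout is not None and time.monotonic() - start > timeout:
            raise TimeoutError("still waiting on: " + ", ".join(sorted(pending)))
        time.sleep(poll_seconds)


def flush_outputs(paths):
    # Steps 4-5, per file: assumes the writers (gcc, javac, ...) have exited.
    # fsync() on a read-only descriptor works on Linux; other systems may
    # require opening the file for writing instead.
    for p in paths:
        fd = os.open(p, os.O_RDONLY)
        try:
            os.fsync(fd)
        finally:
            os.close(fd)


def publish_done(shared_dir, job):
    # Step 6, done only after flush_outputs(): write "done" to a temporary
    # file, flush it, then rename it over the final sentry name.
    final = sentry_path(shared_dir, job)
    tmp = final + ".tmp"
    with open(tmp, "w") as f:
        f.write("done")
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, final)
```

An agent would call `wait_for_dependencies()`, run the job, then call `flush_outputs()` and `publish_done()` in that order. Flushing only the job's own files keeps the "release" scoped to that job rather than the whole machine.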
I don't know if this would work. I half-suspect not. What I'm trying
to do is guarantee visibility ordering à la read and write memory
barriers, aka acquire and release semantics, aka C++11
memory_order_acquire and memory_order_release. The above scheme will
work if I can guarantee that the write of the sentry file will
definitely hit the server only after all of the job's writes have hit
the server. I think "close-to-open consistency" gets me the rest.
Would sync() work, even if called from a different process than the
process doing the file writes? Would this give me the ordering
guarantee I need over NFS? What gives me this guarantee - which NFS
version, which NFS options, which OS, etc.? Will this work in
practice? Or, again, would I be better off reimplementing a small part
of NFS myself by manually copying each job's files to every computer
node where they're needed?
Thank you for your time.
You mean something like the "distcc" I use on Gentoo to speed up builds?
http://en.wikipedia.org/wiki/Distcc
It doesn't work as well as you'd think, in that not all aspects
of the build process are accelerated. Linking, for example, still
runs on the local machine, so there is still a bottleneck. Still, it
does reduce wall-clock time for builds. And its main advantage is
that it's ready to use.
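The remaining bottleneck is essentially Amdahl's law. The serial part of the build caps the speedup no matter how many machines join in. The sketch below is a small illustration. The 80% parallel fraction is a made-up figure, not a measurement of distcc or Maven. The function name is hypothetical.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Best-case speedup when only parallel_fraction of the build time is
    spread over `workers` machines and the rest (e.g. linking) stays serial."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / workers)


# Hypothetical: 80% of build time is compilation that can be distributed.
for n in (1, 4, 8, 1000):
    print(n, round(amdahl_speedup(0.8, n), 2))
```

With 80% parallel work, 8 machines give at best about a 3.3x speedup. No number of machines can exceed 1 / (1 - 0.8) = 5x.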
It might be better to study an existing system, see what mistakes
were made and where the scheme could be improved, before reinventing
the wheel. That particular one works the way it does for a reason.
Paul