SPECIFICATION OF A FILESYSTEM SYNCHRONIZER
(Draft: Version 10)

DEFINITION [Basic sets]:
    x,y   : XX are FILE NAMES
    p,q,r : PP are PATHS, where PP is the set of sequences of file names
               (the empty path is written <>)
    F,G   : FF are FILE CONTENTS

Write q<=p for "q is a prefix of p," i.e. p = q.r for some r.

DEFINITION: A function
    S@ : PP -> ({NIL,DIR} union FF)
represents a FILESYSTEM CONTENTS if
    1) S@(p.x) != NIL  ==>  S@(p) = DIR
    2) exists n, for all q, |q| > n  ==>  S@(q) = NIL

DEFINITION: A FILESYSTEM S is a triple of functions (S@,S~,S#)
satisfying the following conditions:
    1) S@ is a filesystem contents
    2) S~, S# : PP -> Boolean
    3) S~(p)  ==>  S@(p) != NIL
    4) S#(p)  ==>  S@(p) = DIR
    5) S#(p)  ==>  S~(q) for some q <= p
Write FS for the set of all filesystems.

DEFINITION: Write |S| for the length of the longest path p such that
S@(p) != NIL.

DEFINITION: The PARENT of a path p in a filesystem S, written
parent(S,p), is defined as follows:
    parent(S,p) = q          if p = q.x for some x and S@(q) = DIR
    parent(S,p) undefined    otherwise.

DEFINITION: The set of CHILDREN of a path p in a filesystem S, written
children(S,p), is defined as follows:
    children(S,p) = { q | q = p.x for some x }    if S@(p) = DIR
                  = {}                            otherwise.

DEFINITION: S is said to be (POSSIBLY) CHANGED at p, written
changed(S,p), if
       S@(p) = DIR  /\ (S~(p) \/ S#(p))
    \/ S@(p) != DIR /\ (S~(p) \/ (parent(S,p) defined /\
                                  (S#(parent(S,p)) \/ S~(parent(S,p)))))

DEFINITION: A filesystem S is (POSSIBLY) CHANGED BELOW p, written
changed*(S,p), if, for some q (possibly empty), changed(S,p.q).

DEFINITION: S is said to be OLD at p, written old(S,p), if
!changed*(S,p).

FACT [OLD*]: If old(S,p) then old(S,p.q) for any q.

DEFINITION: When f is a function on paths, write f/p for "f after p",
defined as follows:
    (f/p)(q) = f(p.q).
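
The predicates changed, changed* and old defined above can be tried out on small examples. The following is a minimal sketch, not part of the specification. It assumes that paths are tuples of names, that S@ is a finite dictionary in which a missing key stands for NIL, and that S~ and S# are given as the sets of paths where they hold. The names FS, at, changed_below and the sentinel "<DIR>" are hypothetical choices made for illustration.

```python
from typing import Dict, FrozenSet, NamedTuple, Optional, Tuple

Path = Tuple[str, ...]      # a path is a sequence of file names; () is <>
DIR = "<DIR>"               # hypothetical sentinel; any other string is file contents

class FS(NamedTuple):
    contents: Dict[Path, str]                    # S@ ; a missing key means NIL
    touched: FrozenSet[Path] = frozenset()       # paths where S~ holds
    id_changed: FrozenSet[Path] = frozenset()    # paths where S# holds

def at(S: FS, p: Path) -> Optional[str]:
    """S@(p), with None standing for NIL."""
    return S.contents.get(p)

def parent(S: FS, p: Path) -> Optional[Path]:
    """parent(S,p): p without its last name if that path is a directory,
    otherwise None (standing for 'undefined')."""
    if p and at(S, p[:-1]) == DIR:
        return p[:-1]
    return None

def changed(S: FS, p: Path) -> bool:
    """changed(S,p), following the two cases of the definition."""
    if at(S, p) == DIR:
        return p in S.touched or p in S.id_changed
    par = parent(S, p)
    return p in S.touched or (
        par is not None and (par in S.id_changed or par in S.touched))

def changed_below(S: FS, p: Path) -> bool:
    """changed*(S,p). It is enough to check p itself and every marked path
    extending p: a changed path that is not marked itself has a marked
    parent directory, and that directory is then changed as well."""
    marked = S.touched | S.id_changed
    return changed(S, p) or any(
        q[:len(p)] == p and changed(S, q) for q in marked)

def old(S: FS, p: Path) -> bool:
    """old(S,p) == !changed*(S,p)."""
    return not changed_below(S, p)
```

For instance, if only the file a.f is marked as touched, then the directory a is not changed at a, but it is changed below a, so old fails at both a and the root while it still holds at an untouched sibling.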

DEFINITION [Declarative presentation]: The pair of new filesystem
contents (RA@,RB@) is said to be a SYNCHRONIZATION of original
filesystems (A,B) if, for each path p:
    1) A@(p) = B@(p)                  ==>  UNCHANGED-AT(p)
    2) old(A,p)                       ==>  CHOOSE-B-AFTER(p)
    3) old(B,p)                       ==>  CHOOSE-A-AFTER(p)
    4) changed(A,p)  /\ changed(B,p)  ==>  UNCHANGED-AFTER(p)
    5) changed(A,p)  /\ changed*(B,p) ==>  UNCHANGED-AFTER(p)
    6) changed*(A,p) /\ changed(B,p)  ==>  UNCHANGED-AFTER(p)
where
    UNCHANGED-AT(p)     ==  RA@(p) = A@(p)  /\  RB@(p) = B@(p)
    UNCHANGED-AFTER(p)  ==  RA@/p = A@/p    /\  RB@/p = B@/p
    CHOOSE-A-AFTER(p)   ==  RA@/p = RB@/p = A@/p
    CHOOSE-B-AFTER(p)   ==  RA@/p = RB@/p = B@/p.

LEMMA: If !changed(S,p) /\ changed*(S,p), then S@(p) = DIR.

PROOF: Suppose that S@(p) != DIR and that !changed(S,p). Then we must
show !changed*(S,p), i.e., that !changed(S,p.q) for any nonempty q.
Since S@(p.q) = NIL (by condition (1) in the definition of filesystem
contents), we must show
    (a) !S~(p.q)
    (b) !(parent(S,p.q) defined /\
          (S#(parent(S,p.q)) \/ S~(parent(S,p.q)))).
But (a) holds by condition (3) in the definition of filesystem, while
(b) holds because parent(S,p.q) is always undefined.

DEFINITION: A filesystem S WAS SYNCHRONIZED AT O if !old(S,p) whenever
O@/p != S@/p.

FACT [OLD]: If A and B were both synchronized at O and old(A,p) and
old(B,p), then A@/p = B@/p.

DEFINITION: Let S and T be filesystems and p a path. We write
S|p <-- T for the filesystem formed by COPYING S FROM T AFTER p,
defined formally as follows:
    S|p <-- T  =  ((\q. if p <= q then T@(q) else S@(q)), S~, S#)
(That is, the "touched" and "id-changed" maps of S are unchanged, while
the contents part of S is overwritten with the contents of T for all
paths extending p. The symbol \ stands for lambda: i.e.,
"\q. if p <= q then T@(q) else S@(q)" is the function that, for each
path q, returns T@(q) if p<=q and S@(q) otherwise.)
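
The contents part of the copy operation S|p <-- T and the "after" operator f/p can be illustrated on finite maps. This is a minimal sketch, assuming that a filesystem contents is a dictionary from tuple paths to strings with missing keys standing for NIL. The names extends, after and copy_after and the sentinel "<DIR>" are hypothetical. The S~ and S# components are left out because the definition leaves them unchanged.

```python
from typing import Dict, Tuple

Path = Tuple[str, ...]
Contents = Dict[Path, str]   # a missing key plays the role of NIL
DIR = "<DIR>"                # hypothetical sentinel for a directory

def extends(p: Path, q: Path) -> bool:
    """p <= q, i.e. p is a prefix of q."""
    return q[:len(p)] == p

def after(f: Contents, p: Path) -> Contents:
    """f/p, restricted to non-NIL entries: (f/p)(q) = f(p.q)."""
    return {q[len(p):]: c for q, c in f.items() if extends(p, q)}

def copy_after(s: Contents, t: Contents, p: Path) -> Contents:
    """Contents part of S|p <-- T: take T at paths extending p, S elsewhere."""
    result = {q: c for q, c in s.items() if not extends(p, q)}
    result.update({q: c for q, c in t.items() if extends(p, q)})
    return result
```

With these definitions, after(copy_after(s, t, p), p) equals after(t, p) for any s, t and p, which is the shape of the equation RA@/p = B@/p appearing in CHOOSE-B-AFTER(p). Entries of s at paths that do not extend p are carried over untouched.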

DEFINITION [Algorithm Snc]: The SYNCHRONIZATION ALGORITHM
    Snc : FS * FS * PP -> FS * FS
is defined as follows:
    Snc(A,B,p) =
      1) if old(A,p) /\ old(B,p)
           then (A,B)
      2) else if A@(p) = B@(p) = DIR
           then let x_1,x_2,...,x_n be some enumeration of the set
                      { x | A@(p.x) != NIL or B@(p.x) != NIL }
                let (A_0, B_0) = (A,B)
                let (A_{i+1}, B_{i+1}) = Snc(A_i, B_i, p.x_{i+1})
                in (A_n, B_n)
      3) else if old(A,p)
           then (A|p <-- B, B)
      4) else if old(B,p)
           then (A, B|p <-- A)
      5) else (A,B)

FACT [Snc is a total function]:
    1) For each A, B, and p, Snc(A,B,p) terminates.
    2) The result returned by Snc(A,B,p) is insensitive to the
       enumeration chosen in clause 2 of the definition.

PROOF: By induction on max(|A|,|B|) - |p|.

CONJECTURE [Equivalence of the declarative and algorithmic versions]:
Suppose that A and B were both synchronized at O and that
Snc(A,B,p) = (C,D). Then:
    1) (C,D) is a synchronization of (A,B).
    2) If (C',D') is a synchronization of (A,B) after p, then
       (C,D) = (C',D').
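
The five clauses of Snc translate almost directly into a recursive function. The following is a minimal sketch under stated assumptions, not a reference implementation: paths are tuples, S@ is a finite dictionary with missing keys standing for NIL, S~ and S# are sets of paths, and the enumeration in clause 2 is taken in sorted order. All Python names (FS, at, dirty, copy_after, snc) and the sentinel "<DIR>" are hypothetical.

```python
from typing import Dict, FrozenSet, NamedTuple, Optional, Tuple

Path = Tuple[str, ...]
DIR = "<DIR>"   # hypothetical sentinel; any other string is file contents

class FS(NamedTuple):
    contents: Dict[Path, str]                   # S@ ; a missing key means NIL
    touched: FrozenSet[Path] = frozenset()      # paths where S~ holds
    id_changed: FrozenSet[Path] = frozenset()   # paths where S# holds

def at(S: FS, p: Path) -> Optional[str]:
    return S.contents.get(p)

def dirty(S: FS, p: Path) -> bool:
    return p in S.touched or p in S.id_changed

def changed(S: FS, p: Path) -> bool:
    if at(S, p) == DIR:
        return dirty(S, p)
    has_parent = p != () and at(S, p[:-1]) == DIR
    return p in S.touched or (has_parent and dirty(S, p[:-1]))

def old(S: FS, p: Path) -> bool:
    # !changed*(S,p): it suffices to check p and the marked paths extending p.
    marked = S.touched | S.id_changed
    return not (changed(S, p) or
                any(q[:len(p)] == p and changed(S, q) for q in marked))

def copy_after(S: FS, T: FS, p: Path) -> FS:
    # S|p <-- T: contents of T at paths extending p; S~ and S# unchanged.
    new = {q: c for q, c in S.contents.items() if q[:len(p)] != p}
    new.update({q: c for q, c in T.contents.items() if q[:len(p)] == p})
    return S._replace(contents=new)

def snc(A: FS, B: FS, p: Path = ()) -> Tuple[FS, FS]:
    if old(A, p) and old(B, p):                      # clause 1
        return A, B
    if at(A, p) == DIR and at(B, p) == DIR:          # clause 2
        names = sorted({q[-1] for S in (A, B) for q in S.contents
                        if len(q) == len(p) + 1 and q[:len(p)] == p})
        for x in names:
            A, B = snc(A, B, p + (x,))
        return A, B
    if old(A, p):                                    # clause 3
        return copy_after(A, B, p), B
    if old(B, p):                                    # clause 4
        return A, copy_after(B, A, p)
    return A, B                                      # clause 5: conflict
```

As a usage example, suppose both replicas start from a root directory containing files a and b, replica A modifies a, and replica B modifies b and creates c, with each modified or created path marked as touched. Calling snc(A, B) then leaves both replicas with A's version of a and B's versions of b and c. If instead both replicas touch a with different contents, clause 5 applies at a and neither side is altered there.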