CSE 480 Lecture Notes
February 22, 1996

A distributed system consists of a finite set of processes and a finite set of channels.

Global State

- Fig. 5-6 [SiSh94] - global state diagram

  n  = # of msgs sent by A along the channel before A's state was recorded
  n' = # of msgs sent by A along the channel before the channel's state was recorded

  A consistent global state requires: n = n'

  m  = # of msgs received by B along the channel before B's state was recorded
  m' = # of msgs received by B along the channel before the channel's state was recorded

  Similarly, m = m'

  Since the number of msgs sent along the channel cannot be less than the number of msgs received along that channel, we have n' >= m'. Thus, with n = n' and m = m', n >= m for a consistent global state.

Def. LSi is the local state of site Si.
  1. values of local variables
  2. seq of msgs sent or received
Note: Site Si could be process Pi.

The global state of a distributed computation is an n-tuple of local states; i.e., GS = (LS1,LS2,...,LSn).

- send(Mij): the send event of msg Mij from Pi to Pj
- recv(Mij): the receive event of msg Mij by Pj
- time(x): the time at which state x was recorded; time(send(m)) denotes the time at which event send(m) occurred.

For local states LSi and LSj,
- transit(LSi,LSj) = {Mij | send(Mij) in LSi and recv(Mij) not_in LSj}
- inconsistent(LSi,LSj) = {Mij | send(Mij) not_in LSi and recv(Mij) in LSj}

A global state GS = (LS1,LS2,...,LSn) is consistent if
  for all i, j : inconsistent(LSi,LSj) = \emptyset

A global state GS = (LS1,LS2,...,LSn) is transitless if
  for all i, j : transit(LSi,LSj) = \emptyset

A global state is strongly consistent if it is consistent and transitless.

- Fig. 5-8 [SiSh94]
  {LS12,LS23,LS33} is consistent
  {LS11,LS22,LS32} is inconsistent
  {LS11,LS21,LS31} is strongly consistent

-----

Distributed Snapshots

Goal: to capture a consistent global state.
Ex: global snapshots of Penn campus or nature. Used in debugging.

- Assume that communication channels are FIFO.
- To determine a global state, a process needs cooperation from other processes.
- The algorithm is to be superimposed on the underlying computation: it must run concurrently with, but not alter, the underlying computation.
- A process can record its own state and the msgs it sends and receives, but nothing else.
- The state of a channel is the sequence of msgs sent along the channel, excluding the msgs received along the channel.

Figures from [ChLa85] Fig 1-4

The Algorithm:
- Marker-Sending Rule for a Process P.
- Marker-Receiving Rule for a Process Q.

Fig 8.
- Assume P records its state in S0, i.e., A
- P sends a marker along c
- The system moves to S1, S2 and then S3 while the marker is in transit
- Q receives the marker in S3
- Q records its state, D, and records c as empty
- Q sends the marker along c'
- P receives it on c' and records the state of c'
- the recorded global state is Fig 8

How to collect and assemble the recorded information?

Properties of the recorded global state
- may not correspond to a global state that actually occurred.

Stable Property of a distributed system D
- Let y be a predicate function defined on the global state of a distributed system D; that is, y(S) is true or false for a global state S of D.
- The predicate y is said to be a stable property of D if y(S) implies y(S') for all global states S' of D reachable from the global state S of D. I.e., once it holds, it holds at all later points in that computation.
- Examples: "computation has terminated", "the system is deadlocked", "all tokens in a token ring have disappeared"

-----

The local history of Pi is a (possibly infinite) seq of events
  h_i = e_i^1, e_i^2, ...
Let
  h_i^k = e_i^1, e_i^2, ..., e_i^k
  h_i^0 = empty seq
The global history of the computation is the set
  H = h_1 U ... U h_n

The notion of a CUT captures a global state in a distributed computation.

Def. A cut of a distributed computation is a set C = {c1,c2, ..., cn}, where ci is the cut event at site Si, i.e., the local state LSi at that time.

Def.
Let ek denote an event at site Sk. A cut C = {c1,c2, ..., cn} is a consistent cut if for all Si,Sj, there are no events ei, ej such that
  (ei --> ej) and (ej --> cj) and (ei -/-> ci), where ci,cj in C.

- A cut is consistent if every msg that was received before a cut event was sent before the cut event at the sender site in the cut.

Fig. 5-10.

-----

Termination Detection

- A process is either active or idle
- A distributed computation is terminated iff all the processes are idle and there are no msgs in transit.

Huang's Termination Detection Algorithm
- based on weight
- uses a controlling agent
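
The notes give only the two keywords for Huang's algorithm, so the following is a minimal sketch of the usual weight-throwing formulation, not the lecture's own presentation. Assumptions: the controlling agent starts with weight 1 and every process with weight 0; each computation msg carries part of the sender's weight; a process that goes idle returns all of its weight to the agent; the agent declares termination when its weight is back to 1. The class and method names (WeightThrowing, agent_send, and so on) are hypothetical, splitting the weight in half is an arbitrary policy, and plain lists stand in for the channels of a real system.

```python
from fractions import Fraction


class WeightThrowing:
    """Single-threaded toy model; lists stand in for channels (assumption)."""

    def __init__(self, process_names):
        self.agent_weight = Fraction(1)  # agent starts with all the weight
        self.weight = {p: Fraction(0) for p in process_names}
        self.active = {p: False for p in process_names}
        self.basic = []    # computation msgs in transit: (dest, weight)
        self.control = []  # weights on their way back to the agent

    def agent_send(self, dest):
        """Agent sends a computation msg carrying half of its weight."""
        part = self.agent_weight / 2
        self.agent_weight -= part
        self.basic.append((dest, part))

    def process_send(self, src, dest):
        """An active process sends a msg carrying half of its weight."""
        if not self.active[src]:
            raise ValueError("only an active process may send")
        part = self.weight[src] / 2
        self.weight[src] -= part
        self.basic.append((dest, part))

    def deliver_basic(self):
        """Receiver adds the carried weight and becomes active."""
        dest, part = self.basic.pop(0)
        self.weight[dest] += part
        self.active[dest] = True

    def become_idle(self, p):
        """Going idle: return all held weight to the agent."""
        self.control.append(self.weight[p])
        self.weight[p] = Fraction(0)
        self.active[p] = False

    def deliver_control(self):
        """Agent absorbs one returned weight."""
        self.agent_weight += self.control.pop(0)

    def terminated(self):
        """Agent's test: all weight has come home."""
        return self.agent_weight == 1

    def total_weight(self):
        """Invariant: agent + processes + msgs in transit always sum to 1."""
        in_transit = sum(w for _, w in self.basic) + sum(self.control)
        return self.agent_weight + sum(self.weight.values()) + in_transit


def demo():
    sim = WeightThrowing(["P1", "P2"])
    sim.agent_send("P1")
    sim.deliver_basic()
    sim.process_send("P1", "P2")
    sim.become_idle("P1")
    sim.deliver_control()
    early = sim.terminated()  # msg to P2 is still in transit
    sim.deliver_basic()
    sim.become_idle("P2")
    sim.deliver_control()
    return early, sim.terminated()
```

In demo(), P1 is already idle at the point where early is computed, but the msg to P2 still carries weight 1/4, so the agent holds only 3/4 and does not report termination. This is how the weights account for the "no msgs in transit" half of the termination condition. Exact fractions are used here because repeated halving with floating-point weights could eventually round and break the sum-to-1 invariant.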