- A table may be best replicated to another site where it is frequently required to minimize the network traffic.
- A table may be replicated to a site where it is convenient to join it with the table on that site when users use join queries again not only to minimize the network traffic than otherwise if the tables were to be stored on different sites, but also remove the complexity in join when tables are dispersed.
- If a table is accessed heavily at more than two sites then we violate our own premise. (security and concurrency control exposure will escalate).
- The centralized place where all the tables are residing (New York) will be heavily loaded for all join requests will have to be handled there along with many of the other access requests.
It is possible to implement the transmission of update on a table to its copies in different ways, particularly if the nature of the data and of the applications that use it can tolerate delay (backlash) in such update. Those methods are said to be "asynchronous". It could be either done by sending the message from site of actual update to the remote site where copy of the table exists and apply it there right away, or alternatively choosing one of the sites to accumulate all the updates (for all the tables) and transmission of changes may be scheduled. In a third "asynchronous" approach, each table may have one of the sites declared as "dominant" (for that table) to which all of the updates may be sent and from there transmit it to other copies of the table on scheduled basis.
However some popular commercial DBMS systems like Oracle use Two Phase Commit (2PC) process which is a "synchronous" algorithm. The "2PC" consists of two phases namely "Prepare Phase" and "Commit" phase. The mechanism uses a special "log" file to hold the update temporarily at all the sites. When an update is initialized at one site in a certain table (say D at Singapore) then the actual update is first written temporarily to the log file at every site (say Los Angeles & Paris) and NOT the copies of the table anywhere at this stage. The "Prepare Phase" first confirms with the remote locations (LA & Paris) having the copies of a table (D) under update at one site (update initializing site, Singapore) if they are in an operating condition and if they could lock the respective data to be updated (Oracle implements row level locking) in their copy of the table (D at LA & Paris) and if any of this condition is not met then either the transaction waits until favorable conditions are reached or transaction is aborted or otherwise (if conditions are met) the data makes way to the copies of the tables at all sites from the respective log files; which is part of the "commit" phase. So in this way all the copies of the table either stay at the original state if the update proceeds to fail at any location or get updated "synchronously".
Distributed Joins -
"Distributed Joins" involve tables at discreet locations required to be joined to satisfy a "join query" (query that required data being combined from two or more tables in a join condition). Various permutations and combinations may emerge for transmitting the data from one location to another or even third depending upon -
- if or not the location from where the request for join originated has copy of any of the tables involved in the join.
- the location where the join may be actually performed, transmitting the final or intermediate result.
- The number and size of the records from each table involved in the join.
- The distances and costs of transmitting the records from one location to another to execute the join.
- The distance and cost of shipping the result of the join back to the location that initialized the join query.