@Extensible @SynchronizationServerExtension(appliesToLocalContent=false, appliesToSynchronizedContent=true) public abstract class ScriptedSyncSource extends java.lang.Object implements Configurable
The fetchEntry(SyncOperation) method will be called once for every change that is returned by getNextBatchOfChanges(int, AtomicLong).
This is a generic interface and there is no protocol-specific connection management provided. It is expected that implementers will provide their own libraries for talking to the source endpoint and handle the connection lifecycle in the initializeSyncSource(SyncServerContext, SyncSourceConfig, ArgumentParser) and finalizeSyncSource() methods of this extension.
During realtime synchronization (i.e. when a Sync Pipe is running), there is a sliding window of changes being processed, and this API provides a distinction between some different points along that window:
Changes that have been returned by getNextBatchOfChanges() but not completely processed and acknowledged back to the Sync Source.
getNextBatchOfChanges() should return the first changes that have not been detected. This should be somewhere at or ahead of the startpoint.
Several of these methods throw EndpointException, which should be used in the case of any connection or endpoint error. For other types of errors, runtime exceptions may be used (IllegalStateException, NullPointerException, etc.). The Data Sync Server will automatically retry operations that fail, up to a configurable number of attempts. The EndpointException class allows you to specify a retry policy as well.
    dsconfig create-sync-source \
         --source-name "{source-name}" \
         --type groovy-scripted \
         --set "script-class:{class-name}" \
         --set "script-argument:{name=value}"
where "{source-name}" is the name to use for the Sync Source instance, "{class-name}" is the fully-qualified name of the Groovy class written using this API, and "{name=value}" represents name-value pairs for any arguments to provide to the sync source. If multiple arguments should be provided to the sync source, then the "--set script-argument:{name=value}" option should be provided multiple times.
Constructor and Description |
---|
ScriptedSyncSource() |
Modifier and Type | Method and Description |
---|---|
abstract void | acknowledgeCompletedOps(java.util.LinkedList<SyncOperation> completedOps): Provides a way for the Data Sync Server to acknowledge back to the script which sync operations it has processed. |
void | defineConfigArguments(ArgumentParser parser): Updates the provided argument parser to define any configuration arguments which may be used by this extension. |
abstract Entry | fetchEntry(SyncOperation operation): Return a full source entry (in LDAP form) from the source, corresponding to the ChangeRecord that is passed in through the SyncOperation. |
void | finalizeSyncSource(): This hook is called when a Sync Pipe shuts down, when the resync process shuts down, or when the set-startpoint subcommand (from the realtime-sync command line tool) is finished. |
abstract java.lang.String | getCurrentEndpointURL(): Return the URL or path identifying the source endpoint from which this extension is transmitting data. |
abstract java.util.List<ChangeRecord> | getNextBatchOfChanges(int maxChanges, java.util.concurrent.atomic.AtomicLong numStillPending): Return the next batch of change records from the source. |
abstract java.io.Serializable | getStartpoint(): Gets the current value of the startpoint for change detection. |
void | initializeSyncSource(SyncServerContext serverContext, SyncSourceConfig config, ArgumentParser parser): This hook is called when a Sync Pipe first starts up, when the resync process first starts up, or when the set-startpoint subcommand is called from the realtime-sync command line tool. |
void | listAllEntries(java.util.concurrent.BlockingQueue<ChangeRecord> outputQueue): Gets a list of all the entries in the source endpoint. |
void | listAllEntries(java.util.Iterator<java.lang.String> inputLines, java.util.concurrent.BlockingQueue<ChangeRecord> outputQueue): Gets a list of all the entries in the source from a given file input. |
abstract void | setStartpoint(SetStartpointOptions options): This method should effectively set the starting point for synchronization to the place specified by the options parameter. |
public ScriptedSyncSource()
public void defineConfigArguments(ArgumentParser parser) throws ArgumentException
Specified by: defineConfigArguments in interface Configurable
Parameters:
parser - The argument parser to be updated with the configuration arguments which may be used by this extension.
Throws:
ArgumentException - If a problem is encountered while updating the provided argument parser.
public void initializeSyncSource(SyncServerContext serverContext, SyncSourceConfig config, ArgumentParser parser)
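This hook is called when a Sync Pipe first starts up, when the resync process first starts up, or when the set-startpoint subcommand is called from the realtime-sync command line tool. Implementations should generally store the SyncServerContext in a class member so that it can be used elsewhere in the implementation.
The default implementation is empty.
Parameters:
serverContext - A handle to the server context for the server in which this extension is running.
config - The general configuration for this sync source.
parser - The argument parser which has been initialized from the configuration for this sync source.
public void finalizeSyncSource()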
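This hook is called when a Sync Pipe shuts down, when the resync process shuts down, or when the set-startpoint subcommand (from the realtime-sync command line tool) is finished.
The default implementation is empty.
public abstract java.lang.String getCurrentEndpointURL()
Return the URL or path identifying the source endpoint from which this extension is transmitting data.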
public abstract void setStartpoint(SetStartpointOptions options) throws EndpointException
This method should effectively set the starting point for synchronization to the place specified by the options parameter. This should cause all changes previous to the specified start point to be disregarded and only changes after that point to be returned by getNextBatchOfChanges(int, AtomicLong).
There are several different startpoint types (see SetStartpointOptions), and this implementation is not required to support them all. If the specified startpoint type is unsupported, this method should throw an UnsupportedOperationException.
IMPORTANT: The RESUME_AT_SERIALIZABLE startpoint type must be supported by your implementation, because this is used when a Sync Pipe first starts up. The Serializable in this case is the same type that is returned by getStartpoint(); the Sync Server persists it and passes it back in on a restart.
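The dispatch described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the SDK's API: StartpointKind, StartpointRequest, and ChangeNumberCursor are hypothetical stand-ins for SetStartpointOptions and a sync source, the startpoint is assumed to be a change number held as a Long, and only the RESUME_AT_SERIALIZABLE name is taken from the text.

```java
import java.io.Serializable;

// Hypothetical stand-ins for the SDK's SetStartpointOptions. Only the
// RESUME_AT_SERIALIZABLE name comes from the documentation; the other
// constants are illustrative.
enum StartpointKind { RESUME_AT_SERIALIZABLE, END_OF_CHANGELOG, RESUME_AT_CHANGE_TIME }

final class StartpointRequest {
  final StartpointKind kind;
  final Serializable value; // only meaningful for RESUME_AT_SERIALIZABLE

  StartpointRequest(StartpointKind kind, Serializable value) {
    this.kind = kind;
    this.value = value;
  }
}

final class ChangeNumberCursor {
  // Assumed to have been read from the source endpoint beforehand.
  private final long lastChangeNumberAtSource;
  private long startpoint;

  ChangeNumberCursor(long lastChangeNumberAtSource) {
    this.lastChangeNumberAtSource = lastChangeNumberAtSource;
  }

  void setStartpoint(StartpointRequest request) {
    switch (request.kind) {
      case RESUME_AT_SERIALIZABLE:
        // Same type that getStartpoint() returns (a Long in this sketch).
        startpoint = (Long) request.value;
        break;
      case END_OF_CHANGELOG:
        startpoint = lastChangeNumberAtSource;
        break;
      default:
        // Any type this sketch does not handle is reported as unsupported.
        throw new UnsupportedOperationException(
            "Startpoint type not supported: " + request.kind);
    }
  }

  Serializable getStartpoint() {
    return Long.valueOf(startpoint);
  }
}
```

Because the persisted value round-trips through getStartpoint() and back into setStartpoint, keeping it a small immutable Serializable such as Long keeps the resume path simple.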
This method can be called from two different contexts:
getNextBatchOfChanges(int, AtomicLong))
Parameters:
options - an object which indicates where exactly to start synchronizing (e.g. the end of the changelog, specific change number, a certain time ago, etc.)
Throws:
EndpointException - if there is any error while setting the start point
public abstract java.io.Serializable getStartpoint()
Gets the current value of the startpoint for change detection.
acknowledgeCompletedOps(LinkedList).
This method is called periodically and the return value is saved in the persistent state for the Sync Pipe that uses this extension as its Sync Source.
IMPORTANT: The internal value for the startpoint should only be updated after a sync operation is acknowledged back to this script (via acknowledgeCompletedOps(LinkedList)). Otherwise it will be possible for changes to be missed when the Data Sync Server is restarted or a connection error occurs.
setStartpoint(SetStartpointOptions) when the sync pipe starts up.
public abstract java.util.List<ChangeRecord> getNextBatchOfChanges(int maxChanges, java.util.concurrent.atomic.AtomicLong numStillPending) throws EndpointException
On the first invocation, this should return changes starting from the startpoint that was set by setStartpoint(SetStartpointOptions). This method is also responsible for updating the internal state such that subsequent invocations do not return duplicate changes.
The resulting list should be limited by maxChanges. The numStillPending reference should be set to the estimated number of changes that haven't yet been retrieved from the source endpoint when this method returns, or zero if all the current changes have been retrieved.
IMPORTANT: While this method needs to keep track of which changes have already been returned so that it does not return them again, it should NOT modify the official startpoint. The internal value for the startpoint should only be updated after a sync operation is acknowledged back to this script (via acknowledgeCompletedOps(LinkedList)). Otherwise it will be possible for changes to be missed when the Data Sync Server is restarted or a connection error occurs. The startpoint should not change as a result of this method.
This method does not need to be thread-safe. It will be invoked repeatedly by a single thread, based on the polling interval set in the Sync Pipe configuration.
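The batching contract above can be illustrated with a small in-memory model. This is a sketch under assumptions: BatchingChangeDetector is a hypothetical class, plain change numbers stand in for ChangeRecord instances, and the source is assumed to expose a highest change number. The point it shows is that the detection cursor advances while the official startpoint stays put.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-memory source: change numbers up to lastChange stand in
// for rows in a changelog. This is not an SDK class.
final class BatchingChangeDetector {
  private final long lastChange;  // highest change number at the source
  private final long startpoint;  // official startpoint, never modified here
  private long lastDetected;      // detection cursor, at or ahead of startpoint

  BatchingChangeDetector(long startpoint, long lastChange) {
    this.startpoint = startpoint;
    this.lastDetected = startpoint;
    this.lastChange = lastChange;
  }

  List<Long> getNextBatchOfChanges(int maxChanges, AtomicLong numStillPending) {
    List<Long> batch = new ArrayList<>();
    // Advance only the detection cursor so later calls return no duplicates.
    while (batch.size() < maxChanges && lastDetected < lastChange) {
      batch.add(++lastDetected);
    }
    // Report how many changes remain unretrieved after this batch.
    numStillPending.set(lastChange - lastDetected);
    return batch; // empty when there is nothing new
  }

  long getStartpoint() {
    return startpoint;
  }
}
```

With a startpoint of 10 and 15 changes at the source, a call with maxChanges of 3 yields changes 11 through 13 and leaves numStillPending at 2, while getStartpoint() still reports 10.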
Parameters:
maxChanges - the maximum number of changes to retrieve
numStillPending - this should be set to the number of unretrieved changes that are still pending after this batch has been retrieved. This will be passed in as zero, and may be left that way if the actual value cannot be determined.
Returns:
a list of ChangeRecord instances, each corresponding to a single change at the source endpoint. If there are no new changes to return, this method should return an empty list.
Throws:
EndpointException - if there is any error while retrieving the next batch of changes
public abstract Entry fetchEntry(SyncOperation operation) throws EndpointException
Return a full source entry (in LDAP form) from the source, corresponding to the ChangeRecord that is passed in through the SyncOperation. This method should perform any queries necessary to gather the latest values for all the attributes to be synchronized.
This method must be thread safe, as it will be called repeatedly and concurrently by each of the Sync Pipe worker threads as they process entries.
If the original ChangeRecord has the full entry already set on it (which can be done using ChangeRecord.Builder#fullEntry(Entry)), then this method will not get called, and the Sync Server will automatically use the full entry from the ChangeRecord. In this case, the implementation can always return null.
Parameters:
operation - the SyncOperation which identifies the source "entry" to fetch. The ChangeRecord can be obtained by calling operation.getChangeRecord(). These ChangeRecords are generated by getNextBatchOfChanges(int, AtomicLong) or by listAllEntries(BlockingQueue).
Throws:
EndpointException - if there is an error fetching the entry
public abstract void acknowledgeCompletedOps(java.util.LinkedList<SyncOperation> completedOps) throws EndpointException
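Provides a way for the Data Sync Server to acknowledge back to the script which sync operations it has processed. This method should update the official startpoint, which was set by setStartpoint(SetStartpointOptions) and is returned by getStartpoint().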
IMPORTANT: The internal value for the startpoint should only be updated after a sync operation is acknowledged back to this extension (via this method). Otherwise it will be possible for changes to be missed when the Data Sync Server is restarted or a connection error occurs.
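One way to honor this rule is to advance the startpoint only inside the acknowledgement method. The sketch below is an illustration, not the SDK's implementation: AckedOp is a hypothetical stand-in for a SyncOperation that carries a change number, and the methods are synchronized on the assumption that getStartpoint() may be called from a different thread than the one delivering acknowledgements.

```java
import java.io.Serializable;
import java.util.LinkedList;

// Hypothetical stand-in for a SyncOperation that carries a change number.
final class AckedOp {
  final long changeNumber;

  AckedOp(long changeNumber) {
    this.changeNumber = changeNumber;
  }
}

final class AcknowledgedStartpoint {
  private long startpoint;

  AcknowledgedStartpoint(long initialStartpoint) {
    this.startpoint = initialStartpoint;
  }

  // The list is ordered by when each operation was first detected, so the
  // last element is the furthest point that has been fully processed.
  synchronized void acknowledgeCompletedOps(LinkedList<AckedOp> completedOps) {
    if (!completedOps.isEmpty()) {
      startpoint = Math.max(startpoint, completedOps.getLast().changeNumber);
    }
  }

  // The only value that is ever persisted is the acknowledged one.
  synchronized Serializable getStartpoint() {
    return Long.valueOf(startpoint);
  }
}
```

Taking the maximum guards against the startpoint moving backwards if an older batch is acknowledged late; whether that can happen depends on the server's behavior, which this sketch does not assume either way.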
Parameters:
completedOps - a list of SyncOperations that have finished processing. The records are listed in the order they were first detected.
Throws:
EndpointException - if there is an error acknowledging the changes back to the database
public void listAllEntries(java.util.concurrent.BlockingQueue<ChangeRecord> outputQueue) throws EndpointException
Gets a list of all the entries in the source endpoint. The default implementation throws UnsupportedOperationException; subclasses should override if the resync functionality is needed.
The outputQueue should contain ChangeRecord objects with the ChangeType set to null to indicate that these are resync operations.
This method should not return until all the entries at the source have been added to the output queue. Separate threads will concurrently drain entries from the queue and process them. The queue typically should not contain full entries, but rather ChangeRecord objects which identify the full source entries. These objects are then individually passed in to fetchEntry(SyncOperation). Therefore, it is important to make sure that the ChangeRecord instances contain enough identifiable information (e.g. primary keys) for each entry so that the entry can be found again.
The lifecycle of resync is similar to that of real-time sync, with a few differences:
Alternatively, the full entry can be set on the ChangeRecord within this method, which will cause the "fetch full entry" step to be skipped. In this case the Sync Server will just use the entry specified on the ChangeRecord.
If the total set of entries is very large, it is fine to split up the work into multiple network queries within this method. The queue will not grow out of control because it blocks when it becomes full. The queue capacity is fixed at 1000.
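The paging and back-pressure behavior described above can be sketched with standard java.util.concurrent classes. The names here are hypothetical: PagedResyncLister and PagedResyncDemo are not SDK classes, String keys stand in for ChangeRecord objects, each page stands in for one network query, and the sketch lets InterruptedException surface where a real implementation would report an EndpointException.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical lister: integer keys stand in for primary keys read from the
// source, and each page stands in for one network query.
final class PagedResyncLister {
  private final int totalEntries;
  private final int pageSize;

  PagedResyncLister(int totalEntries, int pageSize) {
    this.totalEntries = totalEntries;
    this.pageSize = pageSize;
  }

  void listAllEntries(BlockingQueue<String> outputQueue) throws InterruptedException {
    for (int start = 0; start < totalEntries; start += pageSize) {
      int end = Math.min(start + pageSize, totalEntries);
      for (int key = start; key < end; key++) {
        outputQueue.put("id=" + key); // blocks while the queue is full
      }
    }
  }
}

final class PagedResyncDemo {
  // Runs the lister against a bounded queue while a worker thread drains it,
  // and returns how many records the worker consumed.
  static int drainWhileListing(int totalEntries, int pageSize) {
    BlockingQueue<String> queue = new ArrayBlockingQueue<>(1000);
    AtomicInteger drained = new AtomicInteger();
    Thread worker = new Thread(() -> {
      try {
        for (int i = 0; i < totalEntries; i++) {
          queue.take();
          drained.incrementAndGet();
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    worker.start();
    try {
      new PagedResyncLister(totalEntries, pageSize).listAllEntries(queue);
      worker.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while listing entries", e);
    }
    return drained.get();
  }

  public static void main(String[] args) {
    System.out.println(drainWhileListing(2500, 400));
  }
}
```

Using put rather than add is the design choice that matters: put waits for space when the bounded queue is full, so the lister is throttled by the worker threads instead of failing or buffering everything in memory.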
Parameters:
outputQueue - a queue of ChangeRecord objects which will be individually fetched via fetchEntry(SyncOperation)
Throws:
EndpointException - if there is an error retrieving the list of entries to resync
public void listAllEntries(java.util.Iterator<java.lang.String> inputLines, java.util.concurrent.BlockingQueue<ChangeRecord> outputQueue) throws EndpointException
Gets a list of all the entries in the source from a given file input. The default implementation throws UnsupportedOperationException; subclasses should override if the resync functionality is needed for specific records, which can be specified in the input file.
The format for the inputLines (e.g. the content of the file) is user-defined; it may be key/value pairs, primary keys, or full SQL statements, for example. The use of this method is triggered via the --sourceInputFile argument on the resync CLI. The outputQueue should contain ChangeRecord objects with the ChangeType set to null to indicate that these are resync operations.
This method should not return until all the entries specified by the input file have been added to the output queue. Separate threads will concurrently drain entries from the queue and process them. The queue typically should not contain full entries, but rather ChangeRecord objects which identify the full source entries. These objects are then individually passed in to fetchEntry(SyncOperation). Therefore, it is important to make sure that the ChangeRecord instances contain enough identifiable information (e.g. primary keys) for each entry so that the entry can be found again.
The lifecycle of resync is similar to that of real-time sync, with a few differences:
Alternatively, the full entry can be set on the ChangeRecord within this method, which will cause the "fetch full entry" step to be skipped. In this case the Sync Server will just use the entry specified on the ChangeRecord.
If the total set of entries is very large, it is fine to split up the work into multiple network queries within this method. The queue will not grow out of control because it blocks when it becomes full. The queue capacity is fixed at 1000.
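Since the input format is user-defined, the simplest case to illustrate is one primary key per line. The sketch below assumes that format and nothing more: KeyFileResyncLister is a hypothetical class, String keys stand in for ChangeRecord objects, and blank lines are skipped as a convenience of this example rather than a rule of the API.

```java
import java.util.Iterator;
import java.util.concurrent.BlockingQueue;

final class KeyFileResyncLister {
  // Assumed input format: one primary key per line; blank lines are skipped.
  // Returns the number of records placed on the queue.
  static int listAllEntries(Iterator<String> inputLines,
                            BlockingQueue<String> outputQueue) {
    int queued = 0;
    try {
      while (inputLines.hasNext()) {
        String key = inputLines.next().trim();
        if (key.isEmpty()) {
          continue;
        }
        outputQueue.put(key); // blocks while the queue is full
        queued++;
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while queuing resync keys", e);
    }
    return queued;
  }
}
```

Each queued key would later be resolved to a full entry in fetchEntry, so the line format only has to carry enough information to find the entry again.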
Parameters:
inputLines - an Iterator containing the lines from the specified input file to resync (this is specified on the CLI for the resync command). These lines can be any format, for example a set of primary keys, a set of WHERE clauses, a set of full SQL queries, etc.
outputQueue - a queue of ChangeRecord objects which will be individually fetched via fetchEntry(SyncOperation)
Throws:
EndpointException - if there is an error retrieving the list of entries to resync