sanitize-log

Description

Sanitize the contents of a server log file to remove potentially sensitive information while retaining enough detail to remain useful for diagnosing problems or understanding load patterns. Sanitization operates on fields that consist of name-value pairs. The field name is always preserved, but a field value may be tokenized or redacted if it could contain sensitive information. Supported log file types include the file-based access, error, sync, and resync logs, as well as the operation timing access log and the detailed HTTP operation log. To sanitize the audit log, use the scramble-ldif tool instead.

Examples

Write a sanitized version of log file 'logs/access' into the file 'logs/access.sanitized', preserving any comments that may be included in the log file:
sanitize-log --inputFile logs/access --outputFile logs/access.sanitized \
     --preserveComments


Write a sanitized version of log file 'logs/access' into the file 'logs/access.sanitized', displaying any unparseable lines, and changing the sanitization behavior so that the 'pipe' field is tokenized rather than preserved, and so that the 'instanceName' field is redacted rather than tokenized:
sanitize-log --inputFile logs/access --outputFile logs/access.sanitized \
     --displayUnparseableLines --tokenizeField pipe --redactField instanceName


Write a sanitized version of JSON-formatted log file 'logs/errors.json' into the file 'logs/errors.json.sanitized':
sanitize-log --inputFile logs/errors.json \
     --outputFile logs/errors.json.sanitized --json

Arguments

-V
--version

Description Display Data Governance Server version information

-H
--help

Description Display general usage information

--help-debug

Description Display help for using debug options
Advanced Yes

-i {path}
--inputFile {path}

Description The path to the log file containing the data to be sanitized
Required Yes
Multi-Valued No

--inputEncryptionPassphraseFile {path}

Description The path to a file that contains the encryption passphrase needed to decrypt the input file if it is encrypted. If the input file is encrypted and this argument is not provided, then the tool will interactively prompt for the encryption passphrase. If a passphrase file is provided, then it must contain exactly one line that consists entirely of the passphrase
Required No
Multi-Valued No
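
The one-line requirement for a passphrase file can be met with a few shell commands. The following is a minimal sketch, not part of the tool itself: the file location, the LOG_PASSPHRASE environment variable, and the placeholder passphrase are all assumptions made for illustration, and it assumes a single trailing newline after the passphrase is acceptable.

```shell
# Hypothetical location for the passphrase file; adjust for your environment.
pass_file="${TMPDIR:-/tmp}/sanitize-log.passphrase"

# Placeholder value for illustration only. LOG_PASSPHRASE is a hypothetical
# environment variable; do not hard-code a real passphrase in a script.
passphrase="${LOG_PASSPHRASE:-example-passphrase}"

# Write exactly one line containing only the passphrase, then restrict
# the file so that only the current user can read or write it.
printf '%s\n' "$passphrase" > "$pass_file"
chmod 600 "$pass_file"

# Confirm that the file holds exactly one line.
line_count=$(wc -l < "$pass_file" | tr -d ' ')
echo "passphrase file has $line_count line(s)"

# The file could then be supplied to the tool, for example:
#   sanitize-log --inputFile logs/access \
#        --inputEncryptionPassphraseFile "$pass_file"
```

A file prepared this way should also be usable with --outputEncryptionPassphraseFile, which states the same one-line requirement.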

-o {path}
--outputFile {path}

Description The path to the log file to which the sanitized log data should be written. If this is not specified, then the output file will use the same name as the input file, but with a '.sanitized' extension
Required No
Multi-Valued No

--compressOutput

Description Compress the data written to the output file using gzip

--encryptOutput

Description Encrypt the data written to the output file. If the --outputEncryptionPassphraseFile argument is provided, then that passphrase will be used to encrypt the file; otherwise, the tool will interactively prompt for the passphrase

--outputEncryptionPassphraseFile {path}

Description The path to a file that contains the encryption passphrase needed to encrypt the output file if the --encryptOutput argument is provided. If the output is to be encrypted but no passphrase file is provided, then the tool will interactively prompt for the passphrase. If a passphrase file is given, then it must contain exactly one line that consists entirely of the passphrase
Required No
Multi-Valued No
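
The output arguments above can be combined in a single invocation. The following sketch uses only arguments documented on this page; the output file name and the passphrase file path are hypothetical, and the guard simply skips the call on systems where sanitize-log is not on the PATH.

```shell
in_file="logs/access"
out_file="logs/access.sanitized.gz.enc"    # hypothetical output file name
pass_file="/path/to/output.passphrase"     # hypothetical passphrase file path

if command -v sanitize-log >/dev/null 2>&1; then
    # Compress and encrypt the sanitized output without prompting,
    # because the passphrase is read from the file.
    sanitize-log --inputFile "$in_file" --outputFile "$out_file" \
         --compressOutput --encryptOutput \
         --outputEncryptionPassphraseFile "$pass_file"
else
    echo "sanitize-log is not on the PATH; nothing was run" >&2
fi
```

Omitting --outputEncryptionPassphraseFile from this command would, per the description above, cause the tool to prompt interactively for the passphrase.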

-j
--json

Description Indicates that the log file is JSON-formatted

-p {name}
--preserveField {name}

Description The name of a log field whose value should be preserved without alteration. The default set of fields to preserve is: authenticationFailureReason, authFailureID, authorizationType, assuranceTimeoutMillis, assuredReplicationRequirements, attr, attribute, attrs, attributes, attrsReturned, authType, authenticationType, category, changeNumber, changeToSoftDeletedEntry, cipher, class, clientConnectionPolicy, conn, connectionID, deleteOldRDN, deref, dereferenceAliases, disconnectReason, entriesAddedToTarget, entriesDeletedFromSource, entriesReadFromSource, entriesReturned, etime, id, idToAbandon, indexesWithKeysAccessedExceedingEntryLimit, indexesWithKeysAccessedNearEntryLimit, intermediateResponsesReturned, isIndexed, localAssuranceLevel, localAssuranceSatisfied, messageID, messageType, method, missingPrivileges, mostExpensiveAggregatePhase, mostExpensiveAggregatePhaseTimeMicros, mostExpensivePhase, mostExpensivePhaseTimeMicros, msgID, name, negotiationProperties, oid, op, opID, operationID, operationType, origin, originDetails, phaseTimesMicros, pipe, preAuthorizationUsedPrivileges, processingTimeMillis, product, protocol, protocolVersion, qtime, rebalancingOp, rebalancingOperationID, remoteAssuranceLevel, remoteAssuranceSatisfied, replicaID, replicationChangeID, replicationCSN, replicationServerID, requestContentLength, requestContentType, requestControlOIDs, requestControls, requestCookieName, requestHeaderName, requestID, requestOID, requestParameterName, requestProtocol, requestType, requestedAttributes, requestedSizeLimit, requestedTimeSizeLimitSSeconds, responseContentLength, responseContentType, responseControlOIDs, responseControls, responseCookieName, responseDelayedByAssurance, responseHeaderName, responseOID, responseType, resultCode, resultCodeName, retiredPasswordUsed, saslMechanism, scope, scopeName, serverAssuranceResults, severity, sizeLimit, sourceAltered, sourceServerAltered, startupID, statusCode, syncClass, threadID, targetAltered, targetProtocol, targetServerAltered, timestamp, timeLimit, triggeredByConn, triggeredByConnectionID, triggeredByOp, triggeredByOperationID, typesOnly, uncachedDataAccessed, unindexed, usedPrivileges, usingAdminSessionWorkerThread, version, workQueueWaitTimeMillis
Required No
Multi-Valued Yes

-t {name}
--tokenizeField {name}

Description The name of a log field whose value should be tokenized. If the value of the specified field appears to be a DN or filter, then the attribute names will be preserved and only the values will be tokenized. If the value of the field does not appear to be a DN or filter, then the entire value will be tokenized. The default set of fields to tokenize is: authDN, authenticationDN, authorizationDN, authzDN, autoAuthenticatedAs, base, baseDN, dn, filter, from, fromAddress, instanceName, matchedDN, newRDN, newSuperior, redirectURI, requesterDN, requesterIP, softDeleteEntryDN, sourceBackendSet, sourceServer, targetBackendSet, targetHost, targetPort, targetServer, to, toAddress, toPort, undeleteFromDN, undeletedFromDN, url
Required No
Multi-Valued Yes

-r {name}
--redactField {name}

Description The name of a log field whose value should be redacted. Any field not configured to be preserved or tokenized will be redacted
Required No
Multi-Valued Yes
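
The three treatments (preserve, tokenize, redact) amount to a lookup with redaction as the fallback. The sketch below illustrates that rule only and is not the tool's implementation: field_treatment is a hypothetical helper, it covers just a small subset of the default lists above, it assumes exact, case-sensitive name matching, and customNote is a made-up field name standing in for any field that appears in neither list.

```shell
# Illustrative only: classify a field name using a small subset of the
# default preserve and tokenize lists documented on this page.
field_treatment() {
    case "$1" in
        conn|op|msgID|resultCode|etime|pipe)
            echo preserve ;;
        dn|base|filter|authDN|instanceName|requesterIP)
            echo tokenize ;;
        *)
            # Anything not preserved or tokenized falls through to redaction.
            echo redact ;;
    esac
}

# customNote is a hypothetical field that is in neither default list.
for field in conn dn instanceName customNote; do
    printf '%s -> %s\n' "$field" "$(field_treatment "$field")"
done
```

Under this rule, passing --tokenizeField or --redactField for a field on the command line corresponds to moving that field name from one branch to another, as in the second example near the top of this page.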

-c
--preserveComments

Description Indicates that comment lines (i.e., any line beginning with the '#' character) and blank lines should be included in the sanitized output. If comment lines are to be included, then they will not be altered

-d
--displayUnparseableLines

Description Display a message for each line contained in the log file that cannot be parsed as a valid log message

--interactive

Description Launch the tool in interactive mode

--propertiesFilePath {path}

Description The path to a properties file used to specify default values for arguments not supplied on the command line
Required No
Multi-Valued No

--generatePropertiesFile {path}

Description Write an empty properties file that may be used to specify default values for arguments
Required No
Multi-Valued No
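
The two properties-file arguments above are typically used together: generate a template once, edit it, and then reference it on later runs. The following sketch uses only arguments documented on this page. The file name is hypothetical, and it assumes the tool accepts --generatePropertiesFile on its own, without the otherwise required --inputFile argument; the contents and syntax of the generated file are not described here and should be taken from the generated template itself.

```shell
props_file="sanitize-log.properties"    # hypothetical properties file name

if command -v sanitize-log >/dev/null 2>&1; then
    # Step 1: write an empty template that can be edited to set defaults.
    sanitize-log --generatePropertiesFile "$props_file"

    # Step 2 (after editing the template): run the tool, taking defaults
    # from the file for any argument not given on the command line.
    sanitize-log --propertiesFilePath "$props_file" --inputFile logs/access
else
    echo "sanitize-log is not on the PATH; nothing was run" >&2
fi
```

Arguments given explicitly on the command line, such as --inputFile here, do not need a default in the properties file.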

--noPropertiesFile

Description Do not obtain any argument values from a properties file

--suppressPropertiesFileComment

Description Suppress output listing the arguments obtained from a properties file