Sanitize the contents of a server log file to remove potentially sensitive information while retaining enough detail to diagnose problems or understand load patterns. Sanitization operates on fields, each of which is a name-value pair. The field name is always preserved, but the field value may be tokenized or redacted if it could contain sensitive information. Supported log file types are the file-based access, error, sync, and resync logs, the operation timing access log, and the detailed HTTP operation log. The audit log is not supported by this tool; sanitize it with the scramble-ldif tool instead.
sanitize-log --inputFile logs/access --outputFile logs/access.sanitized \
     --preserveComments
sanitize-log --inputFile logs/access --outputFile logs/access.sanitized \
     --displayUnparseableLines --tokenizeField pipe --redactField instanceName
sanitize-log --inputFile logs/errors.json \
     --outputFile logs/errors.json.sanitized --json
-V
--version
Description | Display Data Governance Server version information |
-H
--help
Description | Display general usage information |
--help-debug
Description | Display help for using debug options |
Advanced | Yes |
-i {path}
--inputFile {path}
Description | The path to the log file containing the data to be sanitized |
Required | Yes |
Multi-Valued | No |
--inputEncryptionPassphraseFile {path}
Description | The path to a file that contains the encryption passphrase needed to decrypt the input file if it is encrypted. If the input file is encrypted and this argument is not provided, then the tool will interactively prompt for the encryption passphrase. If a passphrase file is provided, then it must contain exactly one line that consists entirely of the passphrase |
Required | No |
Multi-Valued | No |
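The one-line requirement for the passphrase file can be met as in the sketch below. The path `config/log-passphrase.txt`, the input file name `logs/access.encrypted`, and the `LOG_PASSPHRASE` environment variable are hypothetical; only the two sanitize-log options come from this reference.

```shell
# Hypothetical location; pick one that only the account running the tool can read.
passphrase_file="config/log-passphrase.txt"
mkdir -p "$(dirname "$passphrase_file")"

# Files created from here on are readable and writable by the owner only.
umask 077

# printf writes the passphrase followed by a single newline, so the file
# contains exactly one line. LOG_PASSPHRASE is an assumed variable; the
# fallback value is a placeholder that only keeps the sketch runnable.
printf '%s\n' "${LOG_PASSPHRASE:-example-passphrase}" > "$passphrase_file"

# Run the tool only if it is on the PATH.
if command -v sanitize-log >/dev/null 2>&1; then
  sanitize-log --inputFile logs/access.encrypted \
    --inputEncryptionPassphraseFile "$passphrase_file"
fi
```

Supplying the file avoids the interactive prompt, which matters when the tool runs from a scheduled job.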
-o {path}
--outputFile {path}
Description | The path to the log file to which the sanitized log data should be written. If this is not specified, then the output file will use the same name as the input file, but with a '.sanitized' extension |
Required | No |
Multi-Valued | No |
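When a script sanitizes several logs, it can help to compute the output name explicitly. This sketch assumes the default name is the input path with `.sanitized` appended, which matches the example commands above; the helper name `sanitized_path` and the list of logs are hypothetical.

```shell
# Derive an output path by appending '.sanitized' to the input path.
sanitized_path() {
  printf '%s.sanitized\n' "$1"
}

for log in logs/access logs/errors; do
  out="$(sanitized_path "$log")"
  echo "$log -> $out"
  # Invoke the tool only when it is installed and the input file exists.
  if command -v sanitize-log >/dev/null 2>&1 && [ -f "$log" ]; then
    sanitize-log --inputFile "$log" --outputFile "$out"
  fi
done
```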
--compressOutput
Description | Compress the data written to the output file using gzip |
--encryptOutput
Description | Encrypt the data written to the output file. If the --outputEncryptionPassphraseFile argument is provided, then that passphrase will be used to encrypt the file; otherwise, the tool will interactively prompt for the passphrase |
--outputEncryptionPassphraseFile {path}
Description | The path to a file that contains the encryption passphrase needed to encrypt the output file if the --encryptOutput argument is provided. If the output is to be encrypted but no passphrase file is provided, then the tool will interactively prompt for the passphrase. If a passphrase file is given, then it must contain exactly one line that consists entirely of the passphrase |
Required | No |
Multi-Valued | No |
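The output options can be combined for a run that needs no prompt. The sketch below is an illustration, not a prescribed invocation: the helper `run_or_print`, the `DRY_RUN` variable, the output file name, and the passphrase file path are all hypothetical. The helper prints the command instead of running it when the tool is not installed or `DRY_RUN=1` is set.

```shell
# Print the command when DRY_RUN=1 or the program is not on the PATH;
# otherwise execute it with the given arguments.
run_or_print() {
  if [ "${DRY_RUN:-0}" = "1" ] || ! command -v "$1" >/dev/null 2>&1; then
    echo "$*"
  else
    "$@"
  fi
}

# Compress and encrypt the sanitized output, reading the passphrase from a
# file (hypothetical path) so that the tool does not prompt for it.
run_or_print sanitize-log \
  --inputFile logs/access \
  --outputFile logs/access.sanitized.gz.enc \
  --compressOutput \
  --encryptOutput \
  --outputEncryptionPassphraseFile config/log-passphrase.txt
```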
-j
--json
Description | Indicates that the log file is JSON-formatted |
-p {name}
--preserveField {name}
Description | The name of a log field whose value should be preserved without alteration. The default set of fields to preserve is: authenticationFailureReason, authFailureID, authorizationType, assuranceTimeoutMillis, assuredReplicationRequirements, attr, attribute, attrs, attributes, attrsReturned, authType, authenticationType, category, changeNumber, changeToSoftDeletedEntry, cipher, class, clientConnectionPolicy, conn, connectionID, deleteOldRDN, deref, dereferenceAliases, disconnectReason, entriesAddedToTarget, entriesDeletedFromSource, entriesReadFromSource, entriesReturned, etime, id, idToAbandon, indexesWithKeysAccessedExceedingEntryLimit, indexesWithKeysAccessedNearEntryLimit, intermediateResponsesReturned, isIndexed, localAssuranceLevel, localAssuranceSatisfied, messageID, messageType, method, missingPrivileges, mostExpensiveAggregatePhase, mostExpensiveAggregatePhaseTimeMicros, mostExpensivePhase, mostExpensivePhaseTimeMicros, msgID, name, negotiationProperties, oid, op, opID, operationID, operationType, origin, originDetails, phaseTimesMicros, pipe, preAuthorizationUsedPrivileges, processingTimeMillis, product, protocol, protocolVersion, qtime, rebalancingOp, rebalancingOperationID, remoteAssuranceLevel, remoteAssuranceSatisfied, replicaID, replicationChangeID, replicationCSN, replicationServerID, requestContentLength, requestContentType, requestControlOIDs, requestControls, requestCookieName, requestHeaderName, requestID, requestOID, requestParameterName, requestProtocol, requestType, requestedAttributes, requestedSizeLimit, requestedTimeSizeLimitSSeconds, responseContentLength, responseContentType, responseControlOIDs, responseControls, responseCookieName, responseDelayedByAssurance, responseHeaderName, responseOID, responseType, resultCode, resultCodeName, retiredPasswordUsed, saslMechanism, scope, scopeName, serverAssuranceResults, severity, sizeLimit, sourceAltered, sourceServerAltered, startupID, statusCode, syncClass, threadID, targetAltered, targetProtocol, targetServerAltered, timestamp, timeLimit, triggeredByConn, triggeredByConnectionID, triggeredByOp, triggeredByOperationID, typesOnly, uncachedDataAccessed, unindexed, usedPrivileges, usingAdminSessionWorkerThread, version, workQueueWaitTimeMillis |
Required | No |
Multi-Valued | Yes |
-t {name}
--tokenizeField {name}
Description | The name of a log field whose value should be tokenized. If the value of the specified field appears to be a DN or filter, then the attribute names will be preserved and only the values will be tokenized. If the value of the field does not appear to be a DN or filter, then the entire value will be tokenized. The default set of fields to tokenize is: authDN, authenticationDN, authorizationDN, authzDN, autoAuthenticatedAs, base, baseDN, dn, filter, from, fromAddress, instanceName, matchedDN, newRDN, newSuperior, redirectURI, requesterDN, requesterIP, softDeleteEntryDN, sourceBackendSet, sourceServer, targetBackendSet, targetHost, targetPort, targetServer, to, toAddress, toPort, undeleteFromDN, undeletedFromDN, url |
Required | No |
Multi-Valued | Yes |
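Because the field options are multi-valued, a script can build the argument list from a list of names. The sketch assumes, as is usual for multi-valued arguments, that the option is repeated once per value. The function name `build_sanitize_command` and the field names `tenantID` and `customerDN` are hypothetical.

```shell
# Print a sanitize-log command that tokenizes every field name passed in.
build_sanitize_command() {
  fields="$*"
  set -- --inputFile logs/access --outputFile logs/access.sanitized
  for field in $fields; do
    # One --tokenizeField occurrence per field name.
    set -- "$@" --tokenizeField "$field"
  done
  echo "sanitize-log $*"
}

# Hypothetical custom field names; replace with names that occur in your logs.
build_sanitize_command tenantID customerDN
# prints: sanitize-log --inputFile logs/access --outputFile logs/access.sanitized --tokenizeField tenantID --tokenizeField customerDN
```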
-r {name}
--redactField {name}
Description | The name of a log field whose value should be redacted. Any field not configured to be preserved or tokenized will be redacted |
Required | No |
Multi-Valued | Yes |
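The fallback rule described above (preserve, else tokenize, else redact) can be pictured with a toy classifier. This is an illustration of the rule only, not the tool's implementation; the function name `classify_field` is hypothetical, and each branch lists just a small subset of the documented defaults.

```shell
# Print how a field value would be treated, given its field name.
classify_field() {
  case "$1" in
    # Subset of the documented default fields to preserve.
    conn|op|msgID|resultCode|etime) echo preserve ;;
    # Subset of the documented default fields to tokenize.
    dn|base|filter|authDN|requesterIP) echo tokenize ;;
    # Any field that is neither preserved nor tokenized is redacted.
    *) echo redact ;;
  esac
}

for name in conn dn customField; do
  printf '%s -> %s\n' "$name" "$(classify_field "$name")"
done
# prints:
# conn -> preserve
# dn -> tokenize
# customField -> redact
```

A practical consequence is that a custom field keeps its value in the sanitized output only if it is named explicitly with --preserveField or --tokenizeField.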
-c
--preserveComments
Description | Indicates that comment lines (i.e., any line beginning with the '#' character) and blank lines should be included in the sanitized output. If comment lines are to be included, then they will not be altered |
-d
--displayUnparseableLines
Description | Display a message for each line contained in the log file that cannot be parsed as a valid log message |
--interactive
Description | Launch the tool in interactive mode. |
--propertiesFilePath {path}
Description | The path to a properties file used to specify default values for arguments not supplied on the command line. |
Required | No |
Multi-Valued | No |
--generatePropertiesFile {path}
Description | Write an empty properties file that may be used to specify default values for arguments. |
Required | No |
Multi-Valued | No |
--noPropertiesFile
Description | Do not obtain any argument values from a properties file. |
--suppressPropertiesFileComment
Description | Suppress output listing the arguments obtained from a properties file. |
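A possible workflow for the properties file options is sketched below. It assumes that --generatePropertiesFile can be run on its own, without --inputFile; the path `config/sanitize-log.properties` is hypothetical, and the template's contents are not shown because its format is not described in this reference.

```shell
props="config/sanitize-log.properties"   # hypothetical location

if command -v sanitize-log >/dev/null 2>&1; then
  # 1. Write an empty template that can then be edited by hand.
  sanitize-log --generatePropertiesFile "$props"

  # 2. Take default argument values from the edited file.
  sanitize-log --propertiesFilePath "$props" --inputFile logs/access

  # 3. Ignore any properties file and rely on the command line only.
  sanitize-log --noPropertiesFile --inputFile logs/access \
    --outputFile logs/access.sanitized
else
  echo "sanitize-log not found on PATH; skipping"
fi
```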