sanitize-log

Description

Sanitize the contents of a server log file to remove potentially sensitive information while retaining enough detail to make the log useful for diagnosing problems or understanding load patterns. The sanitization process operates on fields that consist of name-value pairs. The field name is always preserved, but a field value may be tokenized or redacted if it could contain sensitive information. Supported log file types include the file-based access, error, sync, and resync logs, as well as the operation timing access log and the detailed HTTP operation log. To sanitize the audit log, use the scramble-ldif tool instead.

Examples

Write a sanitized version of log file 'logs/access' into the file 'logs/access.sanitized', preserving any comments that may be included in the log file:

sanitize-log --inputFile logs/access --outputFile logs/access.sanitized \
     --preserveComments

Write a sanitized version of log file 'logs/access' into the file 'logs/access.sanitized', displaying any unparseable lines, and changing the sanitization behavior so that the 'pipe' field is tokenized rather than preserved, and so that the 'instanceName' field is redacted rather than tokenized:

sanitize-log --inputFile logs/access --outputFile logs/access.sanitized \
     --displayUnparseableLines --tokenizeField pipe --redactField instanceName

Write a sanitized version of JSON-formatted log file 'logs/errors.json' into the file 'logs/errors.json.sanitized':

sanitize-log --inputFile logs/errors.json \
     --outputFile logs/errors.json.sanitized --json

Arguments

-V
--version

Description Display Directory Server version information

-H
--help

Description Display general usage information

--help-debug

Description Display help for using debug options
Advanced Yes

-i {path}
--inputFile {path}

Description The path to the log file containing the data to be sanitized
Required Yes
Multi-Valued No

--inputEncryptionPassphraseFile {path}

Description The path to a file that contains the encryption passphrase needed to decrypt the input file if it is encrypted. If the input file is encrypted and this argument is not provided, then the tool will interactively prompt for the encryption passphrase. If a passphrase file is provided, then it must contain exactly one line that consists entirely of the passphrase
Required No
Multi-Valued No

-j
--json

Description Indicates that the log file is JSON-formatted

-o {path}
--outputFile {path}

Description The path to the log file to which the sanitized log data should be written. If this is not specified, then the output file will use the same name as the input file, but with a '.sanitized' extension
Required No
Multi-Valued No

--compressOutput

Description GZIP-compress the data written to the output file

--encryptOutput

Description Encrypt the data written to the output file. If the --outputEncryptionPassphraseFile argument is provided, then that passphrase will be used to encrypt the file; otherwise, the tool will interactively prompt for the passphrase

--outputEncryptionPassphraseFile {path}

Description The path to a file that contains the encryption passphrase needed to encrypt the output file if the --encryptOutput argument is provided. If the output is to be encrypted but no passphrase file is provided, then the tool will interactively prompt for the passphrase. If a passphrase file is given, then it must contain exactly one line that consists entirely of the passphrase
Required No
Multi-Valued No
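
Both passphrase-file arguments require a file with exactly one line that holds only the passphrase. The following shell sketch shows one way to create such a file and check its line count before passing it to the tool. The file path and the SANITIZE_PASSPHRASE environment variable are hypothetical names chosen for illustration; the commented command at the end uses only arguments documented on this page.

```shell
#!/bin/sh
# Hypothetical location for the passphrase file; adjust as needed.
passphrase_file="${TMPDIR:-/tmp}/sanitize-log.pin"

# Restrict permissions before any secret is written to the file.
umask 077

# Write the passphrase as a single line. SANITIZE_PASSPHRASE is an assumed
# environment variable; the fallback value is a placeholder, not a real secret.
printf '%s\n' "${SANITIZE_PASSPHRASE:-example-passphrase}" > "$passphrase_file"

# Confirm that the file holds exactly one line.
line_count=$(wc -l < "$passphrase_file" | tr -d ' ')
if [ "$line_count" -ne 1 ]; then
    echo "expected exactly one line, found $line_count" >&2
    exit 1
fi

# The file can then be supplied to the tool, for example:
#   sanitize-log --inputFile logs/access --encryptOutput \
#        --outputEncryptionPassphraseFile "$passphrase_file"
```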

-c
--preserveComments

Description Indicates that comment lines (i.e., any line beginning with the '#' character) and blank lines should be included in the sanitized output. If comment lines are to be included, then they will not be altered
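
To gauge how many lines this argument affects, the comment and blank lines in a log can be counted before sanitizing. The sketch below assumes that a blank line is an empty line (the description does not say whether whitespace-only lines count), and the sample file content is made up purely for illustration.

```shell
#!/bin/sh
# Create a small sample file; its content is illustrative, not real log output.
sample_log=$(mktemp)
printf '%s\n' '# first comment' '' 'example log line' '# second comment' > "$sample_log"

# Count lines that start with '#' and lines that are empty.
comment_lines=$(grep -c '^#' "$sample_log")
blank_lines=$(grep -c '^$' "$sample_log")

echo "comment lines: $comment_lines"
echo "blank lines:   $blank_lines"
```

For a real log, point sample_log at the path that will be given to --inputFile instead of creating a temporary file.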

-d
--displayUnparseableLines

Description Display a message for each line contained in the log file that cannot be parsed as a valid log message

--logFieldBehaviorConfiguration {name}

Description The name of the log field behavior object in the server's configuration that specifies the behaviors that the tool should use when sanitizing log fields. If this argument is provided, then the server's default syntax configuration will also be used
Required No
Multi-Valued No

--defaultBooleanFieldBehavior {behavior}

Description The default behavior that the tool should use for any fields whose values are either 'true' or 'false' and for which no field-specific behavior is defined. Allowed values are 'preserve', 'omit', 'redact-entire-value', 'redact-value-components', 'tokenize-entire-value', and 'tokenize-value-components'. If this is not specified, the behavior specified by the --defaultUndefinedFieldBehavior argument will be used
Required No
Multi-Valued No

--defaultDNFieldBehavior {behavior}

Description The default behavior that the tool should use for any fields whose values are LDAP distinguished names and for which no field-specific behavior is defined. Allowed values are 'preserve', 'omit', 'redact-entire-value', 'redact-value-components', 'tokenize-entire-value', and 'tokenize-value-components'. If this is not specified, the behavior specified by the --defaultUndefinedFieldBehavior argument will be used
Required No
Multi-Valued No

--defaultFilterFieldBehavior {behavior}

Description The default behavior that the tool should use for any fields whose values are LDAP filters and for which no field-specific behavior is defined. Allowed values are 'preserve', 'omit', 'redact-entire-value', 'redact-value-components', 'tokenize-entire-value', and 'tokenize-value-components'. If this is not specified, the behavior specified by the --defaultUndefinedFieldBehavior argument will be used
Required No
Multi-Valued No

--defaultFloatingPointFieldBehavior {behavior}

Description The default behavior that the tool should use for any fields whose values are floating-point numbers and for which no field-specific behavior is defined. Allowed values are 'preserve', 'omit', 'redact-entire-value', 'redact-value-components', 'tokenize-entire-value', and 'tokenize-value-components'. If this is not specified, the behavior specified by the --defaultUndefinedFieldBehavior argument will be used
Required No
Multi-Valued No

--defaultIntegerFieldBehavior {behavior}

Description The default behavior that the tool should use for any fields whose values are integers and for which no field-specific behavior is defined. Allowed values are 'preserve', 'omit', 'redact-entire-value', 'redact-value-components', 'tokenize-entire-value', and 'tokenize-value-components'. If this is not specified, the behavior specified by the --defaultUndefinedFieldBehavior argument will be used
Required No
Multi-Valued No

--defaultJSONFieldBehavior {behavior}

Description The default behavior that the tool should use for any fields whose values are JSON objects and for which no field-specific behavior is defined. Allowed values are 'preserve', 'omit', 'redact-entire-value', 'redact-value-components', 'tokenize-entire-value', and 'tokenize-value-components'. If this is not specified, the behavior specified by the --defaultUndefinedFieldBehavior argument will be used
Required No
Multi-Valued No

--defaultStringFieldBehavior {behavior}

Description The default behavior that the tool should use for any fields whose values are strings and for which no field-specific behavior is defined. Allowed values are 'preserve', 'omit', 'redact-entire-value', 'redact-value-components', 'tokenize-entire-value', and 'tokenize-value-components'. If this is not specified, the behavior specified by the --defaultUndefinedFieldBehavior argument will be used
Required No
Multi-Valued No

--defaultStringListFieldBehavior {behavior}

Description The default behavior that the tool should use for any fields whose values are lists of strings and for which no field-specific behavior is defined. Allowed values are 'preserve', 'omit', 'redact-entire-value', 'redact-value-components', 'tokenize-entire-value', and 'tokenize-value-components'. If this is not specified, the behavior specified by the --defaultUndefinedFieldBehavior argument will be used
Required No
Multi-Valued No

--defaultTimestampFieldBehavior {behavior}

Description The default behavior that the tool should use for any fields whose values are timestamps in either the generalized time or RFC 3339 formats and for which no field-specific behavior is defined. Allowed values are 'preserve', 'omit', 'redact-entire-value', 'redact-value-components', 'tokenize-entire-value', and 'tokenize-value-components'. If this is not specified, the behavior specified by the --defaultUndefinedFieldBehavior argument will be used
Required No
Multi-Valued No

-p {name}
--preserveField {name}

Description The name of a log field whose value should be preserved without alteration
Required No
Multi-Valued Yes

--omitField {name}

Description The name of a log field that should be completely omitted from the sanitized output. If a field is omitted, then neither the field name nor its value will be included
Required No
Multi-Valued Yes

--redactEntireValueField {name}

Description The name of a log field whose value should be completely redacted in the sanitized output. The entire value will be replaced with a fixed string (based on the syntax for that field) that does not have any relation to the original value for the field
Required No
Multi-Valued Yes

-r {name}
--redactValueComponentsField {name}

Description The name of a log field whose values should be redacted in a manner that will attempt to preserve the format for the value but will redact any potentially sensitive information in that value. This primarily applies to fields whose values are LDAP DNs and filters (in which case attribute names will be preserved but attribute values will be redacted) and JSON objects (in which JSON field names will be preserved but field values will be redacted). For values of fields with other syntaxes, the entire value will be redacted
Required No
Multi-Valued Yes
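
As a rough illustration of component-level redaction for a DN, the sed expression below keeps each attribute name and replaces each attribute value with a placeholder. It is not the tool's implementation: the REDACTED placeholder and the sample DN are assumptions, and the pattern ignores escaped commas and multivalued RDNs.

```shell
#!/bin/sh
# Sample DN, used only for illustration.
dn='uid=jdoe,ou=People,dc=example,dc=com'

# Replace everything between each '=' and the next ',' with a placeholder,
# leaving the attribute names in place.
redacted_dn=$(printf '%s\n' "$dn" | sed 's/=[^,]*/=REDACTED/g')

echo "$redacted_dn"
```

The structure of the DN (four components, with the attribute names uid, ou, dc, and dc) stays visible, while every attribute value is hidden.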

--tokenizeEntireValueField {name}

Description The name of a log field whose value should be completely tokenized in the sanitized output. The entire value will be replaced with a value that is generated from the original field value, but that cannot be reversed in a way that will reveal the original value. If the same value appears multiple times in the log, the same token will be used for each occurrence of that value so it will be possible to correlate sanitized information across log messages
Required No
Multi-Valued Yes
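
The two properties described for tokenization (a token does not reveal the original value, and equal values yield equal tokens) can be illustrated with a one-way hash. This is a sketch of the idea only: the description does not say how the tool generates tokens, and the tokenize helper, the truncation to 16 characters, and the sample values are all assumptions.

```shell
#!/bin/sh
# Hypothetical helper: derive a short token from a value with SHA-256.
# Uses sha256sum where available, otherwise falls back to shasum.
tokenize() {
    if command -v sha256sum >/dev/null 2>&1; then
        printf '%s' "$1" | sha256sum | cut -c1-16
    else
        printf '%s' "$1" | shasum -a 256 | cut -c1-16
    fi
}

first=$(tokenize 'uid=jdoe,ou=People,dc=example,dc=com')
second=$(tokenize 'uid=jdoe,ou=People,dc=example,dc=com')
other=$(tokenize 'uid=asmith,ou=People,dc=example,dc=com')

# Equal inputs give equal tokens, so occurrences can still be correlated.
[ "$first" = "$second" ] && echo "same value, same token"
# Different inputs give different tokens.
[ "$first" != "$other" ] && echo "different value, different token"
```

A plain hash of a guessable value can be matched by hashing candidate values, so a production-grade scheme would presumably mix in a secret; the sketch omits that for brevity.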

-t {name}
--tokenizeValueComponentsField {name}

Description The name of a log field whose values should be tokenized in a manner that will attempt to preserve the format for the value but will tokenize any potentially sensitive information in that value. This primarily applies to fields whose values are LDAP DNs and filters (in which case attribute names will be preserved but attribute values will be tokenized) and JSON objects (in which JSON field names will be preserved but field values will be tokenized). For values of fields with other syntaxes, the entire value will be tokenized
Required No
Multi-Valued Yes

--defaultUndefinedFieldBehavior {behavior}

Description The default behavior that should be used for any log fields for which no explicit or default behavior is defined, and for which no syntax-specific default behavior has been defined. Allowed values are 'preserve', 'omit', 'redact-entire-value', 'redact-value-components', 'tokenize-entire-value', and 'tokenize-value-components'. If this is not specified, a default behavior of 'redact-entire-value' will be used
Required No
Multi-Valued No
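
Read together, the argument descriptions suggest a fallback order: a field-specific behavior first, then the default for the field's syntax, then the --defaultUndefinedFieldBehavior value, and finally 'redact-entire-value'. The sketch below models that order with nested shell parameter defaults. The resolve_behavior function is hypothetical, and the sketch leaves out the tool's built-in default field behaviors, whose place in the order is not stated on this page.

```shell
#!/bin/sh
# Hypothetical helper: pick the first non-empty behavior, in order:
#   $1 field-specific behavior
#   $2 syntax-specific default (for example --defaultDNFieldBehavior)
#   $3 --defaultUndefinedFieldBehavior value
# falling back to 'redact-entire-value' when all three are empty.
resolve_behavior() {
    field_specific=$1
    syntax_default=$2
    undefined_default=$3
    echo "${field_specific:-${syntax_default:-${undefined_default:-redact-entire-value}}}"
}

resolve_behavior 'preserve' 'tokenize-value-components' 'omit'
resolve_behavior '' 'tokenize-value-components' 'omit'
resolve_behavior '' '' 'omit'
resolve_behavior '' '' ''
```

Each call drops one more level of configuration, so the four calls print the field-specific value, the syntax default, the undefined-field default, and the final fallback in turn.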

--displayDefaultFieldBehaviors

Description Display a list of the behaviors that will be used for log fields by default. If the --json argument is also provided, then the default field behaviors for JSON-formatted messages will be displayed; otherwise, the default behaviors for text-formatted messages will be displayed

--suppressDefaultFieldBehaviors

Description Suppress the default log field behaviors that the tool normally uses. If this argument is provided, then only the behaviors explicitly specified on the command line (whether based on field name or syntax) will be used

--interactive

Description Launch the tool in interactive mode

--propertiesFilePath {path}

Description The path to a properties file used to specify default values for arguments not supplied on the command line
Required No
Multi-Valued No

--generatePropertiesFile {path}

Description Write an empty properties file that may be used to specify default values for arguments
Required No
Multi-Valued No

--noPropertiesFile

Description Do not obtain any argument values from a properties file

--suppressPropertiesFileComment

Description Suppress output listing the arguments obtained from a properties file