Class: Google::Privacy::Dlp::V2::PrivacyMetric

Inherits: Object
Defined in: lib/google/cloud/dlp/v2/doc/google/privacy/dlp/v2/dlp.rb

Overview

Privacy metric to compute for reidentification risk analysis.
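As a rough illustration, a metric can be written as a plain Ruby hash that mirrors the attribute names listed on this page. The sketch below assumes that only one of the five configurations is set per metric (as is typical for a protobuf oneof) and that a FieldId is identified by a `name` key; `METRIC_KEYS` and `selected_metric` are hypothetical helpers, not part of the library.

```ruby
# Attribute names taken from the Instance Attribute Summary on this page.
METRIC_KEYS = %i[
  numerical_stats_config
  categorical_stats_config
  k_anonymity_config
  l_diversity_config
  k_map_estimation_config
].freeze

# Hypothetical helper: returns the single config key that is set on a
# hash-shaped privacy metric, or raises if zero or several are set.
# Assumption: the configs are mutually exclusive.
def selected_metric(privacy_metric)
  chosen = METRIC_KEYS.select { |key| privacy_metric.key?(key) }
  unless chosen.size == 1
    raise ArgumentError, "expected exactly one metric config, got #{chosen.inspect}"
  end
  chosen.first
end

# Column names are made up for illustration.
metric = {
  k_anonymity_config: {
    quasi_ids: [{ name: "age" }, { name: "zip_code" }]
  }
}

puts selected_metric(metric)
```

Checking the shape locally like this is only a convenience; the service remains the authority on what constitutes a valid metric.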

Defined Under Namespace

Classes: CategoricalStatsConfig, KAnonymityConfig, KMapEstimationConfig, LDiversityConfig, NumericalStatsConfig

Instance Attribute Summary

#categorical_stats_config ⇒ Google::Privacy::Dlp::V2::PrivacyMetric::CategoricalStatsConfig
#k_anonymity_config ⇒ Google::Privacy::Dlp::V2::PrivacyMetric::KAnonymityConfig
#k_map_estimation_config ⇒ Google::Privacy::Dlp::V2::PrivacyMetric::KMapEstimationConfig
#l_diversity_config ⇒ Google::Privacy::Dlp::V2::PrivacyMetric::LDiversityConfig
#numerical_stats_config ⇒ Google::Privacy::Dlp::V2::PrivacyMetric::NumericalStatsConfig

Instance Attribute Details

#categorical_stats_config ⇒ `Google::Privacy::Dlp::V2::PrivacyMetric::CategoricalStatsConfig`

Returns:

(Google::Privacy::Dlp::V2::PrivacyMetric::CategoricalStatsConfig)
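The CategoricalStatsConfig comment in the source listing describes the result as the number of distinct values plus a value count distribution. A minimal local sketch of that idea, in plain Ruby and independent of the service, might look like the following; `categorical_stats` is a hypothetical helper and the sample values are made up.

```ruby
# Hypothetical helper: counts how often each value occurs in one column.
def categorical_stats(values)
  counts = values.each_with_object(Hash.new(0)) { |value, acc| acc[value] += 1 }
  { distinct_values: counts.size, value_counts: counts }
end

stats = categorical_stats(%w[red blue red green red blue])
puts stats[:distinct_values]
puts stats[:value_counts].inspect
```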

# File 'lib/google/cloud/dlp/v2/doc/google/privacy/dlp/v2/dlp.rb', line 584

class PrivacyMetric
  # Compute numerical stats over an individual column, including
  # min, max, and quantiles.
  # @!attribute [rw] field
  #   @return [Google::Privacy::Dlp::V2::FieldId]
  #     Field to compute numerical stats on. Supported types are
  #     integer, float, date, datetime, timestamp, time.
  class NumericalStatsConfig; end

  # Compute categorical stats over an individual column, including
  # number of distinct values and value count distribution.
  # @!attribute [rw] field
  #   @return [Google::Privacy::Dlp::V2::FieldId]
  #     Field to compute categorical stats on. All column types are
  #     supported except for arrays and structs. However, it may be more
  #     informative to use NumericalStats when the field type is supported,
  #     depending on the data.
  class CategoricalStatsConfig; end

  # k-anonymity metric, used for analysis of reidentification risk.
  # @!attribute [rw] quasi_ids
  #   @return [Array<Google::Privacy::Dlp::V2::FieldId>]
  #     Set of fields to compute k-anonymity over. When multiple fields are
  #     specified, they are considered a single composite key. Structs and
  #     repeated data types are not supported; however, nested fields are
  #     supported so long as they are not structs themselves or nested within
  #     a repeated field.
  # @!attribute [rw] entity_id
  #   @return [Google::Privacy::Dlp::V2::EntityId]
  #     Optional message indicating that multiple rows might be associated with
  #     a single individual. If the same entity_id is associated with multiple
  #     quasi-identifier tuples over distinct rows, we consider the entire
  #     collection of tuples as the composite quasi-identifier. This collection
  #     is a multiset: the order in which the different tuples appear in the
  #     dataset is ignored, but their frequency is taken into account.
  #
  #     Important note: a maximum of 1000 rows can be associated with a single
  #     entity ID. If more rows are associated with the same entity ID, some
  #     might be ignored.
  class KAnonymityConfig; end

  # l-diversity metric, used for analysis of reidentification risk.
  # @!attribute [rw] quasi_ids
  #   @return [Array<Google::Privacy::Dlp::V2::FieldId>]
  #     Set of quasi-identifiers indicating how equivalence classes are
  #     defined for the l-diversity computation. When multiple fields are
  #     specified, they are considered a single composite key.
  # @!attribute [rw] sensitive_attribute
  #   @return [Google::Privacy::Dlp::V2::FieldId]
  #     Sensitive field for computing the l-value.
  class LDiversityConfig; end

  # Reidentifiability metric. This corresponds to a risk model similar to what
  # is called "journalist risk" in the literature, except the attack dataset is
  # statistically modeled instead of being perfectly known. This can be done
  # using publicly available data (like the US Census), or using a custom
  # statistical model (indicated as one or several BigQuery tables), or by
  # extrapolating from the distribution of values in the input dataset.
  # @!attribute [rw] quasi_ids
  #   @return [Array<Google::Privacy::Dlp::V2::PrivacyMetric::KMapEstimationConfig::TaggedField>]
  #     Fields considered to be quasi-identifiers. No two columns can have the
  #     same tag. [required]
  # @!attribute [rw] region_code
  #   @return [String]
  #     ISO 3166-1 alpha-2 region code to use in the statistical modeling.
  #     Required if no column is tagged with a region-specific InfoType (like
  #     US_ZIP_5) or a region code.
  # @!attribute [rw] auxiliary_tables
  #   @return [Array<Google::Privacy::Dlp::V2::PrivacyMetric::KMapEstimationConfig::AuxiliaryTable>]
  #     Several auxiliary tables can be used in the analysis. Each custom_tag
  #     used to tag a quasi-identifier column must appear in exactly one column
  #     of one auxiliary table.
  class KMapEstimationConfig
    # A column with a semantic tag attached.
    # @!attribute [rw] field
    #   @return [Google::Privacy::Dlp::V2::FieldId]
    #     Identifies the column. [required]
    # @!attribute [rw] info_type
    #   @return [Google::Privacy::Dlp::V2::InfoType]
    #     A column can be tagged with an InfoType to use the relevant public
    #     dataset as a statistical model of population, if available. We
    #     currently support US ZIP codes, region codes, ages and genders.
    #     To programmatically obtain the list of supported InfoTypes, use
    #     ListInfoTypes with the supported_by=RISK_ANALYSIS filter.
    # @!attribute [rw] custom_tag
    #   @return [String]
    #     A column can be tagged with a custom tag. In this case, the user must
    #     indicate an auxiliary table that contains statistical information on
    #     the possible values of this column (below).
    # @!attribute [rw] inferred
    #   @return [Google::Protobuf::Empty]
    #     If no semantic tag is indicated, we infer the statistical model from
    #     the distribution of values in the input data.
    class TaggedField; end

    # An auxiliary table contains statistical information on the relative
    # frequency of different quasi-identifier values. It has one or several
    # quasi-identifier columns, and one column that indicates the relative
    # frequency of each quasi-identifier tuple.
    # If a tuple is present in the data but not in the auxiliary table, the
    # corresponding relative frequency is assumed to be zero (and thus, the
    # tuple is highly reidentifiable).
    # @!attribute [rw] table
    #   @return [Google::Privacy::Dlp::V2::BigQueryTable]
    #     Auxiliary table location. [required]
    # @!attribute [rw] quasi_ids
    #   @return [Array<Google::Privacy::Dlp::V2::PrivacyMetric::KMapEstimationConfig::AuxiliaryTable::QuasiIdField>]
    #     Quasi-identifier columns. [required]
    # @!attribute [rw] relative_frequency
    #   @return [Google::Privacy::Dlp::V2::FieldId]
    #     The relative frequency column must contain a floating-point number
    #     between 0 and 1 (inclusive). Null values are assumed to be zero.
    #     [required]
    class AuxiliaryTable
      # A quasi-identifier column has a custom_tag, used to know which column
      # in the data corresponds to which column in the statistical model.
      # @!attribute [rw] field
      #   @return [Google::Privacy::Dlp::V2::FieldId]
      # @!attribute [rw] custom_tag
      #   @return [String]
      class QuasiIdField; end
    end
  end
end

#k_anonymity_config ⇒ `Google::Privacy::Dlp::V2::PrivacyMetric::KAnonymityConfig`

Returns:

(Google::Privacy::Dlp::V2::PrivacyMetric::KAnonymityConfig)
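To make the composite-key and multiset wording in the KAnonymityConfig comments concrete, here is a small plain-Ruby sketch. It assumes k is the size of the smallest equivalence class, and it models the entity_id case by grouping rows per entity and canonically ordering each entity's tuples so that order is ignored but repeats still count. `k_anonymity` and the sample rows are hypothetical and say nothing about how the service computes its result.

```ruby
# Hypothetical helper: smallest equivalence-class size over the quasi-identifiers.
def k_anonymity(rows, quasi_ids, entity_id: nil)
  keys =
    if entity_id
      # One composite quasi-identifier per entity: the multiset of its tuples.
      # Sorting by a canonical string ignores row order but keeps duplicates.
      rows.group_by { |row| row[entity_id] }
          .values
          .map { |group| group.map { |row| row.values_at(*quasi_ids) }.sort_by(&:inspect) }
    else
      # Several fields are treated as a single composite key per row.
      rows.map { |row| row.values_at(*quasi_ids) }
    end

  class_sizes = keys.each_with_object(Hash.new(0)) { |key, sizes| sizes[key] += 1 }
  class_sizes.values.min
end

rows = [
  { user: "a", age: 30, zip: "94043" },
  { user: "a", age: 31, zip: "94043" },
  { user: "b", age: 31, zip: "94043" },
  { user: "b", age: 30, zip: "94043" },
  { user: "c", age: 30, zip: "94043" },
  { user: "c", age: 30, zip: "94043" }
]

puts k_anonymity(rows, %i[age zip])
puts k_anonymity(rows, %i[age zip], entity_id: :user)
```

In this sample, users "a" and "b" share the same multiset of tuples even though their rows appear in a different order, while user "c" repeats one tuple twice and therefore forms a class of its own.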

#k_map_estimation_config ⇒ `Google::Privacy::Dlp::V2::PrivacyMetric::KMapEstimationConfig`

Returns:

(Google::Privacy::Dlp::V2::PrivacyMetric::KMapEstimationConfig)
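The KMapEstimationConfig comments state two rules that are easy to check locally: each custom_tag used on a quasi-identifier column must appear in exactly one column of one auxiliary table, and a tuple missing from the auxiliary table (or with a null frequency) is treated as having relative frequency zero. The sketch below works on plain hashes shaped after the attribute names on this page; the helpers, the BigQuery table keys, and all field, tag, and table names are assumptions made for illustration.

```ruby
# Hypothetical helper: returns the custom tags that do not appear in exactly
# one quasi-identifier column across all auxiliary tables.
def unmatched_custom_tags(config)
  occurrences = Hash.new(0)
  config.fetch(:auxiliary_tables, []).each do |table|
    table.fetch(:quasi_ids, []).each { |column| occurrences[column[:custom_tag]] += 1 }
  end

  config.fetch(:quasi_ids, [])
        .map { |tagged_field| tagged_field[:custom_tag] }
        .compact
        .reject { |tag| occurrences[tag] == 1 }
end

# Hypothetical helper: looks up a tuple's relative frequency, treating
# missing tuples and null values as zero.
def relative_frequency(frequencies, tuple)
  frequencies.fetch(tuple, 0.0) || 0.0
end

config = {
  quasi_ids: [
    { field: { name: "age" }, info_type: { name: "AGE" } },
    { field: { name: "job" }, custom_tag: "occupation" },
    { field: { name: "city" }, custom_tag: "city" }
  ],
  region_code: "US",
  auxiliary_tables: [
    {
      table: { project_id: "my-project", dataset_id: "stats", table_id: "occupations" },
      quasi_ids: [{ field: { name: "occupation_name" }, custom_tag: "occupation" }],
      relative_frequency: { name: "frequency" }
    }
  ]
}

p unmatched_custom_tags(config)

frequencies = { ["nurse"] => 0.02, ["pilot"] => nil }
p relative_frequency(frequencies, ["astronaut"])
```

Here the "city" tag is reported because no auxiliary table declares a column with that tag.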

#l_diversity_config ⇒ `Google::Privacy::Dlp::V2::PrivacyMetric::LDiversityConfig`

Returns:

(Google::Privacy::Dlp::V2::PrivacyMetric::LDiversityConfig)
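The LDiversityConfig comments say the quasi-identifiers define equivalence classes and the sensitive attribute is used to compute the l-value. One common reading, assumed here, is that l is the smallest number of distinct sensitive values found in any equivalence class. The plain-Ruby sketch below illustrates that reading; `l_diversity` and the sample rows are hypothetical.

```ruby
# Hypothetical helper: minimum count of distinct sensitive values per
# equivalence class, where a class is defined by the composite quasi-identifier.
def l_diversity(rows, quasi_ids, sensitive_attribute)
  rows.group_by { |row| row.values_at(*quasi_ids) }
      .values
      .map { |group| group.map { |row| row[sensitive_attribute] }.uniq.size }
      .min
end

rows = [
  { age: 30, zip: "94043", diagnosis: "flu" },
  { age: 30, zip: "94043", diagnosis: "asthma" },
  { age: 41, zip: "10001", diagnosis: "flu" },
  { age: 41, zip: "10001", diagnosis: "flu" }
]

puts l_diversity(rows, %i[age zip], :diagnosis)
```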

#numerical_stats_config ⇒ `Google::Privacy::Dlp::V2::PrivacyMetric::NumericalStatsConfig`

Returns:

(Google::Privacy::Dlp::V2::PrivacyMetric::NumericalStatsConfig)
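The NumericalStatsConfig comments mention min, max, and quantiles over a single column. As a local illustration only, the sketch below computes those values in plain Ruby using the nearest-rank method; the choice of method, the default of four quantile buckets, and the helper name `numerical_stats` are all assumptions, and the service may define its quantiles differently.

```ruby
# Hypothetical helper: min, max, and nearest-rank quantile boundaries.
def numerical_stats(values, quantile_count: 4)
  sorted = values.sort
  quantiles = (1...quantile_count).map do |i|
    # Nearest-rank: the smallest value with at least i/quantile_count of the data at or below it.
    rank = (i * sorted.size / quantile_count.to_f).ceil
    sorted[[rank, 1].max - 1]
  end
  { min: sorted.first, max: sorted.last, quantiles: quantiles }
end

stats = numerical_stats([7, 1, 5, 3, 9, 11, 13, 15])
puts stats.inspect
```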
