
Gini Coefficient Metric

The gini metric calculates the Gini coefficient, measuring a model's ability to discriminate between defaults and non-defaults.

Metric Type: gini

Gini Calculation

The Gini coefficient is calculated as Gini = 2 × AUC - 1, where AUC is the Area Under the ROC Curve:

  • 1.0: Perfect discrimination
  • 0.0: No discrimination (random)
  • -1.0: Perfect inverse discrimination

The Gini coefficient is a linear rescaling of AUC. It maps AUC's [0, 1] range onto [-1, 1], so a random model scores 0.0 instead of 0.5, which makes results easier to interpret than AUC alone.
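To make the formula concrete, here is a minimal standard-library sketch. It estimates AUC directly from its probabilistic definition: the chance that a randomly chosen default receives a higher score than a randomly chosen non-default, with ties counted as half. It then applies Gini = 2 × AUC - 1. The function names are illustrative only and are not part of the metric's configuration or API. The pairwise loop is O(n × m), which is fine for illustration but not for large portfolios.

```python
from itertools import product


def pairwise_auc(scores, defaults):
    """AUC as P(score of a default > score of a non-default), ties counted as 0.5."""
    pos = [s for s, d in zip(scores, defaults) if d == 1]
    neg = [s for s, d in zip(scores, defaults) if d == 0]
    if not pos or not neg:
        raise ValueError("need at least one default and one non-default")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


def gini_from_scores(scores, defaults):
    """Gini = 2 * AUC - 1."""
    return 2 * pairwise_auc(scores, defaults) - 1


# Every default scored above every non-default: perfect discrimination.
print(gini_from_scores([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))  # 1.0
# Reversed ranking: perfect inverse discrimination.
print(gini_from_scores([0.1, 0.2, 0.8, 0.9], [1, 1, 0, 0]))  # -1.0
# Identical scores for everyone: no discrimination.
print(gini_from_scores([0.5, 0.5, 0.5, 0.5], [1, 1, 0, 0]))  # 0.0
```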

Configuration Fields

Record-Level Data Format

For individual loan/account records:

collections:
  discrimination_analysis:
    metrics:
    - name:
      - model_discrimination
      data_format: record
      prob_def: model_score
      default: default_flag
      segment:
      - - product_type
      metric_type: gini
    dataset: loan_portfolio

Summary-Level Data Format

For pre-aggregated risk-ordered data:

collections:
  summary_discrimination:
    metrics:
    - name:
      - risk_grade_gini
      data_format: summary
      mean_pd: avg_probability
      defaults: default_count
      volume: total_count
      segment:
      - - model_version
      metric_type: gini
    dataset: risk_grade_summary

Required Fields by Format

Record-Level Required

  • name: Metric name(s)
  • data_format: Must be "record"
  • prob_def: Probability column name
  • default: Default indicator column name
  • dataset: Dataset reference

Summary-Level Required

  • name: Metric name(s)
  • data_format: Must be "summary"
  • mean_pd: Mean probability column name (used for risk ordering)
  • defaults: Default count column name
  • volume: Volume count column name
  • dataset: Dataset reference

Optional Fields

  • segment: List of column names for grouping

Output Columns

The metric produces the following output columns:

  • group_key: Segmentation group identifier (struct of segment values)
  • volume: Total number of observations
  • defaults: Total number of defaults
  • odr: Observed Default Rate (Defaults/Volume)
  • pd: Mean predicted probability of default
  • gini: Calculated Gini coefficient (-1.0 to 1.0)
  • curve_data: Plot data for Gini/Lorenz curves (struct array with pct_observations and pct_defaults)
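The sketch below shows how one row per segment could map onto these columns for record-level data. It assumes records arrive as plain Python dicts and represents group_key as a dict of segment column to value. `gini_output_rows` is a hypothetical helper, not the metric's implementation, and curve_data is omitted for brevity. An empty segment list produces a single portfolio-level row, which mirrors a `null` segment.

```python
from collections import defaultdict


def gini(scores, defaults):
    """Pairwise AUC (ties = 0.5) converted to Gini; None if a class is missing."""
    pos = [s for s, d in zip(scores, defaults) if d]
    neg = [s for s, d in zip(scores, defaults) if not d]
    if not pos or not neg:
        return None
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return 2 * wins / (len(pos) * len(neg)) - 1


def gini_output_rows(records, score_col, default_col, segment_cols):
    """Hypothetical helper: one output row per segment combination."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple((col, rec[col]) for col in segment_cols)
        groups[key].append(rec)
    rows = []
    for key, recs in groups.items():
        scores = [r[score_col] for r in recs]
        flags = [int(r[default_col]) for r in recs]
        volume = len(recs)
        n_defaults = sum(flags)
        rows.append({
            "group_key": dict(key),            # segment column -> value
            "volume": volume,                  # number of observations
            "defaults": n_defaults,            # number of defaults
            "odr": n_defaults / volume,        # observed default rate
            "pd": sum(scores) / volume,        # mean predicted PD
            "gini": gini(scores, flags),
        })
    return rows


records = [
    {"product_type": "card", "model_score": 0.9, "default_flag": 1},
    {"product_type": "card", "model_score": 0.2, "default_flag": 0},
    {"product_type": "loan", "model_score": 0.3, "default_flag": 0},
    {"product_type": "loan", "model_score": 0.25, "default_flag": 1},
    {"product_type": "loan", "model_score": 0.1, "default_flag": 0},
]
for row in gini_output_rows(records, "model_score", "default_flag", ["product_type"]):
    print(row)
```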

Curve Data for Visualization

The curve_data column contains plot points for creating Gini coefficient/Lorenz curve visualizations in BI tools like Power BI. Each data point contains:

  • pct_observations: Cumulative percentage of observations (x-axis)
  • pct_defaults: Cumulative percentage of defaults (y-axis)

The data starts at (0,0) and ends at (100,100), with points ordered by descending risk score. This allows direct plotting of:

  • Lorenz Curve: Shows concentration of defaults
  • Gini Curve: Visual representation of model discrimination
  • 45-degree reference line: Represents random model performance
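A minimal sketch of how points with this shape can be built from record-level scores follows. It sorts by descending score, accumulates percentages, and prepends (0,0). Grouping observations that share a score into a single step, so that ties produce a straight segment, is an assumption of this sketch and not a documented behaviour of the metric. The function name `curve_points` is hypothetical.

```python
from itertools import groupby


def curve_points(scores, defaults):
    """Cumulative % of observations (x) vs cumulative % of defaults (y),
    ordered from highest to lowest risk score."""
    total = len(scores)
    total_defaults = sum(defaults)
    if total == 0 or total_defaults == 0:
        raise ValueError("need at least one observation and one default")
    ranked = sorted(zip(scores, defaults), key=lambda pair: pair[0], reverse=True)
    points = [{"pct_observations": 0.0, "pct_defaults": 0.0}]
    seen = hit = 0
    # Observations sharing a score are added as one step, so ties form a straight segment.
    for _, grp in groupby(ranked, key=lambda pair: pair[0]):
        grp = list(grp)
        seen += len(grp)
        hit += sum(d for _, d in grp)
        points.append({
            "pct_observations": 100.0 * seen / total,
            "pct_defaults": 100.0 * hit / total_defaults,
        })
    return points


for p in curve_points([0.9, 0.7, 0.7, 0.4, 0.1], [1, 0, 1, 0, 0]):
    print(p)
```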

Example Usage in BI Tools:

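  1. Extract the curve data from the result
  2. Create a scatter (or line) plot with pct_observations on the x-axis and pct_defaults on the y-axis
  3. Add a 45-degree reference line from (0,0) to (100,100)
  4. The area between the curve and the reference line reflects discrimination power; divided by the same area for a perfect model, it equals the Gini coefficient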
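The "area between curves" can be turned into a number. For a curve of cumulative defaults against cumulative observations (often called a cumulative accuracy profile), the area between the curve and the 45-degree line, divided by the same area for a perfect model, is the accuracy ratio. This ratio equals Gini = 2 × AUC - 1 when tied scores are drawn as straight segments. The sketch below assumes curve_data-style points on a 0 to 100 scale and takes the portfolio default rate as input (for example, the odr column). A perfect model reaches 100% of defaults after the first odr share of observations, so its area above the diagonal is (1 - odr) / 2. The function name is hypothetical.

```python
def accuracy_ratio(points, default_rate):
    """Area between curve and diagonal, divided by the same area for a perfect model.

    points: curve_data-style dicts with values on a 0-100 scale, starting at (0, 0).
    default_rate: overall default share of the population (e.g. the odr column).
    """
    xs = [p["pct_observations"] / 100.0 for p in points]
    ys = [p["pct_defaults"] / 100.0 for p in points]
    # Trapezoidal rule for the area under the curve (unit square).
    area = sum((x1 - x0) * (y0 + y1) / 2 for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))
    model_area = area - 0.5                   # between curve and 45-degree line
    perfect_area = (1.0 - default_rate) / 2   # same area for a perfect model
    return model_area / perfect_area


curve = [
    {"pct_observations": 0.0, "pct_defaults": 0.0},
    {"pct_observations": 20.0, "pct_defaults": 50.0},
    {"pct_observations": 60.0, "pct_defaults": 100.0},
    {"pct_observations": 80.0, "pct_defaults": 100.0},
    {"pct_observations": 100.0, "pct_defaults": 100.0},
]
print(round(accuracy_ratio(curve, default_rate=2 / 5), 4))  # 0.8333
```

This example curve comes from scores [0.9, 0.7, 0.7, 0.4, 0.1] with defaults [1, 0, 1, 0, 0]. Counting pairs directly gives AUC = 5.5 / 6, so Gini = 5/6, which matches the area-based result.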

Fan-out Examples

Multiple Discrimination Tests

collections:
  gini_analysis:
    metrics:
    - name:
      - portfolio_gini
      - product_gini
      - region_gini
      - vintage_gini
      segment:
      - null
      - - product_type
      - - region
      - - origination_year
      data_format: record
      prob_def: risk_score
      default: default_indicator
      metric_type: gini
    dataset: model_validation_data

This creates four Gini metrics:

  1. Overall portfolio discrimination
  2. Discrimination by product type
  3. Discrimination by region
  4. Discrimination by origination vintage
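The four metrics above suggest that the i-th entry of `name` pairs with the i-th entry of `segment`, with `null` meaning no segmentation, and that all other fields are shared. The sketch below expands a fan-out entry under that assumption. The pairing is inferred from this page, not from the tool's source, and `expand_fanout` is a hypothetical helper. It also assumes a missing `segment` list means no segmentation for every name.

```python
def expand_fanout(metric):
    """Hypothetical helper: split one fan-out metric entry into one config per name.

    Assumes name[i] pairs with segment[i]; None (YAML null) means no segmentation.
    """
    names = metric["name"]
    segments = metric.get("segment") or [None] * len(names)
    if len(segments) != len(names):
        raise ValueError("name and segment lists must have the same length")
    shared = {k: v for k, v in metric.items() if k not in ("name", "segment")}
    return [dict(shared, name=n, segment=s or []) for n, s in zip(names, segments)]


fan_out = {
    "name": ["portfolio_gini", "product_gini", "region_gini", "vintage_gini"],
    "segment": [None, ["product_type"], ["region"], ["origination_year"]],
    "data_format": "record",
    "prob_def": "risk_score",
    "default": "default_indicator",
    "metric_type": "gini",
}
for cfg in expand_fanout(fan_out):
    print(cfg["name"], cfg["segment"])
```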

Model Comparison

collections:
  model_gini_comparison:
    metrics:
    - name:
      - champion_model
      segment:
      - null
      data_format: record
      prob_def: champion_score
      default: default_flag
      metric_type: gini
    dataset: ab_test_data
  challenger_gini:
    metrics:
    - name:
      - challenger_model_score
      data_format: record
      prob_def: challenger_score
      default: default_flag
      metric_type: gini
    dataset: ab_test_data

Summary-Level Analysis

collections:
  risk_grade_analysis:
    metrics:
    - name:
      - overall_grade_gini
      - product_grade_gini
      segment:
      - null
      - - product_type
      data_format: summary
      mean_pd: grade_mean_pd
      defaults: grade_defaults
      volume: grade_volume
      metric_type: gini
    dataset: risk_grade_stats

Combined Discrimination Analysis

collections:
  auc_metrics:
    metrics:
    - name:
      - model_auc
      data_format: record
      prob_def: risk_score
      default: default_flag
      metric_type: auc
    dataset: validation_data
  gini_metrics:
    metrics:
    - name:
      - model_gini
      data_format: record
      prob_def: risk_score
      default: default_flag
      metric_type: gini
    dataset: validation_data

Data Requirements

Record-Level Data

  • One row per loan/account
  • Probability column: numeric values between 0.0 and 1.0
  • Default column: binary values (0/1 or boolean)
  • Sufficient data points for meaningful Gini calculation (minimum ~20 observations recommended)
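A pre-flight check for these record-level requirements might look like the following sketch. Dropping rows with missing or out-of-range scores (instead of rejecting the dataset) and raising on non-binary default flags are choices made for this example. The metric's own handling is not specified here. `clean_records` and `MIN_OBSERVATIONS` are hypothetical names, and the threshold reuses the ~20 observation rule of thumb above.

```python
import math

MIN_OBSERVATIONS = 20  # hypothetical constant: the "~20 observations" rule of thumb


def clean_records(scores, defaults):
    """Drop rows with missing or out-of-range scores, then check what remains."""
    kept_scores, kept_flags = [], []
    for s, d in zip(scores, defaults):
        if s is None or isinstance(s, bool) or not isinstance(s, (int, float)):
            continue
        if math.isnan(s) or not 0.0 <= s <= 1.0:
            continue
        if d not in (0, 1):  # True/False compare equal to 1/0, so booleans pass
            raise ValueError(f"default flag must be 0/1 or boolean, got {d!r}")
        kept_scores.append(float(s))
        kept_flags.append(int(d))
    issues = []
    if len(kept_scores) < MIN_OBSERVATIONS:
        issues.append(f"only {len(kept_scores)} usable observations")
    if len(set(kept_flags)) < 2:
        issues.append("need both defaults and non-defaults; Gini is undefined")
    return kept_scores, kept_flags, issues


scores, flags, issues = clean_records([0.1, None, 0.8, 1.7, 0.3], [0, 0, True, 1, False])
print(scores, flags, issues)
```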

Summary-Level Data

  • One row per risk grade or aggregated group
  • Data should be ordered by risk (mean_pd column used for ordering)
  • Mean probabilities: numeric values between 0.0 and 1.0
  • Default counts: non-negative numbers or None (negative values not allowed)
  • Volume counts: non-negative numbers or None (negative values not allowed)
  • At least 2 risk grades with both defaults and non-defaults
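For summary-level data, each risk grade acts as a block of observations that share one score (its mean_pd). AUC can then be computed from counts alone: each default is compared against the non-defaults in lower-risk grades, and pairs within the same grade count as ties. The sketch below assumes that higher mean_pd means higher risk and that None counts are treated as zero. It reads the "at least 2 risk grades" requirement as at least two distinct grades with defaults and non-defaults present overall. `summary_gini` is a hypothetical name, and the exact tie and None handling inside the metric may differ.

```python
from collections import defaultdict


def summary_gini(grades):
    """Gini from risk-grade rows carrying mean_pd, defaults and volume.

    Each grade is treated as a block of observations sharing one score (its mean_pd),
    so default/non-default pairs inside the same grade count as ties (0.5).
    Returns None when Gini is undefined.
    """
    blocks = defaultdict(lambda: [0, 0])  # mean_pd -> [defaults, non-defaults]
    for g in grades:
        d = g["defaults"] or 0  # assumption: None is treated as zero
        v = g["volume"] or 0
        if d < 0 or v < 0 or d > v:
            raise ValueError(f"invalid counts in {g!r}")
        blocks[g["mean_pd"]][0] += d
        blocks[g["mean_pd"]][1] += v - d
    rows = sorted(blocks.items())  # lowest mean_pd (lowest risk) first
    total_def = sum(dn[0] for _, dn in rows)
    total_non = sum(dn[1] for _, dn in rows)
    if len(rows) < 2 or total_def == 0 or total_non == 0:
        return None
    concordant = 0.0
    non_below = 0  # non-defaults in strictly lower-risk grades
    for _, (d, n) in rows:
        concordant += d * (non_below + 0.5 * n)
        non_below += n
    auc = concordant / (total_def * total_non)
    return 2 * auc - 1


grades = [
    {"mean_pd": 0.20, "defaults": 10, "volume": 50},
    {"mean_pd": 0.01, "defaults": 1, "volume": 100},
    {"mean_pd": 0.05, "defaults": 4, "volume": 80},
]
print(round(summary_gini(grades), 4))  # 0.5798
```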

Gini Interpretation

  • Gini > 0.6: Excellent discrimination (equivalent to AUC > 0.8)
  • 0.4 < Gini ≤ 0.6: Good discrimination (equivalent to 0.7 < AUC ≤ 0.8)
  • 0.2 < Gini ≤ 0.4: Acceptable discrimination (equivalent to 0.6 < AUC ≤ 0.7)
  • 0.0 < Gini ≤ 0.2: Poor discrimination (equivalent to 0.5 < AUC ≤ 0.6)
  • Gini = 0.0: No discrimination (random model, AUC = 0.5)
  • Gini < 0.0: Inverse discrimination (the model tends to rank defaults below non-defaults)
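These bands can be applied programmatically, for example when labelling results in a report. The sketch below uses the thresholds listed above, treating each band's upper bound as inclusive so that values like 0.2 fall into the lower band. It also returns the equivalent AUC via AUC = (Gini + 1) / 2. The labels and the function name are illustrative.

```python
def interpret_gini(gini):
    """Map a Gini value to the interpretation bands; returns (label, equivalent AUC)."""
    auc = (gini + 1) / 2
    if gini > 0.6:
        label = "excellent"
    elif gini > 0.4:
        label = "good"
    elif gini > 0.2:
        label = "acceptable"
    elif gini > 0.0:
        label = "poor"
    elif gini == 0.0:
        label = "no discrimination"
    else:
        label = "inverse discrimination"
    return label, auc


for g in (0.72, 0.45, 0.25, 0.1, 0.0, -0.3):
    print(g, *interpret_gini(g))
```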

Relationship to AUC

The Gini coefficient is directly related to AUC:

Gini = 2 × AUC - 1
AUC = (Gini + 1) / 2

Why use Gini instead of AUC?

  1. Centered around zero: Makes it easier to interpret no discrimination (0.0)
  2. Industry standard: Widely used in credit risk modeling
  3. Symmetric range: the [-1, 1] scale, with negative values flagging inverse ranking, is more intuitive than AUC's [0, 1]
  4. Regulatory reporting: Often required for credit risk model validation

Important Notes

  1. Data Quality: Remove accounts with missing or invalid probability scores
  2. Sample Size: Larger samples provide more reliable Gini estimates
  3. Population Stability: Gini can vary across different populations or time periods
  4. Risk Ordering: For summary-level data, ensure groups are properly risk-ordered
  5. Equivalent to AUC: Gini is a linear transformation of AUC, so both carry the same information about ranking power
  6. Calculation Method: Uses sklearn's roc_auc_score internally for consistency

Edge Cases

  • All defaults or no defaults: Gini returns None (undefined)
  • Perfect separation: Gini = 1.0
  • Completely random: Gini = 0.0
  • Perfect inverse: Gini = -1.0
  • Insufficient data: a single observation returns None
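The sketch below reproduces these edge cases with a rank-based (Mann-Whitney) AUC. It runs in O(n log n) and gives tied scores their average rank, so ties count as half. This is a standard-library stand-in for illustration only. The page states that scikit-learn's roc_auc_score is used internally, and the None-returning behaviour here simply mirrors the list above. `gini_or_none` is a hypothetical name.

```python
def gini_or_none(scores, defaults):
    """Rank-based (Mann-Whitney) AUC converted to Gini; None when undefined."""
    n = len(scores)
    n_pos = sum(1 for d in defaults if d)
    n_neg = n - n_pos
    if n < 2 or n_pos == 0 or n_neg == 0:
        return None  # single observation, all defaults, or no defaults
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank for a block of ties
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    rank_sum_pos = sum(r for r, d in zip(ranks, defaults) if d)
    auc = (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
    return 2 * auc - 1


print(gini_or_none([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))  # 1.0
print(gini_or_none([0.3, 0.6, 0.9], [1, 1, 1]))          # None
print(gini_or_none([0.4], [1]))                          # None
```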