Documentation for the corr package.


Correlation and Association Analysis


Version information:

  • Package: corr
  • Version: 0.0.2
  • Generated: 2025-12-07T21:48:58
  • Author(s): Ryo Nakaya ([email protected])
  • Maintainer(s): Ryo Nakaya ([email protected])
  • License: MIT
  • File SHA256: F*7C4972E6295C76B039374C03719D6895038E0AC04B2DE58D263267DF0653F2FA for this version
  • Content SHA256: C*D05A70CDFFCEF138EB30A87596AAF779313444FA2519C8A8BA9A6663D75CFF4C for this version

The corr package, version: 0.0.2;


Provides macros for computing association measures between continuous, nominal, and ordinal variables. Supports Pearson and Spearman correlations, Cramer's V, Somers' D, and Eta coefficients. Outputs both long-format association tables and wide matrix-style datasets for further analysis. Includes a heatmap macro to visualize the strength and type of associations in a single plot, and a scatter matrix macro for pairwise plots. Further tools for correlation and association analysis and visualization are planned.

The following macros are available:

  • %association_matrix : Generates the association matrix datasets (long and wide)
  • %heatmap : Generates a heatmap of the associations
  • %scatter_matrix : Generates a scatter matrix plot (Kernel, Bar, Scatter, Mosaic, Cross Table, Box)


Required SAS Components:

  • Base SAS Software


SAS package generated by SAS Package Framework, version 20251017


The corr package content

The corr package consists of the following content:

  1. %association_matrix() macro

  2. %gtl_bar() macro

  3. %gtl_box() macro

  4. %gtl_crosstable() macro

  5. %gtl_ellipse() macro

  6. %gtl_kernel() macro

  7. %gtl_mosaic() macro

  8. %gtl_reg() macro

  9. %gtl_scatter() macro

  10. %heatmap() macro

  11. %scatter_matrix() macro

  12. License note


%association_matrix() macro

Macro:

%association_matrix  

Purpose:

Creates a unified association matrix between variables of different types  
(continuous, nominal, ordinal).  
It calculates:
- Correlations for continuous + ordinal variables (PEARSON or SPEARMAN)  
- Cramer's V for nominal x nominal  (range 0 to 1)  
- Somers' D (C|R) for nominal x ordinal  (range -1 to 1)  
- Eta coefficient for nominal x continuous  (range 0 to 1)  
and combines them into a single "long" association dataset and a "wide" matrix.
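
For reference, the usual textbook definitions of the three non-correlation measures are given below. These are the standard formulas, stated here as an assumption about what the macro reports; the package's internal computation is not shown in this document.

$$
V = \sqrt{\frac{\chi^2}{n \,\min(r-1,\; c-1)}}, \qquad
\eta = \sqrt{\frac{SS_{\text{between}}}{SS_{\text{total}}}}, \qquad
D(C|R) = \frac{N_c - N_d}{N_c + N_d + T_C}
$$

Here $\chi^2$ is the Pearson chi-square statistic of an $r \times c$ table with $n$ observations, $SS_{\text{between}}$ and $SS_{\text{total}}$ are the between-group and total sums of squares of the continuous variable across the levels of the nominal variable, $N_c$ and $N_d$ are the numbers of concordant and discordant pairs, and $T_C$ is the number of pairs tied on the column variable but not on the row variable. These definitions are consistent with the ranges listed above: $V$ and $\eta$ lie in $[0, 1]$, and $D(C|R)$ lies in $[-1, 1]$.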

Parameters:

  • data (required)
    Input data set name.

  • continuous (optional)
    Space-separated list of continuous variables.

  • nominal (optional)
    Space-separated list of nominal (categorical) variables.

  • ordinal (optional)
    Space-separated list of ordinal variables.

  • method (optional, default = PEARSON)
    Correlation method used for the continuous and ordinal variables.

    • PEARSON : Pearson correlation
    • SPEARMAN : Spearman correlation
  • out (optional, default = association)
    Base name of output data sets.

    • Long-format association table: &out.
    • Wide-format matrix: &out._wide

Outputs:

  1. &out. (long format)

    • One row per pair (var1, var2) and association measure.
    • Key variables:
      • var1, var2 : variable names
      • value : numeric association measure
      • type : association type
        • PEARSON / SPEARMAN : correlation
        • CRAMERS_V : Cramer's V
        • SOMERS_D : Somers' D (C|R)
        • ETA : eta coefficient
        • DIAG : diagonal (self-association = 1)
      • valuetxt : formatted value with marker
      • txt : marker for plot/heatmap
        • * : Cramer's V
        • + : Somers' D
        • - : Eta
        • (blank) : correlations
  2. &out._wide (wide format)

    • Matrix-style dataset (wide) for numeric association values.
    • var1 is the row variable, columns are var2.

Sample code:

%association_matrix(
    data       = adsl,
    continuous = AGE HEIGHT WEIGHT,
    nominal    = SEX ARMCD,
    ordinal    = VISITN,
    method     = PEARSON,
    out        = association_all
);
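
The call above needs an ADSL-like dataset. As a minimal sketch for readers without one, the example below runs the macro on a copy of `sashelp.cars` and then inspects both outputs. The choice of `sashelp.cars`, its variables, and the name `assoc_cars` are illustrative assumptions. The example also assumes the corr package is already loaded, and that the `type` values match the documented spelling (`CRAMERS_V`).

```sas
/* Work copy of a dataset shipped with SAS (illustrative choice) */
data work.cars;
    set sashelp.cars;
run;

/* Assumes %association_matrix is available in the session */
%association_matrix(
    data       = cars,
    continuous = MSRP Horsepower Weight,
    nominal    = Origin DriveTrain,
    ordinal    = Cylinders,
    method     = SPEARMAN,
    out        = assoc_cars
);

/* Long format: list only the nominal x nominal pairs */
proc print data=assoc_cars noobs;
    where type = "CRAMERS_V";
    var var1 var2 value valuetxt;
run;

/* Wide format: one row per var1, one column per var2 */
proc print data=assoc_cars_wide noobs;
run;
```

Filtering on `type` is a simple way to separate the different measures, since their ranges differ and they should not be compared directly.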

Notes:

  • Diagonal elements (type = "DIAG") are set to 1 for all variables listed in continuous, ordinal, and nominal.
  • Floating point values are rounded to 1e-6 to reduce minor numeric noise.
  • Temporary work tables (e.g. _corr_co, _chisq, _measures, _ov) are created and cleaned up inside the macro.
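
To sanity-check individual values, similar statistics can be obtained directly from Base SAS procedures. The sketch below is not the macro's actual code; it only shows one plausible way to get a correlation matrix, Cramer's V, and Somers' D with PROC CORR and PROC FREQ. The dataset `sashelp.cars` and all `demo_` table names are illustrative assumptions.

```sas
/* Correlations for numeric variables.
   For Spearman, replace PEARSON with SPEARMAN and OUTP= with OUTS=. */
proc corr data=sashelp.cars pearson noprint
          outp=work.demo_corr;
    var MSRP Horsepower Weight Cylinders;
run;

/* Capture PROC FREQ statistics as datasets without printing them.
   NOPRINT is avoided here because it would suppress the ODS tables. */
ods exclude all;
ods output ChiSq=work.demo_chisq Measures=work.demo_measures;
proc freq data=sashelp.cars;
    tables Origin*DriveTrain / chisq;     /* nominal x nominal */
    tables Origin*Cylinders  / measures;  /* nominal x ordinal */
run;
ods exclude none;

/* Cramer's V is one row of the ChiSq table */
proc print data=work.demo_chisq noobs;
    where Statistic contains "Cramer";
run;

/* Somers' D appears as both C|R and R|C in the Measures table */
proc print data=work.demo_measures noobs;
    where Statistic contains "Somers";
run;
```

The values printed here can then be compared with the matching rows of the long-format output, allowing for the 1e-6 rounding noted above.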

URL:

https://github.com/Nakaya-Ryo/corr


Author: Ryo Nakaya Latest update Date: 2025-12-01


%gtl_bar() macro

Macro:

%gtl_bar

Purpose:

Internal macro `gtl_bar` provides GTL-based graph or table generation for correlation or related visualizations.

Author: Ryo Nakaya Latest update Date: 2025-12-04


%gtl_box() macro

Macro:

%gtl_box

Purpose:

Internal macro `gtl_box` provides GTL-based graph or table generation for correlation or related visualizations.

Author: Ryo Nakaya Latest update Date: 2025-12-04


%gtl_crosstable() macro

Macro:

%gtl_crosstable

Purpose:

Internal macro `gtl_crosstable` provides GTL-based graph or table generation for correlation or related visualizations.

Author: Ryo Nakaya Latest update Date: 2025-12-04


%gtl_ellipse() macro

Macro:

%gtl_ellipse

Purpose:

Internal macro `gtl_ellipse` provides GTL-based graph or table generation for correlation or related visualizations.

Author: Ryo Nakaya Latest update Date: 2025-12-04


%gtl_kernel() macro

Macro:

%gtl_kernel

Purpose:

Internal macro `gtl_kernel` provides GTL-based graph or table generation for correlation or related visualizations.

Author: Ryo Nakaya Latest update Date: 2025-12-04


%gtl_mosaic() macro

Macro:

%gtl_mosaic

Purpose:

Internal macro `gtl_mosaic` provides GTL-based graph or table generation for correlation or related visualizations.

Author: Ryo Nakaya Latest update Date: 2025-12-04


%gtl_reg() macro

Macro:

%gtl_reg

Purpose:

Internal macro `gtl_reg` provides GTL-based graph or table generation for correlation or related visualizations.

Author: Ryo Nakaya Latest update Date: 2025-12-04


%gtl_scatter() macro

Macro:

%gtl_scatter

Purpose:

Internal macro `gtl_scatter` provides GTL-based graph or table generation for correlation or related visualizations.

Author: Ryo Nakaya Latest update Date: 2025-12-04


%heatmap() macro

Macro:

%heatmap  

Purpose:

Generates a heatmap visualization of associations between variables.  
This macro first calls `%association_matrix` to compute all pairwise
associations (continuous, nominal, ordinal), then creates a heatmap plot
with annotated values.

Parameters:

  • data (required)
    Input dataset for association calculations.

  • continuous (optional)
    Space-separated list of continuous variables.

  • nominal (optional)
    Space-separated list of nominal variables.

  • ordinal (optional)
    Space-separated list of ordinal variables.

  • method (optional, default = PEARSON)
    Correlation method for the continuous and ordinal variables.

    • PEARSON
    • SPEARMAN
  • text (optional, default = Y)
    Controls what text appears in each heatmap cell:

    • Y : Use formatted value text with marker
    • N : Use marker symbols only for non-correlation measures
  • xreverse (optional, default = N)
    Reverses the order of the x-axis when set to Y.

  • yreverse (optional, default = Y)
    Reverses the order of the y-axis when set to Y.

  • out (optional, default = association)
    Base name of output datasets created by %association_matrix.

Output:

  • Heatmap graphic via PROC SGPLOT
  • Long-format association dataset &out.
  • Wide-format association matrix &out._wide

Display Conventions:

  • * for Cramer's V (range 0 to 1)
  • + for Somers' D (range -1 to 1)
  • - for Eta (correlation ratio) (range 0 to 1)
  • Numeric values printed when text=Y.

Sample code:

%heatmap(
    data       = adsl,
    continuous = AGE HEIGHT WEIGHT,
    nominal    = SEX ARMCD,
    ordinal    = VISITN,
    method     = SPEARMAN,
    text       = Y,
    xreverse   = N,
    yreverse   = Y,
    out        = association
);

Notes:

  • The macro requires %association_matrix to be available.
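
As a further illustration, the sketch below draws a marker-only heatmap (`text = N`) from a copy of `sashelp.cars`. The dataset, the variable choices, the output name `assoc_cars`, and the graph size are illustrative assumptions. Since the plot is produced by PROC SGPLOT, the `ods graphics` statement is expected to control its size, though this has not been checked against the macro's source.

```sas
/* Work copy of a dataset shipped with SAS (illustrative choice) */
data work.cars;
    set sashelp.cars;
run;

/* Set the size of the next graph; values are arbitrary */
ods graphics / reset width=6in height=5in;

/* Assumes the corr package is loaded, so that both %heatmap
   and %association_matrix are available */
%heatmap(
    data       = cars,
    continuous = MSRP Horsepower Weight,
    nominal    = Origin DriveTrain,
    ordinal    = Cylinders,
    method     = PEARSON,
    text       = N,
    out        = assoc_cars
);

/* Restore default graph settings */
ods graphics / reset;
```

With `text = N`, the cells for Cramer's V, Somers' D, and Eta show only their markers (`*`, `+`, `-`), which keeps a plot with many variables less crowded.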

URL:

https://github.com/Nakaya-Ryo/corr


Author: Ryo Nakaya Latest update Date: 2025-12-07


%scatter_matrix() macro

Macro:

%scatter_matrix

Purpose:

Macro `scatter_matrix` generates a GTL-based scatter matrix (lattice) plot for continuous and categorical variables, with kernel, bar, scatter, mosaic, cross table, and box panels.

Parameters:

  • data (required)
    Input data set name

  • continuous (optional)
    Space-separated list of continuous variables.

  • categorical (optional)
    Space-separated list of categorical variables.

  • group (optional)
    Name of a variable used to color the scatter plots.

Sample code:

%scatter_matrix(
    data        = adsl,
    continuous  = age weight,
    categorical = sex,
    group       = race
)

Note:

  • For readable lattice graphs, keep the total number of variables to about five or fewer.
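
A minimal sketch that stays within this limit is shown below, using a copy of `sashelp.cars`. The dataset and variable choices are illustrative assumptions, and the call assumes the corr package is already loaded.

```sas
/* Work copy of a dataset shipped with SAS (illustrative choice) */
data work.cars;
    set sashelp.cars;
run;

/* Three continuous variables plus one categorical variable gives
   four rows and columns in the lattice; Origin only sets the color */
%scatter_matrix(
    data        = cars,
    continuous  = MSRP Horsepower Weight,
    categorical = DriveTrain,
    group       = Origin
)
```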

URL:

https://github.com/Nakaya-Ryo/corr


Author: Ryo Nakaya ([email protected]) Latest update Date: 2025-12-04



License

MIT License

Copyright (c) 2025 Ryo Nakaya

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.