Correlation and Association Analysis
- Package: corr
- Version: 0.0.2
- Generated: 2025-12-07T21:48:58
- Author(s): Ryo Nakaya ([email protected])
- Maintainer(s): Ryo Nakaya ([email protected])
- License: MIT
- File SHA256: `F*7C4972E6295C76B039374C03719D6895038E0AC04B2DE58D263267DF0653F2FA` for this version
- Content SHA256: `C*D05A70CDFFCEF138EB30A87596AAF779313444FA2519C8A8BA9A6663D75CFF4C` for this version
Provides macros for computing association measures between continuous, nominal, and ordinal variables. Supports Pearson and Spearman correlations, Cramer's V, Somers' D, and Eta coefficients. Outputs both long-format association tables and wide matrix-style datasets for further analysis. Includes a heatmap macro to visualize the strength and type of associations in a single plot. Further tools for correlation and association analysis or visualization are planned.
The following macros are available:
- %association_matrix : To generate association matrix dataset (long, wide)
- %heatmap : To generate heatmap
- %scatter_matrix : To generate scatter matrix plot (Kernel, Bar, Scatter, Mosaic, Cross Table, Box)
Required SAS Components:
- Base SAS Software
SAS package generated by SAS Package Framework, version 20251017
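A package built with the SAS Packages Framework is typically loaded into a session before its macros can be called. The lines below are a minimal sketch of that step, assuming the framework file `SPFinit.sas` and the package file `corr.zip` already sit in one local folder; the folder path is a hypothetical placeholder.

```sas
/* Hypothetical folder that holds SPFinit.sas and corr.zip */
filename packages "/path/to/sas/packages";

/* Enable the SAS Packages Framework in the current session */
%include packages(SPFinit.sas);

/* Make the package macros available, then print the package help to the log */
%loadPackage(corr)
%helpPackage(corr)
```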
The corr package consists of the following content:
%association_matrix
Creates a unified association matrix between variables of different types
(continuous, nominal, ordinal).
It calculates:
- Correlations for continuous + ordinal variables (PEARSON or SPEARMAN)
- Cramer's V for nominal x nominal (range 0 to 1)
- Somers' D (C|R) for nominal x ordinal (range -1 to 1)
- Eta coefficient for nominal x continuous (range 0 to 1)
and combines them into a single "long" association dataset and a "wide" matrix.
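Each of the four measures listed above can be obtained with Base SAS procedures. The following is a minimal sketch under stated assumptions, not the macro's actual implementation: it uses `sashelp.class` and `sashelp.cars` as stand-in data, the output names (`work.pearson`, `work.spearman`, `work.cramers_v`, `work.somers_d`, `work.eta`) are hypothetical, and Eta is computed as sqrt(SS_between / SS_total).

```sas
/* Pearson and Spearman correlations for the numeric variables */
proc corr data=sashelp.class noprint
          outp=work.pearson     /* Pearson correlation matrix  */
          outs=work.spearman;   /* Spearman correlation matrix */
  var age height weight;
run;

/* Cramer's V for a nominal x nominal table (stored in _CRAMV_) */
proc freq data=sashelp.cars noprint;
  tables origin*drivetrain / chisq;
  output out=work.cramers_v cramv;
run;

/* Somers' D (C|R) for a nominal x ordinal table (stored in _SMDCR_) */
proc freq data=sashelp.cars noprint;
  tables origin*cylinders / measures;
  output out=work.somers_d smdcr;
run;

/* Eta (correlation ratio) of height across the groups of sex */
proc sql;
  create table work.eta as
  select sqrt(sum(g.n * (g.mean_g - t.mean_all)**2) / max(t.css_all)) as eta
  from (select sex, count(height) as n, mean(height) as mean_g
          from sashelp.class
          group by sex) as g,
       (select mean(height) as mean_all, css(height) as css_all
          from sashelp.class) as t;
quit;
```

The numerator sums the between-group squares over the groups of `sex`, and `css_all` is the total corrected sum of squares, so the ratio lies between 0 and 1.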
Parameters:
- `data` (required): Input data set name.
- `continuous` (optional): Space-separated list of continuous variables.
- `nominal` (optional): Space-separated list of nominal (categorical) variables.
- `ordinal` (optional): Space-separated list of ordinal variables.
- `method` (optional, default = PEARSON): Correlation method used for the continuous + ordinal part.
  - `PEARSON`: Pearson correlation
  - `SPEARMAN`: Spearman correlation
- `out` (optional, default = association): Base name of output data sets.
  - Long-format association table: `&out.`
  - Wide-format matrix: `&out._wide`
Outputs:
- `&out.` (long format): One row per pair (var1, var2) and association measure. Key variables:
  - `var1`, `var2`: variable names
  - `value`: numeric association measure
  - `type`: association type
    - `PEARSON` / `SPEARMAN`: correlation
    - `CRAMERS_V`: Cramer's V
    - `SOMERS_D`: Somers' D (C|R)
    - `ETA`: eta coefficient
    - `DIAG`: diagonal (self-association = 1)
  - `valuetxt`: formatted value with marker
  - `txt`: marker for plot/heatmap
    - `*`: Cramer's V
    - `+`: Somers' D
    - `-`: Eta
    - (blank): correlations
- `&out._wide` (wide format): Matrix-style dataset (wide) for numeric association values. `var1` is the row variable, columns are `var2`.
Example:
%association_matrix(
data = adsl,
continuous = AGE HEIGHT WEIGHT,
nominal = SEX ARMCD,
ordinal = VISITN,
method = PEARSON,
out = association_all
);
Notes:
- Diagonal elements (`type = "DIAG"`) are set to 1 for all variables listed in `continuous`, `ordinal`, and `nominal`.
- Floating point values are rounded to `1e-6` to reduce minor numeric noise.
- Temporary work tables (e.g. `_corr_co`, `_chisq`, `_measures`, `_ov`) are created and cleaned up inside the macro.
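Because the long-format table holds one row per variable pair, it can be queried directly. The sketch below assumes the example call above has been run, so that `association_all` exists with the documented variables `var1`, `var2`, `type`, and `value`; the output name `work.assoc_ranked` and the column `abs_value` are hypothetical.

```sas
/* Rank variable pairs by strength of association, skipping the diagonal */
proc sql;
  create table work.assoc_ranked as
  select var1, var2, type, value,
         abs(value) as abs_value   /* sign matters for correlations and Somers' D */
  from association_all
  where type ne "DIAG"
  order by abs_value desc;
quit;
```

Sorting on the absolute value keeps strong negative correlations near the top alongside strong positive ones.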
https://github.com/Nakaya-Ryo/corr
%gtl_bar
Internal macro `gtl_bar` provides GTL-based graph or table generation for correlation or related visualizations.
%gtl_box
Internal macro `gtl_box` provides GTL-based graph or table generation for correlation or related visualizations.
%gtl_crosstable
Internal macro `gtl_crosstable` provides GTL-based graph or table generation for correlation or related visualizations.
%gtl_ellipse
Internal macro `gtl_ellipse` provides GTL-based graph or table generation for correlation or related visualizations.
%gtl_kernel
Internal macro `gtl_kernel` provides GTL-based graph or table generation for correlation or related visualizations.
%gtl_mosaic
Internal macro `gtl_mosaic` provides GTL-based graph or table generation for correlation or related visualizations.
%gtl_reg
Internal macro `gtl_reg` provides GTL-based graph or table generation for correlation or related visualizations.
%gtl_scatter
Internal macro `gtl_scatter` provides GTL-based graph or table generation for correlation or related visualizations.
%heatmap
Generates a heatmap visualization of associations between variables.
This macro first calls `%association_matrix` to compute all pairwise
associations (continuous, nominal, ordinal), then creates a heatmap plot
with annotated values.
Parameters:
- `data` (required): Input dataset for association calculations.
- `continuous` (optional): Space-separated list of continuous variables.
- `nominal` (optional): Space-separated list of nominal variables.
- `ordinal` (optional): Space-separated list of ordinal variables.
- `method` (optional, default = PEARSON): Correlation method for continuous + ordinal variables (`PEARSON` or `SPEARMAN`).
- `text` (optional, default = Y): Controls what text appears in each heatmap cell.
  - `Y`: Use formatted value text with marker
  - `N`: Use marker symbols only for non-correlation measures
- `xreverse` (optional, default = N): Controls whether the x-axis order is reversed.
- `yreverse` (optional, default = Y): Controls whether the y-axis order is reversed.
- `out` (optional, default = association): Base name of output datasets created by `%association_matrix`.
Outputs:
- Heatmap graphic via `PROC SGPLOT`
- Long-format association dataset `&out.`
- Wide-format association matrix `&out._wide`
Markers shown in the heatmap:
- `*` for Cramer's V (range 0 to 1)
- `+` for Somers' D (range -1 to 1)
- `-` for Eta (correlation ratio) (range 0 to 1)
- Numeric values are printed when `text=Y`.
Example:
%heatmap(
data = adsl,
continuous = AGE HEIGHT WEIGHT,
nominal = SEX ARMCD,
ordinal = VISITN,
method = SPEARMAN,
text = Y,
xreverse = N,
yreverse = Y,
out = association
);
Notes:
- The macro requires `%association_matrix` to be available.
https://github.com/Nakaya-Ryo/corr
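The plotting step can be pictured as a `PROC SGPLOT` heatmap drawn from a long-format table with a text overlay. The sketch below is an assumption about the general approach, not the macro's source: `work.assoc_demo` is a hypothetical toy dataset, its values are placeholders rather than real results, and the color model is chosen only for illustration.

```sas
/* Toy long-format data: placeholder values, not real results */
data work.assoc_demo;
  length var1 var2 $8 type $10 valuetxt $8;
  input var1 $ var2 $ type $ value;
  valuetxt = strip(put(value, 5.2));   /* text shown inside each cell */
  datalines;
A A DIAG 1
A B PEARSON 0.6
A C PEARSON -0.3
B A PEARSON 0.6
B B DIAG 1
B C PEARSON 0.1
C A PEARSON -0.3
C B PEARSON 0.1
C C DIAG 1
;
run;

/* One colored cell per (var1, var2) pair, annotated with the formatted value */
proc sgplot data=work.assoc_demo;
  heatmapparm x=var2 y=var1 colorresponse=value / colormodel=(blue white red);
  text x=var2 y=var1 text=valuetxt;
  xaxis display=(nolabel);
  yaxis display=(nolabel) reverse;   /* comparable to yreverse = Y */
run;
```

Reversing the y-axis places the first variable in the top row, so the diagonal runs from the top left to the bottom right.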
%scatter_matrix
Macro `scatter_matrix` provides GTL-based graph or table generation for correlation or related visualizations.
Parameters:
- `data` (required): Input data set name.
- `continuous` (optional): Space-separated list of continuous variables.
- `categorical` (optional): Space-separated list of categorical variables.
- `group` (optional): Name of a single variable used to color the scatter plots.
Example:
%scatter_matrix(
data = adsl,
continuous = age weight,
categorical = sex,
group = race
)
Notes:
- Use at most about five variables so that the lattice graphs stay readable.
https://github.com/Nakaya-Ryo/corr
Author: Ryo Nakaya ([email protected]) Latest update date: 2025-12-04
MIT License
Copyright (c) 2025 Ryo Nakaya
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.