Skip to content

Output custom object with headers when no values are present#25096

Open
jshigetomi wants to merge 31 commits intomasterfrom
fixNoValueCSVInput
Open

Output custom object with headers when no values are present#25096
jshigetomi wants to merge 31 commits intomasterfrom
fixNoValueCSVInput

Conversation

@jshigetomi
Copy link
Copy Markdown
Collaborator

@jshigetomi jshigetomi commented Feb 26, 2025

PR Summary

Fixes: #24698

This pull request includes changes to improve the handling of empty and null values in the ConvertFrom-Csv command and updates the corresponding unit tests to verify the new behavior. The most important changes include modifying the Import method in CsvCommands.cs and adding new unit tests in ConvertFrom-Csv.Tests.ps1.

Enhancements to ConvertFrom-Csv command:

Unit tests updates:

PR Context

Improve CSV parsing edge case handling and add comprehensive unit tests

PR Checklist

@jshigetomi
Copy link
Copy Markdown
Collaborator Author

I've adjusted the behavior of ConvertFrom-CSV according to the discussion on issue #24698. Essentially I removed the feature that skips blank lines in the Import method in ImportCsvHelper. It still keeps the last cell null so as to keep this from being a breaking change. #17702

@PowerShell PowerShell deleted a comment from azure-pipelines bot Feb 26, 2025
if (values.Count == 1 && string.IsNullOrEmpty(values[0]) && !(_cmdlet is ConvertFromCsvCommand))
{
// skip the blank lines
// Skip empty lines when Import-CSV but keep empty lines when ConvertFromCSV
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bug (#24698) affects both ConvertFrom-Csv and Import-Csv; I think the fix should be applied to both commands for consistency. ConvertFrom-Csv was used throughout the original issue for demonstration purposes only.

Do we know if there's a specific reason why the check for values.Count == 1 && string.IsNullOrEmpty(values[0]) was included in the first place?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comes from Windows PowerShell.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I disable it for Import-CSV as well, `n's are captured as an extra element. Is this the behavior we want?

image

Describe "Import-Csv with different newlines" -Tags "CI" {
    It "Test import-csv with '<name>' newline" -TestCases @(
        @{ name = "CR"; newline = "`r" }
        @{ name = "LF"; newline = "`n" }
        @{ name = "CRLF"; newline = "`r`n" }
        ) {
        param($newline)
        $csvFile = Join-Path $TestDrive -ChildPath $((New-Guid).Guid)
        $delimiter = ','
        "h1,h2,h3$($newline)11,12,13$($newline)21,22,23$($newline)" | Out-File -FilePath $csvFile
        $returnObject = Import-Csv -Path $csvFile -Delimiter $delimiter
        $returnObject.Count | Should -Be 2
        $returnObject[0].h1 | Should -Be 11
        $returnObject[0].h2 | Should -Be 12
        $returnObject[0].h3 | Should -Be 13
        $returnObject[1].h1 | Should -Be 21
        $returnObject[1].h2 | Should -Be 22
        $returnObject[1].h3 | Should -Be 23
    }
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the behavior we want?

No, it isn't.

This is more about ensuring whatever fix is chosen for #24698 is applied to both ConvertFrom-Csv and Import-Csv. In other words, the example CSV data below should deserialize consistently, irrespective of the source being a file or a string.

"P1"
""
""
""

Can #24698 be fixed for both commands without introducing the newline issue?

Copy link
Copy Markdown
Collaborator Author

@jshigetomi jshigetomi Feb 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking it can be deserialized consistently but would result in just one empty string value per header.
For example,

"P1"
""
""
""
# Would produce [pscustomobject] @{ P1 = '' } ignoring the other empty cells
@'
"P1","P2"
"",
"",
'@ | ConvertFrom-Csv
# Would produce [pscustomobject] @{ P1 = ''; P2 = '' } ignoring the other empty cells again

This way we can produce a header only output consistently.

Thoughts?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not what I had in mind when I filed the issue. For each non-empty line in input data, an object should be emitted by ConvertFrom-Csv/Import-Csv.

The following (using either command) produces 3 objects:

@'
"P1"
"a"
"b"
"c"
'@ | ConvertFrom-Csv

Therefore, the following (using either command) should also produce 3 objects:

@'
"P1"
""
""
""
'@ | ConvertFrom-Csv

Likewise, this produces 2 objects, because the empty line is correctly ignored:

@'
"P1"
"a"

"c"
'@ | ConvertFrom-Csv

Therefore, this should also produce 2 objects:

@'
"P1"
""

""
'@ | ConvertFrom-Csv

@iSazonov
Copy link
Copy Markdown
Collaborator

@jshigetomi Root of the issue in ParseNextRecord method. It does not distinguish between an input string without characters and a string with two double quotes "".

result.Add(current.ToString());
current.Remove(0, current.Length);
// New line outside quote is end of word and end of record
break;

Here current.ToString() returns string.Empty for these two cases. We can check current.Length > 0 before call ToString(). I suggest the fix (with removing

if (values.Count == 1 && string.IsNullOrEmpty(values[0]) && !(_cmdlet is ConvertFromCsvCommand))
{
// Skip empty lines when Import-CSV but keep empty lines when ConvertFromCSV
continue;
}
)

But we get to Zugzwang - sometimes we want to, sometimes we don't. Of course, it would be possible to add more conditions, but this would require a more significant code change in same methods. My suggestion is not to complicate the code. If there are concerns, then make this change an experimental feature.

@iSazonov
Copy link
Copy Markdown
Collaborator

iSazonov commented Mar 1, 2025

@jshigetomi I checked my guess. I had to make a few more changes. You can see the result here:

master...iSazonov:PowerShell:csvblank

@jshigetomi
Copy link
Copy Markdown
Collaborator Author

@iSazonov I'm seeing test failure for Import-CSV

image

Maybe a warning to the user would be better for no value csvs?

@iSazonov
Copy link
Copy Markdown
Collaborator

iSazonov commented Mar 3, 2025

I don't see errors in my branch.

@jshigetomi
Copy link
Copy Markdown
Collaborator Author

Hmm I'm having some trouble reproducing your results.

master...iSazonov:PowerShell:csvblank#diff-2b77454e7d234e46e6ebdb6c1be4ac0cb9c02b88bbe0e61c9acdfa3f66491e81R1383-R1390

How are you resolving the brackets here?

@iSazonov
Copy link
Copy Markdown
Collaborator

iSazonov commented Mar 4, 2025

@jshigetomi Please look #25120

@jshigetomi
Copy link
Copy Markdown
Collaborator Author

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 2 pipeline(s).

@PowerShell PowerShell deleted a comment from azure-pipelines bot Mar 6, 2025
@PowerShell PowerShell deleted a comment from azure-pipelines bot Mar 6, 2025
@jshigetomi
Copy link
Copy Markdown
Collaborator Author

jshigetomi commented Mar 27, 2025

@surfingoldelephant @iSazonov Thanks for the pointer for Set-Content. I was not aware of the newline behavior.

I made the ConvertFrom-CSV and Import-CSV tests the same and both behave the same.


Describe "Import-Csv with empty and null values" {

Context 'Empty CSV Fields' {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the PR is about edge cases I think about additional tests:

  1. only header presents without newline
  2. header presents with one empty newline

And more:

  1. empty file (-Header parameter presents)
  2. file with one empty line (-Header parameter presents)

Copy link
Copy Markdown

@surfingoldelephant surfingoldelephant Mar 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be clear, we're expecting those tests to produce no output (zero custom objects).

Let's also include Import-Csv/ConvertFrom-Csv tests with multiple empty lines.

  1. Header only

    @'    
    P1,P2
    '@ | ConvertFrom-Csv
  2. Header followed by one empty line

    @'    
    P1,P2
    
    '@ | ConvertFrom-Csv
  3. Header followed by multiple empty lines

    @'    
    P1,P2
    
    
    '@ | ConvertFrom-Csv
  4. Empty input and -Header specified

    '' | ConvertFrom-Csv -Header P1, P2
  5. One empty line and -Header specified

    @'
    
    '@ | ConvertFrom-Csv -Header P1, P2
  6. Multiple empty lines and -Header specified

    @'
    
    
    '@ | ConvertFrom-Csv -Header P1, P2

Perhaps additional -Header tests to cover the fix:

@'
,
A1,A2
B1,B2
,
'@ | ConvertFrom-Csv -Header P1, P2

# Expected:
[pscustomobject] @{ P1 = '';   P2 = '' }
[pscustomobject] @{ P1 = 'A1'; P2 = 'A2' }
[pscustomobject] @{ P1 = 'B1'; P2 = 'B2' }
[pscustomobject] @{ P1 = '';   P2 = $null }

Copy link
Copy Markdown

@surfingoldelephant surfingoldelephant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we also need to take a look at no detected delimiter scenarios.

# Detected delimiter:

@'
P1,P2,P3
A1,,
B1,,
C1,,
'@ | ConvertFrom-Csv

# Current/expected behavior:
[pscustomobject] @{ P1 = 'A1'; P2 = ''; P3 = '' }
[pscustomobject] @{ P1 = 'B1'; P2 = ''; P3 = '' }
[pscustomobject] @{ P1 = 'C1'; P2 = ''; P3 = $null }
# No detected delimiter:

@'
P1,P2,P3
A1
B1
C1
'@ | ConvertFrom-Csv

# Current/expected behavior:
[pscustomobject] @{ P1 = 'A1'; P2 = $null; P3 = $null }
[pscustomobject] @{ P1 = 'B1'; P2 = $null; P3 = $null }
[pscustomobject] @{ P1 = 'C1'; P2 = $null; P3 = $null }


@'
P1;P2;P3
A1
B1
C1
'@ | ConvertFrom-Csv -Delimiter ';'

# Current/expected behavior:
[pscustomobject] @{ P1 = 'A1'; P2 = $null; P3 = $null }
[pscustomobject] @{ P1 = 'B1'; P2 = $null; P3 = $null }
[pscustomobject] @{ P1 = 'C1'; P2 = $null; P3 = $null }

Notice when there is no (detected) delimiter, the resultant value is always $null. Can we add tests for this?

@jshigetomi
Copy link
Copy Markdown
Collaborator Author

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@jshigetomi
Copy link
Copy Markdown
Collaborator Author

/azp run PowerShell-Windows-Packaging-CI, PowerShell-CI-linux-packaging

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines could not run because the pipeline triggers exclude this branch/path.

@jshigetomi jshigetomi closed this Jan 13, 2026
@github-project-automation github-project-automation bot moved this from PR In Progress to Done in PowerShell Team Reviews/Investigations Jan 13, 2026
@jshigetomi jshigetomi reopened this Jan 13, 2026
Copilot AI review requested due to automatic review settings January 13, 2026 19:02
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request improves the handling of empty and null values in CSV parsing commands (Import-Csv and ConvertFrom-Csv). The changes ensure that both cmdlets properly distinguish between truly empty lines (which should be skipped) and lines containing only delimiters (which represent rows with empty field values and should be processed).

Changes:

  • Improved logic for detecting and handling blank lines in CSV parsing
  • Added comprehensive test coverage for edge cases involving empty fields, null values, and various delimiter scenarios
  • Refactored existing tests to reduce duplication and improve maintainability
  • Fixed a spelling error in a test variable name

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
src/Microsoft.PowerShell.Commands.Utility/commands/utility/CsvCommands.cs Updated Import method to correctly identify blank lines (values.Count == 0) instead of incorrectly treating single empty strings as blank lines. Changed ParseNextRecord to return instead of break for clarity and added logic to prevent empty results from being added at the start of records. Changed loop condition from while(true) to while(!EOF) for clarity.
test/powershell/Modules/Microsoft.PowerShell.Utility/Import-Csv.Tests.ps1 Fixed spelling of variable name (emptyValueCsv). Refactored duplicate test cases into a single loop-based test for better maintainability. Added extensive new test coverage for empty/null value handling across multiple contexts: empty CSV fields, header-only scenarios, empty input with -Header parameter, edge cases with whitespace and special characters, type information handling, delimiter edge cases, and large data scenarios.
test/powershell/Modules/Microsoft.PowerShell.Utility/ConvertFrom-Csv.Tests.ps1 Added comprehensive test coverage matching the Import-Csv tests, covering the same edge cases but using ConvertFrom-Csv instead of Import-Csv, ensuring both cmdlets handle empty and null values consistently.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +395 to +415
It 'Should handle no delimiter with empty fields' {
$result = @'
P1,P2,P3
A1
B1
C1
'@ | ConvertFrom-Csv

$result.Count | Should -Be 3
$result[0].P1 | Should -Be 'A1'
$result[0].P2 | Should -BeNullOrEmpty
$result[0].P3 | Should -BeNullOrEmpty
$result[1].P1 | Should -Be 'B1'
$result[1].P2 | Should -BeNullOrEmpty
$result[1].P3 | Should -BeNullOrEmpty
$result[2].P1 | Should -Be 'C1'
$result[2].P2 | Should -BeNullOrEmpty
$result[2].P3 | Should -BeNullOrEmpty
}
}
}
Copy link
Copy Markdown

@surfingoldelephant surfingoldelephant Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to watch out for -BeNullOrEmpty. That returns $true for both $null and empty strings (''), whereas most of the tests should be checking explicitly for $null or an empty string, not both.

For example, in the quoted test, P2 and P3 should only be $null, yet the test will pass if the value is ''.

I suggest replacing all instances of -BeNullOrEmpty to either -BeExactly $null or -BeExactly '' depending on the expected result.

Perhaps also update existing -Be usage to -BeExactly. It won't change the behavior in this case, but signals clearer intent.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CL-General Indicates that a PR should be marked as a general cmdlet change in the Change Log Review - Needed The PR is being reviewed

Projects

Development

Successfully merging this pull request may close these issues.

CSV deserialization outputs nothing if there are only a certain number of empty fields

5 participants