
Add TSV file support to CsvConverter #2027

Open
shrajal01 wants to merge 2 commits into microsoft:main from shrajal01:feature-tsv-support


@shrajal01

Summary

Adds support for TSV (Tab-Separated Values) files in CsvConverter.

Changes

  • Added .tsv to accepted file extensions
  • Added text/tab-separated-values MIME type support
  • Added TSV delimiter handling (\t)
  • Added test coverage and sample TSV test file

Fixes #2022

@shrajal01
Author

@microsoft-github-policy-service agree


@trippinganymess left a comment


The delimiter selection should explicitly check both the extension and the MIME type before the delimiter is initialized.


# Parse CSV content
reader = csv.reader(io.StringIO(content))
delimiter = "\t" if (stream_info.extension or "").lower() == ".tsv" else ","

The accept() function uses both the MIME type and the extension of a file to determine its type. Consider a file that arrives from a cloud service with no extension but with a MIME type that accept() allows. The delimiter is currently set by checking only the extension, not the MIME type, so for such a file it defaults to "," every time, which will cause the function to crash.

Example scenario:

  1. A file is received from the cloud (without an extension but with MIME type "text/tab-separated-values").
  2. The accept() function allows this file and returns True.
  3. delimiter = "\t" if (stream_info.extension or "").lower() == ".tsv" else "," doesn't account for the MIME type and defaults to ",", which crashes the whole program.
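
The scenario above could be handled roughly as follows. This is only a minimal sketch, not the converter's actual code: StreamInfoSketch is a hypothetical stand-in for the stream_info object, it assumes the MIME type is exposed as a mimetype attribute, and pick_delimiter and parse_rows are illustrative helper names that do not appear in the PR.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical stand-in for stream_info; only the two fields used below are
# modeled. The attribute name `mimetype` is an assumption.
@dataclass
class StreamInfoSketch:
    extension: Optional[str] = None
    mimetype: Optional[str] = None


TSV_EXTENSIONS = (".tsv",)
TSV_MIME_PREFIXES = ("text/tab-separated-values",)


def pick_delimiter(stream_info) -> str:
    """Return a tab if either the extension or the MIME type indicates TSV."""
    extension = (stream_info.extension or "").lower()
    mimetype = (stream_info.mimetype or "").lower()
    # startswith tolerates parameters such as "; charset=utf-8".
    if extension in TSV_EXTENSIONS or mimetype.startswith(TSV_MIME_PREFIXES):
        return "\t"
    return ","


def parse_rows(content: str, stream_info) -> List[List[str]]:
    """Parse delimited text using the delimiter chosen from stream_info."""
    reader = csv.reader(io.StringIO(content), delimiter=pick_delimiter(stream_info))
    return list(reader)
```

With this shape, a stream that has no extension but carries the TSV MIME type is parsed with "\t", and everything else keeps the existing "," default.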


This has also been addressed in #2021.

Author


Thanks for the review! Good catch. I'll update the delimiter detection to consider both the file extension and MIME type.


Development

Successfully merging this pull request may close these issues.

Support TSV (Tab-Separated Values) files

2 participants