Geeks With Blogs

Arthur Zubarev Compudicted

Recently I was assigned a task of helping a developer to cope with an unusual input file processing.

The file we needed to load into SQL Server had double quotes surrounding those fields that contained comma(s).

So a short excerpt from it would resemble something like:

123,ABC
456,"D,E,F"
"7,89",GHS

Since SSIS’s Flat File Connection does not support alternating text qualifiers, the chief difficulty stems from the fact if we even only remove the quotes, the file structure becomes broken because the extra commas would break such a field into additional columns. I proposed to replace the comma as our delimiter to the vertical pipe (|) character thus leaving commas intact. The quotes become then unnecessary.

Initially we wanted to pre-process the file by using a PowerShell script batch, but while testing it turned out that unless you use .Net objects in PowerShell it cannot operate on relatively large files, we were getting an out of memory errors from the PowerShell when using Get-Content or similar methods. The next attempt was to use the Data Flow Task’s Script Component set as a Transformation. It worked, here is how:

We created a new DFT to just reformat the input file

Transforming_DFT

The payload is basically implemented inside the Script Transformation, but I am showing how the input is set first that came from the flat file set to be consumed as a single row. I want to stress this out – it is important to configure your Flat File Source to “see” the input file as a single column file:

InputCols

The next step was to add the output that will be our modified row:

Input_And_Output

So nothing fancy is going on here and we are now ready to code:

   1:  ' Microsoft SQL Server Integration Services user script component
   2:  ' This is your new script component in Microsoft Visual Basic .NET
   3:  ' ScriptMain is the entrypoint class for script components
   4:   
   5:  Imports System
   6:  Imports System.Data
   7:  Imports System.Math
   8:  Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
   9:  Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
  10:   
  11:  Public Class ScriptMain
  12:      Inherits UserComponent
  13:   
  14:      Public Overrides Sub UnTransformedInput_ProcessInputRow(ByVal Row As UnTransformedInputBuffer)
  15:          '
  16:          ' Unquote and replace the comma to pipe (|) to become the delimiter
  17:          '
  18:   
  19:          Dim raw_string As String = String.Empty
  20:          Dim seacrhedPos As Int32
  21:          Dim firstCommaPos As Int32
  22:          Dim nextQuotePos As Int32
  23:          Dim section_to_modify As String
  24:   
  25:          raw_string = Row.UnTransformedOrgRow
  26:   
  27:          'MesssageBox to see the initial value
  28:          System.Windows.Forms.MessageBox.Show(raw_string)
  29:   
  30:          If raw_string.Length > 0 Then
  31:   
  32:              seacrhedPos = raw_string.IndexOf(",""", 0)
  33:   
  34:              While seacrhedPos <> -1
  35:                  firstCommaPos = raw_string.IndexOf(",", seacrhedPos + 1)
  36:   
  37:                  nextQuotePos = raw_string.IndexOf(""",", firstCommaPos + 1)
  38:   
  39:                  If nextQuotePos > 0 Then
  40:                      section_to_modify = raw_string.Substring(seacrhedPos + 1, raw_string.Length - seacrhedPos - (raw_string.Length - nextQuotePos))
  41:   
  42:                      section_to_modify = section_to_modify.Replace(",", "~").Replace("""", "")
  43:   
  44:                      'Replace 
  45:                      raw_string = raw_string.Substring(0, seacrhedPos + 1) & section_to_modify & _
  46:                      raw_string.Substring(nextQuotePos + 1, raw_string.Length - nextQuotePos - 1)
  47:   
  48:                  Else
  49:                      section_to_modify = raw_string.Substring(seacrhedPos + 1, raw_string.Length - seacrhedPos - 1)
  50:   
  51:                      section_to_modify = section_to_modify.Replace(",", "~").Replace("""", "")
  52:   
  53:                      'Replace 
  54:                      raw_string = raw_string.Substring(0, seacrhedPos + 1) & section_to_modify
  55:   
  56:                  End If
  57:   
  58:                  'Get the next set
  59:                  seacrhedPos = raw_string.IndexOf(",""", 0)
  60:              End While
  61:   
  62:              raw_string = raw_string.Replace(",", "|").Replace("~", ",")
  63:   
  64:          End If
  65:   
  66:          'MesssageBox to see the final value
  67:          System.Windows.Forms.MessageBox.Show(raw_string)
  68:   
  69:          Row.TransformedRow = raw_string
  70:      End Sub
  71:   
  72:  End Class
  73:   

This code will pop up a message box showing the pre-processed record, and then the end result, you can safely remove these two MessageBox.Show calls before productionizing your package.

The end result of this DFT will be a new flat file in this format:

123|ABC
456|D,E,F
7,89|GHS
Posted on Monday, September 19, 2011 9:41 PM | Back to top


Comments on this post: SSIS: How to Remove Occasional Quotes and Replace The Column Delimiter

# re: SSIS: How to Remove Occasional Quotes and Replace The Column Delimiter
Requesting Gravatar...
Do you have a version of this for VS2008? It seems some of the reference are not available in 2008 and the code won't run.

Thank you
Left by Trevor Tally on Jun 18, 2012 12:13 PM

# re: SSIS: How to Remove Occasional Quotes and Replace The Column Delimiter
Requesting Gravatar...
You basically need the code in the UnTransformedInput_ProcessInputRow routine only and it will work in SSIS 2008 and up.
Left by Arthur on Jun 21, 2012 8:09 PM

# re: SSIS: How to Remove Occasional Quotes and Replace The Column Delimiter
Requesting Gravatar...
Reviewing your solution/code here tells me that if the CSV column had |"This is a sentence, but"|"another sentence"|1234|5555| it will read them as 4 fields. However, if you take |"This is a sentence|but"|"another sentence"|1234|5555 this will fail...and read as 5 fields instead...
Left by Ahmad A. on Feb 06, 2013 10:13 PM

# re: SSIS: How to Remove Occasional Quotes and Replace The Column Delimiter
Requesting Gravatar...
This is great! Thank you very much for posting! Solved a big problem for me.
Left by strapp on Mar 05, 2014 11:58 AM

# re: SSIS: How to Remove Occasional Quotes and Replace The Column Delimiter
Requesting Gravatar...
Why is this not working for me
can you explain what do you mean by
You basically need the code in the UnTransformedInput_ProcessInputRow routine only and it will work in SSIS 2008 and up.
I am very new to VB and I am having hard time, I copied exact thing you have
I get errror
Public Overrides Sub UnTransformedInput_ProcessInputRow(ByVal Row As UnTran
here and
on Get the next one
thank you
Left by relin on Feb 01, 2015 12:49 AM

Your comment:
 (will show your gravatar)


Copyright © Compudicted | Powered by: GeeksWithBlogs.net