OneCloud

CSV Transformation

The OneCloud BizApp for CSV Transformation enables the preparation of data prior to loading into target systems. Using CSV Transformation allows tabular data to be massaged into the required format, by splitting data sets based upon content in a record, rules-based filtering, and combining datasets from multiple sources.

For documentation on all available commands, see the CSV Transformation Command Reference.

Connection Setup

To enable OneCloud to perform CSV transformation, you'll need to navigate to the connection manager to create a connection.

From here, choose CSV Transformation ("Lightweight LETL") from the service drop-down and select the appropriate Runner.

Connection Example

Examples

Transforming a Tabular Dataset

The following example will download a file from Google Drive and then walk through various CSV Transformation commands including Find and Replace, Column Filter, Insert Column, Join Columns, and Filter Rows. As a starting point, here is a preview of the initial file.

Step 1: Create a Chain
From the OneCloud Build page, select the blue "+" icon, name the chain, and save.

Step 2: Create first command, Download File
The first command will download a file from Google Drive (pictured above) as a starting point for the CSV Transformation. Add the first command by selecting Google Drive -> Download File from the modal. Next, fill out the parameters of the command by selecting the Connection, Google Drive File Name, and Folder Name.

Step 3: Create the second command, Replace New York - NY
The next command will transform the Market column above by finding all New York values and replacing them with NY. Add this command by selecting Command -> New command -> Insert after -> Success and then CSV Transformation -> Transform -> Find and Replace from the modal. In the command builder, set the connection and property values for Input file, Delimiter, Find pattern, Match pattern type, and Replacement value. Be sure to set the Input file to the output of the prior command (Download File), the Delimiter to Comma, the Find pattern to New York, the Match pattern type to Exact, and the Replacement value to NY.

Step 4: Create third command, Remove ID & Pkg Type
Next, add a command to remove columns ID and Pkg Type from the starting file and group this command with the previous one. To do this, add a command by selecting New command -> Group with above command and then CSV Transformation -> Transform -> Column Filter. Choose the connection, and set property values. Set the Input file to the output of the prior command, th Delimiter to Comma, Pattern Type to Exact, and Pattern to ID,Pkg Type.

Step 5: Create fourth command, Make Year: 2018
The next command will insert a new column into the file to hold "FiscalYear". Add a command by selecting New command -> Group with above command and then CSV Transformation -> Transform -> Insert Column from the modal. Set the CSV Transformation connection, Input file to the output of the previous command, Delimiter to Comma, Header text to FiscalYear, Data value to 2018, and Insert index to 1. Note: the Insert index starts at 0 and is the column index at which to insert the new column.

Step 6: Create fifth command, Join Year & FiscalYear
In the fifth command, join columns Year and FiscalYear into a single column. Add a new command by selecting New command -> Group with above command and then CSV Transformation -> Transform -> Join Columns. Set the CSV Transformation connection, Input file to the output of the previous command, Delimiter to Comma, Joined column index to 0, Match pattern type to Exact, Match pattern to Year,FiscalYear, Joined column header to TimePeriod, Join text to -, and select the Discard option. Look at the first data row in the sample CSV above, this will effectively create a value of "Jan-2018" (remember, in the previous command, a column was created to hold "FiscalYear" and set all values to 2018).

Step 7: Create sixth command, Filter % Records
The last command will filter all rows where column "Measures" is equal to Margin % or Profit %. Add a new command by selecting New command -> Group with above command and then CSV Transformation -> Transform -> Filter Rows. Set the CSV Transformation connection, Input file to the output of the previous command, Delimiter to Comma, Find pattern to (?:Margin %|Profit %), Match pattern type to Regular Expression, and Search Columns to 1. Remember, column "0" is now "TimePeriod", making "Measures" column "1".

Next, publish and run the chain and the output should look like the following:

Filtering Data with Branching Logic

This example will walk through creating branching logic to create two separate files that will be uploaded to Google Drive. This example will add onto the chain in the first example, Fact Data Transformation.

Step 1: Add command, Filter on Actuals
Add a new command to the chain Fact Data Transformation to filter only on records that represent "Actual" data. Select New command -> Insert after -> Success and then CSV Transformation -> Transform -> Filter Rows. Set the CSV Transformation connection, Input file to be the output of the previous command "Filter % Records", Delimiter to Comma, Find a pattern to Actual, Match pattern type to Exact, Search Columns to 4 (this should be the "Scenario" column), and check the Inverse option. Selecting Inverse means that the filter will keep only the rows that match, rather than removing the rows.

Step 2: Add command, Upload Actuals
Next, add a command to upload Actual data to Google Drive. Add a command by selecting New command -> Insert after -> Success and then Google Drive -> Upload file. Set the Google Drive connection, File to the output of the previous command, Folder name to a Google Drive folder, Upload Name to Actuals.csv, and select Overwrite. The Overwrite option will write over the file if it already exists.

Step 3: Add command, Filter on Budget
This command will filter only on records that represent "Budget data" and demonstrates how branching logic can be used in OneCloud. To add this command, select the "+" icon in the white box between commands "Filter % Records" and "Filter on Actuals". From here, select New command -> Insert new branch -> Success and then CSV Transformation -> Transform -> Filter Rows from the modal. In the command builder, set the CSV Transformation connection, Input file to be the output of command "Filter & Records", Delimiter to Comma, Find pattern to Budget, Match pattern type to Exact, Search Columns to 4, and Inverse selected.

Step 4: Add command, Upload Budget
The last command will upload the output of the previous command to Google Drive. Add a command under "Filter on Budget" by selecting New command -> Insert after -> Success and then Google Drive -> Upload file. Set the Google Drive connection, File to the output of the previous command, Folder name to a Google Drive folder, Upload Name to Budget.csv, and select Overwrite.

Publish and run the chain. The files in the Google Drive folder should look like the following example:

Using Regular Expressions

The OneCloud commands Column Filter, Filter Rows, and Find and Replace all have a Pattern type option to use Regular Expressions (a.k.a. "Regex"). Regex is a powerful string of text that allows patterns to be matched and located. Here are some examples:

Character
What does it do?
Example
Matches

^

Matches beginning of line

^abc

abc, abcdef.., abc123

$

Matches end of line

abc$

my:abc, 123abc, theabc

.

Match any character

a.c

abc, asg, a123c

|

OR operator

abc|xyz

abc or xyz

(...)

Capture anything matched

(a)b(c)

Captures 'a' and 'c'

[...]

Matches anything contained in brackets

[abc]

a,b, or c

[a-z]

Matches any characters between 'a' and 'z'

[b-z]

bc, mind, xyz

{x}

The exact 'x' amount of times to match

(abc){2}

abcabc

{x,}

Match 'x' amount of times or more

(abc){2,}

abcabc, abcabcabc

*

Greedy match that matches everything in place of the *

ab*c

abc, abbcc, abcdc

+

Matches character before + one or more times

a+c

ac, aac, aaac,

?

Matches the character before the ? zero or one times. Also, used as a non-greedy match

ab?c

ac, abc

\

Escape the character after the backslash or create an escape sequence.

a\sc

a c

Common Issues

Wrong Delimiter

When configuring a transformation command, often the wrong delimiter is set and the transformation does not execute as expected.

Not a proper CSV

If the tabular data set is not proper CSV, the OneCloud transformation command simply will not execute as OneCloud will check to see if the format is compliant with RFC 4180 before processing the input. A proper CSV:

  • Has data stored in plain text using a character set such as ASCII, various Unicode character sets (e.g. UTF-8), EBCDIC, or Shift JIS
  • Consists of records with one record per line with the records divided into fields separated by delimiters (typically a single reserved character such as a comma, semicolon, or tab; sometimes the delimiter may include optional spaces)
  • Has the same sequence of fields for every record
  • Is typically a flat file or relational data report output

Inconsistent number of columns in each record

If records in a tabular dataset have different column counts, then OneCloud will detect that it is not a proper CSV

Different Column Counts

Combining two proper CSV tabular data sets with different column counts will simply not work and the Stack Files command will display an error.