Rhino - ETL
Using Rhino-ETL ( a C# based) framework for developing standard ETL's is pretty easy and one can do a lot of fun stuff with the underlying source data. I just wrote up a quick console app to generate data into a text file and push the same data into a table in SQL Server as well as an external file. Here are the following steps (for the external file push):
1. Create a new C# console application solution in Visual Studio.
2. Target the .Net framework as shown in the below screen shot in your project properties:-
3. Create 3 Sub Folders underneath your project as shown in the following screen shot
1. Create a new C# console application solution in Visual Studio.
2. Target the .Net framework as shown in the below screen shot in your project properties:-
3. Create 3 Sub Folders underneath your project as shown in the following screen shot
DataObjects --> Contains the class files associated with each and every table/file in your environment. Example:- if your source file contains student data, then you would create a class file called Student with the individual properties (in relation to the properties) exposed (nouns).
Operations--> This primarily contains the class files that contain the activities (adjectives) that need to be performed on the DataObjects. Example:- Writing the Student Data to a database, Reading the Student data from a file etc.
WorkFolder--> Contains the external file sources to interact with. Example:- a flat file, a csv or a tsv. In this case it will be student.txt.
Lets write some code to insert a student record from a flat file into another flat file.......(as simple as it sounds)
4. Create a class file called StudentRecord.cs (pipe delimited) and declare the required entity attributes as shown in the following code snippet:-
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
using System; | |
using System.Collections.Generic; | |
using System.Linq; | |
using System.Text; | |
using FileHelpers; | |
namespace FirstSampleRhinoETL.DataObjects | |
{ | |
[DelimitedRecord("|"), IgnoreCommentedLines("Commenting Lines"), IgnoreFirst] | |
class StudentRecord | |
{ | |
int id; | |
string name; | |
string address; | |
int classid; | |
int marks; | |
public int StudentId | |
{ | |
get { return id; } | |
set { id = value; } | |
} | |
public string StudentName | |
{ | |
get { return name; } | |
set { name = value; } | |
} | |
public string StudentAddress | |
{ | |
get { return address; } | |
set { address = value; } | |
} | |
public int StudentClassId | |
{ | |
get { return classid; } | |
set { classid = value; } | |
} | |
public int StudentMarks | |
{ | |
get { return marks; } | |
set { marks = value; } | |
} | |
} | |
} |
Contains records in the following manner (student.txt)
StudentId|StudentName|StudentAddress|StudentClassId|StudentMarks //header
1|Ishwar|TestAddress|1|85 //row
5. Create a class file called NewStudentRecord (which contains the attributes that need to be transferred to the new file)
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
using System; | |
using System.Collections.Generic; | |
using System.Linq; | |
using System.Text; | |
namespace FirstSampleRhinoETL.DataObjects | |
{ | |
using FileHelpers; | |
[DelimitedRecord("\t"), IgnoreFirst] | |
public class UserFullRecord | |
{ | |
public int sId; | |
public string sName; | |
public string sAddress; | |
public int sclass; | |
} | |
} |
This will be outputted in the following manner(tab separated)
StudentId\tStudentName\tStudentAddress\tStudentClass
1\tIshwar\tTestAddress\t1
Let us now create the action called student write i.e. Let us go about writing this out to an output file called studentoutput.txt and I am creating a new C# class file called StudentWriteFile which will be as shown in the following code snippet:-
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
using System; | |
using System.Collections.Generic; | |
using System.Linq; | |
using System.Text; | |
using FileHelpers; | |
using Rhino.Etl.Core.Files; | |
using Rhino.Etl.Core.Operations; | |
using Rhino.Etl.Core; | |
using FirstSampleRhinoETL.DataObjects; | |
namespace FirstSampleRhinoETL.Operations | |
{ | |
class StudentWriteFile : AbstractOperation | |
{ | |
public StudentWriteFile(string filePath) | |
{ | |
this.filePath = filePath; | |
} | |
string filePath = null; | |
public override IEnumerable<Row> Execute(IEnumerable<Row> rows) | |
{ | |
FluentFile engine = FluentFile.For<NewStudentRecord>(); | |
engine.HeaderText = "Id\tsName\tsAddress\tsclass"; | |
using (FileEngine file = engine.To(filePath)) | |
{ | |
foreach (Row testRow in rows) | |
{ | |
Row row = new Row(); | |
//row.Copy(leftRow); | |
//copy over all properties not in the student records | |
row["sId"] = testRow["StudentId"]; | |
row["sName"] = testRow["StudentName"]; | |
row["sAddress"] = testRow["StudentAddress"]; | |
row["sclass"] = testRow["StudentClassId"]; | |
file.Write(row.ToObject<NewStudentRecord>()); | |
//pass through rows if needed for another later operation | |
yield return row; | |
} | |
} | |
} | |
} | |
} |
Now let us go about writing the main program... create the main program in the following manner:-
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
using System; | |
using System.Collections.Generic; | |
using System.Linq; | |
using System.Text; | |
using Rhino.Etl.Core; | |
using FirstSampleRhinoETL.Operations; | |
using Rhino.Etl.Core.Operations; | |
using Rhino.Etl.Core.Pipelines; | |
namespace FirstSampleRhinoETL | |
{ | |
public class MainProgram: EtlProcess | |
{ | |
protected override void Initialize(){ | |
Register(new StudentRead(Settings.Default.StudentFile)); | |
Register(new StudentWriteFile(Settings.Default.OutputFile)); | |
} | |
protected override void PostProcessing(){ | |
base.PostProcessing(); | |
foreach (var error in GetAllErrors()) | |
throw error; | |
} | |
} | |
} | |
The setting's values basically point to the settings files that I have created which contains the absolute path of the student.txt and the studentoutput.txt files.
After which in your main just initialize the MainProgram in the following manner:-
new MainProgram().Execute();
and you will have your first rhino-etl to rock and roll with......
Comments
Anyhow I have a small doubt with the example you have provided.
1)"StudentRead" Cant find this class.
public StudentRead(string filePath) { this.filePath = filePath; }
string filePath = null; public override IEnumerable Execute(IEnumerable rows) { FluentFile engine = FluentFile.For(); engine.HeaderText = "Id\tsName\tsAddress\tsclass"; using (FileEngine file = engine.To(filePath)) { foreach (Row testRow in rows) { Row row = new Row(); //row.Copy(leftRow); //copy over all properties not in the student records row["sId"] = testRow["StudentId"]; row["sName"] = testRow["StudentName"]; row["sAddress"] = testRow["StudentAddress"]; row["sclass"] = testRow["StudentClassId"]; file.Read(row.ToObject()); //pass through rows if needed for another later operation yield return row; } }