Problem
I have a PostgreSQL database from which I want to migrate some data to an MS SQL Server database. I decided to use a code-first approach. My question is about the proper way of feeding my new tables.
Model.cs:
public class Info
{
    [Key]
    public int InfoID { get; set; }
    public int NameID { get; set; }
    public string Name { get; set; }
}
DatabaseInitializer:
private void SeedInfos(SomeContext context)
{
    var Infos = new List<Info>();
    using (var com = new NpgsqlCommand(PostgresSQLs.GetAllInfos, EstablishedConnection()))
    {
        using (var r = com.ExecuteReader())
        {
            while (r.Read())
            {
                Infos.Add(new Info { InfoID = Convert.ToInt32(r["infoID"]), NameID = Convert.ToInt32(r["NameId"]), Name = r["Name"].ToString() });
            }
        }
    }

    foreach (var Usr in Users)
        context.Infos.Add(Usr);
}
One of my concerns is related to the static class PostgresSQLs, where I store the queries that retrieve the data I need from PostgreSQL. Is this a proper approach? The more complicated the query, the more complex the code, so it may be hard to maintain later, isn’t it? What approach would you use?
Besides, such data feeding may cost a lot of memory if the table is big. Is there a way to avoid memory leaks?
Solution
Variables should be named using camelCase casing (see the .NET naming guidelines), so Infos -> infos and Usr -> usr; but as one shouldn’t use abbreviations for any names, you should change it to user, which is much easier to read. The same rule applies to r vs reader and com vs command.
Always using braces {}, even where they are optional, makes your code less error-prone:
foreach (var user in Users)
{
    context.Infos.Add(user);
}
I don’t know where these Users are coming from or what the type of context.Infos is, but if it is by any chance a List<T>, you could use the AddRange() method to add them all at once, like context.Infos.AddRange(Users);.
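For example, assuming context.Infos is an Entity Framework 6 DbSet<Info> (DbSet<T> exposes AddRange() just like List<T> does), the whole reader loop’s result can be added in one call; the infos name follows the renaming suggested above:
// Assumption: context.Infos is an EF6 DbSet<Info>, which has AddRange().
context.Infos.AddRange(infos);
context.SaveChanges(); // still needed somewhere to actually persist the seeded rows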
What is the point of filling a List<Info> that is never used afterwards?
Having a static class containing the queries can be OK, but you should at least add some way to run custom queries as well.
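For instance, the static class could keep the well-known queries as constants and still offer an escape hatch for ad-hoc SQL. A minimal sketch, where the class name PostgresSQLs comes from the question, but the table name and the CustomCommand helper are assumptions:
public static class PostgresSQLs
{
    // Well-known, reusable queries live here as constants
    // (column names taken from the question's reader; the table name is a guess).
    public const string GetAllInfos = "SELECT \"infoID\", \"NameId\", \"Name\" FROM infos";

    // Escape hatch for one-off queries that don't deserve a named constant.
    public static NpgsqlCommand CustomCommand(string sql, NpgsqlConnection connection)
    {
        return new NpgsqlCommand(sql, connection);
    }
}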
First, your migration service should look something along the lines of:
public class MigrationService : IMigrationService
{
    private readonly IDatabaseContext _postgreContext;
    private readonly IDatabaseContext _mssqlContext;

    public MigrationService(IDatabaseContext postgreContext, IDatabaseContext mssqlContext)
    {
        if (postgreContext == null) throw new ArgumentNullException(nameof(postgreContext));
        if (mssqlContext == null) throw new ArgumentNullException(nameof(mssqlContext));

        _postgreContext = postgreContext;
        _mssqlContext = mssqlContext;
    }

    public void ImportInfosSeedData()
    {
        var infos = _postgreContext.GetInfosData();
        _mssqlContext.PostInfosData(infos);
    }
}
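The code above references IMigrationService and IDatabaseContext without showing them; here is a minimal sketch of what they might declare (the member list is an assumption, inferred from how the contexts are used above):
public interface IMigrationService
{
    void ImportInfosSeedData();
}

// Hypothetical contract implemented by both the PostgreSQL and the MS SQL context.
public interface IDatabaseContext
{
    IEnumerable<Info> GetInfosData();
    void PostInfosData(IEnumerable<Info> infos);
}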
If you don’t use a stored procedure, GetInfosData and PostInfosData could look like this:
public IEnumerable<Info> GetInfosData()
{
    return _postgreContext.Infos.Select(o => MapToInfo(o));
}

public void PostInfosData(IEnumerable<Info> infos)
{
    foreach (var info in infos)
    {
        _mssqlContext.Infos.Add(info);
    }

    _mssqlContext.SaveChanges();
}
Note, however, that there are limits on how many items you can save to the context at a time. I assume this is seed data and the collection size stays under 100 instances. If you need to add millions of rows, you would need to insert them in batches, e.g. up to 1000 elements per transaction.
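A minimal sketch of such batching, assuming an EF-style context; the batch size and the reuse of the names above are illustrative:
public void PostInfosData(IEnumerable<Info> infos, int batchSize = 1000)
{
    var batch = new List<Info>(batchSize);
    foreach (var info in infos)
    {
        batch.Add(info);
        if (batch.Count == batchSize)
        {
            _mssqlContext.Infos.AddRange(batch);
            _mssqlContext.SaveChanges(); // flush this batch before collecting the next one
            batch.Clear();
        }
    }

    if (batch.Count > 0)
    {
        _mssqlContext.Infos.AddRange(batch); // remainder that didn't fill a whole batch
        _mssqlContext.SaveChanges();
    }
}
For very large imports you would also recreate the context between batches so the change tracker does not grow without bound.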
If the amount of data is large, there is a risk of resource saturation. In that case, do not select all the data at once:
public IEnumerable<Info> GetInfosData(int count = 0, int amount = 1000)
{
    return _postgreContext.Infos
                          .OrderBy(o => o.InfoID) // Skip/Take needs a deterministic order
                          .Skip(amount * count)
                          .Take(amount)
                          .Select(o => MapToInfo(o));
}
This assumes the database table has a primary key and a concrete ordering, and that you do not add new elements while this runs. If the table does not have an ordering, you can use a view where the data is sorted.
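The paging could then be driven from the migration service like this (a sketch; the loop shape is an assumption):
public void ImportInfosSeedData()
{
    var count = 0;
    while (true)
    {
        var batch = _postgreContext.GetInfosData(count, 1000).ToList();
        if (batch.Count == 0)
        {
            break; // no more rows left to migrate
        }

        _mssqlContext.PostInfosData(batch);
        count++;
    }
}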
Some databases do not provide full C# lambda support, e.g. Oracle and PostgreSQL. You might need to rephrase your select statement or do the work in a server-side procedure, as you have done.
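If the provider cannot translate the MapToInfo call, one common rephrasing is to execute the query first and map in memory (a sketch, assuming the mapping is cheap enough to run client-side):
public IEnumerable<Info> GetInfosData(int count = 0, int amount = 1000)
{
    return _postgreContext.Infos
                          .OrderBy(o => o.InfoID)
                          .Skip(amount * count)
                          .Take(amount)
                          .AsEnumerable()             // the query runs here, in SQL
                          .Select(o => MapToInfo(o)); // the provider never sees MapToInfo
}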