Problem
I have two tables, Circuit and Tag:
Circuit
------------------
Id   Name   Path

Tag
-----------
Id   Name   Circuit_Id
Now I have to loop through all the tags and update each tag's Circuit_Id where the circuit name matches the tag's path (with the last two underscore-separated segments removed).
I am doing this and it works fine, but it is really slow. Is there a way I could improve the speed?
foreach (var tag in allPDUTags)
{
    var removedUnderScoreLast = tag.Path.Remove(tag.Path.LastIndexOf('_'));
    var removedUnderScoreLastButOne = removedUnderScoreLast.Remove(removedUnderScoreLast.LastIndexOf('_'));
    var circuitId = _context.Circuits.Where(c => c.Name == removedUnderScoreLastButOne).FirstOrDefault().Id;
    tag.Circuit_Id = circuitId;

    _counter++;
    if ((_counter % 1000) == 0)
    {
        Console.WriteLine($"Updated {_counter} of {totalsTags} Tags");
    }
}
Solution
var circuitId = _context.Circuits.Where(c => c.Name == removedUnderScoreLastButOne).FirstOrDefault().Id;
This line is effectively a join inside your loop, which means you query the database once for every tag in your list. That is roughly 40k extra queries, which seems likely to be the biggest cause of the slowness.
I would focus on moving that join out of your loop. There are a few ways you can do that.
- Change allPDUTags to be the result of a join between Circuit and Tag (a rough sketch of this follows the list).
- If you can drop out of Entity Framework, writing SQL that updates the rows in an efficient, set-based manner should be relatively simple, given that both are database tables. I am unsure of the DBMS you are using, so I can't provide sample code tailored to it.
- Quick and dirty would be to pull the list of circuits into memory. Calling _context.Circuits.ToList() would load them all, and you could then compare against that list in your loop. This assumes the circuit table is small enough to fit in memory; it won't scale well, and I wouldn't really recommend it.
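For example, the first bullet can be combined with the third: materialize the circuits once and let LINQ join them to the tags in memory. A rough sketch, reusing the identifiers from your question and assuming the circuit table fits comfortably in memory, might look like this:

var circuits = _context.Circuits
    .Select(c => new { c.Id, c.Name })
    .ToList();   // one query instead of one per tag

var matches =
    from tag in allPDUTags
    // Same key derivation as the original loop: drop the last two '_' segments.
    let withoutLast = tag.Path.Remove(tag.Path.LastIndexOf('_'))
    let circuitName = withoutLast.Remove(withoutLast.LastIndexOf('_'))
    join circuit in circuits on circuitName equals circuit.Name
    select new { Tag = tag, CircuitId = circuit.Id };

foreach (var match in matches)
{
    match.Tag.Circuit_Id = match.CircuitId;
}
// Persist the changes however your original code does (e.g. _context.SaveChanges()).

This replaces tens of thousands of single-row queries with one query plus an in-memory hash join. Note one behavioural difference: tags with no matching circuit are simply skipped here, whereas your FirstOrDefault().Id would throw a NullReferenceException.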
I would also look closely at why Tag.Path and Circuit.Name need to be kept in sync in the first place. There might be a historical reason, or they might live in two different databases/systems, but if they are in the same application I would strongly consider normalizing the database and/or adding constraints to avoid having to sync them at all.
If you are on SQL Server, this is a T-SQL approach. It should be pretty fast.
declare @Circuit table (id int identity primary key, name varchar(20));
declare @Tag table (id int identity primary key, path varchar(40), circuitID int);
insert into @Tag (path) values ('asld_aslkjf_alskd'), ('asxx_askjf_alsyd_lasdj');
insert into @Circuit (name) values ('asld'), ('asxx_askjf');
select * from @Tag;
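-- Derive the circuit name by stripping the last two '_'-delimited segments from the path:
-- reverse the path, find the second '_', take everything after it, and reverse the result back.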
update t
set t.circuitID = c.id
from @Tag t
join @Circuit c
on c.name = reverse(SUBSTRING(REVERSE(t.path), CHARINDEX('_', REVERSE(t.path), CHARINDEX('_', REVERSE(t.path)) + 1) + 1, 100))
where t.circuitID <> c.id or t.circuitID is null;
select * from @Tag;
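The table variables are only there to make the example self-contained; against your real Circuit and Tag tables the same update ... join statement applies, and it updates all matching rows in a single set-based statement instead of one at a time.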