Problem
I have a problem where I need to replace every base part of URLs in a body of text. However, I only want to replace it in specific URLs. Only URLs that go to “resource.aspx” that have the “source” query string parameter need to get the replacement. Other URLs could have the target base part and we don’t want to touch those. The resource urls are expected to appear inside quotes.
I made a UDF that takes the text body, the base URL you want to replace, and the base URL to replace with and searches through the text and does the base URL replacement on any “resource URLs” it finds.
Does the algorithm look sound? Anything I missed? Better ways to do this?
I was sad to learn that REPLACE() does not work with a pattern, even though it’s second paramter is named “string_pattern” :(. I was hoping it would work like .NET’s Regex.Replace() (I know we don’t have regex in t-sql, but a pattern based replace function would be nice sigh)
-- =============================================
-- Author: John Doe
-- Create date: 11/29/2016
-- Description: Replaces base URLs in text body of every URL that points to
-- resource.aspx that has "source" query string parameter.
-- Replaces specified base url with specified replacement
-- =============================================
CREATE FUNCTION dbo.udfReplaceResourceBaseURLs
(
@TextBody NVARCHAR(MAX),
@BaseUrlToReplace NVARCHAR(MAX),
@BaseUrlReplacement NVARCHAR(MAX)
)
RETURNS NVARCHAR(MAX)
AS
BEGIN
-- Create Copy of the original body. We will search and modify the copy in the loop
DECLARE @EmailBodyCopy NVARCHAR(MAX)
SET @EmailBodyCopy = @TextBody
DECLARE @ResourceURLPattern NVARCHAR(MAX) = '%' + @BaseUrlToReplace + '/myResource.aspx%source=%'
-- Do replacements while resource URLs are found in the copy
DECLARE @resourceURLStart INT = PATINDEX(@ResourceURLPattern, @EmailBodyCopy)
WHILE @resourceURLStart > 0
BEGIN
-- Get the full URL of the found resource URL
DECLARE @resourceURLEnd INT = CHARINDEX('"', @EmailBodyCopy, @resourceURLStart)
DECLARE @resourceURL NVARCHAR(MAX) = SUBSTRING(@EmailBodyCopy, @resourceURLStart, (@resourceURLEnd - @resourceURLStart))
-- Replace base URL to get the new URL with the desired base URL replaced in
DECLARE @newResourceURL NVARCHAR(MAX) = REPLACE(@resourceURL, @BaseUrlToReplace, @BaseUrlReplacement)
-- Replace the URL in the original body with our new one
SET @TextBody = REPLACE(@TextBody, @resourceURL, @newResourceURL)
-- Chop off the front of the body copy up to the end of the current resource URL
-- so that we find the next resource url on next pass
SET @EmailBodyCopy = SUBSTRING(@EmailBodyCopy, @resourceURLEnd, 999999999) -- get to index infinity to make sure grab to end
SET @resourceURLStart = PATINDEX(@ResourceURLPattern, @EmailBodyCopy)
END
RETURN @TextBody
END
Solution
Your code looping over string functions doesn’t seem like something SQL Server is optimized for.
Database systems are optimized for set based operations, not looping. I understand the confusion coming from procedural/Object Oriented languages, but in the end SQL isn’t a programming language in it’s classical form but a way to instruct an RDBMS what data you need.
Also, UDF’s do all sort of weird things with execution plans, like hiding the actual cost in a plan, and often removing the option to choose a parallel plan for the optimizer.
I wouldn’t try to roll my own UDF for something like this.
The best way IMO would be to figure out a way to do it inline or set based.
I don’t have any sample data and expected output to work with but I could imagine a
UPDATE [table]
SET [url] = REPLACE([url], 'base_url', 'new_url')
WHERE [url] LIKE '%<regex>%';
kind of construct
You might even do it in the application code (such as retrieving a datatable, fixing the data and saving the datatable again).
If you insist on doing it in a function I’d just either write a SQLCLR function or go and download the sqlsharp library and use that.
If you are worried about performance then read the article posted by the creator of the library comparing performance.
I highly doubt your implementation will work better or faster than a .NET regex solution.