Problem
I have developed my own custom solution for this. As it happened, the first solution uses XPath queries, and the second, conceptually similar to the first, uses CSS queries processed by sizzle.js.
Here is the sample code for the second solution:
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Windows.Forms;
namespace myTest.WinFormsApp
{
public partial class MainFormForSizzleTesting : Form
{
public MainFormForSizzleTesting()
{
InitializeComponent();
}
private void MainForm_Load(object sender, EventArgs e)
{
webBrowser1.DocumentText = @"
<html>
<body>
<img alt=""0764547763 Product Details""
src=""http://ecx.images-amazon.com/images/I/51AK1MRIi7L._AA160_.jpg"">
<hr/>
<h2>Product Details</h2>
<ul>
<li><b>Paperback:</b> 648 pages</li>
<li class=""test""><b>Publisher:</b> Wiley; Unlimited Edition edition (October 15, 2001)</li>
<li><b>Language:</b> English</li>
<li class=""test""><b>ISBN-10:</b> 0764547763</li>
</ul>
</body>
</html>
";
}
private void cmdTest_Click(object sender, EventArgs e)
{
var processor = new WebBrowserControlCSSQueriesProcessor(webBrowser1);
// change attributes of the first element of the list
{
var li = processor.GetHtmlElement("li");
li.innerHTML = string.Format("<span style='text-transform: uppercase;font-family:verdana;color:green;'>{0}</span>", li.innerText);
}
// change attributes of the <li> elements with class = "test"
var list = processor.GetHtmlElements("li.test");
foreach (var li in list)
{
li.innerHTML = string.Format("<span style='text-transform: uppercase;font-family:verdana;color:blue;'>{0}</span>", li.innerText);
}
}
/// <summary>
/// Enables IE WebBrowser control to evaluate CSS queries
/// by injecting sizzle.js (http://cdnjs.cloudflare.com/ajax/libs/sizzle/1.9.1/sizzle.min.js)
/// and to return CSS queries results to the calling C# code as strongly typed
/// mshtml.IHTMLElement and IEnumerable<mshtml.IHTMLElement>
/// </summary>
public class WebBrowserControlCSSQueriesProcessor
{
private System.Windows.Forms.WebBrowser _webBrowser;
public WebBrowserControlCSSQueriesProcessor(System.Windows.Forms.WebBrowser webBrowser)
{
_webBrowser = webBrowser;
injectScripts();
}
private void injectScripts()
{
// Thanks to: https://stackoverflow.com/questions/7998996/how-to-inject-javascript-in-webbrowser-control
HtmlElement head = _webBrowser.Document.GetElementsByTagName("head")[0];
HtmlElement scriptEl = _webBrowser.Document.CreateElement("script");
mshtml.IHTMLScriptElement element = (mshtml.IHTMLScriptElement)scriptEl.DomElement;
element.src = "http://cdnjs.cloudflare.com/ajax/libs/sizzle/1.9.1/sizzle.min.js";
head.AppendChild(scriptEl);
string javaScriptText = @"
function GetElementsByCSSQuery (cssQuery) {
var items = Sizzle(cssQuery);
var elements = new Object();
var elementIndex = 1;
for (var i = 0; i < items.length; i++) {
elements[elementIndex++] = items[i];
}
elements.length = elementIndex -1;
return elements;
};
";
scriptEl = _webBrowser.Document.CreateElement("script");
element = (mshtml.IHTMLScriptElement)scriptEl.DomElement;
element.text = javaScriptText;
head.AppendChild(scriptEl);
}
/// <summary>
/// Gets Html element's mshtml.IHTMLElement object instance using CSS query
/// </summary>
public mshtml.IHTMLElement GetHtmlElement(string cssQuery)
{
string code = string.Format("Sizzle('{0}')[0];", cssQuery);
return _webBrowser.Document.InvokeScript("eval", new object[] { code }) as mshtml.IHTMLElement;
}
/// <summary>
/// Gets Html elements' IEnumerable<mshtml.IHTMLElement> object instance using CSS query
/// </summary>
public IEnumerable<mshtml.IHTMLElement> GetHtmlElements(string cssQuery)
{
// Thanks to: https://stackoverflow.com/questions/5278275/accessing-properties-of-javascript-objects-using-type-dynamic-in-c-sharp-4
var comObject = _webBrowser.Document.InvokeScript("eval", new object[] { string.Format("GetElementsByCSSQuery('{0}')", cssQuery) });
Type type = comObject.GetType();
int length = (int)type.InvokeMember("length", BindingFlags.GetProperty, null, comObject, null);
for (int i = 1; i <= length; i++)
{
yield return type.InvokeMember(i.ToString(), BindingFlags.GetProperty, null, comObject, null) as mshtml.IHTMLElement;
}
}
}
}
}
When I run the above sample code, I get the expected test results.
Could you please review the code and post your notes and remarks on what could be improved in it?
Solution
private void cmdTest_Click(object sender, EventArgs e)
This looks like your button is called cmdTest, why? Hungarian notation is generally considered a bad thing and even then, why cmd for a button? I think a good name for that button would be TestButton.
WebBrowserControlCSSQueriesProcessor
That name is way too long, why not shorten it to something like WebBrowserCssProcessor?
li.innerHTML = string.Format("<span style='text-transform: uppercase;font-family:verdana;color:green;'>{0}</span>", li.innerText);
li.innerHTML = string.Format("<span style='text-transform: uppercase;font-family:verdana;color:blue;'>{0}</span>", li.innerText);
These two lines are almost the same, consider extracting them into a method:
private static void ChangeStyle(mshtml.IHTMLElement element, string color)
{
element.innerHTML = string.Format(
"<span style='text-transform: uppercase;font-family:verdana;color:{1};'>{0}</span>",
element.innerText, color);
}
And use it like this:
ChangeStyle(li, "green");
ChangeStyle(li, "blue");
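If you also want to check the generated markup without a running browser control, one option is to split the string building out of ChangeStyle. The sketch below is only an illustration: SpanMarkup and Build are hypothetical names that do not appear in the original code, and ChangeStyle would simply assign the result to element.innerHTML.

```csharp
using System;

public static class SpanMarkup
{
    // Hypothetical helper: builds the styled <span> markup that ChangeStyle
    // would assign to innerHTML. Only the color differs between call sites.
    public static string Build(string text, string color)
    {
        return string.Format(
            "<span style='text-transform: uppercase;font-family:verdana;color:{1};'>{0}</span>",
            text, color);
    }
}
```

With that in place, the body of ChangeStyle could be reduced to element.innerHTML = SpanMarkup.Build(element.innerText, color);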
/// Enables IE WebBrowser control to evaluate CSS queries
/// by injecting sizzle.js (http://cdnjs.cloudflare.com/ajax/libs/sizzle/1.9.1/sizzle.min.js)
Why link the included version of the file here? If I need to know that, I can look at the source code. I would either give no link here (and assume people can google sizzle.js) or link to the main page.
System.Windows.Forms.WebBrowser
No need to spell out the whole namespace every time, when you have using System.Windows.Forms at the top of your file.
The same applies to the mshtml namespace: you should put that into a using.
HtmlElement scriptEl = _webBrowser.Document.CreateElement("script");
mshtml.IHTMLScriptElement element = (mshtml.IHTMLScriptElement)scriptEl.DomElement;
element.src = "http://cdnjs.cloudflare.com/ajax/libs/sizzle/1.9.1/sizzle.min.js";
…
scriptEl = _webBrowser.Document.CreateElement("script");
element = (mshtml.IHTMLScriptElement)scriptEl.DomElement;
element.text = javaScriptText;
I don’t have much experience with WebBrowser or mshtml, but why are you using mshtml here in the first place? Why not just use the HtmlElement directly:
scriptEl.SetAttribute("src", "http://cdnjs.cloudflare.com/ajax/libs/sizzle/1.9.1/sizzle.min.js");
scriptEl.InnerText = javaScriptText;
Also, reusing variables (scriptEl and element) like this is not great. You should use different variables here (e.g. sizzleScriptElement and functionScriptElement).
_webBrowser.Document.InvokeScript("eval", new object[] { code }) as mshtml.IHTMLElement
_webBrowser.Document.InvokeScript("eval", new object[] { string.Format("GetElementsByCSSQuery('{0}')", cssQuery) })
Repeated code again, so extract it into a method again:
public T Eval<T>(string code)
{
return (T)_webBrowser.Document.InvokeScript("eval", new object[] { code });
}
Notice that I used a cast and not the as operator. That’s because when an error happens, a cast immediately gives you a clear InvalidCastException, while as gives you a confusing NullReferenceException later.
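To make the difference concrete, here is a minimal sketch that does not need a browser control. GetScriptResult is a hypothetical stand-in for a script call that returns something of an unexpected type; the class and method names are mine, not part of the reviewed code.

```csharp
using System;

public static class CastVersusAs
{
    // Hypothetical stand-in for InvokeScript returning an unexpected type.
    private static object GetScriptResult()
    {
        return 42; // a boxed int, not a string
    }

    // With "as", the failed conversion silently produces null,
    // and the error only surfaces when the value is used.
    public static string DescribeAs()
    {
        string s = GetScriptResult() as string;
        try
        {
            return s.Length.ToString();
        }
        catch (NullReferenceException)
        {
            return "NullReferenceException";
        }
    }

    // With a cast, the failure is reported at the conversion itself.
    public static string DescribeCast()
    {
        try
        {
            string s = (string)GetScriptResult();
            return s;
        }
        catch (InvalidCastException)
        {
            return "InvalidCastException";
        }
    }
}
```

In real code the NullReferenceException would typically show up several calls away from the failed conversion, which is what makes it harder to diagnose.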
public IEnumerable<mshtml.IHTMLElement> GetHtmlElements(string cssQuery)
{
// Thanks to: http://stackoverflow.com/questions/5278275/accessing-properties-of-javascript-objects-using-type-dynamic-in-c-sharp-4
var comObject = _webBrowser.Document.InvokeScript("eval", new object[] { string.Format("GetElementsByCSSQuery('{0}')", cssQuery) });
Type type = comObject.GetType();
int length = (int)type.InvokeMember("length", BindingFlags.GetProperty, null, comObject, null);
for (int i = 1; i <= length; i++)
{
yield return type.InvokeMember(i.ToString(), BindingFlags.GetProperty, null, comObject, null) as mshtml.IHTMLElement;
}
}
From the linked question, it seems that accessing length using dynamic works, if you do it from JS first. I would do that, since it means you avoid writing all that reflection code.
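To show how much shorter the dynamic version reads, here is a sketch that compares the two styles on a plain .NET object. FakeResult is a hypothetical stand-in, not a COM object, so this only illustrates the syntax; whether dynamic works on the object returned by InvokeScript depends on the conditions described in the linked question.

```csharp
using System;
using System.Reflection;

public static class LengthAccess
{
    // Hypothetical plain .NET stand-in for the script result (NOT a COM object).
    public class FakeResult
    {
        public int length { get { return 3; } }
    }

    // Reflection-based access, in the style of the original code.
    public static int ViaReflection(object target)
    {
        return (int)target.GetType().InvokeMember(
            "length",
            BindingFlags.GetProperty | BindingFlags.Public | BindingFlags.Instance,
            null, target, null);
    }

    // The same lookup with dynamic: the member is resolved at run time.
    public static int ViaDynamic(object target)
    {
        dynamic d = target;
        return d.length;
    }
}
```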