Web Page Scraping Using Selenium and .NET

Written on: May 10, 2012

Selenium is a popular browser automation framework with bindings available for several languages, including C#, Ruby, Python and Java. Over time I have used different methods to scrape data from the web, including JavaScript, jQuery, HtmlAgilityPack, jsoup and so on.

The best thing about Selenium is that it drives a real browser, so you can use it to scrape pages whose content is rendered on the client side with Ajax, JSON data or templates. Here is how you can download Ajax-powered pages using Selenium.

First, download the .NET bindings from the Selenium download page.


I prefer to use LINQPad for all kinds of scripting needs, but whether you use LINQPad or Visual Studio, you will need to add references to “ThoughtWorks.Selenium.Core.dll” and “WebDriver.dll” to run the project. The bare minimum code snippet to run Selenium is below:
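A minimal sketch of such a snippet follows. It is written against the current WebDriver API (not the exact 2012-era code) and assumes Firefox plus its geckodriver are installed. The URL, the `#results` selector and the `page.html` output path are hypothetical placeholders. The idea for Ajax pages is to wait until an element that only appears after the Ajax call has finished is present, then save the browser's rendered DOM rather than the raw HTTP response.

```csharp
using System;
using System.IO;
using OpenQA.Selenium;
using OpenQA.Selenium.Firefox;

class Scraper
{
    static void Main()
    {
        // Hypothetical values: replace with the page, a selector for content
        // that is only rendered after the Ajax call, and your output file.
        const string url = "https://example.com/ajax-page";
        const string readySelector = "#results";
        const string outputPath = "page.html";

        IWebDriver driver = new FirefoxDriver();
        try
        {
            // FindElement keeps polling for up to this long before it throws
            // NoSuchElementException.
            driver.Manage().Timeouts().ImplicitWait = TimeSpan.FromSeconds(10);

            driver.Navigate().GoToUrl(url);

            // Block until the Ajax-rendered element exists in the DOM.
            driver.FindElement(By.CssSelector(readySelector));

            // PageSource returns the browser's current DOM serialized as HTML,
            // so it includes markup injected by scripts after the initial load.
            File.WriteAllText(outputPath, driver.PageSource);
        }
        finally
        {
            // Close the browser and end the WebDriver session even on failure.
            driver.Quit();
        }
    }
}
```

In LINQPad you can set the query language to "C# Program" and paste the body of `Main` into the generated `void Main()`. An implicit wait keeps the sketch short; for pages where the marker element appears before the data is complete, an explicit wait on a more specific condition would be the safer choice.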

Now you have the complete HTML pages saved locally, and you can process them however you want.
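As one possible next step, here is a small sketch that uses HtmlAgilityPack (one of the tools mentioned earlier) to pull every link out of a saved page. It assumes the HtmlAgilityPack package is referenced and that `page.html` is the hypothetical file written by the Selenium step; `PageProcessor` and `ExtractLinks` are illustrative names, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using HtmlAgilityPack;

static class PageProcessor
{
    // Returns the href value of every <a> element that has an href attribute,
    // in document order.
    public static List<string> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var links = new List<string>();

        // With default options, SelectNodes returns null (not an empty
        // collection) when the XPath matches nothing.
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
            return links;

        foreach (var anchor in anchors)
            links.Add(anchor.GetAttributeValue("href", string.Empty));

        return links;
    }

    static void Main()
    {
        // Hypothetical path: the file saved by the Selenium sketch.
        string html = File.ReadAllText("page.html");
        foreach (var link in ExtractLinks(html))
            Console.WriteLine(link);
    }
}
```

Keeping the parsing in a method that takes a string, rather than a file path, makes it easy to test against small HTML fragments without a browser.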