Scraping Information from Web Pages

by ps1, Oct 6, 2010

Regular expressions are a great way of identifying and retrieving text patterns. The following code fragment defines a regular expression that matches HTML div elements whose class attribute is "post-summary". It then downloads the PowerShell team blog and returns the summary text of every post:

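# (?s) lets .*? match across line breaks, so multi-line summaries are found too
$regex = [RegEx]'(?s)<div class="post-summary">(.*?)</div>'

$url = 'http://blogs.msdn.com/b/powershell/'
$wc = New-Object System.Net.WebClient
# Decode the page as UTF-8 so non-ASCII characters are not garbled
$wc.Encoding = [System.Text.Encoding]::UTF8
$content = $wc.DownloadString($url)

# Group 1 holds the text captured between the opening and closing div tags
$regex.Matches($content) | Foreach-Object { $_.Groups[1].Value }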
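To try the pattern without a network connection (the blog URL above may no longer serve the same markup), here is a minimal sketch that runs the same regular expression against a hypothetical HTML snippet. The sample markup is an assumption. The sketch uses the inline (?s) option so a summary spanning several lines is still matched. Taking "clear text" to mean text with no markup, it also strips leftover tags and decodes HTML entities with [System.Net.WebUtility]::HtmlDecode, which requires .NET Framework 4.0 or later.

```powershell
# Hypothetical sample markup standing in for a downloaded page
$html = @'
<div class="post-summary">First <b>summary</b></div>
<div class="post-summary">Second summary
spans two lines &amp; uses an entity</div>
'@

# (?s) = Singleline: lets .*? cross line breaks inside a summary
$regex = [RegEx]'(?s)<div class="post-summary">(.*?)</div>'

$summaries = $regex.Matches($html) | ForEach-Object {
    # Group 1 is the raw inner HTML of the div
    $inner = $_.Groups[1].Value
    # Remove nested tags such as <b> or <a>
    $text = $inner -replace '<[^>]+>', ''
    # Turn entities like &amp; back into plain characters, then trim whitespace
    [System.Net.WebUtility]::HtmlDecode($text).Trim()
}

$summaries
```

To use this with the live page, apply the same ForEach-Object body to $regex.Matches($content). A regular expression like this one breaks on nested div elements or changed markup, so treat it as a quick extraction technique rather than a full HTML parser.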
