HTML-Scraping with RegEx

Oct 20, 2011

To scrape information from a website with PowerShell, you can download the page's HTML and then use regular expressions to extract the parts you are after. That's not hard. Here is a sample:

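# Download the page's HTML; DownloadString already returns a single string
$webclient = New-Object System.Net.WebClient
$html = $webclient.DownloadString('')

# (?i) makes the match case-insensitive; (.*?) lazily captures the tag content
$headerpattern = '(?i)<h1>(.*?)</h1>'

# Collect the captured text (group 1) of every match
$header = ([regex]$headerpattern).Matches($html) |
  ForEach-Object { $_.Groups[1].Value }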


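It downloads the HTML content from the given URL and then extracts the text of all <h1>…</h1> headers into $header. That way, you get a quick headline overview. Note that this pattern only matches <h1> tags that have no attributes and whose content sits on a single line, because "." does not match line breaks by default.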
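
To try the pattern without a network connection, here is a minimal sketch that runs it against an inline HTML string. The sample markup and the variable names $sampleHtml, $strict, and $loose are made up for illustration. The looser pattern is an assumption on my part, not part of the original tip: it shows one way to also pick up headers that carry attributes or span several lines.

```powershell
# Hypothetical sample HTML, used here instead of a live download
$sampleHtml = @'
<html><body>
<H1>First headline</H1>
<p>Some text</p>
<h1 class="top">Second
headline</h1>
</body></html>
'@

# Pattern from the tip: attribute-free, single-line <h1> elements only
$strict = '(?i)<h1>(.*?)</h1>'

# Looser variant (assumption): (?s) lets "." match line breaks,
# and (?:\s[^>]*)? skips any attributes inside the opening tag
$loose = '(?is)<h1(?:\s[^>]*)?>(.*?)</h1>'

$strictHeaders = @([regex]::Matches($sampleHtml, $strict) |
  ForEach-Object { $_.Groups[1].Value })

# Collapse line breaks and repeated whitespace inside each captured header
$looseHeaders = @([regex]::Matches($sampleHtml, $loose) |
  ForEach-Object { ($_.Groups[1].Value -replace '\s+', ' ').Trim() })

$strictHeaders   # First headline
$looseHeaders    # First headline, Second headline
```

Regular expressions like these work for quick overviews of simple pages. For nested or irregular markup, a real HTML parser is usually the more robust choice.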


