Downloading Information from Internet (Part 3)

by ps1, Apr 17, 2018

In previous tips, we showed how to use Invoke-WebRequest to download data from webpages, and process data delivered in JSON or XML format. Most webpages contain plain HTML data, however. You can use regular expressions to pick information from plain HTML.

This is how you retrieve the raw content of a webpage:

$url = 'http://pages.cs.wisc.edu/~ballard/bofh/bofhserver.pl'
$page = Invoke-WebRequest -Uri $url -UseBasicParsing
$page.Content

The webpage used in this example provides random excuses. To get to the actual excuse, create a regular expression pattern:

$url = 'http://pages.cs.wisc.edu/~ballard/bofh/bofhserver.pl'
$page = Invoke-WebRequest -Uri $url -UseBasicParsing
$content = $page.Content

# (?s) lets "." match line breaks; the parentheses capture the excuse text
$pattern = '(?s)<br><font size\s?=\s?"\+2">(.+)</font'

if ($content -match $pattern)
{
  $matches[1]
}

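Whenever you run this code, it provides you with a new excuse. We are not going to dive into regular expressions here, but the pattern basically looks for static text like “<br><font size="+2">” (allowing an optional whitespace character on either side of the equals sign), then takes anything that follows (“(.+)”) up to the ending static text (“</font”). The leading “(?s)” is an option that lets the dot match line breaks, too, so the excuse may span more than one line. $matches[1] then returns the content of the first pair of parentheses in your pattern, which happens to be the excuse we were after.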
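
To experiment with the pattern without contacting the server, you can run it against a hard-coded string. The following is only a sketch: $sampleHtml is a made-up stand-in for $page.Content, and the markup of the real page may differ.

```powershell
# Hypothetical sample HTML standing in for $page.Content (an assumption, not the real page)
$sampleHtml = @'
<html><body>
<center><font size = "+2">The cause of the problem is:</font></center>
<br><font size = "+2">
solar flares
</font>
</body></html>
'@

# Same pattern as above: (?s) lets "." match line breaks as well
$pattern = '(?s)<br><font size\s?=\s?"\+2">(.+)</font'

if ($sampleHtml -match $pattern)
{
  # $matches[0] holds the entire matched text, $matches[1] the first capture group
  $excuse = $matches[1].Trim()
  $excuse
}
```

With this sample, the code outputs “solar flares”. Trim() removes the line breaks surrounding the captured text. Note that “(.+)” is greedy: if more than one “</font” followed the opening text, the match would extend to the last one. In that case, “(.+?)” would stop at the first one instead.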
