Automating “Live” Websites

by Oct 8, 2018


Occasionally, there is the need to automate tasks on websites that have been opened manually. Maybe you need to log into internal web pages first using some web forms. Provided the website is hosted in Internet Explorer (not Edge or any 3rd-party browser), you can use a COM interface to access the live browser content.

This can even be valuable for plain “HTML-scraping” when you visit dynamic web pages. A pure WebClient (or the cmdlet Invoke-WebRequest) only returns the static HTML sent by the server, which is not necessarily what users see in their browsers. When you use a real browser to show website content, your scripts can access the full HTML that drives the display.

To test-drive this, open Internet Explorer, and navigate to a website of your choice. In our example, we navigate to www.powershellmagazine.com.

$obj = New-Object -ComObject Shell.Application
$browser = $obj.Windows() | 
    Where-Object FullName -like '*iexplore.exe' |
    # adjust the below to match your URL
    Where-Object LocationUrl -like '*powershellmagazine.com*' |
    # take the first browser that matches in case there are
    # more than one
    Select-Object -First 1

In $browser, you now have access to the object model of the live browser. If $browser is empty, make sure you adjusted the filter for LocationUrl in the code so it matches your URL. Do not forget the asterisks at both ends.
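If you use this lookup repeatedly, the filter can be wrapped in a small function so that the URL pattern becomes a parameter. The following is only a sketch: the function name Find-IEWindow and its parameters are made up for illustration and are not part of any module.

```powershell
# Hypothetical helper (name and parameters are assumptions for illustration):
# returns the first Internet Explorer window whose URL matches the pattern.
function Find-IEWindow {
    param(
        # window objects, e.g. the result of Shell.Application's Windows()
        $Windows,
        # wildcard pattern; keep the asterisks at both ends
        [string] $UrlPattern
    )

    $Windows |
        Where-Object { $_.FullName -like '*iexplore.exe' -and $_.LocationUrl -like $UrlPattern } |
        Select-Object -First 1
}

# Usage on a Windows machine with Internet Explorer open:
# $obj = New-Object -ComObject Shell.Application
# $browser = Find-IEWindow -Windows @($obj.Windows()) -UrlPattern '*powershellmagazine.com*'
```

The function returns nothing when no window matches, so the same check for an empty $browser still applies.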

If you wanted to scrape all images off the website, this is how you would get the list of images:

 
$browser.Document.images | Out-GridView 

Likewise, if you wanted to scrape information off the website content, this line returns the page HTML:

 
PS> $browser.Document.body.innerHTML
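The returned HTML is an ordinary string, so PowerShell's regex support can be applied to it directly. Here is a minimal sketch that pulls link targets out of a page. The sample HTML is a made-up stand-in for the live page content, and the pattern assumes double-quoted, lowercase href attributes, which real pages (and the HTML Internet Explorer hands back) will not always follow.

```powershell
# Made-up sample HTML; with a live browser you would use
# $browser.Document.body.innerHTML instead.
$html = @'
<div><a href="https://example.com/one">One</a>
<img src="/images/logo.png" alt="logo">
<a href="https://example.com/two">Two</a></div>
'@

# Find every href="..." attribute and keep only the captured URL
$links = [regex]::Matches($html, 'href="([^"]+)"') |
    ForEach-Object { $_.Groups[1].Value }

$links
```

For anything beyond simple patterns like this, regular expressions on HTML become fragile, so treat the pattern as a starting point to adapt to your page.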

You could now use regular expressions to scrape content. There is one limitation though: if you need to perform additional actions in the context of the logged-in web visitor, you are out of luck. For example, if you wanted to download files that require a web login to access, you would have to invoke the download process via the Internet Explorer object model.

You would not be able to use Invoke-WebRequest or another simple web client to download the file because PowerShell runs in its own context and appears to the website as an anonymous visitor.

Using the Internet Explorer object model to perform more advanced actions such as downloading files or videos isn’t impossible. It is just very complex because, essentially, you would need to send clicks and keystrokes to the user interface.
