Performance (Part 3): Faster Pipeline Functions

by Oct 19, 2018

Categories

Tags

Administration agent-based monitoring Agentless Monitoring alert responses alert thresholds alerting Alerts Amazon Aurora Amazon EC2 Amazon RDS Amazon RDS / Aurora Amazon RDS for SQL Server Amazon Redshift Amazon S3 Amazon Web Services (AWS) Analytics application monitoring Aqua Data Studio automation availability Azure Azure SQL Database azure sql managed instance Azure VM backup Backup and recovery backup and restore backup compression backup status Backup Strategy backups big data Blocking bug fixes business architecture business data objects business intelligence business process modeling business process models capacity planning change management cloud cloud database cloud database monitoring cloud infrastructure cloud migration cloud providers Cloud Readiness Cloud Services cloud storage cloud virtual machine cloud VM clusters code completion collaboration compliance compliance audit compliance audits compliance manager compliance reporting conference configuration connect to database cpu Cross Platform custom counters Custom Views customer survey customer testimonials Dark Theme dashboards data analysis Data Analytics data architect data architecture data breaches Data Collector data governance data lakes data lineage data management data model data modeler data modeling data models data privacy data protection data security data security measures data sources data visualization data warehouse database database administration database administrator database automation database backup database backups database capacity database changes database community database connection database design database developer database developers database development database diversity Database Engine Tuning Advisor database fragmentation database GUI database IDE database indexes database inventory management database locks database management database migration database monitoring database navigation database optimization database performance Database Permissions database platforms database profiling database queries database recovery database replication database restore database schema database security database support database synchronization database tools database transactions database tuning database-as-a-service databases DB Change Manager DB Optimizer DB PowerStudio DB2 DBA DBaaS DBArtisan dBase DBMS DDL Debugging defragmentation Demo diagnostic manager diagnostics dimensional modeling disaster recovery Download drills embedded database Encryption End-user Experience entity-relationship model ER/Studio ER/Studio Data Architect ER/Studio Enterprise Team Edition events execution plans free tools galera cluster GDPR Getting Started Git GitHub Google Cloud Hadoop Healthcare high availability HIPAA Hive hybrid clouds Hyper-V IDERA IDERA ACE Index Analyzer index optimization infrastructure as a service (IaaS) infrastructure monitoring installation Integrated Development Environment interbase Inventory Manager IT infrastructure Java JD Edwards JSON licensing load test load testing logical data model macOS macros managed cloud database managed cloud databases MariaDB memory memorystorage memoryusage metadata metric baselines metric thresholds Microsoft Azure Microsoft Azure SQL Database Microsoft PowerShell Microsoft SQL Server Microsoft Windows MongoDB monitoring Monitoring Tools Monyog multiple platforms MySQL news newsletter NoSQL Notifications odbc optimization Oracle PeopleSoft performance Performance Dashboards performance metrics performance monitoring performance schema performance tuning personally identifiable information physical data model Platform platform as a service (PaaS) PostgreSQL Precise Precise for Databases Precise for Oracle Precise for SQL Server Precise Management Database (PMDB) product updates Project Migration public clouds Query Analyzer query builder query monitor query optimization query performance Query Store query tool query tuning query-level waits Rapid SQL rdbms real time monitoring Real User Monitoring recovery regulations relational databases Releases Reporting Reports repository Restore reverse engineering Roadmap sample SAP Scalability Security Policy Security Practices server monitoring Server performance server-level waits Service Level Agreement SkySQL slow query SNMP snowflake source control SQL SQL Admin Toolset SQL CM SQL code SQL coding SQL Compliance Manager SQL Defrag Manager sql development SQL Diagnostic Manager SQL Diagnostic Manager for MySQL SQL Diagnostic Manager for SQL Server SQL Diagnostic Manager Pro SQL DM SQL Doctor SQL Enterprise Job Manager SQl IM SQL Inventory Manager SQL Management Suite SQL Monitoring SQL Performance SQL Quality SQL query SQL Query Tuner SQL Safe Backup SQL script SQL Secure SQL Security Suite SQL Server sql server alert SQL Server Migration SQL Server Performance SQL Server Recommendations SQL Server Security SQL statement history SQL tuning SQL Virtual Database sqlmemory sqlserver SQLyog Storage Storage Performance structured data Subversion Support tempdb tempdb data temporal data Tips and Tricks troubleshooting universal data models universal mapping unstructured data Uptime Infrastructure Monitor user experience user permissions Virtual Machine (VM) web services webinar What-if analysis WindowsPowerShell

In previous tips we illustrated how you can improve loops and especially pipeline operations. To transfer this knowledge to functions, take a look at a simple pipeline-aware function which counts the number of piped elements:

function Count-Stuff
{
    param
    (
        # pipeline-aware input parameter
        [Parameter(ValueFromPipeline)]
        $InputElement
    )

    begin 
    {
        # initialize counter
        $c = 0
    }
    process
    {
        # increment counter for each incoming object
        $c++
    }
    end
    {
        # output sum
        $c
    }


}

When you run this function and count a fairly large number of objects, this is what you might get:

 
PS> $start = Get-Date
1..1000000 | Count-Stuff
(Get-Date) - $start

1000000

...
TotalMilliseconds : 3895,5848 
...
 

Now let’s turn the function into a “simple function” by avoiding attributes:

function Count-Stuff
{
    begin 
    {
        # initialize counter
        $c = 0
    }
    process
    {
        # increment counter for each incoming object
        $c++
    }
    end
    {
        # output sum
        $c
    }
}

Since you now have no way of defining pipeline-aware parameters, pipeline input surfaces as “$_” in the process block, and as “$input” as an enumerator to all received input in the end block. Note also that our counting example does not need any of these as it only counts the incoming input.

Here is the result:

 
$start = Get-Date
1..1000000 | Count-Stuff
(Get-Date) - $start

1000000

...
TotalMilliseconds : 690,1558 
...
 

Apparently, simple functions produce much less overhead with pipeline operations when a lot of objects are involved.

Of course, the effect becomes significant only when you pipe a high number of objects, but it increases when you do more complex things. For example, the code below produces 5-digit server lists, and using advanced functions, it took on our test system roughly 10 seconds:

function Get-Servername
{
    param
    (
        # pipeline-aware input parameter
        [Parameter(ValueFromPipeline)]
        $InputElement
    )

    process
    {
        "Server{0:n5}" -f $InputElement
    }
}



$start = Get-Date
$list = 1..1000000 | Get-ServerName
(Get-Date) - $start

The very same result was produced by a simple function in under 2 seconds (both in PowerShell 5.1 and 6.1):

function Get-ServerName
{
    process
    {
        "Server{0:n5}" -f $InputElement
    }
}



$start = Get-Date
$list = 1..1000000 | Get-ServerName
(Get-Date) - $start

Twitter This Tip! ReTweet this Tip!