Chapter 13. Text and Regular Expressions

by ps1, Mar 20, 2012

Often, you need to deal with plain text information. You may want to read the content of a text file and extract lines that contain a keyword, or you may want to isolate the file name from a file path. So while the object-oriented approach of PowerShell is a great thing, at the end of the day most useful information breaks down to plain text. In this chapter, you’ll learn how to control text information in pretty much any way you want.

Topics Covered:

Defining Text
Composing Text with “-f”
Simple Pattern Recognition
Regular Expressions
Summary

Defining Text

To define text, place it in quotes. If you want PowerShell to treat the text exactly the way you type it, use single quotes. Use double quotes with care because they can transform your text: any variable you place in your text will get resolved, and PowerShell replaces the variable with its content. Have a look:

$text = 'This text may also contain $env:windir `: $(2+2)'
$text
This text may also contain $env:windir `: $(2+2)

Placed in single quotes, PowerShell returns the text exactly like you entered it. With double quotes, the result is completely different:

$text = "This text may also contain $env:windir `: $(2+2)"
$text
This text may also contain C:\Windows : 4

Special Characters in Text

The most common “special” characters you may want to put in text are quotes. Quotes are tricky because you need to make sure that PowerShell does not confuse the quotes inside your text with the quotes that actually surround and define the text. You do have a couple of choices.

If you used single quotes to delimit the text, you can freely use double quotes inside the text, and vice versa:

'The "situation" was really not that bad'
"The 'situation' was really not that bad"

If you must use the same type of quote both as delimiter and inside the text, you can “escape” quotes (remove their special meaning) by using two consecutive quotes. Inside double quotes, you can alternatively place a “backtick” character in front of the quote. The backtick has no special meaning inside single quotes, so this second option does not work there:

'The ''situation'' was really not that bad'
"The ""situation"" was really not that bad"
"The `"situation`" was really not that bad"

The second most wanted special character you may want to include in text is a new line so you can extend text to more than one line. Again, you have a couple of choices.

When you use double quotes to delimit text, you can insert special control characters like tabs or line breaks by adding a backtick and then a special character where “t” stands for a tab and “n” represents a line break. This technique does require that the text is defined by double quotes:

PS> "One line`nAnother line"
One line
Another line
PS> 'One line`nAnother line'
One line`nAnother line

Escape Sequence	Special Character
`n	New line
`r	Carriage return
`t	Tab
`a	Alert (bell)
`b	Backspace
`'	Single quotation mark
`"	Double quotation mark
`0	Null
``	Backtick character

Table 13.1: Special characters and “escape” sequences for text

Resolving Variables

A rather unusual special character is “$”. PowerShell uses it to define variables that can hold information. Text in double quotes also honors this special character and recognizes variables by resolving them: PowerShell automatically places the variable content into the text:

$name = 'Weltner'
"Hello Mr $name"

This only works for text enclosed in double quotes. If you use single quotes, PowerShell ignores variables and treats “$” as a normal character:

'Hello Mr $name'

At the same time, single quotes protect you from unwanted variable resolving. Take a look at what happens when you use double quotes here:

"My wallet is low on $$$$"

As it turns out, $$ is itself a variable. It is an internal “automatic” variable maintained by PowerShell that contains the last token of the previous command line, which is why the result of this line varies with whatever you executed right before. So as a rule of thumb, use single quotes by default unless you really want to resolve variables in your text. Resolving text can be enormously handy:

PS> $name = "report"
PS> $extension = "txt"
PS> "$name.$extension"
report.txt

Just make sure you use it with care.
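
One situation where care is needed is when a variable is followed directly by characters that could belong to a variable name. The following sketch (the file names are invented for illustration) shows how curly braces around the variable name mark where the name ends:

```powershell
$name = 'report'

# PowerShell reads this as the (undefined) variable $name_backup:
$wrong = "$name_backup.txt"

# Curly braces delimit the variable name explicitly:
$right = "${name}_backup.txt"
$right
```

The braces are only needed when the text that follows could be mistaken for part of the name; a dot or a space already ends the name on its own.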

Now, what would you do if you needed to use “$” both to resolve variables and to display literally in the same text? Again, you can use the backtick to escape the “$” and remove its special resolving capability:

"The variable `$env:windir contains ""$env:windir"""

Tip: You can use the “$” resolving capabilities to insert live code results into text. Just place the code you want to evaluate in parentheses, and put a “$” in front of the opening parenthesis. Such a $(...) subexpression is evaluated inside double-quoted text exactly as it would be outside of text:

$result = "One CD has the capacity of $(720MB / 1.44MB) diskettes." 
$result 
One CD has the capacity of 500 diskettes.
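
The same subexpression syntax is needed whenever you want to show an object property inside double-quoted text, because PowerShell resolves only the variable name itself. A minimal sketch, using a made-up sample object named $file:

```powershell
# Hypothetical sample object; any object with properties behaves the same way.
$file = New-Object PSObject -Property @{ Name = 'report.txt'; Length = 2048 }

# Without $( ), only $file is resolved and ".Length" stays literal text:
$plain = "Size: $file.Length"

# With $( ), the whole expression is evaluated first:
$resolved = "Size: $($file.Length) bytes"
$resolved
```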

“Here-Strings”: Multi-Line Text

As you have seen, you can insert special backtick-key-combinations to insert line breaks and produce multi-line text. While that may work for one or two lines of text, it quickly becomes confusing for the reader and tiresome for the script author to construct strings like that.

A much more readable way is to use here-strings. They work like quotes except that an “@” precedes the opening quote and follows the closing quote to indicate that the text extends over multiple lines. The opening @" must be the last thing on its line, and the closing "@ must stand at the very beginning of a line:

$text = @"
Here-Strings can easily stretch over several lines and may also include
"quotation marks". Nevertheless, here, too, variables are replaced with
their values: $env:windir, and subexpressions like $(2+2) are likewise replaced
with their result. The text will be concluded only if you terminate the
here-string with the termination symbol "@.
"@
$text
Here-Strings can easily stretch over several lines and may also include
"quotation marks". Nevertheless, here, too, variables are replaced with
their values: C:\Windows, and subexpressions like 4 are likewise replaced
with their result. The text will be concluded only if you terminate the
here-string with the termination symbol "@.
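
Here-strings also come in a single-quoted form, which follows the same rule as ordinary single quotes: nothing inside is resolved. The following sketch contrasts the two forms; the variable names $literal and $expanded are chosen only for this illustration:

```powershell
# Single-quoted here-string: the text is taken literally.
$literal = @'
Windows folder: $env:windir
Sum: $(2+2)
'@

# Double-quoted here-string: variables and subexpressions are resolved.
$expanded = @"
Windows folder: $env:windir
Sum: $(2+2)
"@

$literal
$expanded
```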

Communicating with the User

Maybe you don’t want to hard-code text information in your script at all but instead provide a way for the user to enter information. To accept plain text input use Read-Host:

$text = Read-Host "Enter some text" 
Enter some text: Hello world! 
$text 
Hello world!

Text accepted by Read-Host is treated literally, so it behaves like text enclosed in single quotes. Special characters and variables are not resolved. If you want to resolve the text a user entered, you can however send it to the internal ExpandString() method for post-processing. PowerShell uses this method internally when you define text in double quotes:

# Query and output text entry by user:
$text = Read-Host "Your entry" 
Your entry: $env:windir 
$text 
$env:windir

# Treat entered text as if it were in double quotation marks:
$ExecutionContext.InvokeCommand.ExpandString($text)
C:\Windows

You can also request secret information from a user. To mask input, use the switch parameter -AsSecureString. This time, however, Read-Host no longer returns plain text but an encrypted SecureString. So not only is the input masked with asterisks, the result is just as unreadable. To convert an encrypted SecureString back into plain text, you can use some .NET methods. The example stores the result in $password rather than $pwd, because $PWD is an automatic variable that holds the current directory:

$password = Read-Host -AsSecureString "Password"
Password: *************
$password
System.Security.SecureString
[Runtime.InteropServices.Marshal]::PtrToStringAuto([Runtime.InteropServices.Marshal]::SecureStringToBSTR($password))
strictly confidential

Composing Text with “-f”

The -f format operator is the most important PowerShell string operator. You’ll soon be using it to format numeric values for easier reading:

"{0:0} diskettes per CD" -f (720mb/1.44mb) 
500 diskettes per CD

The -f format operator expects a string with placeholders on its left side and, on its right side, the values that are to be inserted into the string in place of the placeholders:

"{0} diskettes per CD" -f (720mb/1.44mb) 
500 diskettes per CD

The right side must supply a value for every placeholder used in the string on the left side. If you want to calculate a value first, place the calculation in parentheses. As is generally true in PowerShell, the parentheses ensure that the enclosed statement is evaluated first and separately, and that its result is then used in place of the parentheses. Without parentheses, -f binds more tightly than the division, so PowerShell would format 720mb first and then try to divide the resulting text, which fails:

"{0} diskettes per CD" -f 720mb/1.44mb 
Bad numeric constant: 754974720 diskettes per CD.
At line:1 char:33
+ "{0} diskettes per CD" -f 720mb/1 <<<< .44mb

You may use as many placeholders as you wish. The number in the braces is the zero-based index of the value on the right side that will be inserted at that position, so the values do not have to appear in the same order as the placeholders:

"{0} {3} at {1}MB fit into one CD at {2}MB" -f (720mb/1.44mb), 1.44, 720, "diskettes"
500 diskettes at 1.44MB fit into one CD at 720MB

Setting Numeric Formats

The -f format operator can insert values into text as well as format the values. Every placeholder has the following formal structure: {index[,alignment][:format]}:

Index: This number indicates which value is to be used for this wildcard. For example, you could use several wildcards with the same index if you want to output one and the same value several times, or in various display formats. The index number is the only obligatory specification. The other two specifications are voluntary.
Alignment: Positive or negative numbers can be specified that determine whether the value is right justified (positive number) or left justified (negative number). The number states the desired width. If the value is wider than the specified width, the specified width will be ignored. However, if the value is narrower than the specified width, the width will be filled with blank characters. This allows columns to be set flush.
Format: The value can be formatted in very different ways. Here you can use the relevant format name to specify the format you wish. You’ll find an overview of available formats below.
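
The three parts of a placeholder can be sketched as follows. The values and variable names are arbitrary; hexadecimal formatting is used because it does not depend on the regional settings of your computer:

```powershell
# Index only: the same value can be reused in several placeholders.
$reuse = '{0} in hex is {0:x}' -f 255     # 255 in hex is ff

# Index plus alignment: pad to 12 characters, right- or left-justified.
$right = '[{0,12}]' -f 'abc'              # [         abc]
$left  = '[{0,-12}]' -f 'abc'             # [abc         ]

# Index, alignment, and format combined.
$all   = '[{0,8:x}]' -f 255               # [      ff]

$reuse, $right, $left, $all
```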

Unlike most things in PowerShell, format specifiers are case-sensitive. You can see how large the differences can be when you format dates:

# Formatting with a small letter d:
"Date: {0:d}" -f (Get-Date) 
Date: 08/28/2007

# Formatting with a large letter D:
"Date: {0:D}" -f (Get-Date) 
Date: Tuesday, August 28, 2007

Symbol	Type	Call	Result
#	Digit placeholder	"{0:(#).##}" -f $value	(1000000)
%	Percentage	"{0:0%}" -f $value	100000000%
,	Thousands separator	"{0:0,0}" -f $value	1,000,000
,.	Integral multiple of 1,000	"{0:0,.}" -f $value	1000
.	Decimal point	"{0:0.0}" -f $value	1000000.0
0	0 placeholder	"{0:00.0000}" -f $value	1000000.0000
c	Currency	"{0:c}" -f $value	1,000,000.00 €
d	Decimal	"{0:d}" -f $value	1000000
e	Scientific notation	"{0:e}" -f $value	1.000000e+006
e	Exponent placeholder	"{0:00e+0}" -f $value	10e+5
f	Fixed point	"{0:f}" -f $value	1000000.00
g	General	"{0:g}" -f $value	1000000
n	Thousands separator	"{0:n}" -f $value	1,000,000.00
x	Hexadecimal	"0x{0:x4}" -f $value	0xf4240

Table 13.3: Formatting numbers

Using the formats in Table 13.3, you can format numbers quickly and comfortably. No need for you to squint your eyes any longer trying to decipher whether a number is a million or 10 million:

10000000000
"{0:N0}" -f 10000000000
10,000,000,000

There’s also a very wide range of time and date formats. The relevant formats are listed in Table 13.4 and their operation is shown in the following lines:

$date= Get-Date
foreach ($format in "d","D","f","F","g","G","m","r","s","t","T","u","U","y",`
"dddd, MMMM dd yyyy","M/yy","dd-MM-yy") {
"DATE with $format : {0}" -f $date.ToString($format)
}
DATE with d : 10/15/2007
DATE with D : Monday, 15 October, 2007
DATE with f : Monday, 15 October, 2007 02:17 PM
DATE with F : Monday, 15 October, 2007 02:17:02 PM
DATE with g : 10/15/2007 02:17
DATE with G : 10/15/2007 02:17:02
DATE with m : October 15
DATE with r : Mon, 15 Oct 2007 02:17:02 GMT
DATE with s : 2007-10-15T02:17:02
DATE with t : 02:17 PM
DATE with T : 02:17:02 PM
DATE with u : 2007-10-15 02:17:02Z
DATE with U : Monday, 15 October, 2007 00:17:02
DATE with y : October, 2007
DATE with dddd, MMMM dd yyyy : Monday, October 15 2007
DATE with M/yy : 10/07
DATE with dd-MM-yy : 15-10-07

Symbol	Type	Call	Result
d	Short date format	"{0:d}" -f $value	09/07/2007
D	Long date format	"{0:D}" -f $value	Friday, September 7, 2007
t	Short time format	"{0:t}" -f $value	10:53 AM
T	Long time format	"{0:T}" -f $value	10:53:56 AM
f	Full date and time (short)	"{0:f}" -f $value	Friday, September 7, 2007 10:53 AM
F	Full date and time (long)	"{0:F}" -f $value	Friday, September 7, 2007 10:53:56 AM
g	Standard date (short)	"{0:g}" -f $value	09/07/2007 10:53 AM
G	Standard date (long)	"{0:G}" -f $value	09/07/2007 10:53:56 AM
M	Month and day	"{0:M}" -f $value	September 07
r	RFC1123 date format	"{0:r}" -f $value	Fri, 07 Sep 2007 10:53:56 GMT
s	Sortable date format	"{0:s}" -f $value	2007-09-07T10:53:56
u	Universally sortable date format	"{0:u}" -f $value	2007-09-07 10:53:56Z
U	Full date and time, converted to universal time (UTC)	"{0:U}" -f $value	Friday, September 7, 2007 08:53:56
Y	Year/month format pattern	"{0:Y}" -f $value	September 2007

Table 13.4: Formatting date values

If you want to find out which types support format strings, you need only look for .NET types whose ToString() method accepts a format string:

[AppDomain]::CurrentDomain.GetAssemblies() | ForEach-Object {
   $_.GetExportedTypes() | Where-Object {! $_.IsSubclassOf([System.Enum])}
  } | ForEach-Object { 
     $Methods = $_.GetMethods() | Where-Object {$_.Name -eq "tostring"} |%{"$_"}
     if ($methods -eq "System.String ToString(System.String)") {
           $_.FullName
     }
   }
System.Enum
System.DateTime
System.Byte
System.Convert
System.Decimal
System.Double
System.Guid
System.Int16
System.Int32
System.Int64
System.IntPtr
System.SByte
System.Single
System.UInt16
System.UInt32
System.UInt64
Microsoft.PowerShell.Commands.MatchInfo

For example, among the supported data types is the “globally unique identifier” System.Guid. Because you’ll frequently need GUIDs, which are unique worldwide, here’s a brief example showing how to create and format a GUID:

$guid = [GUID]::NewGUID()
foreach ($format in "N","D","B","P") {"GUID with $format : {0}" -f $GUID.ToString($format)}
GUID with N : 0c4d2c4c8af84d198b698e57c1aee780
GUID with D : 0c4d2c4c-8af8-4d19-8b69-8e57c1aee780
GUID with B : {0c4d2c4c-8af8-4d19-8b69-8e57c1aee780}
GUID with P : (0c4d2c4c-8af8-4d19-8b69-8e57c1aee780)

Symbol	Type	Call	Result
dd	Day of month	"{0:dd}" -f $value	07
ddd	Abbreviated name of day	"{0:ddd}" -f $value	Fri
dddd	Full name of day	"{0:dddd}" -f $value	Friday
gg	Era	"{0:gg}" -f $value	A.D.
hh	Hours from 01 to 12	"{0:hh}" -f $value	10
HH	Hours from 00 to 23	"{0:HH}" -f $value	10
mm	Minute	"{0:mm}" -f $value	53
MM	Month	"{0:MM}" -f $value	09
MMM	Abbreviated month name	"{0:MMM}" -f $value	Sep
MMMM	Full month name	"{0:MMMM}" -f $value	September
ss	Second	"{0:ss}" -f $value	56
tt	AM or PM	"{0:tt}" -f $value	AM
yy	Year in two digits	"{0:yy}" -f $value	07
yyyy	Year in four digits	"{0:yyyy}" -f $value	2007
zz	Time zone including leading zero	"{0:zz}" -f $value	+02
zzz	Time zone in hours and minutes	"{0:zzz}" -f $value	+02:00

Table 13.5: Customized date value formats

Outputting Values in Tabular Form: Fixed Width

To display the output of several lines in a fixed-width font and align them one below the other, each column of the output must have a fixed width. The -f format operator can set outputs to a fixed width.

In the following example, Dir returns a directory listing, from which a subsequent loop outputs file names and file sizes. Because file names and sizes vary, the result is ragged right and hard to read:

dir | ForEach-Object { "$($_.name) = $($_.Length) Bytes" }
history.csv = 307 Bytes
info.txt = 8562 Bytes
layout.lxy = 1280 Bytes
list.txt = 164186 Bytes
p1.nrproj = 5808 Bytes
ping.bat = 116 Bytes
SilentlyContinue = 0 Bytes

The following result with fixed column widths is far more legible. To set widths, add a comma to the sequential number of the wildcard and after it specify the number of characters available to the wildcard. Positive numbers will set values to right alignment, negative numbers to left alignment:

dir | ForEach-Object { "{0,-20} = {1,10} Bytes" -f $_.name, $_.Length }   
history.csv          =        307 Bytes
info.txt             =       8562 Bytes
layout.lxy           =       1280 Bytes
list.txt             =     164186 Bytes
p1.nrproj            =       5808 Bytes
ping.bat             =        116 Bytes
SilentlyContinue     =          0 Bytes

More options are offered by special text commands that PowerShell furnishes from three different areas:

String operators: PowerShell includes a number of string operators for general text tasks which you can use to replace text and to compare text (Table 13.2).
Object methods: the String data type, which stores text, includes its own set of methods that you can use to search through, dismantle, reassemble, and modify text in diverse ways (Table 13.6).
Static methods: finally, the String .NET class includes static methods that are not bound to any particular text.

String Operators

All string operators work in basically the same way: they take data from the left and the right and then do something with them. The -replace operator, for example, takes the original text on its left and, on its right, a search pattern and a replacement text, and then replaces every occurrence of the pattern in the original text with the replacement text:

"Hello Carl" -replace "Carl", "Eddie" 
Hello Eddie
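
Like most PowerShell operators, -replace ignores case by default. The following sketch contrasts it with its case-sensitive variant -creplace; the variable names are chosen only for this illustration:

```powershell
$text = 'Hello Carl'

# -replace ignores case, so 'carl' still finds 'Carl':
$insensitive = $text -replace 'carl', 'Eddie'     # Hello Eddie

# -creplace is case-sensitive, so nothing is replaced here:
$sensitive = $text -creplace 'carl', 'Eddie'      # Hello Carl

$insensitive
$sensitive
```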

The format operator -f works in exactly the same way. You heard about this operator at the beginning of this chapter. It takes a static string template with placeholders and an array with values, and then fills the values into the placeholders.

Two additional important string operators are -join and -split. They can be used to automatically join together an array or to split a text into an array of substrings.

Let’s say you want to output information that really is an array of information. When you query WMI for your operating system to identify the installed MUI languages, the result can be an array (when more than one language is installed). If you handed such an array directly to -f, each array element would count as a separate value, so a single {0} placeholder would show only the first language and the output would be incomplete.

You have to join the array into one string first using -join. Here is how:

PS> $mui = Get-WmiObject Win32_OperatingSystem | Select-Object -ExpandProperty MuiLanguages
PS> 'Installed MUI-Languages: {0}' -f ($mui -join ', ')
Installed MUI-Languages: de-DE, en-US

The -split operator does the exact opposite. It takes a text and a split pattern, and each time it discovers the split pattern, it splits the original text in chunks and returns an array. This example illustrates how you can use -split to parse a path:

PS> ('c:\test\folder\file.txt' -split '\\')[-1]
file.txt

Note that -split, like -replace, expects the pattern to be a regular expression, so if your pattern contains reserved characters (like the backslash), you have to escape them. Note also that the Split-Path cmdlet can split paths more easily.

To auto-escape a simple text pattern, use .NET methods. The Escape() method takes a simple text pattern and returns the escaped version that you can use wherever a regular expression is needed:

PS> [RegEx]::Escape('some.pattern')
some\.pattern
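
Applied to a path, escaping might look like the following sketch. The path is an arbitrary example; the second variant uses the SimpleMatch option of -split, which treats the separator as plain text instead of a regular expression:

```powershell
$path = 'c:\test\folder\file.txt'

# Let .NET produce the escaped pattern for a single backslash:
$pattern = [regex]::Escape('\')
$parts = $path -split $pattern
$parts[-1]

# Alternative: SimpleMatch switches off regular expression handling
# (0 means "no limit on the number of substrings").
$simple = $path -split '\', 0, 'SimpleMatch'
$simple[-1]
```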

String Object Methods

You know from Chapter 6 that PowerShell represents everything as objects and that every object contains a set of instructions known as methods. Text is stored in a String object, and a string object has built-in methods for manipulating the text information. Simply add a “.” and then the method you need:

$path = "c:\test\Example.bat"
$path.Substring( $path.LastIndexOf(".")+1 ) 
bat

Another approach uses the dot as separator and Split() to split up the path into an array. The result is that the last element of the array (-1 index number) will include the file extension:

$path.Split(".")[-1] 
bat

Function	Description	Example
CompareTo()	Compares one string to another	("Hello").CompareTo("Hello")
Contains()	Returns “True” if a specified comparison string is in a string or if the comparison string is empty	("Hello").Contains("ll")
CopyTo()	Copies part of a string into a character array	$a = ("Hello World").ToCharArray(); ("User!").CopyTo(0, $a, 6, 5); $a
EndsWith()	Tests whether the string ends with a specified string	("Hello").EndsWith("lo")
Equals()	Tests whether one string is identical to another string	("Hello").Equals($a)
IndexOf()	Returns the index of the first occurrence of a comparison string	("Hello").IndexOf("l")
IndexOfAny()	Returns the index of the first occurrence of any character in a comparison string	("Hello").IndexOfAny("loe")
Insert()	Inserts new string at a specified index in an existing string	("Hello World").Insert(6, "brave ")
GetEnumerator()	Retrieves a new object that can enumerate all characters of a string	("Hello").GetEnumerator()
LastIndexOf()	Finds the index of the last occurrence of a specified character	("Hello").LastIndexOf("l")
LastIndexOfAny()	Finds the index of the last occurrence of any character of a specified string	("Hello").LastIndexOfAny("loe")
PadLeft()	Pads a string to a specified length and adds blank characters to the left (right-aligned string)	("Hello").PadLeft(10)
PadRight()	Pads a string to a specified length and adds blank characters to the right (left-aligned string)	("Hello").PadRight(10) + "World!"
Remove()	Removes any requested number of characters starting from a specified position	("Hello World").Remove(5,6)
Replace()	Replaces a character or string with another	("Hello World").Replace("l", "x")
Split()	Converts a string with specified splitting points into an array	("Hello World").Split("l")
StartsWith()	Tests whether a string begins with a specified string	("Hello World").StartsWith("He")
Substring()	Extracts characters from a string	("Hello World").Substring(4, 3)
ToCharArray()	Converts a string into a character array	("Hello World").ToCharArray()
ToLower()	Converts a string to lowercase	("Hello World").ToLower()
ToLowerInvariant()	Converts a string to lowercase using casing rules of the invariant culture	("Hello World").ToLowerInvariant()
ToUpper()	Converts a string to uppercase	("Hello World").ToUpper()
ToUpperInvariant()	Converts a string to uppercase using casing rules of the invariant culture	("Hello World").ToUpperInvariant()
Trim()	Removes blank characters to the right and left	(" Hello ").Trim() + "World"
TrimEnd()	Removes blank characters to the right	(" Hello ").TrimEnd() + "World"
TrimStart()	Removes blank characters to the left	(" Hello ").TrimStart() + "World"
Chars()	Provides a character at the specified position	("Hello").Chars(0)

Table 13.6: The methods of a string object

Analyzing Methods: Split() as Example

You already know in detail from Chapter 6 how to use Get-Member to find out which methods an object contains and how to invoke them. Just as a quick refresher, let’s look again at an example of the Split() method to see how it works.

("something" | Get-Member Split).definition 
System.String[] Split(Params Char[] separator), System.String[] Split(Char[] separator,
 Int32 count), System.String[] Split(Char[] separator, StringSplitOptions options),
 System.String[] Split(Char[] separator, Int32 count, StringSplitOptions options),
 System.String[] Split(String[] separator, StringSplitOptions options),
 System.String[] Split(String[] separator, Int32 count, StringSplitOptions options)

Definition gets output, but it isn’t very easy to read. Because Definition is also a string object, you can use methods from Table 13.6, including Replace(), to insert a line break where appropriate. That makes the result much more understandable:

("something" | Get-Member Split).Definition.Replace("), ", ")`n")
System.String[] Split(Params Char[] separator)
System.String[] Split(Char[] separator, Int32 count)
System.String[] Split(Char[] separator, StringSplitOptions options)
System.String[] Split(Char[] separator, Int32 count, StringSplitOptions options)
System.String[] Split(String[] separator, StringSplitOptions options)
System.String[] Split(String[] separator, Int32 count, StringSplitOptions options)

There are six different ways to invoke Split(). In simple cases, you might use Split() with only one argument. Split() then expects a character array and uses every single character in it as a possible separator. That’s important because it means that you may use several separators at once:

"a,b;c,d;e;f".Split(",;") 
a
b
c
d
e
f

If the splitting separator itself consists of several characters, then it has got to be a string and not a single Char character. There are only two signatures that meet this condition:

System.String[] Split(String[] separator, StringSplitOptions options)
System.String[] Split(String[] separator, Int32 count, StringSplitOptions options)

To use a particular signature, you must pass arguments of exactly the data types it expects. If you want to use the first signature, the first argument must be of the String[] type and the second argument of the StringSplitOptions type. The simplest way to meet this requirement is to assign the arguments to strongly typed variables first. Create each variable with exactly the type that the signature requires:

# Create a variable of the [StringSplitOptions] type:
[StringSplitOptions]$option = "None"

# Create a variable of the String[] type:
[string[]]$separator = ",;"
# Invoke Split with the wished signature and use a two-character long separator:
("a,b;c,;d,e;f,;g").Split($separator, $option)
a,b;c
d,e;f
g

Split() in fact now uses a separator consisting of several characters. It splits the string only at the points where it finds precisely the characters that were specified. There does remain the question of how do you know it is necessary to assign the value “None” to the StringSplitOptions data type. The simple answer is: you don’t know and it isn’t necessary to know. If you assign a value to an unknown data type that can’t handle the value, the data type will automatically notify you of all valid values:

 [StringSplitOptions]$option = "werner wallbach"
Cannot convert value "werner wallbach" to type "System.StringSplitOptions" due to invalid
 enumeration values. Specify one of the following enumeration values and try again.
 The possible enumeration values are "None, RemoveEmptyEntries".
At line:1 char:28
+ [StringSplitOptions]$option  <<<< = "werner wallbach"

The names of the valid values usually tell you what they are for. For example, what does RemoveEmptyEntries accomplish? If Split() runs into several separators following each other, empty array elements are the consequence. RemoveEmptyEntries deletes such empty entries. You could use it to remove redundant blank characters from a text:

[StringSplitOptions]$option = "RemoveEmptyEntries"
"This   text   has   too   much   whitespace".Split(" ", $option)
This
text
has
too
much
whitespace

Now all you need is a method that can convert the elements of an array back into text. The method is called Join(). You will not find it in a String object but in the String class, because it is a static method.
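
A minimal sketch of the round trip, combining Split() with the static Join() method of the String class (the variable names are chosen only for this illustration):

```powershell
[StringSplitOptions]$option = 'RemoveEmptyEntries'

# Split at blanks and drop the empty entries caused by repeated blanks:
$words = 'This   text   has   too   much   whitespace'.Split(' ', $option)

# Static method: called on the String class, not on a particular string.
$clean = [string]::Join(' ', $words)
$clean
```

The -join operator described earlier produces the same result with $words -join ' '.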

Simple Pattern Recognition

Recognizing patterns is a frequent task that is necessary for verifying user entries, such as to determine whether a user has given a valid network ID or valid e-mail address.

A simple form of wildcards was invented for the file system many years ago and it still works today. In fact, you’ve probably used it before in one form or another:

# List all files in the current directory that have the txt file extension:
Dir *.txt 

# List all files in the Windows directory that begin with "n" or "w":
dir $env:windir\[nw]*.*

# List all files whose file extensions begin with "t" and which are exactly 3 characters long:
Dir *.t?? 

# List all files that end in one of the letters from "e" to "z"
dir *[e-z].*

Wildcard	Description	Example
*	Any number of any character (including no characters at all)	Dir *.txt
?	Exactly one of any character	Dir *.??t
[xyz]	One of the specified characters	Dir [abc]*.*
[x-z]	One of the characters in the specified range	Dir [p-z]*.*

Table 13.7: Using simple placeholders

The placeholders in Table 13.7 work in the file system, but also with string comparisons like -like and -notlike. For example, if you want to verify whether a user has given a valid IP address, you could do so in the following way:

$ip = Read-Host "IP address"
if ($ip -like "*.*.*.*") { "valid" } else { "invalid" }

If you want to verify whether a valid e-mail address was entered, you could check the pattern like this:

$email = "tobias.weltner@powershell.de"
$email -like "*.*@*.*"

These simple patterns are not very exact, though:

# Wildcards are appropriate only for very simple pattern recognition and leave room for erroneous entries:
$ip = "300.werner.6666." 
if ($ip -like "*.*.*.*") { "valid" } else { "invalid" }  
valid

# The following invalid e-mail address was not identified as false:
$email = ".@." 
$email -like "*.*@*.*" 
True

Regular Expressions

Use regular expressions for more accurate pattern recognition. Regular expressions offer highly specific wildcard characters; that’s why they can describe patterns in much greater detail. For the very same reason, however, regular expressions are also much more complicated.

Describing Patterns

Using the regular expression elements listed in Table 13.11, you can describe patterns with much greater precision. These elements are grouped into three categories:

Placeholder: The placeholder represents a specific type of data, for example a character or a digit.
Quantifier: Allows you to determine how often a placeholder occurs in a pattern. You could, for example, define a 3-digit number or a 6-character-word.
Anchor: Allows you to determine whether a pattern is bound to a specific boundary. You could define a pattern that needs to be a separate word or that needs to begin at the beginning of the text.

The pattern represented by a regular expression may consist of four different character types:

Literal characters like “abc” that exactly match the “abc” string.
Masked or “escaped” characters: characters with special meanings in regular expressions are understood as literal characters when preceded by “\”: “\[test\]” looks for the “[test]” string. The following characters have special meanings and for this reason must be masked if used literally: “. ^ $ * + ? { [ ] \ | ( )”.
Pre-defined wildcard characters that represent a particular character category and work like placeholders. For example, “\d” represents any digit from 0 to 9.
Custom wildcard characters: They consist of square brackets, within which the characters are specified that the wildcard represents. If you want to use any character except for the specified characters, use “^” as the first character in the square brackets. For example, the placeholder “[^f-h]” stands for all characters except for “f”, “g”, and “h”.
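
The four character types can be tried out with the -match operator, which tests whether a text contains a regular expression pattern. The sample texts below are invented for illustration:

```powershell
# Literal characters:
$literal = 'abcdef' -match 'abc'                  # True

# Escaped characters: \[ and \] stand for literal square brackets.
$escaped = 'run [test] now' -match '\[test\]'     # True

# Pre-defined wildcard: \d stands for a digit.
$digit = 'Room 7' -match '\d'                     # True

# Custom wildcard: [^f-h] stands for any character except f, g, and h.
$custom = 'g' -match '[^f-h]'                     # False
$other  = 'x' -match '[^f-h]'                     # True

$literal, $escaped, $digit, $custom, $other
```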

Element	Description
.	Exactly one character of any kind except for a line break (equivalent to [^\n])
[^abc]	All characters except for those specified in brackets
[^a-z]	All characters except for those in the range specified in the brackets
[abc]	One of the characters specified in brackets
[a-z]	Any character in the range indicated in brackets
\a	Bell alarm (ASCII 7)
\c	Any character allowed in an XML name (XML Schema regular expressions only; .NET reserves \c for the control characters in the next row)
\cA-\cZ	Control+A to Control+Z, equivalent to ASCII 1 to ASCII 26
\d	A digit (equivalent to [0-9])
\D	Any character except for digits
\e	Escape (ASCII 27)
\f	Form feed (ASCII 12)
\n	New line
\r	Carriage return
\s	Any whitespace character like a blank character, tab, or line break
\S	Any character except for a blank character, tab, or line break
\t	Tab character
\uFFFF	Unicode character with the hexadecimal code FFFF. For example, the Euro symbol has the code 20AC
\v	Vertical tab (ASCII 11)
\w	Letter, digit, or underscore
\W	Any character except for letters, digits, and underscores
\xnn	Particular character, where nn specifies the hexadecimal ASCII code
.*	Any number of any character (including no characters at all)

Table 13.8: Placeholders for characters
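
A few of the placeholders from Table 13.8 can be tried out directly with -match. This is only a small sketch; the sample strings and variable names are assumptions chosen for illustration.

```powershell
# \d matches a single digit
$digit   = 'Room 7' -match '\d'
$noDigit = 'Room seven' -match '\d'

# Custom wildcard: one character from the ranges a-f or 0-9 (a hexadecimal digit)
$hex     = 'c' -match '^[a-f0-9]$'

# Negated custom wildcard: any single character except f, g, or h
$notFtoH = 'g' -match '^[^f-h]$'

# \uXXXX addresses a character by its Unicode code; 20AC is the Euro symbol
$euro    = ([string][char]0x20AC) -match '\u20AC'
```

$digit, $hex, and $euro are True; $noDigit and $notFtoH are False.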

Quantifiers

Every pattern listed in Table 13.8 represents exactly one instance of that kind. Using quantifiers, you can specify how many instances are part of your pattern. For example, “\d{1,3}” represents a digit occurring one to three times, that is, a number with one to three digits.

Element	Description
*	Preceding expression is not matched or matched once or several times (matches as much as possible)
*?	Preceding expression is not matched or matched once or several times (matches as little as possible)
.*	Any number of any character (including no characters at all)
?	Preceding expression is not matched or matched once (matches as much as possible)
??	Preceding expression is not matched or matched once (matches as little as possible)
{n,}	n or more matches
{n,m}	Between n and m matches (inclusive)
{n}	Exactly n matches
+	Preceding expression is matched once or several times (matches as much as possible)

Table 13.9: Quantifiers for patterns
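
The following sketch shows two of the quantifiers from Table 13.9 in action. The anchors “^” and “$” are added here only so that the whole text has to match; the sample values are assumptions.

```powershell
# {1,3}: one to three digits; the anchors force the entire text to match
$threeDigits = '123'  -match '^\d{1,3}$'
$fourDigits  = '1234' -match '^\d{1,3}$'

# Greedy versus lazy: "+" takes as many digits as possible, "+?" as few as possible
$null   = '2012' -match '\d+'
$greedy = $matches[0]

$null   = '2012' -match '\d+?'
$lazy   = $matches[0]
```

$threeDigits is True and $fourDigits is False. $greedy contains “2012”, while $lazy contains only “2”.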

Anchors

Anchors determine whether a pattern has to match at a certain boundary. For example, the regular expression “\b\d{1,3}” finds a number of up to three digits only if it starts at a word boundary. The number “123” in the string “Bart123” would not qualify.

Elements	Description
$	Matches at end of a string (\Z is less ambiguous for multi-line texts)
\A	Matches at beginning of a string, including multi-line texts
\b	Matches on word boundary (first or last characters in words)
\B	Must not match on word boundary
\Z	Must match at end of string, including multi-line texts
^	Must match at beginning of a string (\A is less ambiguous for multi-line texts)

Table 13.10: Anchor boundaries
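
As a quick illustration of the anchors in Table 13.10, the sketch below tests a two-line text and the “Bart123” case. The sample text is an assumption made up for this example.

```powershell
# A text consisting of two lines (`n is the PowerShell escape for a line break)
$text = "first line`nsecond line"

# "^" matches at the beginning of the text, but by default not at the beginning of later lines
$startsWithFirst  = $text -match '^first'
$startsWithSecond = $text -match '^second'

# "$" matches at the end of the text
$endsWithLine     = $text -match 'line$'

# "\b" requires a word boundary: there is none between "t" and "1" in "Bart123"
$attachedNumber   = 'Bart123'  -match '\b\d{1,3}\b'
$separateNumber   = 'Bart 123' -match '\b\d{1,3}\b'
```

$startsWithFirst, $endsWithLine, and $separateNumber are True; $startsWithSecond and $attachedNumber are False.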

Recognizing IP Addresses

Patterns such as an IP address can be very precisely described by regular expressions. Usually, you would use a combination of characters and quantifiers to specify which characters may occur in a string and how often:

$ip = "10.10.10.10"
$ip -match "\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b"
True
$ip = "a.10.10.10"
$ip -match "\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b"
False
$ip = "1000.10.10.10"
$ip -match "\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b"
False

The pattern is described here as four numbers (char: \d) of one to three digits each (using the quantifier {1,3}), separated by literal dots (\.) and anchored on word boundaries (using the anchor \b), meaning that the address must not be directly preceded or followed by a letter, digit, or underscore. Checking is far from perfect since it is not verified whether the numbers really do lie in the permitted number range from 0 to 255.

# There still are entries incorrectly identified as valid IP addresses:
$ip = "300.400.500.999" 
$ip -match "\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b"
True
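
One pragmatic way around this limitation is to let the regular expression check only the shape of the text and to verify the number range in ordinary code. The function below is a minimal sketch under that assumption; the name Test-IPv4Address is hypothetical and not part of PowerShell. It uses [0-9] instead of \d so that only ASCII digits reach the [int] conversion.

```powershell
# Hypothetical helper: checks the pattern first, then the numeric range of every part
function Test-IPv4Address {
    param([string]$Text)

    # Four groups of one to three ASCII digits separated by dots, anchored to the whole text
    if ($Text -notmatch '^[0-9]{1,3}(\.[0-9]{1,3}){3}$') { return $false }

    # Split at the dots and verify that every number is not greater than 255
    foreach ($part in $Text.Split('.')) {
        if ([int]$part -gt 255) { return $false }
    }
    return $true
}

$valid      = Test-IPv4Address '10.10.10.10'
$outOfRange = Test-IPv4Address '300.400.500.999'
$notNumeric = Test-IPv4Address 'a.10.10.10'
```

Only $valid is True; the other two calls return False.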

Validating E-Mail Addresses

If you’d like to verify whether a user has given a valid e-mail address, use the following regular expression:

$email = "test@somewhere.com"
$email -match "\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b"
True
$email = ".@."
$email -match "\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b"
False

Whenever you look for an expression that occurs as a single “word” in text, delimit your regular expression by word boundaries (anchor: \b). The regular expression will then know you’re interested only in those passages that are demarcated from the rest of the text, for example by white space such as blank characters, tabs, or line breaks, or by punctuation.

The regular expression subsequently specifies which characters may be included in an e-mail address. Permissible characters are in square brackets and consist of “ranges” (for example, “A-Z0-9”) and single characters (such as “._%+-“). The “+” behind the square brackets is a quantifier and means that at least one of the given characters must be present. However, you can also stipulate as many more characters as you wish.

Following this is “@” and, after it, more text made up of similar characters as the text in front of “@”. A dot (.) in the e-mail address follows. This dot is introduced with a “\” character because the dot actually has a different meaning in regular expressions if it isn’t within square brackets. The backslash ensures that the regular expression understands the dot behind it literally.

After the dot is the domain identifier, which may consist solely of letters ([A-Z]). A quantifier ({2,4}) again follows the square brackets. It specifies that the domain identifier may consist of at least two and at most four of the given characters.

However, this regular expression still has one flaw. While it does verify whether a valid e-mail address is in the text somewhere, there could be another text before or after it:

$email = "Email please to test@somewhere.com and reply!" 
$email -match "\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b"
True

Because of “\b”, when your regular expression searches for a pattern somewhere in the text, it only takes into account word boundaries. If you prefer to check whether the entire text corresponds to an authentic e-mail address, use the elements for the text beginning (anchor: “^”) and the text ending (anchor: “$”) instead of word boundaries.

$email -match "^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$"
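
To see the effect of the two anchors, you could apply the anchored pattern to both sample texts from this section. The variable names in this short sketch are assumptions.

```powershell
# The pattern anchored to the text beginning (^) and the text ending ($)
$pattern = '^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$'

# The text consists of nothing but an e-mail address
$onlyAddress = 'test@somewhere.com' -match $pattern

# The address is embedded in other text, so the anchored pattern no longer matches
$embedded    = 'Email please to test@somewhere.com and reply!' -match $pattern
```

$onlyAddress is True, and $embedded is False.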

Simultaneous Searches for Different Terms

Sometimes search terms are ambiguous because there may be several ways to write them. You can use the “?” quantifier to mark parts of the search term as optional. In simple cases put a “?” after an optional character. Then the character in front of “?” may, but doesn’t have to, turn up in the search term:

"color" -match "colou?r" 
True
"colour" -match "colou?r" 
True

The “?” character here doesn’t represent any character at all, as you might expect after using simple wildcards. For regular expressions, “?” is a quantifier and always specifies how often a character or expression in front of it may occur. In the example, therefore, “u?” ensures that the letter “u” may, but not necessarily, be in the specified location in the pattern. Other quantifiers are “*” (the preceding character may occur any number of times, including not at all) and “+” (the preceding character must occur at least once).

If you prefer to mark more than one character as optional, put the characters in a sub-expression, which is placed in parentheses. The following example recognizes both month designators, “Nov” and “November”:

"Nov" -match "\bNov(ember)?\b"
True
"November" -match "\bNov(ember)?\b"
True

If you’d rather use several alternative search terms, use the OR character “|”:

"Bob and Ted" -match "Alice|Bob" 
True

And if you want to mix alternative search terms with fixed text, use sub-expressions again:

# finds "and Bob":
"Peter and Bob" -match "and (Bob|Willy)" 
True

# does not find "and Bob":
"Bob and Peter" -match "and (Bob|Willy)" 
False

Case Sensitivity

In keeping with customary PowerShell practice, the -match operator is case insensitive. Use the operator -cmatch as an alternative if you’d prefer case sensitivity:

# -match is case insensitive:
"hello" -match "heLLO" 
True

# -cmatch is case sensitive:
"hello" -cmatch "heLLO" 
False

If you want case sensitivity in only some pattern segments, use –match. Also, specify in your regular expression which text segments are case sensitive and which are insensitive. Anything following the “(?i)” construct is case insensitive. Conversely, anything following “(?-i)” is case sensitive. This explains why the word “test” in the below example is recognized only if its last two characters are lowercase, while case sensitivity has no importance for the first two characters:

"TEst" -match "(?i)te(?-i)st" 
True
"TEST" -match "(?i)te(?-i)st" 
False

If you use a .NET Framework RegEx object instead of –match, it will work case-sensitively by default, much like –cmatch. If you prefer case insensitivity, either use the above construct to specify the option (?i) in your regular expression or submit extra options to the Matches() method (which is a lot more work):

 [regex]::matches("test", "TEST", "IgnoreCase")
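
The different ways of controlling case sensitivity with the RegEx object can be compared side by side. This is a minimal sketch using the static IsMatch() method and the RegexOptions enumeration of the .NET Framework; the variable names are assumptions.

```powershell
# The static IsMatch() method is case sensitive by default
$default   = [regex]::IsMatch('test', 'TEST')

# Inline option (?i) inside the pattern
$inline    = [regex]::IsMatch('test', '(?i)TEST')

# The same option passed as a RegexOptions value
$options   = [System.Text.RegularExpressions.RegexOptions]::IgnoreCase
$viaOption = [regex]::IsMatch('test', 'TEST', $options)

# A reusable RegEx object created with the option built in
$regex     = New-Object System.Text.RegularExpressions.Regex -ArgumentList 'TEST', $options
$viaObject = $regex.IsMatch('test')
```

$default is False; $inline, $viaOption, and $viaObject are True.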

Element	Description	Category
(xyz)	Sub-expression
|	Alternation construct	Selection
\	When followed by a character, the character is not recognized as a formatting character but as a literal character	Escape
x?	Changes the x quantifier into a “lazy” quantifier	Option
(?xyz)	Activates or deactivates special modes, among others, case sensitivity	Option
x+	In some regular expression flavors, turns the x quantifier into a “possessive” quantifier; not supported by .NET, where quantifiers are “greedy” by default	Option
?:	Groups a sub-expression without capturing it, so no back reference is stored	Reference
?<name>	Specifies name for back references	Reference

Table 13.11: Regular expression elements

Of course, a regular expression can perform any number of detailed checks, such as verifying whether numbers in an IP address lie within the permissible range from 0 to 255. The problem is that this makes regular expressions long and hard to understand. Fortunately, you generally won’t need to invest much time in learning complex regular expressions like the ones coming up. It’s enough to know which regular expression to use for a particular pattern. Regular expressions for nearly all standard patterns can be downloaded from the Internet. In the following example, we’ll look more closely at a complex regular expression that evidently is entirely made up of the conventional elements listed in Table 13.11:

$ip = "300.400.500.999" 
$ip -match "\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b"
False

The expression validates only expressions running into word boundaries (the anchor is \b). The following sub-expression defines every single number:

(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)

The construct ?: is optional and enhances speed. After it come three alternatively permitted number formats separated by the alternation construct “|”. 25[0-5] is a number from 250 through 255. 2[0-4][0-9] is a number from 200 through 249. Finally, [01]?[0-9][0-9]? is a number from 0-9 or 00-99 or 100-199. The quantifier “?” ensures that the preceding pattern may, but does not have to, be included. The result is that the sub-expression describes numbers from 0 through 255. An IP address consists of four such numbers. A dot always follows the first three numbers. For this reason, the following expression combines the definition of the number with a dot:

(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}

A dot (\.) is appended to the number. This construct is supposed to be present three times ({3}). When the fourth number is also appended, the regular expression is complete. You have learned to create sub-expressions (by using parentheses) and how to iterate sub-expressions (by indicating the number of iterations in braces after the sub-expression), so you should now be able to shorten the first used IP address regular expression:

$ip = "10.10.10.10" 
$ip -match "\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b"
True
$ip -match "\b(?:\d{1,3}\.){3}\d{1,3}\b"
True
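
To see the range-checking expression from this section at work on more than one value, you could run it against a short list of candidates. The candidate addresses in this sketch are made up for illustration.

```powershell
# The range-checking pattern: every number must lie between 0 and 255
$number  = '(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
$pattern = "\b(?:$number\.){3}$number\b"

# Hypothetical candidates: the second one contains a number above 255
$candidates = '192.168.1.1', '256.1.1.1', '10.0.0.255'

# Collect one Boolean result per candidate
$results = @($candidates | ForEach-Object { $_ -match $pattern })
```

$results contains True, False, True. Building the pattern from the $number variable keeps the long expression readable; double quotes are used on purpose here so that PowerShell inserts the variable.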

Finding Information in Text

Regular expressions can recognize patterns. They can also filter data matching certain patterns from text. So, regular expressions are perfect for parsing raw data.

$rawtext = "If it interests you, my e-mail address is tobias@powershell.com." 

# Simple pattern recognition:
$rawtext -match "\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b"
True

# Reading data matching the pattern from raw text:
$matches 
Name                           Value
----                           -----
0                              tobias@powershell.com
$matches[0] 
tobias@powershell.com

Does that also work for more than one e-mail address in text? Unfortunately, no. The –match operator finds only the first matching expression. So, if you want to find more than one occurrence of a pattern in raw text, you have to switch over to the RegEx object underlying the –match operator and use it directly.

Since the RegEx object is case-sensitive by default, put the “(?i)” option before the regular expression to make it work like -match.

# A raw text contains several e-mail addresses. –match finds the first one only:
$rawtext = "test@test.com sent an e-mail that was forwarded to spam@junk.de." 
$rawtext -match "\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b"
True
$matches 
Name                           Value
----                           -----
0                              test@test.com

# A RegEx object can find every occurrence of a pattern but is case sensitive by default:
$regex = [regex]"(?i)\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b"
$regex.Matches($rawtext)
Groups   : {test@test.com}
Success  : True
Captures : {test@test.com}
Index    : 0
Length   : 13
Value    : test@test.com

Groups   : {spam@junk.de}
Success  : True
Captures : {spam@junk.de}
Index    : 51
Length   : 12
Value    : spam@junk.de

# Limit result to e-mail addresses:
$regex.Matches($rawtext) | Select-Object -Property Value 
Value
-----
test@test.com
spam@junk.de

# Continue processing e-mail addresses:
$regex.Matches($rawtext) | ForEach-Object { "found: $($_.Value)" } 
found: test@test.com
found: spam@junk.de
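
If you would rather stay with cmdlets, Select-String with its -AllMatches switch is a possible alternative to the RegEx object. The following is only a sketch; it repeats the sample text so that it runs on its own.

```powershell
$rawtext = 'test@test.com sent an e-mail that was forwarded to spam@junk.de.'
$pattern = '\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b'

# Select-String is case insensitive by default; -AllMatches collects every hit per input line
$addresses = @(
    $rawtext |
        Select-String -Pattern $pattern -AllMatches |
        ForEach-Object { $_.Matches } |
        ForEach-Object { $_.Value }
)
```

$addresses then holds the two strings test@test.com and spam@junk.de.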

Searching for Several Keywords

You can use the alternation construct “|” to search for a group of keywords, and then find out which keyword was actually found in the string:

"Set a=1" -match "Get|GetValue|Set|SetValue" 
True
$matches 
Name                           Value
----                           -----
0                              Set

$matches tells you which keyword actually occurs in the string. But note the order of keywords in your regular expression—it’s crucial because the first matching keyword is the one selected. In this example, the result would be incorrect:

"SetValue a=1" -match "Get|GetValue|Set|SetValue" 
True
$matches[0] 
Set

Either change the order of keywords so that longer keywords are checked before shorter ones …:

"SetValue a=1" -match "GetValue|Get|SetValue|Set" 
True
$matches[0] 
SetValue

… or make sure that your regular expression is precisely formulated, and remember that you’re actually searching for single words. Insert word boundaries into your regular expression so that sequential order no longer plays a role:

"SetValue a=1" -match "\b(Get|GetValue|Set|SetValue)\b"
True
$matches[0] 
SetValue

It’s true here, too, that -match finds only the first match. If your raw text has several occurrences of the keyword, use a RegEx object again:

$regex = [regex]"\b(Get|GetValue|Set|SetValue)\b"
$regex.Matches("Set a=1; GetValue a; SetValue b=12") 
Groups   : {Set, Set}
Success  : True
Captures : {Set}
Index    : 0
Length   : 3
Value    : Set

Groups   : {GetValue, GetValue}
Success  : True
Captures : {GetValue}
Index    : 9
Length   : 8
Value    : GetValue

Groups   : {SetValue, SetValue}
Success  : True
Captures : {SetValue}
Index    : 21
Length   : 8
Value    : SetValue

Forming Groups

A raw text line is often a heaping trove of useful data. You can use parentheses to collect this data in sub-expressions so that it can be evaluated separately later. The basic principle is that all the data that you want to find in a pattern should be wrapped in parentheses because $matches will return the results of these sub-expressions as independent elements. For example, if a text line contains a date first, then text, and if both are separated by tabs, you could describe the pattern like this:

# Defining pattern: two text segments separated by a tab
$pattern = "(.*)\t(.*)"

# Generate example line with tab character
$line = "12/01/2009`tDescription" 

# Use regular expression to parse line:
$line -match $pattern 
True

# Show result:
$matches 
Name                           Value
----                           -----
2                              Description
1                              12/01/2009
0                              12/01/2009    Description
$matches[1] 
12/01/2009
$matches[2] 
Description

When you use sub-expressions, $matches will contain the entire searched pattern in the first array element named “0”. Sub-expressions defined in parentheses follow in additional elements. To make them easier to read and understand, you can assign sub-expressions their own names and later use the names to call results. To assign a name to a sub-expression, type “?<Name>” as the first statement inside the parentheses:

# Assign subexpressions their own names:
$pattern = "(?<Date>.*)\t(?<Text>.*)"

# Generate example line with tab character:
$line = "12/01/2009`tDescription" 

# Use a regular expression to parse line:
$line -match $pattern 
True

# Show result:
$matches 
Name                           Value
----                           -----
Text                           Description
Date                           12/01/2009
0                              12/01/2009    Description
$matches.Date 
12/01/2009
$matches.Text 
Description
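
Named sub-expressions are particularly handy when several lines need to be turned into objects. The sketch below assumes two tab-separated sample lines; the property names Date and Text simply mirror the group names.

```powershell
# Date is everything up to the first tab, Text is the rest of the line
$pattern = '^(?<Date>[^\t]*)\t(?<Text>.*)$'

# Hypothetical sample lines with a tab between date and description
$lines = "12/01/2009`tDescription", "12/02/2009`tAnother entry"

# Create one object per matching line from the named groups in $matches
$records = @(foreach ($line in $lines) {
    if ($line -match $pattern) {
        New-Object PSObject -Property @{
            Date = $matches['Date']
            Text = $matches['Text']
        }
    }
})
```

$records now contains two objects whose Date and Text properties can be sorted, filtered, or exported like any other PowerShell object.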

Each result retrieved by $matches for each sub-expression naturally requires storage space. If you don’t need the results, discard them to increase the speed of your regular expression. To do so, type “?:” as the first statement in your sub-expression:

# Don't return a result for the second subexpression:
$pattern = "(?<Date>.*)\t(?:.*)"

# Generate example line with tab character:
$line = "12/01/2009`tDescription" 

# Use a regular expression to parse line:
$line -match $pattern 
True

# No more results will be returned for the second subexpression:
$matches 
Name                           Value
----                           -----
Date                          12/01/2009
0                             12/01/2009    Description

Greedy or Lazy? Shortest or Longest Possible Result

Assume that you would like to evaluate month specifications in a logging file, but the months are not all specified in the same way. Sometimes you use the short form, other times the long form of the month name is used. As you’ve seen, that’s no problem for regular expressions, because sub-expressions allow parts of a keyword to be declared optional:

"Feb" -match "Feb(ruary)?" 
True
$matches[0] 
Feb
"February" -match "Feb(ruary)?" 
True
$matches[0] 
February

In both cases, the regular expression recognizes the month, but returns different results in $matches. By default, the regular expression is “greedy” and returns the longest possible match. If the text is “February,” then the expression will search for a match starting with “Feb” and then continue searching “greedily” to check whether even more characters match the pattern. If they do, the entire (detailed) text is reported back: February.

If your main concern is just standardizing the names of months, you would probably prefer getting back the shortest possible text: Feb. To switch a quantifier to lazy mode (returning the shortest possible match), add a second “?” directly after it. “Feb(ruary)??” now stands for a pattern that starts with “Feb”, followed by zero or one occurrence of “ruary” (quantifier “?”), and returning only the shortest possible match (which is turned on by the second “?”).

"Feb" -match "Feb(ruary)??" 
True
$matches[0] 
Feb
"February" -match "Feb(ruary)??" 
True
$matches[0] 
Feb
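
The difference between greedy and lazy matching becomes even more visible with the “.*” placeholder. The following sketch extracts quoted text from a sample sentence that is made up for this purpose.

```powershell
# Hypothetical sample text with two quoted words
$text = 'He said "yes" and then "no".'

# Greedy: ".*" runs to the LAST quotation mark in the text
$null   = $text -match '".*"'
$greedy = $matches[0]

# Lazy: ".*?" stops at the FIRST quotation mark that completes the pattern
$null   = $text -match '".*?"'
$lazy   = $matches[0]
```

$greedy contains “"yes" and then "no"”, while $lazy contains only “"yes"”.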

Finding String Segments

Our last example, which locates text segments, shows how you can use the elements listed in Table 13.11 to gather surprising search results. Given two words, the regular expression retrieves the text segment between them, provided that at least one and not more than six other words lie in between. This example shows how complex (and powerful) regular expressions can get. If you think that’s cool, you should grab yourself a book on regular expressions and dive deeper:

"Find word segments from start to end" -match "\bstart\W+(?:\w+\W+){1,6}?end\b"
True
$matches[0]
start to end

Replacing a String

You already know how to replace a string because you know the string –replace operator. Simply tell the operator what term you want to replace in a string:

"Hello, Ralph" -replace "Ralph", "Martina" 
Hello, Martina

But simple replacement isn’t always sufficient, so you can also use regular expressions for replacements. Some of the following examples show how that could be useful.

Let’s say you’d like to replace several different terms in a string with one other term. Without regular expressions, you’d have to replace each term separately. With regular expressions, simply use the alternation operator, “|”:

"Mr. Miller and Mrs. Meyer" -replace "(Mr\.|Mrs\.)", "Our client"
Our client Miller and Our client Meyer

You can type any term in parentheses and use the “|” symbol to separate them. All the terms will be replaced with the replacement string you specify.

Using Back References

The last example replaces specified keywords anywhere in a string. Often, that’s sufficient, but sometimes you don’t want to replace a keyword everywhere it occurs but only when it occurs in a certain context. In such cases, the context must be defined in some way in the pattern. How could you change the regular expression so that it replaces only the names Miller and Meyer? Like this:

"Mr. Miller, Mrs. Meyer and Mr. Werner" -replace "(Mr\.|Mrs\.)\s*(Miller|Meyer)", "Our client"
Our client, Our client and Mr. Werner

The result looks a little peculiar, but the pattern you’re looking for was correctly identified. The only replacements were Mr. or Mrs. Miller and Mr. or Mrs. Meyer. The term “Mr. Werner” wasn’t replaced. Unfortunately, the result also shows that it doesn’t make any sense here to replace the entire pattern. At least the name of the person should be retained. Is that possible?

This is where the back referencing you’ve already seen comes into play. Whenever you use parentheses in your regular expression, the result inside the parentheses is evaluated separately, and you can use these separate results in your replacement string. The first sub-expression always reports whether a “Mr.” or a “Mrs.” was found in the string. The second sub-expression returns the name of the person. The terms “$1” and “$2” provide you the sub-expressions in the replacement string (the number is consequently a sequential number; you could also use “$3” and so on for additional sub-expressions).

"Mr. Miller, Mrs. Meyer and Mr. Werner" -replace "(Mr\.|Mrs\.)\s*(Miller|Meyer)", "Our client $2"
Our client , Our client  and Mr. Werner

The back references don’t seem to work. Can you see why? “$1” and “$2” look like PowerShell variables, but in reality they are part of the regular expression. As a result, if you put the replacement string inside double quotes, PowerShell replaces “$2” with the PowerShell variable $2, which is probably undefined. Use single quotation marks instead, or add a backtick to the “$” special character so that PowerShell won’t recognize it as its own variable and replace it:

# Replacement text must be inside single quotation marks so that PowerShell does not treat $2 as a variable:
"Mr. Miller, Mrs. Meyer and Mr. Werner" -replace "(Mr\.|Mrs\.)\s*(Miller|Meyer)", 'Our client $2'
Our client Miller, Our client Meyer and Mr. Werner

# Alternatively, $ can also be masked by `$:
"Mr. Miller, Mrs. Meyer and Mr. Werner" -replace "(Mr\.|Mrs\.)\s*(Miller|Meyer)", "Our client `$2"
Our client Miller, Our client Meyer and Mr. Werner
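
Back references can also rearrange the pieces of a match. The sketch below swaps two parts of a date and, as a variation, uses named sub-expressions, which the .NET replacement syntax addresses as “${name}”. The sample values and group names are assumptions.

```powershell
# Numbered back references: swap the first two parts of the date
$swapped = '12/01/2009' -replace '(\d{2})/(\d{2})/(\d{4})', '$2.$1.$3'

# Named back references: ${first} and ${last} refer to the named sub-expressions
$named   = 'Miller, Peter' -replace '(?<last>\w+),\s*(?<first>\w+)', '${first} ${last}'
```

$swapped contains “01.12.2009” and $named contains “Peter Miller”. Both replacement strings are in single quotation marks so that PowerShell leaves the “$” characters to the regular expression.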

Putting Characters First at Line Beginnings

Replacements can also be made in multiple instances in text of several lines. For example, when you respond to an e-mail, usually the text of the old e-mail is quoted in your new e-mail and marked with “>” at the beginning of each line. Regular expressions can do the marking.

However, to accomplish this, you need to know a little more about “multi-line” mode. Normally, this mode is turned off, and the “^” anchor represents the text beginning and the “$” the text ending. So that these two anchors refer respectively to the line beginning and line ending of a text of several lines, the multi-line mode must be turned on with the “(?m)” statement. Only then will –replace substitute the pattern in every single line. Once the multi-line mode is turned on, the anchors “^” and “\A”, as well as “$” and “\Z”, will suddenly behave differently. “\A” will continue to indicate the text beginning, while “^” will mark the line beginning; “\Z” will indicate the text ending, while “$” will mark the line ending.

# Using Here-String to create a text of several lines:
$text = @"
Here is a little text.
I want to attach this text to an e-mail as a quote.
That's why I would put a ">" before every line.
"@

$text 
Here is a little text.
I want to attach this text to an e-mail as a quote.
That's why I would put a ">" before every line.

# Normally, -replace doesn't work in multiline mode. For this reason,
# only the first line is replaced:
$text -replace "^", "> " 
> Here is a little text.
I want to attach this text to an e-mail as a quote.
That's why I would put a ">" before every line.

# If you turn on multiline mode, replacement will work in every line:
$text -replace "(?m)^", "> " 
> Here is a little text.
> I want to attach this text to an e-mail as a quote.
> That's why I would put a ">" before every line.


# The same can also be accomplished by using a RegEx object,
# where the multiline option must be specified:
[regex]::Replace($text, "^", "> ", [Text.RegularExpressions.RegExOptions]::Multiline) 
> Here is a little text.
> I want to attach this text to an e-mail as a quote.
> That's why I would put a ">" before every line.

# In multiline mode, \A stands for the text beginning and ^ for the line beginning:
[regex]::Replace($text, "\A", "> ", [Text.RegularExpressions.RegExOptions]::Multiline)
> Here is a little text.
I want to attach this text to an e-mail as a quote.
That's why I would put a ">" before every line.
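
If you want to check for yourself how many positions each anchor matches in multi-line mode, you can count the matches. This sketch uses a short three-line text built with the `n escape; the text and variable names are assumptions.

```powershell
# A text of three lines
$text = "line one`nline two`nline three"

# In multi-line mode, "^" matches at the beginning of every line ...
$lineStarts = [regex]::Matches($text, '(?m)^').Count

# ... while "\A" still matches only at the beginning of the text
$textStarts = [regex]::Matches($text, '(?m)\A').Count

# Quote every line by inserting "> " at each line beginning
$quoted = $text -replace '(?m)^', '> '
```

$lineStarts is 3 and $textStarts is 1, and every line in $quoted begins with “> ”.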

Removing White Space

Regular expressions can perform routine tasks as well, such as removing superfluous white space. The pattern describes a white-space character (char: “\s”) that occurs at least twice (quantifier: “{2,}”). That is replaced with a normal blank character.

"Too   many   blank   characters" -replace "\s{2,}", " "
Too many blank characters
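
In the same way, you could also strip white space at the beginning and end of a text before collapsing the gaps in between. This is a small sketch with a made-up sample text; the alternation “|” combines the two anchored patterns.

```powershell
# Hypothetical sample text with leading, trailing, and repeated blanks
$text = '   Too   many   blank   characters   '

# Remove white space at the text beginning (^\s+) or at the text ending (\s+$)
$trimmed = $text -replace '^\s+|\s+$', ''

# Collapse every remaining run of two or more white-space characters into one blank
$clean = $trimmed -replace '\s{2,}', ' '
```

$clean then contains “Too many blank characters” without any surrounding blanks.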

Finding and Removing Doubled Words

How is it possible to find and remove doubled words in text? Here, you can use back referencing again. The pattern could be described as follows:

"\b(\w+)(\s+\1){1,}\b"

The pattern searched for starts at a word boundary (anchor: “\b”). It consists of one word (the character “\w” and quantifier “+”). White space and the same word again follow (the character “\s” with quantifier “+”, and the back reference “\1”). This part, the white space and the repeated word, must occur at least once (quantifier “{1,}”). The entire pattern is then replaced with the first back reference, that is, the first located word.

# Find and remove doubled words in a text:
"This this this is a test" -replace "\b(\w+)(\s+\1){1,}\b", '$1'
This is a test
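
Before removing doubled words, you may want to know which words are affected. The sketch below lists them with a RegEx object; since that object is case sensitive by default, the pattern starts with “(?i)”. The sample text is an assumption.

```powershell
# Hypothetical sample text with two different doubled words
$text = 'This this this is a a test'

# Same pattern as above; "+" is equivalent to "{1,}", "(?i)" ignores case
$pattern = '(?i)\b(\w+)(\s+\1)+\b'

# Groups[1] holds the first sub-expression, that is, the word that is repeated
$doubled = @([regex]::Matches($text, $pattern) | ForEach-Object { $_.Groups[1].Value })

# Replace every run of repeated words with its first word
$fixed = $text -replace $pattern, '$1'
```

$doubled contains “This” and “a”, and $fixed contains “This is a test”.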

Summary

Text is defined either by single or double quotation marks. If you use double quotation marks, PowerShell will replace PowerShell variables and special characters in the text. Text enclosed in single quotation marks remains as-is. If you want to prompt the user for input text, use the Read-Host cmdlet. Multi-line text can be defined with Here-Strings, which start with @”(Enter) and end with “@(Enter).

By using the format operator –f, you can compose formatted text. This gives you the option to display text in different ways or to set fixed widths to output text in aligned columns (Table 13.3 through Table 13.5). Along with the formatting operator, PowerShell has a number of string operators you can use to validate patterns or to replace a string (Table 13.2).

PowerShell stores text in string objects, which support methods to work on the stored text. You can use these methods by typing a dot after the string object (or the variable in which the text is stored) and then activating auto complete (Table 13.6). Along with the dynamic methods that always refer to text stored in a string object, there are also static methods that are provided directly by the string data type by qualifying the string object with “[string]::”.

The simplest way to describe patterns is to use the simple wildcards in Table 13.7. Simple wildcard patterns, while easy to use, only support very basic pattern recognition. Also, simple wildcard patterns can only recognize the patterns; they cannot extract data from them.

Regular expressions are a far more sophisticated tool. They consist of very specific placeholders, quantifiers, and anchors listed in Tables 13.8 through 13.11. Regular expressions precisely identify even complex patterns and can be used with the operators -match or –replace. Use the .NET object [regex] if you want to match multiple pattern instances.
