- http://franckrichard.blogspot.com/2010/08/powershell-get-encoding-file-type.html
- Chad Miller's script (referenced above) - http://poshcode.org/2059
- and Lee Holmes's variant - http://poshcode.org/2153
To give myself something to work with, I decided to explore the standard encodings available with most cmdlets. Interestingly, there are a few that you should be familiar with:
- ASCII
- Big Endian Unicode
- Default
- OEM
- Unicode
- UTF-32
- UTF-7
- UTF-8
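For the Unicode-family entries in this list, .NET can report the byte order mark (preamble) directly, without writing any files. The following is a minimal sketch, assuming the cmdlet names map onto the static properties of `[System.Text.Encoding]` shown; Default, OEM and UTF-7 are left out because their behaviour depends on the system code page or the runtime version. The variable names are only illustrative.

```powershell
# Map a few cmdlet encoding names to the matching .NET encoding objects.
$encodings = [ordered]@{
    ASCII            = [System.Text.Encoding]::ASCII
    BigEndianUnicode = [System.Text.Encoding]::BigEndianUnicode
    Unicode          = [System.Text.Encoding]::Unicode
    UTF32            = [System.Text.Encoding]::UTF32
    UTF8             = [System.Text.Encoding]::UTF8
}

# Build one "name: preamble bytes in hex" line per encoding.
$preambles = foreach ($entry in $encodings.GetEnumerator()) {
    $hex = ($entry.Value.GetPreamble() | ForEach-Object { $_.ToString('x2') }) -join ' '
    '{0}: {1}' -f $entry.Key, $hex
}

$preambles
```

ASCII has no preamble at all, so its line is empty after the colon.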
In testing the output for these, I used this approach:

    "unicode","utf7","utf8","utf32","ascii","bigendianunicode","default","oem" | sort | % {
        Out-File -FilePath "C:\data\Documents\Powershell\Projects\Encoding\test$_.txt" -InputObject "Test" -Encoding $_
        $bytearray = Get-Content -Path "C:\data\Documents\Powershell\Projects\Encoding\test$($_).txt" -Encoding byte
        "$($_): $($bytearray -join ' ')"
    }

which yielded this output:

    ascii: 84 101 115 116 13 10
    bigendianunicode: 254 255 0 84 0 101 0 115 0 116 0 13 0 10
    default: 84 101 115 116 13 10
    oem: 84 101 115 116 13 10
    unicode: 255 254 84 0 101 0 115 0 116 0 13 0 10 0
    utf32: 255 254 0 0 84 0 0 0 101 0 0 0 115 0 0 0 116 0 0 0 13 0 0 0 10 0 0 0
    utf7: 84 101 115 116 13 10
    utf8: 239 187 191 84 101 115 116 13 10

As you can see, there are some similarities between each, but when working with encoding it is important to know what is "expected" and what is purely data. The "common" bytes, 84 101 115 116 13 10 (Test followed by a carriage return and line feed), are the control in each case; anything else was added by the encoding. Alternatively, here is the same thing in hex:

    "unicode","utf7","utf8","utf32","ascii","bigendianunicode","default","oem" | sort | % {
        Out-File -FilePath "C:\data\Documents\Powershell\Projects\Encoding\test$_.txt" -InputObject "Test" -Encoding $_
        $bytearray = Get-Content -Path "C:\data\Documents\Powershell\Projects\Encoding\test$($_).txt" -Encoding byte
        "$($_): {0}" -f (($bytearray | % { [Convert]::ToString($_,16).PadLeft(2,"0") }) -join " ")
    }

    ascii: 54 65 73 74 0d 0a
    bigendianunicode: fe ff 00 54 00 65 00 73 00 74 00 0d 00 0a
    default: 54 65 73 74 0d 0a
    oem: 54 65 73 74 0d 0a
    unicode: ff fe 54 00 65 00 73 00 74 00 0d 00 0a 00
    utf32: ff fe 00 00 54 00 00 00 65 00 00 00 73 00 00 00 74 00 00 00 0d 00 00 00 0a 00 00 00
    utf7: 54 65 73 74 0d 0a
    utf8: ef bb bf 54 65 73 74 0d 0a

It is clear you need to be careful when you are dealing with unknown file formats. I will more than likely use Lee's function as it covers some non-standard encodings:

    function Get-FileEncoding
    {
    ##############################################################################
    ##
    ## Get-FileEncoding
    ##
    ## From Windows PowerShell Cookbook (O'Reilly)
    ## by Lee Holmes (http://www.leeholmes.com/guide)
    ##
    ##############################################################################

    <#

    .SYNOPSIS

    Gets the encoding of a file

    .EXAMPLE

    Get-FileEncoding.ps1 .\UnicodeScript.ps1

    BodyName          : unicodeFFFE
    EncodingName      : Unicode (Big-Endian)
    HeaderName        : unicodeFFFE
    WebName           : unicodeFFFE
    WindowsCodePage   : 1200
    IsBrowserDisplay  : False
    IsBrowserSave     : False
    IsMailNewsDisplay : False
    IsMailNewsSave    : False
    IsSingleByte      : False
    EncoderFallback   : System.Text.EncoderReplacementFallback
    DecoderFallback   : System.Text.DecoderReplacementFallback
    IsReadOnly        : True
    CodePage          : 1201

    #>

    param(
        ## The path of the file to get the encoding of.
        $Path
    )

    Set-StrictMode -Version Latest

    ## The hashtable used to store our mapping of encoding bytes to their
    ## name. For example, "255-254 = Unicode"
    $encodings = @{}

    ## Find all of the encodings understood by the .NET Framework. For each,
    ## determine the bytes at the start of the file (the preamble) that the .NET
    ## Framework uses to identify that encoding.
    $encodingMembers = [System.Text.Encoding] |
        Get-Member -Static -MemberType Property

    $encodingMembers | Foreach-Object {
        $encodingBytes = [System.Text.Encoding]::($_.Name).GetPreamble() -join '-'
        $encodings[$encodingBytes] = $_.Name
    }

    ## Find out the lengths of all of the preambles.
    $encodingLengths = $encodings.Keys | Where-Object { $_ } |
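
The preamble-matching idea behind this function can be sketched in a few lines. What follows is a simplified illustration under my own assumptions, not Lee's code: the function name `Get-BomEncodingName`, the fixed candidate list and the `'Unknown'` fallback are all hypothetical, and it reads bytes through `[System.IO.File]` rather than `Get-Content -Encoding byte` so that it should also run on newer PowerShell versions.

```powershell
function Get-BomEncodingName {
    param(
        # Path of the file to inspect (hypothetical parameter for this sketch).
        [Parameter(Mandatory = $true)]
        [string] $Path
    )

    # Candidates ordered longest preamble first: UTF-32 LE (ff fe 00 00)
    # must be tried before UTF-16 LE (ff fe), which shares its first two bytes.
    $candidates = @(
        @{ Name = 'UTF32';            Encoding = [System.Text.Encoding]::UTF32 },
        @{ Name = 'UTF8';             Encoding = [System.Text.Encoding]::UTF8 },
        @{ Name = 'Unicode';          Encoding = [System.Text.Encoding]::Unicode },
        @{ Name = 'BigEndianUnicode'; Encoding = [System.Text.Encoding]::BigEndianUnicode }
    )

    # Reads the whole file; acceptable for a sketch, wasteful for large files.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $bytes = [System.IO.File]::ReadAllBytes($fullPath)

    foreach ($candidate in $candidates) {
        $preamble = $candidate.Encoding.GetPreamble()
        if ($bytes.Length -lt $preamble.Length) { continue }

        # Compare the start of the file with the preamble, byte by byte.
        $isMatch = $true
        for ($i = 0; $i -lt $preamble.Length; $i++) {
            if ($bytes[$i] -ne $preamble[$i]) { $isMatch = $false; break }
        }
        if ($isMatch) { return $candidate.Name }
    }

    # No byte order mark: ASCII, an ANSI or OEM code page, UTF-7 and
    # BOM-less UTF-8 cannot be told apart this way.
    return 'Unknown'
}
```

Calling `Get-BomEncodingName -Path` on the unicode test file from the experiment above should return `Unicode`, while the ascii, default, oem and utf7 files, which carry no byte order mark, all come back as `Unknown`.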