domingo, 8 de enero de 2017

Working With Compression In Power Query And Power BI Desktop

If you’re reading this post there’s one important question you’ll probably want to ask: is it possible to extract data from a zip file in Power Query/Power BI? The answer is, unfortunately, no (at least at the time of writing). As this answer from Tristan on the dev team explains, because there are so many flavours of zip file out there it’s an extremely difficult problem to solve – so it hasn’t been attempted yet. That said, there are two other mildly interesting things to learn about compression in Power Query/Power BI Desktop that I thought were worth blogging about…
The first is that Power Query/Power BI can work with gzip files. For example, given a gzip file that contains a single csv file, here’s an example M query showing how the Binary.Decompress() function can be used to extract the csv file from the gzip file and then treat the contents of the csv file as a table:
01
02
03
04
05
06
07
08
09
10
11
12
13
14
let
    Source = Binary.Decompress(
     File.Contents(
      "C:\Myfolder\CompressedData.csv.gz"),
       Compression.GZip),
    #"Imported CSV" = Csv.Document(Source,
     [Delimiter=",",Encoding=1252]),
    #"Promoted Headers" = Table.PromoteHeaders(#"Imported CSV"),
    #"Changed Type" = Table.TransformColumnTypes(
     #"Promoted Headers",{
      {"Month", type text}, {" Sales", Int64.Type}
      })
in
    #"Changed Type"
The other is that you’ll see Binary.Decompress() used when you import an Excel workbook that contains a linked table into Power BI Desktop. For example, consider an Excel workbook that contains the following table:
image
If this table is imported into the Excel Data Model as a linked table, and you then save the workbook and try to import it into Power BI using File/Import/Excel Workbook Contents:
image
… you’ll see this message:
image
Click Start and you’ll get another message:
image
If you choose the Copy Data option, the data from the Excel table will be copied into Power BI. But where is it stored exactly? A look in the Query Editor at the query that returns the data shows that it’s embedded in the M code itself:
1
2
3
4
5
6
7
8
9
10
11
12
13
let
    Source = Table.FromRows(
     Json.Document(
      Binary.Decompress(
       Binary.FromText(
        "i45WMlTSUTJSitWJVjIGskzALFMgy0wpNhYA",
        BinaryEncoding.Base64),
       Compression.Deflate)),
      {"A","B"}),
    #"Changed Type" = Table.TransformColumnTypes(
     Source,{{"A", Int64.Type}, {"B", Int64.Type}})
in
    #"Changed Type"
That big chunk of text in the middle of the Source step is the data from the Excel table stored as a compressed JSON document, and again Binary.Decompress() is used to extract this data.