A recent conversation in the comments of this blog post brought up the subject of how to handle columns that have either been removed from or added to a data source in Power Query. Anyone who has worked with csv files knows that they have a nasty habit of changing format even when they aren’t supposed to, and added or removed columns can cause all kinds of problems downstream.

Ken Puls (whose excellent blog you are probably already reading if you’re interested in Power Query) pointed out that it’s very easy to protect yourself  against new columns in your data source. When creating a query, select all the columns that you want and then right-click and select Remove Other Columns:

image

This means that if any new columns are added to your data source in the future, they won’t appear in the output of your query. In the M code the Table.SelectColumns() function is used to do this.

Dealing with missing columns is a little bit more complicated. In order to find out whether a column is missing, first of all you’ll need a list of columns that should be present in your query. You can of course store these tables in a table in Excel and enter the column names manually, or you can do this in M fairly easily by creating a query that connects to your data source and using the Table.ColumnNames() function something like this:

let
    //Connect to CSV file
    Source = Csv.Document(
                      File.Contents(
                       "C:\Users\Chris\Documents\Power Query demos\SampleData.csv"
                      ),null,",",null,1252),
    //Use first row as headers
    FirstRowAsHeader = Table.PromoteHeaders(Source),
    //Get a list of column names
    GetColumns = Table.ColumnNames(FirstRowAsHeader),
    //Turn this list into a table
    MakeATable = Table.FromList(
                              GetColumns, 
                              Splitter.SplitByNothing(), 
                              null, 
                              null, 
                              ExtraValues.Error),
    //Rename this table's sole column
    RenamedColumns = Table.RenameColumns(
                                         MakeATable ,
                                         {{"Column1", "ColumnName"}})
in
    RenamedColumns 

Given a csv file that looks like this:

image

…the query above returns the following table of column names:

image

You can then store the output of this query in an Excel table for future reference – just remember not to refresh the query!

Having done that, you can then look at the columns returned by your data source and compare them with the columns you are expecting by using the techniques shown in this post. For example, here’s a query that reads a list of column names from an Excel table and compares them with the columns returned from a csv file:

let
    //Connect to Excel table containing expected column names
    ExcelSource = Excel.CurrentWorkbook(){[Name="GetColumnNames"]}[Content],
    //Get list of expected columns
    ExpectedColumns = Table.Column(ExcelSource, "ColumnName"),
    //Connect to CSV file
    CSVSource = Csv.Document(
                      File.Contents(
                       "C:\Users\Chris\Documents\Power Query demos\SampleData.csv"
                      ),null,",",null,1252),
    //Use first row as headers
    FirstRowAsHeader = Table.PromoteHeaders(CSVSource),
    //Get a list of column names in csv
    CSVColumns = Table.ColumnNames(FirstRowAsHeader),
    //Find missing columns
    MissingColumns = List.Difference(ExpectedColumns, CSVColumns),
    //Find added columns
    AddedColumns = List.Difference(CSVColumns, ExpectedColumns),
    //Report what has changed
    OutputMissing = if List.Count(MissingColumns)=0 then
                     "No columns missing" else
                     "Missing columns: " & Text.Combine(MissingColumns, ","),
    OutputAdded = if List.Count(AddedColumns)=0 then
                     "No columns added" else
                     "Added columns: " & Text.Combine(AddedColumns, ","),
    Output = OutputMissing & "   " & OutputAdded
in
    Output

Given a csv file that looks like this:

image

…and an Excel table like the one above containing the three column names Month, Product and Sales, the output of this query is:

image

It would be very easy to convert this query to a function that you could use to check the columns expected by multiple queries, and also to adapt the output to your own needs. Also, in certain scenarios (such as when you’re importing data from SQL Server) you might also want to check the data types used by the columns; I’ll leave that for another blog post though. In any case, data types aren’t so much of an issue with CSV files because it’s Power Query that imposes the types on the columns within a query, and any type conversion issues can be dealt with by Power Query’s error handling functionality (see Gerhard Brueckl’s post on this topic, for example).

You can download a workbook containing the two queries from this post here.