Today I am combining a few techniques to show how to build a robust cleansing approach for your data using Power Query. This article will demonstrate the following Power Query techniques
- Joining tables (Merge Query)
- Self Referencing tables
- Adding Custom Columns
Sample Data
I am using some sample data as shown below. The Country data is the data I want to cleanse so there are consistent country values in the Country column.
I first load this data into Power Query and set “Close and Load to” so that it only creates a connection. I then have Power Query access to the source data table that can be reused multiple times.
Create a Table of All Entered Countries
The next step is to produce a list of the unique entries in the source data. To do this, I referenced the above table by right clicking (1 below), then Reference (2)
I then removed the Name column (1 below) and then removed the duplicates in the remaining Country Column (2 below).
I then sorted the column before loading the data back to Excel as shown below.
Create the Substitutes with a Self Referencing Table
I first covered this trick of a self referencing table in this blog article. Continuing on with my test data above, I have manually added a new column to the green table above and entered the substitutions as needed. Where there is no substitute needed I have simply left the substitute blank.
As it currently stands above, if the source table is sorted and then the green table is refreshed, the substitute column will fall out of alignment with the source table – that is bad. Now for the self referencing trick. I load this green table above back into Power Query as a new query, setting it to only create a connection as I did for the original set of data. I named this new query “Substitutes”
I then edited the original query that created the original green table and joined it to the substitute table (aka self referencing). While editing the Entered Names table, I selected Merge Queries (1 below), selected the Substitute table (2), then joined on the Country Columns (3, 4) with a left outer join (5).
I then expanded the new column to extract the substitutes.
This now gives me a self referencing table that will refresh while keeping the countries and substitutes correctly aligned regardless of what happens to the source table. This is the key point about a self referencing table (covered in my previous article).
Update the New Data
Now when new data is added to the source table, it will appear in the substitute table after refresh as follows.
What would be much better is to be clear which are the correct entries that have previously been checked so it is more obvious which are the new entries. So I decided to enter the correct country name for each entry as follows.
It is now much clearer that there are 3 new entries that need to be maintained, and I then updated the new data below.
Use the Substitute Table
Now that I have this substitute table working for me, all I need to do is create a new copy of my source table that uses the substitute country rather than the original country entered. I create a new query referencing my original source data (right click, reference).
Merge the queries as before, joining on the original country column.
expand the table to retrieve the substitute column
Then delete the original country column and rename the substitute column to be called “Country”
As you can see below, the substitute table is correctly remapping the data as needed.
When the new data is added to the source table, the substitute table will have blank entries as shown below.
As a nice touch, I went back into the final query and added a new custom column as follows

and then used this new column as the country column. Now when I refresh the data, it is clear that I need to do some maintenance on the substitute table.

If you want a comprehensive lesson on how to use Power Query, checkout my training course here https://exceleratorbi.com.au/product/power-query-excel-power-bi-online-training/


Hi there! I could have sworn I’ve been to this blog before but after reading through some
of the post I realized it’s new to me. Anyhow, I’m definitely glad I found it and I’ll be bookmarking
and checking back often!
My blog; Gaining Targeted twitter followers
continuously i used to read smaller posts which also
clear their motive, and that is also happening with this post which I am
reading at this place.
Here is my website – Personal bankruptcy
A fascinating discussion is worth comment.
I think that you ought to publish more about this subject matter, it might not be
a taboo matter but usually folks don’t speak about such issues.
To the next! Kind regards!!
Here is my webpage Entry Level Attorney Jobs
Nice post. I learn something new and challenging on sites I stumbleupon on a daily basis.
It will always be interesting to read through articles from other writers and
use something from their sites.
Feel free to visit my web-site;
Unquestionably imagine that which you stated. Your favourite reason seemed to be on the net the easiest factor to take into accout
of. I say to you, I certainly get irked whilst other folks think about worries that they just don’t realize about.
You managed to hit the nail upon the highest as smartly as defined out
the whole thing with no need side-effects , other folks can take a signal.
Will likely be again to get more. Thanks
Feel free to visit my webpage … Mlm Company
When I originally commented I clicked the “Notify me when new comments are added” checkbox and now each time a comment is added I get several e-mails with the same comment. Is there any way you can remove people from that service? Appreciate it!
Greetings! Very helpful advice on this article! It is the little changes that make the biggest changes. Thanks a lot for sharing!
I am usually to running a blog and i really appreciate your content. The article has actually peaks my interest. I am going to bookmark your web site and preserve checking for brand spanking new information.
Hey! This is kind of off topic but I need some advice from an established blog. Is it very difficult to set up your own blog? I’m not very techincal but I can figure things out pretty fast. I’m thinking about creating my own but I’m not sure where to begin. Do you have any points or suggestions? Appreciate it
Some genuinely choice blog posts on this website , saved to fav.
I do trust all the ideas you have presented to your post. They are very convincing and can certainly work. Still, the posts are very brief for starters. May just you please extend them a bit from next time? Thanks for the post.
I really like your writing style, superb information, appreciate it for posting :D. “Nothing sets a person so much out of the devil’s reach as humility.” by Johathan Edwards.
Thanx for the effort, keep up the good work Great work, I am going to start a small Blog Engine course work using your site I hope you enjoy blogging with the popular BlogEngine.net.Thethoughts you express are really awesome. Hope you will right some more posts.
Can you be more specific about the content of your article? After reading it, I still have some doubts. Hope you can help me.
you are in point of fact a good webmaster. The website loading speed is amazing. It sort of feels that you are doing any unique trick. Furthermore, The contents are masterwork. you’ve done a wonderful task on this subject!
Este site é realmente incrível. Sempre que acesso eu encontro coisas incríveis Você também pode acessar o nosso site e descobrir mais detalhes! informaçõesexclusivas. Venha saber mais agora! 🙂
I’ve been surfing online greater than three hours today, yet I by no means found any attention-grabbing article like yours. It’s beautiful price sufficient for me. In my view, if all web owners and bloggers made just right content material as you probably did, the net shall be a lot more useful than ever before.
Appreciating the persistence you put into your blog and in depth information you offer. It’s awesome to come across a blog every once in a while that isn’t the same unwanted rehashed material. Wonderful read! I’ve saved your site and I’m adding your RSS feeds to my Google account.
hi!,I like your writing very much! share we communicate more about your article on AOL? I require a specialist on this area to solve my problem. May be that’s you! Looking forward to see you.
When I originally commented I clicked the -Notify me when new comments are added- checkbox and now each time a comment is added I get four emails with the same comment. Is there any way you can remove me from that service? Thanks!
How do you see this working behind the scenes? Does the join between Cleansed Data and Substitutes persist? Or is the join run before (for example) Source is resorted?
I’m not sure what you mean by “persist”. The join is made and the substitutions are made. The substitution table remains as part of the model. If you refresh you data later and new data exists, the substitutions will work on that new data too. If you get new substitution requirements, you will need to add new records in the substitution tables
Very useful post. I kept struggling with Units of Measure in multiple versions ( eg. “Kg”, “kg”, “kgs”, “kg.” ).
I actually use Table.TransformColumns() instead of merging to save the Expand, Remove, and Rename Columns steps. Something like:
= Table.TransformColumns( YourTable, {{“Country”, each try Substitutes{ [Country = _ ] }[Substitute] otherwise _ }} )
Ok, performance can be an issue with my approach. Merging is extremely fast compared to accessing a record field for each row, at least with my data that has ~60k rows and 40 original values that are subsituted by 20 ‘correct’ values.
Hi! I was having trouble with the middle part, because it kept adding a new column to the green table [Substitute2], because I didn’t see that I had to delete the entered names table that we manually updated. Also, at the end, I think it is confusing where you switch to using the NAME instead of COUNTRY. The orange table ends up having people’s first name in the country column. Thank you for all that you do!
Ohhh, I have to admit I struggled to understand what you were talking about. Wow, yes this was a typo in the formula. I will fix it now. Thanks for letting me know
This is awesome, because I actually did something similar to this for a project just last month. I had a data set that might have client names or might not have client names, and the names may or may not be correct. Using that glorious, glorious self-referencing tables technique, I set it up so the user can update them or not as they choose, and another column where they noted what needed to happen for the accounting data on that line so they could see easily what had or hadn’t been looked at. I think I like your three table approach better for most cases though, so I’m bookmarking this one (grins)!
One thing I have been playing around with is trying to get the date of the last update on a line to show automatically, but I haven’t quite gotten that figured out yet.
I doubt you will get auto date working, as anything that auto adds the date will also auto update. You could use VBA, but maybe just as easy is Ctrl + ; (which is a great short cut I use all the time)
Yep, VBA seems to be the only option, but I think you’re right that it’s more trouble than it’s worth. And that’s a handy shortcut, I’ll add that in to the best practices list, thanks!
You can create in Excel a simple table with two columns (ParamaterName, ParameterValue). Your first row could be (Date, =today()). This will always give you the current date. If you want to be more precise, you can use (Timestamp, =now()), which will give you date and time. Load this table in PQ and call it Parameters. In any query which is used to get a snapshot of data and needs a timestamp, you simply create a new column and then merge that query with the Parameters query.