Remove Duplicates and Keep the Last Record with Power Query

Learn how to remove duplicates and keep the last record in Power Query.

Today I was helping a customer with a problem that seemed quite simple on the surface. She had a data table containing historical customer sales orders (each customer has many orders on different dates). The objective was to filter this table in Power Query and load just one record for each customer – the one with the last order date. To illustrate the problem more clearly, I have adapted the scenario using the Adventure Works database so you can follow along and remove duplicates while keeping the last record in Power Query.

Adventure Works Example

The Sales table contains all the historical sales transactions by customer (identified by CustomerKey), and each transaction has an Order Date. The objective is to filter this table in Power Query so as to keep only the last entry for each customer (the last entry is the one with the most recent order date). At first glance, the solution seems simple. In Power Query, you would think that you simply:

  • Sort the table by Order Date in descending order.
  • Select the customer key column and then remove duplicates.
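
In M, those two steps would look something like the sketch below (the step names and the inline #table of sample data are illustrative, not the exact code from the post):

    let
        Source = #table(
            {"CustomerKey", "OrderDate"},
            {{1, #date(2020, 1, 5)}, {1, #date(2020, 3, 2)}, {2, #date(2020, 2, 1)}}),
        #"Sorted Rows" = Table.Sort(Source, {{"OrderDate", Order.Descending}}),
        #"Removed Duplicates" = Table.Distinct(#"Sorted Rows", {"CustomerKey"})
    in
        #"Removed Duplicates"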

But when you do this in Power Query, it does not work as expected. As you can see in the Sales table below, each customer has many transactions with different order dates.

[Screenshot: the Sales table, showing multiple transactions per customer]

In Power Query, I sorted by OrderDate descending, then removed duplicates as shown below.

[Screenshot: the query steps – sort by OrderDate descending, then remove duplicates]

But the solution is not correct – the order dates for some of the customers are actually not the last orders. The table on the left below shows the original data sorted by OrderDate for each customer. The table on the right below shows the results from Power Query. If you compare the full data on the left with the Power Query results on the right, you can see that PQ has returned the wrong order date for some customers.

[Screenshot: original data sorted by OrderDate per customer (left) vs. the incorrect Power Query results (right)]

Why Doesn’t it Work?

I can’t say that I have a deep technical understanding of the problem, but I do have a conceptual understanding. When you select “sort column”, it is reasonable to expect that the entire table is sorted before proceeding to the next step. In reality, only the data that is loaded in memory is sorted; the remaining data on disk is not included in the sort. Power Query also uses a concept called “lazy evaluation”. In short, this means that if you add a step in the code, and that step is not technically needed to produce the final result, then that step is never actually executed (even though it is there in the instructions) – weird I know, but very efficient.
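
To see lazy evaluation in action, here is a minimal hypothetical query where a sort step exists in the code but is not needed for the final result, so it is never executed:

    let
        Source = #table({"CustomerKey", "OrderDate"}, {{1, #date(2020, 1, 5)}, {2, #date(2020, 2, 1)}}),
        // This step is defined but never referenced below...
        #"Sorted Rows" = Table.Sort(Source, {{"OrderDate", Order.Descending}}),
        // ...so the sort never runs – the result depends only on Source
        RowCount = Table.RowCount(Source)
    in
        RowCount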

Table.Buffer to the Rescue

Before I share the solution, let me point out that there are other ways to solve the problem, specifically using Group By. However, the purpose of this article is to broaden readers’ understanding of Power Query and introduce the Table.Buffer function.

I am pretty sure I learnt this tip from Imke Feldman at The BIccountant (or possibly Chris Webb). Both are absolute wizards at this stuff. To solve the problem you will need to get in and make some manual changes to the M code. To do this, first make sure you turn on the formula bar: go to the View menu and select “Formula Bar”.

[Screenshot: enabling the formula bar from the View menu]

When I clicked on the step that sorts the table (descending) by OrderDate, the M code was as follows:

    = Table.Sort(#"Changed Type",{{"OrderDate", Order.Descending}})

To solve the problem, I need to force Power Query to load all the data into memory, forcing the sort to be completed before proceeding. All I did was wrap the line of code above inside the Table.Buffer( ) function as shown below.

    = Table.Buffer(Table.Sort(#"Changed Type",{{"OrderDate", Order.Descending}}))

The rest of the steps remain the same. The Table.Buffer( ) function forces the entire set of data to be loaded into memory after sorting, and hence the next step of removing duplicates works correctly on the entire data set.
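
Putting it all together, here is a minimal self-contained sketch of the pattern (the inline #table is sample data; in a real query the earlier steps would connect to your source):

    let
        Source = #table(
            {"CustomerKey", "OrderDate"},
            {{1, #date(2020, 1, 5)}, {1, #date(2020, 3, 2)}, {2, #date(2020, 2, 1)}, {2, #date(2020, 1, 9)}}),
        // Buffer the sorted table so the sort is fully evaluated before the next step
        #"Sorted Rows" = Table.Buffer(Table.Sort(Source, {{"OrderDate", Order.Descending}})),
        // Removing duplicates now reliably keeps the first (most recent) row per customer
        #"Removed Duplicates" = Table.Distinct(#"Sorted Rows", {"CustomerKey"})
    in
        #"Removed Duplicates"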

The resulting table looks as follows:

[Screenshot: the resulting table – one row per CustomerKey with the latest OrderDate]

These results are now correct as you can see in the table below.  The OrderDate (Incorrect Solution) column is the result without using Table.Buffer( ) and the OrderDate (Correct Solution) column is the result of using Table.Buffer( ).  You can see several customers have different results.  The correct result can be manually validated against the raw data.

[Screenshot: comparison of the incorrect and correct OrderDate per customer]

Here is the sample workbook and the source data that I used in this blog post.

[Download: Removing Duplicates sample workbook]

265 thoughts on “Remove Duplicates and Keep the Last Record with Power Query”

  55. Hi, I keep rereading this because I feel like my answer is here somewhere but I can’t find it. I get a data set from an outside source. The data comes with the initial column as the keyID, and there may be duplicates in the keyID; the last row in any duplicate set is always the row to keep (independent of an ascending or descending sort of the keyID column). There are no time or date attributes, so there is no way to sort within duplicates, except that I know the last row is always the most current status. So effectively, I want to “Remove Duplicates” and keep the last row instead of the first. Any thoughts?

    Your above analysis is very helpful for other data sets I get which have sortable data elements.

  56. Hi all, great solution, thanks for spreading the word.
    Am I the only one struggling with performance? I’m trying it on a dataset with roughly 1 million rows and I receive this error: evaluation ran out of memory and can’t continue.
    Any suggestions?

      1. Thanks for replying!
        Do you think the buffering + remove duplicates will work if the source is a DB (in my case it is not; the source is .csv files in a SharePoint folder), or do you think the entire action should be done in the DB directly?

        1. If the source is a DB, it will depend on whether the PQ query can fold back to the DB. My guess is no, but I can’t be sure. If the source is a DB, then I suggest getting a DBA to write a query that removes duplicates and using that as a view.

  57. Great solution, thanks – once I do this, performance slows down and it takes a long time to run the query. I have over one million rows of data to process, so I appreciate it takes time to run, but now it’s even longer. Are there any tips on how to speed this up, please?

    1. This will never be fast. This query needs to load all the data into memory first. If there’s not enough memory, it will page to disk while loading. To make it faster, you would need to load the data into a database and process it there. Depending on the data, you may be able to pre-process the 1m records and write them to a “history” csv, then incrementally load the new data and only remove duplicates from the new data. This may or may not work depending on the data.

  58. Thanks a lot for the insight and method!
    I am worried that this will severely impact the performance of Power Query, as I am filtering 1-2 million rows of data.
    Has anybody tried this with a bigger amount of data?

  59. Matt Allington,
    I applied Table.Buffer in my model, and I noticed that for small databases the solution works well. But when applied to big data, especially in a dataflow within the Power BI portal, I noticed a huge decrease in the application’s performance, even crashing the system. Do we have another solution?

    1. Make sure you are using the Table.Buffer function around your Table.Sort, otherwise the lazy evaluation steps in and will choose to ignore the sorting step.

      1. Mmm, I don’t know why, but I think the image above is wrong. I believe the Table.Buffer should be inside the sort function. If you think about it, we need to buffer the table first, then sort it. Please try that and let me know.

      2. I just did a quick crosscheck, and confirm that the technique works.

        The correct M is:
        Table.Buffer(Table.Sort(#"Changed Type",{{"OrderDate", Order.Descending}}))

  60. Thanks Matt, that was awesome!! Everything works fine, with just one issue – I have duplicates in Order Date.
    Following the above steps gives me the correct output for some records and incorrect for others. By incorrect I mean I get the 2nd-last record for the Order Date and not the last one. No clue why; can you please advise?

    Thanks in advance.

  61. Hello,

    What if I need to remove duplicates only per month? For instance, I have the same entry twice in November and once in October, but I need to remove only the duplicate from November, keeping one entry for November and the one from October.

    Thanks in advance.

  62. This is awesome! It solves a longstanding problem I had with duplicate client entries in a table with different discharge dates. I am working with complex patient treatment data, and I need an unduplicated lookup table for client discharge reasons. Thank you!!!!

  63. I was just wondering about this myself! I thought there must be something wrong here, and indeed there is.
    I did some experiments with a smaller data set. I found that for those two activities, remove duplicates and sort, the first appearance of a duplicate row before being sorted is always kept. My guess is that when PQ meets those two activities, it always executes remove duplicates first, therefore the first appearance is always kept. Hope you know what I am saying.

    1. Jason Cockington

      Hi Jordan,
      That’s because of the “Lazy Evaluation”.
      That’s why we need to use Table.Buffer() to force Power Query to load and sort the entire data set before removing duplicates.

  64. Thanks. Saved my day. Stuff like this is hard to even notice when you’re dealing with 1000’s of records. Only when things have gone too far wrong, or it’s too late, will we know we need to investigate, only to find out that the software decided to be “lazy”.

  65. Hi, I need to find a way to delete not just one of the duplicated records but all records which appear to be doubled. We are using this option to review all employees’ internal movements from the NH and termination lists. Removing duplicates always leaves one record behind, when I don’t need any of them. When an employee moves internally, they should not be on the NH list nor on the Termination list. In a regular Excel file I just mark them with color and delete them manually, but I want to create a query and I can’t seem to find a way to do it automatically. Help please 🙂

    1. Well, the short answer is this: what is the repeatable process you use to manually identify the records you don’t need? You need to be able to extract the logic from that manual process so you can write a query to do it for you. Of course, it depends on the situation, but that is what you need to do.
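
      For instance, here is a minimal sketch of one way to drop every doubled record, assuming a hypothetical EmployeeKey column and sample data:

          let
              Source = #table({"EmployeeKey", "List"}, {{1, "NH"}, {1, "Termination"}, {2, "NH"}}),
              // Group by the key, keeping the grouped rows and a count per key
              Grouped = Table.Group(Source, {"EmployeeKey"}, {{"Rows", each _, type table}, {"Count", each Table.RowCount(_), Int64.Type}}),
              // Keep only keys that appear exactly once – all doubled records are discarded
              SinglesOnly = Table.SelectRows(Grouped, each [Count] = 1),
              Result = Table.Combine(SinglesOnly[Rows])
          in
              Result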

  66. This has solved a problem for me in seconds that I had been struggling with for days. Many thanks for the solution and the explanation of what causes the issue in the first place.

  67. I was fortunate enough to find this guide before implementing the solution and possibly getting incorrect results. It boggles my mind that Power Query would work in this “lazy evaluation” mode without some more or less explicit warning about it. I wonder about other cases where this can generate issues. Does anyone have further insight? Thank you for the article!

  68. Perfect! I was tearing my hair out trying to figure this out. I knew it must have to do with query folding, but didn’t know a way around it. Such a simple and elegant solution! Thanks!

  69. Hi,
    I have about 1 million records of data in a csv file and need to process them and create reports.
    Is Power BI suitable for this?
    When I load the data into Power BI and use Power Query for processing, it takes a long time.

    1. It depends what you are trying to do when you say “processing” and “a long time”. Refresh typically takes 2 – 30 mins depending on columns and complexity of transformation. Power BI is a reporting tool that has its own ETL. If the source is not a database, then ETL can be slow.

  71. Thanks a million Matt! I have a few suppliers sending me their stock on hand reports regularly via Excel. I have been consolidating them into one sheet. To get the latest stock on hand by each supplier, I could use max/maxx measures for visuals/reports in Power BI. But what I really wanted was for Power BI to only load the latest stock on hand data by each supplier. Your solution was exactly what I wanted in the first place. If this post had a title like “only load/filter/keep the latest/last record by customer/product/supplier via Power Query”, I’d have found it much sooner… LOL. Cheers Sean

  72. Thanks so much for this info… I was wondering how the sorting/duplicate removal could not work properly.
    This post showed me the full picture about the “partial sort”…

    Really Great.

    Regards
    //Lünkes

  73. Thank you so much for this article…this has been a problem for me for some time…problem solved…I was removing duplicates manually because I was not getting the expected results…You have saved me so much time going forward. Very much appreciated!!!

  74. I thought I had the problem solved, until I recently performed an audit and discovered otherwise. Wrapping the Table.Sort function in the Table.Buffer function worked like a charm! Can’t thank you enough! 🙂

  75. Thank you so much, Mr. Matt Allington. I first sorted the data in descending order and then applied remove duplicates, so the old records were removed instead of the new ones, as per the modified date. So once again, thank you so much.

  76. Hi Matt,

    If Table.Distinct finds more than one row with duplicate values in the specified column list (which just contains column CustomerKey, in this case — Table.Distinct(#”Sorted Rows”, {“CustomerKey”})), I believe it’s free to return any one of those duplicate rows, at its convenience (documentation [https://docs.microsoft.com/en-us/powerquery-m/table-distinct] doesn’t promise that it will return the first or last row out of a sorted set of duplicates). This makes me think that the non-working solution is really working correctly–just not producing the desired output.

    If Table.Distinct happens to respect sort order in some circumstances (like when working with a buffered input), it seems like this would be an internal implementation detail that could change and might not hold true across all platforms (PBI, ADF, etc.).

    Ben

    1. Fair question, but not as important as getting the correct answer. I suspect performance will be fine except for large data. The data has to be loaded anyway. Even if you use Group By, PQ still has to load all the data before proceeding.

  77. Hi Matt,
    Great post, thanks for sharing.
    I added conditional formatting in Sales_IncorrectSolution:
    1. Added column: RelCorrectDate = RELATED(Sales_CorrectSolution[OrderDate])
    2. Added column: EqualYesNo = IF(RELATED(Sales_CorrectSolution[OrderDate])=Sales_IncorrectSolution[OrderDate];1;0)
    3. Conditional formatting based on the column EqualYesNo

    Greetings

  78. Nice work Matt, I was working on a DiFOT report just yesterday and did the same thing you mentioned: sort, then remove duplicates. Then your email came out last night – talk about timing. Thank you.

  79. Matt, I add an INDEX column after the sort step. Fixes the issue. I think I read that on Reza’s blog once upon a time.

    Like you say, there are a number of ways to address this. Maybe a sort should automatically trigger an index step? Things like Promote Headers trigger an automatic change type step.
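
    A sketch of that index-column variant (step names are illustrative; per the comment above, adding the index right after the sort stops the sort being skipped):

        #"Sorted Rows" = Table.Sort(#"Changed Type", {{"OrderDate", Order.Descending}}),
        // The index pins each row to its sorted position before the next step
        #"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "Index", 0, 1, Int64.Type),
        #"Removed Duplicates" = Table.Distinct(#"Added Index", {"CustomerKey"})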

  80. Nithyanandam Raman

    Hi Matt,
    Although there are multiple ways to resolve any problem, thanks for sharing this with an example. I learnt a new and unique command (Table.Buffer()) today, which will be very helpful when I am working in Power Query.

  81. Sweet solution. I was not aware of Table.Buffer().

    side note: There is a phrase between brackets before the last table that is not supposed to be there 🙂

    regards

  82. I’m thinking that you should just reference the original table, group by customer key and take the max of the date field, and then use this result to inner join back to the original table on customer key and date. Seems like a straightforward solution. It probably pushes down to the database too. And there’s no need to hack code?

    Thanks,
    Scott
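
    A minimal sketch of that group-and-join-back approach (the inline sample data and step names are illustrative):

        let
            Source = #table(
                {"CustomerKey", "OrderDate", "OrderQuantity"},
                {{1, #date(2020, 1, 5), 2}, {1, #date(2020, 3, 2), 1}, {2, #date(2020, 2, 1), 3}}),
            // Find each customer's latest order date
            LastDates = Table.Group(Source, {"CustomerKey"}, {{"OrderDate", each List.Max([OrderDate]), type date}}),
            // Inner join back to the original table to keep the full rows for the last orders
            Joined = Table.NestedJoin(Source, {"CustomerKey", "OrderDate"}, LastDates, {"CustomerKey", "OrderDate"}, "Match", JoinKind.Inner),
            Result = Table.RemoveColumns(Joined, {"Match"})
        in
            Result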

  83. Try this trick 🙂 … Insert a “stupid step” between Table.Sort and Table.Distinct… e.g. you can change the type of OrderDate to datetime.

    It “freezes” the table in its sorted state… and after that you can remove duplicates properly.

    Polish/English version:

        #"Posortowano wiersze" = Table.Sort(#"Zmieniono typ",{{"OrderDate", Order.Descending}}),
        #"Zmieniono typ1" = Table.TransformColumnTypes(#"Posortowano wiersze",{{"OrderDate", type datetime}}),
        #"Usunięto duplikaty" = Table.Distinct(#"Zmieniono typ1", {"CustomerKey"})

  84. Maybe I’m not understanding the full extent of the problem, but I found that simply grouping on CustomerKey and taking the max of OrderDate produced the same result.

    let
        Source = Excel.Workbook(File.Contents("C:\Projects\Consulting\PowerBI\Removing-Duplicates-Sample\5 Tables in Excel.xlsx"), null, true),
        Sales_Sheet = Source{[Item="Sales",Kind="Sheet"]}[Data],
        #"Promoted Headers" = Table.PromoteHeaders(Sales_Sheet, [PromoteAllScalars=true]),
        #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"ProductKey", Int64.Type}, {"OrderDate", type date}, {"CustomerKey", Int64.Type}, {"SalesTerritoryKey", Int64.Type}, {"SalesOrderNumber", type text}, {"SalesOrderLineNumber", Int64.Type}, {"OrderQuantity", Int64.Type}, {"UnitPrice", type number}, {"ExtendedAmount", type number}, {"TotalProductCost", type number}, {"TaxAmt", type number}, {"Freight", type number}, {"RegionMonthID", type text}}),
        #"Grouped Rows" = Table.Group(#"Changed Type", {"CustomerKey"}, {{"Last Order Date", each List.Max([OrderDate]), type date}})
    in
        #"Grouped Rows"

    1. Yes, that will also work. But remember this is a simplified example to demonstrate the technique. What I didn’t say in the post was that in reality the customer’s need was a bit different: she wanted to keep the entire record, with 15 columns. This can still be done with Group By, but it is harder.
