Remove Duplicates and Keep the Last Record with Power Query

Name: BI Training
Address: AU
Telephone: @
Price range: @

By Matt Allington / February 6, 2020 July 4, 2022

Learn how to remove duplicates keep the last record on power query.

Today I was helping a customer with a problem that seemed quite simple on the surface. She had a data table containing historical customer sales orders (each customer has many orders on different dates). The objective was to filter this table in Power Query and just load one record for each customer – the one that was the last order date. To illustrate the problem more clearly, I have adapted the scenario using the Adventure Works database so you can remove duplicates keep the last record on power query.

Adventure Works Example

The Sales table contains all the historical sales transactions by customer (identified by CustomerKey) and each transaction has an Order Date. The objective is to filter this table in Power Query so as to keep only the last entry for each customer (the last entry is the most recent order date). At the first instance, the solution seems to be simple. In Power Query, you would think that you simply:

Sort the table by Order Date in descending order.
Select the customer key column and then remove duplicates.

But when you do this in Power Query, it does not work as expected. As you can see in the Sales table below, each customer has many transactions with different order dates.

In Power Query, I sorted by OrderDate descending, then removed duplicates as shown below.

But the solution is not correct – the order dates for some of the customers are actually not the last orders. The table on the left below shows the original data sorted by OrderDate for each customer. The table on the right below shows the results from Power Query. If you compare the full data on the left with the Power Query results on the right, you can see that PQ has returned the wrong order date for some customers.

Why Doesn’t it Work?

I can’t say that I have a deep technical understanding of the problem, but I do have a conceptual understanding. When you select “sort column”, it is reasonable to expect that the entire table is sorted before proceeding to the next step. In reality, it is only the data that is loaded in memory that is sorted. The remaining data on disk is not included in the sort. Power Query also uses a concept called “lazy evaluation”. In short this means that if you add a step in the code, and that step is not technically needed to produce the final result, then that step is actually never executed (even though it is there in the instructions) – weird I know, but very efficient.

Table.Buffer to the Rescue

Before I share this solution, let me point out there are other ways to solve the problem, specifically using group by. However, the purpose of this article is to broaden readers understanding of Power Query and introduce the table.buffer function.

I am pretty sure I learnt this tip from Imke Feldman at The BIccountant (or possibly Chris Webb). Both are absolute wizes at this stuff. To solve the problem you will need to get in and make some manual changes to the M code. To do this, first make sure you turn on the formula bar. Go to the View menu and select “formula bar”.

When I click on the step that sorts the table (desc) by OrderDate, the M code was as follows:

To solve the problem, I need to force Power Query to load all the data into memory, forcing the sort to be completed now before proceeding. All I did was to wrap the line of code above inside the Table.Buffer( ) function as shown below.

The rest of the steps remain the same. The Table.Buffer( ) function forces the entire set of data to be loaded into memory after sorting and the hence the next step of removing duplicates works correctly on the entire data set.

The resulting table looks as follows:

These results are now correct as you can see in the table below. The OrderDate (Incorrect Solution) column is the result without using Table.Buffer( ) and the OrderDate (Correct Solution) column is the result of using Table.Buffer( ). You can see several customers have different results. The correct result can be manually validated against the raw data.

Here is the sample workbook and the source data that I used in this blog post.

Removing Duplicates Sample

About The Author

Matt Allington

Matt Allington is the Data Professional you want to be trained by. Matt does a phenomenal job of breaking concepts down into easily digestible chunks. He is also the principal consultant at Excelerator BI Pty Ltd. There are 3 main areas where he can help you save months and even years of self-learning: Kickstart Power BI in your organisation, training and consulting. He also brings his 35-year career expertise in business and data analytics directly to you with his high quality Power BI training courses and consulting. Check out his Public Training and begin your Power BI Ninja journey!

108 thoughts on “Remove Duplicates and Keep the Last Record with Power Query”

Asti
April 13, 2024 at 7:38 pm

Thank you for such an excellent summary of this essay. I enjoyed the one before this one as well.

Reply
Crystal Ouyang
April 2, 2024 at 12:44 pm

Excellent!!!

Reply
Emile Captain
March 16, 2024 at 9:40 am

Hi, I keep rereading this because I feel like my answer is here somewhere but I can’t find it. I get a data set from an outside source. The data comes where the initial column is the keyID and there may be duplicates in the keyID; the last row in any duplicate set is always the row to keep (independent of ascending or descending sort of keyID column. There is no data with any time or date attributes so there is no ability to sort within duplicates except I know the last row is always the most current status. So effectively, I want to “Remove Duplicates” and keep last row instead of first. Any thoughts?

Your above analysis is very helpful for other data sets I get which have sortable data elements.

Reply
1. Matt Allington
  March 16, 2024 at 9:43 am
  
  You could manually add an index column to the raw data inside PQ. this would give you the sort order you need. The rest of my blog should then work
  
  Reply
Patricia
January 23, 2024 at 9:33 pm

Great solution!! It works perfectly!!!
Thank you so much ♥

Reply
Emanuele
September 10, 2023 at 12:11 am

HI all, great solution, thanks for spreading.
Am I the only one struggling with the performance? I’m trying in a dataset with 1 million rows more or less and I receive this error: evaluation ran out of memory and can’t continue.
Any suggestion?

Reply
1. Matt Allington
  September 10, 2023 at 7:06 am
  
  Yes, it’s going to be inefficient. PQ is not a DB. If you can solve this problem elsewhere (Ie in the source if it’s a DB), then that is better.
  
  Reply
  1. Emanuele
    September 11, 2023 at 5:02 pm
    
    Thanks for replying!
    Do you think the buffering+remove duplicates will work if the source is a DB (in my case is not, the source are .csv files in a sharepoint folder) or do you think the entire action should be done in the DB directly?
    
    Reply
    1. Matt Allington
      September 12, 2023 at 1:38 pm
      
      If the source is a DB, it will depend if the PQ Query can query fold back to the DB. My guess is no, but I can’t be sure. If the source is a DB, then I suggest getting a DBA to write a query that removes duplicates and use that as a view.
      
      Reply
Neil
August 5, 2023 at 10:24 pm

Great solution thx – once i do this performance slows down and it takes a long time to run the query. I have over one million lines of data to process so appreciate it takes time to run, but now its even longer, are there any tips on how to speed this up please

Reply
1. Matt Allington
  August 6, 2023 at 7:13 am
  
  This will never be fast. This Query needs to load all data I to memory first. If there’s not enough memory, it will page it to disk while loading. To make it faster, you would need to load it into a database and process it there. Depending on the data, you may be able to pre-process the 1m records and write to a “history” csv. Then incrementally load the new data and only remove duplicates from the new data. The may or may not work depending on the data
  
  Reply
Juan
July 29, 2023 at 1:19 am

I adore both the explanation of the problem and the easiness of the solution. Thx!

Reply
sarvesh
July 19, 2023 at 5:20 pm

Simple yet a brilliant idea ….. your are genius !!!
Thanks it helped me 🙂

Reply
Ernesto
June 27, 2023 at 5:59 am

The solution is simple and works perfectly!

Reply
Jonathan
June 22, 2023 at 5:32 am

Bro, you are a genius. Thanks for such an easy to understand explanation.

Reply
NT
June 2, 2023 at 4:19 am

Brilliant solution been wracking my brain for ages thank you

Reply
DN
March 24, 2023 at 1:42 pm

Thank you! This was very helpful and it works.

Reply
steve
February 28, 2023 at 12:49 am

Thanks so much, I didnt need to go via “group by” afterall, as this works really well. thanks

Reply
Ana
January 26, 2023 at 9:24 pm

THANK YOU!! This has made my day!

Reply
Kam khan
January 18, 2023 at 3:14 pm

Thank you very much Matt Allington. This awesome solution and very clear explanation has hugely helped me tonight … great great relief.

Reply
Simon
January 17, 2023 at 11:05 pm

Thanks a lot for the insight and method!
I am worried that this will severely impact the performance of Power Query as I am filtering 1-2 Million Rows of Data.
Has anybody tried this a bigger amount of Data?

Reply
1. Matt Allington
  January 18, 2023 at 8:18 am
  
  It absolutely may be an issue. If it is, you should move to a DB in the back end.
  
  Reply
Nicky
January 13, 2023 at 4:29 pm

Brilliant! Really appreciate the explanation and the solution 🙂

Reply
Fernando Gabos
December 9, 2022 at 12:45 am

Matt Allington,
I applied the table.buffer in my model, but I noticed that for small databases, the solution works well. But when applied to bigdata, especially when applied to dataflow within the power bi portal, I noticed a huge decrease in the application’s performance, even crashing the system. Do we have another solution?

Reply
1. Matt Allington
  December 9, 2022 at 5:27 am
  
  Yes, the solution is to do it in the source database.
  
  Reply
Vivek Shrivastava
November 28, 2022 at 6:26 pm

It worked for me today. Thanks a ton Matt for this wonderful article.

Reply
Git
September 9, 2022 at 9:11 pm

This does not work

Reply
1. Jason Cockington
  September 9, 2022 at 10:17 pm
  
  Make sure you are using the Table.Buffer function around your Table.Sort, otherwise the Lazy Evaluation steps in and will chose to ignore the Sorting step.
  
  Reply
2. Joey
  September 22, 2022 at 6:48 am
  
  Same, did not work. Went through options and cleared my cache, restarted Power BI desktop.
  
  Reply
  1. Matt Allington
    September 22, 2022 at 7:18 am
    
    Mmm, I don’t know why, but I think the DAX/image above is wrong. I believe the table.buffer should be inside the sort function. If you think about it, we need to buffer the table first, then sort it. Please try that and let me know.
    
    Reply
  2. Jason Cockington
    September 27, 2022 at 9:58 am
    
    I just did a quick crosscheck, and confirm that the technique works.
    
    The correct M is
    Table.Buffer(Table.Sort(#”Changed Type”,{{“OrderDate”, Order.Descending}}))
    
    Reply
Saba
September 3, 2022 at 2:08 am

Thanks Matt, that was awesome!! Everything works fine just one issue – I have duplicates in Order Date.
Following the above steps give me the correct output for some and incorrect for other. By incorrect I mean – I get the 2nd last record from the Order Date and not the last one. No clue why, can you please advise?

Thanks in advance.

Reply
1. Matt Allington
  September 3, 2022 at 7:11 am
  
  It’s hard to say. You could use groupby instead. Do a groupby on the columns you have and then max on order date
  
  Reply
Matias Zayas
September 1, 2022 at 5:49 pm

Thanks Matt! You helped me today

Reply
nicheplayer
August 25, 2022 at 6:18 am

This helped me today. Thanks, Matt!

Reply
Diego
June 22, 2022 at 12:10 am

Thank you so much. Very useful and educational!

Reply
isaac
March 10, 2022 at 2:27 pm

Thank you a lot!!!!! It really works!!!!!!!

Reply
Adrian Smethurst
December 31, 2021 at 3:31 am

Great Post – Well explained!
I’ve been having issues with this for a while, now added to my bookmarks!

Reply
Scott Davies
December 23, 2021 at 5:41 pm

Thank you Matt, learned a neat new trick today!

Reply
Jose Munoz
December 4, 2021 at 11:35 am

Hello,

And If I need to remove duplicates only per month? For instance, I have the same entry twice in November and Once in October, but I need to remove only the duplicated from November and Keep One for November and the on from October?

Thanks in Advanced.

Reply
1. Matt Allington
  December 6, 2021 at 8:41 am
  
  I suggest you try to “Group By” month, and then aggregate on the item you need. I guess there is a “first” option. You would have to check.
  
  Reply
Jess
November 1, 2021 at 5:03 am

Thank you so so much for this.

Reply
Rodrigo
September 22, 2021 at 9:30 pm

Hey guys, I would like to say thank you from Brazil. I was suffering to solve this issue. Congratulations for the work here!

Reply
Judith Calvo
September 22, 2021 at 4:22 am

This is awesome! It solves a longstanding problem I had with duplicate client entries in table with different discharge dates. I am working with complex patient treatment data, and I need to have unduplicated lookup table for client discharge reasons. Thank-you!!!!

Reply
jordan
August 7, 2021 at 12:47 pm

I was just wondering about this myself! I thought there muse be some wrong here and indeed there is.
I did some experiments with a smaller data set. I found that for those 2 activities, remove duplicates and sort, the the first appearance of a duplicate row before being sorted is always kept. My guess is that When PQ meets those two activities, it always executes remove duplicate first, therefore the first appearance is always kept. Hope you know what I am saying.

Reply
1. Jason Cockington
  August 9, 2021 at 8:29 am
  
  Hi Jordan,
  That’s because of the “Lazy Evaluation”.
  That’s why we need to use Table.Buffer() to force Power Query to load and sort the entire data set before removing duplicates.
  
  Reply
Kris Kite
July 24, 2021 at 12:13 am

This is exactly what I need this morning! Thank you!

Reply
Sreenath
July 4, 2021 at 2:01 am

Thanks. Saved my day. Stuff like this is hard to even know it’s not working right when you’re dealing with 1000’s or records. Only when things have gone too far wrong or it’s too late, we’ll know we need to investigate only to find out that the software decided to be “lazy”.

Reply
May
July 1, 2021 at 2:57 pm

Thank you so much Sir! Your post solved my problem in a flash!!!

Reply
Shih-Chi Wang
June 30, 2021 at 1:26 pm

The request seems very simple, but I can’t solve it until reading this page. Thank you.

Reply
Aubrey Robertson
June 25, 2021 at 5:30 am

Thank you so much for this!

Reply
Gosia
June 23, 2021 at 7:13 pm

Hi, I need to find a way to delete not only one of duplicated records but to delete all records which appear to be doubled. We are using this option to review all employees internal movements from NH and termination list. Removing duplicates is always leaving one record behind when i don’t need any of it. When employee moved internally should not be on NH list nor on Termination list. In regular excel file i just mark them with color and delete manually but I want to create query and i can’t seem to find a way to do it automatically. Help please 🙂

Reply
1. Matt Allington
  June 24, 2021 at 9:01 am
  
  Well, the short answer is this. What is the repeatable process you use to manually identify the records you don’t need? You need to be able to extract the logic from that manual process so you can write a query to do it for you. Of course, it depends on the situation, but that is what you need to do.
  
  Reply
2. Dan
  June 29, 2021 at 5:24 am
  
  Group by employee name, accepting the default parameters, then filter Count to 1.
  
  Reply
Kenneth Tait
June 23, 2021 at 11:01 am

This has solved a problem for me in seconds that I had been struggling with for days, Many thanks for the solution and explanation of what causes the issue in the first place

Reply
Onkar S
June 17, 2021 at 11:16 am

Thanks a lot for this article! It really helped quickly solve the issue i was facing.

Reply
Tomasz Sibilski
June 7, 2021 at 10:31 pm

It worked. OMG. How to know what we don’t know? Thanks anyway!

Reply
Rahul Abhale
June 6, 2021 at 12:00 am

The above solution has really worked with the query of similar type with my report. Thanks

Reply
Jiri Stros
May 24, 2021 at 4:56 pm

I was fortunate enough to find this guide before implementing the solution and possibly getting incorrect results. It boggles my mind that Power Query would work in this “lazy evaluation” mode without some more or less explicit warning about this functionality. I wonder about other cases this can generate possible issues. Maybe anyone has some further insight? Thank you for the article!

Reply
Regina
May 13, 2021 at 8:08 am

The solution works perfectly. The use of Table.Buffer saved my day!
Thank you so much.

Reply
Mark
April 20, 2021 at 10:11 pm

Perfect! I was tearing my hair out trying to figure this out. I knew it must have to do with query folding, but didn’t know a way around it. Such a simple and elegant solution! Thanks!

Reply
Sergio
April 17, 2021 at 11:58 am

BRILLANT! Thank you!!!! 1000 times

Reply
Nirajah
March 18, 2021 at 10:05 pm

Thank you so much for this article. This is the exact method o need to apply to my data set to obtain the unique results.

Reply
Javier
February 23, 2021 at 1:55 am

Thanks! It was driving me crazy. The solution you suggested works perfect!

Reply
A Henriquez
February 21, 2021 at 1:04 pm

This was really helpful. Thank you

Reply
Matt Allington
February 21, 2021 at 7:14 am

It depends what you are trying to do when you say “processing” and “a long time”. Refresh typically takes 2 – 30 mins depending on column and complexity of transformation. Power BI is a reporting tool that has its own ETL. If the source is not a database, then ETL can be slow.

Reply
Sima
February 20, 2021 at 11:42 pm

Hi,
I have about 1 million records of data in csv file and need to process them and create the reports.
Is powerbi suitable for this mean?
when i load data in powerbi and use power query for processing , this takes long time.

Reply
Jesson
February 16, 2021 at 11:41 pm

Oh this page is really helpful. You saved me from a disaster. Thank you so much.

Reply
Todd
February 16, 2021 at 1:12 am

Thank you Matt. This worked perfect.

Reply
fernando
February 8, 2021 at 11:46 pm

This was very helpfull ! thank you so much for helping 😉

Reply
Chris
February 4, 2021 at 2:00 pm

Saved my report! Thanks!

Reply
Alex
January 5, 2021 at 5:46 am

Excellent! Thanks Alot

Reply
Curtis
December 23, 2020 at 8:46 am

This has not worked for me using google sheets as a source.

Reply
asdf
December 10, 2020 at 7:49 am

Genius thank you!

Reply
Sean
November 11, 2020 at 12:21 pm

Thanks a million Matt! I have a few suppliers sending me their stock on hand reports regularly via Excel. I have been consolidating them into one sheet. To get the latest stock on hand by each supplier, I could use max/maxx measures for visuals/reports in Power BI. But what I really wanted was for Power BI to only load the latest stock on hand data by each supplier. Your solution was exactly what I wanted in the first place. If this post had a title like “only load/filter/keep the latest/last record by customer/product/supplier via Power Query”, I’d have found it much sooner… LOL. Cheers Sean

Reply
Lünkes
November 4, 2020 at 3:49 am

Thanks so much for this info…. I was wondering how could the sorting/duplicatesRemoval not work properly..
This post showed me the full picture about the “partial sort”…..

Really Great.

Regards
//Lünkes

Reply
Emily S Johnson
October 30, 2020 at 8:33 am

Amazing! I’ve spent so much time fighting with deduplication issues. This has brought my skills up another notch!

Reply
Johan
October 5, 2020 at 6:39 pm

This was really great. Thanks for the good explanation .

Reply
Javier
September 27, 2020 at 4:17 am

You are amazing, no one had this answer in spanish, thank you so much!

Reply
Kate
August 21, 2020 at 10:25 am

I had this exact problem and you just saved me hours and hours of frustration. Thank you!

Reply
Dan Kordelski
August 4, 2020 at 8:11 am

Thank you so much for this article…this has been a problem for me for some time…problem solved…I was removing duplicates manually because I was not getting the expected results…You have saved me so much time going forward. Very much appreciated!!!

Reply
Pouriya
July 16, 2020 at 9:25 pm

Best case for Table. Buffer. Thank you so much.

Reply
Kim M.
June 26, 2020 at 1:09 am

I thought I had the problem solved, until I recently performed an audit and discovered otherwise. Wrapping the Table.Sort function in the Table.Buffer function worked like a charm! Can’t thank you enough! 🙂

Reply
Jignesh
June 8, 2020 at 8:12 pm

Thank You so much, Mr. Matt Allington First sorted data in descending order then applied to remove duplicate so removed old record data instead of a new one as per modified date. So once again thank you so much.

Reply
Tsvetelina Lalova
June 5, 2020 at 11:14 pm

Thank you so much! This article is so helpful to me. I have been working so hard months and this solution helps me to finish my work.

Reply
Gary Parmar
May 13, 2020 at 5:57 am

This is an excellent solution.
Thanks

Reply
Edson
May 4, 2020 at 10:55 pm

Thank you very much!!! Brilliant your explanation it saved my life. Good to know that even the computer is lazy sometimes.

Reply
Michiel Soede
March 16, 2020 at 1:47 am

Matt,

Colin Bandfield from Msft came to the rescue some time ago with a very elegant solution:

https://social.technet.microsoft.com/Forums/en-US/0cc2a3fe-38ec-4e0b-9dcc-abb9b5cd9f9b/group-data-and-sort-rows-inside-group?forum=powerquery

regards,

Reply
Neil Mathews
February 15, 2020 at 12:05 am

I’ve also found that adding an index column between the sort and remove duplicates steps also produces the correct result.

Reply
1. Nilo
  May 5, 2020 at 6:49 am
  
  You are the best
  
  Reply
Hoss Satour
February 12, 2020 at 9:03 pm

Hi Matt,
Thank you for this article and for this trick.
In this example, if we have more than 1 article bought in the same date, I think it’s better to use the “Group By” method.

Hoss

Reply
Ben Gribaudo
February 12, 2020 at 8:54 am

Hi Matt,

If Table.Distinct finds more than one row with duplicate values in the specified column list (which just contains column CustomerKey, in this case — Table.Distinct(#”Sorted Rows”, {“CustomerKey”})), I believe it’s free to return any one of those duplicate rows, at its convenience (documentation [https://docs.microsoft.com/en-us/powerquery-m/table-distinct] doesn’t promise that it will return the first or last row out of a sorted set of duplicates). This makes me think that the non-working solution is really working correctly–just not producing the desired output.

If Table.Distinct happens to respect sort order in some circumstances (like when working with a buffered input), it seems like this would be an internal implementation detail that could change and might not hold true across all platforms (PBI, ADF, etc.).

Ben

Reply
1. Matt Allington
  February 12, 2020 at 9:04 am
  
  Yes, I wasn’t trying to imply it wasn’t working correctly – just that it doesn’t do what you may expect
  
  Reply
John Thomas
February 9, 2020 at 4:58 am

Nice trick. One that I have used often especially with big data. What’s the hit to performance however?

Reply
1. Matt Allington
  February 9, 2020 at 7:41 am
  
  Fair question, but not as Important as getting the correct answer. I suspect performance will be fine except for large data. The data has to be loaded anyway. Even if you use groupby, PQ still has to load all the data before proceeding.
  
  Reply
Wim Lucassen
February 9, 2020 at 1:21 am

Hi Matt,
Great post, thanks for sharing.
I added Conditional Formatting in the Sales_IncorrectSolution.
1 Added Column: RelCorrectDate = RELATED(Sales_CorrectSolution[OrderDate])
1 Added Column:EqualYesNo = IF(RELATED(Sales_CorrectSolution[OrderDate])=Sales_IncorrectSolution[OrderDate];1;0)
Conditional Formatting based on the column EqualYesNo

Greetings

Reply
mbosnjak
February 8, 2020 at 12:27 pm

Nice work Matt, I was working on a DiFOT report just yesterday and did the same thing you mentioned; sort by then remove duplicates. Then your email came out last night, talk about timing. Thank you.

Reply
Wyn
February 8, 2020 at 10:51 am

Matt, I add an INDEX column after the sort step. Fixes the issue. Think I read that on Reza’s blog once upon a time

Like you say, a number of ways to address this. Maybe a sort should automatically trigger an index step? Things like Promote Headers triggers an automatic change type step.

Reply
Nithyanandam Raman
February 8, 2020 at 9:52 am

Hi Matt,
Although there are multiple ways to resolve any problems, thanks for sharing this with an example, I learnt new and unique command(Table.Buffer()) today, which will be more helpful when I working in power query,

Reply
Mark
February 8, 2020 at 8:31 am

Thanks for sharing this – it’s useful to learn the quirks and potential solutions. You never know when that information might come in handy.

Reply
Bruno
February 8, 2020 at 3:05 am

Sweet solution. I was not aware of Table.Buffer().

side note: There is a phrase between brackets before the last table that is not supposed to be there 🙂

regards

Reply
1. Matt Allington
  February 8, 2020 at 6:21 am
  
  Thanks x 2 🙂
  
  Reply
2. Matt Allington
  February 8, 2020 at 8:46 am
  
  That’s it. That is the point of this post
  
  Reply
Scott
February 8, 2020 at 2:53 am

I’m thinking that you should just reference the original table, group by customer key and do a max on the date field, and then use this result to inner join back to the original table on customer key and date. Seems like a straight forward solution. It probably pushes down to the database too. And no need to hack code?

Thanks,
Scott

Reply
1. Matt Allington
  February 8, 2020 at 6:23 am
  
  Yes, that is a valid solution. However I like to share what I learn about these tools to expand readers understanding and capabilities.
  
  Reply
Jerzy D.
February 7, 2020 at 8:42 pm

Try this trick 🙂 … Insert “stupid step” between Table.Sort and Table.Distinct… e.g. you can change type OrderDate on the type datetime

It “freezing” table in sorted state…. and after that you can remove duplicates properly.

Polish/English version :
…
#”Posortowano wiersze” = Table.Sort(#”Zmieniono typ”,{{“OrderDate”, Order.Descending}}),

#”Zmieniono typ1″ = Table.TransformColumnTypes(#”Posortowano wiersze”,{{“OrderDate”, type datetime}}),

#”Usunięto duplikaty” = Table.Distinct(#”Zmieniono typ1″, {“CustomerKey”})
…

Reply
Brian
February 7, 2020 at 6:37 pm

Maybe I’m not understanding the full extent of the problem but I found simply grouping on CustomerKey and taking the Max of OrderDate produced the same result.

let
Source = Excel.Workbook(File.Contents(“C:\Projects\Consulting\PowerBI\Removing-Duplicates-Sample\5 Tables in Excel.xlsx”), null, true),
Sales_Sheet = Source{[Item=”Sales”,Kind=”Sheet”]}[Data],
#”Promoted Headers” = Table.PromoteHeaders(Sales_Sheet, [PromoteAllScalars=true]),
#”Changed Type” = Table.TransformColumnTypes(#”Promoted Headers”,{{“ProductKey”, Int64.Type}, {“OrderDate”, type date}, {“CustomerKey”, Int64.Type}, {“SalesTerritoryKey”, Int64.Type}, {“SalesOrderNumber”, type text}, {“SalesOrderLineNumber”, Int64.Type}, {“OrderQuantity”, Int64.Type}, {“UnitPrice”, type number}, {“ExtendedAmount”, type number}, {“TotalProductCost”, type number}, {“TaxAmt”, type number}, {“Freight”, type number}, {“RegionMonthID”, type text}}),
#”Grouped Rows” = Table.Group(#”Changed Type”, {“CustomerKey”}, {{“Last Order Date”, each List.Max([OrderDate]), type date}})
in
#”Grouped Rows”

Reply
1. Matt Allington
  February 7, 2020 at 6:56 pm
  
  Yes, that will also work. But remember this is a simplified example to demonstrate the technique. What I didn’t say in the post was that in reality the customer need was a bit different and she wanted to keep the entire record with 15 columns. This can still be done with group by but is harder.
  
  Reply