Extract Word Metadata with Excel VBA - Excelerator BI

Extract Word Metadata with Excel VBA

I read an interesting request on the Whirlpool forum over the weekend. Whirlpool user Alicia2 wanted to extract Word document metadata into a spreadsheet. That seemed like an interesting problem, and a tool I could use myself some time, so I decided to help her out.

I built a spreadsheet that you place in the same folder as the Word documents (DOCX format). The spreadsheet uses Power Query to load the names of the files in that folder (you need to refresh the query manually). Finally, you click the Extract Metadata button, and the spreadsheet does the rest.
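If you don't have Excel or Power Query to hand, the same "scan a folder, then pull the metadata from each file" idea can be sketched outside the workbook. To be clear, this is not the VBA inside the download. It is a rough Python sketch that assumes each DOCX is a standard Office Open XML package with its properties stored in `docProps/core.xml` and `docProps/app.xml`. The function names `read_docx_metadata` and `scan_folder` are made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path


def _local(tag):
    """Strip the XML namespace, e.g. '{ns}creator' -> 'creator'."""
    return tag.rsplit("}", 1)[-1]


def read_docx_metadata(path):
    """Return a dict of the simple properties stored in one DOCX file.

    Assumption: the file is a normal DOCX (a ZIP package) and keeps its
    properties in docProps/core.xml and docProps/app.xml.
    """
    props = {}
    with zipfile.ZipFile(path) as package:
        names = set(package.namelist())
        for part in ("docProps/core.xml", "docProps/app.xml"):
            if part not in names:
                continue  # a generator may omit one of the parts
            root = ET.fromstring(package.read(part))
            for child in root:
                text = (child.text or "").strip()
                # Keep only leaf elements that carry text (Words, creator, ...)
                if len(child) == 0 and text:
                    props[_local(child.tag)] = text
    return props


def scan_folder(folder):
    """Build one row (a dict) per .docx file found in the folder."""
    rows = []
    for path in sorted(Path(folder).glob("*.docx")):
        if path.name.startswith("~$"):
            continue  # skip the lock files Word creates while a document is open
        rows.append({"file": path.name, **read_docx_metadata(path)})
    return rows
```

Calling `scan_folder(".")` from the folder that holds the documents returns one dictionary per file, with keys such as `creator`, `Words` or `Paragraphs` when the document stores them. Because this only reads the ZIP package, it never starts Word. Custom document properties live in a separate part (`docProps/custom.xml`) and are not handled by this sketch.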

Word Metadata Extractor

Here is the Extract Word Metadata spreadsheet.

And here is a video that shows you how it works.

8 thoughts on “Extract Word Metadata with Excel VBA”

  1. I was able to extract the metadata but not from field mappings and custom document properties. How can I add additional types of metadata to extract?

  2. Hi Matt,
    This has provided exactly what I needed in a superb and rapid way! I don’t have Power Query to generate the path and document name, but your excellent layout meant that I could get the path & name into the table using an alternate approach.
    I ran the macro and got all the words / lines / characters / paragraphs metadata I was after.
    I am using it to quickly measure the changes to multiple versions of a controlled document. The data shows when people are adding more paragraphs or even just individual characters. Perfect!!
    Thanks so much for putting this up on the web, I will be coming back to this site for sure.
    Mike

  3. Any chance you know how to do this with PDF files instead of DOCX files? I have tried everything, but it seems to be so difficult. Thanks in advance.

  4. Hi Matt,

    I’ve just downloaded and played with this and it looks really useful. Two questions:
    1. The trials I’ve done seem to indicate that the files are opened and the date/time stamp is updated – is this correct or maybe I did something wrong in my initial trial?
    2. I’d also like to open other Office files, at the same time as Word files (so there might be a mix of Excel, PowerPoint, Word, etc. files in the same directory). I tried but it didn’t seem to like the .xlsx extension so I’m wondering whether this is feasible.

    FYI, I’m interested in scanning files submitted by students for instances of file sharing.

    Thanks for any feedback!

    Anthony

    1. Hi Anthony, yes, this tool is specifically designed to work with Word, and it is not as simple as just ‘adding’ PPT etc. The handling for each file format would need to be written manually from scratch. And yes, it opens every file. What actually happens is that Excel starts an instance of Word in the background, then opens each file, extracts the data and closes the file.

      Unfortunately I don’t have time to look at enhancing this at the moment.

  5. This is a really good point, Rainier. I remember years ago that Microsoft was under fire because it did not provide an easy way to remove the metadata from docs. It is quite easy to modify this VBA to clean the metadata from the document. I may do that some time soon. If and when I do, I will edit the main post above.

  6. Hi Matt

    This could have great potential for assessing an organisation's data leakage risk. Often people do not understand what information is stored in the metadata. This would be a great way to audit it. Corporate forensic teams could really use this.

    Cheers
    Rainier
