Download Source Code: OfficeProperties.zip - 61.15KB
Multiple ways to enumerate summary/built-in and custom Office document properties: an easy-to-use VB6 sample, followed by similar access from managed C# code, and finally a much better way, using the DSOFile COM component provided by Microsoft's support developers.
Summary and Custom Properties
In Extract Genuine File Information, we showed how to open, from C# code, the File Properties dialog that you frequently meet in many Windows applications. One of these is Windows Explorer: just select a file, right-click and choose the Properties menu command. The C# code calls the ShellExecuteEx Win32 function through P/Invoke:
/// <summary>
/// Opens the File Property dialog, provided by the system
/// </summary>
/// <param name="filename">Full file path</param>
public static void ShowFileProperties(string filename)
{
    SHELLEXECUTEINFO info = new SHELLEXECUTEINFO();
    info.cbSize = Marshal.SizeOf(info);
    info.lpVerb = "properties";
    info.fMask = (int)SEE_MASK.INVOKEIDLIST;
    info.nShow = (int)Win32.SW_SHOWNORMAL;
    info.lpFile = filename;
    Win32.ShellExecuteEx(info);
}
The first tab of this dialog, "General", shows genuine, system-level properties, whose values are usually stored or defined by the operating system. Depending on the type of file you have, you may get other tabs with more specialized information. Two other tabs that you get for Microsoft Office documents, and potentially for other kinds of documents, are "Summary" and "Custom".
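The ShowFileProperties snippet above relies on a few interop declarations that are not shown. The following is a minimal sketch of what they might look like; the names SEE_MASK, SHELLEXECUTEINFO and Win32 are taken from the snippet, but their exact shape here is an assumption, and the declarations in the downloadable project may differ. The constants and the field layout follow the Win32 headers.

```csharp
using System;
using System.Runtime.InteropServices;

// Assumed declarations, reconstructed from how the snippet uses them.
public enum SEE_MASK : int
{
    INVOKEIDLIST = 0x0000000C   // SEE_MASK_INVOKEIDLIST in shellapi.h
}

// Declared as a class, so an instance is marshaled as a pointer
// without "ref", matching the call Win32.ShellExecuteEx(info).
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Auto)]
public class SHELLEXECUTEINFO
{
    public int cbSize;
    public int fMask;
    public IntPtr hwnd;
    public string lpVerb;
    public string lpFile;
    public string lpParameters;
    public string lpDirectory;
    public int nShow;
    public IntPtr hInstApp;
    public IntPtr lpIDList;
    public string lpClass;
    public IntPtr hkeyClass;
    public int dwHotKey;
    public IntPtr hIcon;        // shares storage with hMonitor in the native union
    public IntPtr hProcess;
}

public static class Win32
{
    public const int SW_SHOWNORMAL = 1;

    [DllImport("shell32.dll", CharSet = CharSet.Auto, SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    public static extern bool ShellExecuteEx([In, Out] SHELLEXECUTEINFO lpExecInfo);
}
```

The "properties" verb only works together with the INVOKEIDLIST mask, which is why the snippet sets both.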
Summary (or Built-In) Properties are predefined properties for all Office documents. They include the document's Title, Subject, Category, Author and more. Since they are standard, an application may provide a class with specialized get/set properties for each.
Custom Properties are user-defined properties, with any kind of name and value type. The File Properties tab may suggest some typical names and help you fill in values. While Custom Properties are all user-defined, and can be created or removed as you wish, Summary Properties are always the same: you cannot define new ones or delete existing ones. Values left blank behave like undefined properties.
Summary and Custom property support was not created for Office applications only. In fact, any application can read and store such properties in its own documents, through the IPropertyStorage COM interface. The property storage support is implemented at the document format level, not at the application level. But, as with the IStorage COM interface (see our article on the CHM Help File Extractor), you never need to know the actual storage format, as long as you look at your document only through COM interfaces, not at the byte level. From this perspective, your document appears as a component rather than as a static storage repository. This is also why this article is filed in the Executables category as well.
One method to extract Office document properties is through the Office applications themselves. The Office applications are automation servers, so you can instantiate them from your own code. Unfortunately, it takes a long time to load Word, Excel or PowerPoint, at least the first time, just to get access to some properties of a saved document. There is indeed a faster way, through a component implemented by the support developers at Microsoft, and it will be presented at the end.
From Visual Basic 6

We'll start with the typical way to get to these properties from Visual Basic. We should remember that Microsoft Office, at some point, invested heavily in full automation support for Visual Basic. Visual Basic for Applications (or VBA) was for a long time the language of choice for building custom applications from within the Office applications themselves, using their Integrated Development Environments (IDEs). Visual Basic was not a strongly typed language, and anything could be passed as a Variant, including COM interface pointers. This is possibly why most Office components do not care much about strong typing and expose many types as generic objects. From a strongly typed language like C++, it was customary anyway to call QueryInterface and discover the kind of COM object at run time. But for VB, it was a breeze.
Let's first see how accessing Office properties looks in the old VB6. We create a very simple Standard EXE project, with a .bas module and a Sub Main startup method, from which we call a dedicated method for each type of Office document: Word, Excel and PowerPoint. Our demo is limited to these three kinds of applications from the whole suite, but the procedure should be more or less the same for other Office apps, such as Microsoft Access, Outlook, FrontPage, Publisher, InfoPath, Visio or Project.
Each Office document needs its own application started first. We can start Word, Excel and PowerPoint in invisible (hidden) mode and open the documents read-only, because we simply read document properties, without changing, deleting or creating any.
Loading an Office document consists of adding it to the main collection of the started application: Documents, Workbooks or Presentations. Look at how clean and short the Open calls are. This is because, in VB, we can skip optional parameters that already have a default value. And, as we said before, we don't need type conversions: everything can be treated as a Variant.
After we show the properties, we clean up in a very similar way: we close the document, quit the application and release the object references. Here is the first part of our VB6 code, clean and simple:
Sub Main()
    Dim crtDir As String
    crtDir = CurDir$
    ChDir ".."
    ShowWordProperties CurDir$ & "\test.doc"
    ShowExcelProperties CurDir$ & "\test.xls"
    ShowPowerPointProperties CurDir$ & "\test.ppt"
    ChDir crtDir
End Sub

Sub ShowWordProperties(filename As String)
    ' Launch Word (in hidden mode) and open the document
    Dim app As New Word.Application, doc As Word.Document
    Set doc = app.Documents.Open(filename, , True)
    WriteLine ">>> " & filename
    ' Show all builtin and custom document properties
    ShowBuiltinProperties doc
    ShowCustomProperties doc
    ' Cleanup
    doc.Close
    app.Quit
    Set doc = Nothing
    Set app = Nothing
End Sub

Sub ShowExcelProperties(filename As String)
    ' Launch Excel (in hidden mode) and open the document
    Dim app As New Excel.Application, doc As Excel.Workbook
    Set doc = app.Workbooks.Open(filename, , True)
    WriteLine ">>> " & filename
    ' Show all builtin and custom document properties
    ShowBuiltinProperties doc
    ShowCustomProperties doc
    ' Cleanup
    doc.Close
    app.Quit
    Set doc = Nothing
    Set app = Nothing
End Sub

Sub ShowPowerPointProperties(filename As String)
    ' Launch PowerPoint (in hidden mode) and open the document
    Dim app As New PowerPoint.Application
    Dim doc As PowerPoint.Presentation
    Set doc = app.Presentations.Open( _
        filename, msoTrue, msoTrue, msoFalse)
    WriteLine ">>> " & filename
    ' Show all builtin and custom document properties
    ShowBuiltinProperties doc
    ShowCustomProperties doc
    ' Cleanup
    doc.Close
    app.Quit
    Set doc = Nothing
    Set app = Nothing
End Sub
What is great now is that, thanks to the loose typing of VB6, we can pass whatever kind of document we have, as an Object, to the same generic ShowBuiltinProperties and ShowCustomProperties methods. So we need one single implementation for any kind of document, because every Office document has a BuiltInDocumentProperties collection (for Summary properties) and a CustomDocumentProperties collection.
They are not the same! In fact, internally, they return different classes, from different components. But in VB6 we simply don't care, as long as both collections are enumerable and expose similar members. And all we need is the property Name and its Value. We may skip some properties for which reading the Value raises an error:
' Enumerate builtin properties of any kind of Office document
Sub ShowBuiltinProperties(doc As Object)
    Dim prop As Object
    WriteLine "Builtin Properties:"
    For Each prop In doc.BuiltInDocumentProperties
        ' Skip properties whose Value cannot be read
        On Error Resume Next
        WriteLine " " & prop.Name & " = " & prop.Value
    Next
End Sub

' Enumerate custom properties of any kind of Office document
Sub ShowCustomProperties(doc As Object)
    Dim prop As Object
    WriteLine "Custom Properties:"
    For Each prop In doc.CustomDocumentProperties
        WriteLine " " & prop.Name & " = " & prop.Value
    Next
End Sub

Sub WriteLine(text As String)
    Debug.Print text
End Sub
We naturally expect the same kind of simplicity and uniformity from .NET. As we'll see, that is not the case. Many blamed Microsoft for making VB.NET a strongly typed language. The case presented above could be one of the reasons why.
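For comparison, here is a minimal sketch of how the same late-bound enumeration might look in C# 4.0 or later, where the dynamic keyword defers member lookup to run time, much like a VB6 Object. This is not the code from the download: the class and method names below are hypothetical, and the sketch assumes Word is installed and registered under the "Word.Application" ProgID.

```csharp
using System;
using System.Collections.Generic;

public static class LateBoundProperties
{
    // Formats " Name = Value" for every item of a late-bound collection.
    // Works for any enumerable whose items expose Name and Value members.
    public static List<string> DescribeProperties(dynamic properties)
    {
        var lines = new List<string>();
        foreach (dynamic prop in properties)
        {
            try
            {
                object name = prop.Name;
                object value = prop.Value;
                lines.Add(string.Format(" {0} = {1}", name, value));
            }
            catch (Exception)
            {
                // Same effect as On Error Resume Next in the VB6 version:
                // skip properties whose Value cannot be read
            }
        }
        return lines;
    }

    // Hypothetical counterpart of the VB6 ShowWordProperties method.
    public static void ShowWordProperties(string filename)
    {
        Type wordType = Type.GetTypeFromProgID("Word.Application");
        if (wordType == null)
        {
            Console.WriteLine("Word does not seem to be registered.");
            return;
        }

        dynamic app = Activator.CreateInstance(wordType);
        dynamic doc = null;
        try
        {
            // Optional parameters can be skipped, as in VB6
            doc = app.Documents.Open(filename, ReadOnly: true);
            Console.WriteLine(">>> " + filename);

            Console.WriteLine("Builtin Properties:");
            List<string> builtin = DescribeProperties(doc.BuiltInDocumentProperties);
            builtin.ForEach(Console.WriteLine);

            Console.WriteLine("Custom Properties:");
            List<string> custom = DescribeProperties(doc.CustomDocumentProperties);
            custom.ForEach(Console.WriteLine);
        }
        finally
        {
            // Cleanup: close without saving, then quit Word
            if (doc != null) doc.Close(false);
            app.Quit();
        }
    }
}
```

Because nothing is bound at compile time, this sketch needs no reference to a version-specific Office library, at the price of losing IntelliSense and compile-time checks.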
Accessing Office documents from VB6 or VBA was nice and easy. In your projects, do not forget to set Project References to the Office object libraries you need. Depending on the kind of documents to read, these could be:
- Microsoft Word 10.0 Object Library
- Microsoft Excel 10.0 Object Library
- Microsoft PowerPoint 10.0 Object Library
The version number can be different. It depends on which version of Microsoft Office, Office interop libraries or other Office components you have installed. For other Office applications (Access, FrontPage, etc.) you need to find and reference similar libraries.
This is partial output for our test Office documents, from the Immediate window, which you get if you download and run the OfficePropertiesVB6 project:
>>> ...\OfficeProperties\OfficePropertiesVB6\test.doc
BUILTIN PROPERTIES:
Title = This is the title of this document
Subject = This is the subject of this document
Comments = Some comments here
Yep, this is it!
Template = Normal.dot
CUSTOM PROPERTIES:
Client = Client's Name
Projet = This is the project name
Vérifié par = Checked by me
Terminée le = 1/1/2007
>>> ...\OfficeProperties\OfficePropertiesVB6\test.xls
BUILTIN PROPERTIES:
Title = This is some title of this Excel document
Subject = and this is the subject
Application name = Microsoft Excel
Creation date = 4/2/2007 7:30:34 PM
CUSTOM PROPERTIES:
Customer = for the WWW
Phone = 555-5555
>>> ...\OfficeProperties\OfficePropertiesVB6\test.ppt
BUILTIN PROPERTIES:
Title = Microsoft PowerPoint test file
Subject = no subject, please
Number of slides = 1
Number of notes = 0
CUSTOM PROPERTIES:
From = Me
To = You
July 06, 2007 at 02:49 PM
Well done! :) It looks like you covered most possible ways to deal with this. I didn't know about WdBuiltInProperty; it looks like something I could use.
About DSOFile... I really didn't move to .NET to have to deal with registration of COM components. Why oh why the MS guys do not provide alternative .NET assemblies?!
Best Wishes,
Michael