Understanding Strings In COM [2]

[Added: August 18, 2005] [Updated: March 24, 2007]

Summary: selected from edyang's blog

The main string data type in COM is named OLECHAR. It is the type expected by almost all COM library functions and by the methods of well-designed interfaces. An OLECHAR represents a single OLE-compatible character, so you can speak of a string only when you have an array of OLECHARs. Anyone who has used C++ for some time knows that the language has no built-in OLECHAR data type, as the upper-case name suggests (among other things). The C and C++ standards provide only two kinds of character type: the narrow char and the wide wchar_t. Hence OLECHAR must be an alias for one of them, and in fact it is. The relationship is established by the standard Win32 header file wtypes.h, which we will meet again later in this article. The following code snippet, adapted from the header file for clarity, represents the official definition of OLECHAR in C/C++:

#if defined(_WIN32) && !defined(OLE2ANSI)
typedef WCHAR OLECHAR;
#else
typedef char OLECHAR;
#endif

The same file also defines the LPOLESTR and LPCOLESTR types:

#if defined(_WIN32) && !defined(OLE2ANSI)
typedef OLECHAR __RPC_FAR *LPOLESTR;
typedef const OLECHAR __RPC_FAR *LPCOLESTR;
#else
typedef LPSTR     LPOLESTR;
typedef LPCSTR    LPCOLESTR;
#endif

as aliases of OLECHAR* and const OLECHAR* in Win32, but as aliases of LPSTR and LPCSTR in Windows 3.1x. The __RPC_FAR symbol can be ignored because it expands to nothing on Win32, so for all practical purposes LPOLESTR and OLECHAR* can be used interchangeably.
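To make that equivalence concrete, here is a minimal C++ sketch. It replicates the typedefs above under hypothetical DEMO_-prefixed names (assumed stand-ins, not Windows SDK identifiers) so that it compiles without the SDK, and then lets the compiler confirm that each alias and its spelled-out pointer type are one and the same type.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical DEMO_ replicas of the wtypes.h typedefs quoted above.
// They are NOT the real SDK names; they only mirror the same logic.
#define DEMO_RPC_FAR  // like __RPC_FAR on Win32: expands to nothing

#if defined(_WIN32) && !defined(OLE2ANSI)
typedef wchar_t DEMO_OLECHAR;
typedef DEMO_OLECHAR DEMO_RPC_FAR* DEMO_LPOLESTR;
typedef const DEMO_OLECHAR DEMO_RPC_FAR* DEMO_LPCOLESTR;
#else
typedef char DEMO_OLECHAR;
typedef char* DEMO_LPSTR;
typedef const char* DEMO_LPCSTR;
typedef DEMO_LPSTR DEMO_LPOLESTR;
typedef DEMO_LPCSTR DEMO_LPCOLESTR;
#endif

// On either branch the aliases are plain pointer types, so the compiler
// treats them exactly like the spelled-out pointer types.
static_assert(std::is_same<DEMO_LPOLESTR, DEMO_OLECHAR*>::value,
              "LPOLESTR is just OLECHAR*");
static_assert(std::is_same<DEMO_LPCOLESTR, const DEMO_OLECHAR*>::value,
              "LPCOLESTR is just const OLECHAR*");

// A function declared with the alias accepts a const OLECHAR* unchanged.
std::size_t demo_length(DEMO_LPCOLESTR s) {
    std::size_t n = 0;
    while (s[n] != 0) ++n;  // stop at the terminating zero character
    return n;
}
```

Because the alias is only another spelling of the pointer type, passing an ordinary `const DEMO_OLECHAR*` to `demo_length` needs no cast.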

As you can see, the OLECHAR type does not map to the same built-in type on every platform. If the code is compiled for 32-bit Windows, which can be detected from the definition of the _WIN32 preprocessor symbol (and OLE2ANSI is not defined), all COM characters are Unicode characters (WCHAR is itself a typedef that translates to the built-in wchar_t type). If not, the build is probably targeting Windows 3.1x, which does not support Unicode strings at all, so all strings are regular old arrays of char. Note that on Sun Solaris, to date the main Unix flavor to benefit from a port of the (D)COM implementation, OLECHARs are 16-bit Unicode characters exactly as on Win32.
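The following minimal sketch models the mapping chain just described, again with hypothetical DEMO_ stand-ins rather than the real wtypes.h names, and stores a short COM-style string as a zero-terminated array of characters. It covers only the two branches quoted from wtypes.h; it does not reproduce the Solaris port, so a non-Windows compiler takes the char branch.

```cpp
#include <type_traits>

// Hypothetical DEMO_ replicas of the chain described above:
// OLECHAR -> WCHAR -> wchar_t on Win32, OLECHAR -> char otherwise.
#if defined(_WIN32) && !defined(OLE2ANSI)
typedef wchar_t DEMO_WCHAR;       // WCHAR is itself a typedef for wchar_t
typedef DEMO_WCHAR DEMO_OLECHAR;  // Win32: every COM character is Unicode
constexpr bool demo_olechar_is_wide = true;
#else
typedef char DEMO_OLECHAR;        // Windows 3.1x-style build: plain char
constexpr bool demo_olechar_is_wide = false;
#endif

// Compile-time check that the selected type is the expected built-in one.
static_assert(std::is_same<DEMO_OLECHAR,
                  std::conditional<demo_olechar_is_wide, wchar_t, char>::type
              >::value,
              "OLECHAR maps to the expected built-in character type");

// One OLECHAR is one character; a string is a zero-terminated array of them.
const DEMO_OLECHAR demo_greeting[] = {'C', 'O', 'M', 0};
```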

The Microsoft engineers who originally designed COM made a rather courageous decision: they de facto imposed Unicode on everyone in the 32-bit world, at a time when the first version of Windows NT was barely taking shape and the doubled amount of RAM required to hold the same strings could easily become a problem given the high cost of memory. But the decision proved advantageous, as it saved COM developers from having to implement two variants of each interface (and of the coclasses implementing it) just to deal with every possible type of client.

Now we have seen how to define a COM-compliant character and, by extension, a COM-compliant string, but we have not yet revealed how to initialize such a string with a string literal. The following statement:

const OLECHAR* pComStr;
pComStr = "I love VCDJ and COM";

works on Windows 3.1x, where only ANSI strings exist, but fails to compile on Win32 and Solaris because we are trying to assign an ANSI string literal to a pointer to Unicode characters. The following form:

const OLECHAR* pComStr;
pComStr = L"I love VCDJ and COM";

gives exactly the opposite result: it works on Win32 but fails on Windows 3.1x. What we really need is a way to define the type of a string literal irrespective of the platform. Nothing fits the bill better than a macro, as in the code below:

const OLECHAR* pComStr;
pComStr = OLESTR("I love VCDJ and COM");

The OLESTR() macro expands differently depending on the target of the build, so we obtain the correct definition in every case. wtypes.h defines it as follows, with some minor adjustments to clarify the original code:

#if defined(_WIN32) && !defined(OLE2ANSI)
#define OLESTR(str) L##str
#else
#define OLESTR(str) str
#endif
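A minimal sketch of the same technique, assuming hypothetical DEMO_OLECHAR and DEMO_OLESTR replicas so that it builds without the Windows SDK: token pasting glues the L prefix onto the literal only in the Win32 branch, so a single source line yields a correctly typed literal on either branch.

```cpp
#include <cstddef>

// Hypothetical DEMO_ replicas of OLECHAR and OLESTR() from the snippet above.
#if defined(_WIN32) && !defined(OLE2ANSI)
typedef wchar_t DEMO_OLECHAR;
#define DEMO_OLESTR(str) L##str  // token pasting turns "abc" into L"abc"
#else
typedef char DEMO_OLECHAR;
#define DEMO_OLESTR(str) str     // the narrow literal passes through unchanged
#endif

// This line compiles on either branch: the macro always yields a literal
// whose element type matches DEMO_OLECHAR.
const DEMO_OLECHAR* pComStr = DEMO_OLESTR("I love VCDJ and COM");

// The literal is an array of DEMO_OLECHAR that includes the terminating zero.
static_assert(sizeof(DEMO_OLESTR("ab")) == 3 * sizeof(DEMO_OLECHAR),
              "two characters plus the terminator");

// Counts characters up to the terminating zero.
std::size_t demo_olestr_length(const DEMO_OLECHAR* s) {
    std::size_t n = 0;
    while (s[n] != 0) ++n;
    return n;
}
```

The `##` operator matters here: writing `L str` in the macro body would produce the identifier `L` followed by a separate narrow literal, not a single wide literal.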


Note: in every other area of the Win32 API there is a discrepancy between Windows 95/98 and Windows NT in the treatment of strings, since the former employ one-byte ANSI characters while the latter internally works only with two-byte Unicode characters. When it comes to COM, however, both operating systems agree on the use of Unicode strings.

At this point you may be curious as to why the data type was called OLECHAR rather than the more obvious COMCHAR. The answer has its roots partly in history and partly in marketing: until a few years ago OLE2, the main family of technologies built on the COM foundation, was deemed more important than COM itself, hence the acronym OLE spread everywhere. The later change in marketing orientation could not be reflected in the symbol names without breaking a lot of existing, correctly functioning COM/OLE code. (See my Q&A column in VCDJ, in print and online, for extensive information on this sometimes unclear transition of terms and intents.)

OLECHARs are the standard way to create strings in COM code, and by far the most comfortable one as long as C and C++ are used on both the client side and the server side. Other languages and tools bring their own special constraints, which open the way to another kind of string, the topic of the next section.

Continued...

Copyright © 1999 - Visual C++ Developers Journal
