- Object Tracking
- Class Information
- Helper Support
- Archive Portability
- Binary Archives
- XML Archives
- Exporting Class Serialization
- Static Libraries and Serialization
- DLLS - Serialization and Runtime Linking
- Archive Exceptions
- Exception Safety
Object Tracking
Depending on how the class is used and other factors, serialized objects may be tracked by memory address. This prevents the same object from being written to or read from an archive multiple times. These stored addresses can also be used to delete objects created during a loading process that has been interrupted by throwing of an exception.
This could cause problems in programs where copies of different objects are saved from the same address.
In this case, the data to be saved exists on the stack. Each iteration of the loop updates the value on the stack. So although the data changes each iteration, the address of the data doesn't. If a[i] is an array of objects being tracked by memory address, the library will skip storing objects after the first as it will be assumed that objects at the same address are really the same object.
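The effect can be reproduced with a toy model. The sketch below is not the library's implementation: `ToyOutputArchive`, `save_tracked`, `save_via_stack_copy` and `save_in_place` are hypothetical names, and the tracking table is assumed to be a plain set of addresses. It models only the runtime skipping of repeated addresses, not any compile-time check.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy model of address-based tracking (hypothetical, not the library's code).
struct ToyOutputArchive {
    std::set<const void*> seen;  // addresses already written
    std::vector<int> written;    // payloads actually stored

    // Store the value only if no object at this address was stored before.
    void save_tracked(const int& x) {
        if (seen.insert(&x).second) {
            written.push_back(x);
        }
    }
};

// Copies each element into one local variable before saving. Every save
// sees the same address, so only the first value is stored.
inline std::vector<int> save_via_stack_copy(const std::vector<int>& a) {
    ToyOutputArchive ar;
    int x = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        x = a[i];
        ar.save_tracked(x);
    }
    return ar.written;
}

// Saves each element in place. Every element has its own address,
// so every value is stored.
inline std::vector<int> save_in_place(const std::vector<int>& a) {
    ToyOutputArchive ar;
    for (std::size_t i = 0; i < a.size(); ++i) {
        ar.save_tracked(a[i]);
    }
    return ar.written;
}
```

With the input `{1, 2, 3}`, `save_via_stack_copy` stores a single value while `save_in_place` stores all three, which is the silent data loss the paragraph above warns about.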
To help detect such cases, output archive operators expect to be passed reference arguments.
Given this, the above code will invoke a compile time assertion. The obvious fix in this example is to use
which will compile and run without problem. The usage of by the output archive operators will ensure that the process of serialization doesn't change the state of the objects being serialized. An attempt to do this would constitute augmentation of the concept of saving of state with some sort of non-obvious side effect. This would almost surely be a mistake and a likely source of very subtle bugs.
Unfortunately, implementation issues currently prevent the detection of this kind of error when the data item is wrapped as a name-value pair.
A similar problem can occur when different objects are loaded to an address which is different from the final location:
In this case, the address of is the one that is tracked rather than the address of the new item added to the set. Left unaddressed, this will break the features that depend on tracking, such as loading an object through a pointer. Subtle bugs will be introduced into the program. This can be addressed by altering the above code thusly:
This will adjust the tracking information to reflect the final resting place of the moved variable and thereby rectify the above problem.
If it is known a priori that no pointer values are duplicated, overhead associated with object tracking can be eliminated by setting the object tracking class serialization trait appropriately.
By default, data types designated primitive by the Implementation Level class serialization trait are never tracked. If it is desired to track a shared primitive object through a pointer (e.g. a used as a reference count), it should be wrapped in a class/struct so that it is an identifiable type. The alternative of changing the implementation level of a would affect all s serialized in the whole program, which is probably not what one would intend.
It is possible that we may want to track addresses even though the object is never serialized through a pointer. For example, a virtual base class need be saved/loaded only once. By setting this serialization trait to , we can suppress redundant save/load operations.
Helper Support
Some types, especially those with complicated lifetime behavior or limited access to their internal state, might need or benefit from elaborate serialization algorithms. The principal motivating case is that of shared_ptr. As instances are loaded, they have to be "matched up" with any other instances which have already been loaded. Thus, a table of previously loaded instances has to be maintained while the archive containing the shared_ptr instances is being loaded. Without maintaining such a table, the shared_ptr would not be a serializable type.
To implement this facility, one declares a helper object associated with the current archive that can be used to store contextual information relevant to the particular type serialization algorithm.
creates a helper object associated to the archive the first time it is invoked; subsequent invocations return a reference to the object created in the first place, so that can effectively be used to store contextual information persisting through the serialization of different objects on the same archive.
Helpers may be created for saving and loading archives. The same program might have several different helpers or the same helper instantiated separately from different parts of the program. This is what makes the helper_instance_id necessary. In principle it could be any unique integer. In practice it seems easiest to use the address of the serialization function which contains it. The above example uses this technique.
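The create-on-first-use behavior and the role of the instance id can be illustrated with a small model. This is only a sketch under stated assumptions: `ToyArchive`, its `get_helper` member, `CountingHelper`, `id_a` and `id_b` are hypothetical names, and the storage scheme (a map keyed by helper type and id) is an assumption rather than the library's actual design.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <typeindex>
#include <typeinfo>
#include <utility>

// Toy archive holding one helper per (helper type, id) pair.
// Hypothetical model; names and layout are assumptions.
class ToyArchive {
    using Key = std::pair<std::type_index, std::uintptr_t>;
    std::map<Key, std::shared_ptr<void>> helpers_;

public:
    // Creates the helper on first use. Later calls with the same type and
    // id return a reference to that same object.
    template <class Helper>
    Helper& get_helper(const void* id = nullptr) {
        Key key(std::type_index(typeid(Helper)),
                reinterpret_cast<std::uintptr_t>(id));
        auto it = helpers_.find(key);
        if (it == helpers_.end()) {
            it = helpers_.emplace(key, std::make_shared<Helper>()).first;
        }
        return *static_cast<Helper*>(it->second.get());
    }
};

// Example helper: context that persists across several save/load calls.
struct CountingHelper {
    int count = 0;
};

// Ids only need to be unique per use site. The address of a static object
// is one easy way to get such a value.
static const char id_a = 0;
static const char id_b = 0;
```

In this model, two calls with `&id_a` on the same archive share one `CountingHelper`, a call with `&id_b` gets a separate one, and a second archive starts with fresh helpers.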
Class Information
By default, for each class serialized, class information is written to the archive. This information includes version number, implementation level and tracking behavior. This is necessary so that the archive can be correctly deserialized even if a subsequent version of the program changes some of the current trait values for a class. The space overhead for this data is minimal. There is a little bit of runtime overhead since each class has to be checked to see if it has already had its class information included in the archive. In some cases, even this might be considered too much. This extra overhead can be eliminated by setting the implementation level class trait to: .
Turning off tracking and class information serialization will result in pure template inline code that in principle could be optimised down to a simple stream write/read. Elimination of all serialization overhead in this manner comes at a cost. Once archives are released to users, the class serialization traits cannot be changed without invalidating the old archives. Including the class information in the archive assures us that they will be readable in the future even if the class definition is revised. A lightweight structure such as a display pixel might be declared in a header like this:
Archive Portability
Several archive classes create their data in the form of text or a portable binary format. It should be possible to save such an archive on one platform and load it on another. This is subject to a couple of conditions.
The architecture of the machine reading the archive must be able to hold the data saved. For example, the gcc compiler reserves 4 bytes to store a variable of type while other compilers reserve only 2 bytes. So it's possible that a value could be written that couldn't be represented by the loading program. This is a fairly obvious situation and easily handled by using the numeric types in <boost/cstdint.hpp>.
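As a rough illustration of why fixed-width types help, the sketch below stores a value as exactly four bytes regardless of the platform's native integer size or byte order. It uses the standard `<cstdint>` header in place of `<boost/cstdint.hpp>`, and `encode_i32` and `decode_i32` are hypothetical helper names, not library functions.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Encodes a 32-bit signed value as four bytes, least significant first,
// independent of the platform's int size and byte order.
inline std::array<std::uint8_t, 4> encode_i32(std::int32_t v) {
    const std::uint32_t u = static_cast<std::uint32_t>(v);
    return {{ static_cast<std::uint8_t>(u & 0xFFu),
              static_cast<std::uint8_t>((u >> 8) & 0xFFu),
              static_cast<std::uint8_t>((u >> 16) & 0xFFu),
              static_cast<std::uint8_t>((u >> 24) & 0xFFu) }};
}

// Rebuilds the value from the four bytes written by encode_i32.
inline std::int32_t decode_i32(const std::array<std::uint8_t, 4>& b) {
    const std::uint32_t u = static_cast<std::uint32_t>(b[0])
                          | (static_cast<std::uint32_t>(b[1]) << 8)
                          | (static_cast<std::uint32_t>(b[2]) << 16)
                          | (static_cast<std::uint32_t>(b[3]) << 24);
    return static_cast<std::int32_t>(u);
}
```

Because both sides agree on `std::int32_t`, any value the writer can produce is representable by the reader.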
A special integral type is which is a typedef of an integral type guaranteed to be large enough to hold the size of any collection, but its actual size can differ depending on the platform. The wrapper exists to enable a portable serialization of collection sizes by an archive. Recommended choices for a portable serialization of collection sizes are to use either 64-bit or variable length integer representation.
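One possible variable length representation is a base-128 encoding with 7 payload bits per byte. The sketch below is an illustration of that general technique, not the format used by any particular archive class. `encode_size` and `decode_size` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encodes n as a little-endian base-128 varint: 7 payload bits per byte,
// high bit set on every byte except the last.
inline std::vector<std::uint8_t> encode_size(std::uint64_t n) {
    std::vector<std::uint8_t> out;
    while (n >= 0x80u) {
        out.push_back(static_cast<std::uint8_t>((n & 0x7Fu) | 0x80u));
        n >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(n));
    return out;
}

// Decodes a varint produced by encode_size. Assumes well-formed input.
inline std::uint64_t decode_size(const std::vector<std::uint8_t>& in) {
    std::uint64_t n = 0;
    unsigned shift = 0;
    for (std::uint8_t byte : in) {
        n |= static_cast<std::uint64_t>(byte & 0x7Fu) << shift;
        shift += 7;
        if ((byte & 0x80u) == 0) break;
    }
    return n;
}
```

Small collections cost one byte, yet the full 64-bit range remains representable, so the encoding does not depend on the platform's native size type.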
Another potential problem is illustrated by the following example: If uses default serialization traits there could be a problem. With the default traits, each time a new type is added to the archive, bookkeeping information is added. So in this example, the archive would include such bookkeeping information for and for . Or would it? What about compilers that treat as a synonym for ? In this case there is only one distinct type - not two. If archives are passed between programs with compilers that differ in their treatment of the load operation will fail in a catastrophic way.
One remedy for this is to assign serialization traits to the template such that class information for instantiations of this template is never serialized. This process is described above and has been used for Name-Value Pairs. Wrappers would typically be assigned such traits.
Another way to avoid this problem is to assign serialization traits to all specializations of the template for all primitive types so that class information is never saved. This is what has been done for our implementation of serializations for STL collections.
Binary Archives
Standard stream i/o on some systems will expand linefeed characters to carriage-return/linefeed on output. This creates a problem for binary archives. The easiest way to handle this is to open streams for binary archives in "binary mode" by using the flag . If this is not done, the archive generated will be unreadable.
Unfortunately, no way has been found to detect this error before loading the archive. Debug builds will assert when this is detected so that may be helpful in catching this error.
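In standard C++ streams, binary mode is requested with `std::ios::binary`. The sketch below shows only the stream handling, with no archive classes involved. `write_binary` and `read_binary` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>

// Writes the bytes unchanged. std::ios::binary suppresses any newline
// translation the platform would otherwise apply.
inline void write_binary(const std::string& path, const std::string& bytes) {
    std::ofstream os(path, std::ios::out | std::ios::binary | std::ios::trunc);
    os.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
}

// Reads the whole file back, again without newline translation.
inline std::string read_binary(const std::string& path) {
    std::ifstream is(path, std::ios::in | std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(is),
                       std::istreambuf_iterator<char>());
}
```

A payload containing linefeed, carriage-return and zero bytes should round-trip byte for byte when both streams are opened this way.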
XML Archives
XML archives present a somewhat special case. XML format has a nested structure that maps well to the "recursive class member visitor" pattern used by the serialization system. However, XML differs from other formats in that it requires a name for each data member. Our goal is to add this information to the class serialization specification while still permitting the serialization code to be used with any archive. This is achieved by requiring that all data serialized to an XML archive be serialized as a name-value pair. The first member is the name to be used as the XML tag for the data item while the second is a reference to the data item itself. Any attempt to serialize data not wrapped in a name-value pair will be trapped at compile time. The system is implemented in such a way that for other archive classes, just the value portion of the data is serialized. The name portion is discarded during compilation. So by always using name-value pairs, it will be guaranteed that all data can be serialized to all archive classes with maximum efficiency.
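The idea of one serialization routine serving both named and unnamed formats can be sketched with a toy model. Everything here is hypothetical and greatly simplified: `ToyNvp`, `make_toy_nvp`, `ToyXmlOut`, `ToyTextOut` and `Point` are illustrative names, not the library's classes.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Toy name-value pair: a tag name plus a reference to the data item.
template <class T>
struct ToyNvp {
    const char* name;
    T& value;
};

template <class T>
ToyNvp<T> make_toy_nvp(const char* name, T& value) {
    return ToyNvp<T>{name, value};
}

// XML-like toy archive: uses the name as the element tag. Only wrapped
// data is accepted, so an unwrapped item fails to compile.
struct ToyXmlOut {
    std::ostringstream os;
    template <class T>
    ToyXmlOut& operator<<(const ToyNvp<T>& p) {
        os << '<' << p.name << '>' << p.value << "</" << p.name << '>';
        return *this;
    }
};

// Text-like toy archive: writes only the value and ignores the name.
struct ToyTextOut {
    std::ostringstream os;
    template <class T>
    ToyTextOut& operator<<(const ToyNvp<T>& p) {
        os << p.value << ' ';
        return *this;
    }
};

// One serialization routine that works with either archive type.
struct Point {
    int x;
    int y;
    template <class Archive>
    void save(Archive& ar) {
        ar << make_toy_nvp("x", x) << make_toy_nvp("y", y);
    }
};
```

Saving a `Point{3, 4}` yields `<x>3</x><y>4</y>` from the XML-like archive and `3 4 ` from the text-like one, using the same `save` function.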
Exporting Class Serialization
Elsewhere in this manual, we have described . Export implies two things:
- Instantiates code which is not otherwise referred to.
- Associates an external identifier with the class to be serialized. The fact that the class isn't explicitly referred to implies this requirement.
In C++, usage of code not explicitly referred to is implemented via virtual functions. Hence, the need for export is implied by the usage of a derived class that is manipulated via a pointer or reference to its base class.
in the same source module that includes any of the archive class headers will instantiate code required to serialize polymorphic pointers of the indicated type to all those archive classes. If no archive class headers are included, then no code will be instantiated.
Note that the implementation of this functionality requires that the macro appear after the inclusion of any archive class headers for which code is to be instantiated. So, code that uses will look like the following:
This will be true regardless of whether the code is part of a standalone executable, a static library or a dynamic or shared library.
Including in the "a.hpp" header itself as one would do with other serialization traits will make it difficult or impossible to follow the rule above regarding inclusion of archive headers before is invoked. This can best be addressed by using in the header declarations and in the class definition file.
This system has certain implications for placing code in static or shared libraries. Placing in library code will have no effect unless archive class headers are also included. So when building a library, one should include all headers for all the archive classes which one anticipates using. Alternatively, one can include headers for just the Polymorphic Archives.
Strictly speaking, export should not be necessary if all pointer serialization occurs through the most derived class. However, in order to detect what would be a catastrophic error, the library traps ALL serializations through a pointer to a polymorphic class which are not exported or otherwise registered. So, in practice, be prepared to register or export all classes with one or more virtual functions which are serialized through a pointer.
Note that the implementation of this functionality depends upon vendor specific extensions to the C++ language. So, there is no guaranteed portability of programs which use this facility. However, all C++ compilers which are tested with boost provide the required extensions. The library includes the extra declarations required by each of these compilers. It's reasonable to expect that future C++ compilers will support these extensions or something equivalent.
Static Libraries and Serialization
Code for serialization of data types can be saved in libraries just as it can for the rest of the type implementation. This works well, and can save a huge amount of compilation time.
- Only compile serialization definitions in the library.
- Explicitly instantiate serialization code for ALL archive classes you intend to use in the library.
- For exported types, only use in headers.
- For exported types, only use in definitions compiled in the library. For any particular type, there should be only one file which contains for that type. This ensures that only one copy of serialization code will exist within the program. It avoids wasted space and the possibility of having different versions of the serialization code in the same program. Including in multiple files could result in a failure to link due to duplicated symbols or the throwing of a runtime exception.
- Code for serialization should be only in the library.
- Familiarize yourself with the PIMPL idiom.
This is illustrated by , and where implementation of serialization is in a static library completely separate from the main program.
DLLS - Serialization and Runtime Linking
Serialization code can be placed in libraries to be linked at runtime. That is, code can be placed in DLLS (Windows), shared libraries (*nix), or static libraries as well as the main executable. The best technique is the same as that described above for libraries. The serialization library test suite includes the following programs to illustrate how this works:
, and where implementation of serialization is also completely separate from the main program but the code is loaded at runtime. In this example, this code is loaded automatically when the program which uses it starts up, but it could just as well be loaded and unloaded with an OS dependent API call.
Also included are , and which are similar to the above but include tests of the export and no_rtti facilities in the context of DLLS.
For best results, write your code to conform to the following guidelines:
- Don't include code in classes used in DLLS. This will generate duplicate code in the DLLS and mainline, which is needless. Worse, it makes it possible for different versions of the same code to exist simultaneously. This type of error turns out to be excruciatingly difficult to debug. Finally, it opens the possibility that a module being referred to might be explicitly unloaded which would (hopefully) result in a runtime error. This is another bug that is not always reproducible or easy to find. For class member templates use something like in the header, and in the implementation file. This will result in generation of all code required in only one place. The library does not detect this type of error for you.
- If DLLS are to be loaded and unloaded explicitly (e.g. using in *nix or in Windows), try to arrange that they are unloaded in the reverse sequence. This should guarantee that problems are avoided even if the above guideline hasn't been followed.
In order to implement the library, various facilities for manipulation of types at runtime were required. These are for associating classes with external identifying strings (GUID) and for casting between pointers of related types. To complete the functionality of the ability to construct and destroy corresponding types has been added. In order to use this functionality, one must specify how each type is created. This should be done at the time a class is exported. So, a more complete example of the code above would be:
With this in place, one can construct, serialize and destroy a class about which is known only the GUID and a base class.
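The general idea of constructing an object from nothing but an identifying string and a base class can be sketched with a simple factory table. This is a toy model under stated assumptions, not the library's mechanism: `ToyRegistry`, `export_class`, `create`, `Base` and `Derived` are all hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

struct Base {
    virtual ~Base() {}
    virtual std::string name() const = 0;
};

struct Derived : Base {
    std::string name() const override { return "Derived"; }
};

// Toy registry mapping an external identifier to a factory function.
class ToyRegistry {
    std::map<std::string, std::function<std::unique_ptr<Base>()>> factories_;

public:
    // Associates the identifier with a way to construct T.
    template <class T>
    void export_class(const std::string& guid) {
        factories_[guid] = [] { return std::unique_ptr<Base>(new T()); };
    }

    // Returns a new instance, or null if the identifier is unknown.
    std::unique_ptr<Base> create(const std::string& guid) const {
        auto it = factories_.find(guid);
        if (it == factories_.end()) return nullptr;
        return it->second();
    }
};
```

The caller of `create` needs to know only the string and `Base`. Destruction goes through the virtual destructor, so the concrete type never has to be named at the point of use.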
The fundamental purpose of serialization would conflict with multiple threads concurrently writing/reading from/to a single open archive instance. The library implementation presumes that the application avoids such a situation.
However, writing/reading different archives simultaneously in different tasks is permitted as each archive instance is (almost) completely independent from any other archive instance. The only shared information is some type tables which have been implemented using a lock-free thread-safe described elsewhere in this documentation.
This singleton implementation guarantees that all of this shared information is initialized when the code module which contains it is loaded. The serialization library takes care to ensure that these data structures are not subsequently modified. The only time there could be a problem would be if code is loaded/unloaded while another task is serializing data. This could only occur for types whose serialization is implemented in a dynamically loaded/unloaded DLL or shared library. So if the following is avoided:
- Accessing the same archive instance from different tasks.
- Loading/Unloading DLLS or shared libraries while any archive instances are open.
the library should be thread safe.
In performance critical applications that serialize large sets of contiguous data of homogeneous types one wants to avoid the overhead of serializing each element individually, which is the motivation for the wrapper. Serialization functions for data types containing contiguous arrays of homogeneous types, such as for , or should serialize them using an wrapper to make use of these optimizations. Archive types that can provide optimized serialization for contiguous arrays of homogeneous types should implement these by overloading the serialization of the wrapper, as is done for the binary archives.
© Copyright Robert Ramey 2002-2004. Distributed under the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)