Category: C/C++
2010-07-28 17:41:52
Two-phase initialization is an architectural pattern for artificially breaking and managing coupling between strongly coupled components. The motivation and implementation of this pattern are not always obvious, so I will give a couple of examples to demonstrate.
Let’s take an operating system as an example. Some of the components involved in the initialization of the operating system are the I/O manager, the memory manager, the object manager and many others. At runtime, the strong coupling between the various components is obvious and beneficial – they tend to use each other, all the time.
However, during system startup, these dependencies (especially if startup is performed synchronously) can lead to a dead end. For example:
Another example can be taken from an ESB infrastructure I have been implementing lately. The infrastructure services include a configuration service, a publish/subscribe service and a “DNS”-style service. These services are typically used by other system components, but they also need each other:
Disentangling these dependencies can be done in various ways. For example, we could say that the infrastructure services are not allowed to use each other – the pub/sub service will use local configuration, the “DNS” service will have a predefined list of registered endpoints, etc.
However, in an operating system we can’t resort to a solution in which the object manager manages its own memory, and the memory manager manages its own objects.
The only feasible alternative is two-phase initialization.
When using two-phase initialization, infrastructure components initialize in two phases. In the first phase, they do not rely on any other components to reach a stable state in which they are able to provide basic services to the rest of the system. In the second phase, they transition to a fully-functional state in which they rely on other components (which have not necessarily reached the second phase yet).
Using this model in our example, the “DNS” service can start with a predefined list of endpoints that will be used to communicate with the infrastructure services while they are in the first phase. In the second phase, these predefined endpoints will be replaced by the actual endpoints for the actual services. The pub/sub service can start with a local configuration during the first phase, and retrieve its configuration when the configuration service becomes available (enters the first phase), and so on.
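The circular dependency between the configuration service and the pub/sub service can be sketched as follows. This is only an illustration under assumptions: the class names, the initPhase1/initPhase2 methods, the "pubsub.maxQueue" key and the default values are all hypothetical and are not taken from the ESB infrastructure described above.

```cpp
#include <cassert>
#include <map>
#include <string>

class ConfigService;  // forward declaration: the two services refer to each other

// Hypothetical pub/sub service: wants configuration, but can start without it.
class PubSubService {
public:
    // Phase 1: rely on nothing else; use a built-in (local) default.
    void initPhase1() { maxQueue_ = 16; phase1Done_ = true; }
    // Phase 2: the configuration service has reached at least phase 1.
    void initPhase2(const ConfigService& config);
    void publish(const std::string& topic) { ++published_[topic]; }
    int publishedCount(const std::string& topic) const {
        auto it = published_.find(topic);
        return it == published_.end() ? 0 : it->second;
    }
    int maxQueue() const { return maxQueue_; }
private:
    bool phase1Done_ = false;
    int maxQueue_ = 0;
    std::map<std::string, int> published_;
};

// Hypothetical configuration service: announces readiness through pub/sub.
class ConfigService {
public:
    // Phase 1: load values without touching any other service.
    void initPhase1() { values_["pubsub.maxQueue"] = 1024; }
    // Phase 2: now it may use the pub/sub service.
    void initPhase2(PubSubService& bus) { bus.publish("config.ready"); }
    int get(const std::string& key, int fallback) const {
        auto it = values_.find(key);
        return it == values_.end() ? fallback : it->second;
    }
private:
    std::map<std::string, int> values_;
};

inline void PubSubService::initPhase2(const ConfigService& config) {
    assert(phase1Done_);  // phase 2 must never run before phase 1
    maxQueue_ = config.get("pubsub.maxQueue", maxQueue_);
}

// Every service finishes phase 1 before any service enters phase 2.
struct Infrastructure {
    ConfigService config;
    PubSubService bus;
    void start() {
        config.initPhase1();
        bus.initPhase1();
        bus.initPhase2(config);
        config.initPhase2(bus);
    }
};
```

After start() returns, the pub/sub service has replaced its local default with the value served by the configuration service, and the configuration service has published through a pub/sub service that was already usable.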
Providing a generic two-phase initialization mechanism for all infrastructure and non-infrastructure services is exceptionally difficult, but achievable if the proper metadata is in place. Each component must declare its explicit dependencies and describe how it can make forward progress while the components it depends on are not yet available.
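One possible shape for such metadata is an interface through which each component names its dependencies, plus a registry that drives both phases. The sketch below is a simplification under assumptions: Component, Registry and LoggingComponent are hypothetical names, and the fallback behaviour for unavailable dependencies is left out.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface: every component declares its dependencies up front.
class Component {
public:
    virtual ~Component() {}
    virtual std::string name() const = 0;
    virtual std::vector<std::string> dependencies() const = 0;
    virtual void initPhase1() = 0;  // must not touch other components
    virtual void initPhase2(const std::map<std::string, Component*>& peers) = 0;
};

class Registry {
public:
    void add(Component& c) { components_[c.name()] = &c; }

    void start() {
        // Use the metadata: every declared dependency must be registered.
        // Cycles are allowed, because phase 1 never follows a dependency.
        for (const auto& entry : components_)
            for (const std::string& dep : entry.second->dependencies())
                if (components_.count(dep) == 0)
                    throw std::runtime_error(entry.first + " needs missing component " + dep);
        for (const auto& entry : components_) entry.second->initPhase1();
        for (const auto& entry : components_) entry.second->initPhase2(components_);
    }
private:
    std::map<std::string, Component*> components_;
};

// Example component that records the order of initialization calls.
class LoggingComponent : public Component {
public:
    LoggingComponent(std::string name, std::vector<std::string> deps,
                     std::vector<std::string>& log)
        : name_(std::move(name)), deps_(std::move(deps)), log_(log) {}
    std::string name() const override { return name_; }
    std::vector<std::string> dependencies() const override { return deps_; }
    void initPhase1() override { log_.push_back(name_ + ":1"); }
    void initPhase2(const std::map<std::string, Component*>& peers) override {
        for (const std::string& dep : deps_) assert(peers.count(dep) == 1);
        log_.push_back(name_ + ":2");
    }
private:
    std::string name_;
    std::vector<std::string> deps_;
    std::vector<std::string>& log_;
};
```

Two components that depend on each other can both be registered: the registry completes phase 1 for all of them before it starts phase 2 for any of them.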
This sounds simple, but in practice it is not. Several issues plague the two-phase initialization pattern without undermining its basic validity:
The two-phase initialization approach is used by Windows. In the first phase (called phase 0), initialization proceeds in a single thread and brings up only the minimal services required for the second phase. In the second phase (called phase 1), system components can rely on other components being present to start transitioning into their fully-functional state.
To summarize, two-phase initialization is difficult to manage and implement, but in the real world where components circularly depend on each other there is rarely a better alternative.
An object with one-phase construction is fully "built" with the constructor. An object with two-phase construction is minimally initialized in the constructor and fully "built" using a class method. Frequently copied objects with expensive constructors and destructors can be serious bottlenecks and are great candidates for two-phase construction. Designing your classes to support two-phase construction, even if internally they use one-phase, will make future optimizations easy.
The following code shows two different objects, OnePhase and TwoPhase, based on a Bitmap class. They both have the same external interface. Their internals are quite different. The OnePhase object is fully initialized in the constructor. The code for OnePhase is very simple. The code for TwoPhase, on the other hand, is more complicated. The TwoPhase constructor simply initializes a pointer. The TwoPhase methods have to check the pointer and allocate the Bitmap object if necessary.
class OnePhase
{
private:
    Bitmap m_bMap; // Bitmap is a "one-phase" constructed object
public:
    bool Create(int nWidth, int nHeight)
    {
        return (m_bMap.Create(nWidth, nHeight));
    }
    int GetWidth() const
    {
        return (m_bMap.GetWidth());
    }
};
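class TwoPhase
{
private:
    Bitmap* m_pbMap; // Ptr lends itself to two-phase construction
public:
    TwoPhase()
    {
        m_pbMap = NULL;
    }
    // Deep copy (assumes Bitmap is copyable, as OnePhase already does).
    // Copying an "empty" object allocates nothing.
    TwoPhase(const TwoPhase& other)
    {
        m_pbMap = (other.m_pbMap == NULL ? NULL : new Bitmap(*other.m_pbMap));
    }
    // Without this and the copy constructor, two copies would delete the same Bitmap.
    TwoPhase& operator=(const TwoPhase& other)
    {
        if (this != &other)
        {
            Bitmap* pCopy = (other.m_pbMap == NULL ? NULL : new Bitmap(*other.m_pbMap));
            delete m_pbMap;
            m_pbMap = pCopy;
        }
        return *this;
    }
    ~TwoPhase()
    {
        delete m_pbMap;
    }
    bool Create(int nWidth, int nHeight)
    {
        // Second phase: allocate the Bitmap only when it is first needed
        if (m_pbMap == NULL)
            m_pbMap = new Bitmap;
        return (m_pbMap->Create(nWidth, nHeight));
    }
    int GetWidth() const
    {
        return (m_pbMap == NULL ? 0 : m_pbMap->GetWidth());
    }
};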
What kind of savings can you expect? It depends. If you copy many objects, especially "empty" objects, the savings can be significant. If you don't do a lot of copying, two-phase construction can have a negative impact, because it adds a new level of indirection.
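To make the "empty object" case concrete, the sketch below counts how many bitmap buffers get allocated when an empty object is copied many times. CountingBitmap, Eager and Lazy are hypothetical stand-ins for Bitmap, OnePhase and TwoPhase, and the count only approximates the real cost, which depends on what the actual constructor does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for an expensive Bitmap: each construction allocates a pixel buffer.
struct CountingBitmap {
    static int allocations;
    std::vector<unsigned char> pixels;
    CountingBitmap() : pixels(64 * 64) { ++allocations; }
    CountingBitmap(const CountingBitmap& other) : pixels(other.pixels) { ++allocations; }
};
int CountingBitmap::allocations = 0;

// One-phase: the bitmap is built in the constructor and again in every copy.
class Eager {
    CountingBitmap bitmap_;
};

// Two-phase: the constructor only sets a pointer; create() does the real work.
class Lazy {
public:
    Lazy() : bitmap_(nullptr) {}
    Lazy(const Lazy& other)
        : bitmap_(other.bitmap_ ? new CountingBitmap(*other.bitmap_) : nullptr) {}
    Lazy& operator=(const Lazy& other) {
        if (this != &other) {
            CountingBitmap* copy = other.bitmap_ ? new CountingBitmap(*other.bitmap_) : nullptr;
            delete bitmap_;
            bitmap_ = copy;
        }
        return *this;
    }
    ~Lazy() { delete bitmap_; }
    void create() { if (!bitmap_) bitmap_ = new CountingBitmap; }
private:
    CountingBitmap* bitmap_;
};

// Copies one default-constructed T `copies` times and reports the allocations.
template <class T>
int allocationsForCopies(int copies) {
    assert(copies >= 0);
    CountingBitmap::allocations = 0;
    T original;
    std::vector<T> objects;
    objects.reserve(static_cast<std::size_t>(copies));  // avoid extra copies from regrowth
    for (int i = 0; i < copies; ++i) objects.push_back(original);
    return CountingBitmap::allocations;
}
```

With this stand-in, copying an empty Eager object 100 times allocates 101 buffers (one for the original and one per copy), while copying an empty Lazy object allocates none. Once create() has been called, Lazy pays the same per-copy cost plus the pointer indirection.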
3. Using two-phase construction to solve the problem of calling virtuals during initialization
References:
1.%2B%2B_Idioms/Calling_Virtuals_During_Initialization
2.
the Dynamic Binding During Initialization idiom (AKA Calling Virtuals During Initialization).
To clarify, we're talking about this situation:
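A minimal sketch of that situation, using hypothetical Base and Derived classes with a virtual describe() (these names are assumptions for illustration, not the FAQ's own listing):

```cpp
#include <cassert>
#include <string>

class Base {
public:
    // While Base's constructor runs, the object is still a Base,
    // so this call resolves to Base::describe(), never to an override.
    Base() { createdBy_ = describe(); }
    virtual ~Base() {}
    virtual std::string describe() const { return "Base"; }
    const std::string& createdBy() const { return createdBy_; }
private:
    std::string createdBy_;
};

class Derived : public Base {
public:
    std::string describe() const override { return "Derived"; }
};

void demonstrate() {
    Derived d;
    assert(d.createdBy() == "Base");    // the constructor saw the Base version
    assert(d.describe() == "Derived");  // after construction, dynamic binding works
}
```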
This FAQ shows some ways to simulate dynamic binding as if the calls made in Base's constructor dynamically bound to the this object's derived class. The ways we'll show have tradeoffs, so choose the one that best fits your needs, or make up another.
The first approach is a two-phase initialization. In Phase I, someone calls
the actual constructor; in Phase II, someone calls an "init" method on the
object. Dynamic binding on the this object works fine during Phase II, and
Phase II is conceptually part of construction, so we simply move some
code from the original
The only remaining issues are determining where to call Phase I and where to call Phase II. There are many variations on where these calls can live; we will consider two.
The first variation is simplest initially, though the code that actually wants to create objects requires a tiny bit of programmer self-discipline, which in practice means you're doomed. Seriously, if there are only one or two places that actually create objects of this hierarchy, the programmer self-discipline is quite localized and shouldn't cause problems.
In this variation, the code that is creating the object explicitly executes
both phases. When executing Phase I, the code creating the object either
knows the object's exact class (e.g.,
Note: Phase I often, but not always, allocates the object from the heap. When it does, you should store the pointer in some sort of managed pointer, such as a std::auto_ptr, a reference-counted pointer, or some other object whose destructor deletes the allocation. This is the best way to prevent memory leaks when Phase II might throw exceptions. The following example assumes Phase I allocates the object from the heap.
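A sketch of this first variation, under some assumptions: std::unique_ptr is used here in place of std::auto_ptr (which was removed in C++17), the typeName() and label() members are hypothetical, and joe_user returns a string only so that the result can be inspected.

```cpp
#include <cassert>
#include <memory>
#include <string>

class Base {
public:
    Base() : initialized_(false) {}  // Phase I: no virtual calls in here
    virtual ~Base() {}
    // Phase II: the object is fully constructed, so typeName() binds dynamically.
    void init() {
        label_ = typeName();
        initialized_ = true;
    }
    const std::string& label() const {
        assert(initialized_);  // using the object before Phase II is a bug
        return label_;
    }
protected:
    virtual std::string typeName() const { return "Base"; }
private:
    bool initialized_;
    std::string label_;
};

class Derived : public Base {
protected:
    std::string typeName() const override { return "Derived"; }
};

// The code creating the object explicitly executes both phases.
std::string joe_user() {
    std::unique_ptr<Base> p(new Derived());  // Phase I: heap allocation, owned at once
    p->init();                               // Phase II: if this throws, p frees the object
    return p->label();
}
```

The self-discipline mentioned above is visible here: nothing stops a caller from forgetting the p->init() line, which is the weakness the second variation addresses.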
The second variation is to combine the first two lines of the joe_user
function into some create function. That's almost always the right
thing to do when there are lots of joe_user-like functions. For
example, if you're using some kind of factory, such as a registry and
, you could move those
two lines into a static method called
template <class D, class Parameter>...
static Ptr Create (Parameter p)
{
    std::auto_ptr <Base> ptr (new D (p));
    ptr->init ();
    return ptr;
}
This simplifies all the joe_user-like functions (a little), but more
importantly, it reduces the chance that any of them will create a Derived
object without also calling
Base::Ptr b = Base::Create <Derived> ("para");
}
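Put together, the create-function variation might look like the sketch below. It is written under assumptions: Base::Ptr is taken to be a std::unique_ptr<Base> rather than the std::auto_ptr used in the fragment above, and describe(), description() and the Derived constructor are hypothetical members added so the example is complete.

```cpp
#include <cassert>
#include <memory>
#include <string>

class Base {
public:
    typedef std::unique_ptr<Base> Ptr;
    virtual ~Base() {}

    // Runs both phases in one place, so callers cannot skip Phase II.
    template <class D, class Parameter>
    static Ptr Create(Parameter p) {
        Ptr ptr(new D(p));  // Phase I: construct the derived object
        ptr->init();        // Phase II: virtual calls now reach D
        return ptr;
    }

    const std::string& description() const {
        assert(initialized_);
        return description_;
    }
protected:
    Base() : initialized_(false) {}
    virtual std::string describe() const = 0;
private:
    // Private: only Create can trigger Phase II.
    void init() {
        description_ = describe();
        initialized_ = true;
    }
    bool initialized_;
    std::string description_;
};

class Derived : public Base {
public:
    explicit Derived(const std::string& para) : para_(para) {}
protected:
    std::string describe() const override { return "Derived(" + para_ + ")"; }
private:
    std::string para_;
};
```

A caller then writes Base::Ptr b = Base::Create<Derived>("para"); and receives an object that has already completed both phases. In this sketch Derived's constructor is still public, so direct construction without Phase II remains possible.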
If you're sufficiently clever and motivated, you can even eliminate
the chance that someone could create a Derived object without also calling