[aside: for many web sites in the past, CPU time _has_ been the bottleneck]
There is no need to parse each document every time you need to find out what
images it references. The load set can be generated lazily, by parsing the
document the first time it is requested after it has changed. Alternatively,
with a dynamic approach, a logging module could be added to generate load sets
based on the documents retrieved in a virtual session.
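A minimal sketch of the lazy approach, assuming documents are HTML files on
local disk and that a file's modification time is a good enough "has changed"
signal. The names LoadSetCache and extract_image_refs are hypothetical, and
only <img src=...> references are collected; a real server would probably want
other inline resources too.

```python
import os
from html.parser import HTMLParser


class _ImageRefParser(HTMLParser):
    """Collects the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.refs.append(value)


def extract_image_refs(html_text):
    parser = _ImageRefParser()
    parser.feed(html_text)
    parser.close()
    # Drop duplicates but keep first-seen order.
    return list(dict.fromkeys(parser.refs))


class LoadSetCache:
    """Hypothetical cache: a document is parsed only on the first request
    after its modification time changes; later requests reuse the result."""

    def __init__(self):
        self._cache = {}  # path -> (mtime_ns, load_set)
        self.parses = 0   # how many times a document was actually parsed

    def load_set(self, path):
        mtime = os.stat(path).st_mtime_ns
        cached = self._cache.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]  # unchanged since last parse
        with open(path, encoding="utf-8") as f:
            refs = extract_image_refs(f.read())
        self.parses += 1
        self._cache[path] = (mtime, refs)
        return refs
```

The parse cost is paid once per change rather than once per request, which is
the point being made above.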
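The dynamic, log-based approach could look roughly like the sketch below. It
assumes a hypothetical log of (session_id, timestamp, path) tuples, treats
any path with an image suffix as an inline image, and attributes each image
to the most recent non-image document in the same session. An image joins a
document's load set if it followed that document in at least min_fraction of
its views (a hypothetical threshold). Note that clients which already have an
image cached will not request it again, so counts from logs can undercount.

```python
from collections import defaultdict

# Assumption: inline images are recognised by file suffix alone.
IMAGE_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png")


def load_sets_from_log(entries, min_fraction=0.5):
    """entries: iterable of (session_id, timestamp, path) in a hypothetical log format."""
    by_session = defaultdict(list)
    for session, ts, path in entries:
        by_session[session].append((ts, path))

    views = defaultdict(int)                            # document -> times requested
    co_fetched = defaultdict(lambda: defaultdict(int))  # document -> image -> count

    for requests in by_session.values():
        requests.sort()  # replay each virtual session in time order
        current, seen = None, set()
        for _, path in requests:
            if path.lower().endswith(IMAGE_SUFFIXES):
                # Count each image at most once per view of the current document.
                if current is not None and path not in seen:
                    co_fetched[current][path] += 1
                    seen.add(path)
            else:
                current, seen = path, set()
                views[path] += 1

    return {
        doc: sorted(img for img, n in imgs.items() if n / views[doc] >= min_fraction)
        for doc, imgs in co_fetched.items()
    }
```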
Generating composite MIME documents is not the ideal way to solve this
problem, since it causes real hardship for streamed documents, whose size
is not known until the transaction is complete.
Simon