4. IO And Analysis
Pressomancy’s IO layer is useful for the same reason its object model is useful: it preserves structure. Instead of writing only a flat list of particle properties, the HDF5 writer stores those properties together with the ownership and connectivity information that explains which simulation object each particle came from. That makes the saved file useful in three different roles at once: as a trajectory, as an analysis source, and as the starting point for a new simulation state.
4.1. Why Use The IO Stack
For each registered object family, pressomancy writes particle data under
/particles/<Group>/<property> using the H5MD-style triplet of value,
step, and time datasets. The corresponding property tensors have shape
(T, N, D), where T is the number of stored frames, N is the number
of particles in the registered flat view for that family, and D is the
dimensionality of the property. In parallel, the writer stores object-context
information under /connectivity so that the file can answer questions a
plain trajectory cannot, such as which particles belong to a given
Filament or which child
objects belong to a given parent object.
That combined layout is the real benefit of using the built-in IO stack. The particle-property part is close enough to H5MD to be familiar and efficient, while the connectivity extension makes the result meaningful inside a pressomancy workflow. The payoff is practical rather than theoretical: one format supports long-run storage, object-aware analysis, and source-driven reconstruction without format conversion in between.
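Because the layout is plain HDF5, it can be inspected with h5py alone. The sketch below is illustrative only: the group name Filament and the property name pos are assumptions, since the stored groups and properties depend on what was registered.

import h5py

# Minimal layout inspection sketch; "Filament" and "pos" are illustrative.
with h5py.File("run.h5", "r") as f:
    f.visit(print)  # print every group and dataset path, incl. /connectivity
    value = f["particles/Filament/pos/value"]
    print(value.shape)  # (T, N, D): frames, particles, components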
4.2. Write And Restart A Trajectory
The usual write workflow starts by telling the simulation which object
families should be persisted. That happens through
inscribe_part_group_to_h5(), which
creates the HDF5 layout and records the flat particle views that later writes
will use. Concretely, the simulation stores those particle handles in
sim.io_dict['flat_part_view']. That view is the ordered particle list used
by write_part_group_to_h5() when each
new frame is appended. In other words, flat_part_view is the bridge
between the simulation-object hierarchy and the flat per-group particle layout
written to HDF5.
# Register the object families to persist; this creates the HDF5 layout.
# The returned index tracks the next frame to write.
step_index = sim.inscribe_part_group_to_h5(
    group_type=[Filament, Crowder],
    h5_data_path="run.h5",
    mode="NEW",
)

# Integrate and append one frame per step.
for _ in range(n_steps):
    sim.sys.integrator.run(1)
    sim.write_part_group_to_h5(time_step=step_index)
    step_index += 1
On the current branch,
inscribe_part_group_to_h5() supports
four modes. In practice, they are not interchangeable and it is worth being
explicit about what each one assumes.
NEW
    Creates a fresh HDF5 file and registers the current flat particle view for later writes.
LOAD_NEW
    Reopens an existing HDF5 file and rebuilds the flat particle view from the connectivity already stored in that file. In practice this is the preferred continuation mode when you have both the ESPResSo binary checkpoint and the HDF5 file available, because it does not require you to reconstruct the full Python-side object graph before continuing IO.
LOAD
    Reopens an existing HDF5 file, but it assumes that the simulation setup has already been reconstructed in memory and that the current objects already own the right particles. This mode therefore depends only on the ESPResSo checkpoint for particle state, but it requires the whole simulation script to rebuild the object hierarchy before the IO layer can safely resume.
INIT_SRC
    Prepares the IO layer for source-driven initialization. It is not a continuation mode in the checkpoint sense. It is the mode used when a previous HDF5 file should act as the source of positions, orientations, and selected properties for a newly constructed simulation.
In day-to-day restart work, LOAD_NEW is usually the strongest option. The
connectivity tables in the HDF5 file give pressomancy enough information to
rebuild the flat particle view directly from saved object ownership, which is
why it can resume IO without the full Python-side object reconstruction that
LOAD requires.
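As a sketch of the resulting continuation pattern, mirroring the NEW example above: it assumes the ESPResSo checkpoint has already restored the particle state, and that LOAD_NEW returns the next frame index in the same way the NEW example does.

# Continuation sketch: the object families and file must match the
# original run; n_more_steps is a placeholder for the remaining budget.
step_index = sim.inscribe_part_group_to_h5(
    group_type=[Filament, Crowder],
    h5_data_path="run.h5",
    mode="LOAD_NEW",
)
for _ in range(n_more_steps):
    sim.sys.integrator.run(1)
    sim.write_part_group_to_h5(time_step=step_index)
    step_index += 1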
This is also a good place to keep the role of ESPResSo checkpointing in perspective. The binary checkpoint mechanism is powerful and robust, but it relies on binary state and pickle-based reconstruction. In practice, that means it is less portable across systems, package versions, ESPResSo versions, and runtime layouts such as MPI rank counts. The HDF5 files are less tied to that execution environment and can, in principle, be used to reconstruct a simulation state while sidestepping binary checkpoint portability limits. That does not remove the need for care, but it is one of the reasons the HDF5 path is so valuable in longer-lived workflows.
4.3. Keep Checkpoints And HDF5 In Sync
The force_resize_to_size argument exists for a workflow that matters in
practice on HPC platforms. Long simulations often save two independent kinds
of state: an ESPResSo checkpoint for the live system, and an HDF5 trajectory
for analysis. If a job is interrupted between those two writes, the last valid
binary checkpoint and the last valid HDF5 frame may no longer correspond to
one another.
That is why it is a good idea to maintain a global step counter and treat it
as the record of the last reliable state. When that counter says the last safe
frame is smaller than what the HDF5 file currently contains,
force_resize_to_size lets LOAD_NEW truncate every stored property to
that known-good length before the next write. In other words, it drops ragged
trailing data so that the HDF5 trajectory and the binary checkpoint are again
synchronized.
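A minimal sketch of that recovery call, where last_safe_frame stands for the workflow's own persisted counter:

# Reopen for continuation and truncate every stored property to the last
# checkpoint-backed frame; last_safe_frame is maintained by the workflow.
step_index = sim.inscribe_part_group_to_h5(
    group_type=[Filament, Crowder],
    h5_data_path="run.h5",
    mode="LOAD_NEW",
    force_resize_to_size=last_safe_frame,
)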
This is especially useful when a run writes checkpoints and HDF5 frames on
different cadences, or when a failure occurs after one output path succeeded
and the other did not. In that situation, LOAD_NEW plus
force_resize_to_size gives you a clean way to continue without carrying a
silently inconsistent tail in the HDF5 data.
4.4. Analysis API
Reading follows the same explicit style. The entry point is
H5DataSelector. The important
idea is that the selector never asks you to guess which axis an index belongs
to. Timesteps and particles are separate iteration contexts, exposed through
.timestep and .particles.
import h5py

# H5DataSelector is imported from pressomancy (exact module path per your
# installation); "Filament" must match a registered particle group.
with h5py.File("run.h5", "r") as h5_file:
    data = H5DataSelector(h5_file, particle_group="Filament")
    frame = data.timestep[-1]                          # last stored frame
    particle_window = data.particles[100:150]          # particle subset
    composed_a = data.timestep[-1].particles[100:150]  # either order works
    composed_b = data.particles[100:150].timestep[-1]
Because the selector stores one timestep slice and one particle slice
internally, timestep and particle selection can be composed in either order.
That gives you unambiguous iteration contexts for both axes and makes slicing
behavior robust even in longer chained expressions. If you want to iterate
frame by frame, iterate over data.timestep. If you want to iterate over a
particle subset, iterate over data.particles. If you want both, compose
the two contexts first and then access the properties.
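As a small sketch of the frame context, continuing from the snippet above and assuming that attribute-style property access (as in the predicate examples below) returns an array-like per view:

# Frame-by-frame iteration sketch; ".dip" follows the attribute-style
# access used in the predicate examples, and the reduction axis is an
# assumption about the returned array's layout.
for frame in data.timestep:
    mean_dip = np.asarray(frame.dip).mean(axis=-2)  # average over particles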
Predicates are the next important layer. They let you refine a selected view without leaving the selector API.
import h5py
import numpy as np

# "sim" is the live simulation from the write examples; its part_types
# mapping supplies the integer type codes used in the predicate.
with h5py.File("run.h5", "r") as h5_file:
    data = H5DataSelector(h5_file, particle_group="Filament")
    frame = data.timestep[-1]
    # Particles of Filament instance 0, restricted to "real"-typed ones.
    filament_zero = frame.select_particles_by_object(
        object_name="Filament",
        connectivity_value=0,
        predicate=lambda subset: subset.type == sim.part_types["real"],
    )
    # IDs of filaments with any positive z dipole component.
    magnetic_filaments = frame.get_connectivity_values(
        "Filament",
        predicate=lambda subset: np.any(subset.dip[..., 2] > 0.0),
    )
This is useful because it lets you express selection logic in the same coordinate system in which the data are stored. You can first reduce the view by timestep, then by object ownership, then by a property predicate, without having to manually reconstruct index arrays outside the API.
The object-context helpers are what make the selector especially useful for
pressomancy-generated data. select_particles_by_object gives you the
particle subset belonging to one saved object instance, such as one
Filament or one
Quadriplex.
get_connectivity_values()
lets you enumerate object IDs, optionally filtered by a predicate.
get_child_ids(),
get_parent_ids(), and
get_connectivity_map()
then let you move up and down the saved object graph. In practice, this enables
workflows such as selecting all particles of one
Filament at one frame,
querying which
Quadriplex objects belong
to that filament, or filtering object IDs based on per-object property tests
without re-deriving connectivity from raw coordinates.
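A sketch of that navigation, using only the no-argument forms shown above; how the per-object helpers identify a specific instance is left to the API reference.

import h5py

with h5py.File("run.h5", "r") as h5_file:
    data = H5DataSelector(h5_file, particle_group="Filament")
    frame = data.timestep[-1]
    # Enumerate the Filament instance IDs present in the file.
    filament_ids = frame.get_connectivity_values("Filament")
    # Full parent/child table of the saved object graph; get_child_ids()
    # and get_parent_ids() answer the same question one object at a time.
    graph = frame.get_connectivity_map()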
4.5. Use Output As A Source
Source-driven initialization is best understood as a workflow for building a
new simulation from the structural state of an older one. A representative
example is a long run of fixed-point dipole chains used to reach an
equilibrium self-assembly picture. A later run can then take a chosen snapshot
from that older trajectory, reuse its positions and orientations, and
substitute a different local particle model such as
EGGPart while keeping the
larger assembled structure.
The first step is to declare the source mapping with
set_init_src().
# Declare run.h5 as the initialization source: which type carries the
# positions and orientations, how source types map to local types, and
# which source property fills which local property.
sim.set_init_src(
    path="run.h5",
    pos_ori_src_type=["real"],
    type_to_type_map=[("real", "real"), ("virt", "virt")],
    prop_to_prop_map=[("dip", "director")],
)
In this workflow, positions and orientations are special.
get_pos_ori_from_src() is called
implicitly when you place objects through
set_objects() with mode='INIT_SRC'.
That means the normal placement step is replaced by source-backed placement,
and the top-level objects, for example
Filament instances, are
created directly at the positions and orientations read from the HDF5 file.
sim.store_objects(objects)
sim.set_objects(objects, mode="INIT_SRC")
Additional property transfer happens separately through
set_prop_from_src(). This call should
come after the objects have been created and after any object-local setup that
must already exist for the property mapping to make sense, but before the
production dynamics begins. In a chain-suspension workflow, that means the
object hierarchy is built first, anchors and related local structure are
added, and only then are source properties copied onto the newly created
particles.
sim.store_objects(objects)
sim.set_objects(objects, mode="INIT_SRC")
for filament in objects:
    filament.add_anchors(type_name="real")
    filament.bond_anchors()
sim.set_prop_from_src(objects)
That separation is important. set_objects(..., mode='INIT_SRC') handles
where the new objects should be placed. set_prop_from_src handles which
non-positional properties should then be copied from the source snapshot onto
those already created local particles.
A related helper,
mk_src_file(), is useful when the
source you want is hidden inside a larger trajectory. It copies an existing
HDF5 file, collapses it to a single timestep, and can optionally append any
missing per-particle properties needed by the next run. In practice, that is a
clean way to turn a trajectory frame into a reusable source file.
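A hypothetical invocation sketch: the prose above fixes the behavior (copy, collapse to one timestep, optionally append properties), but the argument names below are illustrative, not the actual signature.

# Hypothetical argument names; consult the mk_src_file API reference for
# the real signature and for where the helper is attached.
mk_src_file(
    source_path="run.h5",  # full trajectory to copy
    target_path="src.h5",  # single-frame source file to produce
)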
4.6. Performance And Common Mistakes
The HDF5 writer stores each value dataset with timestep-shaped chunks,
namely (1, N, D), and uses moderate gzip compression. This is not just a
storage detail. It means a read such as data.timestep[-1] naturally
touches one frame at a time rather than encouraging whole-trajectory reads.
For long runs, that keeps memory use under control and makes frame-local,
object-aware analysis practical in an interactive workflow.
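Those storage choices are visible directly on a written file; the dataset path below assumes a Filament group and an illustrative property name pos.

import h5py

# Inspect chunk shape and compression of one value dataset.
with h5py.File("run.h5", "r") as f:
    dset = f["particles/Filament/pos/value"]
    print(dset.chunks)       # expected (1, N, D): one frame per chunk
    print(dset.compression)  # e.g. "gzip"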
The most common IO failures are mapping failures rather than storage failures.
If a source-driven workflow breaks, the first things to check are whether
set_init_src was called and whether the declared type names exist both in
the local simulation and in the source file. The next thing to check is the
correspondence between type_to_type_map and prop_to_prop_map. If the two
maps name types or properties that do not actually correspond on both sides,
the restart logic has no sensible way to match source data to local particles.
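A small pre-flight sketch can catch the local half of that problem before the run starts; the (source, local) tuple order assumed here should be confirmed against the API reference.

# Verify the local side of the type map before calling set_init_src;
# the (source, local) tuple order is an assumption.
type_map = [("real", "real"), ("virt", "virt")]
missing = [loc for _src, loc in type_map if loc not in sim.part_types]
if missing:
    raise ValueError(f"type map targets missing locally: {missing}")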
The other recurring mistake is to postpone the structured IO path until the analysis stage. By that point, the important ownership information may simply never have been written. If you expect object-level analysis or source-driven restarts later, the built-in HDF5 path is worth using from the first run.
For a case where hierarchy, placement, and IO all matter at once, continue to G-quadruplex Assembly.