# Reading numpy structured arrays from a text file

NumPy has a very nice feature: structured arrays, that is, arrays in which each row has a structure and each column can store a different type of data.

For example:

```python
>>> import numpy as np
>>> arr = np.zeros(10, dtype=[('id', np.uint16),
...                           ('position', np.dtype('3float32')),
...                           ('momentum', np.dtype('3float32'))])
```


Here we have defined a structured array in which each row stores the id of a particle (an unsigned integer), its position (three floats) and its momentum (again three floats).

You can easily select from this array:

```python
>>> arr['position']
>>> arr[0]['position']
>>> arr[arr['id'] == 1]['position']
```
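A self-contained version of the selection examples, with the array filled with made-up values so that each selection returns something visible. It spells the subarray fields as `(np.float32, (3,))`, which describes the same field as `np.dtype('3float32')`; the name `PARTICLE_DTYPE` is only for illustration.

```python
import numpy as np

# Hypothetical name; same layout as the dtype above
PARTICLE_DTYPE = np.dtype([
    ("id", np.uint16),
    ("position", np.float32, (3,)),
    ("momentum", np.float32, (3,)),
])

arr = np.zeros(10, dtype=PARTICLE_DTYPE)
arr["id"] = np.arange(10)  # ids 0..9
# Made-up positions: row i holds [3*i, 3*i + 1, 3*i + 2]
arr["position"] = np.arange(30, dtype=np.float32).reshape(10, 3)

all_positions = arr["position"]             # every row, shape (10, 3)
first_position = arr[0]["position"]         # first row only, shape (3,)
selected = arr[arr["id"] == 1]["position"]  # boolean mask on id, shape (1, 3)
```

Field access such as `arr["position"]` returns a view, which is why assigning to it writes straight into `arr`.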


This is a nice format because:

• Your data has structure. No more off-by-one column errors: the particle position is labeled.
• It is very easy to load from binary files.
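To illustrate the binary-file point: a structured array can be dumped as raw bytes with `ndarray.tofile` and read back with `np.fromfile` by passing the same dtype. This is a minimal sketch; the file name and values are made up. `tofile` writes no header, so the reader must already know the dtype (including byte order).

```python
import os
import tempfile

import numpy as np

PARTICLE_DTYPE = np.dtype([
    ("id", np.uint16),
    ("position", np.float32, (3,)),
    ("momentum", np.float32, (3,)),
])

arr = np.zeros(4, dtype=PARTICLE_DTYPE)
arr["id"] = np.arange(4)
arr["position"][:, 0] = 1.5  # made-up x coordinates

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "particles.bin")  # hypothetical file name
    arr.tofile(path)                           # raw bytes, one record per row
    loaded = np.fromfile(path, dtype=PARTICLE_DTYPE)
```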

Loading from text files is an entirely different matter, because writing rows into such arrays is kind of a pain: a row with subarray fields has to be given as a nested tuple that mirrors the dtype.
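A small illustration of the problem, assuming a text line that simply lists the values in dtype order: a flat tuple cannot be assigned to a row that has subarray fields, and the working alternative repeats the dtype layout by hand in the slice boundaries. The `row` values are made up.

```python
import numpy as np

PARTICLE_DTYPE = np.dtype([
    ("id", np.uint16),
    ("position", np.float32, (3,)),
    ("momentum", np.float32, (3,)),
])

# One parsed text line: id, x, y, z, px, py, pz (made-up values)
row = [1, 0.5, 1.5, 2.5, 1.0, 2.0, 3.0]

arr = np.zeros(1, dtype=PARTICLE_DTYPE)
try:
    arr[0] = tuple(row)  # 7 values, but the row has only 3 fields
    flat_assignment_failed = False
except ValueError:
    flat_assignment_failed = True

# Works, but the nesting duplicates the dtype layout in the slice indices
arr[0] = (row[0], tuple(row[1:4]), tuple(row[4:7]))
```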

My requirements were:

• The array structure is the same as the source file structure (fields appear in the same order).
• The array structure is defined in a single place only: the dtype definition.

## Solution

The solution is to:

• Read the file line by line, parsing its contents into an unstructured array.
• Create a structured view of that array.
• Keep it fast, which means no copying of large arrays.
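The core of the view step, shown on a toy dtype (the names `FLAT` and `STRUCTURED` are made up): both dtypes describe the same 16 bytes per row, so `ndarray.view` can reinterpret the flat array as a structured one without copying any data.

```python
import numpy as np

# Four plain float32 fields named f0..f3 (16 bytes per row)
FLAT = np.dtype("f4, f4, f4, f4")
# The same 16 bytes, seen as one scalar field plus a 3-element subarray
STRUCTURED = np.dtype([("time", np.float32), ("position", np.float32, (3,))])

flat = np.zeros(2, dtype=FLAT)
flat[0] = (0.5, 1.0, 2.0, 3.0)  # a flat tuple fits a flat row
flat[1] = (1.5, 4.0, 5.0, 6.0)

structured = flat.view(STRUCTURED)  # reinterpret the bytes; no copy is made
```

Because `structured` shares memory with `flat`, a write through one is visible through the other. The view only lines up cleanly when both dtypes have the same itemsize.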

The actual dtype used:

```python
import numpy as np

URQMD_DATA_DTYPE = [
    ("time", np.float32),
    ("position", np.dtype("3float32")),
    ("energy", np.float32),
    ("momentum", np.dtype("3float32")),
    ("mass", np.float32),
    ("particle_type", np.float32),
]
```
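A sanity check one might run on this dtype (a sketch, not part of the original code): counting the scalar values in one row tells you how many whitespace-separated columns each text line should have, and comparing with the itemsize confirms the row has no padding. The subarray fields are written in the equivalent `(np.float32, (3,))` form.

```python
import numpy as np

URQMD_DATA_DTYPE = [
    ("time", np.float32),
    ("position", np.float32, (3,)),
    ("energy", np.float32),
    ("momentum", np.float32, (3,)),
    ("mass", np.float32),
    ("particle_type", np.float32),
]

dt = np.dtype(URQMD_DATA_DTYPE)
# A subarray field like "position" has shape (3,); a scalar field has shape ()
values_per_row = sum(int(np.prod(dt[name].shape)) for name in dt.names)
```

For this dtype that gives 10 values per line and an itemsize of 40 bytes (10 × 4).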


A helper function that takes a structured dtype and turns it into an unstructured dtype with one scalar field per stored value (so `position` becomes three separate fields), keeping the same byte layout:

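```python
import numpy as np

def serialize_dtype(dt):
    """Return an unstructured dtype with one scalar field per value in ``dt``."""
    dt = np.dtype(dt)
    newdt = []
    for item in dt.descr:
        if len(item) == 2:
            # Scalar field, e.g. ('time', '<f4')
            name, fmt = item
            count = (1,)
        else:
            # Subarray field, e.g. ('position', '<f4', (3,))
            name, fmt, count = item
        if len(count) > 1:
            raise ValueError(f"multi-dimensional field {name!r} is not supported")
        # Repeat the scalar type once per element of the field
        newdt.extend([fmt] * count[0])
    return np.dtype(", ".join(newdt))
```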
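A possible variant of the same helper that walks `dt.names` and each field's `.base` and `.shape` instead of unpacking `dt.descr`; the name `flatten_dtype` is hypothetical. Because it multiplies out the whole field shape, it also accepts multi-dimensional subarrays (their elements are stored in C order, which is what a byte-level view expects). The itemsize check guards against dtypes with padding, where a flat view would not line up.

```python
import numpy as np

def flatten_dtype(dt):
    """Hypothetical variant of serialize_dtype: one scalar field per stored value."""
    dt = np.dtype(dt)
    scalar_types = []
    for name in dt.names:
        field = dt[name]
        if field.base.names is not None:
            raise ValueError(f"nested structured field {name!r} is not supported")
        # .base is the element type; .shape is () for scalar fields
        count = int(np.prod(field.shape))
        scalar_types.extend([field.base] * count)
    flat = np.dtype([(f"f{i}", t) for i, t in enumerate(scalar_types)])
    if flat.itemsize != dt.itemsize:
        raise ValueError("dtype has padding; a flat view would not line up")
    return flat

URQMD_DATA_DTYPE = [
    ("time", np.float32),
    ("position", np.float32, (3,)),
    ("energy", np.float32),
    ("momentum", np.float32, (3,)),
    ("mass", np.float32),
    ("particle_type", np.float32),
]
flat = flatten_dtype(URQMD_DATA_DTYPE)
```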


Here `frame` is a list of lines read from the text file.

```python
parsed = np.zeros(len(frame), dtype=serialize_dtype(URQMD_DATA_DTYPE))  # array without structure
for ii, line in enumerate(frame):
    data = [float(x) for x in line.split()]  # parse one line
    # every value is parsed as a float, ignoring whether the column is an int
    parsed[ii] = tuple(data)  # numpy converts each value to its field's type
parsed = parsed.view(URQMD_DATA_DTYPE)  # structured view of the same memory (no copy!)
```
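Putting the pieces together into one runnable sketch. The two input lines are made up (one particle per line, values in dtype order), `parse_frame` is a hypothetical wrapper around the loop above, and `serialize_dtype` is the helper from earlier in condensed form.

```python
import numpy as np

URQMD_DATA_DTYPE = [
    ("time", np.float32),
    ("position", np.float32, (3,)),
    ("energy", np.float32),
    ("momentum", np.float32, (3,)),
    ("mass", np.float32),
    ("particle_type", np.float32),
]

def serialize_dtype(dt):
    """Unstructured dtype with one scalar field per value in ``dt`` (condensed)."""
    newdt = []
    for item in np.dtype(dt).descr:
        count = item[2] if len(item) == 3 else (1,)
        if len(count) > 1:
            raise ValueError(f"multi-dimensional field {item[0]!r} is not supported")
        newdt.extend([item[1]] * count[0])
    return np.dtype(", ".join(newdt))

def parse_frame(frame, dtype=URQMD_DATA_DTYPE):
    """Hypothetical wrapper: parse text lines into a structured array."""
    parsed = np.zeros(len(frame), dtype=serialize_dtype(dtype))
    for ii, line in enumerate(frame):
        parsed[ii] = tuple(float(x) for x in line.split())
    return parsed.view(dtype)  # structured view, no copy

# Made-up lines: time, x y z, energy, px py pz, mass, particle type
frame = [
    "0.0 1.0 2.0 3.0 10.0 0.5 0.25 0.125 1.0 1",
    "0.5 -1.0 -2.0 -3.0 20.0 -0.5 -0.25 -0.125 2.0 101",
]
particles = parse_frame(frame)
```

The field names then work exactly as in the first example, e.g. `particles["momentum"]` is a `(2, 3)` float32 array.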


Sounds simple, but it took me some time to get right.