Is there a way to get a view into a Python array.array()?
I’m generating many largish ‘random’ files (~500MB) in which the contents are the output of repeated calls to
Is there a way to do this that I’m not seeing?
I went with numpy as user42005 and hgomersall suggested. Unfortunately, this didn’t give me the speedups I was looking for. My dirt-simple C program generates ~700MB of data in 11s, while my Python equivalent using numpy takes around 700s! It’s hard to believe that this is the real difference in performance between the two (I’m more likely to believe that I made a naive mistake somewhere…)
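For context on the title question: array.array supports the buffer protocol, so in Python 3 the standard library's memoryview can already give a zero-copy view into one, and NumPy can wrap the same buffer. A minimal sketch, assuming Python 3 and NumPy; the names arr, mv, window and np_view are illustrative only:

```python
from array import array

import numpy as np

# An array.array of eight C doubles, all zero.
arr = array("d", [0.0] * 8)

# memoryview wraps arr's buffer without copying; slicing a memoryview is zero-copy too.
mv = memoryview(arr)
window = mv[2:5]        # a view of elements 2, 3 and 4
window[0] = 3.5         # writes straight into arr's memory
print(arr[2])           # 3.5

# NumPy can wrap the same buffer, again without copying.
np_view = np.frombuffer(arr, dtype=np.float64)
arr[7] = -1.0           # a change made through arr...
print(np_view[7])       # ...is visible through the NumPy view: -1.0

# While buffer exports such as mv are alive, arr cannot change size.
resize_blocked = False
try:
    arr.append(1.0)
except BufferError:
    resize_blocked = True
print(resize_blocked)   # True
```

The dtype given to np.frombuffer has to match the array's typecode; here 'd' (a C double) is assumed to correspond to float64, which holds on common platforms.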
Numpy is incredibly flexible and powerful when it comes to views into arrays whilst minimising copies. For example:
b is now a view into the original array: no data was copied, so changes made through b show up in the original.
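A minimal sketch of slicing views, assuming Python 3 and NumPy 1.11 or later (for np.shares_memory); the array contents and the names evens and picked are illustrative:

```python
import numpy as np

a = np.arange(10)      # a owns its data
b = a[2:7]             # basic slicing returns a view, not a copy

b[0] = 100             # writing through the view...
print(a[2])            # ...changes the original: 100

print(b.base is a)             # True: b borrows a's memory
print(np.shares_memory(a, b))  # True

# Strided slices are views as well.
evens = a[::2]
print(np.shares_memory(a, evens))  # True

# Integer-array ("fancy") indexing, by contrast, always copies.
picked = a[[2, 3, 4]]
print(np.shares_memory(a, picked))  # False
```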
Numpy arrays allow all manner of access directly to the data buffers, and can be trivially typecast. For example:
b is now a view into the same memory, with the data interpreted as 8-bit integers (the data itself is unchanged, so each 64-bit int now appears as eight 8-bit ints). The buffer objects returned by a.data are standard Python buffer objects (a memoryview in Python 3), so they can be used anywhere that is defined to work with buffers.
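A sketch of that typecast, assuming Python 3 and a recent NumPy. It reinterprets the bytes with ndarray.view and also wraps a.data with np.frombuffer. The explicit little-endian dtype "<i8" and the use of unsigned uint8 (rather than signed int8) are assumptions made so the printed bytes do not depend on the machine:

```python
import numpy as np

# Four 64-bit little-endian integers ("<i8" fixes the byte order shown below).
a = np.array([0, 1, 2, 258], dtype="<i8")

# Reinterpret the same 32 bytes as unsigned 8-bit integers; nothing is copied.
b = a.view(np.uint8)
print(b.shape)            # (32,)
print(b[8:16])            # the bytes of a[1]: [1 0 0 0 0 0 0 0]
print(b[24:26])           # 258 == 0x0102, low byte first: [2 1]

# Writes through b land in a's memory.
b[0] = 255
print(a[0])               # 255

# a.data is a memoryview in Python 3, so it can go anywhere a buffer is accepted.
mv = a.data
print(type(mv).__name__, mv.nbytes)     # memoryview 32
c = np.frombuffer(mv, dtype=np.uint8)   # another zero-copy view of the same bytes
print(np.shares_memory(a, c))           # True
```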
The same is true for multi-dimensional arrays. However, you have to bear in mind how the data lies in memory. For example:
will work, but
returns an error about being unable to get a single-segment buffer for discontiguous arrays. The problem is not obvious, because simply assigning that same view to a variable using
returns a perfectly adequate numpy array. However, that array is not contiguous in memory: it is a view into the other array, and such a view need not be (and in this case isn’t) contiguous. You can inspect this through the array’s flags attribute:
which returns (among other things) both C_CONTIGUOUS (C order, row major) and F_CONTIGUOUS (Fortran order, column major) as False, but
returns them both as True (in a 2D array with more than one row and more than one column, at most one of them can be True).
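A combined sketch of the contiguity points above, assuming Python 3 and a recent NumPy. One hedge: in Python 3 a memoryview can describe strided memory, so taking .data of a column slice may not raise the single-segment error quoted above (that message probably comes from Python 2’s buffer()). What still fails on a discontiguous view is anything that needs one contiguous block, such as reinterpreting its bytes with .view(). The shape and the names row, col and packed are illustrative:

```python
import numpy as np

a = np.arange(12, dtype=np.int64).reshape(3, 4)   # C-ordered 3x4 block

row = a[1]       # one row: its 4 elements are adjacent in memory
col = a[:, 1]    # one column: elements are 4 * 8 = 32 bytes apart

print(col.strides)                   # (32,)
print(memoryview(row).c_contiguous)  # True
print(memoryview(col).c_contiguous)  # False

# The flags report the same thing.
print(col.flags["C_CONTIGUOUS"], col.flags["F_CONTIGUOUS"])  # False False

# Reinterpreting raw bytes works on the contiguous row...
row_bytes = row.view(np.uint8)       # shape (32,)

# ...but not on the strided column.
try:
    col.view(np.uint8)
    col_view_failed = False
except ValueError as exc:
    col_view_failed = True
    print("view failed:", exc)

# Copying into contiguous memory fixes it, at the cost of a copy.
packed = np.ascontiguousarray(col)
print(packed.flags["C_CONTIGUOUS"], packed.flags["F_CONTIGUOUS"])  # True True
packed_bytes = packed.view(np.uint8)  # shape (24,)

# A 2D array with several rows and columns is C- or F-contiguous, not both.
print(a.flags["C_CONTIGUOUS"], a.flags["F_CONTIGUOUS"])      # True False
print(a.T.flags["C_CONTIGUOUS"], a.T.flags["F_CONTIGUOUS"])  # False True
```

np.ascontiguousarray does not copy an array that is already C-contiguous, so it is cheap to call defensively before handing an array to code that needs a flat buffer.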