When reading Parquet we currently build immutable intermediate data structures and accumulate them into the final result. This is expensive and slow. Since we know the number of rows and the column types ahead of time, we can optimize the read path by:
These should be fairly simple changes. They will also help us clean up the parquet implementation as we go.
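As a rough illustration of the idea (a sketch only — the function and type names here are hypothetical, not this repo's API): because the row count is known from the Parquet metadata before decoding starts, the reader can pre-allocate one mutable output buffer and append batches into it, instead of materializing an immutable structure per batch and concatenating them at the end.

```rust
/// Hypothetical sketch: accumulate decoded batches into a single
/// pre-allocated buffer. `total_rows` is assumed to come from the
/// Parquet file/row-group metadata, so we allocate exactly once.
fn read_preallocated(batches: &[Vec<i64>], total_rows: usize) -> Vec<i64> {
    // One allocation up front; no intermediate copies per batch.
    let mut out = Vec::with_capacity(total_rows);
    for batch in batches {
        out.extend_from_slice(batch);
    }
    out
}

fn main() {
    // Two decoded batches totalling 5 rows (illustrative data).
    let batches = vec![vec![1, 2], vec![3, 4, 5]];
    let out = read_preallocated(&batches, 5);
    assert_eq!(out, vec![1, 2, 3, 4, 5]);
    // The buffer never reallocated: capacity matches the metadata count.
    assert_eq!(out.capacity(), 5);
    println!("read {} rows", out.len());
}
```

The same pattern generalizes per column type: one sized buffer per column, filled in place as row groups are decoded.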
Original discussion: #147
Similar issue: #133