Streaming generated Excel
Our code makes use of Streaming version of XSSF from Apache POI (SXSSF). While this allows for smaller memory footprint to produce the resulting Excel file, it does not actually stream the rows to the client -- it generates the entire Excel file (using temp files) and only when done it will copy the bytes to the client.
We wish to send data to the client as soon as it is generated by the server (with some buffering for performance).
Background
- We don't merge cells and we don't change formats when we generate 100,000s of rows: All style definitions remain unchanged.
- We just add rows, 1 by 1 to a sheet
- We never seek or look at any other row in any of the sheets containing actual data
Considering the above, we can
- Prepare the Excel sheet metadata (styles, sheet names, ...) based on our template.
- We process sheets 1 by 1: allowing us to send row data to the client
Because of how we generate Excel, we could send content to the client as it is generated -- preventing client timeouts.
Concept
- Open template
- Modify the simple stuff (add new style references or similar)
- Write to temp file. This has to be very fast.
- Open temp file and stream fixed content in Excel format: metadata, sheet definitions, etc.
- For every data sheet that has generated data, stream it.
Howto
-
streaming.SXSSFWorkbook
only writes data to outputStream in#write(OutputStream)
method. We want to set the targetoutputStream
early in the generation process and flush as much data as possible to the client. We need a new method#setOutputStream(OutputStream)
. - We need a method that signals the Super
SXSSFWorkbook
that we are done with metadata or with a sheet so that it can send stuff down the target stream. Review the#write()
method and see how we can hack it to make the entire thing streamable.