Sometimes, data is stored in binary formats or needs to be read from byte streams. Pandas provides several methods to handle such scenarios effectively. Below are some common ways to read bytes into a Pandas DataFrame.
In a simple use case, you might have a byte string representing CSV data:
import pandas as pd
import io
df = pd.read_csv(io.BytesIO(byte_data))- It’s a file-like object that lives in memory instead of on disk.
- It behaves like a file you can read from and write to, but the content is stored as bytes in RAM, not in an actual
.txtor.csvfile. - You can use it anywhere a function expects a “file object” (
.read(),.write(),.seek(), etc.).
- Many libraries (like
pandas,PIL,requests,zipfile) want a file as input/output. - If you don’t want to write to disk (slower, messy, permissions issues), you can hand them a
BytesIOobject instead. - When you’re done, you can grab the raw bytes using
.getvalue().
import io
# Create a BytesIO object
buffer = io.BytesIO()
# Write some bytes
buffer.write(b"hello world")
# Move cursor back to the start
buffer.seek(0)
# Read the content
print(buffer.read()) # b'hello world'import pandas as pd
import io
df = pd.DataFrame({"x": [10, 20], "y": [30, 40]})
# Write DataFrame as CSV into BytesIO
buf = io.BytesIO()
df.to_csv(buf, index=False)
# Get raw bytes
csv_bytes = buf.getvalue()
print(csv_bytes[:50]) # shows the first few bytes of the CSVimport requests
# Suppose you want to POST a CSV without saving to disk
buf = io.BytesIO()
df.to_csv(buf, index=False)
buf.seek(0) # reset cursor so it reads from start
response = requests.post("http://example.com/upload", files={"file": buf})Think of BytesIO as a USB stick in RAM:
- You can open it, write data, move around, and read it back.
- Nothing ever touches your hard drive unless you explicitly write the bytes out.