Abstracting CRUD Operations in Pandas with Python – A Clean, Scalable Approach
Using Pandas as an In-Memory Database (CRUD Pattern)
Introduction
Pandas is widely used in data science and analytics for its powerful tabular data manipulation capabilities. But what if we think of Pandas as more than just a data analysis tool — what if we use it like a lightweight in-memory database?
In this article, we’ll explore how to design a reusable, abstract CRUD interface over a Pandas DataFrame, giving you a flexible architecture for data management — great for prototyping or in-memory data manipulation during development.
The Motivation
When working on small projects, proof-of-concepts, or even parts of larger systems that don’t yet require a full database, managing in-memory data can get messy.
Instead of manually filtering or updating DataFrames everywhere, why not centralize these operations into a consistent API?
Benefits:
- Define a clear contract using an abstract base class
- Swap backend later (e.g., DataFrame → SQLite)
- Improve maintainability and scalability
Step 1: Define the Abstract Repository
We’ll use Python’s abc module to define a class with abstract CRUD methods.
```python
from abc import ABC, abstractmethod

import pandas as pd


class DataFrameRepository(ABC):
    @abstractmethod
    def create(self, record: dict):
        pass

    @abstractmethod
    def read(self, **query):
        pass

    @abstractmethod
    def update(self, condition: dict, updates: dict):
        pass

    @abstractmethod
    def delete(self, **condition):
        pass
```
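As a quick sanity check, the ABC machinery enforces this contract at construction time. A condensed, self-contained sketch (one abstract method stands in for all four, and the class name `Repository` is illustrative):

```python
from abc import ABC, abstractmethod

class Repository(ABC):          # condensed stand-in for DataFrameRepository
    @abstractmethod
    def create(self, record: dict):
        pass

# Instantiating an ABC that still has unimplemented abstract methods
# raises TypeError, so an incomplete backend fails fast.
try:
    Repository()
except TypeError as exc:
    print(exc)
```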
Step 2: Create a Concrete Implementation
This class implements the logic using an in-memory pandas.DataFrame.
```python
class InMemoryDataFrameRepository(DataFrameRepository):
    def __init__(self):
        self.df = pd.DataFrame()

    def create(self, record: dict):
        new_record = pd.DataFrame([record])
        self.df = pd.concat([self.df, new_record], ignore_index=True)

    def read(self, **query):
        if not query:
            return self.df.copy()
        mask = pd.Series(True, index=self.df.index)
        for key, value in query.items():
            mask &= self.df[key] == value
        return self.df[mask].copy()

    def update(self, condition: dict, updates: dict):
        mask = pd.Series(True, index=self.df.index)
        for key, value in condition.items():
            mask &= self.df[key] == value
        for key, value in updates.items():
            self.df.loc[mask, key] = value

    def delete(self, **condition):
        mask = pd.Series(True, index=self.df.index)
        for key, value in condition.items():
            mask &= self.df[key] == value
        self.df = self.df[~mask].reset_index(drop=True)
```
Logic Breakdown
🟢 Create
Adds a new row to the DataFrame.
- Input: a Python dictionary, e.g. `{"id": 1, "name": "Alice"}`
- Steps:
  - Convert the dict to a one-row DataFrame with `pd.DataFrame([record])`
  - Append it using `pd.concat()`
  - Pass `ignore_index=True` to reset the index
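These steps can be sketched in isolation (the variable names here are illustrative):

```python
import pandas as pd

df = pd.DataFrame()                      # empty in-memory "table"
record = {"id": 1, "name": "Alice"}

new_row = pd.DataFrame([record])         # dict -> one-row DataFrame
df = pd.concat([df, new_row], ignore_index=True)  # append; index renumbered 0..n-1
```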
🔵 Read
Retrieves records based on filters.
- Accepts keyword arguments, e.g. `repo.read(id=1, name="Alice")`
- Behavior:
  - No filters → return the full DataFrame
  - With filters:
    - Build a boolean mask
    - Combine conditions with `&=`
    - Return the matching rows
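The mask-building behavior can be shown standalone (the sample data is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "name": ["Alice", "Bob", "Alice"]})
query = {"name": "Alice"}

mask = pd.Series(True, index=df.index)   # start with an all-True mask
for key, value in query.items():
    mask &= df[key] == value             # AND in each condition

result = df[mask]                        # only rows where every condition held
```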
🟡 Update
Updates rows matching a condition.
- `condition`: filter criteria
- `updates`: columns to modify

Example: `repo.update({'id': 2}, {'name': 'Robert'})`

- Uses `self.df.loc[mask, key] = value` to assign only where the mask is True
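The masked `.loc` assignment can be seen on a toy frame (sample data is illustrative):

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "name": ["Alice", "Bob"]})

mask = df["id"] == 2                # condition -> boolean mask
df.loc[mask, "name"] = "Robert"     # write only where the mask is True
```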
🔴 Delete
Removes rows based on conditions.
Example: `repo.delete(name='Alice')`

Steps:
- Build the boolean mask
- Invert it with `~mask`
- Keep only the non-matching rows
- Reset the index with `reset_index(drop=True)`
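The same steps on a toy frame (sample data is illustrative):

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "name": ["Alice", "Bob"]})

mask = df["name"] == "Alice"              # rows to remove
df = df[~mask].reset_index(drop=True)     # keep the inverse, renumber from 0
```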
Step 3: Try It Out!
```python
repo = InMemoryDataFrameRepository()

# Creating records
repo.create({'id': 1, 'name': 'Alice'})
repo.create({'id': 2, 'name': 'Bob'})

# Reading
print(repo.read(id=1))  # Record with id = 1

# Updating
repo.update({'id': 2}, {'name': 'Robert'})

# Deleting
repo.delete(name='Alice')

# Final state
print(repo.read())
```
Why This Matters
This pattern provides:
- ✅ Clean, testable architecture
- ✅ Easy backend replacement (SQL / NoSQL)
- ✅ Ideal for prototyping and unit testing
- ✅ Extensible with business logic or validation
What’s Next?
You can extend this further by:
- Adding schema validation with `pydantic` or `marshmallow`
- Supporting batch operations
- Persisting data to CSV or Parquet
- Adding logging or versioning
- Building a REST API with `FastAPI` or `Flask`
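As one illustration of the first point, validation can be layered in front of `create` even before reaching for a full library. The `SCHEMA` dict and `validate` helper below are hypothetical stand-ins for what `pydantic` or `marshmallow` models would provide:

```python
import pandas as pd

# Hypothetical schema: column name -> required type
SCHEMA = {"id": int, "name": str}

def validate(record: dict) -> dict:
    # Reject malformed records before they ever touch the DataFrame
    for key, expected in SCHEMA.items():
        if key not in record:
            raise ValueError(f"missing field: {key}")
        if not isinstance(record[key], expected):
            raise TypeError(f"{key} must be {expected.__name__}")
    return record

df = pd.DataFrame([validate({"id": 1, "name": "Alice"})])
```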
Final Thoughts
Although Pandas is typically associated with data analysis, this approach shows how it can act as a structured data store with minimal overhead.
By applying object-oriented design principles, you can build a flexible system that scales from simple prototypes to production-ready foundations.