Abstracting CRUD Operations in Pandas with Python – A Clean, Scalable Approach

Using Pandas as an In-Memory Database (CRUD Pattern)


Introduction

Pandas is widely used in data science and analytics for its powerful tabular data manipulation capabilities. But what if we think of Pandas as more than just a data analysis tool — what if we use it like a lightweight in-memory database?

In this article, we’ll explore how to design a reusable, abstract CRUD interface over a Pandas DataFrame, giving you a flexible architecture for data management — great for prototyping or in-memory data manipulation during development.


The Motivation

When working on small projects, proof-of-concepts, or even parts of larger systems that don’t yet require a full database, managing in-memory data can get messy.

Instead of manually filtering or updating DataFrames everywhere, why not centralize these operations into a consistent API?

Benefits:

  • Define a clear contract using an abstract base class
  • Swap the backend later (e.g., DataFrame → SQLite)
  • Improve maintainability and scalability

Step 1: Define the Abstract Repository

We’ll use Python’s abc module to define a class with abstract CRUD methods.

from abc import ABC, abstractmethod
import pandas as pd

class DataFrameRepository(ABC):
    """Defines the CRUD contract; concrete backends supply the storage."""

    @abstractmethod
    def create(self, record: dict):
        pass

    @abstractmethod
    def read(self, **query):
        pass

    @abstractmethod
    def update(self, condition: dict, updates: dict):
        pass

    @abstractmethod
    def delete(self, **condition):
        pass
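Because the methods are marked with @abstractmethod, Python enforces the contract at instantiation time rather than at call time. A quick sanity check, using a trimmed-down copy of the class so the snippet stands alone:

```python
from abc import ABC, abstractmethod

class DataFrameRepository(ABC):
    @abstractmethod
    def create(self, record: dict):
        pass

# The ABC itself (or any subclass missing create()) cannot be instantiated.
try:
    DataFrameRepository()
except TypeError as exc:
    print(exc)
```

This fail-fast behavior is what makes the abstract base class a real contract rather than documentation.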

Step 2: Create a Concrete Implementation

This class implements the logic using an in-memory pandas.DataFrame.

class InMemoryDataFrameRepository(DataFrameRepository):
    def __init__(self):
        self.df = pd.DataFrame()

    def create(self, record: dict):
        new_record = pd.DataFrame([record])
        # Concatenating with an empty frame is deprecated in pandas 2.x,
        # so seed the DataFrame directly on the first insert.
        if self.df.empty:
            self.df = new_record
        else:
            self.df = pd.concat([self.df, new_record], ignore_index=True)

    def read(self, **query):
        if not query or self.df.empty:
            return self.df.copy()
        # Combine all filters with logical AND.
        mask = pd.Series(True, index=self.df.index)
        for key, value in query.items():
            mask &= self.df[key] == value
        return self.df[mask].copy()

    def update(self, condition: dict, updates: dict):
        # Rows matching every condition receive every update.
        mask = pd.Series(True, index=self.df.index)
        for key, value in condition.items():
            mask &= self.df[key] == value
        for key, value in updates.items():
            self.df.loc[mask, key] = value

    def delete(self, **condition):
        # Keep only the rows that do NOT match the condition.
        mask = pd.Series(True, index=self.df.index)
        for key, value in condition.items():
            mask &= self.df[key] == value
        self.df = self.df[~mask].reset_index(drop=True)

Logic Breakdown

🟢 Create

Adds a new row to the DataFrame.

  • Input: Python dictionary
    Example:
    {"id": 1, "name": "Alice"}
    
  • Steps:
    • Convert to DataFrame: pd.DataFrame([record])
    • Append using pd.concat()
    • Use ignore_index=True to reset index
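The steps above can be reproduced standalone, starting from a one-row table (the sample records are illustrative):

```python
import pandas as pd

# Existing table with one row.
df = pd.DataFrame([{"id": 1, "name": "Alice"}])

# Convert the incoming record to a one-row frame, then append it.
new_record = pd.DataFrame([{"id": 2, "name": "Bob"}])
df = pd.concat([df, new_record], ignore_index=True)

print(df)
```

With ignore_index=True the result gets a fresh 0..n-1 index instead of repeating row labels from the two inputs.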

🔵 Read

Retrieves records based on filters.

  • Accepts keyword arguments:
    repo.read(id=1, name="Alice")
    
  • Behavior:
    • No filters → return full DataFrame
    • With filters:
      • Build boolean mask
      • Combine conditions using &=
      • Return filtered rows
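To see how the mask is built up, here is the same filtering logic as a minimal standalone sketch (the sample data is illustrative):

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "name": ["Alice", "Bob", "Alice"]})

# Start with an all-True mask, then AND each condition into it.
query = {"name": "Alice"}
mask = pd.Series(True, index=df.index)
for key, value in query.items():
    mask &= df[key] == value

result = df[mask]
print(result)
```

Starting from an all-True Series means zero, one, or many conditions all fall out of the same loop.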

🟡 Update

Updates rows matching a condition.

  • condition: filter criteria
  • updates: columns to modify
  • Example:
    repo.update({'id': 2}, {'name': 'Robert'})
  • Uses:
    self.df.loc[mask, column] = value
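The .loc assignment can be seen in isolation (the sample data is illustrative):

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "name": ["Alice", "Bob"]})

# The boolean mask selects the rows; .loc writes only those cells.
mask = df["id"] == 2
df.loc[mask, "name"] = "Robert"

print(df)
```

Assigning through .loc mutates the original DataFrame in place, which is exactly what update() relies on.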
    

🔴 Delete

Removes rows based on conditions.

  • Example:
    repo.delete(name='Alice')
    
  • Steps:
    • Build mask
    • Invert using ~mask
    • Filter DataFrame
    • Reset index

Step 3: Try It Out!

repo = InMemoryDataFrameRepository()

# Creating records
repo.create({'id': 1, 'name': 'Alice'})
repo.create({'id': 2, 'name': 'Bob'})

# Reading
print(repo.read(id=1))  # Record with id = 1

# Updating
repo.update({'id': 2}, {'name': 'Robert'})

# Deleting
repo.delete(name='Alice')

# Final state
print(repo.read())

Why This Matters

This pattern provides:

  • ✅ Clean, testable architecture
  • ✅ Easy backend replacement (SQL / NoSQL)
  • ✅ Ideal for prototyping and unit testing
  • ✅ Extensible with business logic or validation
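As a sketch of what backend replacement could look like, here is a SQLite-backed repository exposing the same create/read surface. The SQLiteRepository name, the fixed (id, name) schema, and the in-memory connection are assumptions for the demo, and update/delete are omitted for brevity:

```python
import sqlite3

import pandas as pd

class SQLiteRepository:
    """Illustrative sketch: same CRUD surface, SQLite storage underneath."""

    def __init__(self, table: str = "records"):
        self.conn = sqlite3.connect(":memory:")
        self.table = table
        # Assumed fixed schema for the demo.
        self.conn.execute(f"CREATE TABLE {table} (id INTEGER, name TEXT)")

    def create(self, record: dict):
        cols = ", ".join(record)
        placeholders = ", ".join("?" for _ in record)
        self.conn.execute(
            f"INSERT INTO {self.table} ({cols}) VALUES ({placeholders})",
            tuple(record.values()),
        )

    def read(self, **query):
        sql = f"SELECT * FROM {self.table}"
        if query:
            sql += " WHERE " + " AND ".join(f"{k} = ?" for k in query)
        # Callers still receive a DataFrame, so client code is unchanged.
        return pd.read_sql_query(sql, self.conn, params=tuple(query.values()))

repo = SQLiteRepository()
repo.create({"id": 1, "name": "Alice"})
print(repo.read(id=1))
```

Because read() still returns a DataFrame, code written against the abstract interface does not notice the swap.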

What’s Next?

You can extend this further by:

  • Adding schema validation:
    • pydantic
    • marshmallow
  • Supporting batch operations
  • Persisting data:
    • CSV
    • Parquet
  • Adding logging or versioning
  • Building a REST API:
    • FastAPI
    • Flask
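As one example of the persistence idea, a CSV-backed variant might reload a snapshot on startup and save after every write. The CsvBackedRepository name and the snapshot-on-every-write strategy are assumptions for the sketch, and only create() is shown:

```python
import os
import tempfile

import pandas as pd

class CsvBackedRepository:
    """Illustrative sketch: restore from a CSV snapshot, save on write."""

    def __init__(self, path: str):
        self.path = path
        # Restore the previous snapshot if one exists, else start empty.
        self.df = pd.read_csv(path) if os.path.exists(path) else pd.DataFrame()

    def create(self, record: dict):
        new_record = pd.DataFrame([record])
        self.df = new_record if self.df.empty else pd.concat(
            [self.df, new_record], ignore_index=True
        )
        # Snapshot after every write; fine for prototypes, slow at scale.
        self.df.to_csv(self.path, index=False)

path = os.path.join(tempfile.mkdtemp(), "records.csv")
CsvBackedRepository(path).create({"id": 1, "name": "Alice"})

# A fresh instance reloads the persisted data.
print(CsvBackedRepository(path).df)
```

Saving on every write keeps the sketch simple; a production version would batch writes or persist on shutdown instead.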

Final Thoughts

Although Pandas is typically associated with data analysis, this approach shows how it can act as a structured data store with minimal overhead.

By applying object-oriented design principles, you can build a flexible system that scales from simple prototypes to production-ready foundations.