Abstracting CRUD Operations in Pandas with Python – A Clean, Scalable Approach
Using Pandas as an In-Memory Database (CRUD Pattern)
Introduction
Pandas is widely used in data science and analytics for its powerful tabular data manipulation capabilities. But what if we think of Pandas as more than just a data analysis tool — what if we use it like a lightweight in-memory database?
In this article, we’ll explore how to design a reusable, abstract CRUD interface over a Pandas DataFrame, giving you a flexible architecture for data management — great for prototyping or in-memory data manipulation during development.
The Motivation
When working on small projects, proof-of-concepts, or even parts of larger systems that don’t yet require a full database, managing in-memory data can get messy.
Instead of manually filtering or updating DataFrames everywhere, why not centralize these operations into a consistent API?
Benefits:
- Define a clear contract using an abstract base class
- Swap backend later (e.g., DataFrame → SQLite)
- Improve maintainability and scalability
Step 1: Define the Abstract Repository
We’ll use Python’s abc module to define a class with abstract CRUD methods.
```python
from abc import ABC, abstractmethod

import pandas as pd


class DataFrameRepository(ABC):
    @abstractmethod
    def create(self, record: dict):
        pass

    @abstractmethod
    def read(self, **query):
        pass

    @abstractmethod
    def update(self, condition: dict, updates: dict):
        pass

    @abstractmethod
    def delete(self, **condition):
        pass
```
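As a quick sanity check, the ABC machinery enforces this contract at construction time. A condensed, self-contained sketch (one abstract method stands in for all four, and the class name `Repository` is illustrative):

```python
from abc import ABC, abstractmethod

class Repository(ABC):          # condensed stand-in for DataFrameRepository
    @abstractmethod
    def create(self, record: dict):
        pass

# Instantiating an ABC that still has unimplemented abstract methods
# raises TypeError, so an incomplete backend fails fast.
try:
    Repository()
except TypeError as exc:
    print(exc)
```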
Step 2: Create a Concrete Implementation
This class implements the logic using an in-memory pandas.DataFrame.
```python
class InMemoryDataFrameRepository(DataFrameRepository):
    def __init__(self):
        self.df = pd.DataFrame()

    def create(self, record: dict):
        new_record = pd.DataFrame([record])
        self.df = pd.concat([self.df, new_record], ignore_index=True)

    def read(self, **query):
        if not query:
            return self.df.copy()
        mask = pd.Series(True, index=self.df.index)
        for key, value in query.items():
            mask &= self.df[key] == value
        return self.df[mask].copy()

    def update(self, condition: dict, updates: dict):
        mask = pd.Series(True, index=self.df.index)
        for key, value in condition.items():
            mask &= self.df[key] == value
        for key, value in updates.items():
            self.df.loc[mask, key] = value

    def delete(self, **condition):
        mask = pd.Series(True, index=self.df.index)
        for key, value in condition.items():
            mask &= self.df[key] == value
        self.df = self.df[~mask].reset_index(drop=True)
```
Logic Breakdown
🟢 Create
Adds a new row to the DataFrame.
- Input: a Python dictionary, e.g. `{"id": 1, "name": "Alice"}`
- Steps:
  - Convert the dict to a one-row DataFrame with `pd.DataFrame([record])`
  - Append it using `pd.concat()`
  - Pass `ignore_index=True` to reset the index
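These steps can be sketched in isolation (the variable names here are illustrative):

```python
import pandas as pd

df = pd.DataFrame()                      # empty in-memory "table"
record = {"id": 1, "name": "Alice"}

new_row = pd.DataFrame([record])         # dict -> one-row DataFrame
df = pd.concat([df, new_row], ignore_index=True)  # append; index renumbered 0..n-1
```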
🔵 Read
Retrieves records based on filters.
- Accepts keyword arguments, e.g. `repo.read(id=1, name="Alice")`
- Behavior:
  - No filters → return the full DataFrame
  - With filters:
    - Build a boolean mask
    - Combine conditions with `&=`
    - Return the matching rows
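The mask-building behavior can be shown standalone (the sample data is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "name": ["Alice", "Bob", "Alice"]})
query = {"name": "Alice"}

mask = pd.Series(True, index=df.index)   # start with an all-True mask
for key, value in query.items():
    mask &= df[key] == value             # AND in each condition

result = df[mask]                        # only rows where every condition held
```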
🟡 Update
Updates rows matching a condition.
- `condition`: filter criteria
- `updates`: columns to modify

Example: `repo.update({'id': 2}, {'name': 'Robert'})`

- Uses `self.df.loc[mask, key] = value` to assign only where the mask is True
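The masked `.loc` assignment can be seen on a toy frame (sample data is illustrative):

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "name": ["Alice", "Bob"]})

mask = df["id"] == 2                # condition -> boolean mask
df.loc[mask, "name"] = "Robert"     # write only where the mask is True
```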
🔴 Delete
Removes rows based on conditions.
Example: `repo.delete(name='Alice')`

Steps:
- Build the boolean mask
- Invert it with `~mask`
- Keep only the non-matching rows
- Reset the index with `reset_index(drop=True)`
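The same steps on a toy frame (sample data is illustrative):

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "name": ["Alice", "Bob"]})

mask = df["name"] == "Alice"              # rows to remove
df = df[~mask].reset_index(drop=True)     # keep the inverse, renumber from 0
```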
Step 3: Try It Out!
```python
repo = InMemoryDataFrameRepository()

# Creating records
repo.create({'id': 1, 'name': 'Alice'})
repo.create({'id': 2, 'name': 'Bob'})

# Reading
print(repo.read(id=1))  # Record with id = 1

# Updating
repo.update({'id': 2}, {'name': 'Robert'})

# Deleting
repo.delete(name='Alice')

# Final state
print(repo.read())
```
Why This Matters
This pattern provides:
- ✅ Clean, testable architecture
- ✅ Easy backend replacement (SQL / NoSQL)
- ✅ Ideal for prototyping and unit testing
- ✅ Extensible with business logic or validation
What’s Next?
You can extend this further by:
- Adding schema validation with `pydantic` or `marshmallow`
- Supporting batch operations
- Persisting data to CSV or Parquet
- Adding logging or versioning
- Building a REST API with `FastAPI` or `Flask`
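As one illustration of the first point, validation can be layered in front of `create` even before reaching for a full library. The `SCHEMA` dict and `validate` helper below are hypothetical stand-ins for what `pydantic` or `marshmallow` models would provide:

```python
import pandas as pd

# Hypothetical schema: column name -> required type
SCHEMA = {"id": int, "name": str}

def validate(record: dict) -> dict:
    # Reject malformed records before they ever touch the DataFrame
    for key, expected in SCHEMA.items():
        if key not in record:
            raise ValueError(f"missing field: {key}")
        if not isinstance(record[key], expected):
            raise TypeError(f"{key} must be {expected.__name__}")
    return record

df = pd.DataFrame([validate({"id": 1, "name": "Alice"})])
```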
Final Thoughts
Although Pandas is typically associated with data analysis, this approach shows how it can act as a structured data store with minimal overhead.
By applying object-oriented design principles, you can build a flexible system that scales from simple prototypes to production-ready foundations.