Rebuilding requirements.txt from an old Python project with a simple import scan
When I moved an old Python project to a new machine, I ran into a very ordinary problem: I still had the source code, but I no longer had a reliable requirements.txt.
This is the kind of situation many developers know too well. The code still runs somewhere in memory, or at least in a forgotten folder, but the environment that made it work has vanished. Maybe the project was created before pip freeze became part of the habit. Maybe it lived inside an old virtual environment that was deleted long ago. Maybe it was a collection of helper scripts built over time without much packaging discipline.
In my case, I wanted to move the helper scripts, set up the project properly on a new machine, and produce a cleaner requirements.txt instead of guessing package names one by one.
So I used a small Python snippet to scan the project folder, look for imports, and collect modules that were likely third-party dependencies.
The problem
An old Python project often contains dozens of .py files, and each one may import a different set of modules. Some imports belong to the Python standard library. Some point to local files inside the project. Others come from packages that need to be installed with pip.
The hard part is separating those categories in a practical way.
I did not want a perfect dependency resolver. I wanted a quick, useful first pass.
Here is the snippet I used:
import re, sys
from pathlib import Path

root = Path('.')
py_files = list(root.rglob('*.py'))
import_re = re.compile(r'^(?:from|import)\s+([\w\.]+)')
third_party = set()

# sys.stdlib_module_names exists on Python 3.10+; fall back to an empty set,
# then add the compiled-in builtins, which are available on every version
stdlib = set(getattr(sys, 'stdlib_module_names', ()))
stdlib |= set(sys.builtin_module_names)

for p in py_files:
    text = p.read_text(encoding='utf-8', errors='ignore')
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        m = import_re.match(line)
        if m:
            mod = m.group(1).split('.')[0]
            if mod in stdlib:
                continue
            # skip local modules: a .py file or a package directory at the project root
            if (root / (mod + '.py')).exists() or (root / mod).exists():
                continue
            third_party.add(mod)

print('Found', len(third_party), 'candidates')
for name in sorted(third_party):
    print(name)
What this script is doing
The script walks through the current folder recursively and finds every Python file.
It then reads each file line by line and looks for statements that begin with either import or from.
For each match, it extracts the top-level module name. That matters because a line like this:
import pandas.core.frame
should still count as pandas for dependency purposes.
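The top-level extraction is nothing more than a string split on the first dot; a minimal illustration:

```python
# Reduce a dotted module path to its top-level package name
def top_level(dotted: str) -> str:
    return dotted.split('.')[0]

print(top_level('pandas.core.frame'))  # pandas
print(top_level('requests'))           # requests
```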
After that, the script filters out two obvious categories:
- standard library modules
- local project modules
Whatever remains gets added to a set of possible third-party packages.
At the end, it prints a deduplicated, sorted list of candidates.
Why this approach was useful
I like this kind of solution because it is honest about what it is.
It is not a lockfile generator. It is not a full dependency graph analyzer. It is not trying to understand every edge case in Python imports.
It is simply a recovery tool.
When an old project has drifted away from its original environment, a script like this gives you a practical inventory. It turns a vague question — “what did this project depend on?” — into a short reviewable list.
That alone can save a surprising amount of time.
Why the filtering matters
Without filtering, the list would be noisy and mostly useless.
If the script included standard library modules like os, sys, re, json, or pathlib, the output would be padded with names that never need installing, making the dependency list look far longer than it really is.
If it included local modules, it would also blur the line between code you wrote and packages you need to install.
The real value comes from narrowing the result down to things that probably came from outside the project.
In statement form, the logic is simple:
- if a module is built into Python, ignore it
- if a module already exists in the project folder, ignore it
- if it is neither of those, treat it as a candidate dependency
That is a very practical rule for messy recovery work.
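The three rules above can be sketched as a single classification function. This is a restatement of the script's inline logic, not part of the original snippet, and `project_root` stands in for its `root` variable:

```python
import sys
from pathlib import Path

def classify(mod: str, project_root: Path) -> str:
    """Return 'stdlib', 'local', or 'candidate' for a top-level module name."""
    # stdlib_module_names is Python 3.10+; builtin_module_names always exists
    stdlib = set(getattr(sys, 'stdlib_module_names', ())) | set(sys.builtin_module_names)
    if mod in stdlib:
        return 'stdlib'
    if (project_root / f'{mod}.py').exists() or (project_root / mod).exists():
        return 'local'
    return 'candidate'

print(classify('sys', Path('.')))  # stdlib
```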
What this gets right
This method works well for a first-pass reconstruction because it catches the most visible dependency signal in Python code: import statements.
It is especially useful when:
- the project is script-heavy
- imports are mostly written in direct and conventional ways
- you need a quick starting point, not a perfect answer
- the original environment is gone
In many old personal projects, admin scripts, notebooks converted to scripts, and one-off automation tools, that is more than enough to get momentum back.
What this does not solve
The script is helpful, but it should not be mistaken for a complete dependency management solution.
There are several gaps.
1. Package name is not always the same as import name
This is the biggest limitation.
You may import bs4, but install beautifulsoup4.
You may import cv2, but install opencv-python.
You may import yaml, but install PyYAML.
You may import PIL, but install Pillow.
So the output is best treated as a candidate list, not a final requirements.txt.
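A small alias table covers the common mismatches during review. The entries below are well-known examples rather than an exhaustive list, and any real project will need its own additions:

```python
# Known import-name -> PyPI-distribution-name mismatches (extend as needed)
IMPORT_TO_PYPI = {
    'bs4': 'beautifulsoup4',
    'cv2': 'opencv-python',
    'yaml': 'PyYAML',
    'PIL': 'Pillow',
    'sklearn': 'scikit-learn',
}

def to_pypi_name(import_name: str) -> str:
    # Fall back to the import name itself when no alias is known
    return IMPORT_TO_PYPI.get(import_name, import_name)

print(to_pypi_name('bs4'))       # beautifulsoup4
print(to_pypi_name('requests'))  # requests
```

On Python 3.10+, `importlib.metadata.packages_distributions()` can build this mapping automatically, but only for packages already installed in the current environment, which is exactly what a recovery scenario lacks.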
2. Dynamic imports are invisible
If the project uses __import__(), importlib, plugin loading, or string-based module loading, a line-based regex scan will miss those completely.
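For instance, a plugin loader like this contributes no line the regex could ever match, because the module name only exists as a string at runtime:

```python
import importlib

def load_plugin(name: str):
    # The dependency is decided at runtime; a static line scan never sees it
    return importlib.import_module(name)

mod = load_plugin('math')
print(mod.sqrt(16))  # 4.0
```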
3. Multi-line imports may be missed
Imports written across several lines are harder to capture with a simple line-based pattern.
Example:
from mypackage.submodule import (
thing_one,
thing_two,
)
A line-based regex does capture the module name from the first line here, but more unusual formatting, such as backslash continuations or imports generated by tooling, can still slip past it.
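If this matters for your codebase, Python's own ast module parses multi-line imports correctly. The following is an alternative scanning core I would reach for, not the original script:

```python
import ast

def imports_in_source(source: str) -> set[str]:
    """Collect top-level module names from import statements, multi-line included."""
    mods = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                mods.add(alias.name.split('.')[0])
        elif isinstance(node, ast.ImportFrom):
            # level > 0 means a relative import, i.e. local code
            if node.module and node.level == 0:
                mods.add(node.module.split('.')[0])
    return mods

src = "from mypackage.submodule import (\n    thing_one,\n    thing_two,\n)"
print(imports_in_source(src))  # {'mypackage'}
```

The trade-off is that ast.parse raises SyntaxError on files written for a different Python version, so a regex scan is actually more forgiving on truly old code.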
4. Optional dependencies look the same as required ones
Some imports appear only in debug code, development utilities, test files, or optional features. The script cannot tell which ones are essential for production use.
5. It may still misclassify local code
The local-module check is simple and useful, but not perfect. A more deeply nested project structure or src/ layout can confuse this style of detection.
Why I still like it
Even with those limits, I still think this is a smart trick.
It solves the actual problem I had at the time: creating a reasonable shortlist so I could rebuild the environment faster.
That is an important lesson in itself. Sometimes the best tooling is not the most sophisticated tool. Sometimes it is just the smallest script that reduces uncertainty.
This snippet did exactly that.
It gave me a list.
The list gave me a review path.
The review path gave me a working requirements.txt.
That is a good trade.
A practical workflow after running the script
Once the candidate list is printed, the next step is not to trust it blindly. The next step is to turn it into a review process.
A sensible workflow looks like this:
- Run the script and collect candidates.
- Map import names to installable package names where needed.
- Install those packages into a fresh virtual environment.
- Run the project or key scripts.
- Fix missing packages one by one.
- Save the final result into requirements.txt.
In other words, the script is the starting point, not the finish line.
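Once the environment actually runs, pip freeze is the right way to capture exact versions. For the first unpinned draft, though, the mapped candidates can simply be written to a file; `candidates` below is a hypothetical stand-in for the script's output after name mapping:

```python
from pathlib import Path

candidates = {'requests', 'PyYAML', 'Pillow'}  # hypothetical mapped output

# Write a sorted first-draft requirements.txt (no version pins yet)
Path('requirements.txt').write_text('\n'.join(sorted(candidates)) + '\n')
print(Path('requirements.txt').read_text())
```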
Why this matters for old projects
Old projects are where software reality shows up most clearly.
Documentation fades. Environments disappear. Memory becomes unreliable. But source code usually tells at least part of the truth.
Import statements are one of the clearest traces of that truth.
They may not tell the whole story, but they often tell enough to rebuild.
Final thought
I would not use this script as a replacement for proper dependency management in an active modern project. For that, I would rather use virtual environments consistently, save dependencies deliberately, and keep installation files up to date.
But for recovering an old codebase, migrating helper scripts, or reconstructing a lost environment, this little import scanner is a very practical move.
It turns archaeology into a checklist.
And sometimes that is exactly what you need.