🧹 Cleanup Script for Blocked Domains
As part of our ongoing data hygiene process, we've implemented a custom Django management command that removes contacts and offers associated only with BlockedDomain entries.
Purpose
Over time, some domains become inactive, spammy, or irrelevant. Instead of outright deleting them, we now track them in a BlockedDomain model. This allows us to:
- Preserve domain references and metadata
- Safely remove all offers and contacts tied to those domains
- Keep the domain itself (for logging, recovery, or audit reasons)
⚙️ How It Works
- Fetch all BlockedDomain entries from the database
- For each blocked domain:
  - Delete all offers linked to each contact under that domain
  - Delete the contacts themselves
  - The domain record remains untouched
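The steps above can be tried out without a database. The following is a minimal sketch that uses plain Python objects as stand-ins for the Django models; the FakeContact, FakeDomain, and FakeBlockedDomain classes and the clean_blocked function are hypothetical names introduced only for illustration, not part of our codebase. It shows the order of operations (offers first, then contacts) and that a blocked entry with no linked domain is skipped.

```python
# -*- coding: utf-8 -*-
# In-memory illustration only: every class below is a hypothetical
# stand-in for a Django model, so no database is required.
from __future__ import print_function


class FakeContact(object):
    def __init__(self, name, offers):
        self.name = name
        self.offers = list(offers)


class FakeDomain(object):
    def __init__(self, url, contacts):
        self.url = url
        self.contacts = list(contacts)


class FakeBlockedDomain(object):
    def __init__(self, url, domain=None):
        self.url = url
        self.domain = domain  # may be None, like a nullable ForeignKey


def clean_blocked(blocked_entries):
    """Remove offers, then contacts, for each blocked entry; keep the domain."""
    summary = []
    for blocked in blocked_entries:
        domain = blocked.domain
        if domain is None:
            # Nothing to clean when no domain is linked to the entry.
            continue
        offers_removed = 0
        for contact in domain.contacts:
            offers_removed += len(contact.offers)
            del contact.offers[:]  # offers go first
        contacts_removed = len(domain.contacts)
        del domain.contacts[:]  # then the contacts themselves
        summary.append((domain.url, contacts_removed, offers_removed))
    return summary


domain = FakeDomain("https://example.com", [
    FakeContact("Alice", ["offer-1", "offer-2"]),
    FakeContact("Bob", []),
])
entries = [
    FakeBlockedDomain("https://example.com", domain),
    FakeBlockedDomain("https://orphan.example"),  # no linked domain
]
summary = clean_blocked(entries)
print(summary)
```

Each tuple in the summary holds the domain URL, the number of contacts removed, and the number of offers removed. The domain object itself is still referenced by its blocked entry afterwards, mirroring the "domain record remains untouched" step.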
BlockedDomain Model Structure
class BlockedDomain(models.Model):
    domain = models.ForeignKey(Domain, null=True, blank=True, on_delete=models.SET_NULL)
    url = models.URLField(max_length=512, unique=True)
    permanently_deleted = models.BooleanField(default=False)
    comment = models.TextField(blank=True, null=True)
    created_at = models.DateTimeField(auto_now_add=True)
Command Execution
python manage.py clean_contacts_with_offers
This command will output progress logs per domain, including any contacts or offers deleted.
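Python 2.7 Compatibility Note
Since we’re running on Python 2.7, which assumes ASCII source files by default, we added the following line at the top of the file so the interpreter accepts the non-ASCII characters (such as ✅ and ⚠) used in the log messages: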
# -*- coding: utf-8 -*-
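As a small illustration of where the declaration goes, here is a sketch of a file that contains a non-ASCII literal. Under Python 2.7 the declaration has to sit on the first or second line of the file (PEP 263); without it, the ✅ in the source would be rejected with a SyntaxError. The done_message helper is a hypothetical name, and the unicode_literals import is our assumption for this example rather than something the script itself uses.

```python
# -*- coding: utf-8 -*-
# The declaration above must be on the first or second line of the file.
# unicode_literals is an assumption added for this sketch: it makes the
# string literals below unicode objects on Python 2.7 as well.
from __future__ import unicode_literals


def done_message(domain_url):
    """Build a log line that contains a non-ASCII character."""
    return "✅ Done cleaning contacts and offers for domain {}".format(domain_url)


message = done_message("https://example.com")
```

On Python 3 the declaration is harmless, since UTF-8 is already the default source encoding there.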
✅ Results
After running the command, all irrelevant contacts and offers from blocked domains are removed, leaving the domain in place for reference or later review.
This cleanup helps keep our data lean, relevant, and ready for future processing or scraping tasks.
# -*- coding: utf-8 -*-
from domain.models import Domain, Contact
from realty.models import Offer
from your_app.models import BlockedDomain  # Replace with your actual app name


def clean_blocked_domains():
    blocked_domains = BlockedDomain.objects.all()
    print("Found {} blocked domains.".format(blocked_domains.count()))

    for blocked in blocked_domains:
        domain = blocked.domain
        if not domain:
            # The ForeignKey is nullable, so skip entries with no linked Domain.
            print("⚠ No linked Domain object for BlockedDomain URL: {}".format(blocked.url))
            continue

        print("\nProcessing domain: {} (ID: {})".format(domain.url, domain.id))
        contacts = domain.contacts.all()
        for contact in contacts:
            # Remove the contact's offers first, then the contact itself.
            offers = Offer.objects.filter(contact=contact)
            if offers.exists():
                print("Deleting {} offers for contact ID {} - {}".format(offers.count(), contact.id, contact.name))
                offers.delete()
            print("Deleting contact ID {} - {}".format(contact.id, contact.name))
            contact.delete()

        # The Domain and BlockedDomain records themselves are left in place.
        print("✅ Done cleaning contacts and offers for domain {}\n".format(domain.url))


# Call the function directly
clean_blocked_domains()