Today we built a suite of Django management commands that clean up and enrich the Contact and Domain models of a real estate platform. Everything targets Django 1.8 and Python 2.7, so it runs on the legacy stack, yet it still uses NLP techniques such as text summarization.
🛠️ Overview of Management Commands
1. update_contact_offer_counts
Purpose: Updates the count field of each Contact with the number of related Offer objects.
python manage.py update_contact_offer_counts
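The post doesn't include the command source, so here is a framework-free sketch of the counting logic. It assumes each offer row carries a contact_id foreign key (a hypothetical name, since the Offer model isn't shown), and plain dicts stand in for model instances so the logic runs on its own. Inside a real command the per-contact totals would more likely come from the ORM, for example Contact.objects.annotate(n=Count('offer')), where the lookup name depends on the actual related_name.

```python
from collections import Counter


def offer_counts_by_contact(contacts, offers):
    """Return {contact_id: number_of_offers}, with 0 for contacts without offers.

    contacts: iterable of dicts with an "id" key (stand-ins for Contact rows)
    offers:   iterable of dicts with a "contact_id" key (hypothetical FK name)
    """
    per_contact = Counter(offer["contact_id"] for offer in offers)
    # A Counter returns 0 for missing keys, so offer-less contacts get 0.
    return {c["id"]: per_contact[c["id"]] for c in contacts}


def contacts_to_update(contacts, computed):
    """Yield (contact_id, new_count) only where the stored count is stale."""
    for c in contacts:
        new_count = computed[c["id"]]
        if c.get("count") != new_count:
            yield c["id"], new_count


contacts = [{"id": 1, "count": 2}, {"id": 2, "count": 5}, {"id": 3, "count": 0}]
offers = [{"contact_id": 1}, {"contact_id": 1}, {"contact_id": 3}]
counts = offer_counts_by_contact(contacts, offers)
print(counts)                                      # {1: 2, 2: 0, 3: 1}
print(list(contacts_to_update(contacts, counts)))  # [(2, 0), (3, 1)]
```

Writing back only the stale rows keeps the command cheap to re-run on a schedule.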
2. update_domain_contact_counts
Purpose: Updates the contact_count field in each Domain by counting how many Contact objects are assigned to it.
python manage.py update_domain_contact_counts
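For the domain side, here is a framework-free sketch that assumes each contact has a nullable foreign key to Domain (called domain_id here purely for illustration). Starting every domain at zero matters: a domain whose contacts were all reassigned should drop back to 0 instead of keeping its old count.

```python
def contact_counts_by_domain(domains, contacts):
    """Return {domain_id: number_of_contacts} for every domain.

    domains:  dicts with an "id" key (stand-ins for Domain rows)
    contacts: dicts with a "domain_id" key, None when unassigned
              ("domain_id" is a hypothetical name for the FK column)
    """
    # Start every domain at 0 so domains that lost all contacts are reset.
    counts = {d["id"]: 0 for d in domains}
    for c in contacts:
        domain_id = c.get("domain_id")
        if domain_id in counts:  # skips None and unknown ids
            counts[domain_id] += 1
    return counts


domains = [{"id": 10}, {"id": 20}]
contacts = [
    {"id": 1, "domain_id": 10},
    {"id": 2, "domain_id": 10},
    {"id": 3, "domain_id": None},
]
print(contact_counts_by_domain(domains, contacts))  # {10: 2, 20: 0}
```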
3. update_domain_ad_counts
Purpose: Sums up all Contact.count values for contacts linked to a Domain, and saves that total in the Domain.ad_count field.
python manage.py update_domain_ad_counts
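Because this total is built from Contact.count, it is only as fresh as the last update_contact_offer_counts run, so running the two commands in that order makes sense. The sketch below uses dicts with a hypothetical domain_id key and treats a missing count as 0. If the real command uses Django's Sum aggregate instead, note that Sum yields None for a domain with no contacts, which needs the same fallback to 0.

```python
def ad_counts_by_domain(domains, contacts):
    """Return {domain_id: sum of Contact.count} for every domain.

    contacts: dicts with "domain_id" (hypothetical FK name) and "count" keys;
    a missing or None count is treated as 0.
    """
    totals = {d["id"]: 0 for d in domains}
    for c in contacts:
        domain_id = c.get("domain_id")
        if domain_id in totals:
            # None/missing count contributes 0 instead of raising TypeError.
            totals[domain_id] += c.get("count") or 0
    return totals


domains = [{"id": 10}, {"id": 20}]
contacts = [
    {"id": 1, "domain_id": 10, "count": 3},
    {"id": 2, "domain_id": 10, "count": 4},
    {"id": 3, "domain_id": 10, "count": None},
    {"id": 4, "domain_id": None, "count": 5},
]
print(ad_counts_by_domain(domains, contacts))  # {10: 7, 20: 0}
```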
4. show_contacts_with_multiple_offers_and_no_domain
Purpose: Lists all Contact objects that:
- Have more than one offer (count > 1)
- Have a non-empty website
- Do not yet have a Domain assigned
python manage.py show_contacts_with_multiple_offers_and_no_domain
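As a plain-Python filter, the three conditions might look like this. Field names other than count and website are assumptions (domain_id stands in for the Domain foreign key), and treating a whitespace-only website as empty is our choice. With the ORM the equivalent would be roughly Contact.objects.filter(count__gt=1, domain__isnull=True).exclude(website=''), plus .exclude(website__isnull=True) if website is nullable.

```python
def contacts_needing_domain(contacts):
    """Contacts with more than one offer, a non-empty website and no Domain.

    "domain_id" is a hypothetical stand-in for the Domain foreign key.
    A website made only of whitespace counts as empty (our assumption).
    """
    return [
        c for c in contacts
        if (c.get("count") or 0) > 1
        and (c.get("website") or "").strip()
        and c.get("domain_id") is None
    ]


contacts = [
    {"id": 1, "count": 3, "website": "https://example.com", "domain_id": None},
    {"id": 2, "count": 1, "website": "https://example.org", "domain_id": None},
    {"id": 3, "count": 4, "website": "   ", "domain_id": None},
    {"id": 4, "count": 2, "website": "https://example.net", "domain_id": 7},
]
print([c["id"] for c in contacts_needing_domain(contacts)])  # [1]
```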
5. assign_domains_to_contacts
Purpose: For every Domain, finds Contact objects whose website URL contains the domain’s URL, and assigns that Domain if not already assigned.
python manage.py assign_domains_to_contacts
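The matching rule (the domain's URL occurs inside the contact's website) can be sketched as below. The url, website and domain_id keys are hypothetical stand-ins for the real fields, lower-casing both sides is our assumption, and when several domains match a contact, the first one wins.

```python
def match_domains(domains, contacts):
    """Return {contact_id: domain_id} for unassigned contacts whose website
    contains a domain's url.

    Keys "url", "website" and "domain_id" are hypothetical stand-ins for the
    model fields; matching is case-insensitive (our assumption) and the first
    matching domain wins.
    """
    assignments = {}
    for d in domains:
        needle = (d.get("url") or "").strip().lower()
        if not needle:
            continue  # an empty url would match every website
        for c in contacts:
            if c.get("domain_id") is not None or c["id"] in assignments:
                continue  # already has a domain, leave it alone
            if needle in (c.get("website") or "").lower():
                assignments[c["id"]] = d["id"]
    return assignments


domains = [{"id": 10, "url": "example.com"}]
contacts = [
    {"id": 1, "website": "https://www.Example.com/agents", "domain_id": None},
    {"id": 2, "website": "https://other.org", "domain_id": None},
    {"id": 3, "website": "http://example.com", "domain_id": 20},
    {"id": 4, "website": "https://myexample.com", "domain_id": None},
]
# Contact 4 is a substring false positive: "example.com" is inside "myexample.com".
print(match_domains(domains, contacts))  # {1: 10, 4: 10}
```

Plain substring matching is deliberately loose, as contact 4 shows. Comparing the hostname from urlparse (urlparse.urlparse on 2.7, urllib.parse.urlparse on Python 3) against the domain would be stricter, if that fits how Domain.url is stored.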
6. copy_contact_logos_to_domains
Purpose: For each Domain that has no logo, finds a related Contact that does, and copies the logo.
python manage.py copy_contact_logos_to_domains
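A sketch of the selection step, assuming logo values are stored file paths ("" or None when unset) and that the first contact with a logo, in iteration order, is the one to copy from; domain_id is again a hypothetical FK name. If logo is a FileField or ImageField on both models, assigning domain.logo = contact.logo.name should make the domain reference the same stored file instead of duplicating it.

```python
def logos_to_copy(domains, contacts):
    """Return {domain_id: logo} for logo-less domains, using the first related
    contact (in iteration order) that has a logo.

    Logos are assumed to be stored path strings, "" or None when unset;
    "domain_id" is a hypothetical FK name.
    """
    missing = {d["id"] for d in domains if not d.get("logo")}
    chosen = {}
    for c in contacts:
        domain_id = c.get("domain_id")
        if domain_id in missing and domain_id not in chosen and c.get("logo"):
            chosen[domain_id] = c["logo"]
    return chosen


domains = [{"id": 10, "logo": ""}, {"id": 20, "logo": "logos/d20.png"}]
contacts = [
    {"id": 1, "domain_id": 10, "logo": ""},
    {"id": 2, "domain_id": 10, "logo": "logos/a.png"},
    {"id": 3, "domain_id": 10, "logo": "logos/b.png"},
    {"id": 4, "domain_id": 20, "logo": "logos/c.png"},
]
print(logos_to_copy(domains, contacts))  # {10: 'logos/a.png'}
```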
7. generate_summaries_with_gensim
Purpose: Generates a short summary from each Domain.plain_rewrite using Gensim’s summarize() function, and stores it in the description field.
python manage.py generate_summaries_with_gensim
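A defensive wrapper around Gensim's summarize(), the TextRank-based extractive summarizer in gensim.summarization (Gensim 3.x only). summarize() raises ValueError on single-sentence input and can return an empty string for short texts, so this sketch falls back to the first sentences in those cases and when Gensim isn't importable. The word_count=50 default, the two-sentence fallback and the helper name make_description are our assumptions, not values from the post.

```python
import re

try:
    # gensim.summarization exists only in Gensim < 4.0.
    from gensim.summarization import summarize
except ImportError:
    summarize = None

# Naive sentence splitter used only for the fallback path.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def make_description(text, word_count=50, fallback_sentences=2):
    """Hypothetical helper: summarize text for Domain.description.

    Uses Gensim's summarize() when available; falls back to the first
    sentences when Gensim is missing, the text is a single sentence
    (summarize() raises ValueError), or the summary comes back empty.
    """
    text = (text or "").strip()
    if not text:
        return ""
    summary = ""
    if summarize is not None:
        try:
            summary = summarize(text, word_count=word_count)
        except ValueError:
            summary = ""
    if not summary.strip():
        sentences = _SENTENCE_END.split(text)
        summary = " ".join(sentences[:fallback_sentences])
    return summary.strip()


print(make_description("Sea view villa with pool."))  # Sea view villa with pool.
```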
8. generate_rewrite_and_summary
Purpose: If plain_rewrite is empty, first strips the HTML from html_rewrite to produce plain text. Then generates a summary with Gensim and saves it in description.
python manage.py generate_rewrite_and_summary
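One way to do the HTML-stripping step with only the standard library, as a sketch: collect text nodes, drop script and style contents, and add a space at block-level tags so paragraphs don't run together. Django 1.8 also ships django.utils.html.strip_tags, which is simpler but removes tags without inserting any spaces. The tag lists and the name html_to_plain are our choices. On Python 3, HTMLParser decodes entities such as &amp;amp; automatically; on 2.7 you would need to add handle_entityref and handle_charref.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7

SKIP_TAGS = {"script", "style"}
# Tags that should separate words; this list is an assumption, extend as needed.
BLOCK_TAGS = {"p", "div", "br", "li", "ul", "ol", "tr", "td", "table",
              "h1", "h2", "h3", "h4", "h5", "h6"}


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping <script> and <style> contents."""

    def __init__(self):
        HTMLParser.__init__(self)  # old-style class on 2.7, so no super()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_plain(html):
    """Hypothetical helper: convert an html_rewrite value to plain text."""
    parser = _TextExtractor()
    parser.feed(html or "")
    parser.close()
    # Collapse the runs of whitespace left behind by the markup.
    return " ".join("".join(parser.parts).split())


print(html_to_plain("<p>Sea <b>view</b> villa</p><p>Pool</p>"))  # Sea view villa Pool
```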
🧠 Bonus: What Can Gensim Do With Text?
Gensim is an NLP toolkit focused on semantic modeling, topic discovery, and similarity analysis. It is particularly useful when working with large sets of unstructured text such as contact descriptions, real estate listings, or scraped HTML.
| Feature | Tool/Method | Use Case |
|---|---|---|
| Summarization | summarize() | Auto-snippets, TL;DRs, meta descriptions |
| Keyword Extraction | keywords() | Auto-tagging, search filtering, highlights |
| Topic Modeling | LdaModel, LsiModel | Discover themes in ads or descriptions |
| Similarity Search | MatrixSimilarity | Detect duplicates, recommend similar items |
| Word Similarity | Word2Vec, FastText | Semantic search, user intent detection |
| Document Embedding | Doc2Vec | Content recommendation, ML clustering |
| TF-IDF Modeling | TfidfModel | Identify unique or weighted keywords |
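Pro tip: Even on a legacy Python 2.7 setup, Gensim 3.x is a dependable choice for NLP processing that doesn't require heavy ML infrastructure. Keep it pinned below 4.0: summarize() and keywords() live in the gensim.summarization module, which Gensim 4.0 removed.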
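To make the TF-IDF Modeling and Similarity Search rows of the table concrete, here is a dependency-free sketch. With Gensim you would build a corpora.Dictionary, convert each document with doc2bow, and wrap the corpus in models.TfidfModel. The weighting below (raw term count times log2(N/df), then L2 normalization, dropping terms found in every document) is, as far as we know, what TfidfModel does by default, but treat that correspondence as an assumption. Cosine similarity between the resulting vectors is the kind of score MatrixSimilarity computes for duplicate detection.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists. Returns one {term: weight} dict per doc.

    Weighting: raw term count * log2(N / document frequency), L2-normalized.
    Terms occurring in every document get weight 0 and are dropped.
    """
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    vectors = []
    for tokens in docs:
        weights = {}
        for term, tf in Counter(tokens).items():
            idf = math.log(float(n_docs) / df[term], 2)
            if idf > 0:
                weights[term] = tf * idf
        norm = math.sqrt(sum(w * w for w in weights.values()))
        if norm:
            weights = {t: w / norm for t, w in weights.items()}
        vectors.append(weights)
    return vectors


def cosine(a, b):
    """Cosine similarity of two L2-normalized sparse vectors (a dot product)."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(w * b.get(t, 0.0) for t, w in a.items())


docs = [
    ["sea", "view", "villa"],
    ["sea", "view", "flat"],
    ["city", "flat"],
    ["sea", "view", "villa"],
]
vectors = tfidf_vectors(docs)
print(max(vectors[0], key=vectors[0].get))       # villa
print(round(cosine(vectors[0], vectors[3]), 6))  # 1.0
print(cosine(vectors[0], vectors[2]))            # 0.0
```

Pairs scoring close to 1.0 are duplicate candidates; where to put the cut-off is a tuning choice for your own data.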
🚀 Ready to Expand
With these tools in place, you now have:
- Clean, structured data (count, ad_count, description)
- Enriched content from HTML
- NLP summaries, keywords, and potential for auto-tagging
This lays the foundation for smart features like:
- Related listings
- Contact deduplication
- AI-assisted content suggestions
- Real-time domain health dashboards
Let me know if you’d like to expand this setup with TF-IDF, clustering, auto-tagging, or multi-language summaries next!