
How I Used Web Scraping and Machine Learning to Save Hours Finding the Best Car Deals Online
Automating the search for the best car deals
⸻
🧠 The Why (Problem)
Buying a used car online is like dating on Tinder—you scroll endlessly, most options are sketchy, and you’re never quite sure if the “low mileage” is real.
After weeks of manually:
• Filtering Kijiji, AutoTrader, Facebook Marketplace, and dealership sites
• Copy-pasting prices into Excel
• Sorting by price-per-kilometer
• And then doing it all again the next day…
I thought:
“Why not build something that does this for me?”
So I did.
⸻
✨ What I Built
Think of this as your personal car-buying concierge: it works 24/7, doesn’t sleep, and judges every deal with cold, algorithmic precision.
• 🕷️ Web scraper that pulls listings from major car marketplaces
• 📊 Filters cars by make, model, year, mileage, price, and city
• 💡 Calculates price/km and price/year to quantify value (see the sketch after this list)
• 🧠 Machine learning model that flags “good” deals based on historical data
• 📈 Tracks the average market price for comparison
• 📬 Spits out a clean table of only the best deals
• 🧼 Deduplicates reposted listings
• 🌎 Optional filters: province, listing freshness, dealership vs. private seller
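To make the value metrics concrete, here’s a minimal sketch of how a listing could be modelled. The field names (`mileage_km`, `price`, etc.) are my own illustration, not the project’s actual schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Listing:
    """One scraped car listing (field names are illustrative)."""
    make: str
    model: str
    year: int
    mileage_km: int
    price: float
    city: str
    url: str

    @property
    def price_per_km(self) -> float:
        # Guard against zero-mileage listings (new cars or bad data).
        return self.price / max(self.mileage_km, 1)

    @property
    def price_per_year(self) -> float:
        # A one-year-old car shouldn't divide by zero either.
        age = max(date.today().year - self.year, 1)
        return self.price / age
```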
Now, instead of reviewing 300 listings, I only review the top 10 worth my time.
⸻
😫 The Roadblocks I Hit (and Swerved Around)
This wasn’t just “build a scraper and call it a day.” It was a full-scale pit stop operation.
🛑 1. Websites Change. A Lot.
Scraping is fragile. Facebook Marketplace and Kijiji are notorious for blocking bots and changing page structures on a whim.
Solution:
• Used requests + BeautifulSoup where possible
• Added retry logic and URL freshness checks
• Flagged broken selectors with logs so I know when things need a fix
• Built in “selector profiles” for each site that can be updated without touching core logic (see the sketch below)
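Here’s roughly what the selector-profile idea can look like. The CSS selectors in the profile are placeholders (not any site’s real markup), and `fetch`/`parse_listings` are illustrative helpers, not the project’s actual code:

```python
import logging

import requests
from bs4 import BeautifulSoup

log = logging.getLogger("car_scraper")

# Per-site "selector profiles": CSS selectors live in data, not code,
# so a site redesign means editing this dict, not the scraper itself.
# The selectors below are placeholders for illustration only.
SELECTOR_PROFILES = {
    "kijiji": {
        "card": "li.listing-card",
        "title": "h3.listing-title",
        "price": "p.listing-price",
    },
}


def fetch(url: str, retries: int = 3, timeout: int = 10) -> str | None:
    """GET a page with simple retry logic; return HTML or None on failure."""
    for attempt in range(1, retries + 1):
        try:
            resp = requests.get(
                url, timeout=timeout, headers={"User-Agent": "Mozilla/5.0"}
            )
            resp.raise_for_status()
            return resp.text
        except requests.RequestException as exc:
            log.warning("fetch %s/%s failed for %s: %s", attempt, retries, url, exc)
    return None


def parse_listings(html: str, site: str) -> list[dict]:
    """Extract listing cards using the site's selector profile."""
    profile = SELECTOR_PROFILES[site]
    soup = BeautifulSoup(html, "html.parser")
    cards = soup.select(profile["card"])
    if not cards:
        # Nothing matched: the site probably changed, so flag it loudly.
        log.error("no cards matched %r on %s; selectors may be stale", profile["card"], site)
    results = []
    for card in cards:
        title = card.select_one(profile["title"])
        price = card.select_one(profile["price"])
        results.append({
            "title": title.get_text(strip=True) if title else None,
            "price": price.get_text(strip=True) if price else None,
        })
    return results
```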
⸻
📈 2. Defining a “Good Deal” Isn’t Straightforward
Just because a car is cheap doesn’t mean it’s a deal. If it’s 12 years old with 300k on it? No thanks.
Solution:
• Built a scoring algorithm that factors in:
  • Price per km
  • Price per year
  • Model popularity (e.g., Toyota Corolla vs. a Fiat)
  • Dealer vs. private seller
• Then trained a simple ML model to classify deals as:
  • 🟢 Good
  • 🟡 Decent
  • 🔴 Overpriced
A sketch of this step follows below.
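Here is a minimal sketch of the feature engineering and classifier, assuming a pandas DataFrame of listings with a precomputed `model_popularity` column and a labelled history set; the real features and model may differ.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

FEATURES = ["price_per_km", "price_per_year", "model_popularity", "is_dealer"]


def add_value_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive the value metrics used for scoring (illustrative column names)."""
    df = df.copy()
    df["price_per_km"] = df["price"] / df["mileage_km"].clip(lower=1)
    df["age"] = (pd.Timestamp.now().year - df["year"]).clip(lower=1)
    df["price_per_year"] = df["price"] / df["age"]
    df["is_dealer"] = (df["seller_type"] == "dealer").astype(int)
    return df


def train_deal_classifier(history: pd.DataFrame) -> RandomForestClassifier:
    """Train on past listings; 'label' is 0=overpriced, 1=decent, 2=good."""
    X = add_value_features(history)[FEATURES]
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X, history["label"])
    return clf


def classify_deals(clf: RandomForestClassifier, listings: pd.DataFrame) -> pd.Series:
    """Return a 🟢/🟡/🔴 label for each new listing."""
    preds = clf.predict(add_value_features(listings)[FEATURES])
    return pd.Series(preds).map({2: "good", 1: "decent", 0: "overpriced"})
```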
⸻
🔁 3. Too Many Duplicates
Dealers love reposting the same car 20 different ways. I hate it.
Solution:
• Normalized VINs, descriptions, and prices
• Added fuzzy matching to detect reposts
• Removed duplicates with a ~90% accuracy rate (see the sketch below)
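A sketch of that dedup logic, using Python’s standard-library SequenceMatcher for the fuzzy match; the project may well use a different similarity library, and the 90% text threshold and 5% price tolerance are my assumptions.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def is_repost(a: dict, b: dict, threshold: float = 0.9) -> bool:
    """Decide whether two listings are likely the same car.

    Exact match on normalized VIN when both have one; otherwise fall back
    to fuzzy similarity of the descriptions plus a price sanity check.
    """
    if a.get("vin") and b.get("vin"):
        return a["vin"].strip().upper() == b["vin"].strip().upper()
    close_price = abs(a["price"] - b["price"]) < 0.05 * max(a["price"], b["price"])
    similarity = SequenceMatcher(
        None, normalize(a["description"]), normalize(b["description"])
    ).ratio()
    return close_price and similarity >= threshold


def dedupe(listings: list[dict]) -> list[dict]:
    """Keep only the first occurrence of each (probable) car."""
    kept: list[dict] = []
    for candidate in listings:
        if not any(is_repost(candidate, seen) for seen in kept):
            kept.append(candidate)
    return kept
```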
⸻
📬 4. Surfacing the Good Stuff
I didn’t want to scroll through code or CSVs. I wanted a list of today’s best deals—ready to browse with my coffee.
Solution:
• Final output is a clean HTML file or pandas DataFrame
• Sorted by “score” with conditional formatting (green/yellow/red)
• Can export to Excel or send by email if needed
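A sketch of the reporting step using pandas’ Styler for the green/yellow/red formatting; the column names, colours, and file names are placeholders, and the Excel export assumes openpyxl is installed.

```python
import pandas as pd

# Cell background per deal label (hex colours are arbitrary choices).
SCORE_COLOURS = {
    "good": "background-color: #c6efce",        # green
    "decent": "background-color: #ffeb9c",      # yellow
    "overpriced": "background-color: #ffc7ce",  # red
}


def colour_row(row: pd.Series) -> list[str]:
    """Apply the same background to every cell in a row, based on its label."""
    style = SCORE_COLOURS.get(row["label"], "")
    return [style] * len(row)


def export_report(deals: pd.DataFrame, top_n: int = 10) -> None:
    """Write the top-scored deals to HTML and Excel, highest score first."""
    best = deals.sort_values("score", ascending=False).head(top_n)
    styled = best.style.apply(colour_row, axis=1)
    styled.to_html("best_deals.html")
    styled.to_excel("best_deals.xlsx", index=False)  # requires openpyxl
```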