
How I Used Web Scraping and Machine Learning to Save Hours Finding the Best Car Deals Online
Automating the search for the best car deals
⸻
🧠 The Why (Problem)
Buying a used car online is like dating on Tinder—you scroll endlessly, most options are sketchy, and you’re never quite sure if the “low mileage” is real.
After weeks of manually:
• Filtering Kijiji, AutoTrader, Facebook Marketplace, and dealership sites
• Copy-pasting prices into Excel
• Sorting by price-per-kilometer
• And then doing it all again the next day…
I thought:
“Why not build something that does this for me?”
So I did.
⸻
✨ What I Built
Think of this as your personal car-buying concierge: it works 24/7, doesn’t sleep, and judges every deal with cold, algorithmic precision.
• 🕷️ Web scraper that pulls listings from major car marketplaces
• 📊 Filters cars by make, model, year, mileage, price, and city
• 💡 Calculates price/km and price/year to quantify value (see the sketch after this list)
• 🧠 Machine learning model that flags “good” deals based on historical data
• 📈 Tracks the average market price for comparison
• 📬 Spits out a clean table of only the best deals
• 🧼 Deduplicates reposted listings
• 🌎 Optional filters: province, listing freshness, dealership vs. private seller
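To make the value metrics concrete, here’s a minimal sketch of how a listing could be modelled. The field names (`mileage_km`, `price`, etc.) are my own illustration, not the project’s actual schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Listing:
    """One scraped car listing (field names are illustrative)."""
    make: str
    model: str
    year: int
    mileage_km: int
    price: float
    city: str
    url: str

    @property
    def price_per_km(self) -> float:
        # Guard against zero-mileage listings (new cars or bad data).
        return self.price / max(self.mileage_km, 1)

    @property
    def price_per_year(self) -> float:
        # A one-year-old car shouldn't divide by zero either.
        age = max(date.today().year - self.year, 1)
        return self.price / age
```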
Now, instead of reviewing 300 listings, I only review the top 10 worth my time.
⸻
😫 The Roadblocks I Hit (and Swerved Around)
This wasn’t just “build a scraper and call it a day.” It was a full-scale pit stop operation.
🛑 1. Websites Change. A Lot.
Scraping is fragile. Facebook Marketplace and Kijiji are notorious for blocking bots and changing page structures on a whim.
Solution:
• Used requests + BeautifulSoup where possible
• Added retry logic and URL freshness checks
• Flagged broken selectors with logs so I know when things need a fix
• Built in “selector profiles” for each site that can be updated without touching core logic (see the sketch below)
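Here’s roughly what the selector-profile idea can look like. The CSS selectors in the profile are placeholders (not any site’s real markup), and `fetch`/`parse_listings` are illustrative helpers, not the project’s actual code:

```python
import logging

import requests
from bs4 import BeautifulSoup

log = logging.getLogger("car_scraper")

# Per-site "selector profiles": CSS selectors live in data, not code,
# so a site redesign means editing this dict, not the scraper itself.
# The selectors below are placeholders for illustration only.
SELECTOR_PROFILES = {
    "kijiji": {
        "card": "li.listing-card",
        "title": "h3.listing-title",
        "price": "p.listing-price",
    },
}


def fetch(url: str, retries: int = 3, timeout: int = 10) -> str | None:
    """GET a page with simple retry logic; return HTML or None on failure."""
    for attempt in range(1, retries + 1):
        try:
            resp = requests.get(
                url, timeout=timeout, headers={"User-Agent": "Mozilla/5.0"}
            )
            resp.raise_for_status()
            return resp.text
        except requests.RequestException as exc:
            log.warning("fetch %s/%s failed for %s: %s", attempt, retries, url, exc)
    return None


def parse_listings(html: str, site: str) -> list[dict]:
    """Extract listing cards using the site's selector profile."""
    profile = SELECTOR_PROFILES[site]
    soup = BeautifulSoup(html, "html.parser")
    cards = soup.select(profile["card"])
    if not cards:
        # Nothing matched: the site probably changed, so flag it loudly.
        log.error("no cards matched %r on %s; selectors may be stale", profile["card"], site)
    results = []
    for card in cards:
        title = card.select_one(profile["title"])
        price = card.select_one(profile["price"])
        results.append({
            "title": title.get_text(strip=True) if title else None,
            "price": price.get_text(strip=True) if price else None,
        })
    return results
```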
⸻
📈 2. Defining a “Good Deal” Isn’t Straightforward
Just because a car is cheap doesn’t mean it’s a deal. If it’s 12 years old with 300k on it? No thanks.
Solution:
• Built a scoring algorithm that factors in:
  • Price per km
  • Price per year
  • Model popularity (e.g., Toyota Corolla vs. a Fiat)
  • Dealer vs. private seller
• Then trained a simple ML model to classify deals as:
  • 🟢 Good
  • 🟡 Decent
  • 🔴 Overpriced
A sketch of this step follows below.
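Here is a minimal sketch of the feature engineering and classifier, assuming a pandas DataFrame of listings with a precomputed `model_popularity` column and a labelled history set; the real features and model may differ.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

FEATURES = ["price_per_km", "price_per_year", "model_popularity", "is_dealer"]


def add_value_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive the value metrics used for scoring (illustrative column names)."""
    df = df.copy()
    df["price_per_km"] = df["price"] / df["mileage_km"].clip(lower=1)
    df["age"] = (pd.Timestamp.now().year - df["year"]).clip(lower=1)
    df["price_per_year"] = df["price"] / df["age"]
    df["is_dealer"] = (df["seller_type"] == "dealer").astype(int)
    return df


def train_deal_classifier(history: pd.DataFrame) -> RandomForestClassifier:
    """Train on past listings; 'label' is 0=overpriced, 1=decent, 2=good."""
    X = add_value_features(history)[FEATURES]
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X, history["label"])
    return clf


def classify_deals(clf: RandomForestClassifier, listings: pd.DataFrame) -> pd.Series:
    """Return a 🟢/🟡/🔴 label for each new listing."""
    preds = clf.predict(add_value_features(listings)[FEATURES])
    return pd.Series(preds).map({2: "good", 1: "decent", 0: "overpriced"})
```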
⸻
🔁 3. Too Many Duplicates
Dealers love reposting the same car 20 different ways. I hate it.
Solution:
• Normalized VINs, descriptions, and prices
• Added fuzzy matching to detect reposts
• Removed duplicates with a ~90% accuracy rate (see the sketch below)
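A sketch of that dedup logic, using Python’s standard-library SequenceMatcher for the fuzzy match; the project may well use a different similarity library, and the 90% text threshold and 5% price tolerance are my assumptions.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def is_repost(a: dict, b: dict, threshold: float = 0.9) -> bool:
    """Decide whether two listings are likely the same car.

    Exact match on normalized VIN when both have one; otherwise fall back
    to fuzzy similarity of the descriptions plus a price sanity check.
    """
    if a.get("vin") and b.get("vin"):
        return a["vin"].strip().upper() == b["vin"].strip().upper()
    close_price = abs(a["price"] - b["price"]) < 0.05 * max(a["price"], b["price"])
    similarity = SequenceMatcher(
        None, normalize(a["description"]), normalize(b["description"])
    ).ratio()
    return close_price and similarity >= threshold


def dedupe(listings: list[dict]) -> list[dict]:
    """Keep only the first occurrence of each (probable) car."""
    kept: list[dict] = []
    for candidate in listings:
        if not any(is_repost(candidate, seen) for seen in kept):
            kept.append(candidate)
    return kept
```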
⸻
📬 4. Surfacing the Good Stuff
I didn’t want to scroll through code or CSVs. I wanted a list of today’s best deals—ready to browse with my coffee.
Solution:
• Final output is a clean HTML file or pandas DataFrame
• Sorted by “score” with conditional formatting (green/yellow/red)
• Can export to Excel or send by email if needed
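A sketch of the reporting step using pandas’ Styler for the green/yellow/red formatting; the column names, colours, and file names are placeholders, and the Excel export assumes openpyxl is installed.

```python
import pandas as pd

# Cell background per deal label (hex colours are arbitrary choices).
SCORE_COLOURS = {
    "good": "background-color: #c6efce",        # green
    "decent": "background-color: #ffeb9c",      # yellow
    "overpriced": "background-color: #ffc7ce",  # red
}


def colour_row(row: pd.Series) -> list[str]:
    """Apply the same background to every cell in a row, based on its label."""
    style = SCORE_COLOURS.get(row["label"], "")
    return [style] * len(row)


def export_report(deals: pd.DataFrame, top_n: int = 10) -> None:
    """Write the top-scored deals to HTML and Excel, highest score first."""
    best = deals.sort_values("score", ascending=False).head(top_n)
    styled = best.style.apply(colour_row, axis=1)
    styled.to_html("best_deals.html")
    styled.to_excel("best_deals.xlsx", index=False)  # requires openpyxl
```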