As a tech enthusiast and a follower of the “eat your own dog food” philosophy, I’ve embarked on a project that intertwines my passion for cars with my profession in machine learning. In this initial development log, I’d like to share the progress of a tool that’s no longer just a concept but a practical application of data-driven decision-making. 🛠️📊
The Beginning 🌟
My journey began with Autoscout.ch, a treasure trove for car enthusiasts and a perfect source of data for a machine learning project. It offers a diverse range of car listings, each rich with details essential for a robust analysis. 🚘🔍
Building the Crawler: Python & Playwright 🐍🎭
To extract the data, I developed a crawler using Python and Playwright. Python’s versatility and Playwright’s ability to automate web interactions made them ideal choices. The crawler was meticulously designed to navigate through listings, capturing key details such as age, mileage, model, color, and horsepower, along with the type of sale: private or through a garage. 📈👨‍💻
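To make the idea concrete, here is a minimal sketch of what such a crawler might look like. It is not the actual crawler: the CSS selectors and the `crawl_listings` and `parse_int` helpers are hypothetical placeholders, and the real page structure would need to be inspected first. Playwright is imported inside the function so the parsing helper can be used without a browser installed.

```python
import re
from typing import Optional

# Hypothetical selectors: placeholders, NOT the real markup of any site.
CARD_SELECTOR = "article.listing"
TITLE_SELECTOR = ".listing-title"
PRICE_SELECTOR = ".listing-price"
MILEAGE_SELECTOR = ".listing-mileage"


def parse_int(text: str) -> Optional[int]:
    """Return the first integer in text such as "45'000 km", or None."""
    # Swiss-style thousands separators (' or ’) are removed before matching.
    cleaned = text.replace("'", "").replace("’", "")
    match = re.search(r"\d+", cleaned)
    return int(match.group()) if match else None


def crawl_listings(search_url: str, max_items: int = 20) -> list[dict]:
    """Open a search results page and collect a few fields per listing card."""
    # Lazy import: requires `pip install playwright` and `playwright install`.
    from playwright.sync_api import sync_playwright

    results = []
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(search_url)
        cards = page.locator(CARD_SELECTOR)
        for i in range(min(cards.count(), max_items)):
            card = cards.nth(i)
            results.append(
                {
                    "title": card.locator(TITLE_SELECTOR).first.inner_text(),
                    "price_chf": parse_int(
                        card.locator(PRICE_SELECTOR).first.inner_text()
                    ),
                    "mileage_km": parse_int(
                        card.locator(MILEAGE_SELECTOR).first.inner_text()
                    ),
                }
            )
        browser.close()
    return results
```

Keeping the text parsing in a small pure function makes it easy to test separately from the browser automation, which is the slow and fragile part.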
Data Analysis and Storage: DuckDB + Parquet 🗃️📚
The next step was managing this influx of data. I used DuckDB as an analytical database: it lets me store and query large datasets quickly, which kept the data preparation phase smooth. The data is stored in Parquet files, which I can query directly using SQL. 🖥️💾
The Core: Machine Learning with scikit-learn 🧠🤖
The heart of this project is the machine learning model, built using scikit-learn. I created models that consider a multitude of factors, from obvious ones like age and mileage to more nuanced ones like color, model type and variant, and whether the sale is private or through a garage. The objective? To accurately estimate the market price of current car models. 🏷️📈
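For readers curious how such a model can be wired together, here is a hedged sketch of a scikit-learn pipeline. Everything in it is an assumption made for illustration: the choice of a random forest, the feature names, and the synthetic data generator stand in for the real models and the crawled dataset.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical feature names, loosely following the fields the crawler captures.
NUMERIC = ["age_years", "mileage_km", "horsepower"]
CATEGORICAL = ["model", "color", "seller_type"]
FEATURES = NUMERIC + CATEGORICAL


def make_synthetic_listings(n: int = 300, seed: int = 0) -> pd.DataFrame:
    """Generate invented listings with a simple, known price relationship."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame(
        {
            "age_years": rng.integers(0, 15, n),
            "mileage_km": rng.integers(5_000, 200_000, n),
            "horsepower": rng.integers(90, 300, n),
            "model": rng.choice(["A", "B"], n),
            "color": rng.choice(["black", "white", "red"], n),
            "seller_type": rng.choice(["private", "garage"], n),
        }
    )
    df["price_chf"] = (
        45_000
        - 1_500 * df["age_years"]
        - 0.08 * df["mileage_km"]
        + 40 * df["horsepower"]
        + np.where(df["model"] == "B", 3_000, 0)
        + np.where(df["seller_type"] == "garage", 1_500, 0)
        + rng.normal(0, 800, n)
    )
    return df


def build_price_model() -> Pipeline:
    """One-hot encode categorical columns, pass numeric ones through, then regress."""
    preprocess = ColumnTransformer(
        [
            ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
            ("num", "passthrough", NUMERIC),
        ]
    )
    regressor = RandomForestRegressor(n_estimators=100, random_state=0)
    return Pipeline([("prep", preprocess), ("reg", regressor)])
```

Fitting is then `build_price_model().fit(df[FEATURES], df["price_chf"])`, and `predict` accepts a DataFrame with the same feature columns. Bundling the encoding into the pipeline means new listings go through exactly the same preprocessing as the training data.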
The most exciting part of this project is its practical application. By feeding the model current and past market data, I’ve been able to identify listings that are potentially under-priced – a boon for both car enthusiasts and car buyers. This model doesn’t just give a glimpse into the market; it offers actionable insights. 🚀🎯
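One simple way to turn price estimates into a shortlist is to compare each asking price with the model's prediction and keep the listings with the largest relative discount. The sketch below assumes that approach; the `ScoredListing` type, the `find_underpriced` helper, and the 15% threshold are all hypothetical choices, not values from the project.

```python
from dataclasses import dataclass


@dataclass
class ScoredListing:
    listing_id: str
    asking_price: float
    predicted_price: float  # model's estimate of the market price

    @property
    def discount(self) -> float:
        """Fraction by which the asking price sits below the prediction."""
        return 1.0 - self.asking_price / self.predicted_price


def find_underpriced(
    listings: list[ScoredListing], min_discount: float = 0.15
) -> list[ScoredListing]:
    """Keep listings priced at least `min_discount` below the prediction,
    sorted from largest to smallest discount."""
    hits = [
        item
        for item in listings
        if item.predicted_price > 0 and item.discount >= min_discount
    ]
    return sorted(hits, key=lambda item: item.discount, reverse=True)
```

A large discount can also mean the model is missing something (damage, a rare variant, a data-entry error), so a list like this is a starting point for manual review rather than a verdict.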
Looking Forward 🤖🔄
The execution is still manual, and I hope to automate this as soon as possible. I now have a curated list of interesting car listings and it’s already showing its value. This list isn’t just a compilation of cars; it’s a reflection of data-driven decision-making, combining the thrill of car hunting with the precision of machine learning. 🚗💡
This project stands at the intersection of my personal interests and professional skills. As this journey continues, I look forward to sharing more insights and developments. Stay tuned for the next DevLog! 🌟📝