Data Engineering Tools and Technologies

🚀 My personal Hall of Fame of tools and technologies 🛠️ (Python ecosystem edition)

These tools and technologies have served me well over the years, so I thought I’d share them:

🔵 Click, Loguru: Building command-line interfaces and logging
🔵 dotenv, pyyaml: Managing configuration
🔵 pytest: Making sure your code actually does what you want
🔵 Pandas: Extracting data from CSV/Excel/Parquet
🔵 MinIO, Parquet, SQLite, PostgreSQL, Snowflake: Storing structured and unstructured data
🔵 dbt: Managing SQL transformations
🔵 Superset: Reporting and dynamic dashboarding
🔵 FastAPI: Building robust APIs
🔵 Streamlit: Rapid prototyping
🔵 Flask, Jinja2, Bootstrap CSS, jQuery, Gunicorn, Nginx: A simple yet powerful web stack
🔵 scikit-learn, LightGBM, PyTorch: Batch ML for predictive analytics
🔵 River: Online ML for real-time insights
🔵 Docker: Seamlessly packaging applications for deployment
🔵 Jenkins: Orchestrating builds and automating workflows effortlessly
🔵 Git: Keeping code versioned
🔵 cron: Triggering tasks even when I’m sleeping 💤
🔵 Ubuntu: A rock-solid OS to build servers that run for years
🔵 DigitalOcean: Fast VMs in the cloud
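To show how a few of these pieces fit together, here is a minimal sketch. It assumes pandas is installed. Pandas extracts a CSV, SQLite stores it, and a plain SQL query aggregates it. The sample data, the `orders` table, and the function names are hypothetical and do not come from any specific project.

```python
import sqlite3
from io import StringIO

import pandas as pd  # assumption: pandas is installed (pip install pandas)

# Hypothetical sample data standing in for a real CSV export.
SAMPLE_CSV = """order_id,customer,amount
1,alice,19.99
2,bob,5.00
3,alice,7.50
"""


def extract(source) -> pd.DataFrame:
    """Read CSV data into a DataFrame; pd.read_excel or pd.read_parquet would slot in the same way."""
    return pd.read_csv(source)


def load(df: pd.DataFrame, conn: sqlite3.Connection, table: str) -> int:
    """Write the DataFrame to a SQLite table, replacing any previous version, and return the row count."""
    df.to_sql(table, conn, if_exists="replace", index=False)
    return len(df)


def revenue_per_customer(conn: sqlite3.Connection, table: str) -> pd.DataFrame:
    """Aggregate with plain SQL, the kind of model a tool like dbt would manage."""
    # The table name is interpolated directly, so only pass trusted, hard-coded names.
    query = (
        "SELECT customer, SUM(amount) AS revenue "
        f"FROM {table} GROUP BY customer ORDER BY customer"
    )
    return pd.read_sql_query(query, conn)


def main() -> None:
    conn = sqlite3.connect(":memory:")  # in-memory database, discarded on close
    try:
        rows = load(extract(StringIO(SAMPLE_CSV)), conn, "orders")
        print(f"Loaded {rows} rows")
        print(revenue_per_customer(conn, "orders").to_string(index=False))
    finally:
        conn.close()


if __name__ == "__main__":
    main()
```

Moving to PostgreSQL or Snowflake would mostly mean changing the connection. Pandas expects a SQLAlchemy engine for databases other than SQLite. You could also wrap `main` in a Click command and schedule it with cron to cover the CLI and scheduling entries in the list.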

If you’d like a more detailed breakdown of the whys and hows, let me know. 📩 #dataengineering #tools #productivity

Matt von Rohr

#ai #datascience #machinelearning #dataengineering #dataintegration
