Ha.nnes.dev
Johannes Smit
Experienced data engineer in Taiwan with a background in statistics and mathematics.
Employment
- November 2023 to Present: Data Engineer, Mobagel (Taipei, Taiwan)
- Optimised the raw-data loading pipeline to reduce costs by 90%, enabling the onboarding of a new client with 10 times more data than all previous clients combined.
- Led the design and development of a new data pipeline and client-facing dashboard to bring insights about traffic sources to clients, while working closely with the product team to iterate quickly on features.
- October 2021 to November 2022: Data Engineer, Sainsbury’s (London, UK)
- Maintained and developed ELT pipelines processing structured and semi-structured data from a variety of data sources including Kafka, SFTP and S3. (Airflow, Snowflake, AWS, Terraform)
- Worked in a team that designed and developed the data engineering department’s new standardised data pipeline tool, integrating dbt with Airflow to allow significantly faster development and efficient cross-team collaboration. (dbt, Airflow, GitHub Actions, Python)
- Designed and built a robust application to download and process large (10 GB) JSON files that couldn’t fit in memory by streaming the data in chunks, reducing memory requirements and preventing crashes. (Python, boto3)
- Identified and fixed DevOps issues across teams, using CI to prevent them from recurring and encouraging software engineering best practices. (GitHub Actions, CircleCI)
- July 2019 to October 2021: Strategic Consultant (Data engineering and data science), Amey Strategic Consulting (London, UK)
- Technical lead of a GIS (Geographic Information System) dashboard creating insights into operations and tracking KPIs for a local county council.
- Built an accessible interactive UI using maps and graphs to empower users to explore data. (React, Mapbox GL JS, Plotly)
- Deployed and maintained a robust backend to power the UI while keeping a high uptime and remaining secure. (Flask, SQLAlchemy, Alembic, AWS, Terraform)
- Created ETL pipelines to load data from APIs, emails and MQTT messages into a central database. (Python, Airflow, Postgres)
- Identified an inefficiency in the client’s work allocation and developed a predictive machine learning model to enable data-driven planning.
- Designed and built a proof-of-concept application to manage and collect data from a network of custom IoT sensors in real-time while keeping power consumption at a minimum. (MQTT, Python, Postgres)
- Collaborated with the HR department to analyse the gender pay gap of thousands of employees across all departments in the company. (Python, R)
- The visualisations, statistics and forecasts I produced guided the company as it became a living-wage employer.
- Developed a Python package to automatically perform these forecasts and trained an HR analyst with no prior Python knowledge to use the package to produce new data reports.
- Completed a focused two-week analytics sprint evaluating the output of a contract with a local council.
- Collected user stories and datasets from the client’s team to understand their processes and goals.
- Delivered an engaging presentation with insightful data visualisations highlighting key successes and opportunities for improvement.
- Joined a team during “crunch time” to successfully complete delivery of an interactive computer-vision dashboard to review dashcam footage, removing the need for a human co-driver. (Python, Azure ML SDK)
- June 2018 to September 2018: Data Science Services Intern, Advant Analytics (Taipei, Taiwan)
- Created extensions for data cleansing and data mining in SPSS and SPSS Modeler using Python and R.
- Added features quickly, made an intuitive and powerful interface for the extensions and documented the processes used.
- Wrote an in-depth tutorial on programming using SPSS Syntax from scratch, allowing new users to understand the important features of the language quickly and start analysing data.
Education
- 2015 to 2019: MMath Mathematics, 2:1 (University of Exeter, UK)
- Dissertation used regression and Gaussian processes to provide human-understandable insights into machine learning algorithms.
- Successfully modelled a neural network using multiple techniques in a way that reduced its complex internal system to well-understood statistical processes. (R, Keras)
- Other modules included statistical modelling, machine learning and spatio-temporal statistics.
Skills
- As well as being fluent in Python, SQL and Terraform, I develop open-source projects in Julia, Rust and Roc as a hobby. My portfolio of personal projects can be found on my website (ha.nnes.dev).
- Languages: English (native), Afrikaans (fluent), Spanish (intermediate) and Mandarin (beginner).