Projects
A list of projects I have worked on or built.
End-to-End ELT Pipeline Implementation Using dbt, Snowflake, and Airflow
This project develops and deploys an ELT (Extract, Load, Transform) pipeline built with dbt (data build tool), Snowflake, and Airflow. Data is loaded from source tables into Snowflake and then transformed with dbt into final data marts, with Airflow orchestrating the pipeline runs.
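In an ELT stack like this, dbt builds its models in dependency order (raw sources, then staging models, then marts), and Airflow triggers the run. The standard-library sketch below only illustrates that ordering idea; the model names (raw_orders, stg_orders, fct_orders, and so on) are hypothetical and not taken from the project, and graphlib requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hypothetical model graph: each model maps to the models it selects from,
# similar to how dbt infers dependencies from ref() and source() calls.
MODEL_DEPS = {
    "stg_orders": {"raw_orders"},
    "stg_customers": {"raw_customers"},
    "int_order_items": {"stg_orders"},
    "fct_orders": {"int_order_items", "stg_customers"},
    "dim_customers": {"stg_customers"},
}


def build_order(deps):
    """Return a run order in which every model comes after all of its dependencies."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for model in build_order(MODEL_DEPS):
        print("building", model)
```

Models with no path between them (for example dim_customers and fct_orders) can appear in either order, which is what lets a scheduler run them in parallel.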
Dr. Semmelweis and the Discovery of Handwashing
Reanalyzed the data behind one of the most important discoveries of modern medicine: handwashing. In 1847, the Hungarian physician Ignaz Semmelweis made a breakthrough discovery: contaminated hands were a major cause of childbed fever, and by enforcing handwashing at his hospital he saved hundreds of lives. In this Python project, I reanalyzed the medical data Semmelweis collected. The project was completed as part of the DataCamp Data Science with Python Career Track.
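The core of the reanalysis is comparing the proportion of deaths (deaths divided by births) before and after handwashing was introduced. The sketch below shows that comparison in plain Python; the monthly records are invented placeholders, not Semmelweis's data, and the 1 June 1847 cutoff is an assumption made for the example.

```python
from datetime import date
from statistics import mean

# Placeholder monthly records: (month, births, deaths).
# These numbers are invented for illustration; they are NOT Semmelweis's data.
RECORDS = [
    (date(1847, 3, 1), 300, 30),
    (date(1847, 4, 1), 310, 40),
    (date(1847, 5, 1), 290, 29),
    (date(1847, 6, 1), 280, 7),
    (date(1847, 7, 1), 300, 6),
    (date(1847, 8, 1), 320, 8),
]

HANDWASHING_START = date(1847, 6, 1)  # assumed cutoff for this sketch


def mean_death_rates(records, cutoff):
    """Mean monthly proportion of deaths (deaths / births) before and from the cutoff on."""
    before = [deaths / births for month, births, deaths in records if month < cutoff]
    after = [deaths / births for month, births, deaths in records if month >= cutoff]
    return mean(before), mean(after)


if __name__ == "__main__":
    before, after = mean_death_rates(RECORDS, HANDWASHING_START)
    print(f"before: {before:.3f}, after: {after:.3f}, reduction: {before - after:.3f}")
```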
Data Engineering Project using Sales Data
Data engineering in Hadoop using Cloudera. I performed the principal tasks involved in managing, loading, extracting, and transforming sales data. This repository holds the scripts I wrote throughout the project.
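The project scripts are not reproduced here. As a rough illustration of the kind of transformation a Hadoop job performs on sales data, the pure-Python sketch below mimics the map, shuffle, and reduce phases to total sales per region. The CSV layout (region,product,amount) and the records are hypothetical.

```python
from collections import defaultdict

# Hypothetical CSV lines in the form region,product,amount.
SALES_LINES = [
    "east,widget,10.50",
    "west,gadget,4.00",
    "east,gadget,2.25",
    "north,widget,7.00",
    "west,widget,1.00",
]


def mapper(line):
    """Map phase: emit a (region, amount) pair for one CSV record."""
    region, _product, amount = line.split(",")
    yield region, float(amount)


def shuffle(pairs):
    """Shuffle phase: group values by key, as Hadoop does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce phase: total sales for one region."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence and return totals per region."""
    pairs = (pair for line in lines for pair in mapper(line))
    return dict(reducer(key, values) for key, values in sorted(shuffle(pairs).items()))
```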
Stock price prediction with Apache Spark and Cassandra
This is a data pipeline for predicting stock prices using Apache Spark, Apache Cassandra, and machine learning techniques. It collects and preprocesses stock data from the Alpha Vantage API, engineers features, trains models, analyzes the data, and generates predictions.
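Feature engineering for price prediction often starts from derived series such as daily returns and moving averages. The sketch below assumes a plain list of daily closing prices and leaves out the Alpha Vantage fetch, Spark, and Cassandra entirely; the choice of these two features and the window size are assumptions, not necessarily the features the project uses.

```python
def daily_returns(closes):
    """Fractional change between consecutive closes: (p_t - p_{t-1}) / p_{t-1}."""
    return [(cur - prev) / prev for prev, cur in zip(closes, closes[1:])]


def moving_average(closes, window):
    """Simple moving average; positions before a full window are None."""
    if window <= 0:
        raise ValueError("window must be positive")
    out = []
    running = 0.0
    for i, price in enumerate(closes):
        running += price
        if i >= window:
            running -= closes[i - window]  # drop the price that left the window
        out.append(running / window if i >= window - 1 else None)
    return out


if __name__ == "__main__":
    closes = [10.0, 11.0, 12.0, 13.0]
    print(daily_returns(closes))
    print(moving_average(closes, 2))
```

A running sum keeps the moving average linear in the number of prices instead of re-summing every window.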
The GitHub History of the Scala language
Explore the evolution of the Scala language through its GitHub history, and find out who has had the most influence on its development and who the experts are. The project draws on historical data (commits, issues, and pull requests) related to the development of Scala, a modern, multi-paradigm programming language.
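One simple way to measure influence is to count pull requests per author, and one way to find experts for a given file is to count which authors' pull requests touched it most often. The sketch below shows both counts with collections.Counter; the authors and file paths are hypothetical, and the real analysis may weigh contributions differently.

```python
from collections import Counter

# Hypothetical pull-request records: (author, files touched by the PR).
PULL_REQUESTS = [
    ("alice", ["src/library/scala/List.scala"]),
    ("bob", ["src/compiler/Typers.scala", "test/files/run/t1.scala"]),
    ("alice", ["src/library/scala/Option.scala"]),
    ("carol", ["src/compiler/Typers.scala"]),
    ("alice", ["src/compiler/Typers.scala"]),
]


def top_contributors(prs, n=3):
    """Authors ranked by number of pull requests submitted."""
    return Counter(author for author, _files in prs).most_common(n)


def experts_for(prs, path):
    """Authors ranked by how many of their pull requests touched the given file."""
    return Counter(author for author, files in prs if path in files).most_common()


if __name__ == "__main__":
    print(top_contributors(PULL_REQUESTS))
    print(experts_for(PULL_REQUESTS, "src/compiler/Typers.scala"))
```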
Python Flask AI translation service
A web app built with the Python Flask framework that integrates Azure's AI Cognitive Services to provide translation.
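A Flask route in an app like this would typically forward the user's text to the Azure Translator REST API. The sketch below builds (but does not send) such a request with the standard library, based on my understanding of the Translator v3.0 `/translate` route; the key and region values are placeholders, the app itself may be wired differently, and the endpoint and header names should be checked against Microsoft's documentation before use.

```python
import json
import urllib.parse
import urllib.request

# Global Translator endpoint (v3.0); a regional or custom endpoint may differ.
ENDPOINT = "https://api.cognitive.microsofttranslator.com"


def build_translate_request(text, to_lang, key, region, from_lang=None):
    """Build, but do not send, a POST request to the Translator /translate route."""
    params = {"api-version": "3.0", "to": to_lang}
    if from_lang:
        params["from"] = from_lang  # omit to let the service detect the language
    url = f"{ENDPOINT}/translate?{urllib.parse.urlencode(params)}"
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Ocp-Apim-Subscription-Region": region,
        "Content-Type": "application/json",
    }
    body = json.dumps([{"text": text}]).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def extract_translation(response_json):
    """Return the first translated string from a parsed Translator response body."""
    return response_json[0]["translations"][0]["text"]
```

Sending the request would be `urllib.request.urlopen(req)`, followed by `json.load` on the response and `extract_translation` on the result; keeping request construction separate makes it testable without network access or a real key.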
The Forex Data Pipeline with Apache Airflow
The Forex Data Pipeline collects, processes, and prepares currency exchange rate data for downstream machine-learning pipelines. It fetches currency rates from an external API, transforms the data with PySpark, and loads the processed data into a Hive table stored in the Hadoop Distributed File System (HDFS). The goal is to provide clean, structured currency rate data that subsequent machine-learning workflows can consume directly.
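The transformation step essentially turns each API response into tabular rows that fit a Hive table. The sketch below does this in plain Python rather than PySpark to stay self-contained; the payload shape and field names (base, date, rates) are assumptions, and the rate values are placeholders rather than real quotes. In Spark, rows like these could be loaded into a DataFrame and written to Hive, for example with `DataFrameWriter.saveAsTable`.

```python
import json

# Hypothetical API payload; the real API's field names and values may differ.
SAMPLE_PAYLOAD = json.dumps({
    "base": "EUR",
    "date": "2024-01-02",
    "rates": {"USD": 1.1, "GBP": 0.85, "JPY": None},
})


def flatten_rates(payload):
    """Turn one API response into (date, base, quote, rate) rows, dropping bad rates."""
    data = json.loads(payload)
    rows = []
    for quote, rate in sorted(data["rates"].items()):
        if rate is None or rate <= 0:
            continue  # skip missing or invalid quotes instead of loading them
        rows.append((data["date"], data["base"], quote, float(rate)))
    return rows
```

Dropping invalid quotes at this stage keeps the downstream table free of nulls, at the cost of silently losing those rows; logging the skipped quotes would be a sensible addition.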
A visual history of Nobel Prize winners
The Nobel Prize is perhaps the world's most well-known scientific award. Besides the honor, prestige, and substantial prize money, the recipient also receives a gold medal showing Alfred Nobel (1833-1896), who established the prize. Every year it is awarded to scientists and scholars in chemistry, literature, physics, physiology or medicine, economics, and peace. The first Nobel Prize was handed out in 1901, and at that time the prize was very Eurocentric and male-focused, but nowadays it's not biased in any way whatsoever. Surely. Right?
Data Analysis Project: Stock Price Analysis and Forecasting
This repository contains the code and analysis for my stock price analysis and forecasting project, completed during my internal attachment at Jomo Kenyatta University of Agriculture and Technology. The project analyzes historical stock price data, visualizes trends, and develops a forecasting model using Python and data science techniques.
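The project's actual forecasting model is not described here. As a minimal stand-in, the sketch below fits an ordinary least-squares trend line to a series of closing prices and extrapolates it forward; a straight-line trend is an assumption chosen for brevity, not the model used in the project.

```python
def fit_linear_trend(prices):
    """Ordinary least squares fit of price = intercept + slope * t, with t = 0..n-1."""
    n = len(prices)
    if n < 2:
        raise ValueError("need at least two prices")
    t_mean = (n - 1) / 2
    p_mean = sum(prices) / n
    cov = sum((t - t_mean) * (p - p_mean) for t, p in enumerate(prices))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    intercept = p_mean - slope * t_mean
    return intercept, slope


def forecast(prices, steps):
    """Extrapolate the fitted trend for `steps` periods after the last observation."""
    intercept, slope = fit_linear_trend(prices)
    n = len(prices)
    return [intercept + slope * (n + k) for k in range(steps)]


if __name__ == "__main__":
    print(forecast([1.0, 3.0, 5.0, 7.0], 2))
```

In practice a model like this would be evaluated on a time-ordered holdout (fit on earlier prices, compare forecasts with later ones) rather than a random split.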