Projects
A collection of data engineering projects and solutions I've built.
Projects marked "GitHub" are sourced from GitHub.
Machine Learning
Completed
Detecting & Classifying Fraudulent Ethereum Accounts
Developed a machine-learning framework combining supervised and unsupervised methods to detect fraudulent Ethereum accounts with >85% accuracy and <5% false positives, deployed as an interactive Streamlit app.
Python
Scikit-learn
TensorFlow
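The card doesn't say how the supervised and unsupervised methods are combined. One common pattern is to append an Isolation Forest anomaly score to the feature matrix before training a supervised classifier, then report accuracy alongside the false-positive rate. The sketch below shows that pattern on synthetic scikit-learn data. It is not the project's Ethereum features, models, or results, and the function names and dataset are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest, RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split


def fit_hybrid(X_train, y_train, seed=0):
    """Hypothetical helper: unsupervised anomaly score feeding a supervised model."""
    # Unsupervised step: Isolation Forest learns what "normal" accounts look like.
    iso = IsolationForest(random_state=seed).fit(X_train)
    # score_samples returns lower values for more anomalous rows; add it as a feature.
    X_aug = np.column_stack([X_train, iso.score_samples(X_train)])
    # Supervised step: a classifier trained on the original plus anomaly features.
    clf = RandomForestClassifier(n_estimators=200, random_state=seed).fit(X_aug, y_train)
    return iso, clf


def predict_hybrid(iso, clf, X):
    # Apply the same feature augmentation at prediction time.
    return clf.predict(np.column_stack([X, iso.score_samples(X)]))


def false_positive_rate(y_true, y_pred):
    # Share of legitimate accounts (label 0) wrongly flagged as fraud (label 1).
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return fp / (fp + tn)


# Synthetic, imbalanced stand-in data (about 10% "fraud"), not real Ethereum accounts.
X, y = make_classification(n_samples=1000, n_features=8, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
iso, clf = fit_hybrid(X_train, y_train)
y_pred = predict_hybrid(iso, clf, X_test)
print("accuracy:", accuracy_score(y_test, y_pred))
print("false-positive rate:", false_positive_rate(y_test, y_pred))
```

Reporting the false-positive rate next to accuracy matters on imbalanced data like this, where a model that never flags fraud can still score high accuracy.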
Data Engineering
Completed
Real-Time Analytics Platform
Built a comprehensive real-time analytics platform processing 10M+ events per day using Kafka, Spark Streaming, and ClickHouse for sub-second query performance.
Apache Kafka
Spark Streaming
ClickHouse
Data Engineering
GitHub
7 4
Stock Price Prediction with Spark and Cassandra
This data pipeline predicts stock prices using Apache Spark, Apache Cassandra, and machine learning techniques. It collects and preprocesses stock data from the Alpha Vantage API, engineers features, trains models, and runs data analysis and predictions.
Python
Apache Spark
Cassandra
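The description mentions feature engineering without naming the features. As a hedged illustration only, here is a plain-Python sketch, not the project's Spark code, of features commonly used for next-day price prediction: the daily return, a trailing simple moving average, and the next close as the training target. The function name and the 3-day window are assumptions.

```python
from typing import Dict, List, Optional


def engineer_features(closes: List[float], window: int = 3) -> List[Dict[str, Optional[float]]]:
    """Hypothetical helper: turn a series of closing prices into model-ready rows."""
    rows = []
    for i, close in enumerate(closes):
        # Daily return relative to the previous close (undefined on the first day).
        ret = (close - closes[i - 1]) / closes[i - 1] if i > 0 else None
        # Simple moving average over the trailing `window` closes, once enough history exists.
        sma = sum(closes[i - window + 1 : i + 1]) / window if i >= window - 1 else None
        # Target: the next day's close, which a model would learn to predict.
        target = closes[i + 1] if i + 1 < len(closes) else None
        rows.append({"close": close, "return": ret, "sma": sma, "target": target})
    return rows


rows = engineer_features([10.0, 11.0, 12.0, 13.0], window=3)
for row in rows:
    print(row)
```

The last row has no target, so a training step would drop it. In Spark the same features would usually be computed with window functions over a date-ordered partition per ticker.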
Data Science
GitHub
5
Stock Price Data Analysis
This repository contains the code and analysis for my stock price analysis and forecasting project, completed during my internal attachment at Jomo Kenyatta University of Agriculture and Technology. The project analyzes historical stock price data, visualizes trends, and develops a forecasting model using Python and data science techniques.
Jupyter Notebook
Data
Data Analysis
Data Engineering
GitHub
3
DAG Pipeline with dbt
This project develops and deploys an ELT (Extract, Load, Transform) pipeline built with industry-standard tools: dbt (data build tool), Snowflake, and Airflow. The pipeline loads data from source tables and transforms it into final data marts.
Python
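Airflow and dbt both work out execution order from a dependency graph (a DAG). The stdlib sketch below only illustrates that idea and does not use either tool. Each task lists the tasks it depends on, and a topological sort produces an order in which every extract runs before its load, every load before its staging model, and staging before the final mart. The task names are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hypothetical ELT tasks: each key maps to the set of tasks it depends on.
PIPELINE = {
    "load_raw_orders": {"extract_orders"},
    "load_raw_customers": {"extract_customers"},
    "stg_orders": {"load_raw_orders"},
    "stg_customers": {"load_raw_customers"},
    "mart_customer_orders": {"stg_orders", "stg_customers"},
}


def run_order(dag):
    # static_order() yields every node after all of its predecessors
    # and raises CycleError if the graph contains a cycle.
    return list(TopologicalSorter(dag).static_order())


order = run_order(PIPELINE)
print(order)
```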
Data Science
GitHub
3
Product Network Analysis Using R
This Shiny web application analyzes product transactions to discover frequently purchased product pairs and visualize the relationships between them. The app uses association rule mining (Apriori algorithm) to identify frequent itemsets, and it applies community detection to find clusters of related products.
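The app itself is written in R. To show the Apriori idea behind "frequently purchased product pairs" in a language-neutral way, here is a minimal Python sketch. It first keeps the individual items that meet a minimum support, then counts only the pairs built from those frequent items (the Apriori pruning step), because a pair can never be more frequent than either of its members. The function name, sample baskets, and threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Hypothetical helper: two-level Apriori returning {pair: support}."""
    n = len(transactions)
    baskets = [set(t) for t in transactions]
    # Level 1: support of single items; keep only those meeting the threshold.
    item_counts = Counter(item for b in baskets for item in b)
    frequent_items = {i for i, c in item_counts.items() if c / n >= min_support}
    # Level 2: only pairs whose members are both frequent are candidates.
    pair_counts = Counter()
    for b in baskets:
        for pair in combinations(sorted(b & frequent_items), 2):
            pair_counts[pair] += 1
    return {pair: c / n for pair, c in pair_counts.items() if c / n >= min_support}


TRANSACTIONS = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["beer", "bread"],
    ["milk", "butter"],
]
pairs = frequent_pairs(TRANSACTIONS, min_support=0.5)
print(pairs)
```

With four baskets and a 0.5 threshold, "beer" is pruned at level 1, and ("bread", "butter") appears in only one basket, so it falls below the threshold.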
Data Engineering
GitHub
2 1
Fraud Detection Using Kafka Streams
This project demonstrates how to use Apache Kafka Streams to detect fraudulent activities by analyzing IP logs in real-time. By processing the streaming data, the system flags potential fraud by identifying suspicious patterns, such as repeated login attempts or access from unusual IP addresses.
Java
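The project is written in Java with Kafka Streams, and the description doesn't specify its exact detection rules. As a language-neutral illustration in Python, with no Kafka involved, this sketch flags an IP address once it makes more than a threshold number of login attempts inside a sliding time window, which matches the "repeated login attempts" pattern described for the project. In Kafka Streams the comparable logic would typically be a windowed count grouped by IP. The class name, threshold, and window length here are hypothetical, and the sketch assumes events for each IP arrive in timestamp order.

```python
from collections import defaultdict, deque


class LoginAttemptMonitor:
    """Hypothetical sliding-window detector for repeated login attempts per IP."""

    def __init__(self, max_attempts=3, window_seconds=60):
        self.max_attempts = max_attempts
        self.window = window_seconds
        # Per-IP queue of recent attempt timestamps (seconds).
        self.events = defaultdict(deque)

    def record(self, ip, timestamp):
        """Record one attempt; return True if this IP now exceeds the threshold."""
        q = self.events[ip]
        q.append(timestamp)
        # Evict attempts that are at least `window` seconds older than this one.
        while q and timestamp - q[0] >= self.window:
            q.popleft()
        return len(q) > self.max_attempts


monitor = LoginAttemptMonitor(max_attempts=3, window_seconds=60)
burst = [monitor.record("10.0.0.1", t) for t in (0, 10, 20, 30)]
later = monitor.record("10.0.0.1", 95)  # earlier attempts have aged out of the window
other = monitor.record("10.0.0.2", 30)  # a different IP has its own window
print(burst, later, other)
```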
Data Science
GitHub
2
Retail Recommender System
The Retail Recommender System is a Shiny-based web application that provides recommendations for cross-sell opportunities using association rule mining. Built with R, it analyzes customer transaction data, extracts purchasing patterns, and generates rules for cross-sell recommendations.
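Association rules for cross-selling are usually ranked by support, confidence, and lift. For a rule A → B, support is the share of baskets containing both A and B. Confidence is the share of A-baskets that also contain B. Lift is confidence divided by B's overall support, so a lift above 1 means buying A makes B more likely. The project is built in R. This Python sketch only illustrates those three metrics, and the function name and sample baskets are hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Hypothetical helper: support, confidence, and lift for antecedent -> consequent."""
    n = len(transactions)
    baskets = [set(t) for t in transactions]
    a, c = set(antecedent), set(consequent)
    n_a = sum(1 for b in baskets if a <= b)          # baskets containing the antecedent
    n_c = sum(1 for b in baskets if c <= b)          # baskets containing the consequent
    n_ac = sum(1 for b in baskets if (a | c) <= b)   # baskets containing both
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    # Lift compares the rule's confidence with the consequent's baseline frequency.
    lift = confidence / (n_c / n) if n_c else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}


TRANSACTIONS = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["beer", "bread"],
    ["milk", "butter"],
]
metrics = rule_metrics(TRANSACTIONS, ["butter"], ["milk"])
print(metrics)
```

Here every basket with butter also contains milk, giving a confidence of 1.0. Milk already appears in three of the four baskets, though, so the lift is only about 1.33.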
Open Source
GitHub
2
Azure For Data Engineering
A Jupyter Notebook repository by Sabareh.
Jupyter Notebook
Open Source
GitHub
1
Hillwinds Data Engineer
This repository contains a Hillwinds data engineer take-home exercise, defined in TAKE_HOME_ASSESSMENT (1).md.
Jupyter Notebook
Open Source
GitHub
1
Technical Writing Training
This repository contains my coursework for the Udemy course "Technical Writing: How to Write Software Documentation," taught by Jordan Stanchev.
Data Science
GitHub
1
Forecasting ML App
An R machine learning application that forecasts pharmaceutical medicine sales using data from the NHS (UK) General Practitioner (GP) datasets.
Open Source
GitHub
1
Data Engineering Project Using Sales Data
A data engineering project that works with sales data in Hadoop on Cloudera.
Data Engineering
GitHub
1
Using Python To Access Web Data
A Python repository by Sabareh.
Python
Data Engineering
GitHub
1
ALX Higher Level Programming
A Python repository by Sabareh.
Python
DevOps
GitHub
ALX Zero Day
I'm now an ALX student, and this is my first repository as a full-stack engineer.
Shell
Web Development
GitHub
Blog
This is my home 🏡 on the internet. I am trying my best to document my data science journey.
JavaScript
Open Source
GitHub
Google Colabs
A Jupyter Notebook repository by Sabareh.
Jupyter Notebook
Data Engineering
GitHub
Sentiment Analysis Group Work
A Python repository by Sabareh.
Python