About
A Data Engineer with over 4 years of experience in banking: designing and automating data pipelines using Python, PySpark, Airflow, dbt, Docker, CI/CD with GitHub Actions, SQL, and Google Cloud Platform (BigQuery, Cloud Storage, Cloud Composer, Dataproc). I ensure data flows seamlessly from source to insight.
I believe in empowering organizations to achieve more in a competitive data-driven world by building reliable, scalable, and future-ready data systems.
Experience
2025 — Present
Big Data Engineer · Bank Negara Indonesia
- Engineered end-to-end big data pipelines using IBM DataStage to ingest multi-source data into the Bronze layer, ensuring consistency and reliability across 10M+ records daily.
- Designed and implemented Silver and Gold layer transformations with PySpark and Hive/Impala QL, boosting data processing performance by 40% and enabling faster downstream analytics.
- Built scalable datamarts and database models that enable data analysts and scientists to apply business logic seamlessly, cutting query time by 60%.
- Optimized Python scripts and modularized reusable components to eliminate repetitive logic, reducing maintenance workload by 30%.
- Deployed, monitored, and tuned Spark workflows on Cloudera Machine Learning (CML), achieving 99.8% job reliability across scheduled data pipelines.
- Explored and implemented modern data tools (Airflow, dbt, Docker, Google Cloud Storage, Google Cloud Composer, Google BigQuery, GitHub Actions) to prototype modular ELT/ETL workflows and CI/CD-style data testing for analytics automation.
2023 — 2025
Business Analytics · Bank Negara Indonesia
- Transformed lead generation and monitoring from a manual process (re-running scripts by hand) into a fully automated ETL pipeline with auto-password-protected Excel files, FTP delivery, and email notifications, resulting in a 60% reduction in processing time and a significant boost in productivity.
- Designed and implemented ETL workflows for lead delivery to the in-house Digisales channel, including handling massive sales assignments. This automation reduced manual intervention by 70%, enhancing operational efficiency and accuracy.
- Developed and deployed automated data pipelines using Python and PySpark, efficiently processing over 1 million rows daily to ensure seamless data flow and availability for business-critical operations.
- Built high-speed fuzzy string matching for big data using RapidFuzz and multiprocessing for parallel execution. The solution achieved a 30% increase in matching accuracy and cut processing time by 50%, enabling efficient handling of datasets with over 2 million records.
- Developed an automated SQL query retry mechanism using Python to handle query execution failures. This implementation reduced query failure resolution time by 80% and increased workflow reliability by ensuring 100% query execution success without manual intervention.
2021 — 2023
Data Analytics · Bank Negara Indonesia
- Developed automation bash scripts for seamless data transfer between big data ecosystems and FTP servers, resulting in a 50% reduction in manual data handling time, improving accessibility and efficiency.
- Leveraged big data tools like Hadoop, Impala, Hive, Spark, and Python to efficiently process data, reducing processing time by 30% and enabling faster insights and decision-making.
- Orchestrated the preparation and management of datamarts for data scientists, using complex SQL queries and Cloudera Data Science Workbench.
- Maintained documentation of scripts and workflows to support ongoing optimization and knowledge transfer.
2020
Data Science & Machine Learning · Purwadhika Digital Technology School
Transitioned into the data and programming world with excitement, discovering my strength in embracing challenges and continuous learning. There I found the passion that still drives me today.
2017 — 2020
Inspector · Jaya Construction Management
Before transitioning to tech, I leveraged my civil engineering background in a construction management role. I gained experience in project management, leadership, and coordinating contractors while overseeing timeline, cost, and quality control. This role emphasized the value of organization, teamwork, and delivering results efficiently.
Projects
Sales Summary ELT Pipeline using Composer, dbt, Cloud Storage, BigQuery, and CI/CD with GitHub Actions
This project is an end-to-end (including CI/CD) e-commerce ELT pipeline built on Google Cloud Composer, powered by Airflow 2.x. It's designed to mimic a production-ready workflow while staying fully runnable locally, for anyone who wants to understand how a modern ELT pipeline works end-to-end. It extracts from three different data sources: an API, Postgres, and Google Cloud Storage.
Scale-up Banking Loan Risk ELT Pipeline using Airflow, PySpark, dbt, BigQuery, and Cloud Storage
This project is my take on building an end-to-end big data pipeline for a banking loan system from raw data to analytics-ready tables. The story starts with a simple question: “How do banks process millions of loan records daily and detect risky borrowers?” To explore that, I built this pipeline that mimics a real-world financial data flow.
Massive Lead Assignment to Sales ETL using Pandas, PySpark, Hive, and Cloudera Machine Learning
Manually managing lead distribution is time-consuming and inefficient, especially when dealing with unevenly assigned leads across regions and branches. This labor-intensive process is prone to errors and requires constant supervision. This project automates lead assignment using a round-robin process, dramatically reducing the need for manual intervention.
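The round-robin idea at the heart of this project can be sketched in a few lines of Python (a minimal illustration only; the actual pipeline ran on PySpark and Hive, and the lead and rep names here are hypothetical):

```python
from itertools import cycle

def assign_leads_round_robin(leads, sales_reps):
    """Distribute leads evenly across sales reps in round-robin order."""
    rep_cycle = cycle(sales_reps)  # loop over the reps endlessly
    return [(lead, next(rep_cycle)) for lead in leads]

leads = ["lead-1", "lead-2", "lead-3", "lead-4", "lead-5"]
reps = ["alice", "budi", "citra"]
assignments = assign_leads_round_robin(leads, reps)
# lead-4 wraps back around to the first rep
```

With an even distribution guaranteed by construction, no manual balancing across regions and branches is needed.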
Automated Data Processing with Encryption, Send to FTP, and Notify via Email
This script automates the process of encrypting an Excel file, uploading it to an FTP server, and notifying recipients via email. It is designed to simplify repetitive tasks, improve data security, and ensure efficient communication.
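The upload-and-notify flow can be outlined as below (a simplified sketch: hosts, addresses, and filenames are hypothetical, and the Excel password-protection step, which needs Excel automation or a dedicated library, is assumed to have already happened):

```python
import ftplib
import smtplib
from email.message import EmailMessage
from pathlib import Path

def build_notification(recipients, filename):
    """Compose the delivery notification email."""
    msg = EmailMessage()
    msg["Subject"] = f"Lead file delivered: {filename}"
    msg["From"] = "etl-bot@example.com"  # hypothetical sender
    msg["To"] = ", ".join(recipients)
    msg.set_content(f"{filename} has been uploaded to the FTP server.")
    return msg

def upload_and_notify(path: Path, host, user, password, recipients):
    """Upload the (already encrypted) file via FTP, then email recipients."""
    with ftplib.FTP(host) as ftp:
        ftp.login(user, password)
        with path.open("rb") as fh:
            ftp.storbinary(f"STOR {path.name}", fh)
    with smtplib.SMTP("mail.example.com") as smtp:  # hypothetical SMTP host
        smtp.send_message(build_notification(recipients, path.name))
```

Keeping the email construction separate from the network calls makes each step easy to test and reuse.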
High-Speed Fuzzy String Matching for Big Data
String matching in massive datasets can be painfully slow and inefficient with traditional methods. This project addresses the problem by leveraging RapidFuzz for fuzzy matching, the multiprocessing module for parallel execution, and PySpark for handling big data seamlessly. The result is a high-speed solution that efficiently processes and matches millions of records.
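The core pattern can be illustrated with the standard library alone (a sketch under stated substitutions: difflib's ratio stands in for RapidFuzz's scorers, a thread pool stands in for a process-based multiprocessing Pool, and the reference names are made up):

```python
from difflib import SequenceMatcher
from multiprocessing.dummy import Pool  # thread-pool stand-in for multiprocessing.Pool

REFERENCE_NAMES = ["PT Maju Jaya", "Bank Negara Indonesia", "Toko Sumber Rezeki"]

def best_match(query):
    """Return the (reference, score) pair with the highest similarity to `query`."""
    scored = [(ref, SequenceMatcher(None, query.lower(), ref.lower()).ratio())
              for ref in REFERENCE_NAMES]
    return max(scored, key=lambda pair: pair[1])

queries = ["bank negara indonesia tbk", "pt majujaya"]
with Pool(4) as pool:  # score queries in parallel
    matches = pool.map(best_match, queries)
```

Because each query is scored independently, the workload parallelizes cleanly; swapping in RapidFuzz's C-backed scorers is what makes it fast at the millions-of-records scale.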
Automatic CDSW File Downloader Using Selenium
Imagine manually downloading over 10 monitoring reports weekly from Cloudera Data Science Workbench (CDSW/CML): a time-consuming and repetitive task. Navigating through project directories and clicking download by hand is inefficient and error-prone. This project uses Selenium to automate the whole process, saving significant time and eliminating those errors.
Auto-retry SQL Query Execution
When running SQL queries through Hue for Impala or Hive, encountering errors is frustrating. Each time a query failed, I had to manually hit the 'run' button or press Ctrl + Enter again, which is not only time-consuming but also mentally exhausting. This project automates the process so that failed queries retry automatically, giving me peace of mind that execution will succeed without constant supervision.
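The retry logic can be sketched as a small wrapper with exponential backoff (illustrative only: `run_query` and the retry parameters are hypothetical stand-ins for the real Impala/Hive execution call):

```python
import time

def with_retry(fn, max_attempts=5, base_delay=1.0):
    """Call `fn`, retrying with exponential backoff until it succeeds."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            time.sleep(base_delay * 2 ** (attempt - 1))

# Simulated flaky query: fails twice, then succeeds on the third call.
calls = {"n": 0}
def run_query():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("Impala: query aborted")
    return [("row", 1)]

result = with_retry(run_query, max_attempts=5, base_delay=0)
```

The backoff spacing keeps transient cluster hiccups from burning through all attempts at once, while the final re-raise ensures genuine failures still surface.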
Optimize LendingClub's Profit with Analysis & Machine Learning
Investors earn money from loan interest, while LendingClub (LC) earns an origination fee from borrowers and a service fee from investors. Investors, however, still face credit and liquidity risks: if borrowers stop repaying, investors lose their interest returns and LC loses its service-fee revenue. This project identifies the characteristics of borrowers who stop repaying and explores how to optimize LendingClub's profit.
End-to-End Customer Segmentation and Multi-Product Leads SQL Pipeline
The bank's marketing team needs an efficient way to target high-value customers with personalized financial products. The manual process of gathering customer data is time-consuming and prone to errors. This project built a SQL-based pipeline for customer segmentation and generating multi-product offerings, streamlining the identification of high-value customers and providing them with personalized product recommendations.
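A tiny slice of that segmentation logic might look like this (sketched with sqlite3 on made-up data and thresholds; the real pipeline ran against the bank's warehouse with different table and column names):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Andi", 250_000_000.0),
     (2, "Sari", 15_000_000.0),
     (3, "Rudi", 900_000.0)],
)

# Segment customers by balance so each tier can be mapped to a product offer.
rows = conn.execute("""
    SELECT name,
           CASE WHEN balance >= 100000000 THEN 'priority'
                WHEN balance >= 10000000  THEN 'upper-mass'
                ELSE 'mass' END AS segment
    FROM customers
    ORDER BY balance DESC
""").fetchall()
```

Expressing the tiers as a single CASE keeps the segmentation rules in one place, so adding a product line means editing SQL rather than reworking a manual spreadsheet process.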