VISHAL CHERUPALLY

Built While Learning in Public.

Hands-on work that connects theory with implementation. Each project started with a real engineering problem and ended with something that taught me more than the problem itself.

Data Engineering

Completed

Hive to Unity Catalog Migration

Automated migration of a large Databricks codebase from Hive metastore to Unity Catalog using Python and LibCST for concrete-syntax-tree (CST) level code transformations.

Parsed thousands of Python notebook files with LibCST

Automatically rewrote catalog references and table paths

Reduced manual migration effort from weeks to hours

Cloud / AWS

Completed

S3 Access Log Optimization

Redesigned PySpark jobs processing S3 access logs to reduce runtime and cloud spend through partition pruning, broadcast joins, and schema evolution handling.

Reduced job runtime by ~60% through partition-aware reads

Avoided full-scan anti-patterns with predicate pushdown

Added incremental processing to eliminate reprocessing costs
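
The incremental, partition-aware read can be sketched as a small planning step that runs before the Spark job. This is an illustration under assumptions, not the original pipeline: it assumes logs are laid out under date partitions named `dt=YYYY-MM-DD` and that the last fully processed date is stored as a watermark. The bucket path and function names are hypothetical.

```python
from datetime import date, timedelta


def pending_partitions(last_processed: date, today: date) -> list[date]:
    """Return dates after the watermark, up to and including yesterday.

    Today's partition is skipped because it may still be receiving logs.
    """
    days = (today - last_processed).days
    return [last_processed + timedelta(days=i) for i in range(1, days)]


def partition_paths(base: str, days: list[date]) -> list[str]:
    """Build one prefix per pending date so only those partitions are read."""
    root = base.rstrip("/")
    return [f"{root}/dt={d.isoformat()}/" for d in days]


# Hypothetical usage: plan the read for a job running on 2024-01-04
# when everything through 2024-01-01 has already been processed.
paths = partition_paths(
    "s3://example-bucket/access-logs/",  # hypothetical location
    pending_partitions(date(2024, 1, 1), date(2024, 1, 4)),
)
```

Passing only these prefixes to the reader (for example `spark.read.text(paths)`) means the job never lists or scans partitions it has already handled. After a successful run, the watermark would advance to the last date in the list.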

More projects will be published here as they are completed.