● 7+ years of industry experience in data warehousing, data processing, data modelling, database development, business intelligence, and database performance tuning, including Hadoop ETL
● Expert in databases: Greenplum, Amazon Redshift (MPP database), Oracle, Postgres
● Experienced in writing custom Hadoop MapReduce functions and processing unstructured log files and text files in Python
● Experienced in building Hadoop-based Hive/Python ETL pipelines
● Expert in advanced SQL and advanced analytical functions (windowing, PARTITION BY, ranking, pivoting)
● Expert in performance tuning and query optimization using partitioning, indexing (global, local, bitmap, bitmap join), statistics, and Virtual Private Database (policy functions to access local indexed partitions)
● Expert in performance tuning SQL queries in Oracle and Redshift
● Expert in ETL mappings in PL/SQL
● Expert in multiple ETL tools: Talend (open-source ETL), Informatica, Oracle Data Integrator, DataStage
● Experienced in leveraging Python to push and pull data between RDBMS and Hadoop
● Experienced in Python and Unix Shell scripting
● Knowledge of Apache Hadoop cluster planning, including choosing the hardware and operating systems to host an Apache Hadoop cluster
● Experienced with reporting tools: OBIEE (Oracle Business Intelligence) and Tableau
● Knowledge of Oracle Exadata Database Machine and Exadata Storage Server
● Continuous learner of new technologies
Python Passion Projects:
● Parsed and extracted information from HTML and text files
● Parsed log files, extracted URL information, and downloaded the contents of the URLs
● Performed Hadoop Streaming with custom mapper and reducer functions to process text files (see the sketch after this list)
● Built a web page for movies that plays movie trailers
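A minimal sketch of the Hadoop Streaming pattern referenced above; the word-count logic, file names, and invocation paths are illustrative placeholders rather than the original job:

    # mapper.py - illustrative Hadoop Streaming mapper (word count as a stand-in)
    import sys

    for line in sys.stdin:
        for word in line.strip().split():
            # emit tab-separated key/value pairs for the streaming framework
            print("%s\t1" % word)

    # reducer.py - illustrative reducer: sums counts per key (input arrives sorted by key)
    import sys

    current_key, total = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t", 1)
        if key == current_key:
            total += int(value)
        else:
            if current_key is not None:
                print("%s\t%d" % (current_key, total))
            current_key, total = key, int(value)
    if current_key is not None:
        print("%s\t%d" % (current_key, total))

    # Typical invocation (paths illustrative):
    # hadoop jar hadoop-streaming.jar -files mapper.py,reducer.py \
    #   -mapper mapper.py -reducer reducer.py -input /data/in -output /data/out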
Lead Data Engineer @
From March 2015 to Present (10 months)

Datawarehouse Engineer @ Hightail
Hightail is a midsize cloud file sharing/storage company. I am designing a robust, scalable data warehouse platform to serve Tableau dashboards as well as ad hoc reporting needs from multiple sources, including legacy OLTP and NoSQL systems, and I am responsible for providing key KPIs and sales leads to the Sales/Finance teams and other key stakeholders.
• Work extensively on Amazon Redshift and Talend ETL to build a scalable, optimized metadata DW layer (see the sketch at the end of this entry)
• Build a scalable data extraction framework on Talend to pull data from multiple sources such as MySQL, Splunk (SDK), Cassandra (key-value NoSQL database), and Flurry (REST API)
• Work directly with Senior Sales executives to understand and deliver their reporting / KPI needs
• Developed and delivered key user activity, user registration, and user volume metrics across the globe to enterprise customers
Environment: Amazon S3, Amazon Redshift, SQL, JIRA, Talend, Tableau
From June 2014 to March 2015 (10 months)
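The extraction and load work above was built in Talend; the sketch below is only an illustrative Python/psycopg2 view of the underlying S3-to-Redshift COPY pattern, with hypothetical cluster, table, bucket, and IAM role names:

    # load_to_redshift.py - illustrative S3 -> Redshift COPY (not the Talend job itself)
    import psycopg2

    conn = psycopg2.connect(
        host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",  # hypothetical endpoint
        port=5439, dbname="dw", user="etl_user", password="...")

    copy_sql = """
        COPY staging.user_activity                                 -- hypothetical staging table
        FROM 's3://example-bucket/exports/user_activity/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy'    -- placeholder role
        FORMAT AS CSV GZIP TIMEFORMAT 'auto';
    """

    with conn.cursor() as cur:
        cur.execute(copy_sql)                          # bulk load from S3 into staging
        cur.execute("ANALYZE staging.user_activity;")  # refresh optimizer statistics
    conn.commit()
    conn.close()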
Data Analytics Engineer (Contractor) @ Pandora
Worked on building a Hive/Python-based ETL pipeline.
- Worked extensively on HiveQL
- Performed Hadoop streaming with a custom Python transform within Hive for data transformation (see the sketch at the end of this entry)
- Wrote Unix shell scripts to trigger the ETL pipeline through cron
- Communicated with the SME on a daily basis to convert business requirements into technical requirements
From April 2014 to June 2014 (3 months)
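A minimal sketch of the Hive streaming pattern referenced above: a Python transform script invoked from HiveQL via TRANSFORM. The table, column names, and cleanup logic are hypothetical stand-ins for the actual pipeline:

    # clean_plays.py - illustrative Hive TRANSFORM script: reads tab-separated rows
    # from stdin and writes transformed tab-separated rows to stdout.
    #
    # Invoked from HiveQL roughly like this (table and column names are placeholders):
    #   ADD FILE clean_plays.py;
    #   INSERT OVERWRITE TABLE plays_clean
    #   SELECT TRANSFORM (user_id, track_id, played_at)
    #   USING 'python clean_plays.py'
    #   AS (user_id, track_id, play_date)
    #   FROM plays_raw;
    import sys

    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue                      # drop malformed rows
        user_id, track_id, played_at = fields
        play_date = played_at[:10]        # keep only the date part of the timestamp
        print("\t".join([user_id, track_id, play_date]))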
Database Developer - Oracle, Hadoop ETL (Contractor) @ Macy's
This project required writing and enhancing Python jobs to build the data pipeline that moves data between the Oracle/DB2 environment and Hadoop using the Python framework. It also involved supporting the email assembly Java team with Oracle DB support.
Responsibilities:
● Code Python jobs to fetch data from multiple Oracle-based business environments into Hadoop (20-node Hadoop cluster)
● Convert pre-existing Unix shell jobs
● Schedule Python jobs using cron and Control-M
● Embed Hive, Sqoop, and Unix shell scripts in Python jobs using the Python framework (see the sketch at the end of this entry)
● Build PL/SQL procedures/functions/triggers to implement business logic and load data into the Oracle DB as part of the data feed process for the Java application team
From October 2013 to March 2014 (6 months)
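A minimal sketch of how such a Python job can wrap the Sqoop CLI to move an Oracle table into HDFS; the connection string, table name, paths, and helper name are placeholders, and the in-house Python framework itself is not shown:

    # oracle_to_hdfs.py - illustrative Python wrapper around the Sqoop CLI
    import subprocess

    def sqoop_import(table, target_dir):
        """Import one Oracle table into HDFS via Sqoop (connection details are placeholders)."""
        cmd = [
            "sqoop", "import",
            "--connect", "jdbc:oracle:thin:@//oracle-host:1521/ORCL",  # hypothetical source
            "--username", "etl_user",
            "--password-file", "/user/etl/.oracle_pwd",  # keeps the password off the command line
            "--table", table,
            "--target-dir", target_dir,
            "--num-mappers", "4",
        ]
        subprocess.check_call(cmd)  # raises CalledProcessError if the Sqoop job fails

    if __name__ == "__main__":
        sqoop_import("ORDERS", "/data/raw/orders")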
Datawarehouse Team Lead, Nisum Technology @ GAP Inc
This project is based on building the DW/ETL architecture for the current OMS application behind Gap Inc.'s online shopping platform: DataStage populates the staging tables, PL/SQL procedures implement the business logic that populates the dimension and fact tables, and the Espresso scheduler schedules the DataStage and PL/SQL jobs.
● Lead a team of five offshore developers
● Build DataStage ETL jobs for the staging tables
● Build PL/SQL procedures to implement business logic and populate the fact and staging tables
● Provide performance tuning solutions for the PL/SQL procedures
● Work closely with the application team to understand the source-side schemas and tables
● Use the Espresso scheduler tool to schedule the DataStage and PL/SQL jobs
From June 2013 to October 2013 (5 months)

PL/SQL Analyst / OBIEE Analyst @
● Develop PL/SQL packages (processes for fixing IB data) for the Install Base Reconciliation project
● Test and deploy the above for all releases of the IB Recon project
● Performance-tune queries using cursors, bulk techniques, parallelism, and indexes
● Write PL/SQL wrappers to validate IB SA data, identifying duplicate/invalid/wrong leading serial numbers, invalid installed sites, duplicate coverages, decommissioned lines, etc.
● Perform ad hoc OBIEE report analysis using the Answers tool
● Work closely with business analysts to convert business requirements into technical requirements and to ensure that the correct source table attributes are identified per dimensional data modeling (fact table and dimension table attributes) for OBIEE reporting
● Analyze and provide accurate requirements for dimensional hierarchies and aggregate tables for drill-down and aggregate navigation
● Work extensively on shared hierarchy to implement Data Level Security and Object Level Security
● Work with the ETL team to make sure that all ETL jobs are run on time so that refreshed data is available for OBIEE reporting
● Perform regression and User acceptance testing for OBIEE dashboards and Reports
● Performance-tune slow-running OBIEE reports by gathering statistics, creating aggregate tables/materialized views, caching techniques, indexes, and partitions
From May 2011 to June 2013 (2 years 2 months)

Software Engineer - OBIEE / ETL Developer @
• Used OBIEE best practices for creating OBIEE repositories and dashboards.
• Responsible for creating, validating, and testing the repository objects before delivering them to the users.
• Built the Physical, Business Model and Mapping, and Presentation layers of a repository using star schemas.
• Worked extensively on Dashboards/Answers to create the business intelligence reports as per client requirements.
• Worked with the System Engineering team to design facts and dimensions for the repository.
• Defined appropriate joins, calculations, dimensions, and measures in the repository.
• Created Dimensional Hierarchy, Level based Measures and Aggregate navigation in BMM layer.
• Set up the usage tracking option to monitor frequently run reports in order to provide caching for those reports, and performed database optimization by analyzing query performance.
• Created reports and prompts with navigation capabilities and used them in creating dashboards.
• Created charts for OBI reports and BMM Design documents for all the Business models.
• Worked on merging RPDs and web catalogues
• Worked on changing the logo for the dashboard
• Took sign-off on all functional and technical design documents and was responsible for further changes to the design.
• Developed Informatica Mappings using various Transformations.
• Created and Scheduled Informatica workflows to run at constant intervals
• Performed Informatica repository merging
• Built documentation on OBIEE reports, RPD development, dashboard development, RPD merging, and catalogue merging
• Document routine tasks like ETL process flow diagrams, mapping specs, source-target matrices, and unit test documentation
From January 2010 to April 2011 (1 year 4 months)

Research Assistant (RA) @
From September 2007 to August 2009 (2 years), Wichita, Kansas Area

Software Engineer @
Client: LISCR, VA
OBIEE Developer
The Liberian Registry is administered by the Liberian International Ship & Corporate Registry, a U.S. owned and operated company that provides the day-to-day management for safe and secure shipping as well as proficient administration of one of the most convenient, efficient, and tax-effective offshore corporate registries in the world. The Enterprise Data Warehouse project aimed to design and construct a consolidated data warehouse from an integrated Supplier Information System (SIS) and a central billing system to establish a robust and cost-effective supplier management process.
Client: ABN Amro Bank, Amsterdam, the Netherlands
Siebel Analytics Developer
ABN Amro Bank provides a comprehensive range of financial services: commercial banking; corporate and investment banking and markets; private banking; and other activities. As a Siebel Analytics Developer, I was responsible for the design, development, and deployment of the RPDs from the data mart and the generation of various reports. The system maintains the company's income and expenditure details and generates detailed reports regarding financial transactions, sales details, purchase details, and operating costs.
From July 2006 to July 2007 (1 year 1 month), Mumbai Area, India
Master's Degree, Computer and Electrical Engineering, 3.64 @ Wichita State University, Kansas, From 2007 to 2009
Bachelor of Engineering, Electrical Engineering @ National Institute of Technology, Silchar, From 2002 to 2006

Partha Deka is skilled in: Business Intelligence, Data Warehousing, Data Marts, Star Schema, OLTP, OLAP, Informatica, PL/SQL, OBIEE, Exadata, Hadoop Analytics, Big Data, Databases, Data Modeling, Hadoop