
Andrew Smith

287 Custer Street, Hopewell, PA 00000
[email protected]
(000) 000-0000

Professional Summary

I believe a person's value is recognised through the work they do, so I continually work to acquire the skills needed to excel in my field and hold myself to a standard of excellence. With approximately 7.5 years of experience across technologies such as Hadoop, Hive, Pig, HBase, DB2 LUW DBA, Oracle DBA, Python, Netezza, shell/Perl scripting, Apache Spark, Scala, BigSQL, and Tableau, I have cultivated deep technical expertise and am a highly contributing team player capable of handling multiple responsibilities.

Employment history

Big Data Engineer, Bins LLC. Pierrefort, Connecticut
Mar. 2018 – Present
  • Responsible for handling almost 20 TB of data daily from 40 different sources, such as DPI, CDR/VDR data, signalling, browsing, Tealeaf, and other application data, within the Hadoop ecosystem.
  • Responsible for enriching, transforming, and loading structured and unstructured data into Hive and HBase tables using PySpark, Scala, and shell scripting.
  • Built the POC for the “City in the Motion” project using Docker, PySpark, and shell scripting to generate real-time subscriber details from location data.
  • Performed footfall analysis using subscriber location, nationality, and age group for different malls and places, identifying frequent visitors according to client requirements.
  • Wrote complex SQL queries for use cases based on client requirements.
  • Performed data governance and data quality checks before and after deployments, and tuned data from different source files.
  • Wrote Perl and shell scripts to automate daily tasks: performance monitoring, merging chunks of files, auditing files, and automating Spark jobs.
  • Mostly worked with IBM BigSQL, Hive, HBase, MongoDB, Docker, Python, PySpark, Scala, Spark SQL, and shell/Perl scripting.
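As an illustrative sketch of the enrich-and-load flow described above (not the original production code — the field names, table name, and enrichment rules are assumptions):

```python
# Hypothetical sketch of the enrich-and-load step described above.
# Column names, table names, and enrichment rules are illustrative only.

def enrich_record(record):
    """Normalize one raw CDR-style record (dict) before loading to Hive."""
    enriched = dict(record)
    # Normalize the subscriber number to digits only (illustrative rule).
    enriched["msisdn"] = "".join(ch for ch in record.get("msisdn", "") if ch.isdigit())
    # Tag the record with its source system for downstream auditing.
    enriched["source"] = record.get("source", "unknown").lower()
    return enriched

if __name__ == "__main__":
    # PySpark usage (requires a running Spark session with Hive support):
    # from pyspark.sql import SparkSession
    # spark = SparkSession.builder.enableHiveSupport().getOrCreate()
    # df = spark.read.json("hdfs:///landing/cdr/")          # raw source files
    # cleaned = df.rdd.map(lambda r: enrich_record(r.asDict())).toDF()
    # cleaned.write.mode("append").saveAsTable("subscriber.cdr_enriched")
    print(enrich_record({"msisdn": "+1 (555) 010", "source": "DPI"}))
```

The pure `enrich_record` function keeps the transformation logic testable outside the cluster.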

IT Analyst, Emmerich-Moore. New Chanda, Hawaii
Aug. 2016 – Nov. 2016
  • Imported and exported data using Sqoop between HDFS and relational database systems.
  • Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, the HBase NoSQL database, and Sqoop.
  • Hands-on experience populating data through Hive queries, joins, partitioning and bucketing, and managed and external tables.
  • Wrote real-time streaming data analysis programs in Spark using Python.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Prepared shell scripts to execute Hadoop commands in a single run.
  • Developed extract, transform, and load (ETL) processes, and maintained and supported the enterprise data warehouse and its corresponding marts.
  • Managed a large fleet of DB2 servers supporting more than 30 applications.
  • Wrote shell scripts to automate database backups, table tuning, performance enhancement, and error troubleshooting.
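The Hive bucketing mentioned above can be illustrated with a minimal sketch; Hive uses its own internal hash function, so `zlib.crc32` here is a stand-in purely for demonstration:

```python
# Illustrative sketch of how Hive-style bucketing assigns rows to buckets:
# bucket = hash(clustering_key) mod num_buckets. Hive computes its own hash
# internally; zlib.crc32 stands in here only to show the mechanism.
import zlib

def bucket_for(key, num_buckets):
    """Deterministically assign a clustering key to one of num_buckets."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets

# Rows sharing a key always land in the same bucket, which is what makes
# bucketed joins and table sampling efficient.
```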

Database Engineer, Bernier LLC. New Sherlynmouth, Idaho
Jun. 2015 – Nov. 2015
  • Managed a large fleet of DB2 servers spanning more than 12 countries.
  • Performed data analysis during peak sale periods based on customer purchasing and historical trends.
  • Built a recommendation system using machine learning, Python, and SQL.
  • Responsible for generating daily order calculations and order forecasts.
  • Performed churn analysis for Tesco online stores based on customers' six-month activity.
  • Wrote shell scripts to automate database backups, table tuning, performance enhancement, and error troubleshooting.
  • Worked with the application team to prepare SQL queries based on their requirements.
  • Experienced in handling lock escalation, application dependencies, cross-server resource management, high-availability disaster recovery, primary/standby servers, 24/7 database uptime, and escalations that may impact the business.
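A churn analysis like the Tesco one above could start from a rule as simple as "no purchase within the window"; this sketch uses a hypothetical six-month threshold and simplified month arithmetic, not the original model:

```python
# Minimal churn-flag sketch: a customer is churned if their last purchase
# falls outside a rolling window. Threshold and logic are illustrative.
from datetime import date

def is_churned(last_purchase, as_of, window_months=6):
    """Flag a customer as churned if their last purchase is at least
    window_months before the as_of date (simplified month arithmetic)."""
    months_apart = (as_of.year - last_purchase.year) * 12 + (as_of.month - last_purchase.month)
    return months_apart >= window_months

# Example: last bought in January, evaluated in August -> churned
# under a six-month window.
```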

Software Engineer Analyst, Bradtke-Schuppe. Emardfurt, New Jersey
May. 2013 – Dec. 2013
  • Excellent understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • Imported and exported data using Sqoop between HDFS and relational database systems.
  • Worked on analysing the Hadoop cluster and different big data analytic tools, including Pig, the HBase NoSQL database, and Sqoop.
  • Implemented proofs of concept on the Hadoop stack and different big data analytic tools, including migration from various databases to Hadoop.
  • Hands-on experience populating data through Hive queries, joins, partitioning and bucketing, and managed and external tables.
  • Helped set up standards and processes for Hadoop-based application design and implementation.
  • Database administrator specialising in IBM DB2 10.1 LUW, covering creation and maintenance of databases for highly demanding environments.
  • Understanding of database access, with the ability to assist developers in implementing efficient SQL code.
  • Collected database resource and CPU utilisation during busy and idle hours, and performed maintenance through reorganisation, statistics collection, and rebinding of packages.
  • Established standards, controls, and procedures to ensure data integrity and security.
  • Automated various database tasks through shell scripting (health checks, secondary log utilisation, reorg, backup, archival logs, idle-process detection, application connection thresholds, rebind, tablespace reclaim, load, export, import, restore, runstats).
  • Implemented automated failover at the cluster level using TSA/db2haicu.
  • Good knowledge of Explain Plan and db2advis, and of the OPTIM and DSM tools for working with historical data, table functions, administrative views, and event monitors.
  • Prepared monthly capacity planning reports for different database servers.
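The backup automation described above might be wrapped as below; the database name, target path, and exact `db2 backup` options shown are illustrative assumptions, not the original scripts:

```python
# Illustrative wrapper around the DB2 command line for automated backups.
# The flags shown follow the common `db2 backup database <name> online to
# <path>` form; a real script would add logging and alerting.
import subprocess

def build_backup_command(db_name, target_dir, online=True):
    """Build an illustrative `db2 backup` command line for one database."""
    cmd = ["db2", "backup", "database", db_name]
    if online:
        # Online backups require archive logging to be enabled.
        cmd.append("online")
    cmd += ["to", target_dir]
    return cmd

def run_backup(db_name, target_dir):
    # Raises CalledProcessError on a non-zero exit, so a scheduler can alert.
    return subprocess.run(build_backup_command(db_name, target_dir), check=True)
```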

Education

Western Schuppe, Gusikowskiport, Wyoming
Bachelor of Science, Computer Science Engineering, Aug. 2011

Skills

SQL

AWS

Microsoft Azure

IBM DB2

Apache Pig

Hive

Shell/Perl Scripting

Pyspark

Scala

Python


Andrew Smith

Professional Summary

Hands-on, successful software engineer experienced in delivering appropriate technology solutions on cloud platforms. Comprehensive knowledge of platform development, enterprise architecture, agile methodologies, and cloud services. Keen to provide optimized, precise solutions that fulfil customers' requirements.

Employment history

Big Data Engineer, Turcotte, Beier and Fahey. East Bruceshire, Vermont
Jan. 2019 – Present
  • Collaborate with engineers or software developers to select appropriate design solutions or ensure the compatibility of system components.
  • Communicate with staff or clients to understand specific system requirements.
  • Provide advice on project costs, design concepts, or design changes.
  • Develop application-specific software.
  • Compile and write documentation of program development and subsequent revisions, inserting comments in the coded instructions so others can understand the program.
  • Write, analyze, review, and rewrite programs using workflow charts and architecture diagrams.
  • Evaluate existing systems to determine effectiveness and suggest changes to meet organizational requirements.

Mobile Application Developer, Leffler, Roob and Maggio. West Omer, South Dakota
Oct. 2014 – Mar. 2015
  • Coordinated with systems partners to finalize designs and confirm requirements.
  • Designed strategic plan for component development practices to support future projects.
  • Developed work-flow charts and diagrams to ensure production team compliance with client deadlines.
  • Proposed technical feasibility solutions for new functional designs and suggested options for performance improvement of technical objects.
  • Established compatibility with third party software products by developing program for modification and integration.
  • Consistently met deadlines and requirements for all production work orders.

Mobile Application Developer, Corkery, Upton and Schuppe. East Domenichaven, Nevada
Feb. 2013 – May. 2013
  • Modified existing software to correct errors, adapt it to new hardware, and upgrade interfaces and improve performance.
  • Developed applications according to UI/UX designs and wireframes.
  • Fixed bugs and released updates for new application versions.

Education

Northern Bahringer Academy, South Toya, Nebraska
Bachelor of Science, Computer Sciences, Jul. 2011

Personal info


Phone:

(000) 000-0000

Address:

287 Custer Street, Hopewell, PA 00000

Skills

JSON

Python

CI/CD Processes

DevOps

Apache Spark

AWS Cloud Platform

AWS Code Pipeline

AWS Lambda

AWS Kinesis Stream

big data engineer

  • Reporting, development, and database environments.
  • Worked on developing a new system used as an interactive framework to configure templates, execute setups of all records, and handle basic configuration of the processing engine; stored various resources such as images, designs, and links, merging them as required to produce the template.
  • Performed UAT testing by creating test cases and using XML files.
  • Worked closely with the testing team and later provided handover support.

sr. big data engineer

  • Designed and developed high-volume, low-latency applications for mission-critical systems.
  • Delivered high availability and performance.
  • Developed web and backend applications for a BPO/KPO business involving sales and insurance models.
  • Contributed to all phases of the application development life cycle.
  • Wrote well-designed, testable, and efficient code.
  • Prepared and produced releases of software components.
  • Supported continuous improvement by investigating alternative technologies and presenting them for architectural review.

big data engineer

  • Built streaming services applying processing logic to generated samples using Flink.
  • Developed a customized Flink JDBC connector.
  • Developed WebSocket and REST applications feeding the WebUI.
  • Deployed services as containers on a Nomad and Consul cluster with service discovery.

big data engineer

  • Worked with a couple of teams to design the architecture of this Big Data application.
  • Used Java to implement the Levenshtein distance algorithm, mapping software names manually entered by users to system-generated names.
  • Implemented a Spark application in Scala to process and analyze 500+ GB of data daily.
  • Implemented Hive queries, joins, and UDFs to cleanse data and store it in tables.
  • Scheduled the Spark application using shell scripting, together with several dependent data pipeline scripts.
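The Levenshtein matching described above (done in Java on the project) follows the classic dynamic-programming recurrence, sketched here in Python:

```python
def levenshtein(a, b):
    """Minimum number of single-character edits turning a into b."""
    # prev[j] holds the distance between the current prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

# Mapping user-entered software names to canonical names can then pick the
# candidate with the smallest distance.
```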

big data engineer

  • Developed a module integrating MongoDB for the statistics module.
  • Developed a module for data acquisition from the analyser.
  • Worked with the S3 module to push content to S3 using Python.
  • Working on the upgrade of the GLens product.

big data engineer

  • Developed and executed plans to monitor standard process adherence.
  • Assembled large, complex data sets meeting functional and non-functional business requirements, e.g. Watsons, Robinsons, Waltermart, Mercury Drug, etc. (data download, cleansing, tagging, binding, harmonization, and analysis).
  • Oversaw data planning and field work (product scanning store by store, sales, and inventory); prepared reports and communicated findings and recommendations to line and senior management.
  • Mostly worked with IBM BigSQL, Hive, HBase, MongoDB, Docker, Python, PySpark, Scala, Spark SQL, and shell/Perl scripting.

senior big data engineer

  • Responsible for the documentation, design, development, and architecture of Big Data applications.
  • Processed customer viewership payloads for Telus International, a Canada-origin telecommunications client.
  • Developed applications for the ETL (extraction, transformation, and loading) process.
  • Developed applications for processing structured payloads at higher speed and in real time.
  • Performed testing with the given scenarios and dummy sample inputs from the business.
  • Maintained data security and privacy by implementing Kerberos authentication and authorization in applications.

big data engineer

  • Started my career with this startup, where I was part of a team providing big data solutions for a Fortune 500 US retail giant and building our own automated big data product for data ingestion and processing.
  • Worked on technologies including Hadoop, Hive, Java, Maven, Sqoop, Apache Flume, Apache Velocity, Apache Pig, Azkaban, and Bash.
  • Worked with cloud-based environments such as Google Cloud Platform's Dataproc clusters, as well as on-prem clusters like HDP and Cloudera.
  • Working on DR sync with the PROD environment.

big data engineer

  • Wrote, analyzed, reviewed, and rewrote programs using workflow charts and architecture diagrams.
  • Part of the change management team, which reviews CRs going into all Hadoop environments.
  • Worked on service improvement activities to reduce recurrent failures and application batch run times.
  • Conducted review meetings with the Dev team to highlight data and logic issues in the application.

big data engineer

  • Developed MapReduce jobs for processing Big Data.
  • Wrote, analyzed, reviewed, and rewrote programs using workflow charts and diagrams, applying knowledge of computer capabilities, subject matter, and symbolic logic.
  • Designed and developed Oozie workflows for daily scheduling.
  • Queried the Hive data warehouse and worked on effective partitioning.

big data engineer

  • Provided status reports of team activities against the plan or schedule; reported task accomplishments, issues, and status. Coordinated and tracked reviews, documentation, and test activities.
  • Coordinated meetings with functional managers to discuss project impediments, needed resources, and issues or delays in completing tasks.
  • Worked on designing and developing the matching algorithm solution using Spark and Scala.
  • Leveraged Sqoop, Spark SQL, and Scala to build a robust pipeline for a data repository in HDFS.
  • Helped train and develop team members. Ensured deliverables satisfied the project requirements and schedule.

big data engineer

  • Worked with Apache NiFi for data ingestion, orchestration, and routing.
  • Designed and developed complex big data pipelines that pull data from various external sources and help provide insightful data.
  • Performed root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.
  • Experience with AWS services such as S3, EC2, DynamoDB, and Lambda.
  • Trained freshers by hosting sessions and talks on new technologies.

big data engineer

  • Maintained and developed the Big Data infrastructure for querying Turk Telekom's customer data, consisting of 40 million+ users and 1 billion transactions per day.
  • Developed internal query tools for marketing, visualization, and customer behaviour on top of the Hadoop ecosystem (Hive, Impala).
  • Developed real-time spatial visualisation using Hadoop, Spark, and Kafka.
  • Performed churn prediction.

big data engineer

  • As a Big Data Engineer, responsible for the development and support of big data projects, including requirement analysis and cross-functional team interactions.
  • Currently taking care of multiple application builds for Barclaycard Business on the Cloudera Hadoop ecosystem.
  • Taking handover of new applications coming into production from the Dev team.
  • Supporting live services (incident, problem, and change management).
  • Developing and deploying code for the new project.
  • Working with the admin team on patching and CDH upgrade activities.
  • Reporting application BAU status to stakeholders.

senior big data engineer

  • Founding member of the Consus R&D team, building data infrastructure and event streaming for 800 million users per month.
  • Led the organization-wide effort to establish data lineage, data modeling standardization, and a data dictionary.
  • Implemented data ingestion in Apache Cassandra.
  • Provided daily updates to the Director on development progress and assigned tasks.

big data engineer

  • Designed and built data processing pipelines using tools and frameworks in the Hadoop ecosystem
  • Designed and built ETL pipelines to automate ingestion data and facilitate data analysis
  • Built Streaming services including Window Processing using Flink
  • Built batch services, including a customized transparent Thrift server, over data stored in HDFS and Cassandra using Spark.
  • Designed the Kafka topic partitioning and Cassandra schema according to the processing criteria.

big data engineer

  • Implemented a Java (MapReduce) based process to cleanse the data before indexing.
  • Implemented Hive queries as part of QA automation.
  • Implemented a Python-based program to enrich the data.
  • Set up a new cluster on Amazon Web Services (AWS) and maintained it, resolving any issues.
  • Took regular backups of the indexed data on Amazon S3.
  • Transferred data from one cluster to another.
  • Wrote HBase queries to retrieve data for other teams.
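The scheduled S3 backups above might be organised as in this sketch; the bucket layout and key naming are hypothetical, and the boto3 upload requires configured AWS credentials:

```python
# Illustrative date-partitioned layout for index backups on S3.
# Bucket name, prefix, and file naming are assumptions for this sketch.
from datetime import date

def backup_key(index_name, day):
    """Date-partitioned S3 key for one index snapshot (illustrative layout)."""
    return f"backups/{index_name}/{day:%Y/%m/%d}/snapshot.tar.gz"

def upload_backup(local_path, bucket, index_name, day):
    # Requires boto3 and configured AWS credentials; imported here so the
    # key-naming logic above stays importable anywhere.
    import boto3
    s3 = boto3.client("s3")
    s3.upload_file(local_path, bucket, backup_key(index_name, day))
```

Date-partitioned keys make it easy to prune old snapshots with a prefix listing.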

big data engineer

  • Responsible for data processing (ETL) and device risk analysis for over 4 million terminals at the Ministry of Public Security, dealing with virus attacks, network flow, and business system access data on DataWorks of the Alibaba Cloud Platform (Nov. 2017 – Feb. 2019).
  • Worked as a big data analyst using Scala on the Spark platform, analyzing our own company's data, including browser history and software usage (development tools, chat, gaming) data (May 2017 – Nov. 2017).
  • Worked as a big data model developer using Java on the Hadoop platform to analyze users' computer data, including business system access during and after working hours and the key data used to access the business system; we ultimately caught more than 10 people who had stolen data from the business system (Oct. 2016 – May 2017).
  • Began by building the Big Data platform's testing cluster environment in the company, including Hadoop, Hive, Elasticsearch, and Spark, during my first month at the company (Sep. 2016 – Oct. 2017).