data engineer

  • Created a Python program for monitoring jobs running in Hadoop.
  • Created a program to track all Jira tickets and highlight recurring areas of concern for our team.
  • Created data pipelines using Spring Cloud Data Flow (SCDF) on Cloud Foundry.
  • Deployed a CDH cluster to process terabytes of data via Spark.

senior data engineer

  • Design and build data processing pipelines using tools and frameworks in the Hadoop/Spark ecosystem
  • Implement and configure big data technologies as well as tune processes for performance at scale
  • Manage, mentor, and grow a team of big data engineers
  • Collaborate with other teams and lead cross-team solution integrations

senior data engineer

  • Translate business requirements into technical vision and re-imagine data-processing technologies
  • Working on ETL and data transformation of medical claims data
  • Design and support of the medical claims (Big) Data Warehouses
  • Delivering and implementing successful solutions for sustained client impact 
  • Developing complex algorithms and proprietary analytics in Hive/Python based on discussions with the client and the service delivery team
  • Supervised and mentored colleagues in different areas and projects

senior data engineer

  • Worked on data transformation from raw data to Hive (HDFS) tables using Python and Spark. Performed Hadoop cluster management on an HDInsight cluster.
  • Configured Azure Stream Analytics to ingest sensor data into Azure Storage.
  • Implemented several data pipeline jobs that pull raw data from different sources into an AWS S3 bucket, process it with PySpark on an EMR cluster, and store the processed data back in S3.
  • Created Spark jobs per business requirements; the jobs run on EMR and are triggered by Lambda.
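A Lambda-triggered EMR job like the one above is usually wired up by having the Lambda handler submit a spark-submit step to the cluster. A minimal sketch, assuming hypothetical bucket names, script paths, and cluster id (in a real Lambda the step dict would be passed to boto3's `emr.add_job_flow_steps`):

```python
# Sketch of a Lambda handler that would trigger a Spark job on EMR.
# All names (buckets, script path, cluster id) are hypothetical examples.

def build_spark_step(script_s3_path, input_path, output_path):
    """Build an EMR step definition that runs spark-submit on a PySpark script."""
    return {
        "Name": "process-raw-data",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit", "--deploy-mode", "cluster",
                script_s3_path, input_path, output_path,
            ],
        },
    }

def handler(event, context=None):
    """Hypothetical Lambda entry point: fires when a new object lands in S3."""
    record = event["Records"][0]["s3"]
    input_path = f"s3://{record['bucket']['name']}/{record['object']['key']}"
    step = build_spark_step(
        "s3://my-jobs/etl.py",               # hypothetical script location
        input_path,
        "s3://my-processed-bucket/output/",  # hypothetical output bucket
    )
    # In a real Lambda: boto3.client("emr").add_job_flow_steps(
    #     JobFlowId="j-XXXXXXXX", Steps=[step])
    return step
```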

data engineer

  • Full life cycle development including requirements analysis, high-level design, coding, testing, and deployment.
  • Extensive working knowledge of structured query language (SQL), Python, Spark, Hadoop, HDFS, AWS, RDBMS, data warehouses, and document-oriented NoSQL databases.
  • Automated the process of downloading raw data into the data lake from source systems such as SFTP/FTP/S3 using shell scripting and Python.
  • Developed Hive scripts on EMR for parsing raw data, stored the results in S3, and ingested them into a Snowflake data warehouse used by enterprise customers.
  • Designed ETL jobs to process the raw data using Spark and Python in Glue, EMR, and Databricks.
  • Used Python to pull raw data from sources such as Google DCM, DBM, AdWords, Facebook, Twitter, Yahoo, and Tubular; parsed the data with Spark and loaded it into Hive tables.
  • Implemented MapReduce-style programs using PySpark to parse raw data per business user requirements and store the results in the data lake (AWS S3).
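The parse-and-aggregate pattern in these bullets can be sketched as a small map/reduce in plain Python; the pipe-delimited record layout and field names below are invented for illustration, not the actual feed format:

```python
# Minimal map/reduce-style parse of raw delimited records, a plain-Python
# stand-in for the PySpark jobs described above. The record layout is invented.
from collections import defaultdict

def parse_record(line):
    """Map step: split a raw pipe-delimited line into (campaign, spend)."""
    fields = line.strip().split("|")
    campaign, spend = fields[0], float(fields[1])
    return campaign, spend

def aggregate(lines):
    """Reduce step: total spend per campaign."""
    totals = defaultdict(float)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines in the raw feed
        campaign, spend = parse_record(line)
        totals[campaign] += spend
    return dict(totals)
```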

data engineer

  • Created a data lake on AWS using Spark (PySpark) on AWS Glue for ETL jobs, AWS Lambda for automation and job triggering, Athena as the primary query layer, and S3 for storage.
  • Developed an ETL (Extract, Transform, Load) pipeline for time-based data files that generates access anomalies for all employees across the company
  • Developed an ETL project for transforming and visualizing log data using Elasticsearch and Kibana
  • Designed and developed a web application for employee access management/reporting across the company – Full stack
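The access-anomaly idea can be illustrated with a toy rule: flag access events that fall outside business hours. The 08:00-18:00 window and the event format are assumptions for this sketch, not the original logic:

```python
# Toy access-anomaly check: flag access events outside business hours.
# The 08:00-18:00 window and the event format are illustrative assumptions.
from datetime import datetime

BUSINESS_START, BUSINESS_END = 8, 18  # hours, inclusive-exclusive

def is_anomalous(event):
    """Return True when an access event falls outside business hours."""
    ts = datetime.fromisoformat(event["timestamp"])
    return not (BUSINESS_START <= ts.hour < BUSINESS_END)

def find_anomalies(events):
    """Group anomalous event timestamps by employee id."""
    out = {}
    for e in events:
        if is_anomalous(e):
            out.setdefault(e["employee_id"], []).append(e["timestamp"])
    return out
```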

data engineer

  • Create, maintain, and monitor ETL jobs.
  • Handle client needs and demands, helping them set realistic deadlines.
  • Understand data and business needs and match them to the technology in use.
  • Lead a team of data engineers and developers on the client's premises.
  • Develop dashboards and visualizations for data.
  • Run scripts and queries for data processing.
  • Coordinate work within the company and with competing companies on the same project

senior data engineer

  • Designed, developed, and maintained a highly available Java server that handled and preprocessed data from thousands of connections each second.
  • Designed and developed a concurrent data-migration tool based on RxJava streams
  • Designed a load-test framework that emulated millions of connections
  • Engaged in production deployments and ecosystem management in AWS

data engineer

  • Worked on artificial intelligence and machine learning (machine intelligence) projects.
  • Applied machine learning algorithms and statistical models to perform specific tasks.
  • Projects involved annotating images, videos, and image patterns; the annotated data was converted to JSON and sent back to the client.
  • Performed data analysis, documentation, implementation

data engineer

  • Built ETL pipelines (data extraction, transformation, and loading) for collecting data, decomposing it, and pushing it to models.
  • Built and trained ML models.
  • Used Redis and Cassandra as databases for data storage.
  • Built and deployed Docker images for the tools and models used in this project.
  • Used Git as a version control tool; configured and used Jenkins for continuous integration/continuous deployment.
  • Implemented Elasticsearch/ELK to monitor applications as well as Nginx and system logs, which can be viewed in Kibana for easy troubleshooting.

data engineer

  • Handle junior data engineer responsibilities
  • Support junior data engineers
  • Implement data services solutions (data ingestion, data processing, APIs, computations)
  • Implement data schemas and structures
  • Implement and develop data quality controls
  • Develop business intelligence and report generation
  • Develop data set processes

data engineer

  •  Led efforts to modernize the manufacturing BI team's processes, such as cloud migration and enabling machine learning capabilities beyond traditional analyses and visualizations
  •  Initially developed an ETL architecture based on Apache Spark in an EMR cluster; after experimentation and research, successfully pitched and developed a more powerful, fully serverless architecture
  •  Took initiative to find and develop machine learning projects, resulting in a project which was picked up in the plywood division
  •  Currently working on creating an architecture for machine learning model deployment and retraining

data engineer

  • Designing and developing data models in line with the client’s business needs 
  • Designing, creating, testing and maintaining the complete data management system
  • Taking care of the entire ETL process
  • Ensuring the architecture meets the business requirements
  • Working closely with the stakeholders and solution architect
  • Improving data quality, reliability & efficiency of the individual components & the complete system

data engineer

  • Extracted and cleaned large datasets from SQL server to acquire necessary components for analysis
  • Maintained a company website that reported performance analysis of appliances using Python, HTML, CSS3, JavaScript, and SQLAlchemy
  • Built automated weblog analysis by parsing web server access log files
  • Pushed the resulting datasets to a dashboard so insights could be drawn on top of the data to drive the business better.
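Web server log analysis like this usually starts with parsing the common log format into structured records; a minimal sketch using the standard library (the sample line is illustrative):

```python
# Parse Apache/Nginx "common log format" lines and count requests per status.
import re
from collections import Counter

LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]+" (?P<status>\d{3}) (?P<size>\S+)'
)

def parse_line(line):
    """Return a dict of named fields, or None for malformed lines."""
    m = LOG_PATTERN.match(line)
    return m.groupdict() if m else None

def status_counts(lines):
    """Count requests per HTTP status code, skipping unparseable lines."""
    return Counter(r["status"] for r in map(parse_line, lines) if r)
```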

data engineer

  •  Work closely with the product owner to develop the work plan for data migration by understanding the business requirements of the project.
  •  Work with project management to provide timely estimates, updates, and status.
  • Understand the complete pipeline, dependencies, and flow of data by creating a data model while working closely with the product owner.
  • Fetch data in structured or unstructured form from different sources (e.g., JDBC, SFTP servers) and in different formats (doc, txt, csv, gz, etc.).
  • Create metadata for the datasets pulled in to provide a proper schema and data types per requirements and analysis.
  • Develop scripts using PySpark, Python, and SQL for data cleansing, apply business logic by joining datasets, and produce final datasets as required.
  • Apply proper scheduling and monitoring to the datasets to reduce manual effort.
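The cleanse-and-join step described here can be sketched in plain Python; column names and the "business logic" are invented for illustration. In PySpark the same shape would be a `filter`/`withColumn` pass followed by an inner `join`:

```python
# Plain-Python sketch of the cleanse + join step done in PySpark above.
# Column names and the cleansing rules are invented for illustration.

def cleanse(rows):
    """Drop rows with missing keys and normalize string fields."""
    out = []
    for r in rows:
        if r.get("id") is None:
            continue                      # reject records without a key
        out.append({**r, "name": r.get("name", "").strip().lower()})
    return out

def join(left, right, key="id"):
    """Inner-join two row lists on a key, like DataFrame.join(..., 'inner')."""
    index = {r[key]: r for r in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]
```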

data engineer

  • Worked closely with the business leads to understand and finalize requirements, with consideration for best practices and reporting needs
  • Took complete ownership of assigned tasks and kept the team informed of project progress in a timely manner
  • Applied Unix and shell scripting skills
  • Performed statistical analysis of web server logs. Role: developer | Spark Streaming, Scala

data engineer

  • As a data engineer, daily tasks included working with large volumes of data
  • Implemented a BI solution framework for end-to-end business intelligence projects.
  • Created dashboards for better data visualization.
  • Used data visualization tools such as Google Data Studio and Power BI

data engineer

  • Google Cloud Certified Professional Data Engineer with 7+ months of expertise in Big Data analytics, data integration, data preparation, data visualization, and developing modern data platforms on public cloud platforms.
  • Extensive experience in Python scripting
  • Expertise in data visualization with Google Data Studio.
  • Expertise in RDBMS systems such as SQL Server and MySQL.
  • Data analytics solutions on Google Cloud Platform.
  • Working knowledge of Big Data processing technologies.
  • An accountable team player.

data engineer

  • Developed strategies and algorithms to automate PCI Compliance in Hadoop Environment.
  • Migrated Big Data from local to cloud environments
  • Implemented Benford's law with KL divergence to detect potentially fraudulent datasets using Python
  • Applied decision tree regression to predict monthly debt (plus data wrangling and feature selection) using Python
  • Implemented test driven development by designing unit and integration tests.
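The Benford's-law screen can be sketched as: compare the observed first-digit distribution of a dataset against Benford's expected distribution using KL divergence, and flag datasets whose divergence exceeds a threshold. The threshold value below is an arbitrary illustration, not a tuned parameter:

```python
# Benford's-law fraud screen: KL divergence between the observed first-digit
# distribution and Benford's expected distribution. Threshold is illustrative.
import math

BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit_dist(values):
    """Empirical distribution of leading digits (ignores zeros and signs)."""
    counts = {d: 0 for d in range(1, 10)}
    n = 0
    for v in values:
        s = str(abs(v)).lstrip("0.")
        if s and s[0].isdigit() and s[0] != "0":
            counts[int(s[0])] += 1
            n += 1
    return {d: c / n for d, c in counts.items()} if n else counts

def kl_divergence(p, q):
    """KL(p || q); skips digits with zero observed probability."""
    return sum(p[d] * math.log(p[d] / q[d]) for d in p if p[d] > 0)

def looks_fraudulent(values, threshold=0.2):
    """Flag a dataset whose digit distribution diverges from Benford's law."""
    return kl_divergence(first_digit_dist(values), BENFORD) > threshold
```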

data engineer

  • Design, construct, install, test and maintain data management systems.
  • Integrate up-and-coming data management and software engineering technologies into existing data structures.
  • Develop set processes for data mining, data modeling, and data production.
  • Collaborate with members of the team (e.g., data architects, the IT team, data scientists) on the project’s goals.

data engineer

  • Prepare new scripts and modify existing scripts as required using Java.
  • Write queries in PostgreSQL to resolve data issues.
  • Design and create relational and non-relational database schemas.
  • Worked on an automated data model project.
  • Configure and verify SSO on servers using frameworks such as LDAP and Shibboleth.

data engineer

  • Worked for Client DSM in their Data Management Initiative 
  • Participate in the end-to-end life cycle of MDM implementations
  • Utilize data expertise, business knowledge and technical skills to successfully deliver Data Management initiatives like Vendor Master
  • Deliver training and provide knowledge transfer to end user clients
  • Apply performance-tuning enhancements and housekeeping activities
  • Work with clients to identify new areas within the business where the built solution can be utilized to drive business results
  • On-site lead in an onshore/offshore working model

data engineer

  • Used text mining of reviews to determine customers’ main areas of concern. 
  • Delivered result analysis to the support team for hotel and travel recommendations. 
  • Designed Tableau bar graphs, scatter plots, and geographical maps to create detailed summary reports and dashboards. 
  • Developed a hybrid model to improve the accuracy rate. 

junior data engineer

  • Developed web scrapers based on Scrapy and custom stacks (Requests, bs4, MechanicalSoup).
  • Managed APIs (DRF) and databases (PostgreSQL, MongoDB)
  • Some experience with Celery
  • Determined the most accurate prediction model based on accuracy rate. 

data engineer

  • As a data engineer, successfully designed and implemented the data load for multiple revenue sources of the business. 
  • Optimized long-running jobs.
  • Supported database architects, data analysts, and data scientists.
  • Implemented automations for data extraction with shell scripts and Python.
  • Assembled large, complex data sets that met functional and non-functional business requirements.

data engineer

  • Automated business processes by implementing statistical models, reducing manual effort.
  • Implemented data engineering concepts with the Hadoop Big Data ecosystem.
  • Participated in development and worked on multiple applications/modules.
  • Worked with Amazon Redshift tools such as SQL Workbench/J, pgAdmin, DBHawk, and SQuirreL SQL. 

data engineer

  • Built a highly reusable core recommender application used by 5 different projects.
  • Worked with the team leader to introduce data pipelines that reduced storage usage of the target in-memory system by 20x.
  • Authored and concisely presented documentation to 3 project owners.
  • Upgraded the existing news crawl tool, reducing misses by 95%.
  • Coordinated with testers and provided effective solutions for API performance and data-related systems.
  • Involved in multiple business decisions with data analysts, in addition to offering data-driven reports collected from top-traffic websites.
  • Researched and assessed ETL technologies to apply in upcoming products. 

senior data engineer

  • Responsible for creating migration scripts in Oracle to migrate data from one database to another
  • Responsible for creating a validation process to ensure 100% data accuracy during migration
  • Documented all test procedures for systems and processes and coordinated with business analysts and users to resolve requirement issues and maintain quality. 
  • Worked on automating the provisioning of AWS resources using CloudFormation for ticket routing. 

data engineer

  • Worked on several enterprise projects, developing efficient and effective solutions catering to business needs.
  • Worked on the company’s product dealing with massive data volumes, from designing the console to building and optimizing it.
  • Optimized project components through code refactoring and optimization, and solved critical issues such as cross-browser compatibility, latency, and spam mail.
  • Worked with ARIMAX, Holt-Winters, and VARMAX to predict sales at regular and seasonal intervals. 

data engineer /machine learning engineer

  • Involved in Design, Development and Support phases of Software Development Life Cycle (SDLC). 
  • Performed data ETL by collecting, exporting, merging, and massaging data from multiple sources and platforms, including SSRS/SSIS (SQL Server Integration Services) in SQL Server. 
  • Worked with cross-functional teams (including the data engineering team) to extract data from MongoDB through the MongoDB connector. 
  • Performed data cleaning and feature selection using the scikit-learn package in Python. 
  • Performed partitional clustering into 100 clusters using k-means (scikit-learn) in Python, grouping together similar hotels for a search. 
  • Used Python to perform ANOVA tests to analyze the differences among hotel clusters. 
  • Applied various machine learning algorithms and statistical models (decision trees, text analytics, sentiment analysis, Naive Bayes, logistic regression, and linear regression) in Python to determine the accuracy rate of each model.
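The k-means grouping described above follows the standard assign/update loop (Lloyd's algorithm). A compact pure-Python version on toy 2-cluster data, as a stand-in for the scikit-learn 100-cluster hotel run:

```python
# Compact k-means (assign/update loop), a pure-Python stand-in for the
# scikit-learn KMeans call described above. Data and k are toy examples.
import math

def kmeans(points, centers, iters=20):
    """Lloyd's algorithm: alternate point assignment and centroid update."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            # assign each point to its nearest centroid
            i = min(range(len(centers)),
                    key=lambda i: math.dist(p, centers[i]))
            clusters[i].append(p)
        # recompute each centroid as the mean of its assigned points
        centers = [
            tuple(sum(dim) / len(pts) for dim in zip(*pts)) if pts else ctr
            for pts, ctr in zip(clusters, centers)
        ]
    return centers, clusters
```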