data engineer

  • Built web crawlers to extract large volumes of real-estate and construction data
  • Developed an operational monitoring and alerting system to ensure scraper data quality
  • Mentored junior engineers, supporting them through pair programming
  • Performed code reviews to ensure quality and adherence to standards

senior data engineer

  • Analyzed mission data to predict future manpower requirements for operations using SQL, Python, and regression analysis (see the sketch below).
  • Analyzed existing data and created a new schema based on the new design.
  • Wrote stored procedures and triggers in SQL.
  • Deployed the application to test and production servers.
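
  A minimal sketch of this kind of regression analysis, assuming a hypothetical
  missions.csv extract with invented column names; this is not the production model:

      import pandas as pd
      from sklearn.linear_model import LinearRegression

      df = pd.read_csv("missions.csv")               # hypothetical extract
      X = df[["mission_count", "avg_duration_hrs"]]  # assumed predictor columns
      y = df["manpower_required"]                    # assumed target column

      model = LinearRegression().fit(X, y)
      print(dict(zip(X.columns, model.coef_)), model.intercept_)
      print(model.predict(X.tail(4)))                # forecast for recent periods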

data engineer

  • Developed data warehouse for Online Sales Europe Division
  • Automated SQL and Unix code generation
  • Participated in regular client visits and interactions in the retail fashion domain
  • Worked with K-Nearest Neighbors and Apriori algorithms for product recommendations, covering both content-based and collaborative filtering (see the sketch below)
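
  An illustrative item-based KNN recommender over a toy user-item rating matrix,
  using scikit-learn; the real pipeline and data are not shown here:

      import numpy as np
      from sklearn.neighbors import NearestNeighbors

      # rows = items, columns = users (toy ratings; 0 = not rated)
      ratings = np.array([
          [5, 3, 0, 1],
          [4, 0, 0, 1],
          [1, 1, 0, 5],
          [0, 0, 5, 4],
      ])

      knn = NearestNeighbors(metric="cosine").fit(ratings)

      # items most similar to item 0 by co-rating pattern (collaborative signal)
      dist, idx = knn.kneighbors(ratings[0:1], n_neighbors=3)
      print(idx[0][1:])  # skip the item itself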

data engineer

  • Wrote SQL queries to prepare managed datasets for modeling.
  • Implemented several natural language processing mechanisms for spam filtering and chatbots (a minimal spam-filter sketch follows this list).
  • Worked with NLTK, SciPy, and Polyglot for various NLP tasks.
  • Worked with Amazon Redshift clients such as SQL Workbench/J, pgAdmin, and DBHawk.
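
  A minimal bag-of-words spam-filter sketch using scikit-learn, with toy examples
  standing in for the real labeled corpus:

      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.naive_bayes import MultinomialNB
      from sklearn.pipeline import make_pipeline

      texts = ["win cash now", "meeting at noon", "free prize claim", "lunch tomorrow?"]
      labels = [1, 0, 1, 0]  # 1 = spam, 0 = ham

      clf = make_pipeline(CountVectorizer(), MultinomialNB())
      clf.fit(texts, labels)
      print(clf.predict(["claim your free cash"]))  # -> [1]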

data engineer

  • Fixed and resolved errors.
  • Used a high-correlation filter, a low-variance filter, and random forest importances for feature selection (see the sketch after this list).
  • Worked with time series forecasting to predict sales from historical data, feeding product recommendations.
  • Worked with various classification algorithms including Naïve Bayes, random forest, support vector machines, and logistic regression.
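
  A sketch of the three feature-selection filters named above, run on synthetic
  data; the 0.01 variance and 0.9 correlation thresholds are illustrative, not
  the project's actual values:

      import numpy as np
      import pandas as pd
      from sklearn.ensemble import RandomForestClassifier

      rng = np.random.default_rng(0)
      X = pd.DataFrame(rng.normal(size=(200, 5)), columns=list("abcde"))
      X["f"] = X["a"] * 0.99 + rng.normal(scale=0.01, size=200)  # near-duplicate feature
      y = (X["a"] + X["b"] > 0).astype(int)

      X = X.loc[:, X.var() > 0.01]                 # low-variance filter

      corr = X.corr().abs()                        # high-correlation filter:
      upper = corr.where(np.triu(np.ones(corr.shape), k=1).astype(bool))
      X = X.drop(columns=[c for c in upper.columns if (upper[c] > 0.9).any()])

      rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
      print(sorted(zip(X.columns, rf.feature_importances_), key=lambda t: -t[1]))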

data engineer / machine learning engineer

  • Collected data from various sources, including an Oracle database server and the customer support department, and integrated them into a single dataset.
  • Used Python statistics and ML libraries including NumPy, pandas, scikit-learn, and Seaborn.
  • Applied data preprocessing techniques: checked whether data was normally distributed and applied log, Box-Cox, cube-root, and square-root transformations (see the sketch after this list).
  • Detected outliers and missing values using boxplots and built-in pandas functions, and treated them.
  • Used the Java machine learning library WEKA for data mining, data analysis, and predictive modeling.
  • Performed container-based deployments with Docker, working with Docker images and registries.
  • Worked with various dimensionality reduction techniques such as Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Singular Value Decomposition (SVD), and Factor Analysis.
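
  A sketch of the normality check, transformations, and boxplot/IQR-style outlier
  treatment described above, run on synthetic skewed data rather than project data:

      import numpy as np
      import pandas as pd
      from scipy import stats

      s = pd.Series(np.random.default_rng(1).lognormal(size=500))
      print("skew before:", s.skew())

      log_t = np.log1p(s)              # log transform
      bc_t, lam = stats.boxcox(s)      # Box-Cox (requires positive values)
      print("skew after Box-Cox:", pd.Series(bc_t).skew(), "lambda:", lam)

      # IQR rule, the same fences a boxplot draws
      q1, q3 = s.quantile([0.25, 0.75])
      iqr = q3 - q1
      s_treated = s.clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)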

data engineer

  • Participated in onsite-offshore meetings and discussions
  • Provided production support, ensuring loads completed on time and met SLAs
  • Designed and implemented a full cycle of data services
  • Analyzed user requirements and designed the database changes accordingly.

data engineer

  • Created Perl scripts to automate downloading production data files from FTP to data-processing machines.
  • Prepared test data files with Linux shell and Perl scripts per business requirements.
  • Developed reporting systems, tools and applications to facilitate management of content. 
  • Managed content processing jobs and data extracts for distribution.
  • Reported and logged bugs, following the bug life cycle and tracking issues.
  • Worked with users to review results and obtain user acceptance.
  • Enhanced existing functionality to meet users' changing needs.

data engineer

  • Handled mid-level data engineer responsibilities
  • Supported mid-level data engineers
  • Identify ways to improve data reliability, efficiency, and quality 
  • Align architecture with business requirements 
  • Develop, construct, test and maintain architectures 
  • Identify, troubleshoot and resolve complex production data integrity and performance issues 
  • Drive engineering best practices and set standards 

data engineer

  • Managed end-to-end process for updating and verifying special orders data
  • Analyzed inventory usage reports to avoid backordering
  • Developed mass update system to avoid manual updates to data warehouse. Trained other users on the program 
  • Expanded and modified the system to serve new purposes and improve workflow

data engineer

  • Collected, transformed, analyzed, and refined operational and customer data.
  • Built-out data structures designed to efficiently answer business questions.
  • Assisted in evolving data structures from an MSSQL Server footprint into a data lake environment.
  • Developed, implemented and tuned ETL processes.
  • Created pipelines from internal and external data sources to AWS using custom Python scripts (see the sketch below).
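
  The rough shape of such a pipeline script, assuming boto3; the bucket, key,
  and file names are placeholders:

      import boto3
      import pandas as pd

      def push_extract(csv_path: str, bucket: str, key: str) -> None:
          """Light transform, then land the file in S3 for downstream ETL."""
          df = pd.read_csv(csv_path)
          df.columns = [c.strip().lower() for c in df.columns]  # normalize headers
          df.to_csv("/tmp/clean.csv", index=False)
          boto3.client("s3").upload_file("/tmp/clean.csv", bucket, key)

      # push_extract("daily_orders.csv", "example-data-lake", "raw/orders/2020-01-01.csv")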

data engineer

  • Collaborated with software developers to design the core engine of recommender system for claim of service/warranty for an electronic manufacturing client.
  • Prototyped a tool to mitigate production risks considering employee attrition and retention steps. 
  • Ingested structured and unstructured data.
  • Built standard processes for data governance, data dictionaries, and data flows.

senior data engineer

  • Migrated SQL Server data to Hive using Sqoop and NiFi.
  • Set up a data pipeline for incremental imports from MSSQL Server into Hive with a 24-hour lag, processing 200 GB of data daily.
  • Installed a 40-node Cloudera cluster to accommodate 80 TB of historical data on Google Cloud infrastructure.
  • Installed a 10-node Apache Hadoop cluster to demonstrate a POC of Hadoop-stack technologies.
  • Used Python to analyze CSV data and provided reports to management (see the sketch below).
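
  A sketch of the CSV-analysis-to-report step; the file and column names are
  invented for illustration:

      import pandas as pd

      df = pd.read_csv("usage_export.csv", parse_dates=["event_date"])  # hypothetical file
      report = (df.groupby(df["event_date"].dt.to_period("M"))["bytes_processed"]
                  .agg(["sum", "mean", "count"]))
      report.to_csv("monthly_usage_report.csv")  # handed to management
      print(report.tail())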

data engineer

  • Identified the system data, hardware, and software components required to meet product requirements.
  • Worked with data labeling/annotation tools (Prodigy, Dataturks, Label Studio) and built a custom in-house annotation tool.
  • Worked on Google Cloud Platform with Ubuntu virtual machines.
  • Applied proficiency in SQL and relational DBMSs.

data engineer

  • Manage Snowflake Data Warehouse for Gartner Digital Markets team.
  • Developed a high-frequency data integration with BigQuery for ingesting sessions and hits data.
  • Provide advice on cost optimizations and process changes.
  • Set up Airflow as the ETL tool, migrating off the existing ETL tool in an effort to reduce costs by up to $25,000 (see the DAG sketch below).
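
  A hedged sketch of an Airflow DAG standing in for a migrated ETL job; the DAG
  id, schedule, and callables are placeholders, not the actual pipeline:

      from datetime import datetime
      from airflow import DAG
      from airflow.operators.python import PythonOperator

      def extract():  # placeholder extract step
          ...

      def load():     # placeholder load step
          ...

      with DAG("sessions_etl", start_date=datetime(2021, 1, 1),
               schedule_interval="@hourly", catchup=False) as dag:
          t1 = PythonOperator(task_id="extract", python_callable=extract)
          t2 = PythonOperator(task_id="load", python_callable=load)
          t1 >> t2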

senior data engineer

  • Involved in requirement gathering, analysis, design, coding, and deployment for business objects
  • Wrote complex queries in SQL Server, Netezza, and Snowflake per business requirements
  • Developed processes to download and process files from SFTP and Amazon S3
  • Built processes to download files via APIs
  • Developed ETL packages using SQL Server Integration Services (SSIS) to load files of different formats into the data warehouse
  • Built different reports in SQL Server Reporting Services (SSRS) for data visualization
  • Performed end-to-end testing for the system including unit and system testing

data engineer

  • Built a data warehouse solution for the new BSCS system.
  • Prepared and consolidated monthly post-paid and pre-paid revenue reports and related data in an Oracle environment.
  • Supported operational jobs, making sure all report figures were in range and jobs completed successfully.
  • Worked with NLTK, TensorFlow, spaCy, scikit-learn, Keras, and OpenCV.

data engineer

  • Used Microsoft Excel to create and analyze data.
  • Boosted chatbot responses by 10%.
  • Identified the major issues in the dataset.
  • Performed ad-hoc analysis and presented results clearly.

data engineer

  • Streamed data from assets continuously using Spark Streaming, which provides highly scalable, fault-tolerant stream processing (see the sketch after this list).
  • Processed the asset data using Spark's in-memory cluster-computing engine.
  • Sent processed data to a rule engine, validated it against customer-configured rules, and alerted the customer.
  • Stored processed data for further analysis and predictions using ML algorithms.
  • Sent the streaming data to a real-time dashboard for continuous monitoring.
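
  A sketch of this read-process-alert flow using Spark Structured Streaming; the
  broker, topic, schema, and threshold rule are illustrative assumptions:

      from pyspark.sql import SparkSession
      from pyspark.sql.functions import col, from_json
      from pyspark.sql.types import DoubleType, StringType, StructType

      spark = SparkSession.builder.appName("asset-stream").getOrCreate()
      schema = StructType().add("asset_id", StringType()).add("temperature", DoubleType())

      raw = (spark.readStream.format("kafka")
             .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
             .option("subscribe", "asset-telemetry")            # placeholder topic
             .load())

      events = raw.select(from_json(col("value").cast("string"), schema).alias("e")).select("e.*")
      alerts = events.filter(col("temperature") > 90)  # stand-in for the rule engine

      alerts.writeStream.format("console").start().awaitTermination()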

data engineer

  • Designed Greenplum queries to be consumed in BI reports and Live Office.
  • Labeled data and performed feature selection for training machine learning models.
  • Built a custom Python AI package, NAIL (Nirveda AI Library), providing a supervised, structured, fast, and robust approach to building datasets and handling pre-processing, annotation, post-processing, and text verification.
  • Performed data mining and web scraping.

data engineer

  • Develop SQL scripts for a variety of reports, data corrections, and data migrations
  • Responsible for troubleshooting various computer issues and implementing solutions
  • Work closely with project manager to develop work plan for Data Warehouse projects as well as implementations
  • Develop framework, metrics and reporting to ensure progress can be measured, evaluated and continually improved
  • Support the development of performance dashboards that encompass key metrics to be reviewed with senior leadership and sales management
  • Work with application developers and DBAs to diagnose and optimize query performance
  • Build shell scripts to automate tasks

data engineer

  • Designed and implemented complete end-to-end big data pipelines using Sqoop, Flume, Spark, Kudu, and HDFS, and built real-time streaming using StreamSets.
  • Distributed, stored, and processed data in the Umniah Hadoop cluster; processed and queried structured and unstructured data using Spark SQL (see the sketch after this list); used Spark Streaming to process live data.
  • Enhanced the Umniah data warehousing system, built on Microsoft PDW technology (APS), creating efficient, flexible, and scalable ETL and ELT processes to handle data movement.
  • Extracted and transformed large datasets with Python scripts and loaded them with PowerShell.
  • Developed on-premises and cloud Power BI reports, migrated existing DataZen dashboards, reports, slicers, and KPIs into Power BI, and configured the gateway and Power BI Report Server.
  • Developed a Python tool to decode Huawei CDRs, making the decoding process more cost-effective.
  • Collected data requirements from the business team, translated them into datasets, and gathered reporting needs.
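
  An illustrative Spark SQL aggregation over structured data; the table, columns,
  and paths are invented for the example:

      from pyspark.sql import SparkSession

      spark = SparkSession.builder.appName("cdr-aggregation").getOrCreate()
      spark.read.parquet("/data/cdrs/").createOrReplaceTempView("cdrs")  # placeholder path

      daily = spark.sql("""
          SELECT call_date, COUNT(*) AS calls, SUM(duration_s) AS total_seconds
          FROM cdrs
          GROUP BY call_date
      """)
      daily.write.mode("overwrite").parquet("/marts/daily_calls/")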

data engineer

  • Worked for an Australian telecommunications client on planning and budgeting reports using the Oracle Hyperion suite of applications.
  • Performed data transformation, loading, and reporting using ETL tools.
  • Configured and developed Live Office content based on BI reports.
  • Developed, enhanced, and debugged BO Universes per user requirements.

data engineer

  • Nirveda Cognition Inc. (Direct Hire) http://www.nirvedacognition.ai
  • Data Scientist | Data Annotator | ML Engineer | Artificial Intelligence
  • Python 3.5+, including NumPy, Pandas, and other mathematics and scientific libraries
  • ETL (Extract, Transform, Load) of large scale data sets
  • Web scraping (Selenium, Beautiful Soup, etc.)
  • Experience with Natural Language Processing, Machine Learning, Deep Learning, Computer Vision, etc.
  • Building large data sets by combining internal and client data with third-party or synthetic data

data engineer

  • Developed an Enterprise Data Warehouse (EDW) from the ground up, gathering data across different products and data sources.
  • Identified system data, hardware and software components required to meet the product requirements.
  • Played multiple roles, including data architect and ETL/BI developer.
  • Improved report execution by performance-tuning SQL queries and database tables and by aggregating the data.
  • Designed and developed an event-based scheduling service to deliver automated reports to users using MicroStrategy and Bash scripting.
  • Implemented self-serve analytics for business users, enabling them to quickly create and share dashboards.

data engineer

  • Work with project management to provide timely estimates, updates, and status.
  • Collect, handle, and document data for insights.
  • Coordinate with the support team and support management in data services.
  • Resolve clients' technical issues.

data engineer

  • Built a recommendation engine for video content types such as linear TV and VOD in Scala using Spark MLlib, applying collaborative filtering (including matrix factorization), content-based filtering, and RNNs, which initially improved click-through rate (CTR) by 0.5%; services include data ingestion (to HBase through Kafka), model training, and serving (through Redis). See the PySpark sketch after this list.
  • Built a churn prediction system that estimates each user's churn probability for upcoming months from video usage patterns, reducing churn by roughly 3%.
  • Instrumental in developing an end-to-end video analytics system for YuppTV and OTT tenants: collected raw data from multiple clients, processed it, and loaded it into the appropriate tables using Kafka, Elasticsearch, and Redshift (ELK architecture).
  • Served various analytics requests through intuitive dashboards for different types of users.
  • Gave tenants the flexibility to build their own dynamic dashboards (similar to a BI tool).
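
  A Python (PySpark) analogue of the Scala MLlib work above: ALS matrix
  factorization for collaborative filtering on toy (user, item, rating) triples;
  all parameters are illustrative:

      from pyspark.sql import SparkSession
      from pyspark.ml.recommendation import ALS

      spark = SparkSession.builder.appName("als-demo").getOrCreate()
      ratings = spark.createDataFrame(
          [(0, 10, 4.0), (0, 11, 1.0), (1, 10, 5.0), (1, 12, 3.0)],
          ["userId", "itemId", "rating"],
      )

      als = ALS(userCol="userId", itemCol="itemId", ratingCol="rating",
                rank=8, maxIter=5, coldStartStrategy="drop")
      model = als.fit(ratings)
      model.recommendForAllUsers(2).show(truncate=False)  # top-2 items per user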

data engineer

  • AWS Certified Developer
  • As part of the data engineering team, worked on multiple projects and designed and proposed architectures for a couple of clients.
  • Hands-on experience with many AWS and GCP services and with multiple Apache and other open-source projects.
  • Was responsible for code development, review and deployment.

data engineer

  • Currently setting up a framework to enable customer segmentation, so the marketing team can make smart, segment-based decisions.
  • Responsible for the architecture and development of a data pipeline using Apache Kafka that consumes a huge volume of clickstream messages (see the consumer sketch after this list). Contributed to setting up the data warehouse and developed efficient applications for migrating application data into it. Set up monitoring of key metrics for deployed systems to guarantee high availability. Developed denormalized datasets for analysis using Apache Spark.
  • Designed and developed the architecture for the BookMyShow PWA. Optimized build times and load times for the mobile web app. Evaluated new technologies and applied them to solve performance bottlenecks. Implemented website features end to end, from developing React components to integrating them with APIs; developed APIs in Node.js; used Docker for automated deployment.
  • Contributed to developing the BookMyShow desktop website: built end-to-end features, from gathering requirements with product owners to building the UI in collaboration with designers, and integrated server-side APIs in core PHP.
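
  The minimal shape of such a clickstream consumer, assuming the kafka-python
  client; the broker, topic, and group id are placeholders:

      import json
      from kafka import KafkaConsumer  # kafka-python

      consumer = KafkaConsumer(
          "clickstream-events",                        # placeholder topic
          bootstrap_servers="broker:9092",             # placeholder broker
          group_id="warehouse-loader",                 # placeholder group
          value_deserializer=lambda v: json.loads(v.decode("utf-8")),
          auto_offset_reset="earliest",
      )

      for msg in consumer:
          event = msg.value
          # here: batch events and stage them for the warehouse load
          print(event.get("page"), event.get("user_id"))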

data engineer

  • Designed the data pipeline.
  • Chose the optimal distributed data store architecture.
  • Designed and converted about 30 BO WebI Reports from BOXI 3.1 to BI 4.0, connecting to Teradata.
  • Prepared report design documentation for reports to be developed and performed UAT/QA of BI reports.