Seeking a candidate who will help design, build, and modernize an existing high-profile legacy system in a cloud DevOps environment using available C2S services. As a data engineer, you will be responsible for taking an existing framework and executing against it to transform unstructured data into structured, searchable, and tagged data that is more useful to the Program. The candidate will use C2S services in combination with third-party technologies such as Spark, EMR, DynamoDB, Redshift, Kinesis, Glue, and Snowflake.
- Creation and support of real-time data pipelines built on AWS technologies including Glue, Redshift/Spectrum, Kinesis, EMR, and Athena
- Interface with other technology teams to extract, transform, and load data from a wide variety of data sources using SQL and AWS big data technologies
- Continual research of the latest big data and visualization technologies to provide new capabilities and increase efficiency
- Collaborate with other tech teams to implement advanced analytics algorithms that exploit our rich datasets for statistical analysis, prediction, clustering, and machine learning
- Help continually improve ongoing reporting and analysis processes, automating or simplifying self-service support for customers
- Demonstrated strength in data modeling, ETL development, and data warehousing
- Experience using big data technologies (Hadoop, Hive, HBase, Spark, etc.)
- Knowledge of data management fundamentals and data storage principles
- Experience using business intelligence reporting tools (Tableau, Business Objects, Cognos, etc.)
- Strong analytic skills related to working with unstructured datasets.
- Experience building and optimizing ‘big data’ data pipelines, architectures and data sets.
- Experience performing root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.
- Experience building processes that support data transformation, data structures, metadata, dependency management, and workload management.
- Working knowledge of message queuing, stream processing, and highly scalable ‘big data’ data stores.
- Experience with relational SQL and NoSQL databases, including Postgres.
- Experience with data pipeline and workflow management tools.
- Experience working with AWS data technologies (Redshift, S3, EMR)
- Experience building and operating highly available, distributed systems for the extraction, ingestion, and processing of large data sets
- Experience working with distributed systems as they pertain to data storage and computing
- Knowledge of software engineering best practices across the development lifecycle, including agile methodologies, coding standards, code reviews, source management, build processes, testing, and operations
- TS/SCI with Full Scope Polygraph