ALPHA10X is seeking a Senior Data Engineer to design and implement Databricks data pipelines for our knowledge graph generation workflow.
You will lead efforts to optimize and troubleshoot our data processing systems and uphold high data quality standards across our Lakehouse.
As a technical leader, you will mentor junior data engineers and focus on graph data models and business data analysis.
Your role will also encompass refining our existing data infrastructure to ensure scalability and efficiency in the Azure cloud environment, including the Lakehouse and the Graph, Search, and Vector databases.
Your expertise will be key in advancing our data strategy and infrastructure capabilities.
Main responsibilities include, but are not limited to:
Design and implement advanced data pipelines, focusing on efficiency, scalability, and innovative solutions to complex problems.
Optimize and troubleshoot data processing systems, particularly for building and enhancing the knowledge graph, with a proactive approach to problem-solving.
Contribute to data modeling and architectural decisions, ensuring suitability and optimization for various data sources and types.
Develop robust data processing solutions using Databricks and other relevant technologies, leveraging deep expertise in data engineering principles.
Build efficient and scalable data infrastructures; lead ETL development and integration efforts, emphasizing data reliability and integrity.
Mentor junior data engineers, promoting a collaborative, learning-oriented environment; collaborate closely with cross-functional teams to understand and define data requirements, ensuring alignment with business objectives.
Lead data engineering projects with a keen understanding of business needs, ensuring project objectives align with organizational goals.
Optimize and maintain existing data infrastructure to meet evolving business needs.
Provide technical leadership throughout data engineering project lifecycles, from conception to implementation.
Apply expertise in data analysis techniques and quality testing, enabling a comprehensive understanding of data quality issues and the implementation of effective solutions.
Qualifications:
Master's degree in Data Engineering, Computer Science, Data Science, Software Engineering, Information Systems, or a related field.
At least 4 years of experience in progressively responsible data engineering roles.
Knowledge & skills:
Must have:
Expertise in Python and at least one additional programming language (e.g., SQL) for sophisticated data manipulation and analysis.
Proficiency in graph database technologies, specifically Neo4j.
Comprehensive knowledge of big data technologies, such as Apache Spark and Hadoop, with hands-on experience in large-scale data processing.
Demonstrable passion for data engineering, evidenced by a portfolio of successful projects and innovative solutions.
Strong English communication skills (verbal and written), capable of explaining complex technical concepts with clarity.
Proven ability to work autonomously in a fast-paced environment, leading with a proactive problem-solving approach.
Detail-oriented approach to data, prioritizing accuracy, reliability, and integrity across all stages of the data lifecycle.
Thorough understanding of data modeling techniques and diverse data architectures.
Experience with Agile methodologies.
Proven experience in advanced data analysis, with a talent for extracting strategic insights from complex datasets.
Nice to have:
Experience in managing various databases, including Elastic, CosmosDB, and MongoDB.
Experience in leading and deploying data solutions on cloud platforms such as AWS, Azure, and GCP.
Engagement with emerging technologies, fostering adaptability and innovative thinking in the data engineering domain.
Experience with the Databricks ecosystem, including workflows, Delta Lake, the feature store, and other advanced functionalities.
Familiarity with vector databases and experience integrating them into data workflows.