TrafficGuard is looking for an experienced Senior Data Engineer (Google Cloud) whose idea of heaven is an infinite stream of event data.
TrafficGuard is a SaaS solution that helps businesses verify their advertising traffic and prevent invalid traffic from impacting their advertising campaigns. This role will primarily focus on developing, building, testing, and maintaining data architectures on GCP to support our data scientists and business stakeholders.
The ideal candidate will bring GCP data architecture expertise and enthusiasm to drive data reliability, efficiency, and quality.
Working in our Perth office or remotely, you will be part of a growing, innovative, award-winning company transforming the online advertising ecosystem.
- Work with implementation teams from concept to operations, providing deep technical subject-matter expertise for successfully deploying large-scale enterprise data solutions using modern cloud data and analytics technologies.
- Migrate our current customer-facing, real-time aggregation reporting from third-party vendors to BigQuery for time-series analysis of multiple KPIs, addressing a range of dimensionality and cardinality challenges. This system is business-critical, with tight SLAs and query-performance expectations across any time period.
- Integrate massive datasets from multiple data sources for data modelling
- Implement methods for the automation of all parts of the predictive pipeline to minimise labour in development and production
- Ensure data quality across the pipeline and build process, designing mature software with tools and frameworks such as Cloud Composer, dbt (Data Build Tool), and Great Expectations, applying DevOps principles.
- Formulate business problems as technical data problems while ensuring key business drivers are captured in collaboration with product management
- Apply knowledge of machine learning algorithms, especially for fraud systems, anomaly detection, and clustering.
- Query datasets, visualise query results, and create reports
- Design responsible, interpretable AI that is privacy-aware and detects data and selection bias. Reduce time to production for AI using systems like BigQuery ML, Google AI Pipelines, and AutoML
- Apply familiarity with real-time streaming ML, using concepts like feature stores and Dataflow/Apache Beam.
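To give a flavour of the data-quality work described above, here is a minimal Python sketch of the kind of checks that tools like Great Expectations and dbt tests formalise. All column names, sample records, and thresholds are illustrative, not taken from TrafficGuard's actual schema:

```python
# Illustrative data-quality checks of the kind a pipeline step might run
# before loading events; real projects would express these as Great
# Expectations suites or dbt tests rather than hand-rolled functions.

def check_not_null(rows, column):
    """Every row must have a non-null value for `column`."""
    return all(row.get(column) is not None for row in rows)

def check_in_range(rows, column, lo, hi):
    """Every value in `column` must fall within [lo, hi]."""
    return all(lo <= row[column] <= hi for row in rows)

# Hypothetical sample of ad-event records.
events = [
    {"campaign_id": "c1", "clicks": 42},
    {"campaign_id": "c2", "clicks": 7},
]

assert check_not_null(events, "campaign_id")
assert check_in_range(events, "clicks", 0, 10_000)
```

In a production setup these checks would run as a task in a Cloud Composer (Airflow) DAG, failing the pipeline before bad data reaches downstream consumers.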
To the role, you will bring:
- Minimum 2 years of designing and building production data pipelines from ingestion to consumption within a hybrid big data architecture, using Java, Python, Scala, etc.
- Sound knowledge across these products: Spark, Cloud Dataproc, Cloud Dataflow, Apache Beam, Bigtable, BigQuery, Cloud Pub/Sub, Cloud Functions, Cloud Run, Airflow, and Cloud Composer
- Minimum 3 years of designing, building, and operationalising large-scale enterprise data solutions and applications using one or more GCP data and analytics services, combined with third-party tools or another cloud vendor (AWS, Azure, etc.)
- 2+ years of experience building time-series aggregation for business reporting using products like Druid, Pinot, BigQuery, OpenTSDB, and Bigtable
- Ability to diagnose the dimensionality and cardinality of data and design a reporting system capable of handling millions of time series, achieving fast access through rollups and innovative data types like HLL sketches.
- Minimum 1 year of experience in performing detailed assessments of current state data platforms and creating an appropriate transition path to GCP
- 1 year of hands-on experience designing and implementing data ingestion solutions on GCP using GCP-native services or third-party tools such as Talend and Informatica
- 1 year of hands-on experience architecting and designing data lakes on GCP that serve analytics and BI application integrations
- Architecting and implementing data governance and security for data platforms on GCP
- Designing operations architecture and conducting performance engineering for large scale data lakes in a production environment
- 2+ years of experience writing complex SQL queries and stored procedures, combining advanced SQL with Python to build and monitor data pipelines, detect data quality issues, and detect concept and data drift.
- 2+ years of experience in data visualisation with platforms like Tableau, Data Studio, or Python data-viz packages; help develop visualisation prototypes that explain concepts to business users and can be integrated into our customer-facing SaaS portal
- Good knowledge of the Google Cloud SDK, IAM, and role and security best practices
Some of the additional benefits include:
- Attractive career growth and salary packaging
- Training and development opportunities
- Flexible work hours
- Vibrant team culture
- Collaboration opportunities with tech partners like Google, AWS, Imply