Data Engineer — AI & Data Infrastructure
We’re looking for a Data Engineer with a strong foundation in data pipelines and a meaningful edge in AI-native data infrastructure — specifically RAG pipelines, vector search, embedding workflows, and semantic retrieval systems.
You’ll work on two interconnected problem sets:
The first is foundational: consolidating eight legacy systems into a unified, reliable data platform — ETL pipelines, a data warehouse, and cross-system client identity resolution.
The second is where the work gets genuinely interesting: transforming three decades of institutional research into an intelligent, searchable, interactive knowledge layer that clients can query in ways that weren’t possible two years ago.
This is a small, senior team. You’ll work directly with the CTO, have real architectural ownership, and build systems that are in production — not in a sandbox.
What You’ll Work On
Data Foundation & Migration
- Lead the data engineering work for our research portal migration — extracting, transforming, and loading data from legacy systems into modern cloud infrastructure
- Build and maintain ETL/ELT pipelines across multiple integration points: CRM, research distribution platforms, trading systems, and third-party APIs
- Design and implement our “Golden Record” initiative — cross-system client identity resolution across eight legacy databases with no unified identifiers (a matching sketch follows this list)
- Implement event-driven data flows using AWS EventBridge as the central routing layer, treating each source system as a swappable adapter
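To make the event-routing idea concrete, here is a minimal sketch of a source adapter publishing a normalized event onto a central EventBridge bus. The bus name, source names, and event shape are illustrative assumptions, not our actual schema:

```python
# Minimal sketch of the adapter pattern: each legacy source system publishes
# normalized change events onto a central EventBridge bus, so downstream
# consumers never couple to source-specific schemas.
# Bus name, source names, and event shape are illustrative assumptions.
import json
import boto3

events = boto3.client("events")

def publish_client_update(source_system: str, client_record: dict) -> None:
    """Publish a normalized 'client updated' event from any source adapter."""
    events.put_events(
        Entries=[{
            "EventBusName": "data-platform-bus",    # assumed custom bus name
            "Source": f"adapter.{source_system}",   # e.g. adapter.crm
            "DetailType": "ClientRecordUpdated",    # assumed event type
            "Detail": json.dumps(client_record),
        }]
    )

# Swapping a source system means swapping the adapter that emits this event;
# the routing rules and downstream consumers stay unchanged.
publish_client_update("crm", {"client_id": "C-1024", "email": "jane@example.com"})
```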
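And for the Golden Record item, a hedged sketch of one common resolution approach: deterministic matching on normalized identifiers first, with fuzzy name matching as a fallback. Field names and the threshold are illustrative, not our actual rules:

```python
# Hedged sketch of pairwise identity resolution across systems with no
# shared identifier. Field names and the fuzzy threshold are illustrative.
from difflib import SequenceMatcher

def normalize_email(email: str | None) -> str | None:
    return email.strip().lower() if email else None

def same_client(a: dict, b: dict, fuzzy_threshold: float = 0.92) -> bool:
    # 1. Deterministic pass: normalized email or account number.
    email_a, email_b = normalize_email(a.get("email")), normalize_email(b.get("email"))
    if email_a and email_a == email_b:
        return True
    if a.get("account_no") and a.get("account_no") == b.get("account_no"):
        return True
    # 2. Fuzzy fallback: similar names across systems that share nothing else.
    name_a, name_b = a.get("full_name", "").lower(), b.get("full_name", "").lower()
    return bool(name_a and name_b) and SequenceMatcher(None, name_a, name_b).ratio() >= fuzzy_threshold

# Records judged to be the same client get merged under a single surrogate
# "golden" ID; survivorship rules decide which field values win.
```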
AI-Native Data Infrastructure (RAG & Search)
- Design and build production-grade RAG (Retrieval-Augmented Generation) pipelines over AGCO’s research archive — ingestion, chunking strategy, embedding generation, vector storage, and retrieval
- Implement hybrid search approaches that combine semantic (vector) search with keyword and metadata filtering, appropriate for structured financial research queries
- Build and maintain embedding pipelines that keep the vector store current as new research is published, with full observability and freshness guarantees
- Evaluate and implement emerging retrieval strategies as the space evolves:
  - Re-ranking with cross-encoders
  - Hypothetical Document Embeddings (HyDE)
  - Query expansion and decomposition
  - Graph-based retrieval (e.g., GraphRAG) for analyst relationship mapping
  - Structured metadata retrieval for faceted financial queries
- Wire retrieval layers into LLM interfaces for research summarization, analyst Q&A, and recommendation-change tracking across the archive
- Enable client queries such as: “Show me all emerging market buy recommendations from analysts with 10+ years of coverage who changed their view in the last 6 months”
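The example query above maps naturally onto hybrid retrieval: structured metadata filters narrow the candidate set, then semantic similarity ranks what remains. A minimal sketch, assuming an in-memory store and illustrative field names; the production version would sit on the real vector store:

```python
# Hybrid retrieval sketch: metadata filters first (faceted financial query),
# then vector similarity ranks the survivors. Field names and the in-memory
# store are illustrative assumptions.
from dataclasses import dataclass, field
from datetime import datetime, timedelta
import numpy as np

@dataclass
class ResearchChunk:
    text: str
    embedding: np.ndarray                       # assumed to be L2-normalized
    metadata: dict = field(default_factory=dict)  # e.g. region, rating, analyst_tenure_years

def hybrid_search(query_emb: np.ndarray, chunks: list[ResearchChunk],
                  filters: dict, top_k: int = 5) -> list[ResearchChunk]:
    # 1. Structured filter pass.
    candidates = [c for c in chunks
                  if all(pred(c.metadata) for pred in filters.values())]
    # 2. Semantic ranking pass (dot product of normalized embeddings = cosine).
    scored = sorted(candidates,
                    key=lambda c: float(np.dot(query_emb, c.embedding)),
                    reverse=True)
    return scored[:top_k]

# The example client query expressed as metadata predicates plus a semantic query.
six_months_ago = datetime.now() - timedelta(days=182)
filters = {
    "region":      lambda m: m.get("region") == "emerging_markets",
    "rating":      lambda m: m.get("rating") == "buy",
    "tenure":      lambda m: m.get("analyst_tenure_years", 0) >= 10,
    "view_change": lambda m: m.get("rating_changed_at", datetime.min) >= six_months_ago,
}
# hybrid_search(embedded_query, chunks, filters) would then rank the filtered set.
```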
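Upstream of retrieval, the ingestion and embedding-pipeline items above come down to chunk, embed, upsert. A hedged sketch; `embed_texts` and `vector_store.upsert` are hypothetical stand-ins for the real embedding model and store:

```python
# Ingestion-side sketch of a RAG pipeline: split a research note into
# overlapping chunks, embed each, and upsert into the vector store.
# embed_texts() and vector_store.upsert() are hypothetical stand-ins.

def chunk_text(text: str, max_chars: int = 1200, overlap: int = 200) -> list[str]:
    """Fixed-size character chunks with overlap, so context isn't cut mid-thought."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        start += max_chars - overlap
    return chunks

def ingest_research_note(note_id: str, text: str, metadata: dict,
                         embed_texts, vector_store) -> None:
    chunks = chunk_text(text)
    vectors = embed_texts(chunks)          # hypothetical embedding call
    vector_store.upsert([                  # hypothetical store API
        {"id": f"{note_id}:{i}", "vector": v, "text": c, "metadata": metadata}
        for i, (c, v) in enumerate(zip(chunks, vectors))
    ])
```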
DevOps & Data Infrastructure
- Apply DataOps practices across all pipelines: version control, CI/CD, environment parity across dev/staging/production, and infrastructure as code
- Monitor pipeline health, embedding freshness, retrieval quality, and LLM call latency — build alerting that catches problems before users do (a freshness-check sketch follows this list)
- Work within AGCO’s AWS environment (App Runner, EventBridge, CDK) and contribute to IaC best practices
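For the freshness monitoring mentioned above, one lightweight pattern is to compute the lag between the newest published research item and the newest embedded one, then emit it as a CloudWatch metric an alarm can watch. A sketch, assuming hypothetical warehouse and vector-store lookups:

```python
# Embedding-freshness check: lag between newest published research and newest
# embedded document, emitted as a CloudWatch metric. The two lookup helpers
# are hypothetical placeholders for real warehouse / vector-store queries.
from datetime import datetime, timezone
import boto3

def latest_published_at() -> datetime:
    # Placeholder: query the warehouse for the newest research publication time.
    return datetime.now(timezone.utc)

def latest_embedded_at() -> datetime:
    # Placeholder: query the vector store's ingestion watermark.
    return datetime.now(timezone.utc)

def report_embedding_freshness() -> None:
    lag_seconds = (latest_published_at() - latest_embedded_at()).total_seconds()
    boto3.client("cloudwatch").put_metric_data(
        Namespace="DataPlatform/RAG",    # assumed metric namespace
        MetricData=[{
            "MetricName": "EmbeddingFreshnessLagSeconds",
            "Value": max(lag_seconds, 0.0),
            "Unit": "Seconds",
        }],
    )
```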
Collaboration & Documentation
- Partner with the CTO, product team, and application developers to translate business requirements into sound data and retrieval architecture decisions
- Document data flows, schema designs, chunking strategies, and retrieval logic so systems are maintainable and not a black box
- Contribute to evaluation frameworks for retrieval quality — precision, recall, answer faithfulness — so we know when the system is actually working
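A minimal sketch of what such an evaluation harness can look like, assuming a small hand-labeled set of query-to-relevant-document pairs and a `retrieve()` callable (both illustrative):

```python
# Retrieval-quality sketch: precision@k and recall@k over a hand-labeled set.
# The labeled-set format and the retrieve() callable are illustrative assumptions.
from typing import Callable

def precision_recall_at_k(
    retrieve: Callable[[str, int], list[str]],   # returns ranked doc ids for a query
    labeled_queries: dict[str, set[str]],        # query -> relevant doc ids
    k: int = 5,
) -> tuple[float, float]:
    if not labeled_queries:
        return 0.0, 0.0
    precisions, recalls = [], []
    for query, relevant in labeled_queries.items():
        retrieved = set(retrieve(query, k))
        hits = len(retrieved & relevant)
        precisions.append(hits / k)
        recalls.append(hits / len(relevant) if relevant else 0.0)
    return sum(precisions) / len(precisions), sum(recalls) / len(recalls)

# Answer faithfulness needs a separate judgment step (human or LLM-as-judge)
# over generated answers, so it isn't shown here.
```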