Open Role

ATTAIN - Research Scientist

The average family buys three homes in their lifetime. Each one is the largest financial decision they will ever make. Help us make sure our AI gets it right.

ABOUT ATTAIN

We’re building the future of homeownership. Atty, our AI agent, helps people buy their first home: preparing their finances, identifying homes they can afford, completing pre-approval, applying to down payment assistance programs, coordinating tours, and assisting through closing. Because most of the process is automated, we share the commissions paid by the lender and seller directly with the buyer: up to $10,000 back in their pocket. The residential real estate transaction is stuck in the 1990s. Professional gatekeepers (mortgage brokers, real estate agents) are focused on closing transactions, not helping people. Atty changes that. It doesn’t answer questions. It takes actions. And those actions directly impact whether a family buys a home or doesn’t. We are backed by Mila and operating at the intersection of high-stakes financial services, agentic AI, and consumer product. This is early. The problem is real. The research is hard. And the impact, measured in families who own homes and wouldn’t have otherwise, is concrete.

THE PROBLEM YOU WILL SOLVE

Atty doesn’t just chat. It assesses financial readiness, recommends mortgage strategies, fills out loan applications, selects lenders, and coordinates the entire homebuying journey. A wrong recommendation doesn’t generate a bad review. It can cost a family $10,000, delay their dream by years, or lock them into a mortgage they can’t sustain. We currently validate quality through live user interactions and manual expert review. It works. It doesn’t scale. As we expand across diverse markets, user profiles, and regulatory environments, we face a core challenge that has no published solution: How do we rigorously measure and improve the quality of our AI agent’s decisions at scale, before those decisions reach real families? There is no benchmark for AI agent quality in residential mortgage and real estate. You would be building the first one. The methods you develop will generalize across regulated, high-stakes financial services, a gap the research community has not yet closed.

WHAT YOU'LL OWN: This is not a literature review engagement. You will design, build, and validate research artifacts that go directly into production:

Synthetic user profile generator — thousands of realistic homebuyer scenarios spanning income levels, credit histories, debt structures, employment types, geographic markets, family situations, and adversarial edge cases. Statistically representative. Privacy-preserving. Scalable.
Simulation harness — a system that feeds synthetic profiles through Atty and captures every recommendation, action, and output across multi-turn, multi-step interactions with sufficient realism that agent outputs are meaningful.
Benchmarking framework — clearly defined quality metrics scored against expert human baselines: accuracy of recommendations, optimality of lender selection, document completeness, regulatory compliance, and communication quality. You define how these dimensions are weighted.
Gap analysis methodology — a rigorous framework for identifying where Atty matches or exceeds human performance and where systematic weaknesses exist across user segments, markets, and regulatory environments.
Repeatable evaluation pipeline — an automated system that integrates into our development workflow and runs continuously as the agent improves, new markets are added, and new use cases are introduced.

THE HARD QUESTIONS YOU'LL ANSWER

How do you generate synthetic homebuyer profiles that are statistically representative of the real population — including edge cases and adversarial scenarios — without leaking private data?
How do you simulate multi-turn financial conversations with sufficient realism that agent outputs are meaningful? How do you model document submission, user hesitation, and conversational dynamics?
What metrics actually capture ‘quality’ in a high-stakes financial advisory context? How do you weight accuracy against compliance against communication quality?
How do you efficiently encode expert human decision-making as ground truth — combining expert annotation, historical case data, and structured interviews?
How does agent quality vary across 50 US states, different regulatory regimes, and local market conditions? How do you ensure the simulator covers this variation without overfitting to any one market?
How do you translate evaluation results into actionable improvement signals for the AI agent, and prioritize the highest-impact quality gaps?

WHAT WE ARE LOOKING FOR: you have deep experience in one or more of:

Large language model evaluation and benchmarking: you’ve designed evaluation frameworks, not just run them.
Synthetic data generation and simulation: you understand what makes synthetic data statistically valid and where it breaks down.
Multi-agent orchestration and agent evaluation: you’ve thought seriously about how to measure agent quality beyond accuracy.
Applied ML in financial services or regulated domains: you understand what compliance, fairness, and regulatory variation actually mean in practice.
Human-AI comparison studies: you’ve designed expert annotation frameworks and know how to build reliable ground truth from human judgment.

You care about rigor. You ship. You want your research to make a measurable difference in the world, not sit in a PDF. Published research is a strong plus. Mila, CIFAR, NeurIPS, ICLR network is a strong plus.

WHAT THIS IS NOT

This isn’t a role where you review existing benchmarks and write a report. You will design the benchmarks from scratch.
This isn’t a comfortable engagement. The problem is unsolved, the domain is regulated, and the stakes are real families’ financial outcomes.
This isn’t research for research’s sake. Your outputs will be integrated directly into a production AI system serving real users.
This isn’t a long-horizon academic project. We move fast. Findings translate into product improvements on a timeline measured in weeks.

COMPENSATION

Competitive research stipend or salary depending on engagement structure.
Access to Attain’s production data — real Atty interactions, real mortgage outcomes, real user profiles. Not a toy dataset.
Full Mila research infrastructure, collaboration with world-class AI researchers, and a clear path to publication.
Direct impact on a system actively helping families buy homes. The gap between your findings and real-world impact is measured in weeks, not years.

Apply

Send a short note explaining why this venture, and link your LinkedIn or résumé.