Evaluation Scenario Writer - AI Agent Testing Specialist
Please submit your CV in English and indicate your level of English proficiency.
Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, evaluating, and improving AI systems. Participation is project-based, not permanent employment.
What this opportunity involves
You’ll create challenging coding test cases that push AI coding systems to their limits:
- Review and refine realistic coding tasks based on provided production codebases with realistic scope, requirements and information sources
- Write comprehensive functional tests that validate actual end-to-end behavior and edge cases, not just superficial checks
- Craft “fair but hard” challenges where the AI has all the context it needs, but has to work for it (information scattered across files and external sources, complex reasoning required)
- Analyze AI failures to understand what the model struggles with vs. what it masters
- Iterate based on feedback from expert QA reviewers who score your work on 7 quality criteria
What we look for
This opportunity is a good fit for experienced developers, software engineers, and/or test automation specialists open to part-time, non-permanent projects. Ideally, contributors will have:
- Degree in Computer Science, Software Engineering or related fields
- 5+ years in software development, primarily Python (pytest, async/await, subprocess, file operations)
- Background in full-stack development, with an equal focus on building React-based interfaces and robust back-end systems
- Experience writing tests (functional, integration – not just running them)
- Experience with Docker (running evaluations locally in containers)
- CI/CD understanding (GitHub Actions as a user: triggers, labels, reading results)
- English proficiency - B2
How it works
Apply → Pass qualification(s) → Join a project → Complete tasks → Get paid
Effort estimate
Tasks for this project are estimated to take about 20 hours to complete, depending on complexity. This is an estimate, not a schedule requirement; you choose when and how to work. Tasks must be submitted by the deadline and meet the listed acceptance criteria to be accepted.
Payment
- Paid contributions, with rates up to $50/hour*
- Fixed project rate or individual rates, depending on the project
- Some projects include incentive payments
*Note: Rates vary based on expertise, skills assessment, location, project needs, and other factors. Higher rates may be offered to highly specialized experts. Lower rates may apply during onboarding or non-core project phases. Payment details are shared per project.