ABOUT GRIDNAUT RECRUITING
Gridnaut Recruiting is a staffing and recruiting firm that connects skilled professionals with contract and full-time opportunities at leading AI laboratories, technology companies, and enterprise clients. We specialize in AI evaluation, engineering, research, and professional domain roles.
Join a leading AI lab's GenAI team to contribute to frontier-model evaluation. The position focuses on designing benchmark tasks for coding and agentic workflows, with responsibility for creating challenges that expose reasoning gaps in advanced language models.
This is a W-2 employment position with Cincinnatus LLC.
Location: Remote (US-based)
Compensation: $70-$95/hour
Employment Type: Part-time
Hours Required: Minimum 30 hours weekly on weekdays (6+ hours daily)
Core Responsibilities:
Task Design and Development: Create challenging, real-world domain-specific problems targeting capability failures in frontier AI models.
Specification & Solution Development: Integrate the problems into an agentic development environment, preparing all necessary components in Python, including detailed instructions and working solutions.
Performance Evaluation: Assess model performance across tasks and identify logical reasoning failures.
Analysis: Review the agent's step-by-step trajectory to identify recurring patterns of capability failure.
Required Qualifications:
Current or retired STEM professor (ML, coding, data science fields)
Degree in computer science, data science, or related STEM discipline
Reliable weekday availability (30+ hours weekly)
Ability to work independently and manage time effectively
Strong communication and problem-solving abilities
Preferred Experience:
AI training background
Model evaluation expertise
Data annotation experience
DURATION
Ongoing contract, with potential for extension based on performance. Approximately 20 hours per week.
COMPENSATION
$82 per hour (hourly contract).
LOCATION
Fully remote. US-based applicants only.