RLHF Specialist

Odixcity Consulting

Software & Data


Job summary

An RLHF Specialist is responsible for improving and aligning AI models using Reinforcement Learning from Human Feedback (RLHF) methodologies. This role focuses on designing, implementing, and optimizing feedback pipelines that enhance model performance, safety, factual accuracy, and alignment with human values.

Minimum Qualification: Degree
Experience Level: Mid-level
Experience Length: 4 years

Job description & requirements

Responsibilities:

  • Generate high-quality preference data by comparing multiple model responses and ranking them based on criteria such as helpfulness, honesty, and harmlessness (HHH); an illustrative sketch of such a record follows this list.
  • Design complex, multi-turn prompts to stress-test model behavior and expose weaknesses in reasoning or safety.
  • Write detailed “chain-of-thought” explanations and rationales to train reward models on why specific responses are superior.
  • Collaborate with Machine Learning Engineers to analyze model failure modes and identify data gaps that, when filled, will improve reinforcement learning outcomes.
  • Develop and iterate on annotation strategies for preference scoring and reinforcement signals, ensuring consistency across a global team.
  • Proactively probe models to identify vulnerabilities, biases, or hallucination patterns, documenting findings for model optimization.
  • Analyze edge cases where the reward model behaves unexpectedly (e.g., over-indexing on verbosity or style over substance). Provide detailed feedback to ML engineers on reward model failure modes and suggest specific data interventions to correct model behavior.
  • Develop and document templated instruction sets for larger annotation teams. Translate complex reinforcement learning concepts into simple, repeatable tasks for junior reviewers, ensuring high-quality data collection at scale.
  • Monitor model performance over time by maintaining a personal test set of prompts, and regularly re-evaluate new model versions against historical benchmarks to track improvements or regressions in reasoning and alignment.
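For context on the preference-data bullet above, a preference record typically pairs one prompt with two or more candidate responses, a human ranking, and a written rationale. The sketch below is purely illustrative: the field names, example prompt, and responses are assumptions, not a schema prescribed by this role.

```python
# Hypothetical preference record for reward-model training (illustrative only;
# field names and content are assumptions, not a prescribed schema).
preference_record = {
    "prompt": "Explain why the sky appears blue.",
    "responses": {
        "a": "Rayleigh scattering: shorter (blue) wavelengths scatter more strongly ...",
        "b": "Because the sky reflects the colour of the ocean.",
    },
    # Ranking against the helpfulness/honesty/harmlessness (HHH) criteria:
    "ranking": ["a", "b"],  # "a" preferred; "b" repeats a common misconception
    "rationale": "Response A gives the correct physical mechanism; B states a myth.",
}

# A reward model is then trained so that it scores the chosen response above
# the rejected one for the same prompt.
chosen, rejected = preference_record["ranking"]
```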



Requirements:

  • Minimum of 4 years of experience in Data Annotation, Model Evaluation, Computational Linguistics, or Trust and Safety, specifically working with AI/ML training data.
  • Strong proficiency in Python and deep learning frameworks (PyTorch, JAX, or TensorFlow).
  • Deep understanding of Reinforcement Learning concepts (PPO, trust regions, reward hacking) and how they apply to language generation.
  • Hands-on experience fine-tuning open-source models (e.g., Llama 2/3, Mistral, Gemma) using techniques like LoRA/QLoRA; a minimal sketch follows this list.
  • Experience working with annotation tools (Labelbox, Scale AI, Snorkel) and managing human-in-the-loop workflows.
  • Ability to diagnose why an RL policy collapsed and adjust hyperparameters or reward structure accordingly.
  • Experience with Constitutional AI or Self-Alignment techniques.
  • Contributions to open-source alignment libraries such as TRL (Transformer Reinforcement Learning) or Axolotl.
  • Experience with cloud platforms (AWS SageMaker, GCP Vertex AI).
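To illustrate the LoRA/QLoRA requirement noted above, the sketch below attaches low-rank adapters to a causal language model with the Hugging Face peft library. The base model name and hyperparameter values are placeholder assumptions, not values specified by this role.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder base model; any causal LM from the Hugging Face Hub could be used.
base_model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

# Example LoRA configuration: low-rank adapters on the attention projections.
lora_config = LoraConfig(
    r=16,                                # adapter rank (assumed value)
    lora_alpha=32,                       # scaling factor (assumed value)
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()       # only the small adapter matrices are trainable
```

Fine-tuning then proceeds with a standard trainer (for example, TRL's SFTTrainer or DPOTrainer), updating only the adapter weights rather than the full model.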


Remuneration: NGN 500,000 monthly


