Multimodal Specialist
Job summary
We are seeking a technically skilled Multimodal Specialist (Vision / Audio / Video) to evaluate and validate annotations across image, video, and audio datasets. This role plays a key part in ensuring the accuracy and consistency of multimodal training data and model outputs.
Job description & requirements
Responsibilities:
- Review and validate image, video, and audio annotations
- Assess bounding boxes, segmentation masks, and object labelling accuracy
- Perform image segmentation QA and detect spatial inconsistencies
- Validate video events, temporal sequences, and frame-level annotations
- Conduct audio transcription QA and verify timestamp accuracy
- Score multimodal model outputs for correctness and quality
- Identify labelling inconsistencies, noise, and structural errors
- Provide structured feedback to improve annotation standards
Requirements:
- Bachelor’s degree in Computer Science, Information Technology, or a related field, or equivalent professional experience
- 4+ years of experience working with vision, audio, or video datasets
- Familiarity with annotation tools (e.g., labelling platforms for bounding boxes, segmentation, transcription)
- Strong spatial and temporal reasoning skills
- Close attention to detail and consistency in evaluation
- Ability to analyse large-scale multimodal datasets