ML Intern
Parallel Role, Deccan AI
Jan 2026 - Present
- Worked on trajectory-based workflow evaluation that combines rule-based checks with an LLM-as-a-judge.
- Developed sanity and contamination checkers for Terminal-Bench tasks to improve evaluation reliability.
- Worked on an evaluation benchmark for VLMs on spatial counterfactual occlusion rearrangement.