Duties include:
1. Contribute in depth to the design and development of evaluation systems for LLMs (large language models), VLMs (vision-language models), and audio-visual multimodal models, with a focus on large-scale evaluation of model performance, robustness, and fairness;
2. Lead the development of evaluation metrics for multiple business scenarios, supporting capability assessment and comparison of general-purpose and domain-specific (vertical) models;
3. Design the overall evaluation plan, including building the evaluation framework, designing metrics, managing datasets, and optimizing the distributed evaluation process;
4. Develop and maintain a highly scalable large-model evaluation platform with visualization support, enabling automated multi-task, multimodal, and multi-turn evaluation processes;
5. Participate closely in analyzing and interpreting evaluation data, providing data support and a basis for decisions on model optimization and R&D iteration.
Position requirements:
1. Bachelor's degree or above in Computer Science, Artificial Intelligence, Data Science, or a related field;
2. In-depth understanding of the principles and evaluation methods of current mainstream large models, and familiarity with various evaluation types (zero-shot, multi-turn dialogue, alignment, security, etc.);
3. Familiarity with mainstream evaluation benchmarks and platforms such as BigBench, MMLU, LM Harness, and OpenCompass; hands-on experience using or developing them is preferred;
4. Thorough understanding of the data collection, cleaning, annotation, and analysis processes, and the ability to propose optimizations based on evaluation results;
5. Excellent logical thinking and data insight, strong communication and collaboration skills, self-motivation, and the ability to work efficiently in a cross-functional team environment.