<p><strong>Duties include:</strong></p><p>• Model development support: Assist the team in developing a large foundation model for surgical video understanding. Use the available annotation tools to label key actions, instrument usage, and other events in surgical videos; based on the annotated data, participate in algorithm optimization and model training for the surgical behavior analysis and surgical phase recognition modules, recording key parameters and experimental results throughout training.</p><p>• Data processing and construction: Visit partner hospitals and other real clinical settings to collect surgical video data according to predefined standards, and perform preprocessing such as data format conversion and denoising; participate in building a surgical video dataset and organize the videos' metadata (e.g., surgery type, chief surgeon, surgery duration); assist in collecting vision-language pre-training data, completing the matching and proofreading of video clips against their corresponding text descriptions.</p><p>• Technical exploration and experimentation: Survey state-of-the-art multimodal pre-training techniques and write a technical research report; under team guidance, run small-scale experiments on surgical video data using existing open-source multimodal pre-trained models; explore ways to improve the model's learning of surgical context, such as adjusting hyperparameters and modifying the network structure.</p><p>• Clinical application assistance: In clinical settings, assist doctors in using the developed surgical video analysis model and collect their feedback during use; organize the model's analysis results into visual reports that provide intuitive reference data for clinical decision-making.</p><p>• Results organization and output: Organize and process experimental data and charts to academic standards and write technical documentation; assist the team in preparing patent application materials and identifying innovative points in the research results; participate in academic paper writing, drafting assigned chapters and collecting relevant literature.</p><p></p><p><strong>Requirements:</strong></p><p>• Academic background in computer vision, natural language processing, or a related field, with an interdisciplinary knowledge base.</p><p>• Experience developing with large multimodal models (such as LLaVA, BLIP, or Qwen-VL) and an understanding of model pre-training and fine-tuning workflows; familiarity with pre-training approaches such as MAE, DINO, self-supervised learning, and vision-language alignment, with the ability to understand the underlying algorithms and their role in model training.</p><p>• Proficiency in Python, with the ability to build and train simple models using deep learning frameworks such as PyTorch or TensorFlow; familiarity with RAG or agent development frameworks (such as LangChain or LlamaIndex), with relevant project experience.</p><p>• Experience with video processing and analysis projects, and familiarity with common video data processing methods.</p><p>• Understanding of in-context learning, prompt engineering, and prompt tuning, with relevant hands-on experience.</p><p>• Applicants with academic paper writing experience or prior participation in research projects are preferred; a degree of demonstrated research ability is expected.</p>