Duties include:
• Model development support: Assist the team in developing a large foundation model for surgical video understanding, including hands-on use of annotation tools to label key actions, instrument usage, and other events in surgical videos; based on the annotated data, participate in algorithm optimization and model training for the surgical behavior analysis and surgical phase recognition modules, and record key parameters and experimental results during training (see the logging sketch after this list).
• Data processing and construction: Travel to partner hospitals and other real clinical settings to collect surgical video data according to predefined standards, and perform preprocessing such as format conversion and denoising; participate in building a surgical video dataset and organize the metadata of surgical videos (e.g., surgery type, chief surgeon, surgery duration); assist in collecting visual-language pre-training data, including matching and proofreading video clips against their corresponding text descriptions (see the data-organization sketch after this list).
• Technical exploration and experimentation: Survey state-of-the-art multimodal pre-training techniques and write a technical research report; under team guidance, run small-scale experiments that apply existing open-source multimodal pre-trained models to surgical video data; explore ways to improve the model's grasp of surgical-context knowledge by adjusting hyperparameters, modifying the network structure, and similar measures (see the experiment sketch after this list).
• Clinical application assistance: In clinical settings, assist doctors in using the surgical video analysis model under development and collect their feedback during use; organize the model's analysis results and generate visual reports that provide intuitive data to support clinical decision-making.
• Results organization and output: Organize and present experimental data and charts according to academic standards and write technical documentation; assist the team in preparing patent application materials and identifying innovative points in the research results; participate in academic paper writing, drafting assigned chapters and collecting literature.
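To make the training-record duty above concrete, below is a minimal sketch of logging key hyperparameters and per-epoch results during training. The feature dimension, phase count, and random tensors are illustrative placeholders, not the team's actual data or model.

```python
"""Minimal sketch: record key hyperparameters and results during training.
Dummy tensors stand in for annotated surgical video features; all names
and numbers here are illustrative assumptions."""
import csv
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical setup: 512-d clip features, 7 surgical phases.
hparams = {"lr": 1e-3, "batch_size": 32, "epochs": 5, "hidden": 256}

features = torch.randn(1024, 512)          # placeholder for real annotated data
labels = torch.randint(0, 7, (1024,))
loader = DataLoader(TensorDataset(features, labels),
                    batch_size=hparams["batch_size"], shuffle=True)

model = nn.Sequential(nn.Linear(512, hparams["hidden"]), nn.ReLU(),
                      nn.Linear(hparams["hidden"], 7))
optimizer = torch.optim.Adam(model.parameters(), lr=hparams["lr"])
criterion = nn.CrossEntropyLoss()

with open("experiment_log.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["epoch", "loss", "accuracy", *hparams.keys()])
    for epoch in range(hparams["epochs"]):
        total_loss, correct = 0.0, 0
        for x, y in loader:
            optimizer.zero_grad()
            logits = model(x)
            loss = criterion(logits, y)
            loss.backward()
            optimizer.step()
            total_loss += loss.item() * len(y)
            correct += (logits.argmax(1) == y).sum().item()
        # Record the key parameters and results for this epoch.
        writer.writerow([epoch,
                         round(total_loss / len(features), 4),
                         round(correct / len(features), 4),
                         *hparams.values()])
```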
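The data-organization duty could look roughly like the sketch below, which records surgical video metadata (surgery type, chief surgeon, duration) and writes clip-caption pairs to a JSONL file for visual-language pre-training. All file paths, field names, and captions are hypothetical.

```python
"""Illustrative sketch: organize surgical video metadata and pair clips
with text descriptions for visual-language pre-training."""
import json
from dataclasses import dataclass, asdict

@dataclass
class SurgicalVideoRecord:
    video_id: str
    surgery_type: str        # e.g. laparoscopic cholecystectomy
    chief_surgeon: str
    duration_minutes: int
    clip_path: str
    caption: str             # text description matched to the clip

records = [
    SurgicalVideoRecord(
        video_id="vid_0001",
        surgery_type="laparoscopic cholecystectomy",
        chief_surgeon="anonymized",
        duration_minutes=47,
        clip_path="clips/vid_0001_phase2.mp4",
        caption="The surgeon dissects the cystic duct with a grasper and hook.",
    ),
]

# One JSON line per clip-caption pair, a common layout for
# vision-language pre-training corpora.
with open("surgical_vl_pairs.jsonl", "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(asdict(rec), ensure_ascii=False) + "\n")
```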
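As one example of the small-scale experiments mentioned above, the sketch below loads an open-source multimodal model (BLIP image captioning from Hugging Face) and captions a single frame. The gray placeholder image stands in for a real surgical frame, and the specific model choice is an assumption for illustration only.

```python
"""Minimal sketch: run an open-source multimodal model on one frame."""
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base")

# In practice this would be a frame extracted from a surgical video;
# a gray placeholder keeps the sketch self-contained.
frame = Image.new("RGB", (384, 384), color=(128, 128, 128))

inputs = processor(images=frame, return_tensors="pt")
caption_ids = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(caption_ids[0], skip_special_tokens=True))

# Fine-tuning on surgical clip-caption pairs would reuse the same processor
# and model with a standard PyTorch training loop, adjusting hyperparameters
# such as the learning rate and the set of trainable layers.
```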
Requirements:
• Academic background in computer vision, natural language processing, or a related field, with an interdisciplinary knowledge base.
• Hands-on experience developing with multimodal large models (such as LLaVA, BLIP, Qwen-VL) and an understanding of pre-training and fine-tuning workflows; familiar with pre-training approaches such as MAE, DINO, self-supervised learning, and visual-language alignment, and able to explain the underlying algorithms and their role in model training.
• Proficient in Python and able to use deep learning frameworks such as PyTorch or TensorFlow to complete simple model building and training tasks; familiar with RAG or Agent development stacks (such as LangChain or LlamaIndex), with relevant project experience (see the RAG sketch after this list).
• Project experience in video processing and analysis; familiar with common video data processing methods.
• Understand in-context learning, prompt engineering, and prompt tuning techniques, with relevant hands-on experience.
• Experience writing academic papers or participating in research projects is preferred; demonstrated research ability is a plus.
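As a reference for the RAG requirement above, the sketch below shows the retrieve-then-prompt pattern in plain Python with no framework dependencies; libraries such as LangChain or LlamaIndex wrap the same flow. The notes and the question are made-up examples.

```python
"""Toy sketch of the retrieval-augmented generation (RAG) pattern:
retrieve the most relevant note by simple word overlap, then assemble
it into a prompt that a language model would answer."""

notes = {
    "phases": "Laparoscopic cholecystectomy has preparation, dissection, "
              "clipping and cutting, and gallbladder retraction phases.",
    "instruments": "Common instruments include graspers, hooks, clip "
                   "appliers, and scissors.",
}

def retrieve(query: str) -> str:
    """Return the note with the largest word overlap with the query."""
    q = set(query.lower().split())
    return max(notes.values(), key=lambda t: len(q & set(t.lower().split())))

question = "Which instruments appear during the clipping phase?"
context = retrieve(question)

prompt = (
    "Answer using only the context below.\n"
    f"Context: {context}\n"
    f"Question: {question}\n"
)
print(prompt)  # in a real pipeline this prompt would be sent to an LLM
```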