2. Use large models to recognize and mine index pages, supporting link discovery and time-sensitive crawling
3. Use large models to detect dead links and identify pornographic, gambling, reactionary, and low-quality content across trillion-scale web pages
4. Design and train a link quality model to score the quality of trillions of links
5. Own URL normalization, deduplication, and unification for trillions of links, improving crawling efficiency and saving storage resources
6. Use large models to improve the end-to-end efficiency and quality of search data, and to empower AI search
Position requirements
1. Familiar with programming languages such as C/C++ and Python, with good programming skills
2. Some experience with data mining and machine learning algorithms
3. Experience with big data processing tools such as Hadoop, Spark, and Hive
Experience applying large models, including prompt engineering, and an interest in algorithms
Experience in search, recommendation, or advertising