Job Responsibilities:
- Apply proficiency in Python, along with knowledge of web requests, data security, HTML, and JavaScript.
- Scrape large-scale data from the internet, primarily for training and optimizing foundation models.
- Perform large-scale cleaning of external data to ensure it is suitable for large-model training, delivering high-quality, cleaned data to the algorithm team.
- Build and maintain automated data update pipelines to ensure continuous data updates and validity.
- Work closely with the algorithm team to understand its requirements and ensure data is delivered and applied effectively.
- Draw on data warehousing experience to handle internally generated data while flexibly scraping and processing data from external sources.
Job Requirements:
- Bachelor’s degree in Computer Science, Electrical Engineering, or a related field.
- 2 to 4 years of hands-on experience in data crawling and data cleaning.
- Passionate about problem-solving, especially in data engineering and statistics.
- Self-driven with strong communication skills.