You will help build the data foundation for the next generation of large-scale speech models. This role is not only about processing data; it lays the groundwork for artificial intelligence's ability to "hear" and understand language. We look forward to your enthusiasm for speech technology and natural language processing as you push the boundaries of technology with us.
Your key responsibilities:
Build high-quality speech datasets: clean, annotate, and structure large volumes of speech and text data
Optimize data pipelines: continuously improve the data processing workflow to increase model training efficiency
Collaborate across teams: work closely with algorithm engineers and product managers to maximize the value of our data
Explore new technologies: take part in preliminary research and data design for cutting-edge speech technologies
What we are looking for:
Solid Python programming skills and familiarity with common data processing libraries (e.g., pandas, NumPy)
Proficiency in basic Linux operations and the ability to work efficiently in a server environment
Excellent problem-solving skills and a strong sensitivity to data
Experience processing ASR/TTS data is preferred
Languages
English
Cantonese
Mandarin
Skills
Python (Programming Language)
Linux
Pandas
Numpy
HR Michelle
Hong Kong Generative AI Development Center Co., Ltd. · HR