Web Scraping Specialist

MLabs

Job Overview

Location

Remote

Salary

USD 75,000 - 150,000 yearly

Employment Type

Full-time

Work Arrangement

Remote

Sector

Information Technology & Software

Experience Level

Senior (5-8 years)

Application Deadline

April 29, 2026

About the Company

MLabs is a specialized consultancy focused on Haskell, Rust, Blockchain, and AI technologies, with deep technical expertise in each. It also offers recruitment services within these niche sectors.

MLabs is committed to offering equal opportunities to all candidates, ensuring no discrimination and providing accessible job advertisements. Their goal is to foster a diverse and inclusive workplace.

MLabs Ltd collects and processes personal information for recruitment purposes only, managing data securely and in compliance with data protection laws. Data may be shared with clients and trusted partners for recruitment needs.

Job Description

We are seeking a skilled Web Scraping Specialist to join a dedicated technical team focused on building the infrastructure essential for training advanced AI models. This role is pivotal in developing systems that deliver vast quantities of web data.

Your responsibilities will include writing, testing, and refining high-performance code to extract data from diverse online sources, ensuring maximum reliability and efficiency. You will manage complex data retrieval tasks, including handling pagination and dynamic content, and ensure the quality of extracted data through rigorous cleaning and formatting.

The ideal candidate will possess advanced skills in Python or JavaScript, with expertise in libraries such as BeautifulSoup, Scrapy, or Selenium. A strong understanding of asynchronous programming, multithreading, and distributed scraping architectures is crucial. You should also have in-depth knowledge of HTML, CSS, JavaScript, and the DOM, along with experience in NoSQL databases like MongoDB.

This is a remote position requiring a 6-hour overlap with EST working hours. We offer competitive compensation ranging from USD 75,000 to USD 150,000 per year, along with a comprehensive benefits and equity package.

To apply for this role, click the Apply button on this page and follow the instructions.

Required Skills

  • Python
  • JavaScript
  • Web Scraping
  • BeautifulSoup
  • Scrapy
  • Selenium
  • Asynchronous Programming
  • Multithreading
  • Distributed Systems
  • HTML
  • CSS
  • DOM
  • NoSQL
  • MongoDB
  • Cloud Infrastructure
  • AWS
  • Google Cloud
  • Azure

Key Responsibilities

  • Write, test, and refine high-performance code to extract data from various online sources.
  • Manage complex data retrieval tasks, including handling pagination and dynamic content.
  • Clean and format extracted data to ensure it meets rigorous quality standards.
  • Store and manage scraped data in appropriate databases, optimizing for access speed and data integrity.
  • Monitor scraping processes and infrastructure to identify and resolve issues.

Qualifications

  • Demonstrated ability to extract data from complex websites with minimal supervision, supported by a portfolio of past projects.
  • Advanced skills in Python or JavaScript, specifically with libraries and frameworks such as BeautifulSoup, Scrapy, or Selenium.
  • Strong knowledge of asynchronous programming, multithreading, and distributed scraping architectures.
  • In-depth knowledge of HTML, CSS, JavaScript, and the Document Object Model (DOM).
  • Experience with NoSQL databases (e.g., MongoDB, Cassandra), including the ability to design efficient storage solutions.
  • Experience deploying and managing large-scale scraping jobs using cloud services such as AWS, Google Cloud, or Azure.
  • Ability to apply machine learning algorithms for data cleaning, categorization, or predictive analysis is preferred.
  • Active participation in relevant open-source projects is a plus.

Benefits & Perks

  • Competitive Compensation: A salary ranging from USD 75,000 to USD 150,000 per year.
  • Comprehensive benefits and equity package.
  • Impactful Work: Opportunity to work at the forefront of AI development and web-scale knowledge graph creation.
  • High-Output Culture: A professional environment that prioritizes low ego, technical autonomy, and rapid execution.
  • Remote Flexibility: This is a remote position requiring a 6-hour overlap with the core team's schedule (EST).

How to Apply

Please submit your application through the provided link.

Demand for high-quality web-scraped data is growing rapidly as it fuels the development of advanced AI models. This role is central to building the infrastructure that delivers large datasets for AI training. You will use your expertise in Python or JavaScript to extract, clean, and manage data from complex online sources. Your work will directly expand access to public web data at scale, supporting AI research and development. This is an opportunity to join a lean, technical team focused on rapid execution and innovation.

Posted Date

April 14, 2026