Apify Announces the Launch of Crawlee for Python

July 23, 2024 at 07:38 AM EDT

Prague, Czech Republic - July 23, 2024 —

Apify, the world’s leading cloud platform for developing and running web scraping solutions, is excited to announce the launch of Crawlee for Python, a web scraping and browser automation library that helps users build fast and reliable crawlers.

Crawlee was created by a team of experts who scrape for a living and extract data from millions of web pages daily. Building upon the original Crawlee for Node.js, launched in 2022, Crawlee for Python offers an open-source solution that simplifies web crawler development.

“One of the main advantages of Crawlee is that the library has a single interface for both HTTP and headless browsers,” says Jan Čurn, CEO of web scraping and automation platform Apify. “You can write your crawlers using the same base abstraction, and the framework takes care of the heavy lifting such as parallelization, proxy rotation, and scaling.”
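As a rough illustration of the "same base abstraction" idea, the sketch below runs one page handler against two interchangeable fetchers. It is not Crawlee's API: `Page`, `Fetcher`, `FakeHttpFetcher`, `FakeBrowserFetcher`, `crawl`, and `extract_title` are hypothetical names, and both fetchers are stand-ins that make no network requests and launch no browser.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Protocol


@dataclass
class Page:
    """What a handler receives, regardless of how the page was fetched."""
    url: str
    html: str
    fetched_with: str


class Fetcher(Protocol):
    """Anything that can turn a URL into a Page."""
    async def fetch(self, url: str) -> Page: ...


class FakeHttpFetcher:
    """Stand-in for a plain HTTP client (hypothetical; no network access)."""
    async def fetch(self, url: str) -> Page:
        return Page(url, f"<title>{url}</title>", "http")


class FakeBrowserFetcher:
    """Stand-in for a headless browser (hypothetical; no browser is launched)."""
    async def fetch(self, url: str) -> Page:
        await asyncio.sleep(0)  # pretend to wait for the page to render
        return Page(url, f"<title>{url}</title>", "browser")


Handler = Callable[[Page], Awaitable[dict]]


async def crawl(urls: list[str], fetcher: Fetcher, handler: Handler) -> list[dict]:
    """Run the same handler over every URL using whichever fetcher is supplied."""
    results = []
    for url in urls:
        page = await fetcher.fetch(url)
        results.append(await handler(page))
    return results


async def extract_title(page: Page) -> dict:
    """The 'business logic': identical for both fetchers."""
    title = page.html.removeprefix("<title>").removesuffix("</title>")
    return {"url": page.url, "title": title, "via": page.fetched_with}


def run_demo() -> tuple[list[dict], list[dict]]:
    urls = ["https://example.com/a", "https://example.com/b"]
    http = asyncio.run(crawl(urls, FakeHttpFetcher(), extract_title))
    browser = asyncio.run(crawl(urls, FakeBrowserFetcher(), extract_title))
    return http, browser
```

In this sketch, switching from the HTTP stand-in to the browser stand-in changes only the fetcher passed to `crawl`; the handler stays untouched.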

Crawlee for Python is developed and maintained by Apify. With clients including Siemens, Intercom, Microsoft, Groupon, and Accenture, Apify is known in the industry for its web scraping platform and for its marketplace, where developers can monetize their software. Its open-source web scraping library, Crawlee, is designed to help developers build and maintain their crawlers faster.

“Developers of scrapers shouldn’t need to reinvent the wheel and can just focus on building the ‘business’ logic of their scrapers,” Čurn adds.

Some of the key features of the Crawlee for Python launch include:

Unified interface for HTTP and headless browser crawling.
HTTP: HTTPX with Beautiful Soup.
Headless browser: Users can switch their crawlers from HTTP to a headless browser in 3 lines of code. Crawlee builds on top of Playwright, so it works with Chrome, Firefox, and other popular browsers, and adds its own features.
Automatic parallel crawling based on available system resources.
Written in Python with type hints to offer better DX (IDE autocompletion) and fewer bugs (static type checking).
Automatic retries on errors or when you’re getting blocked.
Integrated proxy rotation and session management.
Configurable request routing – direct URLs to appropriate handlers.
Persistent queue for URLs to crawl.
Pluggable storage of both tabular data and files.
Crawlee is built on asyncio, so it’s fully asynchronous.
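To make a few of these features concrete, here is a minimal standard-library sketch of label-based request routing, automatic retries, and bounded parallel crawling on asyncio. It does not use or reproduce Crawlee's API: `MiniCrawler`, `route`, `fake_fetch`, and the `LIST` and `DETAIL` labels are hypothetical, the fetch function is simulated, and the fixed worker count stands in for the resource-based autoscaling described above.

```python
import asyncio
from typing import Awaitable, Callable, Optional

Request = tuple[str, str]  # (url, label)
Handler = Callable[[str, str], list[Request]]  # (url, body) -> follow-up requests


class MiniCrawler:
    """Toy crawler: label-based routing, retries, bounded concurrency."""

    def __init__(self, fetch: Callable[[str], Awaitable[str]],
                 max_concurrency: int = 2, max_retries: int = 2) -> None:
        self.fetch = fetch
        self.max_concurrency = max_concurrency
        self.max_retries = max_retries
        self.handlers: dict[str, Handler] = {}
        self.failed: list[str] = []

    def route(self, label: str) -> Callable[[Handler], Handler]:
        """Decorator that registers a handler for requests with this label."""
        def register(handler: Handler) -> Handler:
            self.handlers[label] = handler
            return handler
        return register

    async def _fetch_with_retries(self, url: str) -> Optional[str]:
        # One initial attempt plus up to max_retries further attempts.
        for _ in range(self.max_retries + 1):
            try:
                return await self.fetch(url)
            except ConnectionError:
                await asyncio.sleep(0)  # a real crawler would back off here
        self.failed.append(url)
        return None

    async def _worker(self, queue: asyncio.Queue, seen: set[str]) -> None:
        while True:
            url, label = await queue.get()
            try:
                body = await self._fetch_with_retries(url)
                if body is not None:
                    # Dispatch to the handler registered for this label.
                    for new_url, new_label in self.handlers[label](url, body):
                        if new_url not in seen:
                            seen.add(new_url)
                            queue.put_nowait((new_url, new_label))
            finally:
                queue.task_done()

    async def run(self, start: list[Request]) -> None:
        queue: asyncio.Queue = asyncio.Queue()
        seen = {url for url, _ in start}
        for request in start:
            queue.put_nowait(request)
        workers = [asyncio.create_task(self._worker(queue, seen))
                   for _ in range(self.max_concurrency)]
        await queue.join()  # wait until every queued request is processed
        for worker in workers:
            worker.cancel()
        await asyncio.gather(*workers, return_exceptions=True)


def run_demo() -> tuple[list[dict], list[str]]:
    attempts: dict[str, int] = {}
    items: list[dict] = []

    async def fake_fetch(url: str) -> str:
        # Simulated network: one URL fails once, another fails every time.
        attempts[url] = attempts.get(url, 0) + 1
        if url.endswith("/flaky") and attempts[url] == 1:
            raise ConnectionError("simulated block")
        if url.endswith("/dead"):
            raise ConnectionError("always fails")
        return f"body of {url}"

    crawler = MiniCrawler(fake_fetch)

    @crawler.route("LIST")
    def handle_list(url: str, body: str) -> list[Request]:
        return [(f"{url}/flaky", "DETAIL"), (f"{url}/dead", "DETAIL")]

    @crawler.route("DETAIL")
    def handle_detail(url: str, body: str) -> list[Request]:
        items.append({"url": url, "body": body})
        return []

    asyncio.run(crawler.run([("https://example.com", "LIST")]))
    return items, crawler.failed
```

Calling `run_demo()` crawls the simulated start page, recovers the flaky URL on its second attempt, and records the always-failing URL in `failed` once its retries are exhausted. The queue here is in memory only; a persistent queue would need to store pending requests outside the process.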

Fully open source and backed by an active Discord community of over 8,000 web scraping developers, Crawlee for Python prioritizes high-quality, readable, and maintainable code and reliable crawlers.

Apify encourages anyone interested in learning more about its Crawlee for Python announcement to try out the new web scraping and automation library today on the Crawlee website, where they can also join the Discord community.

About Apify

Founded in 2015, Apify has become renowned as the most flexible full-stack platform for web scraping and browser automation. With a commitment to making the web more programmable and automating mundane, repetitive tasks, Apify is where developers build, deploy, and publish web scraping, data extraction, and web automation tools.

More Information

To learn more about Apify and the launch of Crawlee for Python, please visit https://apify.com.

Source: https://thenewsfront.com/apify-announces-the-launch-of-crawlee-for-python/

About the company: Apify is the platform where developers build, deploy, and publish web scraping, data extraction, and web automation tools.

Contact Info:
Organization: Apify
Address: Lucerna Palace Vodickova 704/36 Prague 110 00 Czechia
Website: https://apify.com

Release ID: 89136411
