WP Content Crawler – Get content from almost any site, automatically! | Prosyscom Tech
Get content from almost any site to your WordPress blog, automatically!
WHAT IT CAN BE USED FOR
- Create a personal site which collects news, posts, etc. from your favorite sites to see them in one place
- Use it with WooCommerce to collect products from shopping sites
- Collect products from affiliate programs to make money
- Collect posts to create a test environment for your plugin/theme
- Collect plugins, themes, apps, images from other sites to create a collection of them
- Keep track of competitors
- Anything else you can imagine. The internet is full of content
HOW IT WORKS
It’s all about CSS selectors and you can learn how to use them in minutes by watching the introduction tutorial.
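As a rough illustration of what a CSS selector picks out of a page (this is not the plugin's code), the Python sketch below handles only the simplest selector forms, "tag" and "tag.class", and assumes well-formed markup. The class name `SelectorText`, the function `select_text` and the sample HTML are all hypothetical.

```python
from html.parser import HTMLParser


class SelectorText(HTMLParser):
    """Collect the text of every element matching a 'tag' or 'tag.class' selector."""

    # Elements that never have a closing tag, so they must not change the depth.
    VOID = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self, selector):
        super().__init__()
        self.tag, _, self.cls = selector.partition(".")
        self.depth = 0      # greater than 0 while inside a matching element
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        if self.depth:
            self.depth += 1  # nested element inside a match
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and (not self.cls or self.cls in classes):
            self.depth = 1
            self.results.append("")

    def handle_endtag(self, tag):
        if tag not in self.VOID and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.results[-1] += data


def select_text(html, selector):
    """Return the stripped text of each element that matches the selector."""
    parser = SelectorText(selector)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.results]


page = (
    '<article><h1 class="entry-title">Hello <em>world</em></h1>'
    '<div class="entry-content"><p>First.</p><p>Second.</p></div></article>'
)
print(select_text(page, "h1.entry-title"))  # ['Hello world']
print(select_text(page, "p"))               # ['First.', 'Second.']
```

Real selectors are far richer (descendants, attributes, pseudo-classes), which is why a full selector engine is normally used instead of a hand-written matcher like this one.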
SEE IT IN ACTION, LEARN IN MINUTES
|Save every post detail
Title, date, excerpt, content, tags, meta keywords, meta description, featured image, post images, custom meta… Just everything.
Just click an element to find its CSS selector. You can also get alternative CSS selectors that might interest you. There is no need to leave your admin panel.
Recrawl posts to keep them up to date. You can limit how many times a post can be updated, set an update interval, and ignore old posts.
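The three recrawl limits mentioned above (update count, update interval, ignoring old posts) can be pictured as a single check. This is a minimal sketch under assumed names: `should_recrawl`, the dictionary keys and the parameter names are hypothetical and do not come from the plugin.

```python
from datetime import datetime, timedelta


def should_recrawl(post, now, max_updates, interval, max_age):
    """Decide whether a saved post is due for another recrawl."""
    if post["update_count"] >= max_updates:
        return False  # the update limit has been reached
    if now - post["published_at"] > max_age:
        return False  # the post is too old, ignore it
    # Otherwise recrawl only when the update interval has elapsed.
    return now - post["last_crawled_at"] >= interval


now = datetime(2017, 10, 22, 12, 0)
post = {
    "update_count": 1,
    "published_at": now - timedelta(days=3),
    "last_crawled_at": now - timedelta(hours=7),
}
print(should_recrawl(post, now, max_updates=5,
                     interval=timedelta(hours=6),
                     max_age=timedelta(days=30)))  # True
```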
|Custom post meta
Save anything as custom post meta. You can use a CSS selector or just type the value.
Use the artificial intelligence of Google Cloud Translation API or Microsoft Translator Text API to automatically translate the posts.
|Actions and filters
If you are a developer, you can use actions and filters to extend and modify the plugin.
Prepare post content, title, excerpt, list item and gallery item templates using short codes.
You can write alternative selectors to get the data even if the target site's post pages are designed differently from each other.
|Find and replace anything
You can use plain text or regular expressions to find and replace anything. You can even modify the HTML of the page, create your own HTML elements and write selectors to use them. You can even change image URLs. You have the power.
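To make the difference between plain-text and regular-expression replacement concrete, here is a small Python sketch. The helper `find_replace` and the sample HTML are hypothetical, and the thumbnail-suffix pattern is only an example of the kind of image URL rewrite described above.

```python
import re


def find_replace(text, find, replace, regex=False):
    """Replace every occurrence of `find`, as plain text or as a regular expression."""
    if regex:
        return re.sub(find, replace, text)
    return text.replace(find, replace)


html = '<p>Price: 10 USD</p><img src="https://example.com/a-150x150.jpg">'

# Plain-text replacement.
html = find_replace(html, "USD", "EUR")

# Regex replacement: drop a "-150x150" style size suffix from image URLs,
# keeping the file extension captured by the group.
html = find_replace(html, r"-\d+x\d+(\.\w+)", r"\1", regex=True)

print(html)  # <p>Price: 10 EUR</p><img src="https://example.com/a.jpg">
```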
Target post has more than one page? No problem. You can save paginated posts as well.
|List type posts
Some sites create posts with a list inside. You can extract the list from the post, create a template that should be applied to each list item and even reverse the list.
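A list item template and the reverse option can be sketched in a few lines of Python. The placeholder names `[position]` and `[item]` and the function `render_list` are invented for this illustration; the plugin's actual short codes may differ.

```python
def render_list(items, template, reverse=False):
    """Apply a template to each list item, optionally in reverse order."""
    ordered = list(reversed(items)) if reverse else list(items)
    return "\n".join(
        # Fill in the position first so item text is never re-scanned for it.
        template.replace("[position]", str(position)).replace("[item]", item)
        for position, item in enumerate(ordered, start=1)
    )


countdown = ["Third place", "Second place", "First place"]
print(render_list(countdown, "<h3>[position]. [item]</h3>", reverse=True))
# <h3>1. First place</h3>
# <h3>2. Second place</h3>
# <h3>3. Third place</h3>
```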
|Remove unnecessary elements
Sometimes you need to get rid of some elements, such as advertisements, comments, you name it. Just write its CSS selector and it’s removed.
|Automatically insert category URLs
Target site has hundreds of categories? No problem. Just write the CSS selector and the plugin will insert them for you.
Add unlimited sites and activate as many of them as you want.
You can import and export site settings easily. Just copy and paste the code created by the plugin.
Set post type. It can be a post, a page, a product, or any other type.
You can remove links from the post. Just check the checkbox and the links are gone. That easy.
You can set a password for the posts to show them only to the users who have the password.
You can add notes to remind yourself of things about the site. CSS selectors, a TODO list, anything.
|Test everything on the fly
Test post crawling, URL collection, CSS selectors, regular expressions, find and replace options and proxies on the fly.
Using the tools, you can save posts manually with their URL, recrawl posts with their ID or delete already-saved URLs.
|Custom general settings for each site
You can override the general settings for each site to make them suitable for that site.
You can directly publish the saved posts or keep them as drafts to check them before publishing.
|Save images as gallery
You can save the images on the target page as a gallery and provide a template for each image to suit the gallery library you use on the frontend. You can also save the images as a WooCommerce gallery by checking a single checkbox.
|Duplicate post check
Check duplicate posts by URL, post title and/or post content.
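One way to read "URL, post title and/or post content" is that a post counts as a duplicate when any of the enabled checks matches a saved post. The Python sketch below assumes exactly that; the function names, the dictionary keys and the normalization rules are hypothetical, not the plugin's.

```python
def normalize(field, value):
    """Normalize a field before comparing it (an assumption of this sketch)."""
    if field == "url":
        return value.strip()              # URLs are compared exactly
    return " ".join(value.lower().split())  # ignore case and extra whitespace


def is_duplicate(candidate, saved_posts, checks=("url", "title", "content")):
    """Return True if any enabled check matches any already-saved post."""
    for post in saved_posts:
        for field in checks:
            if normalize(field, candidate[field]) == normalize(field, post[field]):
                return True
    return False


saved = [{"url": "https://example.com/a", "title": "Hello World", "content": "Some text."}]
new_post = {"url": "https://example.com/b", "title": "hello   world", "content": "Other text."}

print(is_duplicate(new_post, saved))                   # True (title matches)
print(is_duplicate(new_post, saved, checks=("url",)))  # False
```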
You can add minutes to or subtract minutes from the post date. This way, you can schedule post publishing.
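The scheduling effect follows from how WordPress treats dates: a post whose date lies in the future gets the "future" status and is published when that time arrives. A minimal Python sketch of the arithmetic, with the hypothetical helpers `shift_post_date` and `status_for`:

```python
from datetime import datetime, timedelta


def shift_post_date(post_date, minutes):
    """Add (positive) or remove (negative) minutes from a post date."""
    return post_date + timedelta(minutes=minutes)


def status_for(post_date, now):
    """A date in the future means the post is scheduled rather than published."""
    return "future" if post_date > now else "publish"


now = datetime(2017, 10, 22, 12, 0)
later = shift_post_date(now, 90)
earlier = shift_post_date(now, -30)

print(later, status_for(later, now))      # 2017-10-22 13:30:00 future
print(earlier, status_for(earlier, now))  # 2017-10-22 11:30:00 publish
```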
|Any data as shortcode
Get anything from target page as a shortcode and use the shortcodes in the plugin’s templates to place any data anywhere you want.
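Conceptually, a template is text in which each short code is swapped for the data collected under that name. The Python sketch below assumes a simple `[name]` syntax; `render_template` and the `[price]` short code are hypothetical. The fallback for an empty template mirrors the v1.6.0 changelog note that an empty main template is treated as if it contains `[wcc-main-content]`.

```python
import re


def render_template(template, values):
    """Replace each [short-code] in the template with its collected value."""
    if not template.strip():
        # Empty template: behave as if it contained only the main content.
        template = "[wcc-main-content]"

    def substitute(match):
        # Unknown short codes are left untouched.
        return values.get(match.group(1), match.group(0))

    return re.sub(r"\[([\w-]+)\]", substitute, template)


values = {"wcc-main-content": "<p>Body</p>", "price": "9.99"}

print(render_template("", values))
# <p>Body</p>
print(render_template("<h2>[price]</h2>[wcc-main-content][unknown]", values))
# <h2>9.99</h2><p>Body</p>[unknown]
```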
Use a proxy or proxies to get content from the sites to which your IP does not have access. Currently supports HTTP and HTTPS proxies.
Attach cookies, such as session cookies, to each request.
|Crawl as many posts as you want
You can set how many times the post-crawling and URL collection CRON events run each time they fire. This way you can, for example, save 100 posts every minute. Just be careful and consider your server’s capacity.
Set CSS selectors whose values should not be empty for category and post pages. When an empty value is found using those selectors, you can get an email notification.
See what’s going on in the background. Active sites, number of posts crawled, number of posts updated, last crawled and updated posts, last added URLs, last and next run of CRON events, currently being saved posts and URLs…
You can check the online documentation whenever you need it.
|Quick guides right next to the settings
Each setting in the plugin has a quick guide that will help you understand what each setting does.
Watch video tutorials to easily learn how to use the plugin.
|Ready to translate
You can translate the plugin into your own language using Poedit.
|Get updates from your admin panel
You can update the plugin with just one click whenever an update is ready. Just go to your updates page in your admin panel.
REQUIREMENTS: PHP >= 5.6, mbstring
TESTED WITH WP VERSIONS: 4.9.4, 4.8.2, 4.7.2, 4.6.1, 4.5.3, 4.4.2, 4.3.3, 4.2.7, 4.1.10, 4.0.10, 3.9.11
LANGUAGES: English, Türkçe, Français (partial)
- Save post’s title, excerpt and content.
- Save meta keywords and description.
- Save images in the posts to your site.
- Save featured image of target post as the featured image of the post created in your site.
- Save the list in posts (extract the list from the post).
- Save paginated posts. If target post is paginated, the post in your site can be paginated as well.
- Save anything from target page as post meta.
- Find and replace anything in the target page’s HTML before saving its content.
- Find and replace using regular expressions.
- Find and replace in image URLs before saving them.
- Manipulate target page’s HTML before saving to make it suitable for your needs.
- Remove unnecessary elements from target page’s HTML.
- Map categories of your site to target site’s categories to keep posts organized.
- Automatically get category URLs for category mapping.
- Add unlimited categories for a site and automatically check for new posts uniformly.
- Set alternative CSS selectors for each setting. This way, you can get content from pages of the same site that are designed differently.
- Automatically save posts.
- Automatically check for new posts.
- Set maximum number of category pages to be checked automatically.
- Add unlimited sites.
- Import/export site settings.
- Set post type. For instance, you can add posts as products if you use WooCommerce.
- Set a template for the post.
- Set a template for each list item.
- Set passwords for created posts.
- Keep notes for each site to keep a changelog or anything you want.
- Test any CSS selector and find-and-replace setting right away.
- Test a site before activating it for automatic crawling.
- Manually create posts by providing the URL of the posts.
- Optionally override the general settings for each site.
- Activate/deactivate each site at any time to start/stop automatic crawling.
- Either directly publish the posts or keep them as drafts to make changes before publishing.
- Online documentation
- Video tutorials for a quick start
- Ready to translate (.po file)
- Get updates from your WordPress admin panel.
- Save images as gallery. If you use WooCommerce, you can save images from target page as product gallery.
- Get anything from target page as a shortcode and use the shortcodes in templates to place any data anywhere you want.
- Save tags using CSS selectors.
- Find and replace in post tags.
- Limit the number of tags that can be added to a post.
- Use a proxy or proxies to get content from the sites to which your IP does not have access.
- Post title and excerpt templates in which you can use custom short codes.
- Find and replace in custom short code data.
- Add custom post meta without a selector.
- Set how many times URL collection and post crawling events should run each time for a site. For instance, you can save 3 posts every minute, or run URL collection 5 times every 2 minutes.
- Collect post URLs in reverse order for each category page.
- Remove links from the content by just checking a checkbox. This will not touch the links manually added to the templates.
- Notifications. Set CSS selectors whose values should not be empty for category and post pages. When an empty value is found using those selectors, you can get an email notification.
- Visual selector. Just click an element to find its CSS selector. You can also get alternative CSS selectors that might interest you. There is no need to leave your admin panel.
- Recrawl posts to keep them updated all the time.
- Attach cookies, such as session cookies, to each request.
- See what’s going on in the background in dashboard.
- Select post dates.
- Add/remove minutes to/from the post date. This way, you can schedule post publishing.
- Scheduled post delete.
- Duplicate post checking via URL, title and/or content.
- Perform advanced HTML manipulations: exchange element attributes, remove element attributes, find and replace in element attributes, manipulate HTML of an element.
- Find and replace in custom short code and custom post meta content
- Save all images in the post content by checking a single checkbox.
- An option to always use UTF-8 encoding.
- Translate posts automatically using either Google Cloud Translation API or Microsoft Translator Text API.
- Modify and extend the plugin by using actions and filters.
v1.7.0 - 22 October 2017
* New: Translate posts automatically by using Google Cloud Translation API or Microsoft Translator Text API.
* New: Randomize proxies. By checking this option, you can make the plugin randomly order the proxies you entered.
* New: Over 50 filters and actions are added. If you are a developer, you can now use these to extend the plugin however you like.
* Fixed: The proxies were used only when there was an error getting the target page's source code. Now, they are always used, even when testing.
* Fixed: Plugin's pages were not shown properly with PHP 7.1.
* UI and UX improvements.

v1.6.0 - 4 March 2017
* New: Date selectors.
* New: Add/remove minutes to/from the post date. This way, you can schedule post publishing.
* New: Scheduled post delete.
* New: Duplicate post checking via URL, title and/or content.
* New: More HTML manipulation options: exchange element attributes, remove element attributes, find and replace in element attributes, manipulate HTML of an element.
* New: Find and replace in custom short code and custom post meta content.
* Improvement: More counts are shown in site listing.
* Improvement: Save all images in the post content by checking a single checkbox.
* Improvement: Reorder settings that can have multiple values.
* Improvement: If the main template is empty, it is treated as if it contains the [wcc-main-content] shortcode.
* Improvement: An option to always use UTF-8 encoding.
* Improvement: Load general settings with a button when you are overwriting them for a site.
* Improvement: Settings are grouped and reordered for better navigation.
* Improvement: Auto refresh the dashboard every few seconds.
* Improvement: Track CRON events and the next sites that will be processed by the CRON events in the dashboard.
* Improvement: Better notifications for the required settings when performing a test.
* Improvement: Auto find for next page URL, post date and post title in DEV tools.
* Improvement: Remove elements using a CSS selector in DEV tools. This can be used to remove blocking elements to better select the elements you want.
* Fix: Sometimes thumbnail images and post URLs did not match when category pages were crawled.
* Fix: When importing site settings, form validation should not be performed.
* Small bug fixes and improvements.

v1.5.1 - 7 February 2017
* New: Dashboard. See what's going on behind the scenes.
* Bug fixes and improvements.

v1.4.1 - 27 January 2017
* Fixed: URLs in the queue should be saved uniformly according to their categories.

v1.4.0 - 26 January 2017
* New: Post recrawling. Recrawl posts to update them regularly.
* New: Proxy tester. Test if your proxies work correctly.
* New: Cookies. Attach cookies to every request that is made to the target site.
* Removes Lodash.
* Small bug fixes and improvements.

v1.3.0 - 14 January 2017
* New: Visual inspector.
* Fixed: Assets are not loaded on Windows servers.
* Fixed: "General settings" link on plugins page does not work.
* Fixed: Plugin does not crawl all active sites when there are more than 10 active sites.

v1.2.0 - 30 August 2016
* New: You can now use a proxy.
* New: Set connection timeout in seconds.
* New: Post title and excerpt templates in which you can use custom short codes.
* New: Find and replace in custom short code data.
* New: Maximum number of categories that can be added automatically via CSS selectors to the category map increased.
* New: Add custom post meta without a selector.
* New: You can set how many times URL collection and post crawling events should run each time for a site. For instance, you can save 3 posts every minute, or run URL collection 5 times every 2 minutes.
* New: You can collect post URLs in reverse order for each category page.
* New: Remove links from all short code data. This will not touch the links manually added to the templates.
* New: Notifications. You can now set CSS selectors whose values should not be empty for category and post pages. When an empty value is found using those selectors, you can get an email notification.
* Fixed: Downloaded file's name does not have a proper file extension if the file on the target site is generated dynamically.
* Fixed: Crawling stops if there is a request exception.
* Fixed: Crawling stops if target page's HTML could not be retrieved.