Robot | Path | Permission |
GoogleBot | / | ✔ |
BingBot | / | ✔ |
BaiduSpider | / | ✔ |
YandexBot | / | ✔ |
Title | WebMagic |
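Description | WebMagic: a simple and flexible Java crawler framework. WebMagic is a simple and flexible Java crawler framework. With WebMagic, you can quickly develop an efficient, easy-to-maintain crawler. Features: a simple API that is quick to pick up; a modular structure that is easy to extend; multithreading and … |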
Keywords | N/A |
WebSite | webmagic.io |
Host IP | 185.199.109.153 |
Location | - |
Last updated: 2023-05-09 07:18:59
webmagic.io has a Semrush global rank of 15,063,497 and an estimated worth of US$702,646, based on its estimated ad revenue. It receives approximately 81,075 unique visitors each day. The location of its web server is not listed; its IP address is 185.199.109.153. According to SiteAdvisor, webmagic.io is safe to visit. |
Purchase/Sale Value | US$702,646 |
Daily Ads Revenue | US$649 |
Monthly Ads Revenue | US$19,458 |
Yearly Ads Revenue | US$233,495 |
Daily Unique Visitors | 5,405 |
Note: All traffic and earnings values are estimates. |
Host | Type | TTL | Data |
webmagic.io. | A | 599 | IP: webmagic-io.github.io. |
webmagic-io.github.io. | A | 3600 | IP: 185.199.109.153 |
webmagic-io.github.io. | A | 3600 | IP: 185.199.110.153 |
webmagic-io.github.io. | A | 3600 | IP: 185.199.111.153 |
webmagic-io.github.io. | A | 3600 | IP: 185.199.108.153 |
webmagic.io. | AAAA | 599 | IPV6: webmagic-io.github.io. |
webmagic-io.github.io. | AAAA | 3599 | IPV6: 2606:50c0:8001::153 |
webmagic-io.github.io. | AAAA | 3599 | IPV6: 2606:50c0:8002::153 |
webmagic-io.github.io. | AAAA | 3599 | IPV6: 2606:50c0:8003::153 |
webmagic-io.github.io. | AAAA | 3599 | IPV6: 2606:50c0:8000::153 |
webmagic.io. | NS | 599 | NS Record: webmagic-io.github.io. |
webmagic.io. | MX | 599 | MX Record: webmagic-io.github.io. |
webmagic.io. | TXT | 599 | TXT Record: webmagic-io.github.io. |
WebMagic: a simple and flexible Java crawler framework.

WebMagic is a simple and flexible Java crawler framework. With WebMagic, you can quickly develop an efficient, easy-to-maintain crawler.

Features: a simple API that is quick to pick up; a modular structure that is easy to extend; multithreading and distributed support.

An example:

    public class GithubRepoPageProcessor implements PageProcessor {

        private Site site = Site.me().setRetryTimes(3).setSleepTime(1000).setTimeOut(10000);

        @Override
        public void process(Page page) {
            page.addTargetRequests(page.getHtml().links().regex("(https://github\\.com/[\\w\\-]+/[\\w\\-]+)").all());
            page.addTargetRequests(page.getHtml().links().regex("(https://github\\.com/[\\w\\-])").all());
            page.putField("author", page.getUrl().regex("https://github\\.com/(\\w+)/.*").toString());
            page.putField("name", page.getHtml().xpath("//h1[@class='entry-title public']/strong/a/text()").toString());
            if (page.getResultItems().get("name") == null) {
                // skip this page
                page.setSkip(true);
            }
            page.putField("readme", page.getHtml().xpath(
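The `author` field in the sample above is filled by applying a regular expression to the page URL. The sketch below reproduces only that step with the standard `java.util.regex` classes, so it runs without WebMagic on the classpath. It assumes that the framework's regex selector returns the first capture group, which is what the sample implies; the class name, the method name, and the URLs are hypothetical and chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone demo; not part of WebMagic.
class GithubUrlRegexDemo {

    // Same pattern string that the sample passes to page.getUrl().regex(...).
    static final Pattern AUTHOR = Pattern.compile("https://github\\.com/(\\w+)/.*");

    // Returns capture group 1 (the account segment), or null when the URL does not match.
    static String extractAuthor(String url) {
        Matcher m = AUTHOR.matcher(url);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Illustrative URLs only.
        System.out.println(extractAuthor("https://github.com/octocat/hello-world"));
        System.out.println(extractAuthor("https://github.com/octocat"));
    }
}
```

Because `\w` does not match a hyphen, an account name such as `my-org` does not match this pattern at all, whereas the link patterns in the sample use `[\w\-]+` and would accept it.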
HTTP/1.1 200 OK
Server: GitHub.com
Content-Type: text/html; charset=utf-8
Last-Modified: Tue, 17 Oct 2017 06:11:17 GMT
Access-Control-Allow-Origin: *
ETag: "59e59f05-2430"
expires: Sun, 31 Oct 2021 11:46:17 GMT
Cache-Control: max-age=600
x-proxy-cache: MISS
X-GitHub-Request-Id: A5E0:2BA4:740A77:FEEED3:617E7FB1
Content-Length: 9264
Accept-Ranges: bytes
Date: Sun, 31 Oct 2021 11:36:17 GMT
Via: 1.1 varnish
Age: 0
Connection: keep-alive
X-Served-By: cache-stl4839-STL
X-Cache: MISS
X-Cache-Hits: 0
X-Timer: S1635680178.528754,VS0,VE36
Vary: Accept-Encoding
X-Fastly-Request-ID: 9a22b16c8dcfdfa8b82944d46dde94df4ea3f5e8
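The response headers above carry both `Cache-Control: max-age=600` and an `expires` value. As a minimal sketch of how the two relate, the code below adds the `max-age` seconds to the `Date` header using `java.time`. The class and method names are hypothetical, and the sketch ignores the `Age` header, which is 0 in this response.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for illustration; not taken from the page being described.
class CacheExpiryDemo {

    // Parses an RFC 1123 Date header value and returns the instant at which
    // a response with the given max-age (in seconds) stops being fresh.
    static Instant freshUntil(String dateHeader, long maxAgeSeconds) {
        ZonedDateTime served = ZonedDateTime.parse(dateHeader, DateTimeFormatter.RFC_1123_DATE_TIME);
        return served.toInstant().plusSeconds(maxAgeSeconds);
    }

    public static void main(String[] args) {
        // Values copied from the Date and Cache-Control headers above.
        Instant until = freshUntil("Sun, 31 Oct 2021 11:36:17 GMT", 600);
        System.out.println(DateTimeFormatter.RFC_1123_DATE_TIME.format(until.atOffset(ZoneOffset.UTC)));
    }
}
```

With the values from this response, the computed instant is 11:46:17 GMT on 31 Oct 2021, which agrees with the `expires` header.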
Domain Name: WEBMAGIC.IO
Registry Domain ID: D503300000040371240-LRMS
Registrar WHOIS Server: whois.gandi.net
Registrar URL: https://www.gandi.net/whois
Updated Date: 2021-03-21T03:58:16Z
Creation Date: 2014-04-10T07:52:14Z
Registry Expiry Date: 2023-04-10T07:52:14Z
Registrar: Gandi SAS
Registrar IANA ID: 81
Registrar Abuse Contact Email: abuse@support.gandi.net
Registrar Abuse Contact Phone: +33.170377661
Domain Status: ok https://icann.org/epp#ok
Registrant Country: CN
Name Server: F1G1NS2.DNSPOD.NET
Name Server: F1G1NS1.DNSPOD.NET
DNSSEC: unsigned
>>> Last update of WHOIS database: 2021-09-18T21:29:10Z <<<