fix(route/nhentai): fix detail route image src extraction and bypass Cloudflare with Puppeteer #22140

Open
FlanChanXwO wants to merge 8 commits into DIYgod:master from FlanChanXwO:route/nhentai

Conversation

@FlanChanXwO

Involved Issue

Close # None

Example for the Proposed Route(s)

/nhentai/search/language:chinese+blue+archive/detail
/nhentai/index/parody/blue archive/detail

New RSS Route Checklist

  • New Route
  • Anti-bot or rate limit
    • If yes, does your code account for it?
  • Date and time
    • Parsed
    • Correct time zone
  • New package added
  • Puppeteer

Note

This PR focuses on fixing the detail mode for existing /nhentai routes.

  • Fixed image parsing in detail mode: Updated getDetail() to correctly extract gallery images by supporting both data-src and src attributes, and improved high-quality image URL transformations.
  • Enabled Puppeteer: Set requirePuppeteer: true for nhentai routes to bypass Cloudflare anti-bot protection.
  • Refactored login flow: Replaced got-based login with Puppeteer to handle Cloudflare challenges.
  • Extended cookie cache: Increased login cookie cache duration from 3 days to 30 days.
  • Added maintainer: Added FlanChanXwO to the maintainer list.
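
The 30-day cookie cache in the list above can be pictured as a time-to-live lookup that falls back to a fresh login once the entry has expired. The sketch below is a minimal in-memory illustration, not RSSHub's cache module and not the code in this PR; `TtlCache`, `getCookie`, the cache key format, and the injected `login` and `now` functions are hypothetical names introduced for the example.

```typescript
// Hypothetical in-memory TTL cache; RSSHub's real cache layer is not shown here.
const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000;

class TtlCache<T> {
    private entries = new Map<string, { value: T; expiresAt: number }>();
    private now: () => number;

    // `now` is injectable so that expiry can be exercised without waiting.
    constructor(now: () => number = Date.now) {
        this.now = now;
    }

    // Return the stored value, or undefined if it is missing or expired.
    get(key: string): T | undefined {
        const entry = this.entries.get(key);
        if (!entry) {
            return undefined;
        }
        if (entry.expiresAt <= this.now()) {
            this.entries.delete(key);
            return undefined;
        }
        return entry.value;
    }

    set(key: string, value: T, ttlMs: number): void {
        this.entries.set(key, { value, expiresAt: this.now() + ttlMs });
    }
}

// Return the cached cookie, or call `login` and cache its result for 30 days.
async function getCookie(cache: TtlCache<string>, username: string, login: () => Promise<string>): Promise<string> {
    const key = `nhentai:cookie:${username}`; // assumed key format, for illustration only
    const cached = cache.get(key);
    if (cached !== undefined) {
        return cached;
    }
    const cookie = await login();
    cache.set(key, cookie, THIRTY_DAYS_MS);
    return cookie;
}
```

A longer TTL means fewer browser-driven logins, at the cost of holding a cookie that the site may have invalidated earlier, so a refresh-on-failure path is still useful alongside it.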

Copilot AI review requested due to automatic review settings May 31, 2026 14:23
@github-actions github-actions Bot added the route label May 31, 2026
@FlanChanXwO FlanChanXwO changed the title from "Route/nhentai" to "fix(route/nhentai): fix detail route image src extraction and bypass Cloudflare with Puppeteer" May 31, 2026
Contributor

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR updates the nhentai routes to rely on a browser automation flow for login/torrent fetching (likely to handle Cloudflare), and updates route metadata accordingly.

Changes:

  • Replace cookie acquisition and torrent download logic with a Puppeteer-driven approach (plus cookie refresh on expiry).
  • Make nhentai routes declare Puppeteer as required and add a new maintainer.
  • Improve image URL extraction robustness and adjust date parsing defaulting.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

File                          Description
lib/routes/nhentai/util.tsx   Switches cookie + torrent retrieval to browser automation, refines image extraction/date parsing
lib/routes/nhentai/search.ts  Declares Puppeteer requirement and updates maintainers
lib/routes/nhentai/index.ts   Declares Puppeteer requirement and updates maintainers

Comment on lines +27 to 36
const { page, destroy } = await getPuppeteerPage(loginUrl, {
    onBeforeLoad: async (page) => {
        const allowedTypes = new Set(['document', 'script', 'xhr', 'fetch', 'stylesheet']);
        await page.setRequestInterception(true);
        page.on('request', (request) => {
            allowedTypes.has(request.resourceType()) ? request.continue() : request.abort();
        });
    },
    followRedirect: false,
    gotoConfig: { waitUntil: 'domcontentloaded' },
});
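
The handler in the snippet above decides per request whether to continue or abort, based on `request.resourceType()`. A minimal sketch of the same allow-list decision as a standalone function follows. `InterceptableRequest` is a hypothetical structural type that mirrors only the three request methods used in the snippet, and `filterByResourceType` is an illustrative name, not part of this PR.

```typescript
// Hypothetical minimal shape: only the methods the handler needs.
interface InterceptableRequest {
    resourceType(): string;
    continue(): Promise<void>;
    abort(): Promise<void>;
}

// Same resource types as the allow-list in the reviewed snippet.
const ALLOWED_TYPES: ReadonlySet<string> = new Set(['document', 'script', 'xhr', 'fetch', 'stylesheet']);

// Continue allow-listed requests and abort everything else (images, fonts, media, ...).
// The promise is returned so callers can await it or attach error handling.
function filterByResourceType(request: InterceptableRequest, allowed: ReadonlySet<string> = ALLOWED_TYPES): Promise<void> {
    return allowed.has(request.resourceType()) ? request.continue() : request.abort();
}
```

Pulling the decision out of the event callback makes it testable with a fake request object, and returning the promise avoids the bare ternary expression statement that linters tend to flag.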
Comment on lines +291 to +297
    .map((ele) => {
        const img = $(ele);
        const src = img.attr('data-src') || img.attr('src');
        return src ? new URL(src, baseUrl).href : null;
    })
    .filter((src) => src !== null)
    .map((src) => src.replace(/(.+)(\d+)t\.(.+)/, (_, p1, p2, p3) => `${p1}${p2}.${p3}`))
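
The chain above prefers `data-src` over `src`, resolves the value against a base URL, and strips the `t` suffix that marks a thumbnail file name. A minimal sketch of that pipeline on plain attribute objects (no cheerio) follows. `ImgAttrs`, `toFullSizeUrl`, and `extractImageUrls` are hypothetical names, the `example.com` hosts are placeholders, and the end-anchored regex is a tighter variant chosen for the sketch rather than the pattern used in the PR.

```typescript
// Hypothetical stand-in for the attributes read from each <img> element.
type ImgAttrs = { 'data-src'?: string; src?: string };

// ".../galleries/1/2t.jpg" -> ".../galleries/1/2.jpg"; other URLs pass through unchanged.
function toFullSizeUrl(thumbUrl: string): string {
    return thumbUrl.replace(/(\d+)t\.(\w+)$/, '$1.$2');
}

function extractImageUrls(imgs: ImgAttrs[], baseUrl: string): string[] {
    return imgs
        // Lazy-loaded images keep the real URL in data-src; fall back to src.
        .map((img) => img['data-src'] || img.src)
        // Drop elements that carry neither attribute.
        .filter((src): src is string => Boolean(src))
        // Resolve protocol-relative and path-relative URLs against the page URL.
        .map((src) => new URL(src, baseUrl).href)
        .map(toFullSizeUrl);
}
```

Using a type-guard in `filter` lets TypeScript narrow the array to `string[]`, so the later `replace` call does not need a non-null assertion.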
      ...simple,
      title: $('div#info > h2').text() || $('div#info > h1').text(),
-     pubDate: parseDate($('time').attr('datetime')),
+     pubDate: parseDate($('time').attr('datetime') || ''),
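
The `|| ''` fallback above keeps the argument a string, but an empty string still carries no date; with the built-in `Date` constructor, for instance, `new Date('')` is an invalid date. A minimal sketch of an alternative that omits the date when the attribute is missing or unparseable follows. It uses the built-in `Date` as a stand-in for RSSHub's `parseDate`, whose behavior is not shown in this thread, and `parseOptionalDate` is a hypothetical name.

```typescript
// Stand-in for a real parser: returns undefined instead of an invalid Date.
function parseOptionalDate(raw: string | undefined): Date | undefined {
    if (!raw) {
        // Attribute missing or empty: there is nothing to parse.
        return undefined;
    }
    const date = new Date(raw);
    // An invalid Date reports NaN as its timestamp.
    return Number.isNaN(date.getTime()) ? undefined : date;
}
```

Returning `undefined` lets the caller leave `pubDate` out of the item rather than emit a value that renders as `Invalid Date`.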
Comment on lines +39 to +53
await new Promise((resolve) => setTimeout(resolve, 5000));

let currentUrl = page.url();
let title = await page.title();

let attempts = 0;
// eslint-disable-next-line no-await-in-loop
while ((title.includes('Just a moment') || currentUrl.includes('challenges.cloudflare')) && attempts < 10) {
    // eslint-disable-next-line no-await-in-loop
    await new Promise((resolve) => setTimeout(resolve, 3000));
    currentUrl = page.url();
    // eslint-disable-next-line no-await-in-loop
    title = await page.title();
    attempts++;
}
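
The loop above polls the page title and URL until the Cloudflare interstitial disappears or ten waits have elapsed, but it does not report which of the two happened. A minimal sketch of the same bounded polling as a helper that returns the outcome follows. `PageLike` is a hypothetical structural type covering only the two page methods the loop reads, and `isChallenge`, `sleep`, and `waitForChallenge` are illustrative names, not part of this PR.

```typescript
// Hypothetical minimal shape of the page object: only what the loop reads.
interface PageLike {
    url(): string;
    title(): Promise<string>;
}

// True while the page still looks like a Cloudflare interstitial
// (same two markers as the reviewed loop).
function isChallenge(title: string, url: string): boolean {
    return title.includes('Just a moment') || url.includes('challenges.cloudflare');
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll until the challenge clears or `maxAttempts` waits have elapsed.
// Resolves to true if the page left the challenge state, false otherwise.
async function waitForChallenge(page: PageLike, intervalMs = 3000, maxAttempts = 10): Promise<boolean> {
    for (let attempt = 0; attempt < maxAttempts; attempt++) {
        if (!isChallenge(await page.title(), page.url())) {
            return true;
        }
        await sleep(intervalMs);
    }
    // One last check after the final wait, matching the original loop's semantics.
    return !isChallenge(await page.title(), page.url());
}
```

With a boolean result the caller can throw an explicit error when the challenge never clears, instead of continuing on to read cookies from a challenge page.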
Comment on lines +14 to 16
requirePuppeteer: true,
antiCrawler: true,
supportBT: true,
Comment on lines +18 to 20
requirePuppeteer: true,
antiCrawler: true,
supportBT: true,
@github-actions github-actions Bot added the "auto: not ready to review" label (Users can't get the RSS feed output according to automated testing results) May 31, 2026
@github-actions
Contributor

Successfully generated as follows:

http://localhost:1200/nhentai/search/language:chinese+blue+archive/detail - Failed ❌
HTTPError: Response code 503 (Service Unavailable)

Error Message: Error: this route is empty, please check the original site or create an issue (https://github.com/DIYgod/RSSHub/issues/new/choose)
Route: /nhentai/search/:keyword/:mode?
Full Route: /nhentai/search/language:chinese+blue+archive/detail
Node Version: v24.16.0
Git Hash: f16b1db4
http://localhost:1200/nhentai/index/parody/blue archive/detail - Success ✔️
<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
  <channel>
    <title>nhentai - parody - blue archive</title>
    <link>https://nhentai.net/parody/blue-archive/</link>
    <atom:link href="http://localhost:1200/nhentai/index/parody/blue%20archive/detail" rel="self" type="application/rss+xml"></atom:link>
    <description>hentai - Powered by RSSHub</description>
    <generator>RSSHub</generator>
    <webMaster>contact@rsshub.app (RSSHub)</webMaster>
    <language>en</language>
    <lastBuildDate>Sun, 31 May 2026 14:28:29 GMT</lastBuildDate>
    <ttl>5</ttl>
    <item>
      <title></title>
      <description>&lt;h1&gt;0 pages&lt;/h1&gt;&lt;br&gt;</description>
      <link>https://nhentai.net/g/653594/</link>
      <guid isPermaLink="false">https://nhentai.net/g/653594/</guid>
      <pubDate>Invalid Date</pubDate>
    </item>
    <item>
      <title></title>
      <description>&lt;h1&gt;0 pages&lt;/h1&gt;&lt;br&gt;</description>
      <link>https://nhentai.net/g/653510/</link>
      <guid isPermaLink="false">https://nhentai.net/g/653510/</guid>
      <pubDate>Invalid Date</pubDate>
    </item>
    <item>
      <title></title>
      <description>&lt;h1&gt;0 pages&lt;/h1&gt;&lt;br&gt;</description>
      <link>https://nhentai.net/g/653493/</link>
      <guid isPermaLink="false">https://nhentai.net/g/653493/</guid>
      <pubDate>Invalid Date</pubDate>
    </item>
    <item>
      <title></title>
      <description>&lt;h1&gt;0 pages&lt;/h1&gt;&lt;br&gt;</description>
      <link>https://nhentai.net/g/653482/</link>
      <guid isPermaLink="false">https://nhentai.net/g/653482/</guid>
      <pubDate>Invalid Date</pubDate>
    </item>
    <item>
      <title></title>
      <description>&lt;h1&gt;0 pages&lt;/h1&gt;&lt;br&gt;</description>
      <link>https://nhentai.net/g/653455/</link>
      <guid isPermaLink="false">https://nhentai.net/g/653455/</guid>
      <pubDate>Invalid Date</pubDate>
    </item>
  </channel>
</rss>

@github-actions
Contributor

Auto Review

No clear rule violations found in the current diff.

Labels

  • auto: not ready to review (Users can't get the RSS feed output according to automated testing results)
  • route

Projects

None yet

2 participants