[joneslanglasalle] Fetched content is incomplete and article links are wrong #17800

Open
1 task done
KwToPA opened this issue Dec 4, 2024 · 6 comments
Labels
RSS bug Something isn't working

Comments

@KwToPA

KwToPA commented Dec 4, 2024

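Route

/joneslanglasalle/:language?/:category{.+}?

Full route

/joneslanglasalle/zh/trends-and-insights

Related documentation

https://docs.rsshub.app/zh/routes/new-media#jones-lang-lasalle

What is expected?

Items are fetched in reverse chronological order (newest first).

What actually happened?

The links of the fetched items point to https://www.joneslanglasalle.com.cn/zh/trends-and-insights instead of the individual article URLs.

The first fetched item is 2024未来办公调研 (2024 Future of Work Survey), whereas the first article on the site is
重庆“芯”观察——功率半导体将如何撬动产业园区需求

Deployment

Self-hosted

Deployment information

No response

Additional info

Thanks

This is not a duplicate issue

  • I have searched the existing issues to make sure this bug has not already been reported.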
@KwToPA KwToPA added the RSS bug Something isn't working label Dec 4, 2024
Contributor

github-actions bot commented Dec 4, 2024

Searching for maintainers:
  • /joneslanglasalle/:language?/:category{.+}?: @nczitzk

To maintainers: if you are not willing to be disturbed, list your username in scripts/workflow/test-issue/call-maintainer.js. In this way, your username will be wrapped in an inline code block when tagged so you will not be notified.

If all routes can not be found, the issue will be closed automatically. Please use NOROUTE for a route-irrelevant issue or leave a comment if it is a mistake.

@KwToPA
Author

KwToPA commented Dec 4, 2024

const limit: number = Number.parseInt(ctx.req.query('limit') ?? '10', 10);

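Could the default '10' in this line be changed to '12'? On the China site, the first page lists 12 items in both the Chinese and English versions.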
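
A minimal sketch of what that default could look like, assuming the limit arrives as an optional query string. parseLimit and DEFAULT_LIMIT are hypothetical names used only for illustration, and the guard against invalid input is an addition that the quoted line does not have.

```typescript
// Hypothetical default: 12 matches the number of items reported on the first
// listing page. It is the value suggested in this thread, not the route's current one.
const DEFAULT_LIMIT = 12;

// Hypothetical helper: turn the raw `limit` query value into a usable number.
function parseLimit(raw: string | undefined, fallback: number = DEFAULT_LIMIT): number {
    const parsed = Number.parseInt(raw ?? String(fallback), 10);
    // Fall back when the value is not a number or is not positive (e.g. "abc", "0").
    return Number.isNaN(parsed) || parsed <= 0 ? fallback : parsed;
}
```

In the route this would be called with the value of ctx.req.query('limit'), so a request without a limit parameter would yield 12 items.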

@KwToPA
Author

KwToPA commented Dec 4, 2024

Article listing pages

China

Chinese

https://www.joneslanglasalle.com.cn/zh/trends-and-insights/_jcr_content/maincontent/facetednavigation/tab1531846884154/tab-content/pagesection/experience-content/paginatedarticlegrid?page=1

English

https://www.joneslanglasalle.com.cn/en/trends-and-insights/_jcr_content/maincontent/facetednavigation/tab1531846884154/tab-content/pagesection_850081153/experience-content/paginatedarticlegrid?page=1
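
The two URLs differ only in the language prefix and the page-section segment, so they could be built from a small lookup table. This is a sketch under that assumption: the path segments are copied from the URLs above, while BASE_URL, PAGE_SECTION, and buildListingUrl are hypothetical names, and nothing in this thread confirms how other categories or page numbers behave.

```typescript
type Language = 'zh' | 'en';

const BASE_URL = 'https://www.joneslanglasalle.com.cn';

// The page-section segment differs between the two language versions.
const PAGE_SECTION: Record<Language, string> = {
    zh: 'pagesection',
    en: 'pagesection_850081153',
};

// Hypothetical helper: build the paginated article grid URL for one language.
function buildListingUrl(language: Language, page: number = 1): string {
    const gridPath = [
        '_jcr_content/maincontent/facetednavigation/tab1531846884154/tab-content',
        PAGE_SECTION[language],
        'experience-content/paginatedarticlegrid',
    ].join('/');
    return `${BASE_URL}/${language}/trends-and-insights/${gridPath}?page=${page}`;
}
```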

The HTML source of the article listing page has the following format:

        <a href="/zh/trends-and-insights/cities/observation-of-chongqing-chips-industry">
            <div class="ti-tile ti-featured-tile">
                <div class="img" style="background-image: url('/images/apac/china/articles/jll-observation-of-chongqing-chips-industry-teaser-800x600.jpg.rendition/jll-752-423.jpeg')"></div>
                <div class="ti-content">
                    <div class="ti-location-space">
                        <span class="ti-category">城市</span>
                    </div> <!-- ti-location-space -->
                    <div class="ti-title">重庆“芯”观察——功率半导体将如何撬动产业园区需求?</div>
                    <p class="ti-teaser">《重庆产业办公楼白皮书》抢“鲜”读</p>
                    <div class="ti-type-date">
                        
                        <span class="ti-date">11月26日</span>
                    </div> <!-- ti-type-date -->
                </div> <!-- ti-content -->
            </div>
        </a>

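The values to extract are the href of the <a> element, ti-title, and ti-date.

On the listing page, the first item uses a different HTML structure from the following 11.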
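
A rough sketch of that extraction, written with plain regular expressions so it runs without dependencies (the route itself would more likely use an HTML parser). It keys on the <a> element and on the ti-title and ti-date classes rather than on the outer tile class, on the assumption that the other 11 tiles share those inner class names; only the first tile's markup is quoted in this thread. SITE_ORIGIN, extractItems, and parseChineseMonthDay are hypothetical names.

```typescript
interface ListingItem {
    link: string;
    title: string;
    date: string; // raw text such as "11月26日"; the markup carries no year
}

const SITE_ORIGIN = 'https://www.joneslanglasalle.com.cn';

// Hypothetical helper: pull link, title, and date out of the listing HTML.
function extractItems(html: string): ListingItem[] {
    const items: ListingItem[] = [];
    // Each tile is assumed to be wrapped in one <a href="..."> without nested anchors.
    const anchorPattern = /<a\s+href="([^"]+)"[^>]*>([\s\S]*?)<\/a>/g;
    let match: RegExpExecArray | null;
    while ((match = anchorPattern.exec(html)) !== null) {
        const href = match[1];
        const body = match[2];
        const titleMatch = /class="ti-title"[^>]*>([\s\S]*?)<\/div>/.exec(body);
        if (titleMatch === null) {
            continue; // an anchor without a ti-title is not an article tile
        }
        const dateMatch = /class="ti-date"[^>]*>([\s\S]*?)<\/span>/.exec(body);
        items.push({
            // Relative hrefs are resolved against the site origin.
            link: href.startsWith('/') ? SITE_ORIGIN + href : href,
            title: titleMatch[1].trim(),
            date: dateMatch ? dateMatch[1].trim() : '',
        });
    }
    return items;
}

// Hypothetical helper: read month and day from text like "11月26日".
// The year has to come from elsewhere, since the tile does not include it.
function parseChineseMonthDay(text: string): { month: number; day: number } | null {
    const match = /^(\d{1,2})月(\d{1,2})日$/.exec(text.trim());
    return match ? { month: Number(match[1]), day: Number(match[2]) } : null;
}
```

Because the link is taken from each tile's own anchor, every item gets its article URL instead of the listing page URL.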

@pseudoyu
Collaborator

pseudoyu commented Dec 5, 2024

I tested this and found that the items and URLs are correct. Can you provide more details?

CleanShot 2024-12-06 at 07 15 36@2x
CleanShot 2024-12-06 at 07 16 53@2x

@KwToPA
Author

KwToPA commented Dec 6, 2024

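The first fetched item is 未来办公 (Future of Work).

image

When I open the route on my self-hosted instance (ip/joneslanglasalle/zh/trends-and-insights), the element marked isPermaLink does contain the article link.

I use ttrss as my reader. It shows the site link rather than the article link, and it does not pick up the permalink.
image

For the English site, only 7 of the 10 items are the expected ones; the other three are from 2022.
image

The fourth item, the one about Thailand, is not from the Latest section; it belongs to Shaping the future of real estate. It does not appear on the English listing page either:

https://www.joneslanglasalle.com.cn/en/trends-and-insights/_jcr_content/maincontent/facetednavigation/tab1531846884154/tab-content/pagesection_850081153/experience-content/paginatedarticlegrid?page=1

For the English feed, the isPermaLink value is also the article link, but ttrss still does not pick it up.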

image
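
One way to narrow this down is to compare each item's <link> with its <guid> in the generated feed, since the symptom described here would fit a feed whose <link> holds the listing page while only the <guid> holds the article URL. The sketch below does that with plain regular expressions on an RSS 2.0 string. readItemLinks and findMismatchedLinks are hypothetical names, and whether the route's output actually looks like this has not been confirmed in this thread.

```typescript
interface FeedItemLinks {
    link: string;
    guid: string;
    guidIsPermaLink: boolean;
}

// Remove a CDATA wrapper if the value has one.
function stripCdata(value: string): string {
    const trimmed = value.trim();
    const cdata = /^<!\[CDATA\[([\s\S]*?)\]\]>$/.exec(trimmed);
    return cdata ? cdata[1].trim() : trimmed;
}

// Hypothetical helper: read <link> and <guid> from every <item> of an RSS 2.0 string.
function readItemLinks(rssXml: string): FeedItemLinks[] {
    const results: FeedItemLinks[] = [];
    const itemPattern = /<item>([\s\S]*?)<\/item>/g;
    let item: RegExpExecArray | null;
    while ((item = itemPattern.exec(rssXml)) !== null) {
        const body = item[1];
        const link = /<link>([\s\S]*?)<\/link>/.exec(body);
        const guid = /<guid([^>]*)>([\s\S]*?)<\/guid>/.exec(body);
        results.push({
            link: link ? stripCdata(link[1]) : '',
            guid: guid ? stripCdata(guid[2]) : '',
            // In RSS 2.0, isPermaLink counts as true when the attribute is omitted.
            guidIsPermaLink: guid ? !/isPermaLink="false"/.test(guid[1]) : false,
        });
    }
    return results;
}

// Items whose permalink guid disagrees with their link are the suspicious ones.
function findMismatchedLinks(rssXml: string): FeedItemLinks[] {
    return readItemLinks(rssXml).filter((entry) => entry.guidIsPermaLink && entry.link !== entry.guid);
}
```

If this returns items for the feed served by the self-hosted instance, the reader is probably displaying <link> as intended and the fix belongs in the route; if it returns nothing, the reader side is the more likely place to look.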

@KwToPA
Author

KwToPA commented Dec 6, 2024

The 未来办公 (Future of Work) item is https://www.joneslanglasalle.com.cn/zh/trends-and-insights/research/future-of-work-survey

I do not see a publication date on that page.
