html-treebuilder

TreeBuilder Get embedded nodes

Question: Basically, I need to get the names and emails of all of these people from the HTML code.

    <thead>
      <tr>
        <th scope="col" class="rgHeader" style="text-align:center;">Name</th>
        <th scope="col" class="rgHeader" style="text-align:center;">Email Address</th>
        <th scope="col" class="rgHeader" style="text-align:center;">School Phone</th>
      </tr>
    </thead>
    <tbody>
      <tr class="rgRow" id="ctl00_ContentPlaceHolder1_rg_People_ctl00__0">
        <td> Michael Bowen </td><td>mbowen@cpcisd.net</td><td>903-488-3671 ext3200</td>
      </tr>
      <tr class="rgAltRow" id="ctl00_ContentPlaceHolder1_rg_People_ctl00__1">
        <td> Christian Calixto </td><td>calixtoc@cpcisd.net</td><td>903-488-3671 x

2021-11-22 12:30:28    Category: Tech Sharing    html   perl   module   html-treebuilder

TreeBuilder Get embedded nodes

Basically, I need to get the names and emails of all of these people from the HTML code.

    <thead>
      <tr>
        <th scope="col" class="rgHeader" style="text-align:center;">Name</th>
        <th scope="col" class="rgHeader" style="text-align:center;">Email Address</th>
        <th scope="col" class="rgHeader" style="text-align:center;">School Phone</th>
      </tr>
    </thead>
    <tbody>
      <tr class="rgRow" id="ctl00_ContentPlaceHolder1_rg_People_ctl00__0">
        <td> Michael Bowen </td><td>mbowen@cpcisd.net</td><td>903-488-3671 ext3200</td>
      </tr>
      <tr class="rgAltRow" id="ctl00_ContentPlaceHolder1_rg_People_ctl00__1">
        <td> Christian Calixto <

2021-11-12 01:19:39    Category: Q&A    html   perl   module   html-treebuilder
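
For the table question above, one possible approach is to select the data rows by their class with `look_down` and read the first two cells of each row. This is only a sketch: the `<table>` wrapper, the simplified header attributes, and the `@people` structure are assumptions added for illustration, and the third cell of the second row is left out because the excerpt is cut off there.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Sample markup modeled on the excerpt. The <table> wrapper is assumed, and
# the second row carries only the cells that are complete in the excerpt.
my $html = <<'HTML';
<table>
<thead>
<tr><th scope="col" class="rgHeader">Name</th><th scope="col" class="rgHeader">Email Address</th><th scope="col" class="rgHeader">School Phone</th></tr>
</thead>
<tbody>
<tr class="rgRow" id="ctl00_ContentPlaceHolder1_rg_People_ctl00__0">
<td> Michael Bowen </td><td>mbowen@cpcisd.net</td><td>903-488-3671 ext3200</td>
</tr>
<tr class="rgAltRow" id="ctl00_ContentPlaceHolder1_rg_People_ctl00__1">
<td> Christian Calixto </td><td>calixtoc@cpcisd.net</td>
</tr>
</tbody>
</table>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

my @people;

# Match both rgRow and rgAltRow; the header row has no class and is skipped.
for my $row ( $tree->look_down( _tag => 'tr', class => qr/^rg(?:Alt)?Row$/ ) ) {
    # as_trimmed_text strips the padding around " Michael Bowen ".
    my @cells = map { $_->as_trimmed_text } $row->look_down( _tag => 'td' );
    next unless @cells >= 2;    # need at least a name and an email cell
    push @people, { name => $cells[0], email => $cells[1] };
}

print "$_->{name} <$_->{email}>\n" for @people;

# Free the parse tree once the data has been copied out.
$tree->delete;
```

Matching on the row class rather than on cell position keeps the header row out of the results without any special casing; if the real page uses other row classes, the regex would need adjusting.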

WWW::Mechanize Extraction Help - PERL

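Question: I'm trying to automate the extraction of a transcript found on a website. The entire transcript is found between dl tags, since the site formatted the interview as a description list. The script below lets me search the site and extract the text in plain-text format, but I actually want it to include everything between the dl tags, meaning the dd's, dt's, etc. This will allow us to develop our own CSS for the interview. Something to note about the page is that there are break statements inserted at various points during the interview. Some tools we've found that extract information from web pages using pairing had trouble with this, because they only grab information up to the break statement. Just something to keep in mind if you point me in a different direction. Here's what I have so far.

    #!/usr/bin/perl -w
    use strict;
    use WWW::Mechanize;
    use WWW::Mechanize::TreeBuilder;
    my $mech = WWW::Mechanize->new();
    WWW::Mechanize::TreeBuilder->meta->apply($mech);
    $mech->get("http://millercenter.org/president/clinton/oralhistory/madeleine-k-albright");
    # find all <dl> tags
    my @list = $mech->find('dl');
    foreach ( @list ) { print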

2021-10-08 02:05:10    Category: Tech Sharing    perl   parsing   screen-scraping   www-mechanize   html-treebuilder

WWW::Mechanize Extraction Help - PERL

I'm trying to automate the extraction of a transcript found on a website. The entire transcript is found between dl tags, since the site formatted the interview as a description list. The script I have below lets me search the site and extract the text in plain-text format, but I actually want it to include everything between the dl tags, meaning the dd's, dt's, etc. This will allow us to develop our own CSS for the interview. Something to note about the page is that there are break statements inserted at various points during the interview. Some tools we've found that extract

2021-10-08 01:40:16    Category: Q&A    perl   parsing   screen-scraping   www-mechanize   html-treebuilder
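
For the transcript question above, a likely direction is to serialize each `dl` element with `as_HTML` instead of reducing it to text, so the dt, dd, and br tags survive. The sketch below parses a small inline document rather than fetching the live page; the sample markup and the `@blocks` variable are hypothetical stand-ins, not the site's actual content.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical stand-in for the transcript page: a description list with a
# <br> in the middle of one answer.
my $html = <<'HTML';
<html><body>
<dl>
<dt>Interviewer</dt>
<dd>First line of the question.<br>Second line after a break.</dd>
<dt>Respondent</dt>
<dd>The answer.</dd>
</dl>
</body></html>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# as_HTML(entities, indent, optional_end_tags):
#   '<>&' escapes only the characters that must be escaped,
#   '  '  indents the output by two spaces per level,
#   {}    treats no end tag as optional, so </dt> and </dd> are written out.
my @blocks = map { $_->as_HTML( '<>&', '  ', {} ) }
             $tree->look_down( _tag => 'dl' );

print "$_\n" for @blocks;

# Free the parse tree once the markup has been serialized.
$tree->delete;
```

With WWW::Mechanize::TreeBuilder applied as in the question, the elements returned by `$mech->find('dl')` should be HTML::Element objects as well, so the same `as_HTML` call ought to work inside the original `foreach` loop. Because the tree is serialized as a whole, a br inside a dd is just another child node and should not cut the output short.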