I'm trying to write a web scraper. I want to get all the cells in a row. The row before the one I want has THOROUGHBRED MEETINGS as its plain-text value, and I can get that row successfully. But I can't figure out how to get the next row's children, which are the cells (the <td> tags).
// $html is the parsed page (a Simple HTML DOM object)
if ($foundTag = FindTagByText("THOROUGHBRED MEETINGS", $html))
{
    // $foundTag is the <b>; walk up to its <td>, then to its <tr>
    $cell = $foundTag->parent();
    $row = $cell->parent();
    // The row I want is the next <tr> after the header row
    $nextRow = $row->next_sibling();
    echo "Row: ".$row->plaintext."<br />\n";
    echo "Next Row: ".$nextRow->plaintext."<br />\n";
    // Each child of that row should be one <td>
    $cells = $nextRow->children();
    foreach ($cells as $cell)
    {
        echo "Cell: ".$cell->plaintext."<br />\n";
    }
}

function FindTagByText($text, $html)
{
    // Use Simple_HTML_DOM special selector 'text'
    // to retrieve all text nodes from the document
    $textNodes = $html->find('text');
    $foundTag = null;
    foreach ($textNodes as $textNode)
    {
        if ($textNode->plaintext == $text)
        {
            // Get the parent of the text node
            // (a text node is always a child of
            // its container)
            $foundTag = $textNode->parent();
            break;
        }
    }
    return $foundTag;
}
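For comparison, here is a minimal sketch of the same walk (text node → `<b>` → `<td>` → `<tr>` → next `<tr>` → its `<td>` children) using PHP's built-in DOM extension instead of Simple HTML DOM. The helper names `findElementByText` and `nextElement` are hypothetical, and the markup is a trimmed copy of the rows quoted in this question, wrapped in a `<table>` that the real page presumably provides.

```php
<?php
// Minimal sketch: the parent() -> parent() -> next_sibling() -> children()
// walk, redone with PHP's built-in DOM extension (libxml2).
// findElementByText() and nextElement() are hypothetical helper names.

// Returns the element that directly contains a text node whose trimmed
// value equals $text (first match in document order), or null.
function findElementByText(DOMDocument $doc, string $text): ?DOMElement
{
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//text()') as $textNode) {
        if (trim($textNode->nodeValue) === $text && $textNode->parentNode instanceof DOMElement) {
            return $textNode->parentNode;
        }
    }
    return null;
}

// Returns the next sibling that is an element. DOMDocument keeps the
// whitespace text nodes between tags, so plain nextSibling would usually
// return one of those instead of the next <tr>.
function nextElement(DOMNode $node): ?DOMElement
{
    for ($n = $node->nextSibling; $n !== null; $n = $n->nextSibling) {
        if ($n instanceof DOMElement) {
            return $n;
        }
    }
    return null;
}

// Trimmed copy of the question's rows, wrapped in <table> for this sketch.
$html = <<<'HTML'
<table>
<tr valign=top>
<td colspan=16 bgcolor=#999999><b>THOROUGHBRED MEETINGS</b></td>
</tr>
<tr valign=top bgcolor="#ffffff">
<td><b>BR</b> <a href="meeting?mtg=br&day=today&curtype=0">SUNSHINE COAST</a></td>
<td>FINE/DEAD</b></td>
<td><font color=#cc0000><b>R1</b></font>@<b>12:30pm</b></td>
<td align=center bgcolor=#cc0000><a href="odds?mting=BR01000"><b><font color=#ffffff>1</a></font></td>
<td align=center><a href="odds?mting=BR02000"><b><font color=black>2</b></font></a></td>
</tr>
</table>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // the markup has mismatched tags; don't print warnings
$doc->loadHTML($html);
libxml_clear_errors();

$cellTexts = [];
$foundTag = findElementByText($doc, 'THOROUGHBRED MEETINGS'); // the <b>
if ($foundTag !== null) {
    $row = $foundTag->parentNode->parentNode; // <b> -> <td> -> <tr>
    $nextRow = nextElement($row);
    if ($nextRow !== null) {
        foreach ($nextRow->childNodes as $child) {
            if ($child instanceof DOMElement && $child->nodeName === 'td') {
                $text = trim($child->textContent);
                $cellTexts[] = $text;
                echo "Cell: $text\n";
            }
        }
    }
}
```

The extra `nextElement` step is the main difference from Simple HTML DOM, whose `next_sibling()` already skips text nodes.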
Here is the HTML I'm trying to parse:
<tr valign=top>
<td colspan=16 bgcolor=#999999><b>THOROUGHBRED MEETINGS</b></td>
</tr>
<tr valign=top bgcolor="#ffffff">
<td><b>BR</b> <a href="meeting?mtg=br&day=today&curtype=0">SUNSHINE COAST</a></td>
<td>FINE/DEAD</b></td>
<td><font color=#cc0000><b>R1</b></font>@<b>12:30pm</b></td>
<td align=center bgcolor=#cc0000><a href="odds?mting=BR01000"><b><font color=#ffffff>1</a></font></td>
<td align=center><a href="odds?mting=BR02000"><b><font color=black>2</b></font></a></td>
<td align=center><a href="odds?mting=BR03000"><b><font color=black>3</b></font></a></td>
<td align=center><a href="odds?mting=BR04000"><b><font color=black>4</b></font></a></td>
<td align=center><a href="odds?mting=BR05000"><b><font color=black>5</b></font></a></td>
<td align=center><a href="odds?mting=BR06000"><b><font color=black>6</b></font></a></td>
<td align=center><a href="odds?mting=BR07000"><b><font color=black>7</b></font></a></td>
<td align=center><a href="odds?mting=BR08000"><b><font color=black>8</b></font></a></td>
<td bgcolor="#ffffff" colspan=4> </td>
</tr>
Here is my output:
Row: THOROUGHBRED MEETINGS
Next Row: BR SUNSHINE COAST FINE/DEAD R1@12:30pm 1 2 3 4 5 6 7 8 CR NEW ZEALAND FINE/DEAD [email protected]:10am 1 2 3 4 5 6 7 8 9 DR HOBART OCAST/HVY [email protected]:15pm 1 2 3 4 5 6 7 MR CRANBOURNE OCAST/SLOW [email protected]:20pm 1 2 3 4 5 6 7 8 NR COFFS HARBOUR OCAST/SLOW [email protected]:45pm 1 2 3 4 5 6 7 8 SR MORUYA FINE/GOOD [email protected]:25pm 1 2 3 4 5 6 7 8 VR BENALLA OCAST/SLOW [email protected]:35pm 1 2 3 4 5 6 7 8 XR KALGOORLIE FINE/GOOD [email protected] 3:00pm 1 2 3 4 5 6 7 HARNESS MEETINGS DT LAUNCESTON SHWRY/GOOD [email protected] 4:57pm 1 2 3 4 5 6 7 8 9 10 MT CRANBOURNE OCAST/GOOD [email protected] 5:05pm 1 2 3 4 5 6 7 8 GREYHOUND MEETINGS AD GAWLER OCAST/GOOD [email protected] 5:10pm 1 2 3 4 5 6 7 8 9 10 11 CD CANBERRA OCAST/GOOD [email protected] 5:02pm 1 2 3 4 5 6 7 8 9 10 11 MD SALE FINE/GOOD [email protected] 4:54pm 1 2 3 4 5 6 7 8 9 10 11 12
Cell: BR SUNSHINE COAST
Cell: FINE/DEAD
Cell: R1@12:30pm
Cell: 1 2 3 4 5 6 7 8 CR NEW ZEALAND FINE/DEAD [email protected]:10am 1 2 3 4 5 6 7 8 9 DR HOBART OCAST/HVY [email protected]:15pm 1 2 3 4 5 6 7 MR CRANBOURNE OCAST/SLOW [email protected]:20pm 1 2 3 4 5 6 7 8 NR COFFS HARBOUR OCAST/SLOW [email protected]:45pm 1 2 3 4 5 6 7 8 SR MORUYA FINE/GOOD [email protected]:25pm 1 2 3 4 5 6 7 8 VR BENALLA OCAST/SLOW [email protected]:35pm 1 2 3 4 5 6 7 8 XR KALGOORLIE FINE/GOOD [email protected] 3:00pm 1 2 3 4 5 6 7 HARNESS MEETINGS DT LAUNCESTON SHWRY/GOOD [email protected] 4:57pm 1 2 3 4 5 6 7 8 9 10 MT CRANBOURNE OCAST/GOOD [email protected] 5:05pm 1 2 3 4 5 6 7 8 GREYHOUND MEETINGS AD GAWLER OCAST/GOOD [email protected] 5:10pm 1 2 3 4 5 6 7 8 9 10 11 CD CANBERRA OCAST/GOOD [email protected] 5:02pm 1 2 3 4 5 6 7 8 9 10 11 MD SALE FINE/GOOD [email protected] 4:54pm 1 2 3 4 5 6 7 8 9 10 11 12
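Judging from the output, the traversal itself is fine and the markup is the problem. The fourth cell, `<a href="odds?mting=BR01000"><b><font color=#ffffff>1</a></font></td>`, never closes its `<b>`, and Simple HTML DOM appears to treat the following `</a>` and `</td>` as text rather than as closing tags, so every later cell and row ends up nested inside that cell. That is consistent with `Next Row` and the fourth `Cell` both containing the rest of the page. A parser that recovers from misnested tags should avoid this. The sketch below swaps in PHP's DOMDocument (libxml2) and a single XPath query; `cellsAfterHeader` is a hypothetical helper, the header text is assumed to contain no double quotes, and the question's row is wrapped in a `<table>` for the example.

```php
<?php
// Sketch (a different parser from the question's): find the row after a
// header row with one XPath query, using PHP's built-in DOM extension,
// whose libxml2 HTML parser recovers from mismatched </a>, </b>, </font>.

// Returns the trimmed text of each <td> in the first <tr> after the row
// whose whitespace-normalised text equals $headerText. Assumes
// $headerText has no double quotes (it is pasted into an XPath literal).
function cellsAfterHeader(string $html, string $headerText): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parser warnings quietly
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $query = sprintf(
        '//tr[normalize-space(.) = "%s"]/following-sibling::tr[1]/td',
        $headerText
    );

    $cells = [];
    foreach ($xpath->query($query) as $td) {
        // Collapse whitespace runs inside the cell, then trim the ends.
        $cells[] = trim(preg_replace('/\s+/', ' ', $td->textContent));
    }
    return $cells;
}

$html = <<<'HTML'
<table>
<tr valign=top>
<td colspan=16 bgcolor=#999999><b>THOROUGHBRED MEETINGS</b></td>
</tr>
<tr valign=top bgcolor="#ffffff">
<td><b>BR</b> <a href="meeting?mtg=br&day=today&curtype=0">SUNSHINE COAST</a></td>
<td>FINE/DEAD</b></td>
<td><font color=#cc0000><b>R1</b></font>@<b>12:30pm</b></td>
<td align=center bgcolor=#cc0000><a href="odds?mting=BR01000"><b><font color=#ffffff>1</a></font></td>
<td align=center><a href="odds?mting=BR02000"><b><font color=black>2</b></font></a></td>
<td align=center><a href="odds?mting=BR03000"><b><font color=black>3</b></font></a></td>
<td align=center><a href="odds?mting=BR04000"><b><font color=black>4</b></font></a></td>
<td align=center><a href="odds?mting=BR05000"><b><font color=black>5</b></font></a></td>
<td align=center><a href="odds?mting=BR06000"><b><font color=black>6</b></font></a></td>
<td align=center><a href="odds?mting=BR07000"><b><font color=black>7</b></font></a></td>
<td align=center><a href="odds?mting=BR08000"><b><font color=black>8</b></font></a></td>
<td bgcolor="#ffffff" colspan=4> </td>
</tr>
</table>
HTML;

foreach (cellsAfterHeader($html, 'THOROUGHBRED MEETINGS') as $text) {
    echo "Cell: $text\n";
}
```

The query selects the `<tr>` whose normalised text is the header, steps to the first following `<tr>`, and returns that row's `<td>` children, so it replaces the whole `parent()` / `next_sibling()` / `children()` chain with one expression.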