Добрый вечер, я использовал BeautifulSoup для извлечения некоторых данных с веб-сайта следующим образом:
from BeautifulSoup import BeautifulSoup
from urllib2 import urlopen
soup = BeautifulSoup(urlopen('http://www.fsa.gov.uk/about/media/facts/fines/2002'))
table = soup.findAll('table', attrs={ "class" : "table-horizontal-line"})
print table
Это дает следующий результат:
[<table width="70%" class="table-horizontal-line">
<tr>
<th>Amount</th>
<th>Company or person fined</th>
<th>Date</th>
<th>What was the fine for?</th>
<th>Compensation</th>
</tr>
<tr>
<td><a name="_Hlk74714257" id="_Hlk74714257"> </a>£4,000,000</td>
<td><a href="/pages/library/communication/pr/2002/124.shtml">Credit Suisse First Boston International </a></td>
<td>19/12/02</td>
<td>Attempting to mislead the Japanese regulatory and tax authorities</td>
<td> </td>
</tr>
<tr>
<td>£750,000</td>
<td><a href="/pages/library/communication/pr/2002/123.shtml">Royal Bank of Scotland plc</a></td>
<td>17/12/02</td>
<td>Breaches of money laundering rules</td>
<td> </td>
</tr>
<tr>
<td>£1,000,000</td>
<td><a href="/pages/library/communication/pr/2002/118.shtml">Abbey Life Assurance Company ltd</a></td>
<td>04/12/02</td>
<td>Mortgage endowment mis-selling and other failings</td>
<td>Compensation estimated to be between £120 and £160 million</td>
</tr>
<tr>
<td>£1,350,000</td>
<td><a href="/pages/library/communication/pr/2002/087.shtml">Royal & Sun Alliance Group</a></td>
<td>27/08/02</td>
<td>Pension review failings</td>
<td>Redress exceeding £32 million</td>
</tr>
<tr>
<td>£4,000</td>
<td><a href="/pubs/final/ft-inv-ins_7aug02.pdf" target="_blank">F T Investment & Insurance Consultants</a></td>
<td>07/08/02</td>
<td>Pensions review failings</td>
<td> </td>
</tr>
<tr>
<td>£75,000</td>
<td><a href="/pubs/final/spe_18jun02.pdf" target="_blank">Seymour Pierce Ellis ltd</a></td>
<td>18/06/02</td>
<td>Breaches of FSA Principles ("skill, care and diligence" and "internal organization")</td>
<td> </td>
</tr>
<tr>
<td>£120,000</td>
<td><a href="/pages/library/communication/pr/2002/051.shtml">Ward Consultancy plc</a></td>
<td>14/05/02</td>
<td>Pension review failings</td>
<td> </td>
</tr>
<tr>
<td>£140,000</td>
<td><a href="/pages/library/communication/pr/2002/036.shtml">Shawlands Financial Services ltd</a> - formerly Frizzell Life & Financial Planning ltd)</td>
<td>11/04/02</td>
<td>Record keeping and associated compliance breaches</td>
<td> </td>
</tr>
<tr>
<td>£5,000</td>
<td><a href="/pubs/final/woodwards_4apr02.pdf" target="_blank">Woodward Independent Financial Advisers</a></td>
<td>04/04/02</td>
<td>Pensions review failings</td>
<td> </td>
</tr>
</table>]
Я хотел бы экспортировать это в CSV, сохраняя структуру таблицы, отображаемую на веб-сайте, возможно ли это, и если да, то как?
Заранее благодарим за помощь.