I'm having problems with telerama.fr when grabbing data for Brava HDTV, Mezzo and TV MONDE Europe.
WG parses shows like this:
Show with ---- start = 19-4-2013 20:29:00 stop = 19-4-2013 22:46:00 title = <a class="self" target="_self" href="http://webgrabplus.com/tele/programmes-tv/rameau-castor-and-pollux%2C50139049.php">Rameau : Castor and Pollux</a>
Show with ---- start = 20-4-2013 20:29:00 stop = 20-4-2013 22:17:00 title = <a class="self" target="_self" href="http://webgrabplus.com/tele/programmes-tv/the-pyongang-concert%2C50265795.php">The Pyongang Concert</a>
Show with ---- start = 19-4-2013 20:30:00 stop = 19-4-2013 21:15:00 title = <img src="http://icon.telerama.fr/label/television/grand/4.png" class="tv10-ico-t" alt="On aime beaucoup"><a class="self" target="_self" href="http://webgrabplus.com/tele/programmes-tv/scheherazade%2C38370279.php">Schéhérazade</a>
Show with ---- start = 19-4-2013 21:00:00 stop = 19-4-2013 22:35:00 title = <a class="self" target="_self" href="http://webgrabplus.com/tele/telefilm/un-flic%2C4837509.php">Un flic</a>
This site worked fine some time ago, so something must have changed.
Hi Willemx,
There was another small change to the site that left HTML tags in the title.
The attached version removes them.
It requires the latest build (49) of the program!
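The actual fix lives in the attached site ini, which isn't reproduced here. Purely to illustrate the idea behind it (keep the visible text of the title, drop every tag such as the `<a class="self">` link and the `<img class="tv10-ico-t">` rating icon seen in the log above), here is a minimal Python sketch. It is not WebGrab+Plus code, and the function name `strip_title_tags` is hypothetical.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment, dropping all tags."""

    def __init__(self) -> None:
        # convert_charrefs=True turns entities like &amp; into plain characters
        super().__init__(convert_charrefs=True)
        self._parts: list[str] = []

    def handle_data(self, data: str) -> None:
        # Called for text between tags; tags themselves never reach here
        self._parts.append(data)

    def text(self) -> str:
        return "".join(self._parts)


def strip_title_tags(raw_title: str) -> str:
    """Hypothetical helper: return the visible text of a scraped title."""
    parser = _TextExtractor()
    parser.feed(raw_title)
    parser.close()  # flush any buffered text
    # Collapse runs of whitespace left where tags used to be
    return " ".join(parser.text().split())


if __name__ == "__main__":
    raw = ('<img src="http://icon.telerama.fr/label/television/grand/4.png" '
           'class="tv10-ico-t" alt="On aime beaucoup">'
           '<a class="self" target="_self" href="scheherazade.php">Schéhérazade</a>')
    print(strip_title_tags(raw))
```

Note that the `alt` text of the icon ("On aime beaucoup") is an attribute, not a text node, so it is dropped along with the tag.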
Jan
Jan,
It works now; many thanks!
Willem