Hi,
I'm trying to create a site ini file for ICTV www.ictv.net.au. They use very weird page names for each day on their main index.
For example, the link to this Sunday's index is 82-sunday-july-21-2013.html, but Monday's is 76-monday-july-22-2013.html.
The sequence of numbers (82, 76) is random and doesn't follow a pattern I can use in subpage.format or urldate.format.
What should I do?
Thanks for your time.
Hi,
How strange! However, I don't think the extra number is random. The numbers seem to be fixed, like this:
76 monday
77 tuesday
78 -- redirects to homepage
84 wednesday
79 thursday
80 friday
81 saturday
82 sunday
(The reason I think they are fixed is that you can find a few from the past, like 80-friday-july-5-2013. I'm not sure, though.)
You can use a simple URL for the index pages: http://www.ictv.net.au/76-monday.html or even http://www.ictv.net.au/76-.html for Monday (the site redirects you to the Monday in the schedule week list).
If the weekday numbers are indeed fixed, you can use the urldate format weekdayname like this:
urldate.format {weekdayname|76-monday|77-tuesday|84-wednesday|79-thursday|80-friday|81-saturday|82-sunday}
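To make the mapping concrete, here is a minimal Python sketch of how the index URL follows from the date if the numbers really are fixed. It is only an illustration, not what WebGrab+Plus does internally. The names WEEKDAY_PREFIX, index_url and dated_page_name are made up for this example, and the prefixes are the ones listed above.

```python
from datetime import date

# Fixed page-number prefixes as observed in this thread (assumption: they stay fixed).
WEEKDAY_PREFIX = {
    "monday": 76, "tuesday": 77, "wednesday": 84, "thursday": 79,
    "friday": 80, "saturday": 81, "sunday": 82,
}
# date.weekday() returns 0 for Monday through 6 for Sunday.
WEEKDAY_NAMES = ["monday", "tuesday", "wednesday", "thursday",
                 "friday", "saturday", "sunday"]
MONTH_NAMES = ["january", "february", "march", "april", "may", "june", "july",
               "august", "september", "october", "november", "december"]


def index_url(day, base="http://www.ictv.net.au/"):
    """Short index URL, e.g. .../76-monday.html (the site redirects to the dated page)."""
    name = WEEKDAY_NAMES[day.weekday()]
    return f"{base}{WEEKDAY_PREFIX[name]}-{name}.html"


def dated_page_name(day):
    """Full page name as seen on the site, e.g. 76-monday-july-22-2013.html."""
    name = WEEKDAY_NAMES[day.weekday()]
    month = MONTH_NAMES[day.month - 1]
    return f"{WEEKDAY_PREFIX[name]}-{name}-{month}-{day.day}-{day.year}.html"


print(index_url(date(2013, 7, 22)))
print(dated_page_name(date(2013, 7, 21)))
```

The weekday and month names are spelled out in lists rather than taken from strftime, so the result does not depend on the locale of the machine running it.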
The rest of the date handling depends on the refresh schedule of the site. You have to check the date on the index page and discard the page if that date is already in the past. I can help you when you get there.
(Please use the site_ini_template from the download page because it already has the proper header.)
Jan
Thank you for the reply, Jan.
I now see they are fixed too; even a Google search shows dates going back a few weeks with the same two-digit day number.
I have changed url_index and urldate.format to match yours and I get the attached results.
I found an error with maxdays and also changed single to multi in the index scrubbers, but the results are the same.
Hi Smacca,
I completed what you started. You can download it at http://webgrabplus.com/sites/default/files/download/ini/info/zip/Australia_ictv.net.au.zip
You will also need to install the new beta build, http://www.webgrabplus.com/sites/default/files/patchexe_prebuild.zip , because the ini needs the half-hour timezone setting, which is not supported in the current version.
Jan
Thank you so much Jan! I had no idea I needed to do calculations to get results. This latest ini is almost perfect, but there is one more problem I can't seem to fix.
When WG scrubs 76-monday-july-22-2013.html, it places the shows into 20130723, and so on for the other days: every show ends up one day ahead. Also, the pages on ICTV are so bad that they list yesterday's and tomorrow's shows at the beginning and end of each table/tbody. Is there a way to ignore these extra showings?
Thanks again, I would be lost without your expertise :)
For a couple of days, try firstshow=3 (in the line that starts with 'site {' in the ini file).
That skips a few shows on the first index page. It should solve the problem. Please report back whether it does or doesn't.
Jan
Great, that seems to fix the date issue. I'll monitor it and report back.
ICTV has just updated their TV guides for a full week! They're very messy and I think the firstshow= fix may not work after all.
This page is perfect: http://www.ictv.net.au/79-thursday-july-25-2013.html
But Saturday is the worst: http://www.ictv.net.au/81-saturday-july-27-2013.html
XML below shows a number of problems from 26/07/13 onwards...
Smacca
Try this:
http://www.webgrabplus.com/sites/default/files/download/ini/info/zip/Australia_ictv.net.au.zip
I added a date filter in the showsplit operation.
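Roughly, the idea of such a date filter can be sketched in Python as below. This is not the actual ini code, only an illustration of the logic, and it assumes that each row in the table can be tied to a date. The function name keep_shows_for_day and the row layout (a dict with "day", "start" and "title") are made up for this example.

```python
from datetime import date


def keep_shows_for_day(rows, page_day):
    """Keep only the rows whose own date equals the date of the index page.

    Rows for yesterday or tomorrow, which the page lists at the beginning
    and end of the table, are dropped.
    """
    return [row for row in rows if row["day"] == page_day]


# Hypothetical rows as they might come out of one Saturday page.
rows = [
    {"day": date(2013, 7, 26), "start": "23:30", "title": "Late show (Friday)"},
    {"day": date(2013, 7, 27), "start": "06:00", "title": "Morning show"},
    {"day": date(2013, 7, 27), "start": "18:00", "title": "Evening show"},
    {"day": date(2013, 7, 28), "start": "00:30", "title": "After midnight (Sunday)"},
]

for row in keep_shows_for_day(rows, date(2013, 7, 27)):
    print(row["start"], row["title"])
```

Unlike firstshow, which only skips a fixed number of shows on the first index page, a filter like this is applied to every page and does not depend on how many extra rows a page happens to contain.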
Jan