**------------------------------------------------------------------------------------------------
* @header_start
* WebGrab+Plus ini for grabbing EPG data from TvGuide websites
* @Site: locatetv.com
* @MinSWversion: 1.1.1/53
* @Revision 3 - [27/06/2014] Jan van Straaten
*   - split actor /presenter loop
* @Revision 2 - [26/06/2014] Francis De Paemeleere
*   - make .channels.xml generation generic for (US/UK/IE)
* @Revision 1 - [25/06/2014] Jan van Straaten/Hicks
*   - update to site
* @Revision 0 - [21/06/2014] Jan van Straaten
*   - creation
* @Remarks: readme.txt for customization
* @header_end
**------------------------------------------------------------------------------------------------
*
site {url=locatetv.com|timezone=US/Eastern|maxdays=12|cultureinfo=en-US|charset=utf-8|titlematchfactor=90}
*url_index{url|http://www.locatetv.com/listings/|channel|#|urldate}*http://www.locatetv.com/listings/cnbc-hd#23-Jun-2014
*urldate.format{datestring|dd-MMM-yyyy}
*
url_index{url|http://www.locatetv.com|channel|?offset=|urldate}
urldate.format{daycounter|0}
url_index.headers {customheader=Accept-Encoding=gzip,deflate}
url_index.headers {customheader=X-Requested-With=XMLHttpRequest}
*
index_urlchannellogo {url||
|style="width: 0px"> }
index_showsplit.modify {cleanup(removeduplicates)} * simple removeduplicates sufficient?
scope.range {(indexshowdetails)|end}
index_start.scrub {single(excludeblock="min""now")|
  • |
  • |} * index_temp_1.scrub {single|
  • |
  • |} * start value without excludeblock
*in case 17 mins ago
index_temp_2.modify {substring(type=regex)|'index_temp_1' "\A(\d{1,2}) min"} * the minutes 'ago'
index_temp_2.modify {addstart(not "")|00:} *timespan format
*in case in 17 mins
index_temp_3.modify {substring(type=regex)|'index_temp_1' "\Ain (\d{1,2})"} * the minutes to go
index_temp_3.modify {addstart(not "")|00:} *timespan format
* calculate the start time from the 'now' (index_variable_element) value
* if ago
index_start.modify {calculate('index_temp_2' not "" format=time)|'index_variable_element' 'index_temp_2' -}
* if to go
index_start.modify {calculate('index_temp_3' not "" format=time)|'index_variable_element' 'index_temp_3' +}
*
index_showicon.scrub{single|>|"/>}
*
index_temp_4.scrub {single||">||}
index_temp_5.scrub {single||">||}
index_title.modify {addstart('index_temp_5' "")|'index_temp_4'}
index_title.modify {addstart('index_temp_5' not "")|'index_temp_5'}
index_subtitle.modify {addstart('index_temp_5' not "")|'index_temp_4'}
* episode, two cases :
* subtitle starts with Season 3 Episode 5: ....
index_episode.modify {substring(type=regex)|'index_subtitle' "\A(.+?Episode \d{1,}):"}
* subtitle starts with EPISODE: 25
index_episode.modify {substring('index_episode' "" type=regex)|'index_subtitle' "\A(EPISODE: \d{1,})"}
*
index_subtitle.modify {remove(type=regex)|"\A('index_episode': )"}
index_subtitle.modify {remove(type=regex)|"\A('index_episode')"}
index_episode.modify {remove|:}
index_episode.modify {cleanup(style=lower)}
index_description.scrub {single|

    ||

    |

    }
*
* details and subdetails
*index_urlshow.modify {clear}
*index_urlsubdetail.modify {clear}
*index_temp_1.modify {clear}
*index_temp_2.modify {clear}
*
* if there is a 'star appendLink series'
* get title, desc, cat
* get the urlshow from the 'star appendLink series'
index_urlshow.scrub {single||href="|">|}
index_urlshow.modify {addstart('index_urlshow' not "")|http://www.locatetv.com}
*
* if there is no 'star appendLink series' the details are in the appendLink
* in that case we can clear the index_description because the same is also in the details
* get the urlshow from the 'star appendLink' but only if urlshow is still "" (no appendLink series)
index_description.modify {clear('index_urlshow' "")}
index_temp_6.scrub {single||href="|">|}
index_urlshow.modify {addstart('index_urlshow' "")|http://www.locatetv.com'index_temp_6'}
index_urlshow.headers {customheader=Accept-Encoding=gzip,deflate}
end_scope
*
title.scrub {single|
    |

    |

    |
    }
description.scrub {single||" />}
category.scrub {single|
    |||
    }
productiondate.scrub {single|
    |||
    }
category.modify {cleanup(tags="("")")}
*
actor.scrub {multi()|

    Cast

    |
    )"}
*Barbara Walters Host
*keith-david/21527">Keith David Stappleton
* move Host and Anchor to presenter
temp_1.modify {calculate(type=element format=F0)|'actor' #}
loop {('temp_1' > "0" max=50)|end}
temp_1.modify {calculate(format=F0)|1 -}
temp_2.modify {substring(type=element)|'actor' 'temp_1' 1} * the credit to inspect
presenter.modify {addstart('temp_2' ~ " Host")|####'temp_2'}
*
actor.modify {remove('temp_2' ~ " Host" type=element)|'actor' 'temp_1' 1}
presenter.modify {addstart('temp_2' ~ " Anchor")|####'temp_2'}
*
actor.modify {remove('temp_2' ~ " Anchor" type=element)|'actor' 'temp_1' 1}
end_loop
presenter.modify {replace|####|\|} * make multi
presenter.modify {replace||,}
presenter.modify {cleanup}
presenter.modify {substring(type=element)|0 8} * limits to 8 presenters
* role?
* remove role:
*actor.modify {remove(type=regex)|".+?(.*?)\Z"}
* alternative: add word 'as' between name and role
actor.modify {replace(type=regex)|"().+"| as}
actor.modify {cleanup}
actor.modify {substring(type=element)|0 8} * limits to 8 actors
*
* the cast is also in a subdetail page, it lists director and producer:
* enable the next two lines to get that
*urlsubdetail.modify {addstart('index_urlshow' not "")|'index_urlshow'/cast}
*urlsubdetail.headers {customheader=Accept-Encoding=gzip,deflate}
subdetail_temp_1.scrub {multi|

    Credits

    |
    )"}
subdetail_director.modify {substring(type=regex)|'subdetail_temp_1' "\A(.+?) Director"}
subdetail_producer.modify {substring(type=regex)|'subdetail_temp_1' "\A(.+?) Producer"}
subdetail_producer.modify {substring(type=regex)|'subdetail_temp_1' "\A(.+?) Executive-Producer"}
*
* actor is already in the detail page, but just in case:
**subdetail_actor.scrub {multi|

    Cast

    |
    )"}
**subdetail_actor.modify {cleanup(tags="<"">")}
**subdetail_actor.modify {substring(type=element)|0 8} * limits to 8 actors
**subdetail_actor.modify {substring(type=regex)|"\A(.+?) as "} * optional removal of role
** _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
** ##### CHANNEL FILE CREATION (only to create the locate.channel.xml file)
**
** @auto_xml_channel_start
*site {loadcookie=locatetv.com_cookies.txt}
*subpage.format{list(format=F0 step=1 count=25)|1}
*url_index {url|http://www.locatetv.com/listings/?start=&page=|subpage}
*index_site_channel.scrub {regex||
  • ]*class="channel"[^>]*data-name="([^>]*)">||} *index_site_id.scrub {regex||
  • ]*class="channel".*?