It works
Thanks
This ini doesn't work anymore :( Can somebody make a new update, please?
Hi, is it possible to add channel icons to this ini file?
Example:
http://cdn1.siol.tv/logo2/150x80/mezzo.png
At the moment I am using a solution that is not very nice.
Example of what I am using at the moment:
<channel update="i" site="siol.net" site_id="SLO+1" site_channel="http://bite-in.com/siol/logos_new/slo1.png" xmltv_id="SLO+1">SLO+1</channel>
<channel update="i" site="siol.net" site_id="SLO+2" site_channel="http://bite-in.com/siol/logos_new/slo2.png" xmltv_id="SLO+2">SLO+2</channel>
<channel update="i" site="siol.net" site_id="Planet+TV" site_channel="http://bite-in.com/siol/logos_new/planettv.png" xmltv_id="Planet+TV">Planet+TV</channel>
<channel update="i" site="siol.net" site_id="POP+TV" site_channel="http://bite-in.com/siol/logos_new/poptv.png" xmltv_id="POP+TV">POP+TV</channel>
<channel update="i" site="siol.net" site_id="Kanal+A" site_channel="http://bite-in.com/siol/logos_new/akanal.png" xmltv_id="Kanal+A">Kanal+A</channel>
This ini doesn't work anymore :( Can somebody update it, please?
Hi,
could someone take a look at this ini? It has stopped working.
I'm getting:
error downloading page: Error: NameResolutionFailure
pausing 3 of 4 times for 15 seconds before re-try.
I was not getting the NameResolutionFailure error, but there was a problem with index_start.scrub. I have fixed it in this ini and it is working for me. Please note that some channel names have also changed. See the new channels.xml file.
Hi,
can someone check why we don't get this text in the xml: Otroški in mladinski / Risanka, Ostalo
The page link is: http://www.siol.net/tv-spored.aspx?p2=jOm1MhyqCsMpFWUoxjw8qw%3d%3d
Could you also grab this text from the page?
On the web page it looks like this: <p class="zanr" style="font-size: 12px;">Otroški in mladinski / Risanka, Ostalo <script type="text/javascript">raty_init();</script></p>
In the ini file it is: description.scrub {single|<p class="zanr">|<p>|</p>|<div class="clrA">}
Thanks.
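One possible explanation, assuming the scrub start string is matched literally: the ini looks for `<p class="zanr">`, but the tag on the page carries an extra style attribute, so that exact string never occurs in the page. A minimal Python sketch of the mismatch follows. It uses plain string search, not WebGrab+Plus itself, and `scrub_between` is a hypothetical helper made up for illustration.

```python
# Fragment copied from the page source quoted above.
html = ('<p class="zanr" style="font-size: 12px;">Otroški in mladinski / Risanka, Ostalo '
        '<script type="text/javascript">raty_init();</script></p>')


def scrub_between(text, start, end):
    """Return the text between the first `start` and the next `end`, or None."""
    i = text.find(start)
    if i == -1:
        return None
    i += len(start)
    j = text.find(end, i)
    return text[i:j] if j != -1 else None


# Exact tag from the ini: not found, because of the extra style attribute.
exact = scrub_between(html, '<p class="zanr">', '</p>')

# Looser start (opening tag without its closing '>'), stopping before the script.
loose = scrub_between(html, '<p class="zanr"', '<script')
# Drop the remainder of the opening tag and surrounding whitespace.
genre = loose.split('>', 1)[1].strip() if loose else None

print(exact)
print(genre)
```

If that assumption holds, starting the scrub at `<p class="zanr"` (without the closing bracket) and cleaning up the leftover attribute text would be one way to pick up the genre line.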
Anyone?
Fixed the ini, added show icons and cleaned up many errors. It skips the last show of the day, I don't know why.
It's partially working now.
Thank you! It works great.
Yeah, but someone has to fix the 'skips last show of day' bug, which happens every day.
That's beyond my WG knowledge. The same is happening with a Brazilian site.ini I'm trying to finish.
And if you change index_showsplit into:
Francis has spoken!
Uploading the final, tested, working ini.
It can be transferred to EPG channels.
Do you know why there are so many links for the channel logo? It is not standard for xmltv to have more than one, and tvheadend doesn't know how to parse this.
<display-name lang="sl">24Kitchen HD</display-name>
<icon src="http://cdn1.siol.tv/logo2/150x80/24kitchenhd.png|http://cdn1.siol.tv/logo2/150x80/24kitchenhd.png|http://cdn1.siol.tv/logo2/150x80/24kitchenhd.png|http://cdn1.siol.tv/logo2/150x80/24kitchenhd.png" />
<url>http://www.siol.net</url>
Fixed!
I can't see any change from your last two files.
Hmm, it works well here. See the log and guide from a test I performed on the supposedly buggy channel.
Strange,
my guide starts like this:
<?xml version="1.0" encoding="UTF-8"?>
<tv generator-info-name="WebGrab+Plus/w MDB & REX Postprocess -- version 1.54.6/0.01 -- Jan van Straaten" generator-info-url="http://www.webgrabplus.com">
<channel id="24Kitchen Adria">
<display-name lang="sl">24Kitchen Adria</display-name>
<icon src="http://cdn1.siol.tv/logo2/150x80/doq.png|http://cdn1.siol.tv/logo2/150x80/doq.png|http://cdn1.siol.tv/logo2/150x80/doq.png|http://cdn1.siol.tv/logo2/150x80/doq.png" />
<url>http://www.siol.net</url>
</channel>
<channel id="24Kitchen HD">
<display-name lang="sl">24Kitchen HD</display-name>
<icon src="http://cdn1.siol.tv/logo2/150x80/24kitchenhd.png|http://cdn1.siol.tv/logo2/150x80/24kitchenhd.png|http://cdn1.siol.tv/logo2/150x80/24kitchenhd.png|http://cdn1.siol.tv/logo2/150x80/24kitchenhd.png" />
<url>http://www.siol.net</url>
</channel>
<channel id="Amc">
<display-name lang="sl">Amc</display-name>
<icon src="http://cdn1.siol.tv/logo2/150x80/mgm.png|http://cdn1.siol.tv/logo2/150x80/mgm.png|http://cdn1.siol.tv/logo2/150x80/mgm.png|http://cdn1.siol.tv/logo2/150x80/mgm.png" />
<url>http://www.siol.net</url>
</channel>
<channel id="Animal HD">
<display-name lang="sl">Animal HD</display-name>
<icon src="http://cdn1.siol.tv/logo2/150x80/animalhd.png|http://cdn1.siol.tv/logo2/150x80/animalhd.png|http://cdn1.siol.tv/logo2/150x80/animalhd.png|http://cdn1.siol.tv/logo2/150x80/animalhd.png" />
<url>http://www.siol.net</url>
</channel>
Upload your webgrablog.txt.
Or you can get rid of the problem by disabling the icon grab in the ini.
Just put a * at the beginning of the line:
*showicon.scrub {single|<img alt="" |src="|" |/>}
You won't get icons, though.
If I use your latest ini, I just get the title of the show in the xml and nothing else.
If I use this ini http://webgrabplus.com/sites/default/files/download/ini/info/SiteIni.Pac...
the data is OK, only the icons are multiplied. I figured out that if I pull just one day there is one icon. If I pull, let's say, 3 days, there are 3 copies of the same icon. For every day it puts the same icon into the string, and they are separated with |
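As a stopgap until the ini itself is fixed, the guide could be cleaned up after the grab. Below is a minimal Python sketch, assuming the repeated URLs are always identical and separated by `|` as described above. The function name `first_icon_only` and the sample document are made up for illustration, and this is not part of WebGrab+Plus.

```python
import xml.etree.ElementTree as ET


def first_icon_only(xmltv_text):
    """Keep only the first URL in every pipe-separated <icon src="...">."""
    root = ET.fromstring(xmltv_text)
    for icon in root.iter("icon"):
        src = icon.get("src", "")
        # Everything after the first '|' is a repeat of the same logo URL.
        icon.set("src", src.split("|")[0])
    return ET.tostring(root, encoding="unicode")


# Small sample in the shape of the guide quoted in this thread (3-day grab).
sample = """<tv>
  <channel id="24Kitchen HD">
    <display-name lang="sl">24Kitchen HD</display-name>
    <icon src="http://cdn1.siol.tv/logo2/150x80/24kitchenhd.png|http://cdn1.siol.tv/logo2/150x80/24kitchenhd.png|http://cdn1.siol.tv/logo2/150x80/24kitchenhd.png" />
    <url>http://www.siol.net</url>
  </channel>
</tv>"""

cleaned = first_icon_only(sample)
print(cleaned)
```

For a real guide file the same function could be applied to the file contents before tvheadend imports it. Fixing the scrub in the ini remains the cleaner solution.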
I have just made some modifications to the ini and now it is correct. Just one channel logo, as it should be.
Can you check if anything needs to be changed? Otherwise this ini can be moved to downloads.
It's a mysterious world....
Hi.
Is it possible to fix the ini file for WebGrab? http://tv-spored.siol.net/ has been completely renewed and new files are needed to scrape it.
Thanks.
Thanks. I see that it is down. I will test the script during the weekend if it is up again.
It works pretty well. I wonder why the incremental update does not work. It shows everything as new even though it should be the same, even if you run it 5 minutes after the initial grab.
Is anyone else experiencing "the operation has timed out" after 3 seconds?
It works for a minute or so and then it just says timed out. The channels are alive (EPG).
Same thing here, time out on every channel.
channel (xmltv_id=TV SLO 2) site -- SIOL.NET -- mode incremental
iiiinnnnnnnnnnnnnnnnnnnnnnnnnnnerror downloading page: The operation has timed out (5sec)
Retry 1 of 4 times
nnerror downloading page: The operation has timed out (5sec)
Retry 1 of 4 times
nerror downloading page: The operation has timed out (5sec)
Retry 1 of 4 times
nnnerror downloading page: The operation has timed out (5sec)
Retry 1 of 4 times
nnerror downloading page: The operation has timed out (5sec)
Retry 1 of 4 times
error downloading page: The operation has timed out (10sec)
Retry 2 of 4 times
nnnnnerror downloading page: The operation has timed out (5sec)
Retry 1 of 4 times
nerror downloading page: The operation has timed out (5sec)
Retry 1 of 4 times
nerror downloading page: The operation has timed out (5sec)
Retry 1 of 4 times
nerror downloading page: The operation has timed out (5sec)
Retry 1 of 4 times
error downloading page: The operation has timed out (10sec)
Retry 2 of 4 times
Maybe they blocked your IP. Does it open in a browser?
In the afternoon it works better. I get the timeout issue only on some channels, but the EPG downloads anyway. The timeout appears on channels at random.
I have a VDSL 10/4 connection. In a web browser the site works flawlessly. I also have a clean line, so the internet connection is not the issue.
Maybe there is an issue on Siol's side?
No, I am not blacklisted. There must be something else. During parsing I can access the EPG through the browser. Also, incremental does not work; it always parses from the beginning.
I am also racking my brain over incremental. It should work normally. The time structure is the same. Maybe Francis or Jan could take a look at it.
Hi!
Thanks for the new version of the parser.
It works, but it seems SIOL is throttling access to the page.
I have temporarily solved the problem by increasing the retry time-out in ProgramData\ServerCare\WebGrab\WebGrab++.config.xml
from retry time-out="5" to retry time-out="15"
I have also reduced the number of channels and the number of days to grab to the minimum.
I rewrote the ini. Now it takes the EPG from the index pages. It's much faster and incremental works. You can still enable scrubbing of the detail pages if you want actors and directors. The episode is calculated to my personal preference; recalculate it if you use other systems, where it should be -1.
Here's the ini:
site {url=siol.net|timezone=UTC+01:00|maxdays=4|cultureinfo=sl-SI|charset=UTF-8|titlematchfactor=90|ratingsystem=IMDB}
url_index{url|http://tv-spored.siol.net/kanal/|channel|/datum/|urldate|}
url_index.headers {customheader=Accept-Encoding=gzip,deflate}
urldate.format {datestring|yyyyMMdd}
*
index_showsplit.scrub {multi|<main role="main" class="table-list">|<div class="row" data-id=||</main>}
*
index_title.scrub {single|<div class="col-9">|<strong>|</strong>|</div>}
index_category.scrub {single|<div class="col-2 right">|<small class="gray">|</small>|</div>} *index page category
index_category.scrub {multi(separator="," include=first)|<p class="event-meta">||</p>|</p>}
index_category.modify {remove(type=regex)|".*\/"}
index_start.scrub {single(debug)|<div class="col-1">||</div>|</div>}
index_description.scrub {single|<div class="col-9">|<p>|</p>|</div>}
*index_description.modify {addend|\n}
index_rating.scrub {single|<i class="fa fa-clock-o"></i>|IMDB:|!??!|<span>}
index_showicon.scrub {single|<div class="col-3">|<img data-src="|"| title}
index_country.scrub {single(separator="," include=last)|<p class="event-meta">||<br>|<i class="fa fa-clock-o">}
index_temp_8.scrub {single(separator="," include="sezona")|<p class="event-meta">||</p>|</p>}
index_episode.scrub {single(separator="," include="del")|<p class="event-meta">||</p>|</p>}
************************* Uncomment for more detailed info (much slower, incremental does not work)
*index_urlshow {url|http://tv-spored.siol.net|<p><a href="|||"}
*index_urlshow.headers {customheader=Accept-Encoding=gzip,deflate}
***
*title.scrub {single|<article role="article">|<h1>|</h1>|<p class}
*start.scrub {regex||<div class="time">[^>]*(\d{2}:\d{2})[^>]*-[^>]*\d{2}:\d{2}[^>]*</div>||}
*description.scrub {multi(include=2)|<p class="content">||</p>|</p>}
*director.scrub {single(separator="," include=first2)|Režija: </b>||</p>|</p>}
*actor.scrub {single(separator="," include=first5)|Igrajo: </b>||</p>|</p>}
*
***********************
index_temp_8.modify {remove(not "")| sezona }
*temp_8.modify {addstart(null)|1}
index_temp_8.modify {calculate(format=F0)|}
*temp_8.modify {calculate(format=F0)|1 -}
index_episode.modify {remove(not "")| del}
*episode.modify {addstart(null)|1}
index_episode.modify {calculate(format=F0)|}
*episode.modify {calculate(format=F0)|1 -}
index_episode.modify {addstart|'index_temp_8'. }
index_episode.modify {addend|. 0/0}
index_episode.modify {remove(not "")|0. 0 .0/0}
*
country.modify {replace(null)|Združene države Amerike|ZDA}
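The episode lines above build a season/episode string from the "sezona" and "del" parts of the site's meta text. For systems that expect the zero-based xmltv_ns form (the "-1" mentioned with the ini), the same idea could be sketched in Python like this. The regular expressions, the function name `xmltv_ns` and the sample line are assumptions for illustration, not a translation of the ini.

```python
import re


def xmltv_ns(meta):
    """Build a zero-based xmltv_ns string 'season.episode.' from a meta line.

    Returns None when the line has neither a season nor an episode number.
    """
    season = re.search(r"(\d+)\.\s*sezona\b", meta)
    episode = re.search(r"(\d+)\.\s*del\b", meta)
    if not season and not episode:
        return None
    # xmltv_ns counts from 0, so season 2 becomes 1 and episode 9 becomes 8.
    s = str(int(season.group(1)) - 1) if season else ""
    e = str(int(episode.group(1)) - 1) if episode else ""
    # The part/total field is left empty.
    return f"{s}.{e}."


meta = "Dokumentarni / Ostalo, 2. sezona, 9. del, Ostalo"
result = xmltv_ns(meta)
print(result)
```

A line with only an episode number keeps the season field empty, which xmltv_ns allows.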
Thanks for this one. If you update the scraper, can you upload it as an ini file or use pastebin to paste the text file?
There are still some bugs present. At the end I see (n) on every scrape.
Also, "preberi več" ("read more") could be filtered out.
And also when scraping, for example, CBS Reality:
This is the scraped xml.
<programme start="20160408103500 +0200" stop="20160408110000 +0200" channel="CBS Reality">
<title lang="sl">Preživeli za las na posnetkih</title>
<desc lang="sl">
<a href="/kanal/reality/oddaja/2203427660/datum/20160408">» preberi več</a>.(n)
</desc>
<category lang="sl">Dokumentarni</category>
<category lang="sl">Ostalo</category>
<icon src="http://vimg.siol.tv/sioltv/epg/default/documentaire.png"/>
This is what is shown on the siol site:
10:35 Preživeli za las na posnetkih
Dokumentarni
PREŽIVELI ZA LAS NA POSNETKIH
Dokumentarni / Ostalo, 2. sezona, 9. del, Ostalo
None
» preberi več
I know. It's not finished. I was just so happy that incremental works that I had to paste it in the forum. I am cleaning up the code and will attach the ini when it's finished.
Update: Ini ready for testing. Incremental working. Feedback welcome.
I am testing it right now. It is much slower because it scrapes deeper into the EPG. I assume that if I comment out this part ************************* detailed page scrub it will read just from the index page.
Maybe it would be a good idea to also make the thumbnail link scrape optional, since it takes up some space in the xml.
Also, in the xml file (not yet imported into the backend) I can still see this (n) at the end of each scrape. I will investigate once it is finished.
Still working on it, and yes, if you delete the part beyond the index it will be faster. As for (n), you have to disable it in the config.
n = nomark disables the update-type marking (n) (c) (g) (r) at the end of the description
Update:
Final version for my personal taste. I will modify it if I find any bugs. Don't forget to increase index-delay="xx" because of the timeout issue with siol. Enjoy.
Hi,
when using this latest ini the channel icons are missing. Can you please fix this?
Tnx, it works!
It looks like there are changes on the site again, since the scraper does not work anymore. At a quick look, the bold selection is new.
http://tv-spored.siol.net/kanal/3sat/oddaja/23189932123/datum/20160616
I assume that this line has to be changed to reflect the changes:
url_index{url|http://tv-spored.siol.net/kanal/|channel|/datum/|urldate|}
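For reference, the index URL this url_index line produces can be sketched in Python, assuming the date format yyyyMMdd from the ini's urldate.format line. The function name `index_url` is made up for illustration, and "3sat" is only a sample channel taken from the link above.

```python
from datetime import date


def index_url(channel, day):
    """Mirror url_index: base + channel + /datum/ + date as yyyyMMdd."""
    return f"http://tv-spored.siol.net/kanal/{channel}/datum/{day.strftime('%Y%m%d')}"


url = index_url("3sat", date(2016, 6, 16))
print(url)
```

The link quoted above has an extra /oddaja/<id> segment between the channel and the date, which the plain index URL does not contain.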
Hi! Good work, and it really worked perfectly for one week. Today I noticed a small problem again. Am I the only one? If it is possible to fix it I will be very grateful...
Thank you Blackbear199, working OK.
Is parsing very slow for others too? It was much faster in the previous version; the latest is really slow. 15 hours or more for 2 days?
http://pastebin.com/idJr92r7
This time it finished in a little more than two hours. But it would normally take much more, because it made only c (change) and almost no n (new) entries, since I ran it right after the previous grab finished to test it.
Hi,
I think they have changed something once again. WebGrab says "no shows in indexpage". Could someone be so kind and try to update the ini files?
Thank you
It seems that it is not the parser; there is no EPG on their site. So wait till they fix it.
The ini file is not OK as it is. It does not work for me either; the EPG is on the site and I can look at all of the programmes. It seems they changed something again. If some knowledgeable person can fix it, we will all be very happy :)
Hi,
Thank you very much for help. Everything is working OK now.