Webgrab+ beta (2.1.11)
mono version 6.8.0.105
Linux Mint 19.3
I'm grabbing from the Entertainment (E!) channel, and one episode appears to be called "The F**k Buddy" (I think it's from Sex and the City). The asterisks seem to be the issue. Is there a workaround?
[ Info ] ( 27/122 ) TVGUIDE.COM -- chan. (xmltv_id=EP) -- mode Incremental
[Warning ]
[Warning ] !! -- WARNING : tvguide.com doesn't allow epg grabbing !!
[Warning ] it is advised to disable this channel / site from your channel list
[Warning ]
[Error ] Unable to update channel EP
[Critical] See log file for details
[Critical] Exception.Message: parsing ":\s*The F**k Buddy" - Nested quantifier *.
[Critical] Exception.StackTrace: at System.Text.RegularExpressions.RegexParser.ScanRegex () [0x0029a] in <4bf78e13a6ea4494a3898e6a836a77f4>:0
at System.Text.RegularExpressions.RegexParser.Parse (System.String re, System.Text.RegularExpressions.RegexOptions op) [0x00036] in <4bf78e13a6ea4494a3898e6a836a77f4>:0
at System.Text.RegularExpressions.Regex..ctor (System.String pattern, System.Text.RegularExpressions.RegexOptions options, System.TimeSpan matchTimeout, System.Boolean addToCache) [0x00097] in <4bf78e13a6ea4494a3898e6a836a77f4>:0
at System.Text.RegularExpressions.Regex..ctor (System.String pattern, System.Text.RegularExpressions.RegexOptions options) [0x00000] in <4bf78e13a6ea4494a3898e6a836a77f4>:0
at WGconsole.K.IndicesAndRegexOperations (System.String 0, System.String 1, System.String 2, WGconsole.K+L 3, WGconsole.K+M 4, System.Boolean 5) [0x000ed] in <82fae152af5c4c379c1328bc94ed3f01>:0
at WGconsole.K.IndicesAndRegexOperations (System.String 0, System.String 1, System.String 2, System.String 3, System.String 4, System.Boolean 5) [0x0006a] in <82fae152af5c4c379c1328bc94ed3f01>:0
at WGconsole.K.Edit (System.String 0, System.String[] 1, System.String 2) [0x00d3b] in <82fae152af5c4c379c1328bc94ed3f01>:0
at WGconsole.K.ScrubOperations (System.Collections.Generic.Dictionary`2[TKey,TValue] 0, System.String 1, System.Boolean 2) [0x00ead] in <82fae152af5c4c379c1328bc94ed3f01>:0
at WGconsole.K.ScrubShowDetails (System.String 0, WGconsole.Q 1) [0x001bc] in <82fae152af5c4c379c1328bc94ed3f01>:0
at WGconsole.G.6 (System.String 0, WGconsole.W 1, WGconsole.T 2) [0x0213e] in <82fae152af5c4c379c1328bc94ed3f01>:0
at WGconsole.G.1 (System.String[] 0) [0x01985] in <82fae152af5c4c379c1328bc94ed3f01>:0
[ Info ] Existing guide data restored!
Thanks.
Can you post the channel line?
Hey Mat, here you go
<channel update="i" site="tvguide.com" site_id="(srvID61047)Number:51,SourceId:34632" xmltv_id="EP">E! West</channel>
That isn't the problem.
[Critical] Exception.Message: parsing ":\s*The F**k Buddy" - Nested quantifier *
This should tell you right away:
":\s*The F**k Buddy"
This is a regex used to remove the subtitle from the title. My guess is it comes from this line:
index_title.modify {remove(type=regex)|":\s*'index_subtitle'"}
It's crashing because it treats the ** in F**k as part of the regex syntax.
That is what causes the nested quantifier error (you would never write a double ** in a real regex).
So either change the line so it doesn't use regex, or, even simpler, add this before it:
index_subtitle.modify {cleanup(style=regex)}
This changes F**k to F\*\*k, so the regex treats each * as a literal asterisk instead of using its special meaning.
BTW, you probably need to fix this for the detail title removal too; I didn't check.
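For anyone wondering why the raw subtitle breaks the pattern, here is a rough sketch in Python. It is only an analogy: WebGrab runs on the .NET/Mono regex engine, while this uses Python's re module, which rejects "**" with a "multiple repeat" error rather than "Nested quantifier". re.escape stands in for cleanup(style=regex) here, and the helper name remove_subtitle is made up for illustration.

```python
import re

def remove_subtitle(title: str, subtitle: str) -> str:
    """Hypothetical helper: strip ':<optional spaces><subtitle>' from a title.

    re.escape is a stand-in for cleanup(style=regex): it turns each '*'
    into a literal '\\*' so '**' is no longer read as two quantifiers.
    """
    return re.sub(r":\s*" + re.escape(subtitle), "", title)

subtitle = "The F**k Buddy"

# Pasting the raw subtitle into the pattern: "F**" means "repeat the
# repeat", which Python's parser rejects (Mono reports "Nested quantifier").
try:
    re.compile(r":\s*" + subtitle)
except re.error as exc:
    print("raw pattern rejected:", exc)

# With the subtitle escaped first, the removal works as intended.
print(remove_subtitle("Sex and the City: The F**k Buddy", subtitle))  # prints "Sex and the City"
```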
Could be Mono... on Windows it all looks OK:
Sex and the City
The F**k Buddy
I'll check with Linux.
If you're using regex on Windows it should still crash; ** is illegal in regex, period.
And I use Linux, but my ini isn't using the regex method.
It's also fine on my Ubuntu... I used the siteini from networks rev. 14.
Edit:
Did 7 days... still no problem.
So for me it's either your ini (if you changed something) or your Mono version.
I got it to work; however, I had to change something in the ini. Basically, I had to start with a fresh TVGuide.ini. A few months ago, Blackbear199 helped me with the TVGuide.ini when I was trying to keep sports subtitles from being added to the title. The line that removes the subtitle from the title was not working in the ini from the latest siteini pack.
So Blackbear199 suggested changing it to
title.modify {remove(notnull type=regex)|":\s*'index_subtitle'"}
from the original
title.modify {remove(notnull type=string)|":\s*'index_subtitle'"}
Now, with a fresh TVGuide.ini, it scans properly, but sports subtitles are added to the title again. I can't get sports subtitles removed from the title without switching to regex, and that causes the issue that made me start this post. I might just leave the sports subtitles as they are unless someone has another suggestion.
I'm attaching my TVGuide.ini for reference.
Thanks for all the help
You need to fully understand how WebGrab works, especially the update process.
Read the manual.
If you go changing this or that, you will end up doing more harm than good.
WebGrab gets two titles (most of the time): one from index_title and one from detail_title (WebGrab also accepts just title for this).
It compares these two, so whatever you do to one (say index_title) you most likely have to do to the other (detail_title, or just title).
If they don't match (to a certain degree, set by titlematchfactor=xx on the site {} line), WebGrab will add (?) to the end of the title in the xml file.
detail_title (or just title) always overrides the index title. Now do you see why I said any modify applied to index_title has to be applied to detail_title too?
You can get index_title any way you want, but the detail title always overwrites it, because the creators consider it the more reliable of the two.
So unless you modify it the same way (not always necessary; it depends on the data), the detail title is what always ends up as the title in your xml.
title.modify {remove(notnull type=string)|":\s*'index_subtitle'"}
This will never work; it's wrong in so many ways (maybe it did back in the day, with very old WG versions).
This ini is older than the hills and was written to use regex (before separator strings were even around).
With all that said, I could type a few more pages telling you what's wrong and why, but I won't.
Maybe Mat will fix it.
Blackbear, I hope you understand I meant no disrespect. I'm not sure whether your response carried as much frustration as I read into it.
Honestly, other than the "regex" change you suggested a couple of months ago (http://www.webgrabplus.com/content/want-add-2-things-tvguidecomini-having-trouble-figuring-it-out), I haven't touched the ini. I didn't want problems with my daily grabs.
As for reading the manual: trust me, I've scoured it many times, but it's like a foreign language to me. If it's that far above my head, I'll just leave it at that. I won't mess with things or try to get help modifying the ini.
Finally, I only started using WebGrab+Plus in the last few months, so I didn't know the TVGuide ini is a little outdated compared to the latest WebGrab beta. I do want to point out that I was pretty quick to donate.
I really do appreciate all of your help, and I'm sorry if I've bothered you.
I went back and read that thread, and I gave you the answer in my post above, but anyway, here's what I said in that post:
title.modify {remove(notnull type=string)|:'index_subtitle'}"
should be (IMHO), to make sure, this:
title.modify {remove(notnull type=regex)|":\s*'index_subtitle'"}
This will match whether or not there is a space after the :.
":\s*'index_subtitle'"
Haven't you seen this before? Look at your original error and then read what I said to fix it.
I am 99% sure it was in my first reply above.
In short:
index_subtitle.modify {cleanup(style=regex)}
index_title.modify {remove(notnull type=regex)|":\s*'index_subtitle'"}
title.modify {remove(notnull type=regex)|":\s*'index_subtitle'"}
See what I did? First I have the subtitle cleaned up for regex; this fixes any issues with it having special characters like the double ** in F**k.
I then remove index_subtitle from both index_title and title (which can also be called detail_title; remember this modify must come after the title is scrubbed, so just edit or replace the existing line that uses the old removal method), but using regex. This is why the cleanup on index_subtitle is so important.
The old method isn't using regex, so special characters in index_subtitle don't matter.
title.modify {remove(notnull type=string)|:'index_subtitle'}"
The only difference between using this and doing what I said above is that the regex looks for zero or more spaces between the : and the first character of index_subtitle.
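A rough Python sketch of the string vs. regex difference described above. It assumes type=string matches the text literally and type=regex behaves like an ordinary regex search; str.replace and re.sub only approximate WebGrab's remove, and the helper names and sports subtitle are made up for illustration. It also shows why a type=string line containing ":\s*" can't match: as plain text, \s* is just three literal characters.

```python
import re

def remove_string(title: str, text: str) -> str:
    # Literal removal, roughly like type=string: exact text only.
    return title.replace(text, "")

def remove_regex(title: str, subtitle: str) -> str:
    # Regex removal: ":\s*" allows zero or more whitespace chars after ":".
    # The subtitle is escaped first so characters like "*" stay literal.
    return re.sub(r":\s*" + re.escape(subtitle), "", title)

subtitle = "Lakers at Celtics"  # hypothetical sports subtitle
for title in ("NBA Basketball:Lakers at Celtics",
              "NBA Basketball: Lakers at Celtics"):
    print(repr(remove_string(title, ":" + subtitle)),      # misses the ": " form
          repr(remove_string(title, r":\s*" + subtitle)),  # never matches
          repr(remove_regex(title, subtitle)))             # handles both
```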
I made a small typo above and it's fixed now, but this:
index_subtitle.modify {cleanup(type=regex)}
should be
index_subtitle.modify {cleanup(style=regex)}
OK, the change from "type" to "style" fixed both issues (sports subtitles are removed and the ** in the subtitle is handled).
One thing: now a "\" is added between each word of the subtitle. Not complaining, I just want to make sure it's supposed to do that.
<programme start="20200317093000 +0000" stop="20200317103000 +0000" channel="EP">
<title lang="en">Botched</title>
<sub-title lang="en">Playground\ Trauma\ and\ a\ Pint-Sized\ Mama</sub-title>
<desc lang="en">A pint-sized Irish model travels across the pond for larger breasts while the doctors help a woman who almost died from a nightmarish tummy tuck. A young father with a busted nose hopes the docs can give him a nose just like his daughter's. (cc).(n)</desc>
<date>2018</date>
<category lang="en">tvshow</category>
<icon src="https://tvguide1.cbsistatic.com/mediabin/showcards/tvshows/650000-699999/thumbs/665656-botched_300x400.png" />
<episode-num system="onscreen">S5 E4</episode-num>
<rating system="MPAA">
<value>TV-14</value>
</rating>
</programme>
Thanks for your help.
As I said, something that sounds so simple turns into a PITA.
Change this:
index_subtitle.modify {cleanup(style=regex)}
index_title.modify {remove(notnull type=regex)|":\s*'index_subtitle'"}
title.modify {remove(notnull type=regex)|":\s*'index_subtitle'"}
to:
index_temp_9.modify {set|'index_subtitle'}
index_temp_9.modify {cleanup(style=regex)}
index_title.modify {remove(notnull type=regex)|":\s*'index_temp_9'"}
title.modify {remove(notnull type=regex)|":\s*'index_temp_9'"}
The \ characters come from the cleanup command (they won't just be added to spaces; there are a number of characters that are special in regex).
So it's best to leave the original subtitle untouched: copy it to a temp element (index_temp_9 in this case), then clean up the copy and use it to remove the subtitle from the titles.
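To see why the copy matters, here is a loose Python analogy with a dict standing in for the show's elements. re.escape stands in for cleanup(style=regex); like .NET's Regex.Escape it escapes spaces, which appears to be where the "\ " in the sub-title came from (Python also escapes "-", which .NET does not). The combined "Botched: ..." title is made up for the example, and the field and function names are hypothetical.

```python
import re

PATTERN_PREFIX = r":\s*"

def escape_in_place(show: dict) -> dict:
    # Like the first attempt: the subtitle itself is escaped, so the
    # backslashes leak into the <sub-title> that gets written out.
    out = dict(show)
    out["subtitle"] = re.escape(out["subtitle"])
    out["title"] = re.sub(PATTERN_PREFIX + out["subtitle"], "", out["title"])
    return out

def escape_copy(show: dict) -> dict:
    # Like the index_temp_9 fix: escape a temporary copy and leave the
    # subtitle untouched.
    out = dict(show)
    temp = re.escape(out["subtitle"])
    out["title"] = re.sub(PATTERN_PREFIX + temp, "", out["title"])
    return out

show = {
    "title": "Botched: Playground Trauma and a Pint-Sized Mama",  # hypothetical
    "subtitle": "Playground Trauma and a Pint-Sized Mama",
}
print(escape_in_place(show))  # subtitle now contains "\ " escapes
print(escape_copy(show))      # subtitle unchanged, title is "Botched"
```

Both versions strip the subtitle from the title; only the copy approach leaves the sub-title readable.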
Blackbear, that worked perfectly!
I just want to make sure I follow you correctly: since the subtitle is now copied to a temp element, there shouldn't be issues in future grabs?
I'm uploading the ini with your additions. Not sure if you want to update the TVGuide ini with the changes you recommended.
Thanks
There shouldn't be any more issues.
Special characters in the subtitle (or any other element) only matter if the element is used in a modify command with type=regex.
For the index_subtitle scrub itself they don't matter (even though it also uses regex), as the special characters are ignored during the scrub.
Thank you very much! Maybe this will help others that attempt the same thing.