attempting ini creation

japangelo
Offline
japangelo's picture
Donator
Joined: 4 years
Last seen: 41 min

Hi developers, the last couple of days I had nothing to do and decided to venture into creating an ini. It's something I always thought of doing but always found too complicated. This time, though, I actually sat down and tried.
The website in question is the one below. I actually came up with something (to my surprise), but now I've reached a point where I feel stuck.
As you can see in the guide, I can only grab one EPG, and I have no idea why the program doesn't get all the other channels in the link (I know that's a temporary link; I still haven't figured out how to construct it for all channels).
Second, I'm not able to get the description even though it is provided in the same URL (I was able to grab only the title).
Third, I can't get a full channel list. It seems the program creates duplicates with the same site_id and deletes them, and I'm confused.
I tried really hard these two days, and that's all I can do without a little help.

I hope you can give me some guidance and show me my mistakes. Maybe the website is out of my league lol

https://www.staseraintv.mobi/inonda.php?c=d

mat8861
Offline
WG++ Team memberDonator
Joined: 9 years
Last seen: 11 hours

The reason is that the channels are split across 2 pages, so you need to use subpage to go through both:
https://www.staseraintv.mobi/inonda.php?c=d
https://www.staseraintv.mobi/inonda.php?c=s
Channels are scrubbed that way, so that later you can get the single page with the shows. (Your main url_index is wrong because that page has only one show.)
The page you need is for example:
https://www.staseraintv.mobi/programmi/4/2022-04-11/rai-4/lunedi_11_apri...

I did a piece, you finish :)

japangelo

Wow looking at your ini I could have never figured that out. I knew about the subpage, don't know why I removed it but it was structured differently.
Question. In the site_id_scrub where do you get all that line after "programmi/"? And what does regex mean in the next line? When do I use regex?
Thank you for all your help mat

mat8861

Attached is a text that explains. You can use regex whenever you want; sometimes a separator is better and sometimes regex.

japangelo

I'm looking at what you did and understood a little. How did you come up with this line, though?
(.+?\/\d{4}-\d{2}-\d{2}\/.+?)\/.+?"
I see you put d4, d2, .+? ... Is there a chapter in the documentation that explains these expressions? Sorry mat, I don't want to bother you with so many questions, but I find it interesting.

mat8861

That is pure regex. \d{4} means 4 digits, in this case the year, so you end up with \d{4}-\d{2}-\d{2}, which corresponds to numbers in the format 2022-04-11. "\/" is used in regex to match "/", and .*? or .+? matches whatever is there.
For example, to get 2022-04-11 15:30:00 you would use \d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2} (\s = a space, \s+ = multiple spaces).
You can check with an online regex tester.
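
As a rough illustration of the patterns above, here is a minimal Python sketch using Python's `re` module rather than WebGrab itself. The sample strings and the `example_slug` part of the link are made up, and putting `programmi/` in front of the capture group is an assumption about how the scrub line is anchored. Python does not need the `\/` escape, so a plain `/` is used.

```python
import re

# \d{4}-\d{2}-\d{2} matches a date written as year-month-day.
date_re = re.compile(r"\d{4}-\d{2}-\d{2}")
# Adding \s plus a time part matches "2022-04-11 15:30:00".
datetime_re = re.compile(r"\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}")

sample = "start: 2022-04-11 15:30:00 end"
found_date = date_re.search(sample).group(0)
found_datetime = datetime_re.search(sample).group(0)

# Hypothetical link; the capture stops after the channel name.
href = 'href="https://www.staseraintv.mobi/programmi/4/2022-04-11/rai-4/example_slug"'
site_id_re = re.compile(r'programmi/(.+?/\d{4}-\d{2}-\d{2}/.+?)/.+?"')
site_id = site_id_re.search(href).group(1)

print(found_date)
print(found_datetime)
print(site_id)
```

The last pattern keeps only the channel number, date and channel name, because the lazy `.+?` inside the group stops at the first `/` that still lets the rest of the pattern match.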

japangelo

Ooohh, now it's starting to make sense a little. That was a clear explanation. I'm at work now. When I get home I'll play a little and see what I can do ;)

Blackbear199
Offline
Blackbear199's picture
WG++ Team memberDonator
Joined: 9 years
Last seen: 22 hours

.*? or .+?

. any character
* zero or more of previous character
+ one or more of previous character
? don't be greedy (match as few times as possible)

.*? zero or more of any character (this will also match an empty string)
example
"subtitle":"","category":"xxxx",

subtitle.scrub {regex||"subtitle":"(.*?)",||} ===> result is empty string as subtitle is actually not present
subtitle.scrub {regex||"subtitle":"(.+?)",||} ===> ","category":"xxxx

The second doesn't give the match you expect. This is because of the + in the regex expression.

"subtitle":" ===> start of regex search
( ===> start of capture
. ===> any character
+ ===> one or more of the previous character
? ===> match as few times as possible
) ===> end of capture
", ===> end of search string

Since the second uses .+ it has to take at least one character, so it takes the first quote (which you intended to use as the start of your end search string). The capture then keeps going until the next ", and stops there.

So in short: with regex, if there is a possibility of the value being empty, use .*?
Once you get used to using regex, this will all become clearer over time.
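
The difference can be reproduced outside WebGrab. This is a small Python sketch using the same sample text; it only demonstrates the two quantifiers and is not the WebGrab scrub syntax.

```python
import re

data = '"subtitle":"","category":"xxxx",'

# .*? allows zero characters, so the empty subtitle is captured as "".
lazy_star = re.search(r'"subtitle":"(.*?)",', data).group(1)

# .+? needs at least one character, so it swallows the closing quote
# and runs on to the next '",' in the string.
lazy_plus = re.search(r'"subtitle":"(.+?)",', data).group(1)

print(repr(lazy_star))
print(repr(lazy_plus))
```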

japangelo

Yes, it looks complicated for me now since I never dealt with it before but if I get serious with inis it will get "easy" at some point eheh thanks for your reply

Blackbear199

If you truly want to learn, all the info is in the documentation PDF.
It can be confusing, so ask.
I for one have no issues helping you, and neither should mat8861.
I helped him when he was new at this.
The creators helped me.
We have both been there and knew no regex.
I would use the separator string method until you're ready for regex.
There are tons of documents on this.
When you're stuck, ask. WebGrab regex isn't 100% the same as what you can test online; for example, wg regex uses different delimiters.

japangelo

Yes regex sure is complicated so for now I've been using separators and I was able to solve some issues by just reading the manual and searching online. I did a couple but I wanna play some more tonight since I got more time. I would surely ask you guys for assistance in case ;)

mat8861

The siteini I posted needed a setting (check the documentation, page 23) so that the urldate datestring in
url_index{url()|https://www.staseraintv.mobi/programmi/|channel|/|urldate|}
comes out in the right format, e.g. martedi_19_aprile_2022.
All the rest is pretty easy.
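
To make the target format concrete, here is a hedged Python sketch that builds such a date string. It is not the WebGrab setting itself. The function name `staseraintv_datestring`, the unaccented Italian spellings and the unpadded day number are assumptions based only on the examples visible in this thread.

```python
from datetime import date

# Assumed spellings: unaccented Italian names, as the URLs in the thread suggest.
WEEKDAYS = ["lunedi", "martedi", "mercoledi", "giovedi",
            "venerdi", "sabato", "domenica"]
MONTHS = ["gennaio", "febbraio", "marzo", "aprile", "maggio", "giugno",
          "luglio", "agosto", "settembre", "ottobre", "novembre", "dicembre"]

def staseraintv_datestring(day: date) -> str:
    """Build e.g. martedi_19_aprile_2022 (day number left unpadded, an assumption)."""
    weekday = WEEKDAYS[day.weekday()]  # weekday() returns 0 for Monday
    month = MONTHS[day.month - 1]
    return f"{weekday}_{day.day}_{month}_{day.year}"

print(staseraintv_datestring(date(2022, 4, 19)))
```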

japangelo

Yes, all figured out ;) Today, though, I noticed a little issue that occurs only on a few channels, and I'll try to fix that, hopefully. Meanwhile I tried to create other inis, with good results.

mat8861

Very good. Once you finish, post your work; I would like to see it.

japangelo

OK, the ini works well, but I'm trying to get around the little issue I'm facing with a few channels by extracting the date and time from a line like the one in the first pic. The line I created (pic) is unfortunately giving me the error in the pic.

I tried to look in the manual and tried different lines and combinations, but it always gives me the same error. What am I doing wrong in this case? Do I have to add a special line to make it recognized as a date and time? I spent quite a few hours and still no luck.

Blackbear199

pattern="yyyyMMddhhmm"

HH = 24-hour clock (00-23)
hh = 12-hour clock (01-12)

Are you sure it's HH?
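
The same distinction exists in Python's `strptime`, which can serve as a rough stand-in for trying it out. In this sketch `%H` plays the role of HH and `%I` the role of hh. The mapping is only an analogy, not WebGrab's own parser, and the 12-digit sample value is chosen for illustration.

```python
from datetime import datetime

stamp = "202204160050"  # 16 April 2022, 00:50

# %H is the 24-hour field (00-23), comparable to HH.
parsed = datetime.strptime(stamp, "%Y%m%d%H%M")

# %I is the 12-hour field (01-12), comparable to hh;
# an hour of "00" does not fit it, so parsing fails.
try:
    datetime.strptime(stamp, "%Y%m%d%I%M")
    twelve_hour_ok = True
except ValueError:
    twelve_hour_ok = False

print(parsed, twelve_hour_ok)
```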

Blackbear199

You can't use that time. Look at your first pic.
The time isn't even valid.

2022041607051

2022
04
16
07
05
1

Blackbear199

I wouldn't even use that for a start time.
Use one of the other strings; you have 2 to pick from.
The href= you're using is for the details page. What if a show has no details page?
Your start scrub would fail...

japangelo

Yeah, it's HH. They all have a details page. WebGrab will only take the first 12 digits; in fact, in the error pic it scrubbed 202204160050, which should be correct, right?

Blackbear199

I checked the site.
Look at rai1 or rai2. I saw shows with no details page...
Your scrub will fail.

IMHO you're better off using the start/stop time that's available.
But as you said, what you have should work with pattern added; sometimes wg can still get confused, though, as is the case here.

japangelo

Oh OK. Yes, in my original ini I scrub the start time and stop time, and it works flawlessly with channels that have programming every day. The problem I see with some event channels (Sky Sport 251, for example) is this: let's say tomorrow there's no programming at all, but the day after there's a match at 12pm. WebGrab will scrub it thinking it's tomorrow at 12pm and not the day after, since the start and stop scrub lines specify only the time, not the date. How do you get around that problem?
https://www.staseraintv.mobi/programmi/251/2022-04-16/sky-sport-hd/sabat...

Blackbear199

Use a temp element and scrub one of the other start times.
When there's no details page, set the start time to the temp element.
..
index_temp_1.scrub {single|alt time scrub}
index_start.scrub {your original regex}

Since there's a possibility of 2 different time patterns, you can't use pattern.
You will need to manipulate your 12-digit start time into something wg can recognize without using pattern:

index_start.modify {replace(type=regex)|"^\d{4}(.*?)\d{2}(.*?)\d{6}$"|-}
index_start.modify {replace(type=regex)|"^\d{4}-\d{2}-\d{2}(.*?)\d{4}$"| }
index_start.modify {replace(type=regex)|"\d{4}-\d{2}-\d{2}\s\d{2}(.*?)\d{2}$"|:}
This should change your time to yyyy-MM-dd HH:mm, which wg will recognize without needing pattern.
Finally...
index_start.modify {set("")|'index_temp_1'} * use alt time if no details page.
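
To see what the three replace lines are meant to do to a 12-digit value, here is a small Python simulation. It assumes that replace(type=regex) swaps the text of each capture group for the replacement string (the groups are empty here, so the replacement is effectively inserted). That reading of WebGrab's behaviour, the helper name `replace_groups` and the sample value are assumptions for illustration, not WebGrab code.

```python
import re

def replace_groups(value: str, pattern: str, new: str) -> str:
    """Replace the text of every capture group with `new` (assumed behaviour)."""
    match = re.search(pattern, value)
    if match is None:
        return value  # no match: leave the value untouched
    # Work right to left so earlier group positions stay valid.
    for index in range(match.re.groups, 0, -1):
        begin, end = match.span(index)
        if begin == -1:
            continue  # group did not take part in the match
        value = value[:begin] + new + value[end:]
    return value

start_time = "202204160050"
start_time = replace_groups(start_time, r"^\d{4}(.*?)\d{2}(.*?)\d{6}$", "-")
start_time = replace_groups(start_time, r"^\d{4}-\d{2}-\d{2}(.*?)\d{4}$", " ")
start_time = replace_groups(start_time, r"\d{4}-\d{2}-\d{2}\s\d{2}(.*?)\d{2}$", ":")
print(start_time)
```

Each step relies on the previous one having matched, which is one reason to add debug output after the last modify line and check the converted value.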

mat8861

this is what i did, but i didn't check all channels

japangelo

Hmm, it's still putting the event on the day before. Did I make a mistake?

japangelo

I checked yours, and it still puts the event on the day before the actual broadcast. I guess I'll just leave it the way I did it. I mean, it's just the event channels that show this issue; all the other ones are fine.

Blackbear199

The index_start.modify {set line is wrong.
Double-check the index_temp_1.scrub line; it doesn't look correct. You have 2 }} at the end?

japangelo

Still the same thing. Maybe wg still doesn't recognize the date, just the time.

Blackbear199

The set line is still wrong; you can't use regex as an argument there.
Look at what I have above.
Also add some debug to the last start.modify line; you should see whether it's converted correctly.
Do the same for your temp to make sure the scrub works.

Blackbear199

It also wouldn't hurt to put the start anchor (^) on this line.
I don't think it really matters, as the regex should still match.

index_start.modify {replace(type=regex)|"^\d{4}-\d{2}-\d{2}\s\d{2}(.*?)\d{2}$"|:}

japangelo

What are the ^ and the $ for in regex?

For the set line I'm still thinking. I thought it could be "yyyyMMdd HH:mm", but that doesn't work. But I'm still thinking :D

Blackbear199

^ beginning of line
$ end of line
As I said, get used to using debug. Don't assume anything happens correctly; verify it, or you'll lead yourself down the rabbit hole chasing your tail.
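
A quick way to see the two anchors at work is a short Python check. This is an illustrative sketch with a made-up sample value, not WebGrab syntax.

```python
import re

value = "x2022-04-16 0050"

# Without anchors the pattern may match anywhere inside the string.
loose = re.search(r"\d{4}", value).group(0)

# ^ pins the match to the start; the value starts with "x", so nothing matches.
at_start = re.search(r"^\d{4}", value)

# $ pins the match to the end, so the last four digits are found.
at_end = re.search(r"\d{4}$", value).group(0)

print(loose, at_start, at_end)
```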

japangelo

yeah it still worked without it, but I added it in case

japangelo

Hey guys, happy Easter!! Today I tried to replicate your movietele.it ini, since yours is encrypted, and I actually did it!! I'm finally understanding regex, and the ini I made is working great :D

mat8861

Happy Easter to you too. Congratulations on the progress.

japangelo

Is there a more direct way to keep in touch with you guys in case I need help with inis? Or is this the only way?

japangelo

I noticed that some websites have all the info in the sources tab of the developer tools (I use Edge Chromium), while other websites have all the info in the elements tab instead. In that case, do I have to specify something in the ini file in order to scrub what I need, or does it not matter? The other day I was trying to practice with a website that had all the info in the elements tab, but for some reason I couldn't, so I was wondering.

Blackbear199

The main one you use is the network tab, then xhr or all.
"All" will have the same data you're seeing under source or elements.
This takes some time to learn, as some sites are tricky; pages can be generated by JavaScript, and that data usually appears under the xhr tab.

If the data is truly on the page you're viewing, yes, you could use source/elements, or network and then all.
An easier way is to simply right-click on a blank part of the page and select view source.
You get to view the page in full screen in your browser.

japangelo

Interesting..I never used the network tab only the source tab so far since I could find everything in there. But for other websites it's the opposite. Do you usually use network tab? I think I clicked on there a few times but it wasn't showing anything. Probably I've missed it

Blackbear199

I always use network. Look under that tab and you will see all, fetch/xhr, js, images, etc.
Everything is available under this tab.

Have you read the section in the manual on ini creation?
The page you're seeing under source/elements is also downloaded by WebGrab and saved to your wg config directory as html.source.htm

It's the same page you can see by right-clicking and selecting view source.
There are also WebGrab syntax highlighting files available for the Notepad++ and Atom editors.
This can be a lifesaver for finding typos in your ini.

The one for Notepad++ is available on the downloads page.
The one for Atom is available via git (look at the screenshot).
https://github.com/SilentButeo2/webgrabplus-atom

japangelo

I downloaded atom before I started making my own inis but it was confusing for me.. I didn't know how to use it. I'll try notepad++

Blackbear199

Atom is all I use. I didn't have a choice back when I started, as I mainly use Linux and Notepad++ is not available for it.

The Atom syntax highlighting is the most up to date (I usually push the updates when new features are added).
The Notepad++ one for Windows hasn't been updated for years.

japangelo

So I picked this website just to practice: https://rivertv.ca/tv
I did the index_showsplit debug, but it only downloads what's in the sources tab and not what's in the elements tab, where I find the info I want (pic). In fact, when I play with separators, WebGrab won't find those lines because they're in the elements tab. What am I doing wrong here?

Blackbear199

Network, xhr tab.
This site uses POST and tokens, and multiple URLs need to be called.
You've got some reading ahead of you.
Look at the payload and request headers.

You picked a hard site for what is, like, your 3rd ini.
Oh well, you've got to fall in the mud puddle sometime.
I'm not even sure it can be done (I'd have to do an ini to try). There are a few dynamic variables that may change and that we can't control.

Blackbear199

Looking deeper, the token above isn't the one used (they are different).
The token is set in the cookie from the main page (see screenshot).

WebGrab can't read the cookie, only save it.
The token is sent in the cookie of the EPG request headers, but it's also needed as a separate request header (x-xsrf-token).
In short, the site can't be done.
Most times this token would also be in the page data (rivertv.ca/tv) and we could scrub it for use, but in this case it's only in the cookie.
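
For readers wondering what "needed as a separate request header" means in practice, here is a hedged Python sketch of what a client that can read cookies would have to do. The cookie name `XSRF-TOKEN`, the token value and the Set-Cookie line are made up for illustration; only the `x-xsrf-token` header name comes from the post above. No request is sent.

```python
from http.cookies import SimpleCookie

# Made-up Set-Cookie value; the cookie name XSRF-TOKEN is an assumption.
set_cookie = "XSRF-TOKEN=abc123; Path=/"

cookie = SimpleCookie()
cookie.load(set_cookie)
token = cookie["XSRF-TOKEN"].value  # reading the value out of the cookie

# The same token travels twice: inside Cookie and as its own header.
headers = {
    "Cookie": f"XSRF-TOKEN={token}",
    "x-xsrf-token": token,
}
print(headers)
```

The step marked "reading the value out of the cookie" is exactly the part the post says WebGrab cannot do, which is why the token would have to appear somewhere in the page data to be scrubbed.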

japangelo

No no, don't do any ini for that. I don't need anything from that site, ahah. I probably chose a bad one for practice lol

japangelo

Thank you for your explanation. I'll look for another one.

Blackbear199

You should still have a look at the data and the structure.
Many sites are like this.

japangelo

I find websites with full epg grid like tvtv.com to be very complicated. Was taking a look at the structure of tvtv and man I got a bit lost

Blackbear199

Well, there aren't many sites that have the EPG data on the HTML page anymore; at one time there were a lot.
Sites are beefing up security, wanting to be compatible with mobile devices, etc.
Years ago sites had a desktop and a mobile site; not anymore.

Your best bet is to look for single-channel sites, and even many of them don't just use HTML pages anymore.

japangelo

I should have started long ago then :D so far I did a few inis.. All kinda simple.. Wanted to get some challenge but that's too much lol I'll see what I can find

mat8861

Here is one, 3 channels, 7 days: https://diemaxtra.nova.bg/diemasport/schedule
Pretty easy.

