WebChangeMonitor / Bug Reports / Feature Requests / #286 Support detecting change of post-processed data

Morten MacFly - 2025-01-26

status: open --> pending

Group: Future_Release --> Invalid

Morten MacFly - 2025-01-26

For clarification: A manual post-processor would operate on the file created by WCM and modify this file. So you are asking to read the file after the post-processor is run and re-evaluate the diff, right?

So what you envision is the following workflow:

WCM -> writes temp. content file
WCM -> starts post-processor "PP"
PP -> does "magic" and modifies the temp. content file
WCM -> gets notified (how?)
WCM -> reads temp. content file again
WCM -> calculates changes and sets state of item accordingly
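
To make the steps above concrete, here is a minimal Python sketch of the "run PP, then re-read and compare" part. It is not WCM code: the function names (`digest`, `run_and_recheck`), the hash-based comparison, and the timeout are all assumptions made for illustration. The timeout is one conceivable way to bound the wait for the post-processor, not a description of how WCM behaves today.

```python
import hashlib
import subprocess
from pathlib import Path


def digest(path: Path) -> str:
    """Return a SHA-256 hex digest of the file's current bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def run_and_recheck(content_file: Path, pp_cmd: list[str],
                    previous_digest: str, timeout_s: float = 30.0) -> bool:
    """Run the post-processor, then re-read the content file.

    Returns True if the post-processed content differs from the content
    remembered from the previous check (previous_digest).
    All names here are hypothetical, not part of WCM.
    """
    # Wait for the post-processor, but only up to timeout_s seconds.
    # subprocess.run kills the child and raises TimeoutExpired on timeout.
    subprocess.run(pp_cmd, check=True, timeout=timeout_s)
    # The file has now been modified in place by the post-processor.
    return digest(content_file) != previous_digest
```

A caller would store the digest of the post-processed file after each check and pass it in as `previous_digest` on the next one.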

The problem here is the notification part: currently, post-processing tools are run asynchronously so that WCM does not hang (possibly forever) waiting for a tool to finish. (Note that a thread is no solution here.) To me, part of the post-processing should also be to inform the user (e.g. via email) that a change took place.

Before pursuing the direction above, I would rather look at removing the limitation that makes you call a post-processing tool in the first place. In my experience, this is only needed if you want to interpret the content, and WCM now has quite powerful tools for that. The item's state is then calculated on the interpreted content, which would do what you want.

Do you have an example of what you cannot do and why you need a post-processor tool (which one) for that?

PS: By the way, I have another idea for solving this that involves IPC... but let's start simple.

Gitoffthelawn - 2025-01-27

Yes, everything you wrote is accurate and correct. I really like how you want to think of a more elegant solution, as that matches my desire as well. So let's start there!

Approximately 90% of my post-processing routines are to call jq. If you're not using it, jq is like sed, but for JSON. If you're not familiar with sed, we can't be friends any longer. ;) I use jq to collect the needed data (and only the needed data) from JSON files. If WCM was to somehow include jq functionality, those post-processing routines would no longer be needed.

Most of my other post-processing routines are to handle line-breaks. I still haven't been able to get WCM to add carriage returns (ASCII 0x0d) or linefeeds (ASCII 0x0a) to its output. After much time invested in trying to get it to work, I wrote post-processing scripts to perform this task. I think this should be easy to implement in WCM, and IIRC I created an issue report for it a while back, but a quick search of issue reports isn't popping it up.

Morten MacFly - 2025-01-29

OK, I can guess what you have in mind (and I know sed, btw. ;-) ), but I'm afraid I need more info to be sure what to do.

I guess it would be best if you provide me with:
1.) A source JSON file you want to process (which can be anonymized)
2.) The jq command(s) you run on that file
3.) The expected output file

In particular, please point out why and where you want to add (?) line feeds or similar. Wouldn't you usually want to remove these?

I am using jsoncons ( https://github.com/danielaparker/jsoncons ) for the JSON stuff and am probably using just 5% of its functionality, so chances are high that it would cover the required routines, too.

Last edit: Morten MacFly 2025-10-26

Gitoffthelawn - 2025-02-01
  
  Here's a good example. It uses Mozilla's public API to get the first 50 recommended Firefox extensions. The API returns a massive amount of data, but all I want is the total number of recommended extensions, the number of pages, the number of results on the current page, and the name of each extension returned along with its i18n code.
  
  The URI:
  https://addons.mozilla.org/api/v5/addons/search/?app=firefox&promoted=recommended&sort=created&type=extension&page=1&page_size=50&lang=en
  
  The WCM item relies on 2 regex replace rules that process the JSON within WCM before passing the data to jq:
  
  Regex replace rule #1
  find: "authors":.*?"last_updated":"[^"]*",
  replace: [null] (blank)
  
  Regex replace rule #2
  find: ,"previews":.*?"_score":.*?}
  replace: }
  
  The WCM item then calls this post-processing script (this is a Windows batch file; I wrote a similar one for Linux):
  
  jq.exe -cr "\"\(.count) total extensions on \(.page_count) pages, \(.results ^| length) on this page\",.results[].name" "C:\example\%1" > "C:\example\temp.txt"
  move /y "C:\example\temp.txt" "C:\example\%1"
  
  To keep things simple, I'm just using C:\example for the WCM content folder in this code. You can replace it with whatever you like.
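
For readers who don't use jq, here is a rough Python equivalent of what the jq filter above produces. It is only a sketch: the function name `summarize` is made up, and the field names (`count`, `page_count`, `results`, `name`) are taken from the jq command rather than from any API documentation. It mimics `-r` (strings printed raw) and `-c` (other values printed as compact JSON).

```python
import json


def summarize(doc: dict) -> list[str]:
    """Return the lines the jq filter would print for one API response."""
    lines = [
        f"{doc['count']} total extensions on {doc['page_count']} pages, "
        f"{len(doc['results'])} on this page"
    ]
    for item in doc["results"]:
        name = item["name"]
        if isinstance(name, str):
            # jq -r prints string values without surrounding quotes
            lines.append(name)
        else:
            # jq -c prints objects (e.g. a locale-to-name mapping) compactly
            lines.append(json.dumps(name, separators=(",", ":"), ensure_ascii=False))
    return lines
```

Joining the returned lines with newlines gives the text that the batch file writes back over the content file.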
  
  I'm hesitant to mention the following, because I think focusing on the above is the most important. At the same time, providing a bigger picture may help with ideas/implementation:
  
  Because there are multiple pages for the above results (and similar situations with other APIs), ideally WCM would request each page and concatenate the pages. Because WCM cannot currently do this, I create separate WCM items for each API call. Thus, if results span 10 pages, which requires 10 separate API calls, the WCM job currently requires 10 separate WCM items. It would be great to have a single WCM item to handle all those repetitive API calls, simply incrementing a variable within the URI for each call.
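
To illustrate the "increment a variable within the URI" idea, here is a minimal Python sketch. The helper names (`with_page`, `fetch_json`, `fetch_all_results`) are hypothetical, and it assumes the response carries `page_count` and `results` fields, as the jq command above suggests. This is not a proposal for how WCM should implement it internally.

```python
import json
from typing import Callable
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import urlopen


def with_page(uri: str, page: int) -> str:
    """Return uri with its 'page' query parameter set to the given page."""
    parts = urlsplit(uri)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "page"]
    query.append(("page", str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))


def fetch_json(uri: str) -> dict:
    """Download one page and parse it as JSON."""
    with urlopen(uri, timeout=30) as resp:
        return json.load(resp)


def fetch_all_results(uri: str, fetch: Callable[[str], dict] = fetch_json) -> list:
    """Request page 1, read page_count, then fetch and concatenate the rest."""
    first = fetch(with_page(uri, 1))
    results = list(first["results"])
    for page in range(2, first["page_count"] + 1):
        results.extend(fetch(with_page(uri, page))["results"])
    return results
```

With something like this, the 10 separate items would collapse into a single call on the base URI, and the change detection would run on the concatenated result.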
  
  I wrote a complex script that, within WCM, will concatenate a bunch of discrete WCM data files. It works by taking advantage of WCM's post-processing feature and concatenating all data files in a group when any one is changed. Because the script will have no knowledge of what is going on outside of itself, I implemented all sorts of hackery to make it work even when multiple data files are updated by WCM in succession.
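
My actual script is not shown here, but the core idea can be sketched in a few lines of Python. Everything in this sketch is hypothetical (the function name `concat_group`, the file-name pattern, the output name); it only shows the concatenation step, writing to a temporary file and then renaming it so that a reader never sees a half-written result. It does not cover the other workarounds needed when several files are updated in quick succession.

```python
import os
import tempfile
from pathlib import Path


def concat_group(folder: Path, pattern: str, out_name: str) -> Path:
    """Concatenate all files in folder matching pattern into out_name."""
    target = folder / out_name
    # Collect the group members first, in a stable order, skipping the output
    parts = sorted(p for p in folder.glob(pattern)
                   if p.is_file() and p.name != out_name)
    # Write to a temp file in the same folder, then swap it into place
    fd, tmp_name = tempfile.mkstemp(dir=folder, prefix=".concat-")
    with os.fdopen(fd, "wb") as tmp:
        for part in parts:
            tmp.write(part.read_bytes())
    os.replace(tmp_name, target)
    return target
```

A post-processing script for each item in the group would call this with the group's pattern whenever its own data file changes.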
  