30 April 2005

XML Concepts: Prospective search simulator

This note outlines an approach to adjusting prospective search URIs. The tool would run simulations against a URI to check that it captures the intended content, as well as possible future related search results and terms.

Think of the simulator as a tune-up for your subscription list, whether it lives in the OPML file on your site or in your aggregator. Users would be able to take OPML lists of URIs and adjust the subscription syntax to improve likely search results.

The simulator would help ensure that prospective-search users, having set out in good faith to monitor a likely future publication or triggering event, would not have to wait and wonder, nor be surprised when their search string failed.

The tool could be useful for the Intellectual Property Practice Group in monitoring similar content deliberately obfuscated to avoid unauthorized disclosure. Proprietary information could be loaded into the simulator, and the URI could be created and modified to capture terms not yet envisioned but directly related to the trade secret.

Discussion

I’ve noticed that as I get into more specific areas, my terminology isn’t matching what others are using.

This can be problematic when I’ve hardwired a prospective search-URI with a specific phrase, but [unknown to me] the majority of the universe is using another term.


Mud’s Observations on
Prospective Search Syntax and URIs

  • As searches get narrower, the chance of novel words and phrases increases.

  • As content increases and cross-disciplinary communication grows, the probability of publishers using novel terms increases.

  • As subscription commands get more focused and the number of terms and syntax in the search command increases, the chance of error increases.

  • As OPML lists grow longer in aggregators, it takes longer to manage changes to the subscription lists.


  • Not all my searches are for things that exist right now. Rather, I’ve got phrases set up that are scanning for the first use.


    Prospective Signaling device in
    2001: A Space Odyssey

    If you recall the movie 2001: A Space Odyssey, there was a large black monolith on the moon. The thinking was that when man [or something] uncovered this monolith, there would be a signal to others that man was taking a leap in terms of technology and sophistication.

    Some look at this monolith as an early warning detector. Not that something was actually in your backyard, but that there was some future event that was likely, and when it occurred, a signal would go out. I look at prospective searches the same way.


    Small problem. Given the complexity of the phrases, and the novelty of the words, I’m not 100% confident that my very focused, complicated search commands are correct or will work.

    The last thing I want to learn is that, despite my correctly forecasting a proper signaling word, my search command is incorrect, and that I’m the last person to realize the signaling event has occurred. There I am, oblivious to reality simply because of a syntax error.

    Well, lately I’ve noticed this problem and thought, maybe there’s an enterprise opportunity.

    What to do?

    Imagine, if you will, a prospective search platform that doesn’t simply let you load up your search command and give you a URI, but lets you run simulations on that search command-string and verify that what you hope you’re going to get is actually what you’re most likely going to get.

    What I’d like to see is a system that

  • Lets me load up a desired search, and then takes existing content, and throws it at that search, and gives me back a trial run of my search string.

  • Shows me what content is going into the URI-screener, what got hit, and what didn’t.

  • Shows me options on the words that might be more appropriate to use.

  • Shows me what I’m entering in the search string [column one] side-by-side with the trial searches [column two], the actual search results [column three], and a list of options on how to adjust my search-syntax so that I get the desired output [column four].
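
    A minimal sketch of such a trial run might look like the following. It assumes a search is just a list of terms that must all appear in a piece of content [AND semantics]; real prospective search services have richer syntax, and the names trial_run and tokenize are made up for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def trial_run(terms, documents):
    """Throw each document at the search and report hit or miss.

    Assumption: a document is a hit only if every term appears as a
    whole word. The missing terms are kept so the user can see why
    a piece of content was not captured.
    """
    wanted = [t.lower() for t in terms]
    report = []
    for doc in documents:
        words = tokenize(doc)
        missing = [t for t in wanted if t not in words]
        report.append({"content": doc, "hit": not missing, "missing": missing})
    return report


sample = [
    "Prospective search lets you subscribe to future content",
    "Retrospective search indexes what already exists",
]
for row in trial_run(["prospective", "search"], sample):
    status = "HIT " if row["hit"] else "MISS"
    print(status, "|", row["content"], "| missing:", row["missing"])
```

    The report rows correspond roughly to the columns above: what went in, whether it was hit, and which terms kept it from matching.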

    Example Prospective Search Simulator Steps

    Let’s run through a sample.

    I create a search string with many commands. It contains an error I don’t know about: perhaps a word is misspelled, or the phrase is never going to get hit because of my mistake.

    I enter the search string into the subscription simulator.

    The simulator grabs content that contains the words, and uses multiple variations. The simulator takes letters out of the keywords, and also uses synonyms.

    At the same time, the simulator passes the input stream to the Yahoo Y!Q similar search tool to find similar content.
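
    As a rough sketch of the variation step: the code below removes one letter at a time from a keyword and adds synonyms from a small lookup table. The table is a hand-made placeholder, not a real thesaurus, and the function names are hypothetical.

```python
# Hypothetical, hand-made synonym table standing in for a real thesaurus.
SYNONYMS = {"leak": ["disclosure", "breach"]}


def deletions(word):
    """Return every variant of word with exactly one letter removed."""
    return sorted({word[:i] + word[i + 1:] for i in range(len(word))})


def variations(word):
    """Combine the one-letter deletions with any known synonyms."""
    return deletions(word) + SYNONYMS.get(word, [])


print(deletions("leak"))  # ['eak', 'lak', 'lea', 'lek']
print(variations("leak"))
```

    Content built from these variants can then be run against the search string to see which near-misses it would catch.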

    The simulator shows this similar content as content that wouldn’t get hit, but may be similar or more appropriate. The simulator shows me how to adjust my existing search string to hit these types of similar searches.

    Also, the simulator takes the words I have and searches on Technorati tags for related tags. It shows me what terms I’ve entered, what types of content I would get, and then shows me the types of content-results that are similar, but would be excluded.

    The simulator then shows me how to adjust my existing search string to capture these analogies.

    Review

    The simulator would do a number of things. It would take content that already exists. It would also create false content that deliberately contains or does not contain the target words.

    The simulator’s goal is to act as a final check for the more complicated search strings. The idea is that users [using very novel words and phrases] would:

  • A. Have simulated content slapped against that search string; and

  • B. Then see whether the intended content is actually captured, or whether there are known problems with the existing search strings that could be better tweaked.
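
    Steps A and B could be sketched as a small self-check: plant the target words in one synthetic document, leave them out of a control document, and confirm the search catches the first and ignores the second. The matcher here is a stand-in with AND semantics over whole words, which is an assumption; a real service would apply its own matching rules.

```python
def make_synthetic(terms, filler="nothing to see in this filler sentence"):
    """Build one document that contains every term and one control without them."""
    positive = f"{filler} {' '.join(terms)} {filler}"
    negative = filler
    return positive, negative


def matches(terms, text):
    """Stand-in matcher: every term must appear as a whole word."""
    words = set(text.lower().split())
    return all(t.lower() in words for t in terms)


def self_check(terms):
    """True if the search catches the planted document and ignores the control."""
    positive, negative = make_synthetic(terms)
    return matches(terms, positive) and not matches(terms, negative)


print(self_check(["folksonomy", "simulator"]))  # True
```

    A False result would mean either the search misses content it should catch, or it is so broad that even the control document triggers it.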

    This type of search simulator is very focused and probably tailored to a very narrow audience. However, I do see a time when people could have very narrow search-requests, and want to get a more immediate sign that their prospective search is going to work.

    Also there are times when someone may generate some novel terms, but these terms never catch on. However, the phrasing or similar phrases used could trigger some search results in the Y!Q search request.

    In these cases, I would like the prospective search simulator to look at the general phrases used and then find analogies and similar content, not only in existing searches but also by looking forward in time and guessing which other analogies and words others might come to use, so that those terms could be set today.

    Ideally, as new words and terms are added to the lexicon, these prospective search phrases in the URI could be updated and tweaked in real time.

    I see there being some sort of real-time bot-like tool that integrates with Y!Q. This bot would find similar search-results, and then use a thesaurus or some related tag-type tool.

    The bot would then automatically adjust or refine the terms embedded in that URI. The prospective search would get updated in the subscription list, and also in my aggregator list. The search-URI remains viable going forward despite shifts in folksonomy.
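
    The URI-updating step might be sketched like this. It assumes the search terms live in a query parameter named q, separated by spaces; that layout, the example.com address, and the function name add_terms are all illustrative assumptions, not how any particular service builds its URIs.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_terms(uri, new_terms, param="q"):
    """Return uri with new_terms appended to the search parameter.

    Terms already present are skipped, so running the update twice
    leaves the URI unchanged.
    """
    parts = urlsplit(uri)
    updated = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key == param:
            current = value.split()
            extra = [t for t in new_terms if t not in current]
            value = " ".join(current + extra)
        updated.append((key, value))
    return urlunsplit(parts._replace(query=urlencode(updated)))


uri = "http://example.com/search?q=folksonomy+tags&format=rss"
print(add_terms(uri, ["tagging"]))
# http://example.com/search?q=folksonomy+tags+tagging&format=rss
```

    The same rewritten URI would then have to be written back to the subscription list and the aggregator, which this sketch does not attempt.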


    Beta testing approach


    Overall, I think it would be neat if there was a separate platform similar to a "Technorati tag creator" to test out the PubSub prospective search. With time, after this platform-tool had proven itself, this separate tool could be integrated with the PubSub platform.


    FeedMesh and targeting specific user-aggregators


    PingOMatic and the FeedMesh would support this tool, not only in finding content, but in highlighting for the services which URIs are related to which tools.

    The services would know which prospective searches are out there, and would be in a position to target-ping those aggregators with similar-tag-phrases when the new content emerges, and the publishers ping the FeedMesh with the content.

    Ideally, this system would monitor the OPML lists within a user’s aggregator-platform and then ensure that the prospective searches and external services were aware of these changes and updates to the URIs.


    Integrate simulation with similar searches


    I would like a system that does what Y!Q does in that it finds related terms, and then integrates with this tool to update and refine the terms embedded in that prospective-search-URI.

    As time goes on, I would like the Y!Q-like tool for similar searches to also take new content as it is generated, match it against similar prospective searches, and alert URI-holders [those who are using a URI with embedded terms] that content is showing up that is close enough to be of interest but does not currently match exactly. Do they want to adjust their terms automatically? Would they like this type of content to be reported?
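
    One simple way to picture the "close but not exact" alert is to count how many of the embedded terms a new piece of content contains. The sketch below labels content as a hit, a near-miss, or a miss; the 50% threshold and the function name classify are arbitrary assumptions, and a real similar-search tool would use something far more sophisticated than word overlap.

```python
import re


def classify(terms, text, threshold=0.5):
    """Label text as 'hit', 'near-miss' or 'miss' for the given terms."""
    if not terms:
        raise ValueError("at least one search term is required")
    words = set(re.findall(r"[a-z0-9']+", text.lower()))
    matched = sum(1 for t in terms if t.lower() in words)
    if matched == len(terms):
        return "hit"
    if matched / len(terms) >= threshold:
        return "near-miss"
    return "miss"


terms = ["trade", "secret", "leak"]
print(classify(terms, "A trade secret leak was reported"))
print(classify(terms, "A trade secret was disclosed"))
print(classify(terms, "Tomorrow's weather report"))
```

    Near-misses are the cases where the URI-holder would be asked whether to widen the terms or simply have that content reported.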

    User requirements and philosophy in re prospective searches

    I want to know in advance that the subscription syntax for the URI is valid. My real goal in creating this prospective search is to provide a signal that there’s movement or discussion in a specific area that is related to what I’m interested in, even though my terms are not matching exactly.

    I want to know that the prospective search that I’ve created today will work, even though:

  • A. the terms that I’ve decided to use may never actually get used;
  • B. the terms are unfamiliar to others; and/or
  • C. the actual event and discussion go off on a tangent, but are actually related to my intended search objective.

    Ideally, I’d like to be able to throw a prospective search request into the simulator, get back some content, and be able to click on some groups or types of content-clusters, and have the code automatically updated so that this type of content would be included/captured in the prospective search.


    User Objectives

    User wants confidence that:

  • Today’s very narrow search with unusual and non-used phrases is going to trigger an appropriate signal to me in the future, when others use similar phrases.

  • The prospective URI-code works without the actual event, triggering device, or comments actually showing up.

  • The terms that I have randomly chosen to explain my concept are still viable triggering words, even though the majority of humanity is using something else in the future.

    User wants a tool to:

  • Test out the validity of the prospective search in situations where I’m using novel terms, or very complicated search terms that narrow my content.

  • See how my search results would change as I tweak the code, and see, by simply clicking on some of the outputs, how my search string might get updated.

  • Integrate with the major search platforms so that it can then use this external content as part of the body of work/content that gets selectively applied, adjusted and tweaked relative to the search simulator.


    Thoughts on how to approach the problem

    This tool would take content from existing pages and search tools. If the terms are not there, the simulator would auto-generate combinations, errors, and similar words to show options.

    The prospective search tool would be like a dry run on the search command. It would clearly show which content and terms would get hit.

    The simulator would show options on other outputs: Users would easily change the codes by clicking on the desired terms-results that they like. Simply clicking on the simulated output would allow users to quickly tweak the search command.

    Once the final search string was created, the system would then do a final check and simulation to show the outputs and get final approval for that search command.

    Users would then have greater confidence that the terms, syntax, and scope of terms were adequately captured, and greater confidence that, although no matching content exists today, similar words and phrases would get captured when they appear.

    Enterprise: Intellectual property practice group

    I see this tool being useful in exploring whether there have been leaks from intellectual property discussions. Also, in cases where personnel are discussing issues and concepts in roundabout ways, this tool would identify and map clusters of discussion that appear to be related to other private discussions.

    Ideally, the outputs of these reports could be in such a form that the courts would readily understand them and see that there was a linkage between the initial protected-confidential-discussion and the subsequent external discussion among open sources not bound by confidentiality agreements.

    This tool would integrate with other tools like Y!Q. Similar searches would be done on the simulated content to verify the output captures those terms that most likely would be desirable.

    Users would be able to do a similar search based on many words, and automatically create a focused PubSub subscription.


    Feature: Syntax simulation


    The tool would suggest alternatives: that a desired search phrase should be enclosed in quotes; that the number of likely hits would be broadened if I put certain phrases in quotes; or that the search quotes should be changed from double quotes to single quotes.
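
    A few of these quote checks are easy to sketch. The rules below are assumptions about how a search engine might treat quotes [straight double quotes delimit a phrase; curly quotes are not recognized], not the documented behavior of any particular service, and check_syntax is a made-up name.

```python
def check_syntax(query):
    """Return a list of warnings about quoting problems in query."""
    warnings = []
    if query.count('"') % 2:
        warnings.append("unbalanced double quote")
    if "\u201c" in query or "\u201d" in query:
        warnings.append("curly quotes found; use straight quotes")
    if '"' not in query and len(query.split()) > 1:
        warnings.append("multi-word query is unquoted; words may match separately")
    return warnings


for q in ['"trade secret', "trade secret", '"trade secret"']:
    print(repr(q), "->", check_syntax(q) or "no warnings")
```

    Each warning could be shown next to the search string with a one-click fix.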


    Feature: Adjacent instructions


    The instructions for the commands would be immediately adjacent to the simulator. There would be scrolling instructions in a side-by-side menu; and as users made changes, the directions would be adjusted to guide the user in how to make the changes.


    Feature: Instructions tailored to individual search request


    The tool would be similar to an idealized Winksite set-up, or like the current YahooIM download approach: Users would be given instructions as they were working through the steps, rather than having to refer to external pages or other parts of the site for guidance.

    The guidance would be tailored to that specific search rather than based on hypothetical words or phrases. Users would get very specific instructions tied to the actual commands, words, phrases, and possible future terms they are using.


    Feature: Click-on-simulated-output to adjust URI


    Think synchronization between simulated output, user controls, and final command. This tool would focus more on identifying the range of possible outputs. Rather than require users to type in text or make manual tweaks, the tool would go the next step.

    Users would simply see a list of possible outputs, click on those outputs that they liked, and the code and text would be automatically updated to capture them.
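
    One possible way to turn clicks into term suggestions is to count which words the liked outputs share that the search does not yet contain. This is only a sketch: the stopword list is a tiny placeholder, the function name suggest_terms is hypothetical, and counting shared words is my stand-in for whatever similarity measure a real tool would use.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real tool would use a fuller one.
STOPWORDS = {"the", "a", "of", "and", "to", "in", "is", "on"}


def suggest_terms(query_terms, liked_docs, limit=3):
    """Suggest words that appear in the most liked documents but not in the query."""
    have = {t.lower() for t in query_terms}
    counts = Counter()
    for doc in liked_docs:
        words = set(re.findall(r"[a-z0-9']+", doc.lower()))
        counts.update(words - have - STOPWORDS)
    # Sort by document count, then alphabetically so ties are stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:limit]]


liked = [
    "Social tagging on the rise",
    "Tagging and bookmarks",
    "Collaborative tagging",
]
print(suggest_terms(["folksonomy"], liked, limit=2))  # ['tagging', 'bookmarks']
```

    The suggested words would be offered as additions to the search string, which the user accepts with another click.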


    Feature: Adjust external or new URI


    Ideally, this tool could be applied to existing URIs. Users would be able to take a list from an OPML file, run a simulated search against all of these prospective searches, and then be able to adjust each or all of the subscription URIs using this tool.

    No longer would users have to re-enter a search request from square one, or try to guess what terms were in their original request. The tool would clearly show what the title of the search request was; give it a suitable name that is reported to the aggregator; and then show in the simulation box the range of text, syntax, and simulated output that is associated with each URI from an OPML list.
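
    Reading the existing subscriptions out of an OPML list could be sketched as follows. It assumes each subscription is an outline element whose xmlUrl attribute carries the search terms in a q parameter; the sample OPML, the example.com addresses, and the function name list_searches are invented for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlsplit

# Made-up OPML sample: one search subscription and one ordinary feed.
OPML = """<opml version="1.1">
  <body>
    <outline text="Folksonomy watch" type="rss"
             xmlUrl="http://example.com/search?q=folksonomy+tags"/>
    <outline text="Plain feed" type="rss" xmlUrl="http://example.com/feed.xml"/>
  </body>
</opml>"""


def list_searches(opml_text, param="q"):
    """Return (title, terms, url) for every outline that embeds search terms."""
    root = ET.fromstring(opml_text)
    found = []
    for node in root.iter("outline"):
        url = node.get("xmlUrl")
        if not url:
            continue
        query = parse_qs(urlsplit(url).query).get(param)
        if query:
            found.append((node.get("text", ""), query[0].split(), url))
    return found


for title, terms, url in list_searches(OPML):
    print(title, terms, url)
```

    Each recovered title and term list could then be loaded into the simulation box without the user retyping anything.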


    Feature: Custom output report


    This feature would take embedded commands from the URI, link them with the aggregator display, and then send these commands to the printer. No longer would users have to accept a standard display for all prospective searches.

    Rather, each search-report could be tailored to a specific management objective, presentation style, or audience.

    Ideally, the prospective search tool could:

  • [a] integrate different color-commands into different aspects of the search results to highlight them in new ways;

  • [b] create tables and integrate with future [not yet developed] templates that regulatory agencies may require; and

  • [c] display the results in a custom format that meets the statutory requirements.

    Demonstration

    Now that you’ve gotten this far, click on the Yahoo related search and find out who else is talking about this topic, and what terms they are using.

    Wouldn’t you like to know about those search results in a prospective search, long ago, when you originally created your URI?



    LEGAL NOTICE


    Creative Commons License

    This work is licensed under a Creative Commons License.

    You may not copy any of this work to promote a commercial product on any site or medium in the universe.

    If you see this work posted on a commercial site, it violates the creative commons license; and the author does not endorse the commercial product.

    Free to use for non-commercial uses. Link to this original blogspot and cite as .

