Crawling the web is an important right for access to information. I think big crawlers shouldn’t dominate the market, especially since Google is no longer up to par at finding what people actually want.
You see this on GitHub already. People publish paper results and manuals, along with a few files, and treat that as if it were open source. And this isn’t limited to LLMs: people with CNN papers, crawlers, and other work do the same thing. I think this is a clash between current scientific community thinking plus Big Tech on one side and Free Software plus Free Culture initiatives on the other.
Additionally, you can’t expect something Microsoft/Meta touches to remain untainted for long.
I think that if the algorithm is so broken that it only lists things that are interesting to Google, the search is beyond redemption.