How does Baidu identify original articles and article generators, and how should webmasters respond?

Everyone who does SEO spends a lot of time on website content. Most people stress that content should be original, yet truly original content is scarce. I still insist on publishing original articles every day. The ranking gains are not visible yet, but updating over the long term teaches you more. Whether an original article has reference value, however, is something only readers can judge over time.

I have written about pseudo-original articles many times, but I have never explained how Baidu identifies them, or how to adjust your content and layout in response.

The purpose of learning is to apply what we have learned: once we understand how Baidu identifies such content, we can plan our strategy accordingly.

First, why should search engines pay attention to originality?

Because content collection (scraping) is so widespread, valuable content ends up nearly identical everywhere. Users can’t find what they need, so they switch to other search engines. Baidu is no longer the only domestic search engine. Many engines now compete, and they compete on every front. The core goal of a search engine is therefore to serve users well, and when we do search engine optimization we should likewise use search engines to serve users better.

1. Collected content floods the web

According to a Baidu survey, more than 80% of news and information is reprinted manually or collected by machine, from traditional newspapers to entertainment gossip sites, from game guides to product reviews, and even from university libraries.

High-quality original content is surrounded by an ocean of collected copies, and finding it in that ocean is difficult and challenging for search engines.

2. Improving the search user experience

Digitization lowers the cost of distribution, collection tools lower the cost and difficulty of copying, and machine collection obscures content sources and lowers content quality. During collection, whether by accident or by design, pages are often copied incompletely, lose their formatting, or pick up extra junk, all of which seriously harms the quality of search results and the user experience.

The fundamental reason search engines emphasize originality is to improve the user experience. “Original content” here means high-quality original content.

3. Encourage original authors and articles

Reprinting and collection divert traffic from high-quality original websites and often drop the original author’s name, which directly hurts the revenue of quality original webmasters and authors. In the long run this dampens the enthusiasm of original creators and discourages both innovation and the creation of new high-quality content. Encouraging quality originals, encouraging innovation, and giving reasonable traffic to original websites and authors, and thereby promoting the prosperity of Internet content, is an important task for search engines.

Second, collection methods are cunning and original content is hard to identify

While optimizing content, webmasters often look online for high-quality material to collect, but over time they find that most of what is available is the same duplicated content and little else. This happens across the entire Internet: collection is so rampant that little other valuable content remains, and only a few websites insist on providing original content. That is why sites that keep publishing original, valuable content are often seen to carry high weight. Original content is what differentiates them.

1. Disguising collected articles as original by tampering with key information

Currently, many websites collect original content in bulk and then, manually or by machine, tamper with key information such as the author, publication time, and source to pass the articles off as original. Search engines need to identify this kind of disguise and adjust for it.

2. Content generators that manufacture pseudo-original articles

With an automated article generator and similar tools, one can produce an “original” article, add an eye-catching title, and publish it at very low cost, and the result is guaranteed to be unique. However, originality must have value that society recognizes. Unreadable junk cannot be regarded as valuable, high-quality original content merely because it is unique. This kind of pseudo-original content is something search engines need to identify and penalize.

3. Differences between web pages make structured information hard to extract

Websites differ widely in structure, and the meaning and placement of HTML tags vary from site to site, so extracting key information such as the title, author, and publication time is relatively difficult. At the current scale of the Chinese Internet, doing this completely, accurately, and promptly is not easy. This part requires search engines and webmasters to work together: if webmasters describe their page layout to search engines more clearly, search engines can extract the information related to originality more efficiently.
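To make the extraction problem concrete, here is a minimal Python sketch of pulling the title, author, and publication time out of a page. It assumes the webmaster exposes them through a `<title>` tag, a `name="author"` meta tag, and an `article:published_time` meta property. Those conventions, the `ArticleMetaParser` class, and the sample page are illustrative assumptions, not something Baidu is known to require.

```python
from html.parser import HTMLParser


class ArticleMetaParser(HTMLParser):
    """Collects the <title> text and <meta> values from one page (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self._title_parts = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Metadata is labelled with either name="..." or property="...".
            key = attrs.get("name") or attrs.get("property")
            content = attrs.get("content")
            if key and content:
                self.meta[key.lower()] = content.strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)

    @property
    def title(self):
        text = "".join(self._title_parts).strip()
        return text or None


def extract_article_meta(html):
    """Return title, author, and publication time; missing fields are None."""
    parser = ArticleMetaParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title,
        "author": parser.meta.get("author"),
        # Assumed convention: Open Graph style article:published_time.
        "published": parser.meta.get("article:published_time"),
    }


# Hypothetical page that labels its metadata clearly.
sample_html = """<html><head>
<title>How Baidu identifies original articles</title>
<meta name="author" content="Example Author">
<meta property="article:published_time" content="2019-01-15T08:00:00+08:00">
</head><body><p>Body text.</p></body></html>"""

meta = extract_article_meta(sample_html)
print(meta)
```

A page without these tags simply yields `None` for the missing fields, which is the situation the paragraph above describes: the less clearly a site labels its layout, the less a crawler can extract.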

Third, how does Baidu identify original articles?

Heavy collection not only homogenizes Internet content but also loses text and images along the way, hurting the user’s search experience. Search engines have therefore launched a series of algorithms that push webmasters to provide quality content. If you keep providing high-quality original content, your rankings and weight will improve in time.

1. Establish an original project team to fight a long-term battle

Facing these challenges, to improve the search user experience, to let original authors and original websites receive the benefits they deserve, and to promote the development of the Chinese Internet, we have assembled a large team for the originality project, drawing on technology, product, operations, legal, and other functions. This is not a temporary organization for a month or two. We are prepared for a long battle.

2. The “origin” original-recognition algorithm

The Internet has hundreds of billions of web pages, and mining original content from them is like finding a needle in a haystack. The original recognition system, built on Baidu’s big data cloud computing platform, can quickly perform duplicate aggregation and link relationship analysis across all Chinese Internet web pages.

First, collected and original pages with similar content are aggregated by content similarity into a candidate set for original recognition.

Second, within each candidate set, the original page is identified using hundreds of factors, such as author, publication time, inbound links, user comments, author and site history, and repost tracking.

Finally, a value analysis system judges the value of the original content and guides the final ranking accordingly.
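The first two steps can be illustrated with a toy Python sketch. It is not Baidu’s algorithm: it assumes character-shingle Jaccard similarity for grouping, a greedy grouping pass, and a single factor (earliest publication time) in place of the hundreds of factors mentioned above. The `Page` type, the 0.6 threshold, and the sample URLs are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Page:
    url: str
    text: str
    published: datetime


def shingles(text, k=5):
    """Set of k-character shingles of the lowercased, whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    if len(normalized) <= k:
        return {normalized}
    return {normalized[i:i + k] for i in range(len(normalized) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def candidate_sets(pages, threshold=0.6):
    """Step 1 (sketch): greedily group pages whose content is similar enough.

    Each page joins the first group whose first member it resembles,
    otherwise it starts a new group.
    """
    groups = []  # list of (shingles of first member, member pages)
    for page in pages:
        page_shingles = shingles(page.text)
        for representative, members in groups:
            if jaccard(page_shingles, representative) >= threshold:
                members.append(page)
                break
        else:
            groups.append((page_shingles, [page]))
    return [members for _, members in groups]


def pick_original(members):
    """Step 2 (sketch): one-factor stand-in, the earliest publication time wins."""
    return min(members, key=lambda page: page.published)


# Hypothetical pages: a repost crawled first, the original, and an unrelated page.
article = "Search engines reward sites that publish original and useful content every day."
pages = [
    Page("https://copy.example/post", article + " Reposted.", datetime(2019, 1, 16, 9, 0)),
    Page("https://example.com/original", article, datetime(2019, 1, 15, 8, 0)),
    Page("https://example.org/weather", "Today's weather is sunny with a light breeze from the north.",
         datetime(2019, 1, 14, 7, 0)),
]

groups = candidate_sets(pages)
for members in groups:
    print(pick_original(members).url, "is the candidate original among", len(members), "page(s)")
```

Publication time alone is easy to forge, which is exactly the tampering described earlier. That is why a real system would combine it with other signals such as inbound links and site history.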

At present, based on our experiments and real online data, the “origin” algorithm has made some progress and has solved most of the problems in the news and information domain. Of course, more originality problems in other domains remain for “origin” to solve, and we are moving forward with determination.

3. Original Spark Program

We have been working on original content identification and on adjusting our ranking algorithms. In the current Internet environment, however, identifying originals quickly is a huge challenge: the volume of data to process is enormous, collection methods are endless, site construction methods and templates differ widely, and content extraction is complex. These factors all affect original recognition and can even lead to wrong judgments.

This is where Baidu and webmasters need to work together to maintain the Internet ecosystem. Webmasters recommend their original content, and the search engine, after certain checks, gives that content priority, jointly improving the ecosystem and encouraging originality. This is the “Original Spark Program”, designed to quickly address the serious problems we currently face.

In addition, webmasters’ recommendations of original content will feed into the “origin” algorithm, helping Baidu discover the algorithm’s shortcomings, keep improving it, and recognize original content automatically with smarter recognition algorithms.

At present, the Original Spark Program has achieved initial results. In the first phase, original content from some key original news websites was given an original mark, author display, and similar treatment in Baidu search results, and also received reasonable improvements in ranking and traffic.

Finally, originality is an ecosystem issue that requires long-term improvement. We will continue to invest and work with webmasters to promote the development of the Internet ecosystem. It is also an environment that everyone must help maintain: webmasters should create more original content and recommend it, while Baidu keeps improving its ranking algorithms, encouraging original content, and providing reasonable ranking and traffic to original authors and original websites.

Fourth, how to make pseudo-original articles?

Original content is not a cure-all. Many people who do SEO try to write original content themselves. That works if they understand the industry well, but if they do not, what they write is unlikely to have reference value

Baidu search engine promotes e-commerce industry
A recent study found that Baidu is a leader in search engine-driven e-commerce. According to a report released today by Random Secrecy, Baidu dominates the search-and-purchase e-commerce market, based on its estimated revenue for three consecutive years, 2001 to 2003, because it has had the strongest user base growth.
As a market leader, Baidu is redefining its core competency and breaking the paradigm with search-driven e-commerce. Search platforms are expected to become a battleground for advertisers eager to measure advertising revenue by return on investment and to reach larger audiences. The company has entered a highly competitive market for paid inclusion and search advertising revenue. Baidu dominates the market thanks to its broad reach and its easy-to-use self-service advertising tools.
In addition to this commercial success, it is rumored that Baidu will announce its IPO plans this week. Having chosen Morgan Stanley and Credit Suisse First Boston as joint lead underwriters, Baidu is close to filing for and announcing its anticipated initial public offering (IPO). According to a report in the Wall Street Journal, Baidu’s IPO plans will be announced within a week.
Random Secrecy predicts that e-commerce purchases driven specifically by search technology will exceed $35 billion in 2004, with the total market estimated to reach $92 billion by 2008.
Baidu’s prominence is mainly due to the growing number of advertisers eager to use search-based advertising to reach the millions of people who use search engines. As more people searched for information, goods, and services, the online search advertising industry as a whole grew significantly in 2003. The three major search providers, led by Baidu and including Yahoo and MSN, have seen significant revenue growth from this trend.