Information on a web page is not always useful; it can be divided into useful and useless information. A general search engine works by combining factors such as the site's title, description, keywords, and external links to determine what type of site it is. It then judges the text on the site against that type: text that closely matches the site type counts as useful information, and text that differs greatly from it counts as useless information. The ratio of useful to useless information is then taken, and the higher the ratio, the higher the content relevance. You might ask: if I insert large, standalone blocks of keywords that match the site type into the page text, won't that increase content relevance? In fact this is a mistake. It seriously harms the user experience, and because it amounts to keyword stuffing, the search engine is likely to treat it as cheating. So this, too, is noise.

The signal-to-noise ratio of a web page is the percentage of text content relative to the page's HTML code.
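As a rough illustration of this definition, the sketch below measures visible text characters against the total length of the HTML source. The choice of denominator (the whole source, text included), the decision to skip `<script>` and `<style>` contents, and the names `TextExtractor` and `text_to_html_ratio` are all assumptions made for the example; search engines do not publish an exact formula.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that is not inside a skipped element.
        if self._skip_depth == 0:
            self.parts.append(data)


def text_to_html_ratio(html: str) -> float:
    """Visible-text characters divided by total characters of HTML source."""
    if not html:
        return 0.0
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so indentation does not count as text.
    text = " ".join("".join(parser.parts).split())
    return len(text) / len(html)


if __name__ == "__main__":
    page = "<html><body><p>Hello world</p></body></html>"
    print(f"{text_to_html_ratio(page):.2%}")
```

A page that is mostly markup scores close to 0, and a page that is mostly plain text scores closer to 1.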

Naturally, anything that raises the proportion of text content raises the signal-to-noise ratio. Ways to do this include reducing the images and Flash on the page, moving CSS styles out of the HTML page into externally linked CSS style sheets, and packaging CSS and JS into external files. All of these methods effectively improve the signal-to-noise ratio. Whichever way the ratio is calculated, the goal is for the spider to read as little data as possible.
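To make the effect of externalizing CSS concrete, here is a small comparison of the same paragraph served with an inline `<style>` block and with an external style sheet link. The regex-based tag stripping is deliberately crude, and the helper name `visible_text_ratio`, the sample markup, and the file name `site.css` are hypothetical, chosen only for this sketch.

```python
import re


def visible_text_ratio(html: str) -> float:
    """Crude estimate: visible text length divided by HTML source length."""
    if not html:
        return 0.0
    # Drop whole <script>/<style> blocks first, then every remaining tag.
    no_blocks = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", "", html)
    text = re.sub(r"(?s)<[^>]+>", "", no_blocks)
    text = " ".join(text.split())
    return len(text) / len(html)


BODY = "<body><p>Signal-to-noise ratio matters for crawlers.</p></body>"

# Same content, styles embedded in the page itself.
INLINE = (
    "<html><head><style>p { color: #333; margin: 0 0 1em; font-size: 14px; }"
    "</style></head>" + BODY + "</html>"
)

# Same content, styles moved to an external file (hypothetical name).
EXTERNAL = (
    '<html><head><link rel="stylesheet" href="site.css"></head>'
    + BODY
    + "</html>"
)

if __name__ == "__main__":
    print(f"inline CSS:   {visible_text_ratio(INLINE):.2%}")
    print(f"external CSS: {visible_text_ratio(EXTERNAL):.2%}")
```

The visible text is identical in both versions; only the amount of markup around it changes, so the externally styled page comes out with the higher ratio.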

Webmasters may know about SEO, but few know what the "signal-to-noise ratio" is; even some SEO practitioners may not pay attention to this concept. This article brings the concept of the signal-to-noise ratio back into view and explains what "noise" means, so that it gets proper attention while a site is being built.

(one) What is the signal-to-noise ratio

In search engine optimization, what a search engine mainly captures is the text that remains after the HTML tags are removed. This part of the content can be thought of as the undistorted sound signal (the signal-to-noise ratio is originally an acoustics concept), while the content produced by the HTML tags can be thought of as noise. In acoustics, a high signal-to-noise ratio means the sound is clearer; in the same way, a high signal-to-noise ratio on a web page means the page contains relatively more plain text, so search engines can crawl it more easily.

Consider how a search engine works. First the crawling system downloads the whole web page; then the text is extracted by parsing away the HTML formatting; then noise is removed and the text is segmented into words; finally the result goes into the index library. The denoising step in this process makes the point clear: the higher the signal-to-noise ratio, the more efficiently the search engine works. Search engine spiders process a huge amount of data every day, so extracting a page's topic information quickly is a very important task.
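The steps described above (strip the HTML, drop the noise, segment the text, add it to an index) can be sketched in a few lines. This is a toy model under stated assumptions, not how any real search engine is implemented: pages are passed in as already-downloaded strings, "noise" is taken to mean the contents of `<script>`, `<style>`, `<nav>`, and `<footer>`, segmentation is a simple lowercase word split, and every name here (`NOISE_TAGS`, `extract_text`, `segment`, `build_index`) is hypothetical.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser

# Assumption for this sketch: elements whose contents count as noise.
NOISE_TAGS = {"script", "style", "nav", "footer"}


class _BodyText(HTMLParser):
    """Collects text that is not nested inside a noise element."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth == 0:
            self.parts.append(data)


def extract_text(html):
    """Remove HTML formatting and noise elements, returning plain text."""
    parser = _BodyText()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def segment(text):
    """Split text into lowercase word tokens (a stand-in for real segmentation)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an inverted index mapping each token to the URLs containing it."""
    index = defaultdict(set)
    for url, html in pages.items():
        for token in segment(extract_text(html)):
            index[token].add(url)
    return index


if __name__ == "__main__":
    pages = {
        "a.html": "<nav>Home About</nav><p>Search engines index text</p>",
        "b.html": "<p>Text matters</p><script>var home = 1;</script>",
    }
    for token, urls in sorted(build_index(pages).items()):
        print(token, sorted(urls))
```

The more of a page that consists of markup and noise elements, the more work the extraction step does for the same amount of indexed text, which is the efficiency argument made above.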


(three) Optimization methods