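Hi everyone, I'm Lingyi (零一). One of the most important pieces at the core of a browser is the HTML parser. Its job is to parse an HTML string into a tree in which every node is a Node. Many readers are curious how this is implemented, so in this article we'll build a simple HTML parser in JS.
The code below is adapted from node-html-parser.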
How it works

1. Result
We need to implement a parse method that takes an HTML string and returns a tree structure:
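    const root = parse(`<div id="test" class="container" c="b"><div class="text-block"><span id="xxx">Hello World</span></div><img src="xx.jpg" /></div>`);
    console.log(root);
    // [
    //   {
    //     "tagName": "",
    //     "children": [
    //       {
    //         "tagName": "div",
    //         "attrs": { "id": "test", "class": "container" },
    //         "rawAttrs": "id=\"test\" class=\"container\" c=\"b\"",
    //         "type": "element",
    //         "range": [0,],
    //         "children": [
    //           {
    //             "tagName": "div",
    //             "attrs": { "class": "text-block" },
    //             "rawAttrs": "class=\"text-block\"",
    //             "type": "element",
    //             "range": [39,],
    //             "children": [
    //               {
    //                 "tagName": "span",
    //                 "attrs": { "id": "xxx" },
    //                 "rawAttrs": "id=\"xxx\"",
    //                 "type": "element",
    //                 "range": [63, 96],
    //                 "children": [
    //                   { "type": "text", "range": [78, 89], "value": "Hello World" }
    //                 ]
    //               }
    //             ]
    //           },
    //           {
    //             "tagName": "img",
    //             "attrs": {},
    //             "rawAttrs": "src=\"xx.jpg\"",
    //             "type": "element",
    //             "range": [,],
    //             "children": []
    //           }
    //         ]
    //       }
    //     ]
    //   }
    // ]

2. Core idea
- Use a regular expression to match opening tags such as <tag class="tag" aa=""> and closing tags such as </tag>.
- Pair each opening tag with its closing tag (<tag></tag>) using a last-in, first-out structure (a stack).

3. Initialization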
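As a warm-up before the real setup, the regex-plus-stack idea from section 2 can be sketched in a few lines. This is a minimal illustration written for this explanation, not the article's implementation and not the node-html-parser API: the names TAG_RE and sketchParse are hypothetical, attributes are kept only as a raw string, and an element is treated as childless only when it is written with a trailing "/>".

```javascript
// Hypothetical sketch: TAG_RE and sketchParse are illustrative names,
// not part of node-html-parser.
// Groups: 1 = "/" of a closing tag, 2 = tag name,
//         3 = raw attribute text, 4 = "/" of a self-closing tag.
const TAG_RE = /<(\/)?([a-zA-Z][\w-]*)([^>]*?)(\/)?>/g;

function sketchParse(html) {
  const root = { tagName: '', children: [] };
  const stack = [root]; // top of the stack = innermost element still open
  let textStart = 0;    // where the text after the previous tag begins

  // Record the text between the previous tag and `end`, skipping blank runs.
  const pushText = (end) => {
    const value = html.slice(textStart, end);
    if (value.trim() !== '') {
      const parent = stack[stack.length - 1];
      parent.children.push({ type: 'text', range: [textStart, end], value });
    }
  };

  for (const match of html.matchAll(TAG_RE)) {
    const [whole, closingSlash, tagName, rawAttrs, selfClosingSlash] = match;
    const start = match.index;
    const end = start + whole.length;
    pushText(start);
    textStart = end;

    if (closingSlash) {
      // Closing tag: find the nearest open element with the same name,
      // extend its range to the end of the closing tag, and pop it
      // together with anything left open inside it.
      for (let i = stack.length - 1; i > 0; i--) {
        if (stack[i].tagName === tagName) {
          stack[i].range[1] = end;
          stack.length = i;
          break;
        }
      }
      continue;
    }

    // Opening tag: attach to the current parent, then push it so that
    // later nodes become its children (unless it is self-closing).
    const node = {
      type: 'element',
      tagName,
      rawAttrs: rawAttrs.trim(),
      range: [start, end],
      children: [],
    };
    stack[stack.length - 1].children.push(node);
    if (!selfClosingSlash) stack.push(node);
  }

  pushText(html.length); // trailing text after the last tag, if any
  return root;
}

const tree = sketchParse(
  '<div id="test" class="container" c="b"><div class="text-block">' +
  '<span id="xxx">Hello World</span></div><img src="xx.jpg" /></div>'
);
console.log(JSON.stringify(tree, null, 2));
```

On the sample input from section 1, the span node ends up with range [63, 96] and its text child with range [78, 89], which agrees with the expected output shown there. The sketch ignores comments, script contents, and void elements written without a trailing slash, all of which a fuller parser has to handle.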
First, we set up a few simple variables and helper methods for later use:
    // Initialize the two Node types
    // HTML [nodeType](