编译原理学习笔记06_antlr初步

nonoob posted @ Sun, 23 Sep 2012 22:11:10 +0800 in Programming , 3745 readers

antlr是什么

权威解释见官方或wikipedia，我只负责八卦。作者Terence Parr似乎是个有洁癖的人，写了antlr第一个版本之后觉得效率不高又写了v2，因为同样的理由又推倒重来写了v3。尽管以java为主打，但这个antlr非常有野心，Java/C#/Python/Ruby/Scala等语言通吃；由grammar生成lexer和parser使得这玩意儿非常适合教学用，也非常适合菜鸟级程序员提高编写编译器的自信。虽说没有JavaCC这么正统，但是个人觉得非常实用，现在Hibernate framework/Jython/Groovy可都是基于antlr的哦！这哥们还开发了一套template engine，同样是一个妄图大一统文本解析的干活。有了antlr这把锤子他就开始到处乱敲：paper自不用说，然后是《The Definitive ANTLR Reference》、《Language Implementation Patterns》。后者近期被译成中文，号称“屠龙”；虽说的确过誉了，但是不得不承认这本书的确深入浅出。尽管作者一直在自卖自夸antlr和StringTemplate，但是对于java系的人来讲这的确是学习编译原理基础知识的好东西；连Guido大叔和Dalvik设计者Dan Bornstein也对这本书颇有好评。

安装工具

antlr：http://www.antlr.org/download.html

对于我这样的习惯了eclipse的懒人来说antlrworks并不是上上之策，插件才是王道http://antlrv3ide.sourceforge.net/。

该插件特点：

新建一般的java project（如果要生成的目标语言是java的话）并convert成antlr即可，之后只需写grammar文件。
默认情况下保存每一次grammar文件就会进行自动生成对应的lexer和parser的java文件（默认target language是java）和对应tokens文件；每次打开eclipse时会重新解析一次。
似乎生成的lexer和parser并不会添加grammar文件所在package信息（吐槽：这一点和上面一个特点加起来足以让人抓狂！！！）。
编辑器的content assit功能和java编辑器的功能有不少出入，括号、引号、缩进等功能做得并不好。
grammar（g文件）除编辑器的两个tab Interpreter和Railroad view非常棒，一个用于傻瓜式写，一个用于GUI显示，并且可以导出成html。

还可以考虑使用StringTemplate：http://www.stringtemplate.org/download.html

开始使用

以含有加减乘法的计算器且目标语言java为例。

在g文件中写出所要定义的grammer（一般是词法和语法的综合）

grammar Expr;

prog : stat+;
stat : expr NEWLINE
     | ID '=' expr NEWLINE
     | NEWLINE
     ;
expr: multiExpr (('+'|'-') multiExpr)*
    ;  
multiExpr : atom('*' atom)*
    ;   
atom : INT
     | ID
     | '(' expr ')'
     ;
     
ID : ('a'..'z'|'A'..'Z')+ ;
INT : '0'..'9'+;
NEWLINE : '\r'?'\n';
WS : (' '|'\t'|'\n'|'\r')+{skip();};

生成的ExprLexer.java文件中会将'(',')'等涉及判定的字符用新的identifier如T__8等来对应，识别这样的字符的方法形如T__8。新生成的ExprLexer类继承了antlr定义的Lexer类，封装了一些方法。ExprParser继承Parser类，诸如expr等g文件中的左式会有个对应的方法（例如expr()）来处理它；为了处理出错和从错误状态恢复，该类还生成了FOLLOW_expr_in_stat34这样的变量并调用了pushFollow()这类方法。
这样，从输入流中可以生成lexer对象，将由lexer得到的tokens作为输入就得到一个parser对象；调用顶层的方法就可以完成语法分析了。

public static void main(String[] args) throws IOException, RecognitionException {
		// TODO Auto-generated method stub
		ANTLRInputStream input = new ANTLRInputStream(System.in);
		ExprLexer lexer =  new ExprLexer(input);
		CommonTokenStream tokens = new CommonTokenStream(lexer);
		ExprParser parser = new ExprParser(tokens);
		parser.prog();
	}

上述代码对于符合grammar的输入不会有输出，如果需要给出相应的输出的话仍可以在g文件中添加（java）代码。

添加包含的package

@header{
import java.util.HashMap;
}

添加parser中的成员变量

@members{
/** Map variable name to Integer object holding value */
HashMap memory = new HashMap();
}

添加parser中左式处理函数的返回值（前面代码中的方法的返回值都为void），并且在每个生成式上在{}里添加特定的方法。值得注意的是对于诸如INT这样在lexer中得到的Token用的是antlr定义好的Token类的一个对象，包含了getText()方法；故不需要定义INT的text返回值。如下面的一段：

expr returns [int value]
     : e=multExpr {$value = $e.value;}
     ( '+' e=multExpr {$value+=$e.value;}
     | '-' e=multExpr {$value-=$e.value;}
     )*
    ;

multExpr returns [int value]
     : e = atom {$value=$e.value;}('*' e=atom {$value *= $e.value;})*

atom returns [int value]
    :   // value of an INT is the int computed from char sequence
     INT {$value=Integer.parseInt($INT.text);}
     | ID	// variable reference
     {
        // look up value of variable
        Integer v = (Integer)memory.get($ID.text);
        // if found, set return value else error
        if(v!=null) $value = v.intValue();
        else System.err.println("undefined variable "+$ID.text);
     }
        // value of parenthesized expression is just the expr value
     | '(' expr ')'{$value=$expr.value;}
     ;

--------------------------------------------------

参考自《The Definitive ANTLR Reference》，又是一本没付版权费的电子书:(

PS：通过antlr生成的代码，我才知道java对于退出多重循环有个不错的做法，就是在指定的循环前面添加上某个label，然后在要退出的时候加上break label。我们都知道尽管java把goto作为保留字，但是在咱码农是没法用goto的。使用这样break的做法肯定没有C/C++来得凶残；但是无疑很好地避免了goto带来的无结构编程恶习，同时又保留了灵活性（似乎该对D.E.Knuth致敬？）。

Comments (17)

[Reply]

middleware said:
Wed, 26 Sep 2012 09:33:05 +0800

ANTLR 确实不错。LL'(k) basically makes LALR obsolete. 而且还支持回溯 parsing（通过 exception），基本解决了左递归和不确定语法的问题。这是我 04 年看的时候的成果。

[Reply]

meidir said:
Thu, 21 Jul 2022 20:48:14 +0800

Glad to be one of the visitors on this awe inspiring web site. Aputure

[Reply]

meidir said:
Wed, 24 Aug 2022 19:22:39 +0800

Some times its a pain in the ass to read what blog owners wrote but this website is really user pleasant! . skin

[Reply]

meidir said:
Wed, 24 Aug 2022 23:33:05 +0800

Wow, Exactly what a Excellent post. I really found this to much informatics. It is what i was searching for.I wish to suggest you that please keep sharing such form of info.Thanks opbest

[Reply]

Linker SEO said:
Wed, 03 Jan 2024 01:37:24 +0800

The industry needs to focus on creating more robust slots. เว็บตรง แตกง่าย

[Reply]

Linker SEO said:
Thu, 04 Jan 2024 14:22:05 +0800

I'm hooked! The biggest web slots here are a true spectacle. สล็อตเว็บใหญ่ที่สุด

[Reply]

Linker SEO said:
Mon, 15 Jan 2024 14:16:49 +0800

I for one utilize them solely astounding components : you will see these people amid: รวมสล็อตทุกค่ายในเว็บเดียว

[Reply]

jsimitseo said:
Tue, 16 Jan 2024 18:04:42 +0800

You bear through a wonderful opening. I rational soundness unquestionably quarry it besides by and by propose to my buddys. I am reserved they assurance be profited from this scene. เกมส์ไพ่ป๊อกเด้ง

[Reply]

jsimitseo said:
Wed, 17 Jan 2024 01:21:19 +0800

Trying to say thanks won't simply be adequate, for the fantasti c clarity in your written work. I will quickly get your rss channel to remain educated of any updates. เว็บสล็อตโรม่า

[Reply]

jsimitseo said:
Fri, 02 Feb 2024 22:18:55 +0800

I likewise composed an article on a comparative subject will discover it at compose what you think. concierge doctor

[Reply]

Matthew Wade said:
Sun, 08 Dec 2024 00:32:35 +0800

As I website possessor I think the content material here is really superb , appreciate it for your efforts. ECサイトセキュリティ

[Reply]

Matthew Wade said:
Sun, 08 Dec 2024 00:33:00 +0800

Deference to website author , some great entropy. EC-CUBE カスタマイズ

[Reply]

Matthew Wade said:
Fri, 13 Dec 2024 17:04:21 +0800

I love blogging and i can say that you also love blogging . 大人動画

[Reply]

Matthew Wade said:
Sun, 15 Dec 2024 19:37:19 +0800

I like what you guys are up also. Such intelligent work and reporting! Carry on the excellent works guys I have incorporated you guys to my blogroll. I think it’ll improve the value of my web site 【フル映像】

[Reply]

Matthew Wade said:
Tue, 17 Dec 2024 21:05:27 +0800

The article posted was very informative and useful. You people are doing a great job. Keep going. 盗撮エロ動画

[Reply]

Matthew Wade said:
Tue, 17 Dec 2024 21:05:55 +0800

Respect to website author , some good selective information . エロ動画のフル動画

[Reply]

Matthew Wade said:
Sat, 21 Dec 2024 20:04:49 +0800

Thanks for this post, I am a big fan of this website would like to go on updated. Japanese Pokemon Card

随便写写

Computers are like bikinis. They save people a lot of guesswork. --Sam Ewing

编译原理学习笔记06_antlr初步