XlegoX
| Joined: 16 Jun 2008 |
| Total Posts: 14955 |
| 02 Sep 2011 09:25 PM |
I wrote a YACC-like parser generator for building game command suites with saner commands than what's around right now. Here's the code: http://wiki.roblox.com/index.php/User:XLEGOx/luatext
The main class is at the top, with an example usage at the bottom. Here's the basic gist of how you use it:
1) Call |MakeParser| on your grammar definition, which will return a function.
2) Call that function on commands. The first argument is the rule in the parser to use as the main rule to start parsing with, the second is the input text to parse, and the third is a lookup table of actions to execute, which are referenced in the grammar definition.
================================================
The syntax of the grammar files is close to that of YACC. Here's the general idea: the grammar definition is made up of a set of "rules", with the following syntax (I'm using mathy-brackets here instead of less-than/greater-than signs to avoid problems with HTML; in the code you use < and >):
⟨ruleName⟩ := rules ;
=================================================
Each of the more complex rules is made by combining a sequence of the base rules, and rules which you have defined, using a set of operators. The base rules that you start out with are:
⟨number⟩ := Any number, possibly with decimal places and exponents on it
⟨string⟩ := A string delimited with either double or single quotes
⟨ident⟩ := Any valid programming identifier
⟨word⟩ := Any sequence of numbers/letters. Like ident, but may start with a digit.
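For anyone wondering exactly what each base rule accepts, here is a rough approximation of the four base rules as Lua patterns. This is a hypothetical sketch, not the generator's own code: the table name BASE, the helper names and the exact patterns are assumptions, and the string rule here ignores escape sequences.

```lua
-- Hypothetical approximations of the four base rules (not the generator's code).
-- Each function takes the input and a start position, and returns the position
-- just past the match, or nil if the rule does not match there.
local function pattern(pat)
  return function(s, pos)
    local _, e = s:find(pat, pos) -- a leading "^" anchors the pattern at pos
    if e then return e + 1 end
    return nil
  end
end

local BASE = {
  -- digits, an optional decimal part, then an optional exponent such as e23
  number = function(s, pos)
    local _, e = s:find("^%d+%.?%d*", pos)
    if not e then return nil end
    local _, ee = s:find("^[eE][+-]?%d+", e + 1)
    return (ee or e) + 1
  end,
  -- "double" or 'single' quoted, with no escape handling (an assumption)
  string = function(s, pos)
    return pattern('^"[^"]*"')(s, pos) or pattern("^'[^']*'")(s, pos)
  end,
  -- a letter or underscore first, then letters, digits or underscores
  ident = pattern("^[%a_][%w_]*"),
  -- any run of letters/digits; unlike ident it may start with a digit
  word = pattern("^%w+"),
}

-- true if the whole input is exactly one match of the named base rule
local function fullMatch(rule, s)
  return BASE[rule](s, 1) == #s + 1
end

print(fullMatch("number", "6.02e23")) --> true
print(fullMatch("ident", "2fast"))    --> false
print(fullMatch("word", "2fast"))     --> true
print(fullMatch("string", "'hi'"))    --> true
```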
===========================================
You can build up the base rules using the following operators, as well as using parentheses to specify order of operations:
⟨rule⟩ ⟨rule⟩ ... The sequence-operator (plain whitespace): if all of the rules match, one after another, it matches
⟨rule⟩ | ⟨rule⟩ | ... The or-operator: if any of the rules matches, then it matches
[ ⟨rule⟩ ] The maybe-operator: matches either the rule or nothing (the rule is optional)
⟨rule⟩* The any-number-of-operator: matches any number of the rule, including zero
⟨rule⟩+ The one-or-more-operator: matches one or more of the rule, but not zero
`text` The literal-rule: matches the given text exactly and nothing else.
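To make the operator semantics concrete, here is a toy model of them as plain Lua functions, encoding the print example given further down. It is only a sketch under stated assumptions: the names lit, seq, alt, opt, many and many1 are made up, whitespace is skipped before literals and strings, and the or-operator is modelled as "first alternative that matches wins", which may not be exactly how the generator backtracks.

```lua
-- Toy model of the grammar operators (hypothetical names, not the generator's code).
-- A matcher takes (s, pos) and returns the position after its match, or nil.

-- skip any whitespace starting at pos
local function skipSpace(s, pos)
  local _, e = s:find("^%s*", pos)
  return e + 1
end

-- `text`: matches the given text exactly
local function lit(text)
  return function(s, pos)
    pos = skipSpace(s, pos)
    if s:sub(pos, pos + #text - 1) == text then return pos + #text end
    return nil
  end
end

-- <a> <b> ...: every rule must match, one after another
local function seq(...)
  local rules = { ... }
  return function(s, pos)
    for i = 1, #rules do
      pos = rules[i](s, pos)
      if not pos then return nil end
    end
    return pos
  end
end

-- <a> | <b> | ...: the first alternative that matches wins
local function alt(...)
  local rules = { ... }
  return function(s, pos)
    for i = 1, #rules do
      local p = rules[i](s, pos)
      if p then return p end
    end
    return nil
  end
end

-- [ <a> ]: the rule or nothing
local function opt(rule)
  return function(s, pos) return rule(s, pos) or pos end
end

-- <a>*: zero or more (stops if a repetition consumes nothing)
local function many(rule)
  return function(s, pos)
    while true do
      local p = rule(s, pos)
      if not p or p == pos then return pos end
      pos = p
    end
  end
end

-- <a>+: one or more
local function many1(rule)
  return seq(rule, many(rule))
end

-- a double-quoted string, standing in for <string>
local function str(s, pos)
  pos = skipSpace(s, pos)
  local _, e = s:find('^"[^"]*"', pos)
  return e and e + 1 or nil
end

-- `print` ( <string> | `(` <string> `)` )
local printStmt = seq(lit("print"), alt(str, seq(lit("("), str, lit(")"))))

-- true if the rule consumes the whole input (ignoring trailing whitespace)
local function matches(rule, s)
  local p = rule(s, 1)
  return p ~= nil and skipSpace(s, p) == #s + 1
end

print(matches(printStmt, 'print "hi"'))  --> true
print(matches(printStmt, 'print("hi")')) --> true
print(matches(printStmt, 'print(hi)'))   --> false
```

Note that this toy lit("print") would also accept "printx"; a real command grammar would want a word boundary there.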
=============================================
The next concept is an error point. An error point is a type of rule which turns a non-match into an error: if any of the rules after an error point fail to match, then the error you specify in the error point will be thrown. You write an error point as a string delimited with exclamation points: !Error: Code after me failed.!
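Here is one way to model the error-point behaviour in plain Lua. It is a hypothetical sketch: the { errorPoint = ... } marker and the seq helper are made up, and it assumes an error point only covers the rules after it within the same sequence, which may not be exactly how the generator scopes it. A failure before the error point stays an ordinary non-match; a failure after it raises the message.

```lua
-- Toy model of error points (hypothetical; not the generator's code).
-- A matcher takes (s, pos) and returns the position after its match, or nil.
local function lit(text)
  return function(s, pos)
    if s:sub(pos, pos + #text - 1) == text then return pos + #text end
    return nil
  end
end

-- Items are matchers or { errorPoint = "message" } markers. Once a marker has
-- been passed, a later failure in the same sequence raises the message.
local function seq(...)
  local items = { ... }
  return function(s, pos)
    local armed = nil -- message of the most recent error point passed
    for i = 1, #items do
      local item = items[i]
      if type(item) == "table" then
        armed = item.errorPoint
      else
        local p = item(s, pos)
        if not p then
          if armed then error(armed, 0) end -- level 0: no position prefix
          return nil
        end
        pos = p
      end
    end
    return pos
  end
end

-- roughly:  `kill ` !Error: expected "me".! `me`
local killCmd = seq(lit("kill "), { errorPoint = 'Error: expected "me".' }, lit("me"))

print(killCmd("kill me", 1)) --> 8
print(killCmd("jump", 1))    --> nil   (failed before the error point)
local ok, msg = pcall(killCmd, "kill you", 1)
print(ok, msg)               -- false, followed by the error message
```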
===============================================
The final concept is that of a "capture". The parser would be relatively useless if all it could do was verify that something matched. A capture lets you execute some sort of action or build some sort of data structure when something matches, rather than just matching it. Each top-level rule has a piece of data associated with it, and you can use the following captures to manipulate it:
{}( ⟨rule⟩ ) The simplest capture: sets the piece of data to what the rule matched. data = match
{name}( ⟨rule⟩ ) Named-capture: sets the field `name` of data to what the rule matched. data.name = match
{[]}( ⟨rule⟩ ) Array-capture: appends what the rule matched to the data. data[#data+1] = match
{name[]}( ⟨rule⟩ ) Named-array-capture: appends what the rule matched to the `name` field of data. data.name[#data.name+1] = match
There is one more special kind of capture which you can use, the "action capture", which calls one of the functions in the table of actions which you pass to the parser you generated when you ask it to parse something. The action capture is written like so:
{name()}( ⟨rule⟩ ) The action capture calls the function `name` in the actions table with what the rule matched.
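A toy Lua model of the five capture forms, showing what each one does to the rule's data. This is a sketch under assumptions: the cap helper, the kind names and the frame table (used so that {}( ) can replace the data outright) are hypothetical, and captured text is trimmed of surrounding whitespace.

```lua
-- Toy model of the capture forms (hypothetical; not the generator's code).
-- Matchers take (s, pos, frame, actions), where frame.data is the rule's data,
-- and return the position after the match, or nil.

-- a run of letters/digits, skipping leading whitespace
local function word(s, pos)
  local _, e = s:find("^%s*%w+", pos)
  return e and e + 1 or nil
end

local function seq(...)
  local rules = { ... }
  return function(s, pos, frame, actions)
    for i = 1, #rules do
      pos = rules[i](s, pos, frame, actions)
      if not pos then return nil end
    end
    return pos
  end
end

-- Wrap `rule` in a capture; `match` is the trimmed text the rule consumed.
local function cap(kind, name, rule)
  return function(s, pos, frame, actions)
    local p = rule(s, pos, frame, actions)
    if not p then return nil end
    local match = s:sub(pos, p - 1):match("^%s*(.-)%s*$")
    local data = frame.data
    if kind == "set" then             -- {}( rule )       data = match
      frame.data = match
    elseif kind == "field" then       -- {name}( rule )   data.name = match
      data[name] = match
    elseif kind == "append" then      -- {[]}( rule )     data[#data+1] = match
      data[#data + 1] = match
    elseif kind == "fieldAppend" then -- {name[]}( rule ) data.name[#data.name+1] = match
      data[name] = data[name] or {}
      data[name][#data[name] + 1] = match
    elseif kind == "action" then      -- {name()}( rule ) actions.name(match)
      actions[name](match)
    end
    return p
  end
end

-- roughly:  {target}( <word> ) {items[]}( <word> ) {items[]}( <word> )
local give = seq(cap("field", "target", word),
                 cap("fieldAppend", "items", word),
                 cap("fieldAppend", "items", word))
local frame = { data = {} }
give("bob sword shield", 1, frame, {})
print(frame.data.target .. ": " .. table.concat(frame.data.items, ", "))
--> bob: sword, shield

-- an action capture hands the match to a function from the actions table
local heard
local announce = cap("action", "announce", word)
announce("hello", 1, { data = {} }, { announce = function(m) heard = m end })
print(heard) --> hello
```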
========================================
Some example grammars:
⟨vector⟩ := ⟨number⟩ `,` ⟨number⟩ `,` ⟨number⟩ ; Matches any set of three numbers separated by commas.
`print` ( ⟨string⟩ | `(` ⟨string⟩ `)` ) Matches a Lua print statement: "print" followed by either a plain string or a string in parentheses.
⟨vector⟩ := {x}( ⟨number⟩ ) `,` {y}( ⟨number⟩ ) `,` {z}( ⟨number⟩ ) ; Same as the first vector rule, but captures the three coordinates into the data struct. The data struct for the vector rule starts out empty, and then the named-captures set the x, y, and z fields of the data table to the three coordinates.
⟨printVectorCommand⟩ := `print` {printVector()}( ⟨vector⟩ ) ; Here's an example of a complete top-level rule. Building on the rule from that last example, we can use an action-capture to do something with the captured coordinates. Now if you add a "printVector" function to the actions table when you call this rule, the printVector function will be called with the table of x/y/z coordinates that the vector rule captured.
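Here is a rough Lua sketch of how the vector and printVectorCommand rules could behave. Everything in it is a hypothetical stand-in, not the generator's code: each matcher returns the new position plus an optional value, toplevel gives a rule its own fresh data table and returns that table as the rule's value, and the number matcher only handles plain unsigned decimals.

```lua
-- Toy model of the vector example (hypothetical names; not the generator's code).
-- Matchers take (s, pos, data, actions) and return (newPos, value) or nil.

-- an unsigned decimal number such as 3 or 2.5, after optional whitespace
local function number(s, pos)
  local _, e, text = s:find("^%s*(%d+%.?%d*)", pos)
  if not e then return nil end
  return e + 1, tonumber(text)
end

-- `text`, after optional whitespace
local function lit(text)
  return function(s, pos)
    local _, e = s:find("^%s*", pos)
    pos = e + 1
    if s:sub(pos, pos + #text - 1) == text then return pos + #text end
    return nil
  end
end

local function seq(...)
  local rules = { ... }
  return function(s, pos, data, actions)
    for i = 1, #rules do
      pos = rules[i](s, pos, data, actions)
      if not pos then return nil end
    end
    return pos
  end
end

-- {name}( rule ): data.name = value of the sub-match
local function field(name, rule)
  return function(s, pos, data, actions)
    local p, value = rule(s, pos, data, actions)
    if p then data[name] = value end
    return p
  end
end

-- {name()}( rule ): actions.name(value of the sub-match)
local function action(name, rule)
  return function(s, pos, data, actions)
    local p, value = rule(s, pos, data, actions)
    if p then actions[name](value) end
    return p
  end
end

-- a top-level rule starts with an empty data table and returns it as its value
local function toplevel(body)
  return function(s, pos, _, actions)
    local data = {}
    local p = body(s, pos, data, actions)
    if p then return p, data end
    return nil
  end
end

-- <vector> := {x}( <number> ) `,` {y}( <number> ) `,` {z}( <number> ) ;
local vector = toplevel(seq(field("x", number), lit(","),
                            field("y", number), lit(","),
                            field("z", number)))
-- <printVectorCommand> := `print` {printVector()}( <vector> ) ;
local printVectorCommand = toplevel(seq(lit("print"), action("printVector", vector)))

local out
printVectorCommand("print 1, 2.5, 3", 1, nil, {
  printVector = function(v) out = v.x .. " " .. v.y .. " " .. v.z end,
})
print(out) --> 1 2.5 3
```

In this sketch the action fires as soon as the vector matches; the post doesn't say whether the real generator defers actions until the whole command has parsed.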
=============================================
Please post requests for how to match a given command, so that I can show you how to use the system and test its robustness at the same time.
XlegoX
| 02 Sep 2011 09:27 PM |
Up-to-date wiki link: http://wiki.roblox.com/index.php/User:XLEGOx/luatext
| 02 Sep 2011 09:27 PM |
Beautiful.
Also, xLegox, are you still here? I can't stand it here. Too little grammar and education to go around. I barely even go on the block like metric game forums.
XlegoX
| 02 Sep 2011 09:29 PM |
I don't do anything other than write Lua utilities like this anymore. I have other programming stuff to work on than Roblox games.
| 02 Sep 2011 09:32 PM |
But why even bother, veteran of lore? No one on this forum even cares about scripting anymore, from what I see.
blocco
| Joined: 14 Aug 2008 |
| Total Posts: 29474 |
| 02 Sep 2011 09:33 PM |
I care about scripting. Well, coding rather. I get carried away sometimes, I admit. :3
| 02 Sep 2011 09:33 PM |
Also, any advice on what to do with 160 dollars?
blocco
| 02 Sep 2011 09:35 PM |
What does ctx stand for in the code? I don't know what to transliterate it to.
| 02 Sep 2011 09:35 PM |
Furthermore, I got number one on my state reading test, the OAA, and number two on my state math test, the OAA.
| 02 Sep 2011 09:35 PM |
It's not that we don't care about scripting.
It's that there aren't enough things to talk about that actually pertain to scripting, so we pace ourselves at about 1 on-topic post per 2 1/4 weeks. That way we never run out of stuff to talk about.
| 02 Sep 2011 09:38 PM |
Aww snap, I see you redid your stream generator :3 I'll take a look through that!
==MODS Update: v2.2.1 out! Check @LuaASM==
XlegoX
| 02 Sep 2011 09:43 PM |
@Necro This one doesn't even have an explicit Lexer or Streamer in it. It does everything in one step, because the lexing is trivial enough that I don't need tokens or anything like that. If you want a Lexer / Streamer you can pirate one of my earlier ones.
| 02 Sep 2011 09:48 PM |
Oyus, I have oodles of code built upon your Java-to-Lua tokenizer/lexer (what is the proper term?).
You were right, I really should have done proper parsing on an earlier project. I immediately fell in love with your code :3
Though I never did figure out how to properly process expressions, luckily I have loadstring() to do that for me.
XlegoX
| 02 Sep 2011 09:52 PM |
@Necro Read "Let's Build a Compiler" by Jack Crenshaw. It's extremely enlightening on how expression parsing and operator precedence work.
blocco
| 02 Sep 2011 10:04 PM |
I'm reading that right now. Thanks for the recommendation. :)
XlegoX
NXTBoy
| Joined: 25 Aug 2008 |
| Total Posts: 4533 |
| 03 Sep 2011 10:29 AM |
I'm seeing a lot of square box (⟩) characters here. Can anyone else see them?
LocalChum
| Joined: 04 Mar 2011 |
| Total Posts: 6906 |
XlegoX
| 03 Sep 2011 11:06 AM |
"I'm seing a lot of square box (⟩) characters here. Can anyone else see them?"
Like I noted, they're mathy-brackets; I guess you can't see the Unicode character in your font (it worked fine for me on Ubuntu with Chrome / Firefox, so I thought it would be okay. I couldn't really replace <> with [] or {} since I needed those characters for other things as well!)
XlegoX
| 03 Sep 2011 01:49 PM |
Updated the code on the wiki with example JSON and XML parsers implemented using the generator.
Also added an "everything up to" rule for cases where you want text delimited by something other than quotes, for instance the text inside an XML tag, which is delimited by a `<`. It's written like this: $( rule ), and it matches everything up to, but not including, the rule in its body. You could rewrite the string rule using it like so:
(using underscores instead of < / > for these examples)
_string_ := `"` {}( $(`"`) ) `"` ;
To go with the everything-up-to rule, there's also an "eof" rule now, in case you want to match everything up to the end of the input, which would be written as:
_EverythingToEof_ := {}( $( _eof_ ) ) ;
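The "everything up to" rule can be modelled in a few lines of Lua. This is a sketch with hypothetical names (lit, eof, upto, stringRule), not the generator's code; it also assumes that $( rule ) fails when the terminating rule never matches, which the post doesn't specify.

```lua
-- Toy model of $( rule ) and <eof> (hypothetical; not the generator's code).
-- A matcher takes (s, pos) and returns the position after its match, or nil.
local function lit(text)
  return function(s, pos)
    if s:sub(pos, pos + #text - 1) == text then return pos + #text end
    return nil
  end
end

-- <eof>: matches the empty string, but only at the end of the input
local function eof(s, pos)
  if pos == #s + 1 then return pos end
  return nil
end

-- $( rule ): consume characters until `rule` would match; the terminator
-- itself is left unconsumed. Fails if the terminator never shows up.
local function upto(rule)
  return function(s, pos)
    for p = pos, #s + 1 do
      if rule(s, p) then return p end
    end
    return nil
  end
end

-- <string> := `"` {}( $(`"`) ) `"` ;   returns the end position and the body
local function stringRule(s, pos)
  local open = lit('"')(s, pos)
  if not open then return nil end
  local close = upto(lit('"'))(s, open)
  if not close then return nil end
  return lit('"')(s, close), s:sub(open, close - 1)
end

local p, body = stringRule('"hello world" rest', 1)
print(p, body)                           -- position after the closing quote, then: hello world
print(upto(eof)("everything to eof", 1)) --> 18
```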
Anaminus
| Joined: 29 Nov 2006 |
| Total Posts: 5945 |
XlegoX
| 03 Sep 2011 07:22 PM |
A minified version of the code, with a couple of performance enhancements, is up for anyone who actually wants to use it in a script: http://wiki.roblox.com/index.php/User:XLEGOx/luatext-min
| 15 Feb 2012 08:38 PM |
Stravant, a question:
How are you supposed to properly use lists? Such as the following:
local Parse = MakeParser([[ <sentence> := {[]}(<ident>); ]]);
print(ToString( -- the function included in your script
    Parse("sentence", "a b c d", {})
));
I want it to return a list like {"a", "b", "c", "d"}, but instead I get {"a"}. The symbols "+" and "*" only match the very last identifier, "d". I tried recursion, but I only got "a b c d", which leaves me with parsing left to do, defeating the purpose of using this.
My first thought was to modify the "+" and "*" symbols to perform multiple captures instead of the last. However, I think it would be better if using an array based capture automatically tried to match as many times as possible before stopping.
I'll start implementing this, but am I doing something wrong?
stravant
| Joined: 22 Oct 2007 |
| Total Posts: 2893 |
| 15 Feb 2012 08:54 PM |
You've got a misconception of how it works, you're over-complicating it. When you write an array capture like so:
{[]}( < ident > )
What it will do is capture the contents of the capture rule, in this case an identifier, and then append those contents to the end of the current-capture.
So naturally, if you just write that rule alone, it will only capture one item, because the capture type "[]" does not affect what matches the rule. The rule will still only match a single < ident > pattern; the "[]" capture only tells it how that match should be captured.
So, what you want to write is this: {[]}( < ident > ) *
With that, you're now telling it to capture an ident pattern and append it to the capture, and to repeat that as many times as it can. That is as opposed to this:
{[]}( < ident >* )
Which tells it to match as many identifiers as it can, and then append that whole chunk of identifiers to the capture as a single string.
That's really the only way I could implement it to make all actions you could want to do possible, even if it's a bit unintuitive at first.
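The difference described above can be reproduced with a small Lua model. This is a sketch with hypothetical helpers (ident, many, appendCap), not the generator's implementation: in the first form the append runs once per repetition, in the second it runs once around the whole repeated match.

```lua
-- Toy model contrasting {[]}( <ident> ) *  with  {[]}( <ident>* )
-- (hypothetical helpers; not the generator's code).
-- Matchers take (s, pos, data) and return the position after the match, or nil.

-- an identifier, skipping leading whitespace
local function ident(s, pos)
  local _, e = s:find("^%s*[%a_][%w_]*", pos)
  return e and e + 1 or nil
end

-- <rule>*: repeat while the rule keeps matching and consuming input
local function many(rule)
  return function(s, pos, data)
    while true do
      local p = rule(s, pos, data)
      if not p or p == pos then return pos end
      pos = p
    end
  end
end

-- {[]}( rule ): append the trimmed text the rule matched to data
local function appendCap(rule)
  return function(s, pos, data)
    local p = rule(s, pos, data)
    if p then
      data[#data + 1] = s:sub(pos, p - 1):match("^%s*(.-)%s*$")
    end
    return p
  end
end

-- {[]}( <ident> ) *  : the capture runs once per repetition
local perItem = {}
many(appendCap(ident))("a b c d", 1, perItem)

-- {[]}( <ident>* )  : one capture around the whole repetition
local wholeRun = {}
appendCap(many(ident))("a b c d", 1, wholeRun)

print(table.concat(perItem, "|"))  --> a|b|c|d
print(table.concat(wholeRun, "|")) --> a b c d
```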
| 15 Feb 2012 08:57 PM |
Ohhh! That's perfect! Thanks for the clarification.
I'm reading through your implementation, so hopefully it wouldn't have been too long before I discovered that myself :P
P.S.: This is just another one of your ideas I should have started using ages ago. I love it!