|
| 11 Jan 2016 05:11 PM |
alright so i've finished converting my language's code into an abstract syntax tree.
e.g.
int square := (int n) { return n * n; }
gets converted to
typeof = ROOT
block = {
  {
    typeof = FUNCTION_IMPLEMENT
    identifier = square
    num_args = 1
    return_type = int
    arguments = {
      { identifier = n  parse_statetype = int }
    }
    block = {
      {
        typeof = RETURN
        expression = {
          { typeof = IDENTIFIER  word = n  line = 2 }
          { typeof = MULTIPLY  word = *  line = 2 }
          { typeof = IDENTIFIER  word = n  line = 2 }
        }
      }
    }
  }
}
anyway, i now need to convert this into bytecode. I have literally no idea where to start, and I'm just wondering if anyone has any input.
Thanks. |
|
|
nox7
|
  |
| Joined: 29 Aug 2008 |
| Total Posts: 27467 |
|
|
| 11 Jan 2016 05:18 PM |
You first need to back up and design your bytecode format. Code doesn't just magically turn into bytecode; bytecode is carefully put together in a designed structure.
This is sort of the fun thing in language design, because it can be whatever you want. You can even decide not to encode numbers in full (or to limit float precision) if you so desire.
I'm sure you've studied the Lua VM bytecode design (if not, Google "No Frills Lua VM 5.1"; the document is "A No-Frills Introduction to Lua 5.1 VM Instructions" and it's quite interesting). You can get ideas from that and how it all works.
|
|
|
| 11 Jan 2016 05:36 PM |
| @nox I've already created the VM |
|
|
|
| 11 Jan 2016 05:36 PM |
| Have you read the No Frills guide though? Pretty sure cnt also recommends it, as he credited it in his Lua Bytecode tutorial. |
|
|
|
| 11 Jan 2016 05:38 PM |
| @boss can you link me pls? |
|
|
|
| 11 Jan 2016 05:38 PM |
| nvm found it, I'll take a look |
|
|
|
| 11 Jan 2016 05:39 PM |
I have the link to cnt's Lua Bytecode tutorial, though it was hosted on Dropbox and it's gone now:
"The file you’re looking for has been moved or deleted. Please see this article for details on why a shared link might stop working."
If you are talking about the No Frills guide thing, google what nox said.
|
|
nox7
| 11 Jan 2016 05:41 PM |
"@nox I've already created the VM"
That's great, then you should have no problem converting to bytecode. If you have the VM then all you need to do is exactly what I said. Again. |
|
|
|
| 11 Jan 2016 05:41 PM |
| I mean, i know what the bytecode should look like, it's just a matter of getting there |
|
|
nox7
| 11 Jan 2016 05:50 PM |
I'm confused as to why that troubles you. You obviously have deep knowledge of this concept, so I assumed you had the same grasp of general programming.
For instance, if you have your syntax tree, you just need to design your VM. You've done this. So you now just need to design how your value types are encoded. For instance, Lua has a very specific way in which it encodes everything - all of its types.
Functions are broken down into segments of bytes for different things. If you convert a Lua function to bytecode, it produces an ordered sequence of bytes holding different pieces of accessible data: lists of constants, instructions (the code to be run), local variables, upvalues, and a bit more.
Again, I wish I could help you further but I do not understand WHY you are stuck. I also cannot help you further because I do not understand HOW you want to encode your source as bytecode. Is it block based? Does it have functions? How should the scope be defined?
|
|
|
| 11 Jan 2016 06:08 PM |
I guess I just don't know where to start. I know what I want my bytecode to look like. It's extremely simple bytecode as of now. Nothing special. e.g if statements are a simple:
ICONST 1 // if (true)
JIF end_if
.. do stuff
LABEL end_if
loops are a simple:
ICONST 10 // iterator
LABEL start_loop
JIF end_loop
LOAD < index of iterator, tbd by compiler >
DECREASE
...
JMP start_loop
LABEL end_loop
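One possible way to get from LABEL/JIF/JMP listings like these to something a VM can execute is a two-pass label resolver: the first pass records which instruction each label points at, the second replaces label names in jumps with those positions. This is only a sketch. The `resolveLabels` name, the `{ op = ..., arg = ... }` table shape, and the 1-based target indices are assumptions, not part of the compiler in this thread.

```lua
-- Hypothetical helper: resolve symbolic labels into instruction indices.
-- Input:  list of { op = "NAME", arg = value-or-label-name }
-- Output: new list without LABEL pseudo-instructions, where JMP/JIF
--         arguments are the 1-based index of the instruction that
--         followed the label (count + 1 if the label was last).
local function resolveLabels(code)
  local targets = {}   -- label name -> index in the output list
  local count = 0      -- number of real instructions seen so far

  -- Pass 1: a label points at the next real instruction.
  for _, ins in ipairs(code) do
    if ins.op == "LABEL" then
      targets[ins.arg] = count + 1
    else
      count = count + 1
    end
  end

  -- Pass 2: copy real instructions, patching jump arguments.
  local out = {}
  for _, ins in ipairs(code) do
    if ins.op ~= "LABEL" then
      local arg = ins.arg
      if ins.op == "JMP" or ins.op == "JIF" then
        arg = targets[arg]
        if not arg then
          error("undefined label: " .. tostring(ins.arg))
        end
      end
      out[#out + 1] = { op = ins.op, arg = arg }
    end
  end
  return out
end

-- The loop listing from the post; LOAD's slot 0 is a placeholder.
local resolved = resolveLabels({
  { op = "ICONST", arg = 10 },
  { op = "LABEL",  arg = "start_loop" },
  { op = "JIF",    arg = "end_loop" },
  { op = "LOAD",   arg = 0 },
  { op = "DECREASE" },
  { op = "JMP",    arg = "start_loop" },
  { op = "LABEL",  arg = "end_loop" },
})
```

Because labels are collected before any jump is patched, forward jumps (like the one to end_loop) need no special handling.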
Maybe I'm just tired and I need a break from spending all my time on this, I don't know.
The main thing I'm worried about is parsing expressions such as
x * y + 10
that will get converted into
LOAD < index of x >
LOAD < index of y >
IMUL
ICONST 10
IADD
I'm pretty sure that will be my main challenge, I don't know. |
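Since the AST stores an expression as a flat list of tokens in source order, one common approach that might fit here is the shunting-yard technique: operands are emitted immediately, operators wait on a stack until an operator of lower precedence arrives. The sketch below is an assumption-heavy illustration. `compileExpression`, the `locals` name-to-slot table, and the `ADD` and `NUMBER` token types are hypothetical (only `IDENTIFIER` and `MULTIPLY` appear in the AST dump), and parentheses are not handled.

```lua
-- Operator table: precedence and the instruction to emit.
-- MULTIPLY comes from the AST dump; ADD is an assumed token type.
local OPS = {
  ADD      = { prec = 1, ins = "IADD" },
  MULTIPLY = { prec = 2, ins = "IMUL" },
}

-- tokens: flat infix list of { typeof = ..., word = ... }
-- locals: hypothetical map from variable name to stack slot index
local function compileExpression(tokens, locals)
  local out, stack = {}, {}
  for _, tok in ipairs(tokens) do
    local op = OPS[tok.typeof]
    if tok.typeof == "IDENTIFIER" then
      local slot = locals[tok.word]
      if not slot then error("unknown variable: " .. tok.word) end
      out[#out + 1] = "LOAD " .. slot
    elseif tok.typeof == "NUMBER" then   -- assumed token type
      out[#out + 1] = "ICONST " .. tok.word
    elseif op then
      -- Emit pending operators that bind at least as tightly
      -- (this makes both operators left-associative).
      while #stack > 0 and stack[#stack].prec >= op.prec do
        out[#out + 1] = table.remove(stack).ins
      end
      stack[#stack + 1] = op
    else
      error("unexpected token: " .. tostring(tok.typeof))
    end
  end
  -- Flush whatever operators are still waiting.
  while #stack > 0 do
    out[#out + 1] = table.remove(stack).ins
  end
  return out
end

-- x * y + 10, with x in slot 0 and y in slot 1
local code = compileExpression({
  { typeof = "IDENTIFIER", word = "x" },
  { typeof = "MULTIPLY",   word = "*" },
  { typeof = "IDENTIFIER", word = "y" },
  { typeof = "ADD",        word = "+" },
  { typeof = "NUMBER",     word = "10" },
}, { x = 0, y = 1 })
-- table.concat(code, " ") == "LOAD 0 LOAD 1 IMUL ICONST 10 IADD"
```

The output order matches the listing in the post: both operands are pushed before the instruction that consumes them, which is what a stack-based VM expects.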
|
|
nox7
| 11 Jan 2016 06:26 PM |
I definitely recommend a break. Stand up and do something else, haha. I've done that several times with this one plugin I was making which involved calculus (which I barely know). A few breaks helped me come back with better ideas. It really makes a difference.
Anyways, the beautiful thing about bytecode generation is the ability to ignore bad code or unused code (which is typically taken care of by the lower-level compiler).
But starting, eh? My suggestion is to plan out a method that's easy to automate. For instance, Lua's main chunk (open a script, and just write without creating functions) is actually compiled as a function. The interpreter can then use the same method it uses for functions and loop through them all.
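The "main chunk is just a function" idea could be applied to the AST from the first post roughly like this. It is a sketch only: `wrapRootAsFunction`, the `"__main"` identifier, and the `"void"` return type are made-up names, while the field names follow the AST dump.

```lua
-- Hypothetical helper: wrap a ROOT node's top-level statements in an
-- implicit function node, so the compiler only ever compiles functions.
local function wrapRootAsFunction(root)
  assert(root.typeof == "ROOT", "expected a ROOT node")
  return {
    typeof = "FUNCTION_IMPLEMENT",
    identifier = "__main",   -- assumed name for the implicit function
    num_args = 0,
    return_type = "void",    -- assumed; the thread only shows int
    arguments = {},
    block = root.block,      -- reuse the top-level statements as the body
  }
end

local main = wrapRootAsFunction({
  typeof = "ROOT",
  block = { { typeof = "RETURN", expression = {} } },
})
```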
Just now, you brought up parsing. That's.... not something I'd be thinking about when you're trying to write a byte format.
I don't actually understand your syntax tree format - it doesn't make sense to me. But keep in mind that only one part of a bytecode file's structure (the function structure) contains the code to run.
Turn your syntax tree into a series of instructions (which I see you've done). Now, assign a byte ID to each instruction. Let's say a certain byte has the value 5. Look up 5 in your list of instructions, and you'll know it is a JMP instruction. However, the first byte in this segment is only the ID of the instruction. Let's assume the next two bytes represent the location in memory where the arguments for the JMP are. If the jump is unconditional, then don't even bother checking a condition - it'll slow the program down.
However, how do you know when to stop reading that instruction? Encode a length at the beginning (or just make all your encoded instructions the same byte length).
I prefer the latter, and just making them all the same length. Let's say all my instructions are encoded as 4 bytes: 1 byte for the ID of the instruction, and 3 for any possible arguments. Then the interpreter will know how many bytes to read until the next instruction starts.
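A fixed 4-byte instruction like the one described could be packed and unpacked along these lines. This is a sketch under assumptions: the three argument bytes are treated as a single 24-bit big-endian number, and the function names are invented for illustration. The opcode ID 5 for JMP is taken from the example earlier in the post.

```lua
-- Assumed layout: byte 1 = opcode ID, bytes 2-4 = one 24-bit argument,
-- most significant byte first. Plain arithmetic keeps this portable
-- across Lua versions (no bit library or string.pack needed).
local function encodeInstruction(id, arg)
  arg = arg or 0
  assert(id >= 0 and id <= 255, "opcode must fit in one byte")
  assert(arg >= 0 and arg < 2^24, "argument must fit in three bytes")
  local b2 = math.floor(arg / 65536)
  local b3 = math.floor(arg / 256) % 256
  local b4 = arg % 256
  return string.char(id, b2, b3, b4)
end

-- Read one instruction starting at byte position `pos` (1-based).
local function decodeInstruction(bytes, pos)
  local id, b2, b3, b4 = string.byte(bytes, pos, pos + 3)
  return id, b2 * 65536 + b3 * 256 + b4
end

local jmp = encodeInstruction(5, 300)      -- "JMP 300"
local id, arg = decodeInstruction(jmp, 1)  -- id == 5, arg == 300
```

With a fixed width, instruction number n always starts at byte (n - 1) * 4 + 1, so jump targets can be instruction indices rather than byte offsets.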
But how do you know when the instruction list ends? Encode a byte or two that signifies the length of the instruction list. Let's say you encode like this:
[Instruction List]
1 Byte = Length of list
Instructions
[Instructions]
1 Byte = ID of instruction
3 Bytes = arguments and/or memory locations
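The layout above could be written and read back as follows. This is a rough sketch assuming a 1-byte length (so at most 255 instructions) and instructions given as `{ id, a, b, c }` byte tuples; both function names and the opcode IDs in the usage lines are hypothetical.

```lua
-- Write: 1 length byte, then 4 bytes per instruction.
local function writeInstructionList(instructions)
  assert(#instructions <= 255,
         "a 1-byte length allows at most 255 instructions")
  local parts = { string.char(#instructions) }
  for _, ins in ipairs(instructions) do
    parts[#parts + 1] =
      string.char(ins[1], ins[2] or 0, ins[3] or 0, ins[4] or 0)
  end
  return table.concat(parts)
end

-- Read: the first byte says how many 4-byte instructions follow.
local function readInstructionList(bytes)
  local count = string.byte(bytes, 1)
  local list = {}
  for i = 1, count do
    local pos = 2 + (i - 1) * 4   -- skip the length byte
    local id, a, b, c = string.byte(bytes, pos, pos + 3)
    list[i] = { id, a, b, c }
  end
  return list
end

-- Two made-up instructions: ID 1 with argument bytes 0,0,10 and ID 5 with 0,0,1.
local blob = writeInstructionList({ { 1, 0, 0, 10 }, { 5, 0, 0, 1 } })
local decoded = readInstructionList(blob)
```

If 255 instructions per list turns out to be too few, using two length bytes is the "byte or two" option mentioned in the post.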
|
|
|
| 11 Jan 2016 06:28 PM |
| I thought byte-code was like 010101010. |
|
|
|
|
| 11 Jan 2016 06:32 PM |
| Forever, may I steal your attention for a moment? I need your help on my :gsub() thread. My language needs dire help D: |
|
|
nox7
| 11 Jan 2016 06:36 PM |
"byte-code was like 010101010."
No, a byte is 8 bits. A byte is enough data to store one ASCII character (like a letter or digit). Although we use Unicode more now, where a character can take more than one byte (1 to 4 in UTF-8).
What you are talking about are bits. 0 or 1 is a bit, four of them (0000 or 1111) is a nibble, and eight in a sequence is a byte.
|
|
|
| 11 Jan 2016 06:42 PM |
bit - binary digit
nibble - 4 binary digits e.g 1111
byte - 8 binary digits e.g 11111111
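To make the relationship concrete, here is a small sketch that takes one byte value apart. The helper names are made up, and plain arithmetic is used so it does not depend on any particular Lua version's bit operators.

```lua
-- Split a byte value (0-255) into its high and low nibbles.
local function nibbles(byte)
  return math.floor(byte / 16), byte % 16
end

-- Render a byte value as a string of 8 binary digits.
local function toBits(byte)
  local digits = {}
  for i = 7, 0, -1 do
    digits[#digits + 1] = math.floor(byte / 2^i) % 2
  end
  return table.concat(digits)
end

local hi, lo = nibbles(0xAB)   -- hi == 10 (0xA), lo == 11 (0xB)
local bits = toBits(5)         -- "00000101"
```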
|
|
|
|
| 11 Jan 2016 07:39 PM |
i think this might actually be a lot easier than i thought
I got the whole basic layout done (there will definitely be more added later, like compileExpression, compileFunctionCall, etc)
-- Severin's compiler
-- forward declarations, so the functions below can call each other
local Sev_throwCompileError, Sev_dumpBytecode
local Sev_jumpIntoBlock, Sev_jumpOutOfBlock
local Sev_interpretExpression, Sev_generateBytecode
local Sev_compileIf, Sev_compileWhile, Sev_compileFor, Sev_compileRepeat
local Sev_compileVariableDeclaration, Sev_compileVariableDefinition
local Sev_compileFunctionDeclaration, Sev_compileFunctionDefinition
local Sev_branchToCompilerFunctions
function Sev_throwCompileError(compile_state)
end
function Sev_dumpBytecode(compile_state)
end
function Sev_interpretExpression(compile_state)
end
function Sev_jumpIntoBlock(compile_state)
    compile_state.current_node = compile_state.current_node.block[compile_state.block_index]
    compile_state.block_index = 1
end
function Sev_jumpOutOfBlock(compile_state)
end
function Sev_compileIf(compile_state)
end
function Sev_compileWhile(compile_state)
end
function Sev_compileFor(compile_state)
end
function Sev_compileRepeat(compile_state)
end
function Sev_compileVariableDeclaration(compile_state)
end
function Sev_compileVariableDefinition(compile_state)
end
function Sev_compileFunctionDeclaration(compile_state)
end
function Sev_compileFunctionDefinition(compile_state)
end
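function Sev_branchToCompilerFunctions(compile_state)
    -- stub: nothing is dispatched yet, so mark the state as finished;
    -- otherwise the loop in Sev_generateBytecode never terminates
    compile_state.is_finished = true
end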
function Sev_generateBytecode(ast, current_directory)
    local compile_state = {}
    compile_state.current_node = ast
    compile_state.block_index = 1
    compile_state.is_finished = false
    compile_state.bytecode = {} -- must exist before table.concat below

    Sev_jumpIntoBlock(compile_state)

    while not compile_state.is_finished do
        Sev_branchToCompilerFunctions(compile_state)
    end

    -- convert bytecode into a string and return it to main.c
    -- then, the bytecode is handed to vm.c and interpreted
    return table.concat(compile_state.bytecode, " ")
end

return function(ast, current_directory)
    local bytecode = Sev_generateBytecode(ast, current_directory)
    return bytecode
end
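One way the empty `Sev_branchToCompilerFunctions` might eventually be filled in is a dispatch table keyed on each node's `typeof` field. The sketch below is not the author's design: it walks the tree recursively instead of tracking `current_node` and `block_index`, all names are hypothetical, and the emitted strings (`FUNC`, `RET`, `ENDFUNC`) are placeholders rather than the real instruction set. Only `FUNCTION_IMPLEMENT` and `RETURN` are taken from the AST dump.

```lua
-- Hypothetical dispatch: node type -> function that compiles that node.
local handlers = {}

local function compileNode(node, out)
  local handler = handlers[node.typeof]
  if not handler then
    error("no compiler function for node type " .. tostring(node.typeof))
  end
  handler(node, out)
end

-- Compile every node in a block, in order.
local function compileBlock(block, out)
  for _, node in ipairs(block) do
    compileNode(node, out)
  end
end

handlers.FUNCTION_IMPLEMENT = function(node, out)
  out[#out + 1] = "FUNC " .. node.identifier   -- placeholder instruction
  compileBlock(node.block, out)                -- recurse into the body
  out[#out + 1] = "ENDFUNC"                    -- placeholder instruction
end

handlers.RETURN = function(node, out)
  -- A fuller version would compile node.expression before returning.
  out[#out + 1] = "RET"                        -- placeholder instruction
end

-- The body of a ROOT node shaped like the dump in the first post.
local out = {}
compileBlock({
  { typeof = "FUNCTION_IMPLEMENT", identifier = "square",
    block = { { typeof = "RETURN", expression = {} } } },
}, out)
-- table.concat(out, " ") == "FUNC square RET ENDFUNC"
```

Recursion means the call stack remembers where to resume after a nested block, which is the job a manual jump-out-of-block step would otherwise need a parent stack for.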
|
|