Script optimization question

  18:22:09  27 April 2013
Jketiynu
Swartz
(Resident)

 

 
On forum: 04/05/2007
 

Message edited by:
Jketiynu
04/27/2013 18:28:26
Messages: 867
Natvac, if you don't mind I have a few questions for you regarding script optimization.

I've been reading lots of tutorials on Lua optimization and have found a few tools that claim to help with it.

Here are my questions:

1) Does removing "white space" make any difference? Likewise does it make any difference to remove comments rather than just leaving them commented out? I've seen multiple claims of this, however I wonder if it would make any real difference with Stalker scripts?

2) Does a shorter name, such as a shorter local variable name, perform better? Again, I'm seeing claims that it does.
Example:

local reallylongstringofcharacters = bone_index

vs.

local a = bone_index



Thank you for any help.
  02:14:03  9 May 2013
NatVac
Senior Resident
 

 
On forum: 06/15/2007
Messages: 4281
I can give you my slightly-informed opinions on this.

>> 1) Does removing "white space" make any difference?

If we are talking scripts or configuration files: It might make loading a teensy bit faster and debugging a lot harder. Script execution would not be affected, because the JIT compiler works from the compiled code, not from the source text. It might make a small difference on ini_file() processing, as you are looking at the raw lines for some operations. If you repeatedly read the same lines in event loop processing, this might be worth optimizing.

If we are talking shaders, you guys know much more than I. I would expect the shader compiler to remove any overhead. Maybe the compile runs faster if there is less to process. Also, if the shader routines for a complex scene take more space than the instruction buffer on the card (at least 64K on modern cards), you can expect swapping to RAM with an FPS hit. But that's more a scene design or shader coding issue than a white space issue.

---QUOTATION---
Likewise does it make any difference to remove comments rather than just leaving them commented out? I've seen multiple claims of this, however I wonder if it would make any real difference with Stalker scripts?
---END QUOTATION---


Only in initial loading and parsing. The game uses a JIT compiler, so actual code execution doesn't even know about the comments. Removing all the comments might save a few milliseconds of script parsing, but it will not appreciably reduce the load time from disk.

What I've noticed is some significant improvements in loading and execution when valid but unused junk is kept from being loaded and subsequently managed by the game. Don't load it, don't have memory allocated for it, don't have objects defined and processed if they are not going to be used.

---QUOTATION---
2) Does a shorter name, such as a shorter local variable name, perform better? Again, I'm seeing claims that it does.
---END QUOTATION---


Before I tested this, I thought that code execution was not affected by variable name length for locals. My opinion has not changed after testing. Using 27- and 28-character names made no discernible difference in test code execution versus 2-character names used millions of times, for both vars local to the routine and vars just local to the script object. (See test info in previous post in this thread.)
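
A comparison of this kind can be sketched roughly as follows. This is not the test referred to above, only a minimal illustration that assumes a standalone Lua interpreter where os.clock() is available; the function names sum_short, sum_long and time_it are made up for the example.

```lua
-- Hypothetical micro-benchmark: two identical loops that differ only in
-- the length of their local variable names.
local function sum_short(n)
  local a = 0
  for i = 1, n do
    a = a + i
  end
  return a
end

local function sum_long(n)
  local a_really_long_local_name_x = 0
  for loop_index_with_a_long_name = 1, n do
    a_really_long_local_name_x = a_really_long_local_name_x + loop_index_with_a_long_name
  end
  return a_really_long_local_name_x
end

-- Returns the elapsed CPU time and the function's result.
local function time_it(fn, n)
  local start = os.clock()
  local result = fn(n)
  return os.clock() - start, result
end

local N = 1000000
local t_short, r_short = time_it(sum_short, N)
local t_long, r_long = time_it(sum_long, N)
print(string.format("short names: %.4fs  long names: %.4fs", t_short, t_long))
```

Local names are resolved to register slots when the script is compiled, so any difference printed here should be timing noise rather than a real effect.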

I didn't test tables, but strings are hashed in the game even for them. The one contributor would be the extra few milliseconds to parse through the script, something you will not normally notice unless something else is wrong.

Note that the foregoing did not include the cost of using loadfile() or loadstring() during game execution. In those cases, you would be dynamically parsing text to execute it as code, and in this case, yes, shorten var names, get rid of spaces, comments, etc. If possible, don't use those functions except during initialization or in debugging. Better still, just put that code in a normal script file. Repeatedly recompiling the same code is the opposite of optimization.
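
A minimal sketch of the difference between recompiling and compiling once, assuming plain Lua 5.1 or LuaJIT (the source string and the function names are invented for the example):

```lua
-- loadstring exists in Lua 5.1 and LuaJIT; fall back to load on newer interpreters.
local compile = loadstring or load

-- Hypothetical snippet of code held as text.
local source = "local x = ... ; return x * 2 -- doubled"

-- Slow pattern: the text is parsed and compiled again on every call.
local function double_recompiling(x)
  return compile(source)(x)
end

-- Better: compile once, e.g. during initialization, and keep the function.
local compiled_double = assert(compile(source))
local function double_cached(x)
  return compiled_double(x)
end

print(double_recompiling(21), double_cached(21))
```

Both versions return the same value; only the first pays the parsing cost each time. Putting the code in a normal script file avoids the dynamic compile entirely.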
  19:53:40  10 May 2013
Jketiynu
Swartz
(Resident)

 

 
On forum: 04/05/2007
Messages: 867

---QUOTATION---
Helpful stuff.
---END QUOTATION---



Thank you!

I had "optimized" various scripts and was wondering why it seemed to make no difference in performance.

That's good, because the stuff I mentioned makes for some hard-to-read scripts, so modding and merging will be easier without it.
  22:09:15  19 August 2013
Decane
Senior Resident
 

 
On forum: 04/04/2007
 

Message edited by:
Decane
08/20/2013 0:04:38
Messages: 1696
Suppose that I have the table and function below:
t = {1, 2, 3}

function a()
	for k, v in pairs (t) do
		return 2 >= v
	end
end

I want to know if the function returns the truth value of the first iteration of the pairs() loop (i.e. "true", because 2 >= 1), or if it iterates through the entire table and consequently returns the truth value of the third and final iteration of the pairs() loop (i.e. "false", because 2 >= 3 does not hold).

In general, I would like to know how a "return" statement behaves when it occurs inside a key-value pairs() loop. (I realize that this is not strictly an optimization question, but it relates to optimization through its implications.)

EDIT: Simplified the question.
  08:08:23  20 August 2013
NatVac
Senior Resident
 

 
On forum: 06/15/2007
Messages: 4281
Decane, a return statement returns as soon as it is encountered, with the value or values evaluated from its expression list. In your example, processing the first pair in the table would execute the return statement with the boolean result of the comparison against that first pair's value. The rest of the table would be skipped.

You are using a table with just values, equivalent to { [1]=1, [2]=2, [3]=3 }. You might expect the first value to be 1 in your loop. But--

"The order in which the indices are enumerated is not specified even for numeric indices." The docs say to use a numeric for loop or the ipairs function for this.

As an example of the order issue, look at your inventory from game to game. Certain artifacts will always appear before or after others for the whole game, but this order will vary from game to game. It may be that your table is safe by definition, but tables that are manipulated prior to the loop processing have no guaranteed order.

I'd expect a true value returned for the function with that specific data, but otherwise it's a toss-up.
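
The early exit can be checked with a small sketch like the one below. It uses ipairs rather than pairs so that the iteration order is defined, and the counter named visited is added purely for illustration.

```lua
local t = {1, 2, 3}
local visited = 0  -- hypothetical counter: how many elements the loop reached

local function a()
  for _, v in ipairs(t) do
    visited = visited + 1
    return 2 >= v  -- leaves the function on the first iteration
  end
end

local result = a()
print(result, visited)
```

Here result is true (2 >= 1) and visited stays at 1, because the remaining elements are never processed. With an empty table the loop body never runs, and the function returns nothing (nil).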
  09:36:21  20 August 2013
Decane
Senior Resident
 

 
On forum: 04/04/2007
Messages: 1696
Very helpful as always, NatVac. Thank you!
  11:41:47  21 August 2013
Decane
Senior Resident
 

 
On forum: 04/04/2007
 

Message edited by:
Decane
08/22/2013 23:49:20
Messages: 1696
More optimization (or just 'good code') questions

(1) Is a boolean compare always preferred to string compares where feasible, even if the boolean compare makes use of lots of "and" statements? For example:

(A1)
function a()
	if not x then
		return false
	end
	if not y then
		return false
	end
	if not z then
		return false
	end
	return something or something_else
end


(A2)
function a()
	return x and y and z and (something or something_else)
end

¯¯¯¯¯¯¯¯¯¯¯¯

(2) How would the following functions be ranked in order of speed, and why? Based on what has already been written in this thread about string compares, I would hypothesize that B3 and B4 both are faster than either of B1 and B2. Based on what has been written about local variables, I would hypothesize that B4 is the fastest of them all. But I really have no clue about B1 vs. B2.

(B1)
function b(p)
	if p[1] == "a" or p[1] == "b" or p[1] == "c" then
		return f() == p[1]
	end
	return false
end


(B2)
function b(p)
	if p[1] ~= "a" then
		if p[1] ~= "b" then
			if p[1] ~= "c" then
				return false
			end
		end
	end
	return f() == p[1]
end


(B3)
function b(p)
	return (p[1] == "a" or p[1] == "b" or p[1] == "c") and f() == p[1]
end


(B4)
function b(p)
	local a1 = p[1] == "a"
	local b1 = p[1] == "b"
	local c1 = p[1] == "c"
	local fx = f() == p[1]
	return (a1 or b1 or c1) and fx
end


Finally, a syntax question. Instead of listing each local boolean on a separate line like I did above, can I write:
local a1, b1, c1, fx = (p[1] == "a") and (p[1] == "b") and (p[1] == "c") and (f() == p[1])

...?

¯¯¯¯¯¯¯¯¯¯¯¯
(3) In xr_conditions.script, almost every function has three arguments (actor, npc, p). But many of the functions do not use more than one, namely p. Would it make any sense to erase the other two unused variables from their argument places, assuming that they are not used?

¯¯¯¯¯¯¯¯¯¯¯¯
(4) Which of these is faster:

(C1)
function xyz()
	if not a then
		return
	end
	if not b then
		return
	end
	return x == y
end


(C2)
function xyz()
	if not a or not b then
		return
	end
	return x == y
end

...?

¯¯¯¯¯¯¯¯¯¯¯¯
(5) And lastly, which of these is faster:

(D1)
function f()
	local a, b, c = x(), y(), z()
	return (a == b) or c
end


(D2)
function f()
	local condition = (x() == y()) or z()
	return condition
end


...?
  09:33:56  23 August 2013
NatVac
Senior Resident
 

 
On forum: 06/15/2007
Messages: 4281
I don't know any really helpful answers for optimization that can't be better stated by the Lua users group (that you originally linked); some links on page 2 of this thread. Another page from that site: http://lua-users.org/wiki/OptimisationCodingTips -- see the Table Access VM code for the Lua 4 version (still applies to Lua 5.1, methinks) to see how the syntax structure is converted into VM code, what I call bytecode.

I can suggest general guidelines (generally good) and opinions (generally ... well, they're opinions).

(1) I don't see any string compares there. I suspect you mean that "==" and "~=" are boolean while "and" and "or" are strings. The first set are relational operators and the second set are logical operators, and they are all eaten at parse time to produce similar low-level bytecode.

First, I think the compiler is smart enough so that generated code tests for "not A" in the same amount of time as it tests for "A". No slowdown contribution here.

But relational operators are apparently slower than logical operators in that the type of each argument is reportedly first checked before they are compared with each other. I say "reportedly" because the bytecode for the "<=" test in the assert example linked above is just JMPLE, so the engine would have to be performing the type check. Are they both numbers? "No" means false. If "yes", then are they the same value? Relational comparisons return boolean results.

The logical tests are different. The statement "return a or b" evaluates a, tests whether it is false or nil, and returns a if it is neither; otherwise it evaluates b and returns it without editorial comment. With "return a and b" you get the same initial test, but a false or nil a is itself returned; otherwise b is evaluated and returned. Logical operators therefore return false, nil, or non-false/non-nil results, which could be objects, numbers (unlike C/C++, 0 is NOT false here) or boolean true.
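
A few one-liners illustrate these rules. The variable names are made up for the example; the behaviour shown is standard Lua.

```lua
local a_or_b    = nil or "fallback"      -- first operand is nil, so the second is returned
local a_and_b   = "first" and "second"   -- first operand is truthy, so the second is returned
local stops     = false and error("never evaluated")  -- short-circuits; error() is not called
local zero_ok   = 0 and "zero is truthy" -- 0 is not false in Lua
local keeps_nil = nil and "unused"       -- nil is returned as-is

print(a_or_b, a_and_b, stops, zero_ok, keeps_nil)
```

Note that the results are the operands themselves, not values converted to true or false.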

As for the "if" forms of logical comparisons, I'm confident that an evaluation result sets the status flags, so conditional branching requires no additional comparison after the evaluation of b.

Different rules likely apply when a and/or b are constants.

As far as the (asked but not intended) question "which compare faster: booleans or strings" is concerned: as I understand it, Lua interns its strings, so an equality test between two strings compares references rather than characters, and booleans are stored directly in the value. I would not expect a meaningful difference between the two.

(2) I'd expect B3 to be fastest at run time. B4 creates locals that are only used once, but their creation is based on the same comparison as used in B3, which by default will use short-cut evaluation and return early, without wasting a call to f(), if any prior test fails. B1 and B2 would be tied for 2nd fastest and B4 is last because f() is always called.

Why are B1 and B2 slower than B3? Only because "if x ~= value then return false end" is slower than "return x == value". In the first example the result is evaluated, tested and either branching around or falling through to load the argument 'false' to return. The second example just evaluates a result and returns it.

Re "can I write [...]?": The answer you want is "no, not unless you replace the 'and' keywords with commas". It is valid syntax, but it does not do what you want; only a1 can have a non-nil value.
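
The difference between the two forms can be seen in a short sketch. The table p and the function f here are hypothetical stand-ins for the ones in the question.

```lua
local p = { "b" }
local function f() return "b" end  -- hypothetical stand-in for f() in the question

-- One expression on the right: only a1 receives it; b1, c1 and fx are nil.
local a1, b1, c1, fx = (p[1] == "a") and (p[1] == "b") and (p[1] == "c") and (f() == p[1])

-- Commas: each name receives its own value.
local a2, b2, c2, fx2 = p[1] == "a", p[1] == "b", p[1] == "c", f() == p[1]

print(a1, b1, c1, fx)    -- false  nil   nil    nil
print(a2, b2, c2, fx2)   -- false  true  false  true
```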

Tip: Only if a complex expression is evaluated more than once is it likely suitable for a local alias assignment. This includes loops where there can be more than one iteration.

So I'd use a local p1 = p[1] in B3 and adjust the return line to use it.
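
As a sketch, B3 with that change might look like this. The call counter and the body of f() are invented so that the short-cut evaluation can be observed.

```lua
local calls = 0
local function f()  -- hypothetical stand-in for the thread's f()
  calls = calls + 1
  return "a"
end

local function b(p)
  local p1 = p[1]   -- one table lookup instead of up to four
  return (p1 == "a" or p1 == "b" or p1 == "c") and f() == p1
end

local r_match = b({ "a" })  -- passes the first test, so f() is called
local r_other = b({ "b" })  -- passes the second test, f() is called, "a" ~= "b"
local r_skip  = b({ "z" })  -- fails all three tests, so f() is skipped

print(r_match, r_other, r_skip, calls)
```

After the three calls, f() has run only twice: the last call returns false without ever reaching it.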

Now that I've said the above: This is based on my speculation and examination of compiled C/C++, not Lua (other than the VM code in examples on web pages like the link above). Actual testing might say otherwise. In the end, the time spent evaluating all this example code is a minuscule speck against the boulders of nested function calls, loops, global references, table management and dynamically-parsed scripts.

(3) I'm not really understanding the "from their argument places" part of the question. I would not remove "actor, npc" from the "function test(actor, npc, p)" part, because xr_logic.script's pick_section_from_condlist() will always pass all three values on the stack (here, CPU registers where possible) to all the xr_conditions.script functions (some of which depend on actor and/or npc), even if any or all of the parameters are nil. So the table p is always the third argument. Removing the unused arguments from the formal parameter list for one function would require that you either do it for all of them or test for a specific exception, which defeats the purpose.

You can remove any unused references inside the function body, though.
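
A small sketch shows why the position matters. call_condition below is a hypothetical stand-in for a caller that always passes three positional arguments; it is not the actual pick_section_from_condlist() code, and the two condition functions are invented for the example.

```lua
-- Hypothetical caller that always passes three positional arguments.
local function call_condition(fn, actor, npc, p)
  return fn(actor, npc, p)
end

-- Keeps the full parameter list, even though actor and npc are unused.
local function has_param(actor, npc, p)
  return p ~= nil and p[1] ~= nil
end

-- Parameter list trimmed to (p): p now receives the first argument, the actor.
local function has_param_trimmed(p)
  return p ~= nil and p[1] ~= nil
end

local actor, npc = { name = "actor" }, { name = "npc" }
local ok      = call_condition(has_param, actor, npc, { "x" })
local shifted = call_condition(has_param_trimmed, actor, npc, { "x" })

print(ok, shifted)
```

The trimmed version silently looks at the actor object instead of the parameter table, so it gives the wrong answer without raising an error.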

(4) Which is faster? At run time, neither. (Unrelated: Those functions return nil if the first test matches, but that is still considered the logical equivalent of 'false' -- except that any direct comparison of the 'nil' result with 'false' is 'false'.)
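
The nil-versus-false point can be seen in a short sketch based on C2. The values given to a, b, x and y are assumptions chosen so that the early return is taken.

```lua
local a, b, x, y = nil, true, 1, 1  -- hypothetical values; a is nil on purpose

local function xyz()
  if not a or not b then
    return  -- returns nothing, which the caller sees as nil
  end
  return x == y
end

local result = xyz()
local treated_as_false = not result        -- nil is falsy in a condition
local equals_false     = (result == false) -- but nil is not equal to false

print(result, treated_as_false, equals_false)
```

So a caller that writes "if xyz() then" behaves as expected, while one that writes "if xyz() == false then" does not.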

(5) D2 is faster, for the same reasons as (2). I'd drop the condition declaration and just return (x() == y()) or z().

Maybe some of this will make sense to you. Others may have useful answers; I hope they will contribute.
  14:27:31  23 August 2013
Decane
Senior Resident
 

 
On forum: 04/04/2007
Messages: 1696
NatVac, thank you. Your replies are always so informative that writing a simple "thank you" seems somehow deficient, and leaves me feeling a little bit guilty that I did not put in more effort to finding these things out on my own.

Be that as it may, you have removed many a layer of confusion from me, even if a few of the peripheral concepts you referred to in the process (e.g. "stack", "CPU register") are foreign to me (for now). So, "спасибо"!
  12:32:26  9 August 2014
Decane
Senior Resident
 

 
On forum: 04/04/2007
 

Message edited by:
Decane
08/11/2014 3:38:37
Messages: 1696
Thread resurrection

I've been reading about Lua optimization from various sources lately, but there are a couple of things I am still unclear about:

(1) Under what circumstances, and to what scope (i.e. function-scope or file-scope) should one localize the pairs function? Suppose e.g. that we have a file like the CS game_relations.script in which more than half of the used functions feature at least one pairs call, and one function even features a nested call (i.e. pairs within pairs). With my current understanding, I would guess that in this case, pairs should be localized at file-scope and then localized again separately in the one function (set_level_faction_community for those who have the file) where there is a nested instance of it. I think this because I think it would be a waste of memory to localize pairs inside a function if it is used just once, but not so if it is used more than once.

(2) Suppose we have a block like this:

	local goodwill = 0
	if new_goodwill == "enemy" then
		goodwill = -8000
	elseif new_goodwill == "friend" then
		goodwill = 12000
	end


I've currently got this 'optimized' as:

local goodwill = (new_goodwill == "enemy" and -8000) or (new_goodwill == "friend" and 12000) or 0


But is the second style of code always (theoretically) faster than the first? Are there any instances where, on purely performance grounds, an if-then-else block would be preferred to the second style of code? (I don't know the proper name for this style, so I keep referring to it as 'the second style'.)

(3) I read here (http://springrts.com/wiki/Lua_Performance) that when inserting a function as a parameter/argument to another function, that inputted function should be localized. But in S.T.A.L.K.E.R., there are plenty of instances where a function within file-scope is left unlocalized and is taken as an argument by another function within a function in that same file-scope. For instance, in death_manager.script, there is the global function keep_item, which is taken as an argument by iterate_inventory within the function create_release_item:

self.npc:iterate_inventory(keep_item, self.npc)


What should be done about cases like this? Localize the inputted function itself? Or bind it to a local variable within the function in which it appears, if it appears more than once therein?

(4) Is it performance-enhancing to localize 'self.object' in functions where it appears more than once? E.g. xr_motivator.script is filled with such references. Is 'self.object' treated as a local variable automatically (i.e. is it an implicit parameter of a class function even where the function has no explicit parameters at all, as in the case of e.g. motivator_binder:clear_callbacks)? And the same question applies to 'self.npc' where that is relevant.

(5) In xr_effects/xr_conditions, most or all functions take 'p' as an argument. Now, 'p' is treated as a table, so in the relevant functions, we have references to p[1], p[2], etc. I take it that in these cases, it is good to localize just as we do normally? I.e.:


function stop_cam_effector(actor, npc, p)
	if p then
		if type(p[1]) == "number" then
			if p[1] > 0 then
				level.remove_cam_effector(p[1])
			end
		end
	end
end


... becomes:

function stop_cam_effector(actor, npc, p)
	if p then
		local p1 = p[1]
		if type(p1) == "number" then
			if p1 > 0 then
				level.remove_cam_effector(p1)
			end
		end
	end
end


(I am expecting here that all the checks in the function pass most of the time.)

(6) In some of GSC's scripts, and in some mod scripts, I have seen people localize the key and value .. erm .. variables? .. of the pairs function, like this:

function xxx(actor, npc, p)
	local i, v = 0, 0 -- localized
	for i, v in pairs (p) do
		if v == "a" then
			-- something
		elseif v == "b" then
			-- something
		end
	end
end


What is the point of this?

EDIT:

(7) In the link referenced in (3), it also says (in test 2) that "it isn't faster to localize a class method IN the function call." But surely what is meant here is the for iterator function, and not whatever function in whose scope it falls? Or is it better to localize e.g. math.random at file-scope rather than function-scope? (I don't think it is, but some clarity here would be welcome.)

EDIT 2:

(8) I read here (http://lua-users.org/wiki/OptimisationCodingTips) that pairs is faster than a for-loop for tables wherein the element order doesn't matter. But it was my understanding that for-loops are preferable to pairs wherever their use is possible. So which is it? E.g. in death_manager.script, under function init_drop_settings, there is the table community_list which by default is iterated over with pairs. Should it be replaced by a for i = 1, #community_list -loop instead or not?

EDIT 3:

And finally, a special note for NatVac if/when you ever read this. If you thought SoC's gulag_general.script was poorly optimized with that nice concatenated string declaration at the beginning, check out the CS version of it: http://pastebin.com/Fhvfr8Pu. I'm going to try using the table.concat method to tame that thing since doing it the way you did it in the ZRP crashes the game with an out-of-memory error for the CS version.
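
For reference, the table.concat approach mentioned above can be sketched as follows. The section names and values are placeholders, not the contents of the actual gulag_general.script.

```lua
-- Hypothetical fragments; the real script's contents are not reproduced here.
local parts = {}
for i = 1, 3 do
  parts[#parts + 1] = "[section_" .. i .. "]\n"
  parts[#parts + 1] = "value = " .. i .. "\n"
end

-- Join all fragments in one pass instead of growing one string with "..".
local ltx = table.concat(parts)
print(ltx)
```

Growing a single string with repeated ".." creates a new intermediate string at every step, whereas collecting the pieces in a table and joining them once avoids most of that garbage.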
 
All short dates are in Month-Day-Year format.


 
