Script optimization question

  14:00:32  25 December 2014
Decane
Senior Resident
 

 
On forum: 04/04/2007
Messages: 1701
What I ended up doing with the CS gulag_general was to collect all the strings generated by load_job() into a table without manipulating them in the interim, and to then use table.concat() to concatenate them just once when load_ltx() is called. So essentially, something like this:
local ltx

function load_job()
	-- create a table to hold some strings; reference it with the file-local var:
	ltx = {}
	...
	-- fill the table:
	ltx[1] = some_string
	ltx[2] = some_other_string
	ltx[3] = yet_some_other_string
	...
end

function load_ltx()
	local dyn_ltx = (type(ltx) == "table" and table.concat(ltx)) or nil
	-- destroy the reference to the string table so the garbage collector can delete it:
	ltx = nil
	-- return the table-concatenated string if load_job() was executed, else return nil:
	return dyn_ltx
end

Still not as optimized as your solution, Alundaio, but a significant improvement over vanilla nevertheless.
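
To illustrate why collecting the pieces and joining them once helps, here is a minimal standalone sketch. It is not the actual gulag_general code: the helper names and the sample strings are made up for the example. Both functions produce the same result, but the first creates a new intermediate string on every iteration, while the second joins everything in a single call.

```lua
-- Hypothetical helpers for illustration; not taken from gulag_general.script.

-- Builds the result with repeated "..", creating a new intermediate
-- string on every iteration.
local function build_with_concat_operator(pieces)
	local s = ""
	for i = 1, #pieces do
		s = s .. pieces[i]
	end
	return s
end

-- Collects the pieces untouched and joins them once at the end.
local function build_with_table_concat(pieces)
	local buf = {}
	for i = 1, #pieces do
		buf[#buf + 1] = pieces[i]
	end
	return table.concat(buf)
end

-- Sample data invented for the example.
local pieces = { "[logic]\n", "active = walker\n", "[walker]\n" }
local joined = build_with_table_concat(pieces)
```

With only three pieces the difference is negligible; it should only start to matter when thousands of fragments are involved, as in the generated gulag job strings.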
  07:40:30  25 December 2014
Alundaio
Sad Clown
(Resident)

 

 
On forum: 04/05/2010
 

Message edited by:
Alundaio
12/25/2014 8:20:15
Messages: 2230
I would be more concerned about memory usage than speed, because Stalker is a 32-bit application. Avoiding bad practices alone is enough optimization for speed.

If you want to optimize Stalker for better RAM usage you need to reuse tables and avoid unnecessary string manipulation. This also has the side effect of boosting speed performance because there is less for the Garbage Collector to clean up.
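
As a rough sketch of what reusing a table can look like (the names scratch, clear and collect_above are hypothetical, not from any game script), a module-level scratch table is emptied and refilled on each call instead of a new table being allocated every time:

```lua
-- Hypothetical example: one scratch table reused across calls.
local scratch = {}

-- Empties a table in place; clearing fields during a pairs traversal is allowed.
local function clear(t)
	for k in pairs(t) do
		t[k] = nil
	end
end

-- Collects the values of `list` that exceed `limit` into the shared
-- scratch table and returns that table plus the number of matches.
local function collect_above(list, limit)
	clear(scratch)
	local n = 0
	for i = 1, #list do
		local v = list[i]
		if v > limit then
			n = n + 1
			scratch[n] = v
		end
	end
	return scratch, n
end

local result, count = collect_above({ 5, 12, 7, 30 }, 6)
```

The trade-off is that the caller must not hold on to the returned table across calls, since the next call overwrites its contents.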

What the engine needs is better FS functionality or at least a C implementation to create lists through Lua. IsStalker, IsMonster, etc. would be better if they were implemented engine-side.

Another thing is that all these configuration files are stored in memory as the system ini, and then scripts reorganize this data again into Lua tables. If you are not manipulating this information, it might be better just to read it directly with the FS ini methods every time you need it.

It would be interesting to profile whether storing a bool value in a table and reading it back is really that much faster than just using system_ini():r_bool() every time you need it. The only reason to store information that already exists elsewhere in memory should be that you have done heavy manipulation to it, like parsing a list from a string or something.
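
A profiling harness for this could look roughly like the sketch below. It runs in plain Lua, so read_bool is only a stand-in for system_ini():r_bool() (which exists only inside the game), and the section and key names are invented. The timings it produces say nothing about the real engine call; the point is the shape of the comparison.

```lua
-- Stand-in for system_ini():r_bool(section, key); counts how often it is hit.
local reads = 0
local function read_bool(section, key)
	reads = reads + 1
	return section == "actor" and key == "can_sprint"
end

-- Lazy cache: each section/key pair is read at most once.
local cache = {}
local function cached_bool(section, key)
	local sec = cache[section]
	if sec == nil then
		sec = {}
		cache[section] = sec
	end
	local v = sec[key]
	if v == nil then -- a cached false is not nil, so it is not re-read
		v = read_bool(section, key)
		sec[key] = v
	end
	return v
end

-- Calls fn(...) n times and returns the elapsed CPU time in seconds.
local function time_it(n, fn, ...)
	local t0 = os.clock()
	for _ = 1, n do
		fn(...)
	end
	return os.clock() - t0
end

local direct_time = time_it(100000, read_bool, "actor", "can_sprint")
local cached_time = time_it(100000, cached_bool, "actor", "can_sprint")
```

In the game, read_bool would be replaced by the real ini call, and the two timings would then show whether the extra table is worth its memory.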

Pretty much every scheme and script binder is guilty of such a thing.



---QUOTATION---
And finally, a special note for NatVac if/when you ever read this. If you thought SoC's gulag_general.script was poorly optimized with that nice concatenated string declaration at the beginning, check out the CS version of it: http://pastebin.com/Fhvfr8Pu. I'm going to try using the table.concat method to tame that thing since doing it the way you did it in the ZRP crashes the game with an out-of-memory error for the CS version.

---END QUOTATION---



GSC's gulag general is the devil. It's a piece of crap. Literally thousands of string concatenations. The CoP one is even worse; it's the reason loading times are much longer in CoP than in the other games and why your RAM usage skyrockets on load. I save that stuff to file so that it doesn't have to recreate the dynamic gulag every time you load the game. That information is read-only anyway, and never changes. Only a modder would need to clear out this 'cache' to see new gulag changes. In my mod, the game on medium settings loads within a few seconds on subsequent loads (after the first load of a fresh application start) because of it.
  09:32:48  11 August 2014
NatVac
Senior Resident
 

 
On forum: 06/15/2007
Messages: 4282

---QUOTATION---
(1) Under what circumstances, and to what scope (i.e. function-scope or file-scope) should one localize the pairs function? Suppose e.g. that we have a file like the CS game_relations.script in which more than half of the used functions feature at least one pairs call, and one function even features a nested call (i.e. pairs within pairs). With my current understanding, I would guess that in this case, pairs should be localized at file-scope and then localized again separately in the one function (set_level_faction_community for those who have the file) where there is a nested instance of it. I think this because I think it would be a waste of memory to localize pairs inside a function if it is used just once, but not so if it is used more than once.
---END QUOTATION---


(1) I have no solid idea. You could set up a 10000-iteration test loop to see which is faster.
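
Such a test loop might be sketched as follows, in plain Lua outside the game (the table contents and function names are made up). Note that pairs itself is called only once per for statement, not once per iteration, so any difference between the two variants is expected to be small.

```lua
-- Sample table invented for the test.
local t = {}
for i = 1, 100 do
	t[i] = i
end

-- Uses the global pairs.
local function sum_global_pairs()
	local s = 0
	for _, v in pairs(t) do
		s = s + v
	end
	return s
end

-- Uses a file-scope local alias of pairs.
local lpairs = pairs
local function sum_local_pairs()
	local s = 0
	for _, v in lpairs(t) do
		s = s + v
	end
	return s
end

-- Calls fn n times and returns the elapsed CPU time in seconds.
local function time_it(n, fn)
	local t0 = os.clock()
	for _ = 1, n do
		fn()
	end
	return os.clock() - t0
end

local global_time = time_it(10000, sum_global_pairs)
local local_time = time_it(10000, sum_local_pairs)
```

Printing global_time and local_time after a few runs gives a rough idea; results will vary between plain Lua and the game's interpreter.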

Personally, I'd not bother. Lua is designed for gaming. The pairs function is likely quite optimized, like certain keywords and symbols so that it directly invokes the function via VM code as if it were local. That would make the choice a toss-up, but then there's the extra local declaration and assignment...

As far as nesting is concerned: If it is local at function scope, it's local in the nested loops as well. That should apply to functions. But any locals defined/assigned inside a loop can have unintended impact unless you know how they are used; see section 2.6 in the Lua 5.1 reference.
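
A small sketch of the nesting point, using a hypothetical count_pairs function: one local alias declared at function scope is visible in both the outer and the inner loop, so nothing needs to be localized a second time for the nested call.

```lua
-- Counts all entries of all inner tables.
local function count_pairs(outer)
	local pairs = pairs -- one local alias at function scope
	local total = 0
	for _, inner in pairs(outer) do
		for _ in pairs(inner) do -- the same local alias is visible here
			total = total + 1
		end
	end
	return total
end

local n = count_pairs({ { 1, 2 }, { 3 }, {} })
```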

---QUOTATION---
I've currently got this 'optimized' as:

local goodwill = (new_goodwill == "enemy" and -8000) or (new_goodwill == "friend" and 12000) or 0


But is the second style of code always (theoretically) faster than the first?
---END QUOTATION---


(2) The second approach might run at the same execution speed, or maybe even a bit slower. The "and -8000" might involve a check for type-sensitive logical 'true' before the answer is returned. This is probably buried somewhere in one of the tips pages online. If it is that important to you, try timing both methods in a large loop.
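
Speed aside, the two styles are only interchangeable when the selected values are neither false nor nil. The sketch below wraps both versions in hypothetical functions to compare them, and adds a third function showing where the and/or form silently gives a different answer.

```lua
-- The if/elseif version from the question, wrapped in a function.
local function goodwill_if(new_goodwill)
	local goodwill = 0
	if new_goodwill == "enemy" then
		goodwill = -8000
	elseif new_goodwill == "friend" then
		goodwill = 12000
	end
	return goodwill
end

-- The and/or version; safe here because -8000 and 12000 are both truthy.
local function goodwill_andor(new_goodwill)
	return (new_goodwill == "enemy" and -8000) or (new_goodwill == "friend" and 12000) or 0
end

-- Pitfall: if the selected value is false (or nil), the expression
-- falls through to the fallback even though the condition held.
local function pick_andor(cond)
	return cond and false or "fallback"
end
```

Here pick_andor(true) returns "fallback" rather than false, which an if-then-else block would not do.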

---QUOTATION---
(3) I read here (http://springrts.com/wiki/Lua_Performance) that when inserting a function as a parameter/argument to another function, that inputted function should be localized. But in S.T.A.L.K.E.R., there are plenty of instances where a function within file-scope is left unlocalized and is taken as an argument by another function within a function in that same file-scope. For instance, in death_manager.script, there is the global function keep_item, which is taken as an argument by iterate_inventory within the function create_release_item:

self.npc:iterate_inventory(keep_item, self.npc)


What should be done about cases like this? Localize the inputted function itself? Or bind it to a local variable within the function in which it appears, if it appears more than once therein?
---END QUOTATION---


(3) One of the problems with function localization, as I very vaguely recall without looking it up online, is the danger of upvalue loss in cases like recursion. While the processing is correctly handled by the interpreter, the JIT optimization is lost. Again, for your specific use, you could test it in a loop. But your example might not be the best one, as the iterate_inventory function is called only once, and its arguments are evaluated only once, so the implicit death_manager.keep_item indexed address is just a simple address within the iterate_inventory() function for however many items are processed.
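
The "evaluated only once" point can be checked in plain Lua with a stand-in iterator. In this sketch, iterate is a hypothetical replacement for the engine's iterate_inventory, and the item names are invented. The callback deliberately clears its own global name on the first call; the iteration still completes, because the callee holds the function in an ordinary local parameter.

```lua
-- Hypothetical stand-in for an engine iterator such as iterate_inventory.
local function iterate(items, callback, owner)
	for i = 1, #items do
		callback(owner, items[i])
	end
end

local kept = {}

-- Global function, as in the original script.
function keep_item(owner, item)
	kept[#kept + 1] = owner .. ":" .. item
	-- Clearing the global mid-iteration does not affect the running call:
	-- the name was looked up once, when the arguments were evaluated.
	keep_item = nil
end

iterate({ "bread", "medkit" }, keep_item, "npc_1")
```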

---QUOTATION---
(4) Is it performance-enhancing to localize 'self.object' in functions where it appears more than once? E.g. xr_motivator.script is filled with such references. Is 'self.object' treated as a local variable automatically (i.e. is it an implicit parameter of a class function even where the function has no explicit parameters at all, as in the case of e.g. motivator_binder:clear_callbacks)? And the same question applies to 'self.npc' where that is relevant.
---END QUOTATION---


(4) Class functions are not executed except as instances with the colon operator. All colon operations have an implicit 'self' that is the combination of the structure members the class has plus the virtual table of member function references. You could execute object:name() as object.name(object); internal to the name() member function is the self variable assigned to the implicit object which is always the first although hidden parameter (figuratively speaking; the object 'knows' itself already -- the parameter doesn't need to be passed).

But whether it is passed or calculated, it resolves to the object table, so "self.object" is an index. I don't see this as optimized because the member function does not know at parse time what the instance will be at run time.

Therefore I've been localizing "self." into "self_" as I process through scripts for cases involving multiple uses of a member variable. Quite a few have already been changed in ZRP 1.07 R5EE.
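
A sketch of that kind of change, using a plain-table class rather than the game's luabind classes (Binder, new_binder and the object fields are hypothetical). The first method indexes self.object on every use; the second indexes it once and reuses the local.

```lua
-- Hypothetical binder-like class built from plain tables.
local Binder = {}
Binder.__index = Binder

local function new_binder(object)
	return setmetatable({ object = object }, Binder)
end

-- Indexes self.object on every use.
function Binder:describe_indexed()
	return self.object.name .. " (" .. self.object.kind .. ") id=" .. self.object.id
end

-- Indexes self.object once and reuses the local.
function Binder:describe_localized()
	local self_object = self.object
	return self_object.name .. " (" .. self_object.kind .. ") id=" .. self_object.id
end

local b = new_binder({ name = "wolf", kind = "stalker", id = 7 })
```

Calling b:describe_localized() is the same as calling b.describe_localized(b); in both cases self is an ordinary first parameter, while self.object remains a table lookup.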

---QUOTATION---
(5) In xr_effects/xr_conditions, most or all functions take 'p' as an argument. Now, 'p' is treated as a table, so in the relevant functions, we have references to p[1], p[2], etc. I take it that in these cases, it is good to localize just as we do normally?
---END QUOTATION---


(5) Yes.

---QUOTATION---
(6) In some of GSC's scripts, and in some mod scripts, I have seen people localize the key and value .. erm .. variables? .. of the pairs function, like this:

function xxx(actor, npc, p)
	local i, v = 0, 0 -- localized
	for i, v in pairs (p) do
		if v == "a" then
			-- something
		elseif v == "b" then
			-- something
		end
	end
end


What is the point of this?
---END QUOTATION---


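(6) It does nothing useful in the example. The intent was presumably to preserve the values of the loop variables at loop exit, but in Lua the control variables of a for loop are always new locals scoped to the loop, so the outer i and v are left untouched.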
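
A quick check of what the extra declaration actually does under standard Lua 5.1 scoping, where the control variables of a for loop are local to the loop. The function names here are hypothetical. The second function shows one way to carry a value out of the loop.

```lua
-- The outer i, v are shadowed inside the loop and never modified by it.
local function last_seen(p)
	local i, v = 0, 0 -- outer locals
	for i, v in pairs(p) do -- these i, v are new locals scoped to the loop
		-- loop body
	end
	return i, v -- still the outer values
end

local i_after, v_after = last_seen({ "a", "b" })

-- To keep a value past the loop, assign it to a separately named outer local.
local function find_key(p, wanted)
	local found
	for k, v in pairs(p) do
		if v == wanted then
			found = k
			break
		end
	end
	return found
end
```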

---QUOTATION---
(7) In the link referenced in (3), it also says (in test 2) that "it isn't faster to localize a class method IN the function call." But surely what is meant here is the for iterator function, and not whatever function in whose scope it falls? Or is it better to localize e.g. math.random at file-scope rather than function-scope? (I don't think it is, but some clarity here would be welcome.)
---END QUOTATION---


(7) I'll have to see the site to know the specifics. But I wonder if Lua 5.1 reference's section 2.6 has something to do with this.

---QUOTATION---
(8) I read here (http://lua-users.org/wiki/OptimisationCodingTips) that pairs is faster than a for-loop for tables wherein the element order doesn't matter. But it was my understanding that for-loops are preferable to pairs wherever their use is possible. So which is it? E.g. in death_manager.script, under function init_drop_settings, there is the table community_list which by default is iterated over with pairs. Should it be replaced by a for i = 1, #community_list -loop instead or not?
---END QUOTATION---


(8) Pairs is faster for an "unordered" list because it is just a linear sequential fetch. The ordering is first item, next item, next item, etc. Pairs just returns an iterator function, next(). (I may be confused because it is late for me and the for loop is still used for pairs.) Meanwhile a for loop that works on a table without pairs might use an index in the for statement or in the block that could be anything: number, string, even function. This requires additional processing. E.g., sorted hash compares take more time than "Next!". If there's no index processing, it could be faster than with pairs, save for the extra loop overhead like the index increment.

Using your example from death_manager.script: Instead of

item_by_community[v] = {}

you'd have

item_by_community[community_list[i]] = {}

inside the for loop. That involves indexing into community_list each time to get an index. Grabbing the next value, whatever it is? Much faster.
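
For comparison, the three loop forms side by side, as a sketch with made-up community names rather than the real contents of community_list. All three build the same result for a plain list; only the numeric for and ipairs guarantee the order 1, 2, 3.

```lua
-- Sample list invented for the example.
local community_list = { "stalker", "bandit", "dolg" }

local by_pairs, by_index, by_ipairs = {}, {}, {}

-- pairs: the value arrives directly, order not guaranteed.
for _, v in pairs(community_list) do
	by_pairs[v] = {}
end

-- Numeric for: ordered, but needs an extra index into community_list.
for i = 1, #community_list do
	by_index[community_list[i]] = {}
end

-- ipairs: ordered, and the value arrives directly.
for _, v in ipairs(community_list) do
	by_ipairs[v] = {}
end
```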

---QUOTATION---
If you thought SoC's gulag_general.script was poorly optimized with that nice concatenated string declaration at the beginning, check out the CS version of it: http://pastebin.com/Fhvfr8Pu. I'm going to try using the table.concat method to tame that thing since doing it the way you did it in the ZRP crashes the game with an out-of-memory error for the CS version.
---END QUOTATION---


You could make it an external file and just read it. That might be faster than table.concat() once it is in memory.
  12:32:26  9 August 2014
Decane
Senior Resident
 

 
On forum: 04/04/2007
 

Message edited by:
Decane
08/11/2014 3:38:37
Messages: 1701
Thread resurrection

I've been reading about Lua optimization from various sources lately, but there are a couple of things I am still unclear about:

(1) Under what circumstances, and to what scope (i.e. function-scope or file-scope) should one localize the pairs function? Suppose e.g. that we have a file like the CS game_relations.script in which more than half of the used functions feature at least one pairs call, and one function even features a nested call (i.e. pairs within pairs). With my current understanding, I would guess that in this case, pairs should be localized at file-scope and then localized again separately in the one function (set_level_faction_community for those who have the file) where there is a nested instance of it. I think this because I think it would be a waste of memory to localize pairs inside a function if it is used just once, but not so if it is used more than once.

(2) Suppose we have a block like this:

	local goodwill = 0
	if new_goodwill == "enemy" then
		goodwill = -8000
	elseif new_goodwill == "friend" then
		goodwill = 12000
	end


I've currently got this 'optimized' as:

local goodwill = (new_goodwill == "enemy" and -8000) or (new_goodwill == "friend" and 12000) or 0


But is the second style of code always (theoretically) faster than the first? Are there any instances where, on purely performance-grounds, an if-then-else block would be preferred to the second style of code (I don't know the proper name for this style so I keep referring to it as 'the second style'.)

(3) I read here (http://springrts.com/wiki/Lua_Performance) that when inserting a function as a parameter/argument to another function, that inputted function should be localized. But in S.T.A.L.K.E.R., there are plenty of instances where a function within file-scope is left unlocalized and is taken as an argument by another function within a function in that same file-scope. For instance, in death_manager.script, there is the global function keep_item, which is taken as an argument by iterate_inventory within the function create_release_item:

self.npc:iterate_inventory(keep_item, self.npc)


What should be done about cases like this? Localize the inputted function itself? Or bind it to a local variable within the function in which it appears, if it appears more than once therein?

(4) Is it performance-enhancing to localize 'self.object' in functions where it appears more than once? E.g. xr_motivator.script is filled with such references. Is 'self.object' treated as a local variable automatically (i.e. is it an implicit parameter of a class function even where the function has no explicit parameters at all, as in the case of e.g. motivator_binder:clear_callbacks)? And the same question applies to 'self.npc' where that is relevant.

(5) In xr_effects/xr_conditions, most or all functions take 'p' as an argument. Now, 'p' is treated as a table, so in the relevant functions, we have references to p[1], p[2], etc. I take it that in these cases, it is good to localize just as we do normally? I.e.:


function stop_cam_effector(actor, npc, p)
	if p then
		if type(p[1]) == "number" then
			if p[1] > 0 then
				level.remove_cam_effector(p[1])
			end
		end
	end
end


... becomes:

function stop_cam_effector(actor, npc, p)
	if p then
		local p1 = p[1]
		if type(p1) == "number" then
			if p1 > 0 then
				level.remove_cam_effector(p1)
			end
		end
	end
end


(I am expecting here that all the checks in the function pass most of the time.)

(6) In some of GSC's scripts, and in some mod scripts, I have seen people localize the key and value .. erm .. variables? .. of the pairs function, like this:

function xxx(actor, npc, p)
	local i, v = 0, 0 -- localized
	for i, v in pairs (p) do
		if v == "a" then
			-- something
		elseif v == "b" then
			-- something
		end
	end
end


What is the point of this?

EDIT:

(7) In the link referenced in (3), it also says (in test 2) that "it isn't faster to localize a class method IN the function call." But surely what is meant here is the for iterator function, and not whatever function in whose scope it falls? Or is it better to localize e.g. math.random at file-scope rather than function-scope? (I don't think it is, but some clarity here would be welcome.)

EDIT 2:

(8) I read here (http://lua-users.org/wiki/OptimisationCodingTips) that pairs is faster than a for-loop for tables wherein the element order doesn't matter. But it was my understanding that for-loops are preferable to pairs wherever their use is possible. So which is it? E.g. in death_manager.script, under function init_drop_settings, there is the table community_list which by default is iterated over with pairs. Should it be replaced by a for i = 1, #community_list -loop instead or not?

EDIT 3:

And finally, a special note for NatVac if/when you ever read this. If you thought SoC's gulag_general.script was poorly optimized with that nice concatenated string declaration at the beginning, check out the CS version of it: http://pastebin.com/Fhvfr8Pu. I'm going to try using the table.concat method to tame that thing since doing it the way you did it in the ZRP crashes the game with an out-of-memory error for the CS version.
  14:27:31  23 August 2013
Decane
Senior Resident
 

 
On forum: 04/04/2007
Messages: 1701
NatVac, thank you. Your replies are always so informative that writing a simple "thank you" seems somehow deficient, and leaves me feeling a little bit guilty that I did not put in more effort to finding these things out on my own.

Be that as it may, you have removed many a layer of confusion from me, even if a few of the peripheral concepts you referred to in the process (e.g. "stack", "CPU register") are foreign to me (for now). So, "спасибо"!
  09:33:56  23 August 2013
NatVac
Senior Resident
 

 
On forum: 06/15/2007
Messages: 4282
I don't know any really helpful answers for optimization that can't be better stated by the Lua users group (that you originally linked); some links on page 2 of this thread. Another page from that site: http://lua-users.org/wiki/OptimisationCodingTips -- see the Table Access VM code for the Lua 4 version (still applies to Lua 5.1, methinks) to see how the syntax structure is converted into VM code, what I call bytecode.

I can suggest general guidelines (generally good) and opinions (generally ... well, they're opinions).

(1) I don't see any string compares there. I suspect you mean that "==" and "~=" are boolean while "and" and "or" are strings. The first set are relational operators and the second set are logical operators, and they are all eaten at parse time to produce similar low-level bytecode.

First, I think the compiler is smart enough so that generated code tests for "not A" in the same amount of time as it tests for "A". No slowdown contribution here.

But relational operators are apparently slower than logical operators in that the type of each argument is reportedly first checked before they are compared with each other. I say "reportedly" because the bytecode for the "<=" test in the assert example linked above is just JMPLE, so the engine would have to be performing the type check. Are they both numbers? "No" means false. If "yes", then are they the same value? Relational comparisons return boolean results.

The logical tests are different. The statement "return a or b" evaluates a, tests it if false or nil and returns a if not so, else it directly evaluates b and returns it without editorial comment. With "return a and b" you get the same initial test but a false or nil results in that being returned, otherwise b is evaluated and returned. Logical comparisons return false, nil or non-false/non-nil results which could be objects, numbers (unlike C/C++, 0 is NOT false here) or boolean true.
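
These rules can be confirmed with a few one-liners in plain Lua. The variable names are arbitrary, and the counter only exists to show that the right-hand side is skipped when short-circuiting applies.

```lua
local r1 = nil or "b"            -- "b": left side is nil, so the right side is returned
local r2 = "a" or "b"            -- "a": left side is truthy, returned as-is
local r3 = false and "b"         -- false: left side returned unchanged
local r4 = nil and "b"           -- nil: left side returned unchanged
local r5 = "a" and "b"           -- "b": left side is truthy, so the right side is returned
local r6 = 0 and "zero is true"  -- "zero is true": 0 is not false in Lua

-- Short-circuit: f is never called when the left side already decides.
local calls = 0
local function f()
	calls = calls + 1
	return true
end
local r7 = false and f()
local r8 = "a" or f()
```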

As for the "if" forms of logical comparisons, I'm confident that an evaluation result sets the status flags, so conditional branching requires no additional comparison after the evaluation of b.

Different rules likely apply when a and/or b are constants.

As far as the (asked but not intended) question "which compare faster: booleans or strings" is concerned: I understand that only numbers are directly accessed and everything else is in a hash table. It might depend on where in that table the boolean object or string object is located during the hash lookup -- unless the booleans are treated as numbers (that's what I'd do) in which case the booleans would be faster.

(2) I'd expect B3 to be fastest at run time. B4 creates locals that are only used once, but their creation is based on the same comparison as used in B3, which by default will use short-cut evaluation and return early, without wasting a call to f(), if any prior test fails. B1 and B2 would be tied for 2nd fastest and B4 is last because f() is always called.

Why are B1 and B2 slower than B3? Only because "if x ~= value then return false end" is slower than "return x == value". In the first example the result is evaluated, tested and either branching around or falling through to load the argument 'false' to return. The second example just evaluates a result and returns it.

Re can I write [...]: While you can write that, the answer you want is "no, not valid syntax unless you replace the 'and' keywords with commas". ...Well, it's valid syntax, but not for what you want; only a1 can have a non-nil value.
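
To make the difference concrete, here is a small sketch with p and f filled in with placeholder values (they are assumptions for the example, not from any script). With "and", the whole right-hand side is a single expression, so only the first variable receives a value; with commas, each variable gets its own comparison.

```lua
-- Placeholder inputs for the example.
local p = { "a" }
local function f()
	return "a"
end

-- One expression on the right: a1 gets its result, the rest are nil.
local a1, b1, c1, fx = (p[1] == "a") and (p[1] == "b") and (p[1] == "c") and (f() == p[1])

-- Four expressions separated by commas: one value per variable.
local a2, b2, c2, fx2 = p[1] == "a", p[1] == "b", p[1] == "c", f() == p[1]
```

Note that the comma form always calls f(), just like the B4 version in the question.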

Tip: Only if a complex expression is evaluated more than once is it likely suitable for a local alias assignment. This includes loops where there can be more than one iteration.

So I'd use a local p1 = p[1] in B3 and adjust the return line to use it.

Now that I've said the above: This is based on my speculation and examination of compiled C/C++, not Lua (other than the VM code in examples on web pages like the link above). Actual testing might say otherwise. In the end, the time spent evaluating all this example code is a minuscule speck against the boulders of nested function calls, loops, global references, table management and dynamically-parsed scripts.

(3) I'm not really understanding the "from their argument places" part of the question. I would not remove "actor, npc" from the "function test(actor, npc, p)" part because xr_logic.script's pick_section_from_condlist() will always pass all three values on the stack (here, CPU registers where possible) to all the xr_condition.script functions (some of which depend on actor and/or npc) even if any or all the parameters are nil. So the table p is the third reference. Removing the unused arguments from the formal parameter list for one would require that you either do it for all or test for a specific exception which defeats the purpose.

You can remove any unused references inside the function body, though.

(4) Which is faster? At run time, neither. (Unrelated: those functions return nil if the first test matches, but that is still considered the logical equivalent of 'false', except that any direct comparison of the 'nil' result with 'false' is 'false'.)
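
The nil-versus-false remark can be checked with a variant of C2. In this sketch a, b, x and y are passed in as parameters, which is a change from the question's version, so that it runs on its own.

```lua
-- Variant of C2 with its inputs passed as parameters.
local function xyz(a, b, x, y)
	if not a or not b then
		return -- returns no value, which the caller sees as nil
	end
	return x == y
end

local r = xyz(nil, true, 1, 1)

local treated_as_false = not r   -- true: nil counts as false in a condition
local equals_false = (r == false) -- false: nil and false are different values
```

So a caller that writes "if xyz(...) then" is unaffected, while one that writes "if xyz(...) == false then" would miss the early-return case.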

(5) D2 is faster, for the same reasons as (2). I'd drop the condition declaration and just return (x() == y()) or z().

Maybe some of this will make sense to you. Others may have useful answers; I hope they will contribute.
  11:41:47  21 August 2013
Decane
Senior Resident
 

 
On forum: 04/04/2007
 

Message edited by:
Decane
08/22/2013 23:49:20
Messages: 1701
More optimization (or just 'good code') questions

(1) Is a boolean compare always preferred to string compares where feasible, even if the boolean compare makes use of lots of "and" statements? For example:

(A1)
function a()
	if not x then
		return false
	end
	if not y then
		return false
	end
	if not z then
		return false
	end
	return something or something_else
end


(A2)
function a()
	return x and y and z and (something or something_else)
end

¯¯¯¯¯¯¯¯¯¯¯¯

(2) How would the following functions be ranked in order of speed, and why? Based on what has already been written in this thread about string compares, I would hypothesize that B3 and B4 both are faster than either of B1 and B2. Based on what has been written about local variables, I would hypothesize that B4 is the fastest of them all. But I really have no clue about B1 vs. B2.

(B1)
function b(p)
	if p[1] == "a" or p[1] == "b" or p[1] == "c" then
		return f() == p[1]
	end
	return false
end


(B2)
function b(p)
	if p[1] ~= "a" then
		if p[1] ~= "b" then
			if p[1] ~= "c" then
				return false
			end
		end
	end
	return f() == p[1]
end


(B3)
function b(p)
	return (p[1] == "a" or p[1] == "b" or p[1] == "c") and f() == p[1]
end


(B4)
function b(p)
	local a1 = p[1] == "a"
	local b1 = p[1] == "b"
	local c1 = p[1] == "c"
	local fx = f() == p[1]
	return (a1 or b1 or c1) and fx
end


Finally, a syntax question. Instead of listing each local boolean on a separate line like I did above, can I write:
local a1, b1, c1, fx = (p[1] == "a") and (p[1] == "b") and (p[1] == "c") and (f() == p[1])

...?

¯¯¯¯¯¯¯¯¯¯¯¯
(3) In xr_conditions.script, almost every function has three arguments (actor, npc, p). But many of the functions do not use more than one, namely p. Would it make any sense to erase the other two unused variables from their argument places, assuming that they are not used?

¯¯¯¯¯¯¯¯¯¯¯¯
(4) Which of these is faster:

(C1)
function xyz()
	if not a then
		return
	end
	if not b then
		return
	end
	return x == y
end


(C2)
function xyz()
	if not a or not b then
		return
	end
	return x == y
end

...?

¯¯¯¯¯¯¯¯¯¯¯¯
(5) And lastly, which of these is faster:

(D1)
function f()
	local a, b, c = x(), y(), z()
	return (a == b) or c
end


(D2)
function f()
	local condition = (x() == y()) or z()
	return condition
end


...?
  09:36:21  20 August 2013
Decane
Senior Resident
 

 
On forum: 04/04/2007
Messages: 1701
Very helpful as always, NatVac. Thank you!
  08:08:23  20 August 2013
NatVac
Senior Resident
 

 
On forum: 06/15/2007
Messages: 4282
Decane, the return statement returns when encountered in the script with the value or values parsed from the argument. In your example, the processing of the first pair in the table would execute the return statement with the boolean value of the compare using the value of that first pair. The remainder of the table processing would be skipped.

You are using a table with just values, equivalent to { [1]=1, [2]=2, [3]=3 }. You might expect the first value to be 1 in your loop. But--

"The order in which the indices are enumerated is not specified even for numeric indices." The docs say use a numerical for or use the ipairs function for this.

As an example of the order issue, look at your inventory from game to game. Certain artifacts will always appear before or after others for the whole game, but this order will vary from game to game. It may be that your table is safe by definition, but tables that are manipulated prior to the loop processing have no guaranteed order.

I'd expect a true value returned for the function with that specific data, but otherwise it's a toss-up.
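
The early exit is easy to confirm by counting iterations. The sketch below adds a visited counter to the function from the question, and also shows a hypothetical all_at_most helper for the case where every element should be checked in a guaranteed order.

```lua
local t = { 1, 2, 3 }
local visited = 0

-- The function from the question, with an iteration counter added.
local function a()
	for k, v in pairs(t) do
		visited = visited + 1
		return 2 >= v -- leaves the function on the first pair visited
	end
end

local result = a()

-- If the intent is "every element is at most limit", test all of them
-- with a numeric for, which visits 1..#list in order.
local function all_at_most(list, limit)
	for i = 1, #list do
		if list[i] > limit then
			return false
		end
	end
	return true
end
```

After the call, visited is 1 regardless of which pair happened to come first.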
  22:09:15  19 August 2013
Decane
Senior Resident
 

 
On forum: 04/04/2007
 

Message edited by:
Decane
08/20/2013 0:04:38
Messages: 1701
Suppose that I have the table and function below:
t = {1, 2, 3}

function a()
	for k, v in pairs (t) do
		return 2 >= v
	end
end

I want to know if the function returns the truth value of the first iteration of the pairs() loop (i.e. "true", because 2 >= 1), or if it iterates through the entire table and consequently returns the truth value of the third and final iteration of the pairs() loop (i.e. "false", because 2 >/= 3).

In general, I would wish to know how a "return" declaration behaves when it takes place inside a key-value pairs() loop. (I realize that this is not strictly an optimization question, but it relates to optimization through its implications.)

EDIT: Simplified the question.
 

All short dates are in Month-Day-Year format.


 

Copyright © 1995-2020 GSC Game World. All rights reserved.