Lag Busting: How to Make Your Server Run More Smoothly

From NWN Lexicon


Note: This article documents Neverwinter Nights: Enhanced Edition new content or changes/updates/fixes to 1.69 functions. These are all listed under the category and patches pages.

This page has been rewritten for NWN:EE. Running a module (in singleplayer or multiplayer) has seen many optimisations, ranging from internal functions being sped up to functions being replaced outright with faster versions.

Introduction

One of the things all server administrators have in common is the desire to keep their servers running as smoothly as possible. They spend a lot of time working on their servers, and they don't want their players' enjoyment of their work interfered with. One of the most common interferences is "lag". Used loosely, lag is a hiccup in game play, caused by any number of things, ranging in duration from barely noticeable to a minute or more. This tutorial is on the things you can do within the toolset to keep lag to a minimum. It will NOT cover topics like defragging your server's hard drive, optimizing your databases, or any other general computing techniques with application beyond the narrow realm of Neverwinter Nights.

Technical terms used in this tutorial

Some technical terms and commands used in this tutorial help in troubleshooting network problems. While it is not within the scope of this tutorial to teach the reader how to use them, a brief definition of each follows.

Ping
Ping is a computer network tool used to test whether a particular host is reachable across an Internet Protocol (IP) based network. Ping measures the round-trip time and records any packet loss. Using the "ping" command is often known as "pinging".
Traceroute
Traceroute is a computer network tool used to determine the route taken by packets across an IP network. The traceroute tool is available on most operating systems. Under Windows the command is "tracert" and is available from the Windows command line. In this tutorial, the term "tracert" will mean the Traceroute command.

Lag - What is it?

More properly, lag refers to a delay in information exchanged between client and server, resulting in a hang which breaks immersion and can cause loss of control of one's character, often with unpleasant consequences. Players, however, tend to use the term "lag" much more broadly, to refer to anything that causes a hang or break in play experience. Decoding their meaning is critical to understanding the problem that they're experiencing, and to fixing it, if indeed there's anything you can do - sometimes there isn't. So, for the remainder of this tutorial, we're going to use the term "lag" in this broader sense, and label actual lag "connection lag". Before we can discuss ways to prevent lag, we need to familiarize ourselves with the various types of issues that can interfere with game play. Below is a rough listing.

Connection Lag
Connection lag is also called network lag. It arises from a problem with the connection between the player's computer and the server. It can have a number of causes, including active programs on the player's computer, the player's router, the server's router, the server's active programs, or something in between. Connection lag can be detected by pinging the server and checking the ping times. A trace route (the tracert command on the Windows command line) will show roughly where the problem lies. Often there will be nothing you can do about this sort of lag, other than assisting the player in troubleshooting their system or waiting for network issues to be resolved. If a ping returns an unusually high time, or the tracert fails at a certain hop, the problem is connection lag.
Graphics Lag
This is one of the most common types of lag, and the one players most often mistake for network lag or for some other problem external to their system. Graphics "lag" occurs when the player's computer is overwhelmed by the graphical data it is getting and fails to render graphics smoothly, resulting in poor frame rate, lockups, and occasionally more exotic issues. It is LARGELY a client-side issue, and the player will need to take steps to fix it, such as changing their graphics settings or getting a different graphics card. There are, however, some things that a server administrator can and should do to prevent this sort of thing, which we will discuss below. If the "lag" a player experiences is intermittent and coincides with times when a lot is happening on their screen, and other players on the server do not experience it at the same time, the problem is likely graphics lag. Graphics lag can often also be detected by having the player hit the tilde (~) key, type "fps", and hit enter while playing, which displays the Frames Per Second they are seeing. The higher the number, the better the framerate; the lower, the choppier ("laggier") things will appear.
Server Lag
This is the final type of "lag", and the one you have the most direct control over. It arises when the server is trying to do too much at once. The game engine begins to run on the hairy edge, and it stops doing certain things, based on priorities in the engine. This often is caused by a lot of players on a server, poorly written scripts, poorly built areas, or some combination of the above. Other times, there may be some technical issue at work, like a crippled game server, an out-of-control process, or insufficient RAM. Server lag is often the trickiest to detect, and can be diagnosed by ruling out both connection and graphics lag. In more extreme cases, however, it is not at all hard to detect, as low-priority processes cease. These include the updating of the game clock, resulting in the game being stuck permanently at a certain time, arresting the day/night cycle. There are other low-priority processes as well. These low priority processes may fail with even a couple players on a well-built server and module, so they are not of much help when diagnosing a problem. Some examples of low priority processes include persistent area of effect heartbeat scripts and spawned-in-placeable heartbeat scripts. These scripts often will not fire, even on a healthy server.

What You can do Within the Toolset to Prevent Lag

This section has been wholly rewritten with 2021 in mind.

How NWN Server Operates and Optimal Hardware

CPU

Due to its age, NWN is a wholly single-threaded application. A server running it will not benefit from more than 2 cores: one for the game, and one for the underlying OS. A faster clock speed will run the server better, although a more modern CPU at a lower clock rate may run better than an ancient CPU at a higher clock rate.

For anything other than playing with a few friends, try not to run anything else on the server; and Linux is recommended since Windows can helpfully spend a lot of CPU running various background and foreground processes, especially updates, sometimes out of your control.

Memory

NWN:EE is compiled as a 64-bit application, so it is now able to utilise more memory. Generally it uses around 4GB to load in resources and run things. Make sure you also leave some overhead for the OS and anything else you are running (NWNX, nwsync etc.).

Network

This game used to run on 28.8k dial up (although possibly not well!). The upstream amount of data needed to run a server, and the download speed required of clients, can vary depending on what is being done but is not a huge amount of data. For instance loading a large area with a lot of placeable objects can make the loading take a while since all that data has to be streamed over (what placeables are where, what tiles are loaded where, etc.).

Distance to the server tends to account for more lag than anything else. The server controls exactly where the player is and what they can do, while the client tries its best to estimate what is going on; high latency delays what the client sees, causing some odd bouncing around as lag compensation kicks in.

Disk

The number of tiny assets in NWN is quite high. A server doesn't need to load models, textures and the like (although it does need to be able to see that the file exists), but read speed, and thus an SSD, can improve module loading (when restarting a server) and speed up dynamic asset loading (e.g. data files of creatures or scripts not commonly loaded). Again, Linux handles this kind of disk access better than Windows.

For writes there are not many - player .BIC files are written when they leave/exports are done, and campaign database files are written to, but the much more optimal sqlite implementation has sped this up immensely.

Profiling and Improving Scripts

The game is single-threaded, so when a script runs for longer than expected it can cause lag: the server cannot keep up a good tick rate (sending/receiving data to players and updating the world state). For instance, at a 60 fps / Hz tick rate each tick has 1000 / 60 = 16.67 milliseconds in which to execute. Several scripts can also run in the same frame, for instance via ExecuteScript, as reaction scripts to a spell being cast (SignalEvent), or from a world event (a player appearing in line of sight of several monsters triggers many OnPerception events at once), and their run times all count against that same budget.
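The per-tick budget arithmetic above can be sketched in a few lines of Python, used here purely as a calculator rather than as NWScript. The function names and the idea of counting "tick slots" are illustrative assumptions, and the 60 Hz rate is the example figure from the text, not a measured server value.

```python
import math


def tick_budget_ms(tick_rate_hz: float) -> float:
    """Milliseconds available to one server tick at the given tick rate."""
    return 1000.0 / tick_rate_hz


def ticks_spanned(script_ms: float, tick_rate_hz: float = 60.0) -> int:
    """Number of tick-sized time slots a single script run occupies.

    Anything above 1 means the run alone overran its tick's budget.
    """
    return max(1, math.ceil(script_ms / tick_budget_ms(tick_rate_hz)))


print(f"{tick_budget_ms(60):.2f} ms per tick at 60 Hz")
print(ticks_spanned(0.13), "slot for a 0.13 ms script run")
print(ticks_spanned(26.5), "slots for a 26.5 ms script run")
```

A script that occupies more than one slot delays everything else the server would have done in those ticks, which is why long single runs matter more than many short ones.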

Once you have got a server running profiling may be a good way to trace what scripts are running, at what frequency, and how long for. Doing too much pre-optimisation now in NWN:EE should be unnecessary, for these reasons:

  • There are optimisations to the speed of GetLocalInt/GetLocalString/GetLocalFloat which speed up access, especially on large amounts of them, massively so.
  • GetObjectByTag has now been sped up with a proper search tree not linear search, so should be fast if constantly used in a larger module.
  • The underlying database functions and new database functions are incredibly fast, some of which run wholly in memory.
  • The "Too Many Instructions" (TMI) limit can now be raised significantly meaning fast-running but instruction-heavy loop scripts do not have to be split across multiple frames.
    • If you do want to split things across multiple frames save where you are in a loop and use DelayCommand(0.0, ExecuteScript()); to run another instance to avoid TMI issues.

The base game scripts are generally not the causes of lag, excepting some odd Horse-based scripts which do things such as try and equip "player hides" on rest, which can TMI and cause a lot of lag.

Script Profiling Example

To enable script profiling, find the nwscript.vm.profiling setting in settings.tml. If it is set to true the server, once started, will track which scripts are run and for how long (in milliseconds). Only enable it when you need it, since there is no point profiling while you are happy with performance. Once the server is shut down the results are output to the client log, for instance:

**************************** Start Script Profiling ****************************
Script Name		Times Run	Total Time
0_enter_pc_plot		60		0
0_modload		1		31
j_ai_onspawn		2		0
j_ai_onuserdef		2		15
0_test_crearray		5		0
j_ai_onheartbeat	8		0
nw_g0_convplac		1		0
NW_G0_Conversat		1		0
0_con_arena		1		0
nw_walk_wp		1		0
0_clearactions		14		0
j_ai_onperceive		3		0
j_ai_oncombatrou	7		142
NW_S0_Haste		1		0
x2_pc_umdcheck		7		0
0_spelldebug		7		16
j_ai_onspellcast	6		0
NW_S0_Invisib		2		0
X2_S2_EpMageArm		1		0
X2_S2_HELLBALL		1		0
X2_S2_EpicWard		1		0
NW_S0_Premo		1		0
**************************** End Script Profiling ******************************

The output is generally very large - you will have hundreds to thousands of scripts being run which is perfectly normal.

Once gathered you can then analyse the results. There are 3 core things to assess:

  • Amount of times run - this can show some events that are unnecessarily being run too often. For instance the OnPerception event could be emptied out in a city where everyone is plot and PvP is off, since walking NPCs can make it trigger hundreds of times a minute.
  • Total time - this can be a good assessment of heavy script usage, especially paired with the average time
  • Average time - a script only run a few dozen times but having a lot of time per run can actually be much worse - and hang/lag the server since it would run over several ticks, bringing the tick rate down stopping world actions/network traffic while it runs. As per above the magical "16.67" number may help assess the worst offenders here.
    • Note since you have not got a "Maximum run time" the average is sometimes deceptive, and a single script could hang for a few seconds (thousands of milliseconds) in some cases but appear to not have a high average time.
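Because the log only lists the run count and total time, the average has to be worked out separately. A minimal Python sketch of that offline analysis follows; it assumes the log layout shown earlier (script name, times run, total milliseconds, separated by whitespace), and `ProfileRow`, `parse_profile` and `over_budget` are hypothetical helper names, not part of NWN or NWNX.

```python
from typing import NamedTuple


class ProfileRow(NamedTuple):
    name: str
    runs: int
    total_ms: int

    @property
    def average_ms(self) -> float:
        """Mean run time; the log itself only gives the count and the total."""
        return self.total_ms / self.runs if self.runs else 0.0


def parse_profile(log_text: str) -> list[ProfileRow]:
    """Collect rows found between the Start/End Script Profiling markers."""
    rows: list[ProfileRow] = []
    inside = False
    for line in log_text.splitlines():
        if "Start Script Profiling" in line:
            inside = True
        elif "End Script Profiling" in line:
            inside = False
        elif inside:
            parts = line.split()
            # The header line splits into more than three words and is skipped.
            if len(parts) == 3 and parts[1].isdigit() and parts[2].isdigit():
                rows.append(ProfileRow(parts[0], int(parts[1]), int(parts[2])))
    return rows


def over_budget(rows: list[ProfileRow], budget_ms: float = 1000 / 60) -> list[ProfileRow]:
    """Rows whose average run exceeds one tick's budget, worst first."""
    slow = [row for row in rows if row.average_ms > budget_ms]
    return sorted(slow, key=lambda row: row.average_ms, reverse=True)


SAMPLE = """\
**************************** Start Script Profiling ****************************
Script Name        Times Run    Total Time
0_modload          1            31
j_ai_onuserdef     2            15
j_ai_oncombatrou   7            142
j_ai_onheartbeat   8            0
**************************** End Script Profiling ******************************
"""

for row in over_budget(parse_profile(SAMPLE)):
    print(f"{row.name}: {row.runs} runs, {row.average_ms:.2f} ms average")
```

The default budget is the 16.67 ms figure for a 60 Hz tick. As noted above, an average can still hide a single very long run, so treat the result as a shortlist to investigate rather than a verdict.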

An example of some key lines from someone's profiling is below. The total time was used as a cutoff, and the list has been cut down to some relevant entries.

Script Name	Times Run	Total Time (ms)	Average Time/script (ms)	Description and Notes
ms_pc_key	1274	514474	403.8257457	This script pulls a player's stored inventory, held in a merchant Store object, from the database. The store is usually also saved by this script. These store objects can get incredibly full - you can have up to 25 pages per tab. Limiting the number of objects, or using a placeable (which has a much smaller inventory) instead, may be a good idea. However you can also possibly just live with it - the number of times it is called is minimal. Another option this particular PW was looking at was using ActivatePortal to move combat to another server instance, where this script wouldn't run at all.
ms_pc_hb	9643	529173	54.87638702	The main reason for the high average time on this heartbeat script is that it checks if the player is in combat and, if so, stores the player storage which ms_pc_key opens.
hc_on_combat_mod	277665	1242721	4.475612699	This was hooking into NWNX events to run some code for particular combat events, such as checking PvP before someone is attacked (NWNX_ON_INPUT_ATTACK_OBJECT_BEFORE). It's actually reasonably optimal, but is called a lot of times due to how actions work in NWN.
player_bosscheck	66	1750	26.51515152	This is only fired 66 times but runs for 26.5 milliseconds on average. A possible one to improve because of the high amount of time per run, since it could cause a bit of stutter with other things going on as well.
j_ai_onheartbeat	7469580	999480	0.13380672	The main creature OnHeartbeat script. It is run more times than any other script in the module, but the average time spent running it is only 0.13 milliseconds - so this is quite optimal. Heartbeats do not fire all at once and are regularly delayed for creatures in non-PC areas.
j_ai_walkwaypoin	1926399	255543	0.132653204	Waypoint walking running Bioware code has a similar average cost to the heartbeat script, and it is run about a quarter as many times.
j_ai_onconversat	1694502	70620	0.041675961	The OnConversation script in this case probably is optimal - the AI creatures like to tell each other what is going on (a sort of "help me I'm under attack"), which fires this script. Note the run time is tiny - a mere 0.04 milliseconds on average. It also fires when the creature is clicked on for normal dialogue.
j_ai_onpercieve	1644818	36431	0.022148955	The OnPerception script looks optimal. It has a cut down time of 0.022 milliseconds/run on average. As you can tell it is fired a lot - every time a creature walks in range of another it fires 2 times - once for being seen and once for being heard (you do wonder why Bioware didn't code the script to fire once in this case!). Creatures walking around in towns can fire this often, and it is sometimes worth removing the script from them.
j_ai_ondamaged	359663	31601	0.087862805	The OnDamaged script seems to be the next most frequently fired AI script, because it fires every time damage is dealt - by EffectDamage, melee or ranged attacks. It is also quite a fast script so shouldn't worry you much.
j_ai_detercombat	207112	244670	1.181341496	The OnCombatRoundEnd script - obviously this can take up a lot of time. 1.18 milliseconds on average is quite a lot, and the actual run time likely varies widely depending on the creature's spells and abilities and the number of enemies in the area to assess.
NW_S0_Summon	1407	5455	3.877043355	Standard spell script for the Summon Monster 1 through 9 spells. This is run a mere 1407 times but accounts for a comparatively large amount of script run time, with an average of about 3.88 milliseconds. This is likely simply because summons created by EffectSummonCreature are themselves basically CreateObject mixed in with some other things, and thus relatively slow, so it is probably nothing to worry about.

Script Improvement Suggestions

So you've perhaps found some slow running scripts. How would you fix them up? Some things to look for:

  • Massive loops - iterating over every object in the area, or potentially every object in the module is now possible. This takes a lot of time in larger modules - if you started small with only a few dozen areas and expanded this may become a bigger problem as a module grows. To fix this consider rewriting the loops to instead be done once and store the object you are searching for as a local object.
  • Large amounts of DelayCommand usage - If you're firing this off a lot, reconsider the usage scenarios. There is a right and a wrong way to use DelayCommand, and the wrong way is to simulate a Heartbeat event. If you are checking whether things are complete after longer periods of time, also consider a timer variant (e.g. storing the future time at which the thing expires or needs to be dealt with) rather than many constant long DelayCommand calls.
  • Heavy Heartbeat usage - Heartbeat scripts can be very efficient such as those which check if something needs to be done using local variables, then does a larger amount of work off that. However having a heartbeat script that does a lot of complex calls every 6 seconds regardless of the actions taken should be reconsidered. Some alternatives:
    • If you are testing distances to a PC (e.g. to call a greeting) a well configured Trigger (if the NPC doesn't move) or EffectAreaOfEffect (for ease of use, or if the creature does move) can run an OnEnter script instead.
    • If you are gathering information to do something, such as searching for all objects of a certain tag, you could do this only every X heartbeats with a quick incremental SetLocalInt counter. This spreads out the load if the heartbeat call is otherwise heavy.
    • You can also consider using PC-specific OnHeartbeat event scripts with SetEventScript which can be more efficient since it linearly scales with each PC and the heartbeat scripts are timed to not fire exactly at the same time.
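Two of the suggestions above, running heavy work only every X heartbeats and storing an expiry time instead of chaining delayed checks, can be modelled in a language-neutral way. The Python sketch below is only a model of the logic: the class names are hypothetical, and in a real module the counter and the expiry time would be local variables on the object (set with functions such as SetLocalInt) rather than Python attributes.

```python
class ThrottledHeartbeat:
    """Runs `heavy_work` only on every `every`-th heartbeat."""

    def __init__(self, every: int, heavy_work) -> None:
        self.every = every
        self.heavy_work = heavy_work
        self.counter = 0  # stands in for a local int stored on the object

    def on_heartbeat(self) -> bool:
        """Cheap path: bump the counter. Returns True when the heavy work ran."""
        self.counter += 1
        if self.counter < self.every:
            return False
        self.counter = 0
        self.heavy_work()
        return True


class ExpiryTimer:
    """Stores when something expires instead of scheduling repeated checks."""

    def __init__(self, now: float, duration: float) -> None:
        self.expires_at = now + duration

    def is_expired(self, now: float) -> bool:
        return now >= self.expires_at


calls = []
hb = ThrottledHeartbeat(every=5, heavy_work=lambda: calls.append("scan"))
fired = [hb.on_heartbeat() for _ in range(10)]
print(fired.count(True), "heavy scans in 10 heartbeats")

timer = ExpiryTimer(now=0.0, duration=600.0)
print(timer.is_expired(now=300.0), timer.is_expired(now=600.0))
```

In both cases the frequent path is reduced to one comparison, and the expensive work happens only when the counter or the stored time says it is due.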

Some scripts are inherently slow even with very little in them - some function calls are heavy on the engine. You can sometimes be a little more efficient for instance using CopyObject instead of CreateObject in some cases, but make sure to test any efficiency savings before and after. Sometimes the source of the issue can be stemmed - for instance player persistent world storage could be more limited so players don't store everything, by having a script limit the amount of items in their storage.

Building Area Considerations

The larger the area, the higher the cost in lag from both a server and a player perspective. The game does deal with large 32x32 areas better than it did in 2002, but smaller areas tend to help keep what is active at a minimum and assist with pathfinding. The number of placeables (static or not) and creatures in an area of course also affects performance, both for the server (pathfinding meshes are altered by placeables, making pathfinding take longer) and the client (loading more models uses up more graphical resources).

Pathfinding

A quick note: pathfinding operations are probably the biggest cause of lag. There are two main causes:

  • NPCs walking waypoints - for instance 2 NPCs trying to walk past each other, or waypoints placed on complicated placeables/terrain. The pathfinder has been improved several times, but if you see a "stuck" NPC who should be walking it might be worth investigating why; one reason may be that the expensive pathfinding is failing constantly. Add more waypoints and a better path which doesn't intersect other NPCs or common PC paths.
  • Bad terrain pathfinding - Many user made tilesets contain some bad pathfinding walkmeshes. They are either too complicated or don't match up tile-to-tile making pathfinding break. These are harder to find but will regularly affect PCs so they may be able to assist. Having too many placeables in an area can affect this as well.

Creature Spawns

Spawning a lot of creatures and leaving them spawned can add to lag if lots of areas have this occur. Consider having staged spawns as players move through a level, or smaller combat areas, if it becomes a major issue - however NWN should be able to cope with dozens of creatures pre-spawned. If dozens of them get into combat it may slow things down a bit more since more combat AI scripts and abilities fire off, however on modern CPUs this shouldn't be terrible.

Having a lot of creatures active is not necessarily a problem if they're generally not doing much. Even so, cleaning up leftover spawns regularly (e.g. every 5 minutes or so) would help. In NWN:EE you can also instance areas with CreateArea or CopyArea - destroying them destroys the creatures and all other objects within - however this can be quite a lot of work to get perfect.

In areas where you want creature vs creature battles to take place, it is recommended to have the creatures turn off (SetCommandable) or despawn while no players are in the area, since constant combat tends to trigger a lot of scripts, engine calls and other things.

For areas with no battling at all - such as protected cities where every NPC is plot - you could remove most scripts or even all scripts from a creature if they are simply standing there. No scripts means nothing is run when the event that would trigger them happens. An empty OnConversation script still fires the default one, so an NPC will begin its dialogue as usual.

Model Quality

There are a number of older NWN models on the vault and from other sources that are inefficient or badly made. These can cause graphical lag more than any kind of server lag (servers do not load the models entirely). Consider replacing or fixing these models if your FPS drops significantly when loaded.

Regular Cleanup or Reboots

Rebooting a server is generally a last resort, but regular cleanup can help alleviate lag. NWN does have some potential memory leaks, but their impact is lessened by cleaning up unused or useless objects left around. For instance:

  • Clean up dropped items in areas. A cleanup routine that runs only every 30 minutes or so can set a local variable on an item the first time it finds it on the ground, then remove the item on a later pass if the variable is already set.
  • Clean up merchant stores. The store can either be recreated when safe (i.e. the shop isn't open by anyone because the area is empty, checked in the OnExit script), by removing the old one with DestroyObject and using CreateObject with the store's resref, or you could manually DestroyObject every item in the store which has not got an "Infinite" flag (the flag usually denotes store stock).
  • As noted above cleaning up creature spawns really helps if otherwise left unchecked.
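The two-pass item cleanup described in the first bullet is a simple mark-then-sweep routine. The Python sketch below models only that logic: the dictionary stands in for the items lying on the ground in an area, and the `seen_by_cleanup` flag stands in for a local variable on the item; both names are hypothetical.

```python
def cleanup_pass(ground_items: dict[str, dict]) -> list[str]:
    """One cleanup pass: mark items seen for the first time, remove marked ones.

    Returns the ids of the items removed on this pass.
    """
    removed = []
    for item_id in list(ground_items):  # copy the keys: the dict is edited below
        item = ground_items[item_id]
        if item.get("seen_by_cleanup"):
            # Second sighting: the item survived a whole interval untouched.
            del ground_items[item_id]
            removed.append(item_id)
        else:
            # First sighting: mark it and give players one interval to pick it up.
            item["seen_by_cleanup"] = True
    return removed


area = {"sword": {}, "potion": {}}
print(cleanup_pass(area))   # first pass only marks
area["gem"] = {}            # dropped between passes
print(cleanup_pass(area))   # second pass removes the older items
print(sorted(area))
```

Every item therefore gets at least one full interval on the ground before it is destroyed. A real script would probably also clear the flag when an item is picked up, so that an item dropped again later gets the same grace period; that detail is an assumption and is not covered by the text above.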

Other Ways to Combat Lag

Some old school ways:

  • Limit the server population - obviously a major extreme but no players means no lag.
  • Limit PC party size - the more PCs that go into an area, the more spells, abilities, fighting, and whatnot occur. Traffic also needs to be sent to each PC for what they see / what happens, which increases slightly in bigger parties. However, in the modern networking age this shouldn't be a major issue.

Some more modern ways:

  • Profile more with NWNX - they include a way to check what other things than scripts are making the game run slower. It could be the pathfinding is going insane for one single NPC, or it could be some rare occurrence of issues unrelated to scripts.
  • Instance or shard servers - advanced users only! ActivatePortal can go to another server, which if a shared Server Vault exists can "port" the player over. The largest of persistent worlds need to do this to combat lag, even if they upped the player counts significantly.

Conclusion

Much of the above, especially the lower numbers in the lists, comes down to judgment calls on the part of the builder. Think of them as general guidelines, which you can stray from at a cost. Whether the cost is worth the benefit is up to you. But the higher up on the above list they are, the greater the cost associated with them, generally speaking.


author: FunkySwerve, Acaos, editors: Mistress, Kolyana, contributor: Kookoo, Phann