speed of lookup

Fri Jan 08, 2010 1:40 am

Hi George,

I was doing some benchmarking of lookups and came across some interesting numbers...

I have for a long time been using a set of Hash functions based on lists, and was wondering if I should make a switch to using lookup. I did some comparisons, and it seems that lookup is slower, which surprised me a lot. I wonder if this is something that you would have expected.

You can see the example here along with supporting code:

http://transcendence.pastebin.com/f6df1906

Edit: Betel pointed out that the data is different, and that is right. Lookup uses nested lists for supporting lookups, while the Hash functions use an alternating structure, with every even element being a key and every odd element it's value.

Fri Jan 08, 2010 2:23 am

I'm really glad you're doing benchmarks!

I'm not too surprised that hash tables are faster than lookup. Looking up a value on a hash table should have almost constant speed, while the lookup function is O(n) speed.

As the number of entries in the table grow, the difference in speed will grow. My guess, though, is that for lookups in lists <100, the different between one and the other will be in the noise. Likely less than 1 ms total time (but I'm not sure because I haven't done benchmarks!)

Fri Jan 08, 2010 2:41 am

OK, I ran some new tests, this time with more targeted functions, and bigger sets, and your predictions are right:

http://transcendence.pastebin.com/f2eb2e2f6

(bench 100 100000) -> ~9000
(bench2 100 100000) -> ~16000

(bench 50 100000) -> ~5500
(bench2 50 100000) -> ~10000

(bench 25 100000) -> ~4000
(bench2 25 100000) -> ~7500

(bench 10 100000) -> ~3000
(bench 10 100000) -> ~6000

(bench 5 100000) -> ~2600
(bench2 5 100000) -> ~5500

(bench 2 100000) -> ~2300
(bench2 2 100000) -> ~5300

I even don't seem to be able to reproduce the favorable result I was getting before... wonder what is going on

Edit: the functions generate lists of i elements, and look for the value of the last element

Fri Jan 08, 2010 3:23 am

alterecco wrote:Edit: the functions generate lists of i elements, and look for the value of the last element

One possible issue: The value of i is undefined outside of the for loop (the variable is scoped to the loop).

Not sure if it makes a difference in timing...

Fri Jan 08, 2010 3:33 am

doh, you are right... too many `i's in there

I corrected it and re-ran the benchmarks, but with no noticeable change...

The end result for me is that the Hash functions are fast enough that I will keep using them in the DSF, where their syntax is easier to use then lookups would be (I use them as a data structure), and where I rarely deal with very large lists.

If you do however end up implementing a version similar to the Hash functions in the core, you would hear no complaints from me