Functions to sieve blocks of sigma function. More...

Functions
sieve_config_t *	moews_init_sieve (size_t bound, size_t seg_len)
	creates a sieve_config_t which is shared by a pool of workers More...

void	moews_destroy_sieve (sieve_config_t *cfg)
	free's memory associated with sieve_config_t More...

sieve_worker_t *	moews_init_worker (const sieve_config_t *cfg)
	creates a worker object which can be used to run the sieve More...

void	destroy_worker (sieve_worker_t *worker)
	free's memory associated with sieve_worker_t More...

void	moews_sieve_odd_standard (sieve_worker_t *worker, const uint64_t seg_start)
	runs sieve for odd sigma(m) More...

void	moews_sieve_odd_squared (sieve_worker_t *worker, const uint64_t seg_start)
	takes odd m in range and runs sieve for sigma(m * m) More...

uint64_t	moews_lookup_sigma_m (const sieve_worker_t *worker, uint64_t m)
	lookup sigma(m) from m in a sieved block More...

uint64_t	moews_lookup_sigma_m_squared (const sieve_worker_t *worker, uint64_t m)
	lookup sigma(m * m) from m in a sieved block More...

bool	moews_check_prime (const sieve_worker_t *worker, uint64_t m)
	checks if odd numbers in range (seg_start, seg_start + seg_len) is prime More...

size_t	moews_estimate_heap_usage (size_t bound, size_t seg_len, size_t num_workers)
	estimates how much heap the sieving buffers will require More...

Detailed Description

Functions to sieve blocks of sigma function.

Author: Anton Mosunov and Gavin Guinn (gavin.nosp@m.guin.nosp@m.n1@gm.nosp@m.ail..nosp@m.com)

Date: Originally Apr 18, 2014, modified 2021-12-27

Copyright: Public Domain (Please credit me; if you find this code useful I would love to hear about your work!)

PERFORMANCE SUMMARY

The parameter seg_len has will increase sieve runtime by upto a order-of-magnitude if set to low/high. If seg_len is set to low the sieve will execute far more instructions than needed to complete the same outcome, as it's performance is improves with larger values of seg_len. I believe^ the amount of instructions increases exponentially as seg_len shrinks, judging by the cachegrind instruction lookup counts. However if seg_len is set to high the sieve will start thrashing at the cache missing reads/writes in the intermediate sieving steps thus degrading performance. I believe^^ this effect is more pronounced at bounds lower than ~10^9 as seg_len's that result in a reasonable amount of operations also happen to require sieving buffers that fit (very approximately) into my 32mb LL cache.

TODO

test ^ and ^^

PERFORMANCE

The performance of this sieve is highly variable on the value of seg_len, unfortunately not in a clearly tractable way. It is clear that if seg_len is set to low (whatever that means) the sieve will do a lot of additional operations for the same result. This selected cachegrind/time output from 2 pom_yang run is illustrative:

./cli --bound=100000000 --seg_len=400000 --num_locks=1000000 --preimage_count_bits=1 --num_thread=12
 ==*== I   refs:      19,091,262,194
 ==*==
 ==*== D   refs:       26,107,875,360  (25,263,200,724 rd   + 844,674,636 wr)
 ==*== D1  misses:        268,703,487  (   255,239,558 rd   +  13,463,929 wr)
 ==*== LLd misses:            406,127  (       274,082 rd   +     132,045 wr)
 ==*== D1  miss rate:             1.0% (           1.0%     +         1.6%  )
6.28s user 0.07s system 901% cpu 0.704 total
 
./cli --bound=100000000 --seg_len=12500 --num_locks=1000000 --preimage_count_bits=1 --num_thread=12
 ==*== I   refs:      190,995,976,640
 ==*==
 ==*== D   refs:       4,448,783,095  (3,658,655,908 rd   + 790,127,187 wr)
 ==*== D1  misses:       304,675,262  (  291,274,461 rd   +  13,400,801 wr)
 ==*== LLd misses:         8,850,772  (    5,407,998 rd   +   3,442,774 wr)
 ==*== D1  miss rate:            6.8% (          8.0%     +         1.7%  )
28.27s user 0.04s system 1052% cpu 2.689 total

Note that the seg_len=12500 is running an order of magnitude more instructions and far fewer references on the data cache, these references are relatively inaccurate.

The seg_len value can also degrade performace if it is set to high:

./cli --bound=100000000 --seg_len=4000000 --num_locks=$((10**6)) --preimage_count_bits=1 --num_thread=12
 ==*== I   refs:      13,701,814,468
 ==*==
 ==*== D   refs:       3,755,366,091  (2,955,265,779 rd   + 800,100,312 wr)
 ==*== D1  misses:       312,794,587  (  298,947,273 rd   +  13,847,314 wr)
 ==*== LLd misses:         6,388,217  (    2,678,555 rd   +   3,709,662 wr)
 ==*== D1  miss rate:            8.3% (         10.1%     +         1.7%  )
12.94s user 0.36s system 744% cpu 1.786 total

Comparing to the seg_len=400000 run we can observe that slightly less instructions are being executed but we are still in the same order of magnitude. More interestingly we can notice this run made an order of magnitude fewer references to the data cache and.. those references were much less accurate, likely accounting for the performace degradation. The vast majority of these misses originate from interactions with the sigma and numbers buffers.

SEE

Files in the cg directory for more details.

ALGORITHM

This is a rewrite of Anton Mosunov's implementation of the [Moews and Moews] sieving algorithm, prepared to utilized in the tabulation of non-aliquots, sect. 3 of [Chum et al.]. This implementation includes undocumented optimization on the original method of [Moews and Moews] that are essential for performance, I can't make any sense of why it works however. The _meows_naive_sieve implements the algorithm as described in the paper, it is much slower.

CITATIONS

-> [Moews and Moews] Moews, D. and Moews, P. C. (1991). A search for aliquot cycles below 10 10. Mathematics of Computation, 57(196):849.

-> [Chum et al.] Chum, K., Guy, R. K., Jacobson, J. M. J., and Mosunov, A. S. (2018). Numerical and statistical analysis of aliquot sequences. Experimental Mathematics, 29(4):414–425.

Function Documentation

◆ destroy_worker()

void destroy_worker ( sieve_worker_t * worker )

free's memory associated with sieve_worker_t

Parameters

worker to be free'd

◆ moews_check_prime()

bool moews_check_prime	(	const sieve_worker_t *	worker,
		uint64_t	m
	)

checks if odd numbers in range (seg_start, seg_start + seg_len) is prime

Parameters

worker	that has previously run moews_sieve_odd_standard
m	check if this odd number in range (seg_start, seg_start + seg_len) is prime

Returns: bool is prime

◆ moews_destroy_sieve()

void moews_destroy_sieve ( sieve_config_t * cfg )

free's memory associated with sieve_config_t

Parameters

cfg	to be free'd

◆ moews_estimate_heap_usage()

size_t moews_estimate_heap_usage	(	size_t	bound,
		size_t	seg_len,
		size_t	num_workers
	)

estimates how much heap the sieving buffers will require

Parameters

bound	sieve all odd sigma upto to bound
seg_len	size of range to sieve in a step
num_workers	number of threads working

Returns: size_t approx bytes to be consumed

◆ moews_init_sieve()

sieve_config_t* moews_init_sieve	(	size_t	bound,
		size_t	seg_len
	)

creates a sieve_config_t which is shared by a pool of workers

Parameters

bound	sieving all odd sigma upto to bound
seg_len	size of range to sieve in a step

Returns: sieve_config_t* config struct which is free'd by calling moews_destroy_sieve

◆ moews_init_worker()

sieve_worker_t* moews_init_worker ( const sieve_config_t * cfg )

creates a worker object which can be used to run the sieve

Parameters

cfg	initalized sieve configuration struct

Returns: sieve_worker_t*

◆ moews_lookup_sigma_m()

uint64_t moews_lookup_sigma_m	(	const sieve_worker_t *	worker,
		uint64_t	m
	)

lookup sigma(m) from m in a sieved block

Parameters

worker	see definition, has run moews_sieve_odd_standard last
m	number to lookup

Returns: uint64_t sigma(m)

◆ moews_lookup_sigma_m_squared()

uint64_t moews_lookup_sigma_m_squared	(	const sieve_worker_t *	worker,
		uint64_t	m
	)

lookup sigma(m * m) from m in a sieved block

Parameters

worker	see definition, has run moews_sieve_odd_squared last
m	number to lookup

Returns: uint64_t sigma(m * m)

◆ moews_sieve_odd_squared()

void moews_sieve_odd_squared	(	sieve_worker_t *	worker,
		uint64_t	seg_start
	)

takes odd m in range and runs sieve for sigma(m * m)

Parameters

worker	see struct definiton
seg_start	start of range to sieve

◆ moews_sieve_odd_standard()

void moews_sieve_odd_standard	(	sieve_worker_t *	worker,
		uint64_t	seg_start
	)

runs sieve for odd sigma(m)

Parameters

worker	see struct definiton
seg_start	start of range to sieve