54 changes: 24 additions & 30 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,53 +1,47 @@
![pytest](https://github.com/N3PDF/pycompressor/workflows/pytest/badge.svg)
[![documentation](https://github.com/N3PDF/pycompressor/workflows/docs/badge.svg)](https://n3pdf.github.io/pycompressor/)

### pycompressor
## pycompressor

Fast and efficient python implementation of PDF set **compressor** (https://arxiv.org/abs/1504.06469).
Fast and efficient python implementation of PDF **compression** (https://arxiv.org/abs/1504.06469).

#### New features

Additional new features have been added to the following python package. The two main features are:
- **Covariance Matrix Adaptation-Evolution strategy (CMA-ES):** in addition to the Genetic
Algorithm (GA), there is now the possibility to choose as a minimizer the CMA. The choice
of minimizer can be defined in the `runcard.yml` file.
- **Generative Adversarial Strategy (GANs):** this is a standalone python [package](https://github.com/N3PDF/ganpdfs/tree/master)
that can enhance the statistics of the prior PDF replicas before compression by generating
synthetic replicas. For more details, refer to the [documentation](https://n3pdf.github.io/ganpdfs/)
(still has to be done). In a similar way, in order to trigger the enhancement, one just has to set
the value of `enhance` in the runcard to be `True`. Setting this value to `False` will just run the
standard compression. The GANs also require extra parameters (as shown in the example
[runcard.yml](https://github.com/N3PDF/pycompressor/blob/master/runcard.yml)) that define
the structure of the networks.

#### Installation
### How to install

To install `pyCompressor`, just type:
```bash
python setup.py install
```
or if you are a developer:
```bash
python setup.py develop
python setup.py install # or python setup.py develop (if you want development mode)
```

#### How to use
### How to use

#### Standard compression

The input parameters that define the compression are contained in a YAML file. To run
the `pycompressor` code, just type the following:
The input parameters that define the compression are contained in a YAML file. To run the standard compression,
use the reference [runcard](https://github.com/N3PDF/pycompressor/blob/master/runcards/runcard.yml) as is,
replacing only the entry of the `pdf` key with the name of the PDF set, then run the following:
```bash
pycomp runcards/runcard.yml [--threads NUMB_THREADS]
```
Detailed instructions on how to set the different parameters in the runcard can be found here.

#### Generating compressed PDF set & post-analysis
#### Using GAN and/or Compressing from an enhanced set

Although it is advised to run [ganpdfs](https://github.com/N3PDF/ganpdfs) independently, it is possible
to generate enhanced PDF replicas within `pycompressor`. To do so, just set the entry `enhance` in the
runcard to `True` and specify the total number of replicas (prior + synthetic).

Finally, in order to perform a compression with an enhanced set, set the entry `existing_enhanced` to `True`.

Detailed instructions on how to set the different parameters in the runcard can be found
[here](https://n3pdf.github.io/pycompressor/howto/howto.html).

### Generating compressed PDF set & post-analysis

The code will create a folder named after the prior PDF sets. To generate the
compressed PDF grid, run the following command:
```bash
get-grid -i <PDF_NAME>/compressed_<PDF_NAME>_<NB_COMPRESSED>_output.dat
```
Note that if the compression is done from an enhanced set, the output folder will be append by `_enhanced`.
Note that if the compression is done from an enhanced set, the name of the output folder will have `_enhanced` appended.
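
As a rough illustration of the naming rule above, the sketch below builds the result path the way the f-string in `src/pycompressor/compressing.py` (further down in this diff) does. Note that the code writes a file starting with `compress_`, while the `get-grid` command above shows `compressed_`; the sketch follows the code. The helper name `result_file` is hypothetical and not part of `pycompressor`.

```python
from pathlib import Path


def result_file(pdf, compressed, enhanced=False):
    """Build the path of the JSON result file written by the compression.

    Hypothetical helper: it only mirrors the f-string used in this diff.
    """
    # The output folder is the PDF name, plus "_enhanced" for enhanced runs.
    folder = f"{pdf}_enhanced" if enhanced else pdf
    # The file name always uses the prior PDF name, not the folder name.
    return Path(folder) / f"compress_{pdf}_{compressed}_output.dat"


print(result_file("NNPDF40_nnlo_as_0118_1000", 500).as_posix())
print(result_file("NNPDF40_nnlo_as_0118_1000", 500, enhanced=True).as_posix())
```

Only the folder changes between the two cases; the file name keeps the name of the prior set in both.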

Finally, in order to generate ERF plots, enter the `erfs_output` directory and run the following:
```bash
@@ -56,7 +50,7 @@ validate --random erf_randomized.dat --reduced erf_reduced.dat
This script can also plot the ERF validation from the old compressor code by adding the flag
`--format ccomp`.
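
For reference, `compressing.py` in this diff appends one record per run to `erf_reduced.dat`: the compression size, a colon, and the JSON dump of the final ERFs. A minimal sketch of reading such records back is given below, assuming the JSON payload is an object. The function name and the estimator keys in the sample are made up for illustration, and this is not how `validate` itself is implemented.

```python
import json


def parse_erf_records(text):
    """Parse '<size>:<json>' lines into a {size: payload} dictionary."""
    records = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the first colon only: the JSON payload contains colons too.
        size, _, payload = line.partition(":")
        # The file is opened in append mode, so a later record for the
        # same size replaces an earlier one here.
        records[int(size)] = json.loads(payload)
    return records


# Hypothetical content; the estimator names are illustrative only.
sample = '50:{"mean": 0.12, "stdev": 0.30}\n100:{"mean": 0.05, "stdev": 0.11}\n'
print(parse_erf_records(sample)[100]["mean"])
```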

#### Warning
### Warning

This package cannot be installed with python 3.9 yet due to the numba dependency. This will be resolved
soon according to [#6579](https://github.com/numba/numba/pull/6579).
6 changes: 0 additions & 6 deletions runcards/ganpdfs.yml
@@ -1,8 +1,3 @@
#############################################################################################
# Input PDF #
#############################################################################################
pdf: NNPDF40_nnlo_as_0118_1000

#############################################################################################
# PDF Grids: #
# --------- #
@@ -70,4 +65,3 @@ nd_steps : 4 # Number of steps to train
ng_steps : 3 # Number of steps to train the Generator for one training run
batch_size : 70 # Batch size per epoch in terms of percentage
epochs : 1000 # Number of epochs
pdf: NNPDF40_nnlo_as_0118_1000
216 changes: 115 additions & 101 deletions src/pycompressor/compressing.py
@@ -63,20 +63,6 @@ def check_validity(pdfsetting, compressed, gans, est_dic):
f" {members} members if enhancing is not active.")


@make_argcheck
def check_adiabaticity(pdfsetting, gans, compressed):
    """ Check whether we are in an adiabatic optimization and if so if it can be performed """
    pdf_name = pdfsetting["pdf"]
    if pdfsetting.get("existing_enhanced") and not gans.get("enhanced"):
        adiabatic_result = f"{pdf_name}/compress_{pdf_name}_{compressed}_output.dat"
        if not pathlib.Path(adiabatic_result).exists():
            raise CheckError(
                "Adiabatic optimization needs to be ran first with existing_enhanced: False"
                f"\nMissing the file: {adiabatic_result}"
            )


@check_adiabaticity
@check_validity
def compressing(pdfsetting, compressed, minimizer, est_dic, gans):
    """
@@ -94,7 +80,7 @@ def compressing(pdfsetting, compressed, minimizer, est_dic, gans):
    """

    pdf = str(pdfsetting["pdf"])
    enhanced_already_exists = pdfsetting.get("existing_enhanced", False)
    enhd_exists = pdfsetting.get("existing_enhanced", False)

    if gans["enhance"]:
        from pycompressor.postgans import postgans
@@ -121,95 +107,123 @@ def compressing(pdfsetting, compressed, minimizer, est_dic, gans):
        postgans(str(pdf), outfolder, nbgen)

    splash()
    # Set seed
    rndgen = Generator(PCG64(seed=0))

    console.print("\n• Load PDF sets & Printing Summary:", style="bold blue")
    xgrid = XGrid().build_xgrid()
    # Load Prior Sets
    prior = PdfSet(pdf, xgrid, Q0, NF).build_pdf()
    rndindex = rndgen.choice(prior.shape[0], compressed, replace=False)
    # Load Enhanced Sets
    if enhanced_already_exists:
        try:
            postgan = pdf + "_enhanced"
            final_result = {"pdfset_name": postgan}
            enhanced = PdfSet(postgan, xgrid, Q0, NF).build_pdf()
        except RuntimeError as excp:
            raise LoadingEnhancedError(f"{excp}")
        nb_iter, ref_estimators = 100000, None
        init_index = np.array(extract_index(pdf, compressed))
    else:
        final_result = {"pdfset_name": pdf}
        nb_iter, ref_estimators = 15000, None
        init_index, enhanced = rndindex, prior

    # Create output folder
    outrslt = postgan if enhanced_already_exists else pdf
    folder = pathlib.Path().absolute() / outrslt
    folder.mkdir(exist_ok=True)
    # Create output folder for ERF stats
    out_folder = pathlib.Path().absolute() / "erfs_output"
    out_folder.mkdir(exist_ok=True)

    # Output Summary
    table = Table(show_header=True, header_style="bold magenta")
    table.add_column("Parameters", justify="left", width=24)
    table.add_column("Description", justify="left", width=50)
    table.add_row("PDF set name", f"{pdf}")
    table.add_row("Size of Prior", f"{prior.shape[0] - 1} replicas")
    if enhanced_already_exists:
        table.add_row("Size of enhanced", f"{enhanced.shape[0] - 1} replicas")
    table.add_row("Size of compression", f"{compressed} replicas")
    table.add_row("Input energy Q0", f"{Q0} GeV")
    table.add_row(
        "x-grid size",
        f"{xgrid.shape[0]} points, x=({xgrid[0]:.4e}, {xgrid[-1]:.4e})"
    )
    table.add_row("Minimizer", f"{minimizer}")
    console.print(table)

    # Init. Compressor class
    comp = Compress(
        prior,
        enhanced,
        est_dic,
        compressed,
        init_index,
        ref_estimators,
        out_folder,
        rndgen
    )
    # Start compression depending on the Evolution Strategy
    erf_list = []
    console.print("\n• Compressing MC PDF replicas:", style="bold blue")
    if minimizer == "genetic":
        # Run compressor using GA
        with trange(nb_iter) as iter_range:
            for _ in iter_range:
                iter_range.set_description("Compression")
                erf, index = comp.genetic_algorithm(nb_mut=5)
                erf_list.append(erf)
                iter_range.set_postfix(ERF=erf)
    elif minimizer == "cma":
        # Run compressor using CMA
        erf, index = comp.cma_algorithm(std_dev=0.8)
    else:
        raise ValueError(f"{minimizer} is not a valid minimizer.")

    # Prepare output file
    final_result["ERFs"] = erf_list
    final_result["index"] = index.tolist()
    outfile = open(f"{outrslt}/compress_{pdf}_{compressed}_output.dat", "w")
    outfile.write(json.dumps(final_result, indent=2))
    outfile.close()
    # Fetching ERF and construct reduced PDF grid
    console.print(f"\n• Final ERF: [bold red]{erf}.", style="bold red")

    # Compute final ERFs for the final chosen replicas
    final_err_func = comp.final_erfs(index)
    serfile = open(f"{out_folder}/erf_reduced.dat", "a+")
    serfile.write(f"{compressed}:")
    serfile.write(json.dumps(final_err_func))
    serfile.write("\n")
    serfile.close()

    outname = [pdf]
    final_result = [{"pdfset_name": pdf}]
    nb_iter, ref_estimators = [15000], [None]
    init_index, enhanced = [rndindex], [prior]

    # Methodological iterations
    mtd_iteration = 2 if enhd_exists else 1

    for cmtype in range(mtd_iteration):
        # necessary to get the same normalization
        rndgen = Generator(PCG64(seed=0))
        _ = rndgen.choice(prior.shape[0], compressed, replace=False)
        # reference log
        if cmtype==0:
            console.print(
                "Standard compression using Input set",
                style="bold green underline"
            )
        elif cmtype==1:
            console.print(
                "Adiabatic compression using Enhanced set",
                style="bold green underline"
            )

        # Create output folder
        outrslt = outname[cmtype]
        folder = pathlib.Path().absolute() / outrslt
        folder.mkdir(exist_ok=True)
        # Create output folder for ERF stats
        out_folder = pathlib.Path().absolute() / "erfs_output"
        out_folder.mkdir(exist_ok=True)

        # Output Summary
        console.print("\n• Compression Summary:", style="bold blue")
        table = Table(show_header=True, header_style="bold magenta")
        table.add_column("Parameters", justify="left", width=24)
        table.add_column("Description", justify="left", width=50)
        table.add_row("PDF set name", f"{pdf}")
        table.add_row("Size of Prior", f"{prior.shape[0] - 1} replicas")
        if cmtype!=0 and enhd_exists:
            table.add_row(
                "Size of enhanced",
                f"{enhanced[1].shape[0] - 1} replicas"
            )
        table.add_row("Size of compression", f"{compressed} replicas")
        table.add_row("Input energy Q0", f"{Q0} GeV")
        table.add_row(
            "x-grid size",
            f"{xgrid.shape[0]} points, x=({xgrid[0]:.4e}, {xgrid[-1]:.4e})"
        )
        table.add_row("Minimizer", f"{minimizer}")
        console.print(table)

        # Init. Compressor class
        comp = Compress(
            prior,
            enhanced[cmtype],
            est_dic,
            compressed,
            init_index[cmtype],
            ref_estimators[cmtype],
            out_folder,
            rndgen
        )
        # Start compression depending on the Evolution Strategy
        erf_list = []
        console.print("\n• Compressing MC PDF replicas:", style="bold blue")
        if minimizer == "genetic":
            # Run compressor using GA
            with trange(nb_iter[cmtype]) as iter_range:
                for _ in iter_range:
                    iter_range.set_description("Compression")
                    erf, index = comp.genetic_algorithm(nb_mut=5)
                    erf_list.append(erf)
                    iter_range.set_postfix(ERF=erf)
        elif minimizer == "cma":
            # Run compressor using CMA
            erf, index = comp.cma_algorithm(std_dev=0.8)
        else:
            raise ValueError(f"{minimizer} is not a valid minimizer.")

        # Prepare output file
        final_result[cmtype]["ERFs"] = erf_list
        final_result[cmtype]["index"] = index.tolist()
        outfile = open(f"{outrslt}/compress_{pdf}_{compressed}_output.dat", "w")
        outfile.write(json.dumps(final_result[cmtype], indent=2))
        outfile.close()
        # Fetching ERF and construct reduced PDF grid
        console.print(f"\n• Final ERF: {erf}.", style="bold blue")

        if (cmtype!=0 and enhd_exists) or (cmtype==0 and not enhd_exists):
            # Compute final ERFs for the final chosen replicas
            final_err_func = comp.final_erfs(index)
            serfile = open(f"{out_folder}/erf_reduced.dat", "a+")
            serfile.write(f"{compressed}:")
            serfile.write(json.dumps(final_err_func))
            serfile.write("\n")
            serfile.close()

        # Load Enhanced Sets
        if cmtype==0 and enhd_exists:
            try:
                postgan = pdf + "_enhanced"
                outname.append(postgan)
                final_result.append({"pdfset_name": postgan})
                enhncd = PdfSet(postgan, xgrid, Q0, NF).build_pdf()
                enhanced.append(enhncd)
            except RuntimeError as excp:
                raise LoadingEnhancedError(f"{excp}")
            nb_iter.append(100000)
            ref_estimators.append(None)
            pre_index = np.array(extract_index(pdf, compressed))
            init_index.append(pre_index)
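
The control flow introduced by this change can be summarised with a small standalone sketch: per-pass settings are kept in parallel lists, the first pass always runs on the prior, and the settings for the second (adiabatic) pass are appended only after the first one has finished. This is an illustration under simplifying assumptions, not the package's code: `run_passes`, `toy_compress` and `load_enhanced` are hypothetical names, and the second pass receives the first result directly instead of reading it back from disk with `extract_index`.

```python
def run_passes(prior, load_enhanced, use_enhanced, compress):
    """Run one pass on the prior and, optionally, a second on the enhanced set."""
    # Parallel lists: entry i holds the settings of pass i.
    names, datasets = ["prior"], [prior]
    n_iters, starts = [15000], [None]
    results = []
    for step in range(2 if use_enhanced else 1):
        result = compress(datasets[step], starts[step], n_iters[step])
        results.append((names[step], result))
        if step == 0 and use_enhanced:
            # Stage the second pass: it starts from the first pass' result.
            names.append("prior_enhanced")
            datasets.append(load_enhanced())
            n_iters.append(100000)
            starts.append(result)
    return results


def toy_compress(data, start, n_iter):
    """Stand-in for the minimizer: keep the two smallest values."""
    return sorted(data)[:2]


print(run_passes([3, 1, 2], lambda: [3, 1, 2, 0], True, toy_compress))
```

Appending to the lists inside the loop keeps the loading of the enhanced set lazy: it happens only when a second pass is actually requested.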