vqa_benchmarking_backend.metrics.robustness

Module Contents

Functions

inputs_for_image_robustness_imagespace(current_sample: vqa_benchmarking_backend.datasets.dataset.DataSample, trials: int = 3, gaussian_mean: float = 0.0, gaussian_variance: float = 0.025, salt_pepper_amount: float = 0.1, salt_vs_pepper_ratio: float = 0.5, speckle_mean: float = 0.0, speckle_variance: float = 0.05, noise_types=['gaussian', 'poisson', 's&p', 'speckle'], seed: int = 12345) → List[vqa_benchmarking_backend.datasets.dataset.DataSample]

NOTE: creates len(noise_types) * trials outputs, because it generates one output per noise type per trial

inputs_for_image_robustness_featurespace(current_sample: vqa_benchmarking_backend.datasets.dataset.DataSample, std: float = 0.01, trials: int = 15) → List[vqa_benchmarking_backend.datasets.dataset.DataSample]

Additive Gaussian noise applied to the image input features

inputs_for_question_robustness_wordspace(current_sample: vqa_benchmarking_backend.datasets.dataset.DataSample, trials: int = 15, noise_types=['typo', 'insert', 'permute', 'synonyms', 'delete'], max_edits_per_sample: int = 2) → List[vqa_benchmarking_backend.datasets.dataset.DataSample]

Word-space perturbations of the question: typos, insertions, permutations, synonyms, deletions

inputs_for_question_robustness_featurespace(current_sample: vqa_benchmarking_backend.datasets.dataset.DataSample, adapter: vqa_benchmarking_backend.datasets.dataset.DatasetModelAdapter, std: float = 0.01, trials: int = 15) → List[vqa_benchmarking_backend.datasets.dataset.DataSample]

Additive Gaussian noise applied to the question input features

eval_robustness(dataset: vqa_benchmarking_backend.datasets.dataset.DiagnosticDataset, original_class_prediction: str, predictions: torch.FloatTensor) → Tuple[Dict[int, float], float]

Evaluate predictions generated with inputs_for_question_robustness_featurespace,

vqa_benchmarking_backend.metrics.robustness.inputs_for_image_robustness_imagespace(current_sample: vqa_benchmarking_backend.datasets.dataset.DataSample, trials: int = 3, gaussian_mean: float = 0.0, gaussian_variance: float = 0.025, salt_pepper_amount: float = 0.1, salt_vs_pepper_ratio: float = 0.5, speckle_mean: float = 0.0, speckle_variance: float = 0.05, noise_types=['gaussian', 'poisson', 's&p', 'speckle'], seed: int = 12345) → List[vqa_benchmarking_backend.datasets.dataset.DataSample]

NOTE: creates len(noise_types) * trials outputs, because it generates one output per noise type per trial

https://scikit-image.org/docs/stable/api/skimage.util.html#random-noise

Args:

noise_types: sub-list of ['gaussian', 'localvar', 'poisson', 'salt', 'pepper', 's&p', 'speckle']

Returns:

List[DataSample] of length len(noise_types)*trials
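
A minimal sketch of how the image-space perturbation loop could look, using skimage.util.random_noise from the linked documentation. The helper name noisy_images and the assumption that the image is a float array in [0, 1] are illustrative, not part of this API; in skimage >= 0.21 the seed argument is named rng.

    import numpy as np
    from skimage.util import random_noise

    def noisy_images(img: np.ndarray, noise_types, trials: int = 3, seed: int = 12345):
        """Yield len(noise_types) * trials noisy copies of img."""
        for _ in range(trials):
            for noise in noise_types:
                if noise == 'gaussian':
                    out = random_noise(img, mode='gaussian', mean=0.0, var=0.025, seed=seed)
                elif noise == 's&p':
                    out = random_noise(img, mode='s&p', amount=0.1, salt_vs_pepper=0.5, seed=seed)
                elif noise == 'speckle':
                    out = random_noise(img, mode='speckle', mean=0.0, var=0.05, seed=seed)
                else:  # e.g. 'poisson' takes no extra keyword arguments
                    out = random_noise(img, mode=noise, seed=seed)
                seed += 1  # advance the seed so each trial differs
                yield out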

vqa_benchmarking_backend.metrics.robustness.inputs_for_image_robustness_featurespace(current_sample: vqa_benchmarking_backend.datasets.dataset.DataSample, std: float = 0.01, trials: int = 15) → List[vqa_benchmarking_backend.datasets.dataset.DataSample]

Additive Gaussian noise applied to the image input features
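
A minimal sketch of the additive-noise idea, assuming the image features are already available as a torch tensor; noisy_features is an illustrative helper, not the function above.

    import torch

    def noisy_features(features: torch.FloatTensor, std: float = 0.01, trials: int = 15):
        # One independently perturbed copy per trial, zero-mean noise of the given std.
        return [features + torch.randn_like(features) * std for _ in range(trials)]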

vqa_benchmarking_backend.metrics.robustness.inputs_for_question_robustness_wordspace(current_sample: vqa_benchmarking_backend.datasets.dataset.DataSample, trials: int = 15, noise_types=['typo', 'insert', 'permute', 'synonyms', 'delete'], max_edits_per_sample: int = 2) → List[vqa_benchmarking_backend.datasets.dataset.DataSample]

Ideas:

  • typos (might not be in vocab… should be doable with BERT and fastText subwords, though)

  • change the order of words (does it have to be grammatically safe?)

  • insert unnecessary words (when is that safe?)

  • replace words with synonyms (where to get a synonym map?)

  • delete a word (when is that safe? e.g. don't delete 'color' from 'What color is…?')

Note: noise may be more meaningful in feature space than in word space; see inputs_for_question_robustness_featurespace below.
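
A rough sketch of three of the listed edits (typo, permutation, deletion); whitespace tokenization is an assumption, and a real synonym replacement would additionally need a resource such as WordNet or fastText nearest neighbours.

    import random

    def perturb_question(question: str, max_edits_per_sample: int = 2, seed: int = 12345) -> str:
        """Apply 1..max_edits_per_sample random word-space edits to the question."""
        rng = random.Random(seed)
        words = question.split()
        for _ in range(rng.randint(1, max_edits_per_sample)):
            edit = rng.choice(['typo', 'permute', 'delete'])
            if edit == 'typo' and words:
                i = rng.randrange(len(words))
                if len(words[i]) > 1:
                    j = rng.randrange(len(words[i]))
                    words[i] = words[i][:j] + rng.choice('abcdefghijklmnopqrstuvwxyz') + words[i][j + 1:]
            elif edit == 'permute' and len(words) > 1:
                i, j = rng.sample(range(len(words)), 2)
                words[i], words[j] = words[j], words[i]
            elif edit == 'delete' and len(words) > 1:
                words.pop(rng.randrange(len(words)))
        return ' '.join(words)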

vqa_benchmarking_backend.metrics.robustness.inputs_for_question_robustness_featurespace(current_sample: vqa_benchmarking_backend.datasets.dataset.DataSample, adapter: vqa_benchmarking_backend.datasets.dataset.DatasetModelAdapter, std: float = 0.01, trials: int = 15) → List[vqa_benchmarking_backend.datasets.dataset.DataSample]

Additive Gaussian noise applied to the question input features
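
The adapter presumably supplies the mapping from question text into the model's feature space before noise is added. A hypothetical sketch; the method name question_features is an assumption, not part of the documented DatasetModelAdapter interface.

    import torch

    def noisy_question_features(adapter, sample, std: float = 0.01, trials: int = 15):
        q_feats = adapter.question_features(sample)  # hypothetical adapter call
        return [q_feats + torch.randn_like(q_feats) * std for _ in range(trials)]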

vqa_benchmarking_backend.metrics.robustness.eval_robustness(dataset: vqa_benchmarking_backend.datasets.dataset.DiagnosticDataset, original_class_prediction: str, predictions: torch.FloatTensor) → Tuple[Dict[int, float], float]
Evaluate predictions generated with inputs_for_question_robustness_featurespace, inputs_for_question_robustness_wordspace, inputs_for_image_robustness_featurespace or inputs_for_image_robustness_imagespace.

Args:

predictions (one row per trial): model output probabilities

Returns:
  • Mapping from predicted class to the fraction of trials in which it was the top prediction

  • Normalized robustness score (float), where 0 means not robust and 1 means fully robust
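
A minimal sketch of the evaluation described above, assuming predictions has shape (trials, num_answers) and that the original prediction is supplied as a class index rather than the answer string taken by eval_robustness.

    from collections import Counter
    from typing import Dict, Tuple

    import torch

    def robustness_from_predictions(original_class: int,
                                    predictions: torch.FloatTensor) -> Tuple[Dict[int, float], float]:
        trials = predictions.size(0)
        top = predictions.argmax(dim=1).tolist()  # winning class per perturbed trial
        fractions = {cls: count / trials for cls, count in Counter(top).items()}
        # The model is robust in a trial iff the original class still wins:
        score = fractions.get(original_class, 0.0)
        return fractions, score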