vqa_benchmarking_backend.metrics.robustness
Module Contents

Functions

- inputs_for_image_robustness_imagespace: creates len(noise_types) * trials outputs (one per noise type per trial)
- inputs_for_image_robustness_featurespace: additive Gaussian noise for input features
- inputs_for_question_robustness_wordspace: word-level perturbations of the question (typos, reordering, insertions, synonyms, deletions)
- inputs_for_question_robustness_featurespace: additive Gaussian noise for input features
- eval_robustness: evaluates predictions generated with the robustness input functions above
- vqa_benchmarking_backend.metrics.robustness.inputs_for_image_robustness_imagespace(current_sample: vqa_benchmarking_backend.datasets.dataset.DataSample, trials: int = 3, gaussian_mean: float = 0.0, gaussian_variance: float = 0.025, salt_pepper_amount: float = 0.1, salt_vs_pepper_ratio: float = 0.5, speckle_mean: float = 0.0, speckle_variance: float = 0.05, noise_types=['gaussian', 'poisson', 's&p', 'speckle'], seed: int = 12345) → List[vqa_benchmarking_backend.datasets.dataset.DataSample]
NOTE: creates len(noise_types) * trials outputs, i.e. one output per noise type per trial (see the sketch after this entry).
https://scikit-image.org/docs/stable/api/skimage.util.html#random-noise
- Args:
noise_types: sub-list of ['gaussian', 'localvar', 'poisson', 'salt', 'pepper', 's&p', 'speckle']
- Returns:
List[DataSample] of length len(noise_types)*trials
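A minimal sketch of how these image-space perturbations could be produced with scikit-image's random_noise (the DataSample wrapping and seeding are omitted; the helper name and plain-NumPy interface are illustrative, not the library's API):

```python
# Sketch: one noisy copy per (noise type, trial) pair, matching the
# len(noise_types) * trials output count noted above.
import numpy as np
from skimage.util import random_noise

def perturb_image(image: np.ndarray,
                  noise_types=('gaussian', 'poisson', 's&p', 'speckle'),
                  trials: int = 3,
                  gaussian_variance: float = 0.025,
                  salt_pepper_amount: float = 0.1) -> list:
    """Return len(noise_types) * trials noisy copies of `image` (float array in [0, 1])."""
    noisy = []
    for noise in noise_types:
        for _ in range(trials):
            if noise == 'gaussian':
                out = random_noise(image, mode='gaussian', var=gaussian_variance)
            elif noise == 's&p':
                out = random_noise(image, mode='s&p', amount=salt_pepper_amount)
            else:  # e.g. 'poisson', 'speckle'
                out = random_noise(image, mode=noise)
            noisy.append(out)
    return noisy
```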
- vqa_benchmarking_backend.metrics.robustness.inputs_for_image_robustness_featurespace(current_sample: vqa_benchmarking_backend.datasets.dataset.DataSample, std: float = 0.01, trials: int = 15) → List[vqa_benchmarking_backend.datasets.dataset.DataSample]
Additive Gaussian noise for input features
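In feature space, the robustness inputs reduce to re-sampling additive Gaussian noise on the precomputed feature tensor. A minimal sketch, assuming the features are available as a plain torch tensor (the helper name is illustrative):

```python
# Sketch: `trials` noisy copies of a feature tensor, each with N(0, std^2) noise added.
import torch

def perturb_features(features: torch.Tensor, std: float = 0.01, trials: int = 15) -> list:
    return [features + torch.randn_like(features) * std for _ in range(trials)]
```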
- vqa_benchmarking_backend.metrics.robustness.inputs_for_question_robustness_wordspace(current_sample: vqa_benchmarking_backend.datasets.dataset.DataSample, trials: int = 15, noise_types=['typo', 'insert', 'permute', 'synonyms', 'delete'], max_edits_per_sample: int = 2) → List[vqa_benchmarking_backend.datasets.dataset.DataSample]
Ideas (see the sketch after this entry):
  - typos (might not be in the vocabulary - should be doable with BERT and fastText.subwords though)
  - change the order of words (does it have to be grammatically safe?)
  - insert unnecessary words (when is that safe?)
  - replace words with synonyms (where to get a synonym map?)
  - delete a word (when is that safe? e.g. don't delete 'color' from 'What color is...?')
Note: noise may be more meaningful in feature space than in word space.
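An illustrative sketch of simple word-level edits (typo, permutation, deletion) on a whitespace-tokenized question; synonym replacement is left out because it needs an external synonym map, and none of these helpers are part of the library's API:

```python
import random
import string

def typo(tokens, rng):
    # Replace one character of a randomly chosen word with a random letter.
    i = rng.randrange(len(tokens))
    w = tokens[i]
    if len(w) > 1:
        j = rng.randrange(len(w))
        tokens[i] = w[:j] + rng.choice(string.ascii_lowercase) + w[j + 1:]
    return tokens

def permute(tokens, rng):
    # Swap two randomly chosen word positions.
    if len(tokens) > 1:
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

def delete(tokens, rng):
    # Drop one word, but never empty the question entirely.
    if len(tokens) > 1:
        del tokens[rng.randrange(len(tokens))]
    return tokens

def perturb_question(question: str, trials: int = 15,
                     max_edits_per_sample: int = 2, seed: int = 12345) -> list:
    """Return `trials` perturbed variants of `question`, each with 1..max_edits_per_sample edits."""
    rng = random.Random(seed)
    edits = [typo, permute, delete]
    variants = []
    for _ in range(trials):
        tokens = question.split()
        for _ in range(rng.randint(1, max_edits_per_sample)):
            tokens = rng.choice(edits)(tokens, rng)
        variants.append(' '.join(tokens))
    return variants
```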
- vqa_benchmarking_backend.metrics.robustness.inputs_for_question_robustness_featurespace(current_sample: vqa_benchmarking_backend.datasets.dataset.DataSample, adapter: vqa_benchmarking_backend.datasets.dataset.DatasetModelAdapter, std: float = 0.01, trials: int = 15) → List[vqa_benchmarking_backend.datasets.dataset.DataSample]
Additive Gaussian noise for input features
- vqa_benchmarking_backend.metrics.robustness.eval_robustness(dataset: vqa_benchmarking_backend.datasets.dataset.DiagnosticDataset, original_class_prediction: str, predictions: torch.FloatTensor) → Tuple[Dict[int, float], float]
Evaluate predictions generated with inputs_for_question_robustness_featurespace,
inputs_for_question_robustness_wordspace, inputs_for_image_robustness_featurespace or inputs_for_image_robustness_imagespace (the evaluation is sketched after this entry).
- Args:
predictions: Model predictions (probabilities), one per perturbed trial
- Returns:
Mapping from each predicted class to the fraction of trials in which it was the top prediction
Normalized robustness score (float), where 0 means not robust and 1 means fully robust
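A minimal sketch of the evaluation described above, assuming predictions has shape (trials, num_answers) and that the original prediction is given as a class index rather than a string; scoring robustness as the fraction of perturbed trials that agree with the original prediction is an assumption consistent with the 0-to-1 description, not necessarily the exact formula used:

```python
from collections import Counter
from typing import Dict, Tuple
import torch

def eval_robustness_sketch(original_class: int,
                           predictions: torch.FloatTensor
                           ) -> Tuple[Dict[int, float], float]:
    # Best class per perturbed trial (argmax over the answer dimension).
    classes = predictions.argmax(dim=-1).tolist()
    counts = Counter(classes)
    # Fraction of trials won by each class.
    fractions = {cls: n / len(classes) for cls, n in counts.items()}
    # Robustness: how often the noisy inputs still yield the original class.
    robustness = fractions.get(original_class, 0.0)
    return fractions, robustness
```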