vqa_benchmarking_backend.metrics.bias

Module Contents

Functions

inputs_for_question_bias_featurespace(current_sample: vqa_benchmarking_backend.datasets.dataset.DataSample, min_img_feat_val: torch.FloatTensor, max_img_feat_val: torch.FloatTensor, min_img_feats: int = 10, max_img_feats: int = 100, trials: int = 15) → List[vqa_benchmarking_backend.datasets.dataset.DataSample]

Creates inputs for measuring bias towards questions by creating random image features.

inputs_for_question_bias_imagespace(current_sample: vqa_benchmarking_backend.datasets.dataset.DataSample, dataset: vqa_benchmarking_backend.datasets.dataset.DiagnosticDataset, trials: int = 15) → List[vqa_benchmarking_backend.datasets.dataset.DataSample]

Creates inputs for measuring bias towards questions by replacing the current sample’s image with images drawn randomly from the dataset.

inputs_for_image_bias_featurespace(current_sample: vqa_benchmarking_backend.datasets.dataset.DataSample, min_question_feat_val: torch.FloatTensor, max_question_feat_val: torch.FloatTensor, min_tokens: int, max_tokens: int, trials: int = 15) → List[vqa_benchmarking_backend.datasets.dataset.DataSample]

Creates inputs for measuring bias towards images by creating random question features.

_extract_subjects_and_objects_from_text(text: str) → Tuple[Set[str], Set[str]]

_questions_different(q_a: str, q_b: str) → bool

Simple comparison of the semantic equality of two questions.

inputs_for_image_bias_wordspace(current_sample: vqa_benchmarking_backend.datasets.dataset.DataSample, dataset: vqa_benchmarking_backend.datasets.dataset.DiagnosticDataset, trials: int = 15) → List[vqa_benchmarking_backend.datasets.dataset.DataSample]

Creates inputs for measuring bias towards images by replacing the current sample’s question with questions drawn randomly from the dataset.

eval_bias(dataset: vqa_benchmarking_backend.datasets.dataset.DiagnosticDataset, original_class_prediction: str, predictions: torch.FloatTensor) → Tuple[Dict[int, float], float]

Evaluate predictions generated with inputs_for_question_bias_featurespace, inputs_for_question_bias_imagespace, inputs_for_image_bias_featurespace or inputs_for_image_bias_wordspace.

Attributes

nlp

vqa_benchmarking_backend.metrics.bias.nlp
vqa_benchmarking_backend.metrics.bias.inputs_for_question_bias_featurespace(current_sample: vqa_benchmarking_backend.datasets.dataset.DataSample, min_img_feat_val: torch.FloatTensor, max_img_feat_val: torch.FloatTensor, min_img_feats: int = 10, max_img_feats: int = 100, trials: int = 15) → List[vqa_benchmarking_backend.datasets.dataset.DataSample]

Creates inputs for measuring bias towards questions by creating random image features.

Args:

min_img_feat_val (img_feat_dim): vector containing the minimum value per feature dimension
max_img_feat_val (img_feat_dim): vector containing the maximum value per feature dimension

Returns:
Tensor of shape trials x [min_img_feats..max_img_feats] x img_feat_dim of randomly generated feature inputs in the range [min_img_feat_val, max_img_feat_val].

The number of drawn features (dim=1) is sampled randomly from [min_img_feats, max_img_feats].
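For illustration, a minimal sketch of the feature randomization described above, assuming only torch; the wrapping of the tensors into DataSample copies is omitted, and all names below are hypothetical:

    import random
    import torch

    def random_image_features(min_img_feat_val: torch.FloatTensor,
                              max_img_feat_val: torch.FloatTensor,
                              min_img_feats: int = 10,
                              max_img_feats: int = 100,
                              trials: int = 15):
        """Sketch: per trial, draw a random number of image feature vectors,
        each dimension uniform in [min_img_feat_val, max_img_feat_val]."""
        value_range = max_img_feat_val - min_img_feat_val
        feats = []
        for _ in range(trials):
            # Number of feature rows is drawn from [min_img_feats, max_img_feats]
            num_feats = random.randint(min_img_feats, max_img_feats)
            rand = torch.rand(num_feats, min_img_feat_val.size(0))
            feats.append(min_img_feat_val + rand * value_range)
        return feats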

vqa_benchmarking_backend.metrics.bias.inputs_for_question_bias_imagespace(current_sample: vqa_benchmarking_backend.datasets.dataset.DataSample, dataset: vqa_benchmarking_backend.datasets.dataset.DiagnosticDataset, trials: int = 15) → List[vqa_benchmarking_backend.datasets.dataset.DataSample]

Creates inputs for measuring bias towards questions by replacing the current sample’s image with images drawn randomly from the dataset. Also, checks that the labels of the current sample and the drawn samples don’t overlap.
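A minimal sketch of the sampling loop, under the assumption that DiagnosticDataset supports len() and integer indexing and that DataSample carries answers and image attributes (both names hypothetical):

    import random
    from copy import deepcopy

    def draw_nonoverlapping_images(current_sample, dataset, trials: int = 15):
        """Sketch: copy the current sample, swapping in images from random
        samples whose answer labels do not overlap with the current ones."""
        inputs = []
        while len(inputs) < trials:
            other = dataset[random.randrange(len(dataset))]
            # Reject draws whose labels overlap with the current sample
            if set(other.answers) & set(current_sample.answers):
                continue
            candidate = deepcopy(current_sample)
            candidate.image = other.image  # hypothetical attribute name
            inputs.append(candidate)
        return inputs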

vqa_benchmarking_backend.metrics.bias.inputs_for_image_bias_featurespace(current_sample: vqa_benchmarking_backend.datasets.dataset.DataSample, min_question_feat_val: torch.FloatTensor, max_question_feat_val: torch.FloatTensor, min_tokens: int, max_tokens: int, trials: int = 15) → List[vqa_benchmarking_backend.datasets.dataset.DataSample]

Creates inputs for measuring bias towards images by creating random question features.
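The construction mirrors the image-feature case above; a sketch with the same caveats (names hypothetical, DataSample wrapping omitted):

    import random
    import torch

    def random_question_features(min_question_feat_val: torch.FloatTensor,
                                 max_question_feat_val: torch.FloatTensor,
                                 min_tokens: int, max_tokens: int,
                                 trials: int = 15):
        """Sketch: per trial, draw a random token count and uniform random
        token features inside the per-dimension value range."""
        value_range = max_question_feat_val - min_question_feat_val
        feats = []
        for _ in range(trials):
            num_tokens = random.randint(min_tokens, max_tokens)
            rand = torch.rand(num_tokens, min_question_feat_val.size(0))
            feats.append(min_question_feat_val + rand * value_range)
        return feats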

vqa_benchmarking_backend.metrics.bias._extract_subjects_and_objects_from_text(text: str) → Tuple[Set[str], Set[str]]
vqa_benchmarking_backend.metrics.bias._questions_different(q_a: str, q_b: str) → bool

Simple comparison of the semantic equality of two questions. Tests whether the subjects and objects in the two questions are the same.
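Given the module-level nlp attribute, a plausible sketch uses a spaCy dependency parse; the model name and the dependency labels below are assumptions, not the module's exact rule:

    import spacy

    nlp = spacy.load("en_core_web_sm")  # assumed model behind the module's `nlp`

    def questions_different(q_a: str, q_b: str) -> bool:
        """Sketch: two questions count as different when their subject or
        object lemma sets differ."""
        def subjects_and_objects(text):
            doc = nlp(text)
            subjects = {t.lemma_ for t in doc if t.dep_ in ("nsubj", "nsubjpass")}
            objects = {t.lemma_ for t in doc if t.dep_ in ("dobj", "pobj")}
            return subjects, objects
        return subjects_and_objects(q_a) != subjects_and_objects(q_b)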

vqa_benchmarking_backend.metrics.bias.inputs_for_image_bias_wordspace(current_sample: vqa_benchmarking_backend.datasets.dataset.DataSample, dataset: vqa_benchmarking_backend.datasets.dataset.DiagnosticDataset, trials: int = 15) → List[vqa_benchmarking_backend.datasets.dataset.DataSample]

Creates inputs for measuring bias towards images by replacing the current sample’s question with questions drawn randomly from the dataset. Also, checks that the questions don’t overlap.
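A sketch analogous to the image-space case, taking the question comparison as a parameter (for example the helper sketched above); the question attribute name is an assumption:

    import random
    from copy import deepcopy

    def draw_different_questions(current_sample, dataset, questions_different,
                                 trials: int = 15):
        """Sketch: swap in questions from random samples that the supplied
        predicate judges semantically different from the current question."""
        inputs = []
        while len(inputs) < trials:
            other = dataset[random.randrange(len(dataset))]
            if not questions_different(current_sample.question, other.question):
                continue  # too similar, draw again
            candidate = deepcopy(current_sample)
            candidate.question = other.question  # hypothetical attribute name
            inputs.append(candidate)
        return inputs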

vqa_benchmarking_backend.metrics.bias.eval_bias(dataset: vqa_benchmarking_backend.datasets.dataset.DiagnosticDataset, original_class_prediction: str, predictions: torch.FloatTensor) → Tuple[Dict[int, float], float]

Evaluate predictions generated with inputs_for_question_bias_featurespace, inputs_for_question_bias_imagespace, inputs_for_image_bias_featurespace or inputs_for_image_bias_wordspace.

Args:

predictions (trials x answer space): Model predictions (probabilities)

Returns:
  • Mapping from best prediction class -> fraction of total predictions

  • Normalized bias score (float), where 0 means no bias and 1 means 100% bias
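One plausible reading of the scoring, sketched below: tally the argmax class per trial, and take the bias score as the fraction of randomized trials that still reproduce the original prediction. The class_names index-to-label mapping stands in for whatever the dataset provides; all of this is an assumption, not the module's exact computation:

    from collections import Counter
    from typing import Dict, List, Tuple
    import torch

    def eval_bias_sketch(original_class_prediction: str,
                         predictions: torch.FloatTensor,
                         class_names: List[str]) -> Tuple[Dict[int, float], float]:
        """Sketch: predictions has shape (trials, answer_space); class_names
        is a hypothetical index -> label mapping."""
        trials = predictions.size(0)
        best = predictions.argmax(dim=1).tolist()  # top class per trial
        fractions = {cls: n / trials for cls, n in Counter(best).items()}
        # Bias: how often randomized inputs leave the prediction unchanged
        bias = sum(class_names[cls] == original_class_prediction
                   for cls in best) / trials
        return fractions, bias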