vqa_benchmarking_backend.metrics.bias¶
Module Contents¶
Functions¶
| Function | Description |
| --- | --- |
| inputs_for_question_bias_featurespace | Creates inputs for measuring bias towards questions by creating random image features. |
| inputs_for_question_bias_imagespace | Creates inputs for measuring bias towards questions by replacing the current sample’s image with images drawn randomly from the dataset. |
| inputs_for_image_bias_featurespace | Creates inputs for measuring bias towards images by creating random question features. |
| _extract_subjects_and_objects_from_text | |
| _questions_different | Simple comparison for the semantic equality of 2 questions. |
| inputs_for_image_bias_wordspace | Creates inputs for measuring bias towards images by replacing the current sample’s question with questions drawn randomly from the dataset. |
| eval_bias | Evaluate predictions generated with inputs_for_question_bias_featurespace, inputs_for_question_bias_imagespace, inputs_for_image_bias_featurespace or inputs_for_image_bias_wordspace. |
Attributes¶
- vqa_benchmarking_backend.metrics.bias.nlp¶
- vqa_benchmarking_backend.metrics.bias.inputs_for_question_bias_featurespace(current_sample: vqa_benchmarking_backend.datasets.dataset.DataSample, min_img_feat_val: torch.FloatTensor, max_img_feat_val: torch.FloatTensor, min_img_feats: int = 10, max_img_feats: int = 100, trials: int = 15) List[vqa_benchmarking_backend.datasets.dataset.DataSample] ¶
Creates inputs for measuring bias towards questions by creating random image features.
- Args:
  min_img_feat_val (img_feat_dim): vector containing the minimum value per feature dimension
  max_img_feat_val (img_feat_dim): vector containing the maximum value per feature dimension
- Returns:
  trials x [min_img_feats..max_img_feats] x img_feat_dim Tensor of randomly generated feature inputs in the range [min_img_feat_val, max_img_feat_val].
  The number of drawn features (dim=1) is drawn randomly from [min_img_feats, max_img_feats].
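The random feature generation this probe relies on can be reproduced outside the library. The helper below is a minimal sketch under assumed shapes (one num_feats x img_feat_dim matrix per trial); the actual function additionally attaches these features to copies of current_sample and returns them as DataSample objects.

```python
import random
import torch

def random_image_features(min_img_feat_val: torch.FloatTensor,
                          max_img_feat_val: torch.FloatTensor,
                          min_img_feats: int = 10,
                          max_img_feats: int = 100,
                          trials: int = 15):
    """Sketch: draw `trials` random feature matrices with values in
    [min_img_feat_val, max_img_feat_val] per dimension."""
    img_feat_dim = min_img_feat_val.size(0)
    feats = []
    for _ in range(trials):
        # the number of image features (dim=1) is itself drawn at random
        num_feats = random.randint(min_img_feats, max_img_feats)
        # uniform noise, rescaled to the observed per-dimension value range
        noise = torch.rand(num_feats, img_feat_dim)
        feats.append(min_img_feat_val + noise * (max_img_feat_val - min_img_feat_val))
    return feats
```

inputs_for_image_bias_featurespace works analogously, drawing between min_tokens and max_tokens random question feature vectors instead of image features.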
- vqa_benchmarking_backend.metrics.bias.inputs_for_question_bias_imagespace(current_sample: vqa_benchmarking_backend.datasets.dataset.DataSample, dataset: vqa_benchmarking_backend.datasets.dataset.DiagnosticDataset, trials: int = 15) List[vqa_benchmarking_backend.datasets.dataset.DataSample] ¶
Creates inputs for measuring bias towards questions by replacing the current sample’s image with images drawn randomly from the dataset. It also checks that the labels of the current sample and the drawn samples do not overlap.
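A rough sketch of the sampling logic, assuming DataSample exposes the answer labels and image features under the attribute names used below (these names are illustrative stand-ins, not the library’s API):

```python
import copy
import random

def replace_image_with_random(current_sample, dataset, trials: int = 15, max_attempts: int = 1000):
    """Sketch: copy the current sample `trials` times, each time swapping in the image
    of a randomly drawn sample whose ground-truth labels do not overlap."""
    probes, attempts = [], 0
    while len(probes) < trials and attempts < max_attempts:
        attempts += 1
        other = dataset[random.randrange(len(dataset))]
        # reject candidates that share any answer label with the current sample
        if set(other.answers) & set(current_sample.answers):
            continue
        probe = copy.deepcopy(current_sample)
        probe.image = other.image            # assumed attribute names
        probe.image_feats = other.image_feats
        probes.append(probe)
    return probes
```

inputs_for_image_bias_wordspace mirrors this: it keeps the image and swaps in randomly drawn questions, rejecting candidates that are not semantically different from the original question.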
- vqa_benchmarking_backend.metrics.bias.inputs_for_image_bias_featurespace(current_sample: vqa_benchmarking_backend.datasets.dataset.DataSample, min_question_feat_val: torch.FloatTensor, max_question_feat_val: torch.FloatTensor, min_tokens: int, max_tokens: int, trials: int = 15) List[vqa_benchmarking_backend.datasets.dataset.DataSample] ¶
Creates inputs for measuring bias towards images by creating random question features.
- vqa_benchmarking_backend.metrics.bias._extract_subjects_and_objects_from_text(text: str) Tuple[Set[str], Set[str]] ¶
- vqa_benchmarking_backend.metrics.bias._questions_different(q_a: str, q_b: str) bool ¶
Simple comparison of the semantic equality of 2 questions: tests whether the subjects and objects in the two questions are the same.
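The module-level nlp attribute suggests a spaCy pipeline is used for this comparison. The sketch below shows one plausible implementation; the specific model and dependency labels are assumptions, not the library’s exact choices.

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed model; the module's `nlp` attribute plays this role

def extract_subjects_and_objects(text: str):
    """Sketch of _extract_subjects_and_objects_from_text: collect subject and object lemmas."""
    doc = nlp(text)
    subjects = {tok.lemma_.lower() for tok in doc if tok.dep_ in ("nsubj", "nsubjpass")}
    objects = {tok.lemma_.lower() for tok in doc if tok.dep_ in ("dobj", "pobj", "attr")}
    return subjects, objects

def questions_different(q_a: str, q_b: str) -> bool:
    """Sketch of _questions_different: questions count as different
    if their subject or object sets differ."""
    subj_a, obj_a = extract_subjects_and_objects(q_a)
    subj_b, obj_b = extract_subjects_and_objects(q_b)
    return subj_a != subj_b or obj_a != obj_b
```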
- vqa_benchmarking_backend.metrics.bias.inputs_for_image_bias_wordspace(current_sample: vqa_benchmarking_backend.datasets.dataset.DataSample, dataset: vqa_benchmarking_backend.datasets.dataset.DiagnosticDataset, trials: int = 15) List[vqa_benchmarking_backend.datasets.dataset.DataSample] ¶
Creates inputs for measuring bias towards images by replacing the current sample’s question with questions drawn randomly from the dataset. It also checks that the questions do not overlap.
- vqa_benchmarking_backend.metrics.bias.eval_bias(dataset: vqa_benchmarking_backend.datasets.dataset.DiagnosticDataset, original_class_prediction: str, predictions: torch.FloatTensor) Tuple[Dict[int, float], float] ¶
- Evaluate predictions generated with inputs_for_question_bias_featurespace, inputs_for_question_bias_imagespace, inputs_for_image_bias_featurespace or inputs_for_image_bias_wordspace.
- Args:
  predictions (trials x answer space): model predictions (probabilities)
- Returns:
  Mapping from best prediction class -> fraction of total predictions
  Normalized bias score (float), where 0 means no bias and 1 means 100% bias
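Concretely, the evaluation amounts to counting which answer class wins each trial. The sketch below assumes the bias score is the fraction of trials that still yield original_class_prediction despite the randomized input, which is consistent with the return description above; answer_vocab is a hypothetical index-to-answer mapping taken from the dataset.

```python
from collections import Counter
from typing import Dict, Tuple
import torch

def eval_bias_sketch(answer_vocab,
                     original_class_prediction: str,
                     predictions: torch.FloatTensor) -> Tuple[Dict[int, float], float]:
    """Sketch: predictions has shape (trials, answer_space)."""
    trials = predictions.size(0)
    best_classes = predictions.argmax(dim=1).tolist()  # winning class per trial
    fractions = {cls: count / trials for cls, count in Counter(best_classes).items()}
    # bias: how often the original answer survives the randomized input
    biased = sum(1 for cls in best_classes if answer_vocab[cls] == original_class_prediction)
    return fractions, biased / trials
```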