vqa_benchmarking_backend.metrics.sear

Module Contents

Functions

_apply_SEAR_1(question_postagged: List[Tuple[str, str]])

SEAR 1: WP VBZ -> WP’s

_apply_SEAR_2(question_postagged: List[Tuple[str, str]])

SEAR 2: What NOUN -> Which NOUN

_apply_SEAR_3(question_tokenized: List[str])

SEAR 3: color -> colour

_apply_SEAR_4(question_postagged: List[Tuple[str, str]])

SEAR 4: ADV VBZ -> ADV’s

inputs_for_question_sears(current_sample: vqa_benchmarking_backend.datasets.dataset.DataSample) → Tuple[Union[vqa_benchmarking_backend.datasets.dataset.DataSample, None], Union[vqa_benchmarking_backend.datasets.dataset.DataSample, None], Union[vqa_benchmarking_backend.datasets.dataset.DataSample, None], Union[vqa_benchmarking_backend.datasets.dataset.DataSample, None]]

Creates inputs where semantically equivalent changes (SEARs) are applied to the input question

eval_sears(dataset: vqa_benchmarking_backend.datasets.dataset.DiagnosticDataset, sear_inputs: Tuple[Union[vqa_benchmarking_backend.datasets.dataset.DataSample, None], Union[vqa_benchmarking_backend.datasets.dataset.DataSample, None], Union[vqa_benchmarking_backend.datasets.dataset.DataSample, None], Union[vqa_benchmarking_backend.datasets.dataset.DataSample, None]], sear_predictions: Tuple[Union[torch.FloatTensor, None]], original_class_prediction: str) → Dict[str, dict]

Evaluate predictions generated with inputs_for_question_sears.

vqa_benchmarking_backend.metrics.sear._apply_SEAR_1(question_postagged: List[Tuple[str, str]])

SEAR 1: WP VBZ -> WP’s

vqa_benchmarking_backend.metrics.sear._apply_SEAR_2(question_postagged: List[Tuple[str, str]])

SEAR 2: What NOUN -> Which NOUN

vqa_benchmarking_backend.metrics.sear._apply_SEAR_3(question_tokenized: List[str])

SEAR 3: color -> colour

vqa_benchmarking_backend.metrics.sear._apply_SEAR_4(question_postagged: List[Tuple[str, str]])

SEAR 4: ADV VBZ -> ADV’s
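The four `_apply_SEAR_*` helpers are only documented by their one-line rewrite rules above. A minimal, self-contained sketch of how such rules could operate on (token, POS-tag) pairs follows; it assumes Penn Treebank tags (`WP`, `VBZ`, `NN*`, `RB`/`WRB` for the "ADV" rule) and the function names and details here are illustrative, not the library's actual implementation:

```python
from typing import List, Optional, Tuple

Tagged = List[Tuple[str, str]]  # (token, POS tag) pairs

def apply_sear_1(tagged: Tagged) -> Optional[List[str]]:
    """SEAR 1: WP VBZ -> WP's (e.g. 'What is' -> 'What's')."""
    for i in range(len(tagged) - 1):
        (tok, tag), (_, nxt_tag) = tagged[i], tagged[i + 1]
        if tag == 'WP' and nxt_tag == 'VBZ':
            tokens = [t for t, _ in tagged]
            return tokens[:i] + [tok + "'s"] + tokens[i + 2:]
    return None  # rule not applicable -> mirrors the None tuple entries

def apply_sear_2(tagged: Tagged) -> Optional[List[str]]:
    """SEAR 2: What NOUN -> Which NOUN."""
    for i in range(len(tagged) - 1):
        (tok, _), (_, nxt_tag) = tagged[i], tagged[i + 1]
        if tok.lower() == 'what' and nxt_tag.startswith('NN'):
            tokens = [t for t, _ in tagged]
            tokens[i] = 'Which' if tok[0].isupper() else 'which'
            return tokens
    return None

def apply_sear_3(tokens: List[str]) -> Optional[List[str]]:
    """SEAR 3: color -> colour (operates on plain tokens)."""
    if any(t.lower() == 'color' for t in tokens):
        return [t.replace('color', 'colour').replace('Color', 'Colour')
                for t in tokens]
    return None

def apply_sear_4(tagged: Tagged) -> Optional[List[str]]:
    """SEAR 4: ADV VBZ -> ADV's (e.g. 'Where is' -> 'Where's')."""
    for i in range(len(tagged) - 1):
        (tok, tag), (_, nxt_tag) = tagged[i], tagged[i + 1]
        if tag in ('RB', 'WRB') and nxt_tag == 'VBZ':
            tokens = [t for t, _ in tagged]
            return tokens[:i] + [tok + "'s"] + tokens[i + 2:]
    return None
```

Each helper returns None when its rule does not match, which is what allows the downstream tuple to carry None for inapplicable SEARs.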

vqa_benchmarking_backend.metrics.sear.inputs_for_question_sears(current_sample: vqa_benchmarking_backend.datasets.dataset.DataSample) → Tuple[Union[vqa_benchmarking_backend.datasets.dataset.DataSample, None], Union[vqa_benchmarking_backend.datasets.dataset.DataSample, None], Union[vqa_benchmarking_backend.datasets.dataset.DataSample, None], Union[vqa_benchmarking_backend.datasets.dataset.DataSample, None]]

Creates inputs where semantically equivalent changes (SEARs) are applied to the input question

Returns:

A tuple with 4 entries, each either of type DataSample or None. The 1st entry corresponds to SEAR 1, the 2nd entry to SEAR 2, and so on. Note: if the value at position i in the tuple is None, SEAR i was not applicable.
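A typical consumer of this 4-tuple skips the None entries while keeping track of which SEAR each position corresponds to. A small illustrative helper (not part of the library; plain strings stand in for DataSample objects):

```python
def applicable_sears(sear_samples):
    """Return (1-based SEAR index, sample) pairs for the rules that applied.

    sear_samples is the 4-tuple returned by inputs_for_question_sears,
    where None at position i means SEAR i was not applicable.
    """
    return [(i, s) for i, s in enumerate(sear_samples, start=1)
            if s is not None]
```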

vqa_benchmarking_backend.metrics.sear.eval_sears(dataset: vqa_benchmarking_backend.datasets.dataset.DiagnosticDataset, sear_inputs: Tuple[Union[vqa_benchmarking_backend.datasets.dataset.DataSample, None], Union[vqa_benchmarking_backend.datasets.dataset.DataSample, None], Union[vqa_benchmarking_backend.datasets.dataset.DataSample, None], Union[vqa_benchmarking_backend.datasets.dataset.DataSample, None]], sear_predictions: Tuple[Union[torch.FloatTensor, None]], original_class_prediction: str) → Dict[str, dict]

Evaluate predictions generated with inputs_for_question_sears.

Args:

sear_inputs: the 4 outputs generated by inputs_for_question_sears

sear_predictions: List[(1 x answer space)] of length 4: model predictions (probabilities) for the SEAR questions, or None where the corresponding SEAR input was None

Returns:

A dictionary with information per SEAR, e.g.

sear_4: {
    'predicted_class': 10, 'flipped': False, 'applied': True
}
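Given a result dictionary of this shape, a common summary statistic is the flip rate over the SEARs that actually applied. A hedged sketch (the `sear_flip_rate` helper is hypothetical, not part of the library; it assumes each per-SEAR entry carries the 'flipped' and 'applied' keys shown above):

```python
def sear_flip_rate(sear_results: dict) -> float:
    """Fraction of applied SEARs whose predicted class flipped.

    Lower values indicate a model more robust to semantically
    equivalent rephrasings of the question.
    """
    applied = [r for r in sear_results.values() if r['applied']]
    if not applied:
        return 0.0  # no SEAR applied to this question
    return sum(r['flipped'] for r in applied) / len(applied)

# Example result in the format returned by eval_sears:
results = {
    'sear_1': {'predicted_class': 10, 'flipped': False, 'applied': True},
    'sear_2': {'predicted_class': None, 'flipped': False, 'applied': False},
    'sear_4': {'predicted_class': 3, 'flipped': True, 'applied': True},
}
```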