.. _Evaluate Metrics:

Evaluate Metrics
================

To start the evaluation on a given list of metrics, you need to instantiate a dataset inheriting from our ``DiagnosticDataset``. The calculation starts by calling ``calculate_metrics``, passing the model adapter, dataset, output directory and number of trials as parameters. The parameter ``trials`` refers to the number of Monte Carlo trials that are performed and averaged for the respective metrics.

The following code block contains an example of what such a script could look like.

.. code-block:: python

   from vqa_benchmarking_backend.datasets.GQADataset import GQADataset  # or import your own dataset
   from vqa_benchmarking_backend.metrics.metrics import calculate_metrics

   # Set the output directory for the results. This should match the directory
   # you are supplying to the webserver in webapp/server.py.
   output_dir = '/path/to/my/output/dir'

   # Directories containing the data
   qsts_path = 'path/to/GQA/questions.json'
   img_dir = 'path/to/GQA/images/'

   # File that contains a dictionary mapping from answer index to answer text: {idx: ans_str}
   idx2ans = load_idx_mapping()

   # Instantiate the dataset using the data directories and the index/answer mapping
   dataset = GQADataset(question_file=qsts_path, img_dir=img_dir, img_feat_dir='', idx2ans=idx2ans, name='GQA')

   # Define a list with all metrics the model should be tested on. Remove entries as needed.
   metrics = ['accuracy',
              'question_bias_imagespace',
              'image_bias_wordspace',
              'image_robustness_imagespace',
              'image_robustness_featurespace',
              'question_robustness_featurespace',
              'sears',
              'uncertainty']

   # Run the metrics calculation. Once finished, start the webserver at webapp/server.py
   # and the vue.js app using 'npm start' in the webapp/ folder, then inspect the results
   # in your web browser.
   calculate_metrics(adapter=model_adapter, dataset=dataset, output_path=output_dir, metrics=metrics, trials=7)
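
The script above uses two names it does not define: ``model_adapter``, which must be an instance of the adapter interface wrapping your VQA model (see the adapter documentation for how to implement one), and the helper ``load_idx_mapping``. As a minimal sketch of the latter, assuming the index-to-answer mapping is stored as a JSON file (the file path and the JSON format are assumptions, not part of the library):

.. code-block:: python

   import json

   def load_idx_mapping(path: str = 'path/to/GQA/idx2ans.json') -> dict:
       """Hypothetical helper: load the {idx: ans_str} mapping from a JSON file."""
       with open(path, 'r') as f:
           raw = json.load(f)
       # JSON object keys are always strings; convert them back to integer indices
       return {int(idx): ans for idx, ans in raw.items()}

How the mapping is produced is up to you; any code that yields a ``{int: str}`` dictionary covering your model's answer vocabulary works here.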