Visual Question Answering

The Problem Statement:

Predict the answer to an open-ended question about a given image.
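A common way to set this up (assumed here, similar in spirit to the "Simple Baseline for Visual Question Answering" entry in the reference models) treats answering as classification over a fixed list of candidate answers. Each candidate is scored from a bag-of-words question vector concatenated with a CNN image feature. The sketch below is illustrative only: the vocabulary, the weights `W` and `b`, the answer list and the 4-dimensional image feature are hypothetical placeholders, and in a real model `W` and `b` are learned.

```python
import numpy as np

def bow_vector(question, vocab):
    """Bag-of-words count vector for a question; words missing from vocab are ignored."""
    v = np.zeros(len(vocab))
    for tok in question.lower().replace("?", " ").split():
        if tok in vocab:
            v[vocab[tok]] += 1
    return v

def predict_answer(question, image_feat, vocab, W, b, answers):
    """Concatenate question and image features, score each candidate answer, return the best one."""
    x = np.concatenate([bow_vector(question, vocab), image_feat])
    scores = W @ x + b  # one score per entry of `answers`
    return answers[int(np.argmax(scores))]

# Toy example with hand-set (hypothetical) weights instead of learned ones
vocab = {"is": 0, "there": 1, "a": 2, "dog": 3, "how": 4, "many": 5}
answers = ["yes", "no", "2"]
image_feat = np.zeros(4)  # stand-in for a CNN feature vector
W = np.zeros((len(answers), len(vocab) + len(image_feat)))
W[0, vocab["dog"]] = 1.0  # "dog" pushes toward "yes"
W[2, vocab["how"]] = 5.0  # "how ..." pushes toward a count
b = np.zeros(len(answers))
print(predict_answer("Is there a dog?", image_feat, vocab, W, b, answers))  # yes
print(predict_answer("How many dogs?", image_feat, vocab, W, b, answers))   # 2
```

Restricting the output to a closed answer list turns an open-ended problem into ordinary multi-class classification, which is why a softmax over answers is such a common final layer.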

VQA Library and Setup:

  1. Torch setup
  2. Keras (Theano as backend)
  3. Keras (TensorFlow as backend)

Reference Models:

1. neural-vqa


2. Deeper LSTM + normalized CNN for Visual Question Answering


3. Hierarchical Question-Image Co-Attention for Visual Question Answering


4. Simple Baseline for Visual Question Answering


5. Visual7W QA Models


6. VQA Demo


7. Deep Learning for Visual Question Answering


Issue List:

  1. Implementation Issue

List of References:

  • L. Ma, Z. Lu, and H. Li, "Learning to Answer Questions From Image Using Convolutional Neural Network," CoRR abs/1506.00333, Nov. 2015.
  • H. Gao, J. Mao, J. Zhou, Z. Huang, L. Wang, and W. Xu, "Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering," arXiv 1505.05612v3, Nov. 2015.
  • M. Ren, R. Kiros, and R. S. Zemel, "Exploring Models and Data for Image Question Answering," arXiv 1505.02074, 2015.
  • M. Malinowski, M. Rohrbach, and M. Fritz, "Ask Your Neurons: A Neural-Based Approach to Answering Questions About Images," arXiv 1505.01121, Nov. 2015.

Useful Links:

1. Memory Networks for Language Understanding, ICML Tutorial 2016


2. End-To-End Memory Networks for Question Answering


3. Implementing Dynamic Memory Networks











Deep learning


Image and word attention



Compositional Semantic Parsing on Semi-Structured Tables

PDF: 1508.00305v1.pdf

A Deep Architecture for Semantic Parsing

PDF: 1404.7296v1.pdf

Question Answering over Knowledge Base with Neural Attention Combining Global Knowledge Information

PDF: 1606.00979v1.pdf

Recurrent Neural Network Encoder with Attention for Community Question Answering

PDF: 1603.07044v1.pdf



Hierarchical Attention Networks

PDF: 1606.02393v1.pdf

Diversified Visual Attention Networks for Fine-Grained Object Classification

PDF: 1606.08572v1.pdf



Simple Baseline for Visual Question Answering

PDF: 1512.02167v2.pdf


Towards Transparent AI Systems: Interpreting Visual Question Answering Models

PDF: 13_Goyal_SUNw.pdf

Visual Question Answering Literature Survey



Attention Mechanism


Good one for VQA

Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering

A Focused Dynamic Attention Model for Visual Question Answering

Visual7W: Grounded Question Answering in Images

Stacked Attention Networks for Image Question Answering

Where To Look: Focus Regions for Visual Question Answering

Revisiting Visual Question Answering Baselines

Simple Baseline for Visual Question Answering

PDF: zhu2016cvpr.pdf

Image Question Answering: A Visual Semantic Embedding Model and a New Dataset

Highway Networks for Visual Question Answering

Neural Self Talk: Image Understanding via Continuous Questioning and Answering


Role of Attention for Visual Question Answering Submitted By: Harsh Agrawal (harsh92)


Nice one

Compositional Memory for Visual Question Answering

PDF: 1412.7755v2.pdf


Good links


The visual question answering loss function

PDF: 1606.03647.pdf

PDF: report.pdf

PDF: 1511.05676v1.pdf
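The notes do not say which loss they have in mind. A common choice, assumed here, treats VQA as classification over a fixed set of frequent answers and trains with softmax cross-entropy. The function names below are hypothetical. The soft-target variant covers datasets where several annotators answer each question, with the target being the fraction of annotators who gave each answer.

```python
import numpy as np

def log_softmax(scores):
    """Row-wise log-softmax; subtracting the row max keeps exp() from overflowing."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def vqa_cross_entropy(scores, answer_ids):
    """Mean negative log-likelihood of the correct answer for each (image, question) pair.

    scores: (batch, num_answers) unnormalized answer scores; answer_ids: (batch,) integer labels.
    """
    logp = log_softmax(scores)
    return -logp[np.arange(len(answer_ids)), answer_ids].mean()

def vqa_soft_cross_entropy(scores, targets):
    """Same loss with a target distribution per question (each row of `targets` sums to 1)."""
    return -(targets * log_softmax(scores)).sum(axis=1).mean()

# Uniform scores over 4 answers give a loss of log(4), about 1.386
print(vqa_cross_entropy(np.zeros((2, 4)), np.array([0, 3])))
```

With one-hot targets the soft version reduces exactly to the hard-label loss, so the two can share one training loop.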


PDF: shuhui.pdf



Getting Started with Word2Vec and GloVe

Dive Into NLTK, Part I: Getting Started with NLTK


Torch7. Hello World, Neural Networks!


Learning Resources for NLP, Sentiment Analysis, and Deep Learning




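require 'nn'
require 'nngraph'

-- assumed vocabulary size; set this to the size of your own vocabulary
local vocab_size = 10000

-- two graph inputs: indices of the center word and of an outer (context) word
local word_center = nn.Identity()()
local word_outer = nn.Identity()()

-- embed each word into 100 dims (nn.LookupTable is Torch's embedding layer), then project to 50 dims
local x_center_ = nn.LookupTable(vocab_size, 100)(word_center)
local x_center = nn.Linear(100, 50)(x_center_)
x_center = nn.Tanh()(x_center)

local x_outer_ = nn.LookupTable(vocab_size, 100)(word_outer)
local x_outer = nn.Linear(100, 50)(x_outer_)
x_outer = nn.Tanh()(x_outer)

-- z = sum((x_outer - x_center)^2): squared Euclidean distance per example
local x_center_minus = nn.MulConstant(-1)(x_center)
local z = nn.CAddTable()({x_outer, x_center_minus})
z = nn.Power(2)(z)
z = nn.Sum(2)(z)

local m = nn.gModule({word_center, word_outer}, {z, x_outer_, x_center_})

-- example: a batch of two (center, outer) index pairs; out[1] holds the two distances
local out = m:forward({torch.LongTensor{1, 2}, torch.LongTensor{3, 4}})
print(out[1])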
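To make the shapes in the Torch graph above easier to follow, here is a NumPy transcription of its forward pass. It is a sketch only: the vocabulary size and the randomly initialized parameters are hypothetical, and nothing here is trained. Each branch has its own embedding table and tanh projection, and the first output is the squared distance between the two 50-dimensional projections.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, emb_dim, hid_dim = 1000, 100, 50  # hypothetical vocab size; 100 and 50 match the Torch layers

# Separate embedding table and projection for each branch, mirroring the two embedding/Linear pairs
E_center = rng.normal(size=(vocab_size, emb_dim))
E_outer = rng.normal(size=(vocab_size, emb_dim))
W_center = rng.normal(scale=0.1, size=(emb_dim, hid_dim))
W_outer = rng.normal(scale=0.1, size=(emb_dim, hid_dim))
b_center = np.zeros(hid_dim)
b_outer = np.zeros(hid_dim)

def forward(center_ids, outer_ids):
    """Return (z, x_outer_, x_center_), like the gModule's three outputs."""
    x_center_ = E_center[center_ids]                     # (batch, 100) embedding lookup
    x_outer_ = E_outer[outer_ids]
    x_center = np.tanh(x_center_ @ W_center + b_center)  # (batch, 50)
    x_outer = np.tanh(x_outer_ @ W_outer + b_outer)
    z = ((x_outer - x_center) ** 2).sum(axis=1)          # squared Euclidean distance per pair
    return z, x_outer_, x_center_

z, x_outer_, x_center_ = forward(np.array([1, 2]), np.array([3, 4]))
print(z.shape, x_outer_.shape)  # (2,) (2, 100)
```

Because both projections pass through tanh, every coordinate difference lies in [-2, 2], so each distance is bounded by 4 × 50 = 200.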


How A.I. will help kids on the Autism spectrum find employment




How to avoid Over-fitting using Regularization?

PDF: 2010Overfitting_0416.pdf


Important: overfitting

L2 regularization
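A minimal sketch of L2 regularization, assuming a plain linear model trained by gradient descent (the helper names `ridge_loss_and_grad` and `fit` are hypothetical). The penalty (lam/2)·||w||² adds lam·w to the gradient, so every step shrinks the weights toward zero. This is why L2 regularization is also called weight decay.

```python
import numpy as np

def ridge_loss_and_grad(w, X, y, lam):
    """Loss = mean squared error / 2 + (lam / 2) * ||w||^2 for the linear model X @ w."""
    n = X.shape[0]
    residual = X @ w - y
    loss = 0.5 * np.mean(residual ** 2) + 0.5 * lam * np.sum(w ** 2)
    grad = X.T @ residual / n + lam * w  # lam * w is the gradient of the L2 penalty
    return loss, grad

def fit(X, y, lam, lr=0.1, steps=500):
    """Plain gradient descent; a larger lam pulls the weights toward zero."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        _, grad = ridge_loss_and_grad(w, X, y, lam)
        w -= lr * grad
    return w
```

For this model, gradient descent converges to the closed-form ridge solution w = (XᵀX/n + lam·I)⁻¹ Xᵀy/n. Increasing `lam` gives a smaller weight norm, trading a little training error for less sensitivity to noise in the data.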




batch size





PDF: DufourNick.pdf

PDF: hyhieu_final.pdf






Movie QA


Jointly Modeling Embedding and Translation to Bridge Video and Language

PDF: 1505.01861.pdf

Sequence to Sequence – Video to Text

PDF: 1505.00487.pdf

Uncovering Temporal Context for Video Question and Answering


Two-Stream Convolutional Networks for Action Recognition in Videos

Beyond Short Snippets: Deep Networks for Video Classification

Learning Common Sense Through Visual Abstraction

Action Recognition with Trajectory-Pooled Deep-Convolutional Descriptors





Describing Videos by Exploiting Temporal Structure


Translating Videos to Natural Language Using Deep Recurrent Neural Networks



PDF: 1505.00487v3.pdf

PDF: IVU_Convolutional_Networks_and_Video_Representations.pdf








Attention In VQA

Attention Mechanism

PDF: 1511.02793v2.pdf
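A minimal sketch of question-guided soft attention over image regions, modeled loosely on a single attention layer of Stacked Attention Networks (listed under "Good one for VQA"). The projected question is added to every projected region feature, a tanh layer scores each region, and a softmax turns the scores into weights. The attended image vector is the weighted sum of region features. All parameter names and shapes are assumptions for illustration, not any paper's exact code.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # shift for numerical stability
    return e / e.sum()

def attend(V, q, W_I, W_Q, b_A, w_P, b_P):
    """One question-guided attention layer over image regions.

    V: (m, d) features of m image regions; q: (d,) question vector.
    W_I, W_Q: (k, d) projections; b_A: (k,); w_P: (k,); b_P: scalar.
    Returns region weights p (m,), attended image vector (d,), refined query u (d,).
    """
    h = np.tanh(V @ W_I.T + (W_Q @ q + b_A))  # (m, k): projected question added to every region
    p = softmax(h @ w_P + b_P)                # (m,): one attention weight per region
    v_att = p @ V                             # (d,): weighted sum of region features
    u = v_att + q                             # refined query; stacking calls attend() again with u as q
    return p, v_att, u
```

The weights `p` are what attention visualizations display. Regions relevant to the question should receive most of the mass, and stacking a second layer lets the refined query `u` sharpen that focus.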


LSTM hyperparameters

PDF: LiuSingh.pdf







GitHub VQA link





PDF: Yang.pdf




Good Paper

PDF: cvpr2014-deepvideo-rahuls.pdf







PDF: CS229.pdf





For Video




For DVS:




(gedit:8803): WARNING **: Couldn’t connect to accessibility bus: Failed to connect to socket /tmp/dbus-WjKgPvfxFu: Connection refused
To silence this warning, disable the accessibility bridge in the shell before launching gedit:

export NO_AT_BRIDGE=1


If the output shows a process marked “defunct”, that process is a zombie: it has already finished (or been killed), but its parent process has not yet collected its exit status with wait(). Because the process is already dead, kill -9 PID has no effect on it, and the entry keeps showing up until the parent reaps it.

Instead, find the parent of the defunct process and kill (or restart) that parent. The orphaned zombie is then adopted by init, which reaps it. To find the parent, run:

ps -ef | grep defunct


1000 637 27872 0 Oct12 ? 00:00:04 [chrome] <defunct>

1000 1808 1777 0 Oct04 ? 00:00:00 [zeitgeist-datah] <defunct>
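In this ps -ef output the second column is the PID and the third is the parent PID (PPID). The zombie 637 cannot be killed itself, so kill its parent with kill -9 27872, then verify that the defunct process is gone: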
ps -ef | grep defunct

ps -xal | grep defunct

ps -u
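To see why kill -9 cannot remove a zombie, here is a minimal Linux-only Python sketch (an illustration, not from the original notes; `proc_state` and `zombie_demo` are hypothetical helpers). The forked child exits immediately and sits in state `Z` (what ps shows as defunct) while the parent has not waited on it. It disappears only when the parent calls `os.waitpid`. It reads `/proc/<pid>/stat`, so it assumes a Linux `/proc` filesystem.

```python
import os
import time

def proc_state(pid):
    """Return the one-letter state of `pid` from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # Format is "pid (comm) state ..."; comm may contain spaces, so split after the last ')'
    return data.rsplit(")", 1)[1].split()[0]

def zombie_demo(timeout=2.0):
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately without running any parent-side code
    # parent: poll until the child has exited but has not been reaped yet
    deadline = time.monotonic() + timeout
    state = proc_state(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = proc_state(pid)
    still_listed = os.path.exists(f"/proc/{pid}")  # a zombie still occupies a process-table entry
    os.waitpid(pid, 0)  # reaping is the parent's job; no signal can do it
    gone = not os.path.exists(f"/proc/{pid}")
    return state, still_listed, gone
```

On a typical Linux system `zombie_demo()` reports state `Z` with the entry still present, and the entry is gone after `waitpid`. Killing the parent has the same effect indirectly, because init then adopts and reaps the child.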


  1. First find the process ID of Firefox (this works from any directory):
    pidof firefox
  2. Kill the Firefox process using that ID:
    kill [firefox pid]

If the program is not responding at all, the quickest option is to send SIGKILL to every process with that name:

killall -9 firefox
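Plain kill sends SIGTERM, which a program can catch to shut down cleanly or can ignore. kill -9 sends SIGKILL, which cannot be caught or ignored. The usual pattern is to try SIGTERM first and fall back to SIGKILL only if the process does not exit. A minimal Python sketch of that escalation for a process started with `subprocess` follows. The helper name `stop` and the 5-second default timeout are assumptions.

```python
import signal
import subprocess

def stop(proc, timeout=5.0):
    """Stop a child process: SIGTERM first (what plain `kill` sends), SIGKILL if it does not exit in time.

    Returns the exit status; on POSIX a negative value -N means the process was ended by signal N.
    """
    proc.send_signal(signal.SIGTERM)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL on POSIX, like `kill -9`
        return proc.wait()

# Example: `sleep` does not handle SIGTERM, so the default action ends it
p = subprocess.Popen(["sleep", "60"])
print(stop(p))  # -15 on Linux (terminated by SIGTERM)
```

Reserving SIGKILL for the fallback gives well-behaved programs like Firefox a chance to save state before exiting.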








