Visual Question Answering

The Problem Statement:

Predict the answer to an open-ended question about a given image.

VQA Library and Setup:

  1. Torch setup
  2. Keras (Theano as backend)
  3. Keras (TensorFlow as backend)

Reference Models:

1. neural-vqa

link: https://github.com/abhshkdz/neural-vqa

2. Deeper LSTM + normalized CNN for Visual Question Answering

link: https://github.com/VT-vision-lab/VQA_LSTM_CNN

3. Hierarchical Question-Image Co-Attention for Visual Question Answering

link: https://github.com/jiasenlu/HieCoAttenVQA

4. Simple Baseline for Visual Question Answering

link: https://github.com/metalbubble/VQAbaseline

5. Visual7W QA Models

link: https://github.com/yukezhu/visual7w-qa-models

6. VQA Demo

link: http://iamaaditya.github.io/2016/04/visual_question_answering_demo_notebook

7. Deep Learning for Visual Question Answering

link: https://github.com/avisingh599/visual-qa

Issue List:

  1. Implementation Issue

List of References:

  • L. Ma, Z. Lu, and H. Li, “Learning to Answer Questions From Image using Convolutional Neural Network”, CoRR abs/1506.00333, Nov. 2015.
  • H. Gao, J. Mao, J. Zhou, Z. Huang, L. Wang, and W. Xu, “Are you talking to a machine? Dataset and methods for multilingual image question answering”, arXiv 1505.05612v3, Nov. 2015.
  • M. Ren, R. Kiros, and R. S. Zemel, “Exploring models and data for image question answering”, arXiv 1505.02074, 2015.
  • M. Malinowski, M. Rohrbach, and M. Fritz, “Ask your neurons: A neural-based approach to answering questions about images”, arXiv 1505.01121, Nov. 2015.

Useful Links:

1. Memory Networks for Language Understanding, ICML Tutorial 2016

link: http://www.thespermwhale.com/jaseweston/icml2016/

2. End-To-End Memory Networks for Question Answering

link: https://github.com/vinhkhuc/MemN2N-babi-python

3. Implementing Dynamic Memory Networks

link: https://yerevann.github.io/2016/02/05/implementing-dynamic-memory-networks/

 

 

MISC

http://www.arxiv-sanity.com/1606.03556

http://www.arxiv-sanity.com/1601.01705

http://www.arxiv-sanity.com/1606.02393

 

 

Attention

 

 

 

Deep learning

http://colah.github.io/

 

Image and word attention

http://yanran.li/peppypapers/2015/12/11/nips-2015-deep-learning-symposium-part-i.html

 

 

Compositional Semantic Parsing on Semi-Structured Tables

PDF: 1508.00305v1.pdf

A Deep Architecture for Semantic Parsing

PDF: 1404.7296v1.pdf

Question Answering over Knowledge Base with Neural Attention Combining Global Knowledge Information

PDF: 1606.00979v1.pdf

Recurrent Neural Network Encoder with Attention for Community Question Answering

PDF: 1603.07044v1.pdf

 

Image

Hierarchical Attention Networks

PDF: 1606.02393v1.pdf

Diversified Visual Attention Networks for Fine-Grained Object Classification

PDF: 1606.08572v1.pdf

 

VQA

Simple Baseline for Visual Question Answering

PDF: 1512.02167v2.pdf

Towards Transparent AI Systems: Interpreting Visual Question Answering Models

PDF: 13_Goyal_SUNw.pdf

 

https://www.semanticscholar.org/paper/Character-Level-Question-Answering-with-Attention-Golub-He/47170ca3d7faa8535229e1fa4766fce0ce30cab2

Visual Question Answering Literature Survey

http://iamaaditya.github.io/research/literature/

 

Attention

https://blog.heuritech.com/2016/01/20/attention-mechanism/

 

Good one for VQA

Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering

A Focused Dynamic Attention Model for Visual Question Answering

Visual7W: Grounded Question Answering in Images

Stacked Attention Networks for Image Question Answering

Where To Look: Focus Regions for Visual Question Answering

Revisiting Visual Question Answering Baselines

Simple Baseline for Visual Question Answering

PDF: zhu2016cvpr.pdf

Image Question Answering: A Visual Semantic Embedding Model and a New Dataset

Highway Networks for Visual Question Answering

Neural Self Talk: Image Understanding via Continuous Questioning and Answering

—————————————————————-

Role of Attention for Visual Question Answering (submitted by Harsh Agrawal, harsh92)

https://computing.ece.vt.edu/~harsh/visualAttention/ProjectWebpage/#approach

http://cjds.github.io/image%20recognition/machine%20learning/2016/05/02/Visual-Question-Generation/

————————————————————-

Nice one

Compositional Memory for Visual Question Answering

PDF: 1412.7755v2.pdf

——————————————————————

Good links

https://github.com/kjw0612/awesome-deep-vision

———————————————————————–

Loss functions for visual question answering

PDF: 1606.03647.pdf

PDF: report.pdf

http://people.cs.vt.edu/~bhuang/courses/pgmsp16/projects/mahendru-pgmsp16

PDF: 1511.05676v1.pdf

PDF: shuhui.pdf

https://www.google.co.in/search?q=visual+question+answer+loss+function

https://github.com/kundan2510/vqa_LSTM

—————————————————————-

NLTK

http://textminingonline.com/getting-started-with-word2vec-and-glove

http://textminingonline.com/dive-into-nltk-part-i-getting-started-with-nltk

—————————————————————

Torch7. Hello World, Neural Networks!

http://mdtux89.github.io/2015/12/11/torch-tutorial.html

———————————–

Learning Resources for NLP, Sentiment Analysis, and Deep Learning

https://github.com/Lab41/sunny-side-up/wiki/Learning-Resources-for-NLP,-Sentiment-Analysis,-and-Deep-Learning

———————————————————-

Misc

https://github.com/vivanov879/word2vec

 

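-- Needs the nn and nngraph packages. Embedding is not part of nn; it is presumably
-- the custom module from the repository above. vocab_size must be defined beforehand.
require 'nn'
require 'nngraph'

-- Graph inputs: index of the center word and of a context (outer) word
word_center = nn.Identity()()
word_outer = nn.Identity()()

-- Center branch: embedding lookup -> linear layer -> tanh
x_center_ = Embedding(vocab_size, 100)(word_center)
x_center = nn.Linear(100, 50)(x_center_)
x_center = nn.Tanh()(x_center)

-- Outer branch: same structure, with its own weights
x_outer_ = Embedding(vocab_size, 100)(word_outer)
x_outer = nn.Linear(100, 50)(x_outer_)
x_outer = nn.Tanh()(x_outer)

-- Squared Euclidean distance: (x_outer - x_center)^2 summed over dimension 2
x_center_minus = nn.MulConstant(-1)(x_center)

z = nn.CAddTable()({x_outer, x_center_minus})
z = nn.Power(2)(z)
z = nn.Sum(2)(z)

-- Outputs: the distance plus both raw embeddings
m = nn.gModule({word_center, word_outer}, {z, x_outer_, x_center_})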
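
For readers without Torch, here is a rough pure-Python sketch of what the graph above computes for a single (center, outer) word pair: each word index is looked up in its own embedding table, passed through its own linear layer and tanh, and z is the squared Euclidean distance between the two results. The toy sizes, the random initialisation and the helper names (make_branch, encode, pair_distance) are assumptions made for illustration; none of them come from the repository.

```python
import math
import random


def make_branch(vocab_size, embed_dim, hidden_dim, rng):
    """Create an embedding table plus linear-layer weights for one branch."""
    embedding = [[rng.uniform(-0.1, 0.1) for _ in range(embed_dim)]
                 for _ in range(vocab_size)]
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(embed_dim)]
              for _ in range(hidden_dim)]
    bias = [0.0] * hidden_dim
    return {"embedding": embedding, "weight": weight, "bias": bias}


def encode(branch, word_index):
    """Embedding lookup -> linear layer -> tanh, as in one branch of the graph."""
    e = branch["embedding"][word_index]
    return [math.tanh(sum(w * x for w, x in zip(row, e)) + b)
            for row, b in zip(branch["weight"], branch["bias"])]


def pair_distance(center_branch, outer_branch, center_idx, outer_idx):
    """Squared Euclidean distance between the two encoded words (the output z)."""
    x_center = encode(center_branch, center_idx)
    x_outer = encode(outer_branch, outer_idx)
    return sum((o - c) ** 2 for o, c in zip(x_outer, x_center))


rng = random.Random(0)
# Toy sizes (assumption); the Torch snippet uses 100-d embeddings and 50-d hidden vectors.
center_branch = make_branch(vocab_size=10, embed_dim=4, hidden_dim=3, rng=rng)
outer_branch = make_branch(vocab_size=10, embed_dim=4, hidden_dim=3, rng=rng)

z = pair_distance(center_branch, outer_branch, center_idx=2, outer_idx=5)
print(z)
```

Note that Python indexes words from 0 while Torch tensors are 1-based, and that this sketch handles one pair at a time rather than a batch.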

 

How A.I. will help kids on the Autism spectrum find employment

 

————————————————————–

Overfitting

https://www.quora.com/How-can-I-avoid-overfitting

http://stats.stackexchange.com/questions/9053/how-does-cross-validation-overcome-the-overfitting-problem

https://www.researchgate.net/post/How_to_Avoid_Overfitting

http://www.kdnuggets.com/2015/01/clever-methods-overfitting-avoid.html

How to avoid Over-fitting using Regularization?

PDF: 2010Overfitting_0416.pdf

 

Imp: Overfitting

http://cs231n.github.io/neural-networks-2/#reg

http://cs231n.github.io/neural-networks-1/

L2 regularization

 

https://siavashk.github.io/2016/03/10/l21-regularization/

https://gitter.im/torch/torch7/archives/2015/06/13

https://computing.ece.vt.edu/~harsh/

 

 

https://marcino239.github.io/

 

Optimization

http://cs231n.github.io/neural-networks-3/

https://github.com/torch/optim/blob/master/doc/algos.md

https://github.com/torch/optim/blob/master/sgd.lua

 

batch size

http://stats.stackexchange.com/questions/140811/how-large-should-the-batch-size-be-for-stochastic-gradient-descent

 

 

 

Imp

PDF: DufourNick.pdf

http://cs231n.stanford.edu/reports.html

PDF: hyhieu_final.pdf

 

 

 

 

 

Movie QA

http://movieqa.cs.toronto.edu/home/

 

Jointly Modeling Embedding and Translation to Bridge Video and Language

PDF: 1505.01861.pdf

Sequence to Sequence – Video to Text

PDF: 1505.00487.pdf

Uncovering Temporal Context for Video Question and Answering

 

Two-Stream Convolutional Networks for Action Recognition in Videos

Beyond Short Snippets: Deep Networks for Video Classification

Learning Common Sense Through Visual Abstraction

Action Recognition with Trajectory-Pooled Deep-Convolutional Descriptors

 

torch-lrcn

https://github.com/garythung/torch-lrcn

ActivityNet

https://github.com/jrbtaylor/ActivityNet

 

Describing Videos by Exploiting Temporal Structure

SA-tensorflow

https://github.com/tsenghungchen/SA-tensorflow

https://github.com/yaoli/arctic-capgen-vid

Translating Videos to Natural Language Using Deep Recurrent Neural Networks

https://www.cs.utexas.edu/~vsub/naacl15_project.html#code

 

video_to_sequence

https://github.com/jazzsaxmafia/video_to_sequence

https://github.com/vsubhashini/caffe/tree/recurrent/examples/s2vt

https://github.com/vsubhashini/caption-eval

https://vsubhashini.github.io/s2vt.html

PDF: 1505.00487v3.pdf

PDF: IVU_Convolutional_Networks_and_Video_Representations.pdf

 

Segment-CNN

https://github.com/zhengshou/scnn

https://github.com/tmbo/video-classification/blob/master/paper/paper.bib

https://github.com/gtoderici/sports-1m-dataset

http://cs.stanford.edu/people/karpathy/deepvideo/

artistic-videos

https://github.com/manuelruder/artistic-videos

———————————————————–

Word2vec

http://www.programcreek.com/java-api-examples/index.php?api=edu.stanford.nlp.parser.lexparser.LexicalizedParser

http://textminingonline.com/getting-started-with-word2vec-and-glove

https://radimrehurek.com/gensim/models/word2vec.html

Doc2vec tutorial

 


Attention In VQA

http://iamaaditya.github.io/research/literature/

https://github.com/HyeonwooNoh/DPPnet

https://github.com/ryankiros/skip-thoughts

https://libraries.io/github/johnny5550822/awesome-neat-rnn

https://blog.heuritech.com/2016/01/20/attention-mechanism/

PDF: 1511.02793v2.pdf

————————————————-

LSTM hyperparameters

PDF: LiuSingh.pdf

http://deeplearning4j.org/lstm.html

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/api_docs/python/functions_and_classes/shard5/tf.nn.rnn_cell.LSTMCell.md


https://github.com/torch/demos/tree/master/attention

https://github.com/torch/demos

 

 

 

—————————————————–

https://handong1587.github.io/deep_learning/2015/10/09/rnn-and-lstm.html

https://handong1587.github.io/deep_learning/2015/10/09/nlp.html

http://torch.ch/blog/2015/09/21/rmva.html

 

 

 

Github VQA link

https://github.com/handong1587/handong1587.github.io/tree/master/_posts/deep_learning

https://github.com/handong1587/handong1587.github.io/blob/master/_posts/deep_learning/2015-10-09-video-applications.md

https://github.com/handong1587/handong1587.github.io/blob/master/_posts/deep_learning/2015-10-09-nlp.md

https://github.com/JamesChuanggg/awesome-vqa

https://github.com/vsubhashini/caffe/tree/recurrent/examples/youtube

https://github.com/handong1587/handong1587.github.io/blob/master/_posts/deep_learning/2015-10-09-image-video-captioning.md

https://www.cs.utexas.edu/~vsub/naacl15_project.html#code

 

Dataset

https://github.com/shuzi/insuranceQA

 

Thesis

PDF: Yang.pdf

 

 

 

Good Paper

PDF: cvpr2014-deepvideo-rahuls.pdf

PDF: CS229.pdf

 

 

IMP PPT

https://github.com/Atcold/torch-Video-Tutorials

 

For Video

https://github.com/anibali/torchvid

 

 

IMP MovieQA

https://github.com/makarandtapaswi/MovieQA_benchmark

For DVS:

http://www.mpi-inf.mpg.de/departments/computer-vision-and-multimodal-computing/research/vision-and-language/mpii-movie-description-dataset/

 

 

 

Problem:
(gedit:8803): WARNING **: Couldn’t connect to accessibility bus: Failed to connect to socket /tmp/dbus-WjKgPvfxFu: Connection refused

Solution: set NO_AT_BRIDGE in the shell before launching gedit from it, which suppresses the warning:

export NO_AT_BRIDGE=1

——————————————————

From your output we see a “defunct” entry. A defunct process is a zombie: it has already exited, but its parent has not yet collected its exit status, so its entry stays in the process table. Running kill -9 PID on it does nothing because the process is already dead; the entry keeps showing up until the parent reaps it or the parent itself exits.

Find the parent of the defunct process and kill that instead. To find it, run:

ps -ef | grep defunct

UID   PID   PPID   C  STIME  TTY  TIME      CMD
1000  637   27872  0  Oct12  ?    00:00:04  [chrome] <defunct>
1000  1808  1777   0  Oct04  ?    00:00:00  [zeitgeist-datah] <defunct>

The third column (PPID) is the parent. For the first row, run kill -9 637 27872 (637 is the zombie itself and ignores the signal; 27872 is its parent), then check that the defunct entry is gone:

ps -ef | grep defunct

Other listings that are useful here:

ps -xal | grep defunct

ps -u
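
As a rough illustration of the "find the parent" step, this Python sketch pulls the PID and PPID out of ps -ef style text for every defunct row. It parses a pasted sample instead of calling ps, so it is deterministic. The function name find_defunct is made up for this note, the last sample row is a hypothetical non-defunct process added for contrast, and the parsing assumes the eight-column ps -ef layout shown above.

```python
SAMPLE_PS_OUTPUT = """\
UID        PID  PPID  C STIME TTY          TIME CMD
1000       637 27872  0 Oct12 ?        00:00:04 [chrome] <defunct>
1000      1808  1777  0 Oct04 ?        00:00:00 [zeitgeist-datah] <defunct>
1000      2001     1  0 Oct04 ?        00:00:10 /usr/bin/example
"""  # the last row is hypothetical, added only to show that it is skipped


def find_defunct(ps_output):
    """Return (pid, ppid, command) for every <defunct> row of `ps -ef` output."""
    zombies = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        if "<defunct>" not in line:
            continue
        # Columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces)
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        pid, ppid = int(fields[1]), int(fields[2])
        command = fields[7].replace("<defunct>", "").strip().strip("[]")
        zombies.append((pid, ppid, command))
    return zombies


for pid, ppid, command in find_defunct(SAMPLE_PS_OUTPUT):
    print(f"zombie {pid} ({command}): parent to signal is {ppid}")
```

The PPID it reports is the process to signal; the zombie's own PID is only useful for checking afterwards that the entry has disappeared.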

 


  1. First find the process id of firefox using the following command in any directory:
    pidof firefox
    
  2. Kill firefox process using the following command in any directory:
    kill [firefox pid]

The easiest solution for a program that is not responding would be:

killall -9 firefox
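
The same idea from a script, as a hedged sketch: send the polite signal first (what plain kill PID does), wait a moment, and only fall back to the forced kill (kill -9) if the program is still there. The child here is a stand-in that just sleeps, not Firefox; stop_process and the two-second grace period are assumptions for illustration, and the exit-code convention shown applies to POSIX systems.

```python
import signal
import subprocess
import sys


def stop_process(proc, grace_seconds=2.0):
    """Send SIGTERM, wait briefly, and fall back to SIGKILL if needed."""
    proc.terminate()  # same signal as plain `kill PID` (SIGTERM)
    try:
        proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # same signal as `kill -9 PID` (SIGKILL)
        proc.wait()
    return proc.returncode


# Stand-in for an unresponsive program: a child that only sleeps.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
exit_code = stop_process(child)
print(exit_code)
```

Because wait() collects the child's exit status, the child does not linger as a defunct entry afterwards. On POSIX the return code is the negated signal number.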

 

 

 

 

 

 

 

 
