Visual Question Answering

The Problem Statement:

Predict the answer to an open-ended question about a given image.

VQA Libraries and Setup:

  1. Torch setup
  2. Keras (Theano backend)
  3. Keras (TensorFlow backend)

Reference Models:

1. neural-vqa

link: https://github.com/abhshkdz/neural-vqa

2. Deeper LSTM + normalized CNN for Visual Question Answering

link: https://github.com/VT-vision-lab/VQA_LSTM_CNN

3. Hierarchical Question-Image Co-Attention for Visual Question Answering

link: https://github.com/jiasenlu/HieCoAttenVQA

4. Simple Baseline for Visual Question Answering

link: https://github.com/metalbubble/VQAbaseline

5. Visual7W QA Models

link: https://github.com/yukezhu/visual7w-qa-models

6. VQA Demo

link: http://iamaaditya.github.io/2016/04/visual_question_answering_demo_notebook

7. Deep Learning for Visual Question Answering

link: https://github.com/avisingh599/visual-qa

Issue List:

  1. Implementation Issue

List of References:

  • L. Ma, Z. Lu, and H. Li, “Learning to Answer Questions From Image using Convolutional Neural Network,” CoRR abs/1506.00333, Nov. 2015.
  • H. Gao, J. Mao, J. Zhou, Z. Huang, L. Wang, and W. Xu, “Are you talking to a machine? Dataset and methods for multilingual image question answering,” arXiv 1505.05612v3, Nov. 2015.
  • M. Ren, R. Kiros, and R. S. Zemel, “Exploring models and data for image question answering,” arXiv 1505.02074, 2015.
  • M. Malinowski, M. Rohrbach, and M. Fritz, “Ask your neurons: A neural-based approach to answering questions about images,” arXiv 1505.01121, Nov. 2015.

Useful Links:

1. Memory Networks for Language Understanding, ICML Tutorial 2016

link: http://www.thespermwhale.com/jaseweston/icml2016/

2. End-To-End Memory Networks for Question Answering

link: https://github.com/vinhkhuc/MemN2N-babi-python

3. Implementing Dynamic Memory Networks

link: https://yerevann.github.io/2016/02/05/implementing-dynamic-memory-networks/

MISC

http://www.arxiv-sanity.com/1606.03556

http://www.arxiv-sanity.com/1601.01705

http://www.arxiv-sanity.com/1606.02393

Attention

Deep learning

http://colah.github.io/

Image and word attention

http://yanran.li/peppypapers/2015/12/11/nips-2015-deep-learning-symposium-part-i.html

Compositional Semantic Parsing on Semi-Structured Tables

http://arxiv.org/pdf/1508.00305v1.pdf

A Deep Architecture for Semantic Parsing

http://arxiv.org/pdf/1404.7296v1.pdf

Question Answering over Knowledge Base with Neural Attention Combining Global Knowledge Information

http://arxiv.org/pdf/1606.00979v1.pdf

Recurrent Neural Network Encoder with Attention for Community Question Answering

http://arxiv.org/pdf/1603.07044v1.pdf

Image

Hierarchical Attention Networks

http://arxiv.org/pdf/1606.02393v1.pdf

Diversified Visual Attention Networks for Fine-Grained Object Classification

http://arxiv.org/pdf/1606.08572v1.pdf

VQA

Simple Baseline for Visual Question Answering

http://arxiv.org/pdf/1512.02167v2.pdf

Towards Transparent AI Systems: Interpreting Visual Question Answering Models

http://sunw.csail.mit.edu/papers/13_Goyal_SUNw.pdf

https://computing.ece.vt.edu/~harsh/visualAttention/ProjectWebpage/#approach

http://cjds.github.io/image%20recognition/machine%20learning/2016/05/02/Visual-Question-Generation/

https://www.semanticscholar.org/paper/Character-Level-Question-Answering-with-Attention-Golub-He/47170ca3d7faa8535229e1fa4766fce0ce30cab2

Visual Question Answering Literature Survey

http://iamaaditya.github.io/research/literature/

Attention

Attention Mechanism

Good ones for VQA

Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering

A Focused Dynamic Attention Model for Visual Question Answering

Visual7W: Grounded Question Answering in Images

Stacked Attention Networks for Image Question Answering

Where To Look: Focus Regions for Visual Question Answering

Revisiting Visual Question Answering Baselines

Simple Baseline for Visual Question Answering

http://vision.stanford.edu/pdf/zhu2016cvpr.pdf

Image Question Answering: A Visual Semantic Embedding Model and a New Dataset

Highway Networks for Visual Question Answering

Neural Self Talk: Image Understanding via Continuous Questioning and Answering

—————————————————————-

Role of Attention for Visual Question Answering Submitted By: Harsh Agrawal (harsh92)

https://computing.ece.vt.edu/~harsh/visualAttention/ProjectWebpage/#approach

http://cjds.github.io/image%20recognition/machine%20learning/2016/05/02/Visual-Question-Generation/

————————————————————-

Nice one

Compositional Memory for Visual Question Answering

http://arxiv.org/pdf/1412.7755v2.pdf

——————————————————————

Good links

https://github.com/kjw0612/awesome-deep-vision

———————————————————————–

Visual question answering loss function

https://arxiv.org/pdf/1606.03647.pdf

http://home.iitk.ac.in/~kundan/cs498a/report.pdf

http://people.cs.vt.edu/~bhuang/courses/pgmsp16/projects/mahendru-pgmsp16

http://arxiv.org/pdf/1511.05676v1.pdf

https://cs224d.stanford.edu/reports/shuhui.pdf

https://www.google.co.in/search?q=visual+question+answer+loss+function

https://github.com/kundan2510/vqa_LSTM

—————————————————————-

NLTK

Getting Started with Word2Vec and GloVe

Dive Into NLTK, Part I: Getting Started with NLTK

—————————————————————

Torch7. Hello World, Neural Networks!

http://mdtux89.github.io/2015/12/11/torch-tutorial.html

———————————–

Learning Resources for NLP, Sentiment Analysis, and Deep Learning

https://github.com/Lab41/sunny-side-up/wiki/Learning-Resources-for-NLP,-Sentiment-Analysis,-and-Deep-Learning

———————————————————-

Misc

https://github.com/vivanov879/word2vec

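require 'nn'
require 'nngraph'

vocab_size = 10000 -- placeholder (hypothetical value): set to your vocabulary size

-- Graph inputs: index tensors for the centre word and a context ("outer") word.
word_center = nn.Identity()()
word_outer = nn.Identity()()

-- Centre word: 100-d embedding, projected to 50-d and squashed with tanh.
-- nn.LookupTable is Torch's built-in embedding layer, used here in place of
-- the undefined Embedding module.
x_center_ = nn.LookupTable(vocab_size, 100)(word_center)
x_center = nn.Linear(100, 50)(x_center_)
x_center = nn.Tanh()(x_center)

-- Context word: same architecture, separate weights.
x_outer_ = nn.LookupTable(vocab_size, 100)(word_outer)
x_outer = nn.Linear(100, 50)(x_outer_)
x_outer = nn.Tanh()(x_outer)

-- z = sum over the 50 features of (x_outer - x_center)^2, i.e. the squared
-- Euclidean distance between the two projected words (one value per row).
x_center_minus = nn.MulConstant(-1)(x_center)
z = nn.CAddTable()({x_outer, x_center_minus})
z = nn.Power(2)(z)
z = nn.Sum(2)(z)

-- Outputs: the squared distance plus the two raw 100-d embeddings.
m = nn.gModule({word_center, word_outer}, {z, x_outer_, x_center_})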

How A.I. will help kids on the Autism spectrum find employment

————————————————————–

Overfitting

https://www.quora.com/How-can-I-avoid-overfitting

http://stats.stackexchange.com/questions/9053/how-does-cross-validation-overcome-the-overfitting-problem

https://www.researchgate.net/post/How_to_Avoid_Overfitting

http://www.kdnuggets.com/2015/01/clever-methods-overfitting-avoid.html

How to avoid Over-fitting using Regularization?

https://medschool.vanderbilt.edu/cqs/files/cqs/media/2010Overfitting_0416.pdf

IMP Overfitting

http://cs231n.github.io/neural-networks-2/#reg

http://cs231n.github.io/neural-networks-1/

L2 regularization

https://siavashk.github.io/2016/03/10/l21-regularization/

https://gitter.im/torch/torch7/archives/2015/06/13

https://computing.ece.vt.edu/~harsh/

https://marcino239.github.io/

Optimization

http://cs231n.github.io/neural-networks-3/

https://github.com/torch/optim/blob/master/doc/algos.md

https://github.com/torch/optim/blob/master/sgd.lua

batch size

http://stats.stackexchange.com/questions/140811/how-large-should-the-batch-size-be-for-stochastic-gradient-descent

Imp

http://cs224d.stanford.edu/reports/DufourNick.pdf

http://cs231n.stanford.edu/reports.html

http://cs231n.stanford.edu/reports/hyhieu_final.pdf

Movie QA

http://movieqa.cs.toronto.edu/home/

Jointly Modeling Embedding and Translation to Bridge Video and Language

http://arxiv.org/pdf/1505.01861.pdf

Sequence to Sequence – Video to Text

http://arxiv.org/pdf/1505.00487.pdf

Uncovering Temporal Context for Video Question and Answering

Two-Stream Convolutional Networks for Action Recognition in Videos

Beyond Short Snippets: Deep Networks for Video Classification

Learning Common Sense Through Visual Abstraction

Action Recognition with Trajectory-Pooled Deep-Convolutional Descriptors

torch-lrcn

https://github.com/garythung/torch-lrcn

ActivityNet

https://github.com/jrbtaylor/ActivityNet

Describing Videos by Exploiting Temporal Structure

SA-tensorflow

https://github.com/tsenghungchen/SA-tensorflow

https://github.com/yaoli/arctic-capgen-vid

Translating Videos to Natural Language Using Deep Recurrent Neural Networks

https://www.cs.utexas.edu/~vsub/naacl15_project.html#code

video_to_sequence

https://github.com/jazzsaxmafia/video_to_sequence

https://github.com/vsubhashini/caffe/tree/recurrent/examples/s2vt

https://github.com/vsubhashini/caption-eval

https://vsubhashini.github.io/s2vt.html

http://arxiv.org/pdf/1505.00487v3.pdf

http://feichtenhofer.github.io/pubs/teaching/IVU_Convolutional_Networks_and_Video_Representations.pdf

Segment-CNN

https://github.com/zhengshou/scnn

https://github.com/tmbo/video-classification/blob/master/paper/paper.bib

https://github.com/gtoderici/sports-1m-dataset

http://cs.stanford.edu/people/karpathy/deepvideo/

artistic-videos

https://github.com/manuelruder/artistic-videos

https://github.com/yaoli/arctic-capgen-vid

———————————————————–

Word2Vec

http://www.programcreek.com/java-api-examples/index.php?api=edu.stanford.nlp.parser.lexparser.LexicalizedParser

Getting Started with Word2Vec and GloVe

https://radimrehurek.com/gensim/models/word2vec.html

http://rare-technologies.com/doc2vec-tutorial

Attention In VQA

http://iamaaditya.github.io/research/literature/

https://github.com/HyeonwooNoh/DPPnet

https://github.com/ryankiros/skip-thoughts

https://libraries.io/github/johnny5550822/awesome-neat-rnn

Attention Mechanism

http://arxiv.org/pdf/1511.02793v2.pdf

————————————————-

LSTM hyperparameters

https://cs224d.stanford.edu/reports/LiuSingh.pdf

http://deeplearning4j.org/lstm.html

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/api_docs/python/functions_and_classes/shard5/tf.nn.rnn_cell.LSTMCell.md


https://github.com/torch/demos/tree/master/attention

https://github.com/torch/demos

—————————————————–

https://handong1587.github.io/deep_learning/2015/10/09/rnn-and-lstm.html

https://handong1587.github.io/deep_learning/2015/10/09/nlp.html

http://torch.ch/blog/2015/09/21/rmva.html

GitHub VQA links

https://github.com/handong1587/handong1587.github.io/tree/master/_posts/deep_learning

https://github.com/handong1587/handong1587.github.io/blob/master/_posts/deep_learning/2015-10-09-video-applications.md

https://github.com/handong1587/handong1587.github.io/blob/master/_posts/deep_learning/2015-10-09-nlp.md

https://github.com/JamesChuanggg/awesome-vqa

https://github.com/vsubhashini/caffe/tree/recurrent/examples/youtube

https://github.com/handong1587/handong1587.github.io/blob/master/_posts/deep_learning/2015-10-09-image-video-captioning.md

https://www.cs.utexas.edu/~vsub/naacl15_project.html#code

Dataset

https://github.com/shuzi/insuranceQA

Thesis

http://crcv.ucf.edu/papers/theses/Yang.pdf

Good Paper

http://www.cs.cmu.edu/~rahuls/pub/cvpr2014-deepvideo-rahuls.pdf

http://cs.stanford.edu/people/ssandeep/reports/CS229.pdf

IMP PPT

https://github.com/Atcold/torch-Video-Tutorials

For Video

https://github.com/anibali/torchvid

IMP MovieQA

https://github.com/makarandtapaswi/MovieQA_benchmark

For DVS:

http://www.mpi-inf.mpg.de/departments/computer-vision-and-multimodal-computing/research/vision-and-language/mpii-movie-description-dataset/

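Problem:

(gedit:8803): WARNING **: Couldn't connect to accessibility bus: Failed to connect to socket /tmp/dbus-WjKgPvfxFu: Connection refused

Solution: set NO_AT_BRIDGE=1 in the shell before starting gedit. This stops GTK from loading the accessibility (AT-SPI) bridge, which is the component that tries to connect to this bus:

export NO_AT_BRIDGE=1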

——————————————————

From your output we see a “defunct” process, also called a zombie. It has already finished (or been killed), but its parent has not yet collected its exit status with wait(), so its entry stays in the process table. A zombie is no longer running, so kill -9 PID has no effect on it: you can run the command, but the entry will keep showing up again and again.

Instead, determine which process is the parent of the defunct process and kill the parent. The zombie is then adopted by init, which reaps it. To find the parent, run the command:

ps -ef | grep defunct

Example output (column header added for readability):

UID   PID   PPID   C  STIME  TTY  TIME      CMD
1000  637   27872  0  Oct12  ?    00:00:04  [chrome] <defunct>
1000  1808  1777   0  Oct04  ?    00:00:00  [zeitgeist-datah] <defunct>

The PPID column gives the parent. For the chrome zombie above, kill its parent with kill -9 27872, then verify that the defunct process is gone:

ps -ef | grep defunct

Other commands that help here:

ps -xal | grep defunct

ps -u
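The steps above can be wrapped in two small shell functions. This is a minimal sketch that assumes a Linux system with procps `ps`, which supports `-e` and `-o`. The function names `list_zombies` and `zombie_parents` are hypothetical. The functions filter on the STAT column (zombies have state `Z`) instead of grepping for the word "defunct", so the `grep` process itself never shows up in the result.

```shell
#!/usr/bin/env bash
# list_zombies: print "PID PPID COMMAND" for every defunct (zombie) process.
# ps reports the process state in the STAT column; zombie states start with "Z".
list_zombies() {
  # A trailing "=" on every column name suppresses the header line.
  ps -eo pid=,ppid=,stat=,comm= | awk '$3 ~ /^Z/ { print $1, $2, $4 }'
}

# zombie_parents: the unique parent PIDs, i.e. the processes that must be
# killed (or made to call wait()) before the zombies disappear.
zombie_parents() {
  list_zombies | awk '{ print $2 }' | sort -u
}
```

Run `list_zombies` first to review the entries. Then kill only a parent PID you have checked, for example `kill 27872`. Killing the zombie's own PID does nothing.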

  1. First, find the process ID of Firefox:
    pidof firefox

  2. Then kill the Firefox process, using the PID printed in step 1:
    kill [firefox pid]

For a program that is not responding, the quickest solution is to force-kill (SIGKILL) every process with that name:

killall -9 firefox
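`killall -9` sends SIGKILL immediately, which gives the program no chance to save its state. A gentler pattern is to send the default SIGTERM first and fall back to SIGKILL only if the program is still running after a few seconds. The helper below is a minimal sketch of that idea. It assumes `pidof` is available, as in step 1; the name `stop_program` and the 5-second default are arbitrary, hypothetical choices.

```shell
#!/usr/bin/env bash
# stop_program NAME [GRACE_SECONDS]
# Send SIGTERM to every process called NAME (found with pidof), wait up to
# GRACE_SECONDS (default 5), then send SIGKILL to any that are still running.
stop_program() {
  local name="$1" grace="${2:-5}" pids

  if ! pids=$(pidof "$name"); then
    echo "$name is not running"
    return 0
  fi

  # Plain kill sends SIGTERM, letting the program clean up and exit.
  # $pids is left unquoted on purpose: pidof may return several PIDs.
  kill $pids

  # Poll once per second until no process with that name is left.
  for _ in $(seq "$grace"); do
    sleep 1
    pidof "$name" >/dev/null || return 0
  done

  # Still alive after the grace period: force it, like killall -9.
  if pids=$(pidof "$name"); then
    echo "$name ignored SIGTERM; sending SIGKILL" >&2
    kill -9 $pids
  fi
}
```

For example, run `stop_program firefox`, or `stop_program firefox 10` for a longer grace period.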
