The Problem Statement:

Predict the answer to an open-ended question about a given image.
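One common way to write this task down, assuming the answer is chosen from a fixed set of candidate answers (an assumption; the statement above does not fix the answer space), is as picking the most probable answer given the image and the question. The symbols below are introduced here only for illustration: I is the image, q the question, A the candidate answer set, and theta the model parameters.

```latex
\hat{a} = \arg\max_{a \in \mathcal{A}} \; P(a \mid I, q; \theta)
```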

VQA Library and Setup:

Reference Models:

Issue List:

Implementation Issue

List of References:

L. Ma, Z. Lu, and H. Li, "Learning to Answer Questions From Image Using Convolutional Neural Network", CoRR abs/1506.00333, Nov. 2015.
H. Gao, J. Mao, J. Zhou, Z. Huang, L. Wang, and W. Xu, "Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering", arXiv:1505.05612v3, Nov. 2015.
M. Ren, R. Kiros, and R. S. Zemel, "Exploring Models and Data for Image Question Answering", arXiv:1505.02074, 2015.
M. Malinowski, M. Rohrbach, and M. Fritz, "Ask Your Neurons: A Neural-Based Approach to Answering Questions About Images", arXiv:1505.01121, Nov. 2015.

Useful Links:

1. Memory Networks for Language Understanding, ICML Tutorial 2016

link: http://www.thespermwhale.com/jaseweston/icml2016/

2. End-To-End Memory Networks for Question Answering

link: https://github.com/vinhkhuc/MemN2N-babi-python

3. Implementing Dynamic Memory Networks

link: https://yerevann.github.io/2016/02/05/implementing-dynamic-memory-networks/

MISC

http://www.arxiv-sanity.com/1606.03556

http://www.arxiv-sanity.com/1601.01705

http://www.arxiv-sanity.com/1606.02393

Attention

Melanie Warrick, Deep Learning Engineer, Skymind.io at MLconf SF – 11/13/15 from MLconf

Backpropagation in Convolutional Neural Network from Hiroshi Kuwajima

Deep learning

http://colah.github.io/

Image and word attention

http://yanran.li/peppypapers/2015/12/11/nips-2015-deep-learning-symposium-part-i.html

Compositional Semantic Parsing on Semi-Structured Tables

arXiv:1508.00305v1

A Deep Architecture for Semantic Parsing

arXiv:1404.7296v1

Question Answering over Knowledge Base with Neural Attention Combining Global Knowledge Information

arXiv:1606.00979v1

Recurrent Neural Network Encoder with Attention for Community Question Answering

arXiv:1603.07044v1

Image

Hierarchical Attention Networks

arXiv:1606.02393v1

Diversified Visual Attention Networks for Fine-Grained Object Classification

arXiv:1606.08572v1

VQA

Simple Baseline for Visual Question Answering

arXiv:1512.02167v2

Towards Transparent AI Systems: Interpreting Visual Question Answering Models

PDF: 13_Goyal_SUNw.pdf

https://computing.ece.vt.edu/~harsh/visualAttention/ProjectWebpage/#approach

http://cjds.github.io/image%20recognition/machine%20learning/2016/05/02/Visual-Question-Generation/

https://www.semanticscholar.org/paper/Character-Level-Question-Answering-with-Attention-Golub-He/47170ca3d7faa8535229e1fa4766fce0ce30cab2

Visual Question Answering Literature Survey

http://iamaaditya.github.io/research/literature/

Attention

https://blog.heuritech.com/2016/01/20/attention-mechanism/

Good one for VQA

Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering

A Focused Dynamic Attention Model for Visual Question Answering

Visual7W: Grounded Question Answering in Images

Stacked Attention Networks for Image Question Answering

Where To Look: Focus Regions for Visual Question Answering

Revisiting Visual Question Answering Baselines

Simple Baseline for Visual Question Answering

PDF: zhu2016cvpr.pdf

Image Question Answering: A Visual Semantic Embedding Model and a New Dataset

Highway Networks for Visual Question Answering

Neural Self Talk: Image Understanding via Continuous Questioning and Answering

—————————————————————-

Role of Attention for Visual Question Answering Submitted By: Harsh Agrawal (harsh92)

https://computing.ece.vt.edu/~harsh/visualAttention/ProjectWebpage/#approach

http://cjds.github.io/image%20recognition/machine%20learning/2016/05/02/Visual-Question-Generation/

————————————————————-

Nice one

Compositional Memory for Visual Question Answering

arXiv:1412.7755v2

——————————————————————

Good links

https://github.com/kjw0612/awesome-deep-vision

———————————————————————–

The visual question answers loss function

arXiv:1606.03647

PDF: report.pdf

http://people.cs.vt.edu/~bhuang/courses/pgmsp16/projects/mahendru-pgmsp16

arXiv:1511.05676v1

PDF: shuhui.pdf

https://www.google.co.in/search?q=visual+question+answer+loss+function

https://github.com/kundan2510/vqa_LSTM

—————————————————————-

NLTK

http://textminingonline.com/getting-started-with-word2vec-and-glove

http://textminingonline.com/dive-into-nltk-part-i-getting-started-with-nltk

—————————————————————

Torch7. Hello World, Neural Networks!

http://mdtux89.github.io/2015/12/11/torch-tutorial.html

———————————–

Learning Resources for NLP, Sentiment Analysis, and Deep Learning

https://github.com/Lab41/sunny-side-up/wiki/Learning-Resources-for-NLP,-Sentiment-Analysis,-and-Deep-Learning

———————————————————-

Misc

https://github.com/vivanov879/word2vec

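-- Requires Torch7 with the nn and nngraph packages. Embedding is assumed to be
-- the custom lookup-table module from the repository above (Embedding.lua).
-- vocab_size must be defined before this code runs.
require 'nn'
require 'nngraph'
require 'Embedding'

-- Graph inputs: the index of the center word and of a context (outer) word.
word_center = nn.Identity()()
word_outer = nn.Identity()()

-- Center word: 100-d embedding, projected to 50-d, then tanh.
x_center_ = Embedding(vocab_size, 100)(word_center)
x_center = nn.Linear(100, 50)(x_center_)
x_center = nn.Tanh()(x_center)

-- Outer word: same pipeline with its own, separate weights.
x_outer_ = Embedding(vocab_size, 100)(word_outer)
x_outer = nn.Linear(100, 50)(x_outer_)
x_outer = nn.Tanh()(x_outer)

-- Negate the center projection so CAddTable computes x_outer - x_center.
x_center_minus = nn.MulConstant(-1)(x_center)

-- Squared Euclidean distance: square each difference, then sum over dimension 2.
z = nn.CAddTable()({x_outer, x_center_minus})
z = nn.Power(2)(z)
z = nn.Sum(2)(z)

-- The module returns the distance together with both raw embeddings.
m = nn.gModule({word_center, word_outer}, {z, x_outer_, x_center_})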
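Read as a formula, the graph above appears to compute a squared Euclidean distance between the two projected word vectors. This is a sketch under assumptions: Embedding is taken to be a plain lookup table, the inputs are batched so that nn.Sum(2) sums over the 50 feature dimensions, and the symbols E, W, b, c and o are names introduced here, not taken from the repository.

```latex
\begin{aligned}
c &= \tanh\bigl(W_c \, E_c[w_{\text{center}}] + b_c\bigr), \\
o &= \tanh\bigl(W_o \, E_o[w_{\text{outer}}] + b_o\bigr), \\
z &= \sum_{j=1}^{50} (o_j - c_j)^2 = \lVert o - c \rVert_2^2 .
\end{aligned}
```

Here E_c and E_o are the two vocab_size x 100 embedding tables, and W_c, W_o are the 50 x 100 weights of the two nn.Linear layers.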

How A.I. will help kids on the Autism spectrum find employment

————————————————————–

Overfitting

https://www.quora.com/How-can-I-avoid-overfitting

http://stats.stackexchange.com/questions/9053/how-does-cross-validation-overcome-the-overfitting-problem

https://www.researchgate.net/post/How_to_Avoid_Overfitting

http://www.kdnuggets.com/2015/01/clever-methods-overfitting-avoid.html

How to avoid Over-fitting using Regularization?

Click to access 2010Overfitting_0416.pdf

Imp Overfitting

http://cs231n.github.io/neural-networks-2/#reg

http://cs231n.github.io/neural-networks-1/

L2 regularization

https://siavashk.github.io/2016/03/10/l21-regularization/

https://gitter.im/torch/torch7/archives/2015/06/13

https://computing.ece.vt.edu/~harsh/

https://marcino239.github.io/

Optimization

http://cs231n.github.io/neural-networks-3/

https://github.com/torch/optim/blob/master/doc/algos.md

https://github.com/torch/optim/blob/master/sgd.lua

batch size

http://stats.stackexchange.com/questions/140811/how-large-should-the-batch-size-be-for-stochastic-gradient-descent

Imp

PDF: DufourNick.pdf

http://cs231n.stanford.edu/reports.html

PDF: hyhieu_final.pdf

Movie QA

http://movieqa.cs.toronto.edu/home/

Jointly Modeling Embedding and Translation to Bridge Video and Language

arXiv:1505.01861

Sequence to Sequence – Video to Text

arXiv:1505.00487

Uncovering Temporal Context for Video Question and Answering

Two-Stream Convolutional Networks for Action Recognition in Videos

Beyond Short Snippets: Deep Networks for Video Classification

Learning Common Sense Through Visual Abstraction

Action Recognition with Trajectory-Pooled Deep-Convolutional Descriptors

torch-lrcn

https://github.com/garythung/torch-lrcn

ActivityNet

https://github.com/jrbtaylor/ActivityNet

Describing Videos by Exploiting Temporal Structure

SA-tensorflow

https://github.com/tsenghungchen/SA-tensorflow

https://github.com/yaoli/arctic-capgen-vid

Translating Videos to Natural Language Using Deep Recurrent Neural Networks

https://www.cs.utexas.edu/~vsub/naacl15_project.html#code

video_to_sequence

https://github.com/jazzsaxmafia/video_to_sequence

https://github.com/vsubhashini/caffe/tree/recurrent/examples/s2vt

https://github.com/vsubhashini/caption-eval

https://vsubhashini.github.io/s2vt.html

arXiv:1505.00487v3

PDF: IVU_Convolutional_Networks_and_Video_Representations.pdf

Segment-CNN

https://github.com/zhengshou/scnn

https://github.com/tmbo/video-classification/blob/master/paper/paper.bib

https://github.com/gtoderici/sports-1m-dataset

http://cs.stanford.edu/people/karpathy/deepvideo/

artistic-videos

https://github.com/manuelruder/artistic-videos

———————————————————–

Word2Vec

http://www.programcreek.com/java-api-examples/index.php?api=edu.stanford.nlp.parser.lexparser.LexicalizedParser

http://textminingonline.com/getting-started-with-word2vec-and-glove

https://radimrehurek.com/gensim/models/word2vec.html

Doc2vec tutorial

Attention In VQA

http://iamaaditya.github.io/research/literature/

https://github.com/HyeonwooNoh/DPPnet

https://github.com/ryankiros/skip-thoughts

https://libraries.io/github/johnny5550822/awesome-neat-rnn

https://blog.heuritech.com/2016/01/20/attention-mechanism/

arXiv:1511.02793v2

————————————————-

LSTM hyperparameters

PDF: LiuSingh.pdf

http://deeplearning4j.org/lstm.html

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/api_docs/python/functions_and_classes/shard5/tf.nn.rnn_cell.LSTMCell.md

https://github.com/torch/demos/tree/master/attention

https://github.com/torch/demos

—————————————————–

https://handong1587.github.io/deep_learning/2015/10/09/rnn-and-lstm.html

https://handong1587.github.io/deep_learning/2015/10/09/nlp.html

http://torch.ch/blog/2015/09/21/rmva.html

Github VQA link

https://github.com/handong1587/handong1587.github.io/tree/master/_posts/deep_learning

https://github.com/handong1587/handong1587.github.io/blob/master/_posts/deep_learning/2015-10-09-video-applications.md

https://github.com/handong1587/handong1587.github.io/blob/master/_posts/deep_learning/2015-10-09-nlp.md

https://github.com/JamesChuanggg/awesome-vqa

https://github.com/vsubhashini/caffe/tree/recurrent/examples/youtube

https://github.com/handong1587/handong1587.github.io/blob/master/_posts/deep_learning/2015-10-09-image-video-captioning.md

https://www.cs.utexas.edu/~vsub/naacl15_project.html#code

Dataset

https://github.com/shuzi/insuranceQA

Thesis

PDF: Yang.pdf

Good Paper

PDF: cvpr2014-deepvideo-rahuls.pdf

PDF: CS229.pdf

IMP PPT

https://github.com/Atcold/torch-Video-Tutorials

For Video

https://github.com/anibali/torchvid

IMP MovieQA

https://github.com/makarandtapaswi/MovieQA_benchmark

For DVS:

http://www.mpi-inf.mpg.de/departments/computer-vision-and-multimodal-computing/research/vision-and-language/mpii-movie-description-dataset/

Problem:
(gedit:8803): WARNING **: Couldn't connect to accessibility bus: Failed to connect to socket /tmp/dbus-WjKgPvfxFu: Connection refused
Solution:
Set this variable in the shell before launching the program:

export NO_AT_BRIDGE=1
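The export above only lasts for the current shell session. A minimal sketch of the usual alternatives follows; printenv is used as a stand-in for gedit so the snippet runs on a machine without a desktop session, and the ~/.bashrc line assumes bash is the login shell.

```shell
# Option 1: set the variable for one command only. printenv is a stand-in
# for gedit; it simply shows the value the command would see.
NO_AT_BRIDGE=1 printenv NO_AT_BRIDGE

# Option 2: export it for the rest of the current shell session.
export NO_AT_BRIDGE=1

# Option 3 (left commented out): persist it for future bash sessions.
# echo 'export NO_AT_BRIDGE=1' >> ~/.bashrc
```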

——————————————————

From your output we see a “defunct” entry. A defunct (zombie) process has already terminated, but its parent has not yet collected its exit status, so the entry stays in the process table. kill -9 PID has no effect on such a process: you can run it, but the entry will still be listed afterwards.

Instead, find the parent of the defunct process and kill that. To find the parent, run:

ps -ef | grep defunct

UID PID PPID C STIME TTY TIME CMD

1000 637 27872 0 Oct12 ? 00:00:04 [chrome] <defunct>

1000 1808 1777 0 Oct04 ? 00:00:00 [zeitgeist-datah] <defunct>
Then kill the parent process (the PPID column, 27872 for the chrome entry) with kill -9 27872, and verify that the defunct process is gone:
ps -ef | grep defunct

ps -xal |grep defunct

ps -u
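Grepping for "defunct" also matches the grep command itself. As a sketch of a more targeted lookup, the helper below (filter_zombies is a hypothetical name, not a standard tool) keeps only rows whose state starts with Z and prints the PID, the parent PID, and the command name. It assumes a ps that accepts -e and -o with the pid, ppid, stat and comm columns, as procps on Linux does.

```shell
# Keep only rows whose third column (process state) starts with "Z",
# and print: PID, parent PID, command name.
filter_zombies() {
    awk '$3 ~ /^Z/ { print $1, $2, $4 }'
}

# The trailing "=" after each column name suppresses the header line.
ps -eo pid=,ppid=,stat=,comm= | filter_zombies
```

The second column of each printed row is the parent PID to investigate or kill.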

First find the process id of firefox using the following command in any directory:
```
pidof firefox
```
Kill firefox process using the following command in any directory:
```
kill [firefox pid]
```

The easiest solution for a program that is not responding would be:

killall -9 firefox
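Since -9 (SIGKILL) gives the program no chance to clean up, a gentler variant is to send SIGTERM first and fall back to SIGKILL only if the process is still there after a grace period. The sketch below is one way to do that; stop_gracefully is a hypothetical helper name, and a background sleep stands in for firefox so the snippet can run anywhere.

```shell
# Send SIGTERM, wait up to a grace period (seconds, default 5), and use
# SIGKILL only if the process still exists afterwards.
stop_gracefully() {
    pid=$1
    grace=${2:-5}
    kill -TERM "$pid" 2>/dev/null || return 0    # no such process
    while [ "$grace" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # process is gone
        sleep 1
        grace=$((grace - 1))
    done
    kill -KILL "$pid" 2>/dev/null
}

# Stand-in target: a background sleep instead of firefox.
sleep 60 &
stop_gracefully "$!" 2
```

For a real program the PID would come from pidof, for example stop_gracefully "$(pidof firefox)" when a single PID is returned.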

Reference Models:

1. neural-vqa

2. Deeper LSTM + normalized CNN for Visual Question Answering

3. Hierarchical Question-Image Co-Attention for Visual Question Answering

4. Simple Baseline for Visual Question Answering

5. Visual7W QA Models

6. VQA Demo

7. Deep Learning for Visual Question Answering
