English Abstract
Deep learning excels at generating distributed representations, but such representations are hard to interpret. Disentangled representation is a recently discussed concept featuring modularity, compactness, and explicitness, which makes a representation explainable via its generating factors. This thesis uses aligned text and image data of e-commerce products to learn a model that transforms a product title representation into a disentangled one consisting of two modules: one encodes the information conveyed by both the title and the image, while the other encodes the remaining information that cannot be inferred from the image and is known only from the title. We achieve this by injecting variational dropout, which also yields meaningful dropout rates learned from the data. The experimental and evaluation results show that the transformed disentangled representations perform well at measuring the similarity between different product titles; moreover, different sections of the representation exhibit different patterns on the evaluation tasks, which may be useful for further applications. We also show that the properties of disentanglement are largely satisfied by our learning method.