Free Seminar for Chula Engineering Students (all majors)

Department of Industrial Engineering, Chula Engineering, and The University of Tokyo, Japan, present
“Text Mining by Using Python: Application to Patent Documents”
Instructor: Prof. Kazuyuki Motohashi (U-Tokyo, Japan)
Teaching Assistant: Dr. Suchit Pongnumkul
Date: May 27, 29, 31, 2019
Time: 09:00-12:00
Venue: Room 407, 4F, Engineering Building 4, Faculty of Engineering, Chulalongkorn University
Language: English (and Thai by TA)

– Introduction to patent data analysis
• What is patent data? Why is it used for technology management research?
• Various patent databases: PATSTAT, JPO (IIP Patent Database), USPTO
• Keyword extraction, TF-IDF, similarity measures
• VIDEO (O’Reilly Media)
• HOMEWORK: Apply the video exercise above to patent abstract documents and extract three keywords using TF-IDF scores
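For orientation, the keyword-extraction homework can be sketched roughly as follows. This is a minimal illustration in plain Python, not the method shown in the video: the three sample abstracts are invented, the helper names (tokenize, tfidf_scores, top_keywords) are hypothetical, and the weighting assumed here is tf = count / document length with idf = log(N / document frequency).

```python
import math
import re
from collections import Counter

# Toy stand-ins for patent abstracts (invented for illustration only).
abstracts = [
    "A neural network controls the steering of an autonomous vehicle.",
    "A vehicle braking system uses a hydraulic pressure sensor.",
    "A neural network classifies gene sequences for gene editing.",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_scores(docs):
    """Return one {term: tf-idf} dict per document.

    Assumed weighting: tf = count / document length,
    idf = log(N / number of documents containing the term).
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def top_keywords(score, k=3):
    """Pick the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(score.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


all_scores = tfidf_scores(abstracts)
for i, score in enumerate(all_scores):
    print(i, top_keywords(score))
```

A word that occurs in every abstract (here "a") gets an idf of zero, which is why very common words drop out of the keyword list without an explicit stop-word list.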

– Patent Similarity
• Review of homework
• Text preprocessing: tokenization, normalization (lowercasing), stemming/lemmatization, and stop-word removal
• Building TF-IDF vectors from text:
① Create a dictionary: map every word to a number
② Build a corpus (a list of bags of words): for each document, a list of word counts
• Calculating similarity measures across documents: gensim.similarities
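The steps listed above can be sketched in plain Python as follows. This is only an illustration of the concepts and does not use gensim itself; in gensim the corresponding pieces are, to the best of our knowledge, corpora.Dictionary, its doc2bow method, models.TfidfModel, and the classes in gensim.similarities. The sample documents, the stop-word list, and the function names are assumptions made for this sketch, and stemming/lemmatization is left out.

```python
import math
import re
from collections import Counter

# Toy documents and stop-word list, invented for illustration.
documents = [
    "The vehicle detects lanes using a camera.",
    "A camera on the vehicle detects pedestrians.",
    "The enzyme cuts the DNA of the target gene.",
]
STOP_WORDS = {"a", "an", "the", "of", "on", "using"}


def preprocess(text):
    """Tokenize, lowercase, and drop stop words (no stemming here)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


texts = [preprocess(d) for d in documents]

# Step 1: dictionary mapping every word to an integer id.
dictionary = {}
for tokens in texts:
    for token in tokens:
        dictionary.setdefault(token, len(dictionary))

# Step 2: corpus as a list of bags of words, each a list of (word id, count).
corpus = [sorted(Counter(dictionary[t] for t in tokens).items()) for tokens in texts]

# TF-IDF weighting: raw count times log(N / document frequency).
doc_freq = Counter(word_id for bow in corpus for word_id, _ in bow)
tfidf_corpus = [
    [(word_id, count * math.log(len(corpus) / doc_freq[word_id])) for word_id, count in bow]
    for bow in corpus
]


def cosine(vec_a, vec_b):
    """Cosine similarity between two sparse (id, weight) vectors."""
    a, b = dict(vec_a), dict(vec_b)
    dot = sum(w * b.get(i, 0) for i, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Pairwise similarity matrix across all documents.
similarity = [[cosine(x, y) for y in tfidf_corpus] for x in tfidf_corpus]
for row in similarity:
    print([round(value, 2) for value in row])
```

In this toy data the two vehicle documents share three words and therefore score above zero against each other, while the gene document shares no words with either and scores zero.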

– Topic Modeling
• What is topic modeling? (with some examples)
① Understanding the concept: LDA
• Gensim module in Python: topic modeling applied to the following three technologies, with comparison to the JPO classification
① Artificial intelligence
② Autonomous driving
③ Gene modification technology
• A good reference on topic modeling with gensim
• Assignment for further work (three-week program)
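To give a feel for the LDA concept named above, here is a small self-contained sketch. It is not the gensim workflow used in the seminar: it implements LDA with a basic collapsed Gibbs sampler in plain Python, whereas gensim's LdaModel relies on a different inference method (variational Bayes). The toy documents, the function name lda_gibbs, and the hyperparameter values are assumptions chosen for illustration.

```python
import random

# Toy tokenized documents, invented for illustration.
docs = [
    ["vehicle", "camera", "lane", "vehicle", "sensor"],
    ["sensor", "camera", "vehicle", "lane"],
    ["gene", "enzyme", "dna", "gene", "cell"],
    ["dna", "cell", "enzyme", "gene"],
]


def lda_gibbs(docs, n_topics=2, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns (doc_topic, topic_word, vocab)."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)

    # Count tables updated throughout sampling.
    doc_topic = [[0] * n_topics for _ in docs]             # tokens of doc d in topic k
    topic_word = [[0] * n_vocab for _ in range(n_topics)]  # word v assigned to topic k
    topic_total = [0] * n_topics                           # all tokens in topic k
    assignments = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][n]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # p(z = j | everything else) is proportional to
                # (n_dj + alpha) * (n_jv + beta) / (n_j + V * beta).
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][v] + beta)
                    / (topic_total[j] + n_vocab * beta)
                    for j in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Record the newly sampled topic.
                assignments[d][n] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return doc_topic, topic_word, vocab


doc_topic, topic_word, vocab = lda_gibbs(docs)
for k, counts in enumerate(topic_word):
    top = sorted(zip(counts, vocab), reverse=True)[:3]
    print("topic", k, [w for _, w in top])
```

Each row of doc_topic counts how many words of a document were assigned to each topic, and each row of topic_word counts how often each vocabulary word was assigned to a topic; normalizing these counts gives the document-topic and topic-word distributions that LDA estimates.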

Capacity: 20 seats
(Registration deadline: May 18, 2019)
Contact: Natt Leelawat, D.Eng. (