Search Enhancement with Deep Learning, Proposal 1: Rethinking the Search-Result Ranking Model with ColBERT

2023. 1. 13. 13:54 | machine learning


Technical review planned for later, when time permits.

Search-result ranking model

 

 1. NAVER DEVIEW 2021

Big companies really are on another level...

 

Serving 30 Billion Vectors! NAVER Search Takes On ColBERT Vector Similarity Search, 반정호/전보성, NAVER Engineering (tv.naver.com)

ColBERT seems to be a BERT-based algorithm specialized for search. From Stanford.

 

2. GitHub source

stanford-futuredata/ColBERT: ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22) (github.com)

 


 

ColBERT is a fast and accurate retrieval model, enabling scalable BERT-based search over large text collections in tens of milliseconds.

Figure 1: ColBERT's late interaction, efficiently scoring the fine-grained similarity between a query and a passage.

As Figure 1 illustrates, ColBERT relies on fine-grained contextual late interaction: it encodes each passage into a matrix of token-level embeddings (shown above in blue). Then at search time, it embeds every query into another matrix (shown in green) and efficiently finds passages that contextually match the query using scalable vector-similarity (MaxSim) operators.
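The MaxSim scoring described above can be sketched in a few lines. A minimal NumPy illustration, not the project's actual implementation (which batches this on GPU): for each query token embedding, take the maximum cosine similarity over all passage token embeddings, then sum across query tokens.

```python
import numpy as np

def maxsim_score(query_emb: np.ndarray, passage_emb: np.ndarray) -> float:
    """ColBERT-style late-interaction score.

    query_emb:   (num_query_tokens,   dim) token embedding matrix
    passage_emb: (num_passage_tokens, dim) token embedding matrix
    """
    # Normalize rows so dot products become cosine similarities.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    p = passage_emb / np.linalg.norm(passage_emb, axis=1, keepdims=True)
    sim = q @ p.T  # (num_query_tokens, num_passage_tokens) similarity matrix
    # MaxSim: best-matching passage token per query token, summed over the query.
    return float(sim.max(axis=1).sum())
```

With identical query and passage matrices the score equals the number of query tokens, since every query token finds a perfect match.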

These rich interactions allow ColBERT to surpass the quality of single-vector representation models, while scaling efficiently to large corpora. You can read more in our papers.
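Scoring every passage exhaustively is infeasible at the scale the NAVER talk describes, so retrieval is typically two-stage: first gather candidate passages whose tokens are near any query token, then run exact MaxSim only on those candidates. A rough sketch under my own assumptions; `retrieve` and its brute-force stage 1 are stand-ins for a real approximate-nearest-neighbor index such as FAISS:

```python
import numpy as np

def retrieve(query_emb, passage_embs, n_probe=3, k=2):
    """Two-stage ColBERT-style retrieval (brute-force ANN stand-in).

    Stage 1: for each query token, find the n_probe nearest passage tokens
             across the whole corpus; the passages they came from form the
             candidate set.
    Stage 2: score only the candidates with exact MaxSim and return the
             top-k passage indices, best first.
    """
    norm = lambda m: m / np.linalg.norm(m, axis=1, keepdims=True)
    q = norm(query_emb)

    # Flatten the corpus into one token matrix, remembering each token's passage.
    owners = np.concatenate([np.full(len(p), i) for i, p in enumerate(passage_embs)])
    corpus = norm(np.vstack(passage_embs))

    # Stage 1: candidate generation via per-query-token nearest neighbors.
    sims = q @ corpus.T                         # (n_query_tokens, n_corpus_tokens)
    nearest = np.argsort(-sims, axis=1)[:, :n_probe]
    candidates = np.unique(owners[nearest.ravel()])

    # Stage 2: exact MaxSim re-scoring of the candidate passages only.
    def maxsim(p):
        return float((q @ norm(p).T).max(axis=1).sum())

    scores = {int(i): maxsim(passage_embs[i]) for i in candidates}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

The design point is that stage 1 only has to be recall-oriented: as long as the true best passages land in the candidate set, the exact MaxSim pass in stage 2 restores the full-quality ranking.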
