Search Enhancement with Deep Learning, Proposal 1: Rethinking the Search-Result Ranking Model with ColBERT

2023. 1. 13. 13:54 | machine learning


Technical review planned for later, when time permits.

Search-result ranking model

 

 1. NAVER DEVIEW 2021

Big companies really are on another level...

 

Serving 30 Billion Vectors! NAVER Search Takes On ColBERT Vector Similarity Search, 반정호/전보성, NAVER Engineering (tv.naver.com)

ColBERT seems to be a BERT-based algorithm specialized for search. From Stanford.

 

2. GitHub source

stanford-futuredata/ColBERT: ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22) (github.com)

 


 

ColBERT is a fast and accurate retrieval model, enabling scalable BERT-based search over large text collections in tens of milliseconds.

Figure 1: ColBERT's late interaction, efficiently scoring the fine-grained similarity between a query and a passage.

As Figure 1 illustrates, ColBERT relies on fine-grained contextual late interaction: it encodes each passage into a matrix of token-level embeddings (shown above in blue). Then at search time, it embeds every query into another matrix (shown in green) and efficiently finds passages that contextually match the query using scalable vector-similarity (MaxSim) operators.
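The MaxSim scoring described above can be sketched in a few lines. A minimal NumPy illustration, not the project's actual implementation (which batches this on GPU): for each query token embedding, take the maximum cosine similarity over all passage token embeddings, then sum across query tokens.

```python
import numpy as np

def maxsim_score(query_emb: np.ndarray, passage_emb: np.ndarray) -> float:
    """ColBERT-style late-interaction score.

    query_emb:   (num_query_tokens,   dim) token embedding matrix
    passage_emb: (num_passage_tokens, dim) token embedding matrix
    """
    # Normalize rows so dot products become cosine similarities.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    p = passage_emb / np.linalg.norm(passage_emb, axis=1, keepdims=True)
    sim = q @ p.T  # (num_query_tokens, num_passage_tokens) similarity matrix
    # MaxSim: best-matching passage token per query token, summed over the query.
    return float(sim.max(axis=1).sum())
```

With identical query and passage matrices the score equals the number of query tokens, since every query token finds a perfect match.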

These rich interactions allow ColBERT to surpass the quality of single-vector representation models, while scaling efficiently to large corpora. You can read more in our papers.
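Scoring every passage exhaustively is infeasible at the scale the NAVER talk describes, so retrieval is typically two-stage: first gather candidate passages whose tokens are near any query token, then run exact MaxSim only on those candidates. A rough sketch under my own assumptions; `retrieve` and its brute-force stage 1 are stand-ins for a real approximate-nearest-neighbor index such as FAISS:

```python
import numpy as np

def retrieve(query_emb, passage_embs, n_probe=3, k=2):
    """Two-stage ColBERT-style retrieval (brute-force ANN stand-in).

    Stage 1: for each query token, find the n_probe nearest passage tokens
             across the whole corpus; the passages they came from form the
             candidate set.
    Stage 2: score only the candidates with exact MaxSim and return the
             top-k passage indices, best first.
    """
    norm = lambda m: m / np.linalg.norm(m, axis=1, keepdims=True)
    q = norm(query_emb)

    # Flatten the corpus into one token matrix, remembering each token's passage.
    owners = np.concatenate([np.full(len(p), i) for i, p in enumerate(passage_embs)])
    corpus = norm(np.vstack(passage_embs))

    # Stage 1: candidate generation via per-query-token nearest neighbors.
    sims = q @ corpus.T                         # (n_query_tokens, n_corpus_tokens)
    nearest = np.argsort(-sims, axis=1)[:, :n_probe]
    candidates = np.unique(owners[nearest.ravel()])

    # Stage 2: exact MaxSim re-scoring of the candidate passages only.
    def maxsim(p):
        return float((q @ norm(p).T).max(axis=1).sum())

    scores = {int(i): maxsim(passage_embs[i]) for i in candidates}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

The design point is that stage 1 only has to be recall-oriented: as long as the true best passages land in the candidate set, the exact MaxSim pass in stage 2 restores the full-quality ranking.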
