2023. 1. 13. 13:54ㆍmachine learning
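Planning a technical review later, when time permits.
Search result ranking models
1. Naver DEVIEW 2021
Big companies really are different...
Serve 30 billion vectors! Naver Search takes on ColBERT vector similarity search (naver.com)
ColBERT: appears to be a BERT-based algorithm specialized for retrieval. From Stanford.
2. GitHub source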
ColBERT is a fast and accurate retrieval model, enabling scalable BERT-based search over large text collections in tens of milliseconds.
Figure 1: ColBERT's late interaction, efficiently scoring the fine-grained similarity between a query and a passage.
As Figure 1 illustrates, ColBERT relies on fine-grained contextual late interaction: it encodes each passage into a matrix of token-level embeddings (shown above in blue). Then at search time, it embeds every query into another matrix (shown in green) and efficiently finds passages that contextually match the query using scalable vector-similarity (MaxSim) operators.
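To make the MaxSim idea concrete, here is a minimal sketch in plain Python. It is not the ColBERT codebase: the function names (`normalize`, `maxsim_score`, `rank_passages`) and the toy 2-dimensional vectors are made up for illustration, and it assumes every token vector is non-zero. In the real model the token embeddings come from BERT, and the search is done with an index rather than a brute-force loop.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a (non-zero) vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def maxsim_score(query_tokens: Sequence[Vector], passage_tokens: Sequence[Vector]) -> float:
    """Late-interaction score: for each query token, take the best-matching
    passage token (max cosine similarity), then sum over query tokens."""
    q = [normalize(v) for v in query_tokens]
    p = [normalize(v) for v in passage_tokens]
    total = 0.0
    for qv in q:
        total += max(sum(a * b for a, b in zip(qv, pv)) for pv in p)
    return total


def rank_passages(query_tokens: Sequence[Vector], passages: Sequence[Sequence[Vector]]) -> List[int]:
    """Return passage indices sorted from highest to lowest MaxSim score (brute force)."""
    scores = [maxsim_score(query_tokens, p) for p in passages]
    return sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)


# Toy "token embeddings" (hypothetical values, 2 dimensions for readability).
query = [[1.0, 0.0], [0.0, 1.0]]
passage_a = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # has a close match for each query token
passage_b = [[1.0, 1.0]]                          # only a partial match for each query token

ranking = rank_passages(query, [passage_b, passage_a])
```

Here `passage_a` should rank above `passage_b`, because each query token finds its own close match among the passage tokens. A single-vector model would have to squeeze that information into one embedding per passage.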
These rich interactions allow ColBERT to surpass the quality of single-vector representation models, while scaling efficiently to large corpora. You can read more in our papers:
- ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT (SIGIR'20).
- Relevance-guided Supervision for OpenQA with ColBERT (TACL'21).
- Baleen: Robust Multi-Hop Reasoning at Scale via Condensed Retrieval (NeurIPS'21).
- ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction (NAACL'22).
- PLAID: An Efficient Engine for Late Interaction Retrieval (CIKM'22).