Models
Supported Models
bookacle supports three kinds of models:
- Embedding Models - These models are used to embed text into a vector.
- Summarization Models - These models are used to summarize text.
- Question Answering Models - These models are used for question-answering on PDF documents.
bookacle comes with built-in implementations for all three kinds of models. All built-in implementations run locally out of the box: you do not need an OpenAI key to use bookacle (just good hardware).
You can add custom models by implementing the corresponding protocols. Because any class that implements a protocol is accepted, bookacle can support practically any model available.
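To illustrate what "implementing a protocol" means in Python, here is a minimal sketch of structural typing with `typing.Protocol`. The names `SupportsSummarize` and `FirstSentenceSummarizer` are hypothetical and chosen only for this illustration; they are not part of bookacle's API.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsSummarize(Protocol):
    """Hypothetical protocol, for illustration only (not bookacle's API)."""

    def summarize(self, text: str) -> str: ...


class FirstSentenceSummarizer:
    """Satisfies the protocol structurally: no inheritance is needed."""

    def summarize(self, text: str) -> str:
        # Naive "summary": keep everything up to the first period.
        head, _, _ = text.partition(".")
        return head.strip() + "."


def run(model: SupportsSummarize, text: str) -> str:
    # Any object with a matching `summarize` method is accepted here.
    return model.summarize(text)


summary = run(FirstSentenceSummarizer(), "Protocols are structural. No base class.")
print(summary)
```

The class never names the protocol, yet it type-checks wherever `SupportsSummarize` is expected. Custom models plug in the same way.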
Working with Embedding Models
The EmbeddingModelLike protocol
All embedding models in bookacle need to implement the EmbeddingModelLike protocol.
This is how the protocol is defined:
class EmbeddingModelLike(Protocol):
    """A protocol that defines the methods and attributes that an embedding model should implement."""

    @property
    def tokenizer(self) -> TokenizerLike:
        """
        Returns:
            The tokenizer used by the model.
        """
        ...

    @property
    def model_max_length(self) -> int:
        """
        Returns:
            The maximum length of the input that the model can accept.
        """
        ...

    @overload
    def embed(self, text: str) -> list[float]:
        """Embed a single input text.

        Args:
            text: The input text to embed.

        Returns:
            The embeddings of the input text.
        """
        ...

    @overload
    def embed(self, text: list[str]) -> list[list[float]]:
        """Embed a list of input texts.

        Args:
            text: The list of input texts to embed.

        Returns:
            The embeddings of the input texts.
        """
        ...

    def embed(self, text: str | list[str]) -> list[float] | list[list[float]]:
        """Embed the input text or list of texts.

        Args:
            text: The input text or list of input texts to embed.

        Returns:
            The embeddings of the input text or list of texts.
        """
        ...
Use an embedding model from sentence-transformers
bookacle supports any model from the sentence-transformers library via the SentenceTransformerEmbeddingModel class, which implements the EmbeddingModelLike protocol.
You can embed a list of texts:
from bookacle.models.embedding import SentenceTransformerEmbeddingModel
embedding_model = SentenceTransformerEmbeddingModel(model_name="all-MiniLM-L6-v2")
texts = ["This is a test", "This is another text"]
embeddings = embedding_model.embed(texts)
print(embeddings)
[[0.030612455680966377, 0.013831389136612415, -0.020843813195824623, 0.01632791757583618, -0.010231463238596916, -0.04798430949449539, -0.017313335090875626, 0.03728746995329857, 0.0458872951567173, 0.03440503776073456, -0.01995977759361267, -0.04465901851654053, -0.013102819211781025, 0.04284117370843887, -0.055393286049366, -0.05897996947169304, 0.013357817195355892, -0.04093948006629944, -0.046640243381261826, 0.030635887756943703, 0.034367453306913376, 0.060174837708473206, -0.059833962470293045, 0.01768563687801361, 0.006318147759884596, -0.011531705968081951, -0.05604183301329613, 0.02306288108229637, 0.035522907972335815, -0.0007312067900784314, -0.0045328824780881405, 0.057125385850667953, 0.06493885815143585, 0.022896159440279007, 0.039082884788513184, 0.015843145549297333, 0.07268378138542175, 0.047734133899211884, 0.008836441673338413, 0.03844047337770462, 0.0178163331001997, -0.09784718602895737, 0.019852813333272934, 0.026004815474152565, 0.004592681769281626, 0.05503907427191734, -0.04667206481099129, 0.03559368848800659, -0.06167946383357048, 0.0011078554671257734, -0.016140859574079514, -0.02640579268336296, -0.07949255406856537, -0.08490555733442307, 0.02612098678946495, -0.0003536582225933671, 0.03115987405180931, -0.02975994348526001, 0.07323474436998367, 0.05357594043016434, -0.015631122514605522, -0.016084633767604828, -0.02865142747759819, 0.03595297411084175, 0.09541890770196915, 0.027765456587076187, -0.04436489939689636, -0.08868944644927979, -0.012653685174882412, -0.05754895135760307, 0.00029446242842823267, 0.011904983781278133, 0.02711832895874977, 0.0427401065826416, 0.029987173154950142, 0.013047449290752411, -0.03863212466239929, -0.09747397154569626, 0.06669681519269943, 0.04018721356987953, -0.09090323001146317, -0.06058499589562416, -0.01149536482989788, 0.02387799136340618, -0.011324130930006504, 0.06687264144420624, 0.08630163967609406, -0.025462999939918518, -0.0988529697060585, -0.006116452161222696, -0.011836444959044456, 
0.055986322462558746, -0.06093982234597206, -0.0124645521864295, 0.016563139855861664, -0.021560925990343094, -0.027123969048261642, -0.0020143548026680946, -0.0597769059240818, 0.1048697829246521, -0.002733848989009857, 0.013900971971452236, -0.011673804372549057, 0.013032770715653896, -0.07930494844913483, -0.07446805387735367, 0.035116150975227356, -0.04340590909123421, 0.06082131341099739, 0.00016749520727898926, -0.0566425584256649, -0.026391491293907166, 0.061022572219371796, 0.05601632222533226, 0.017586346715688705, -0.02037360705435276, -0.11545398086309433, 0.04475245624780655, 0.029722360894083977, 0.033025164157152176, 0.08177473396062851, -0.007757130078971386, -0.0009870959911495447, -0.03362540155649185, 0.013657270930707455, -0.022546403110027313, 0.019304458051919937, -6.218771396084904e-33, -0.01287372037768364, -0.041750065982341766, 0.02567782998085022, 0.07849729806184769, -0.01629389449954033, 0.01920202374458313, -0.0353076346218586, 0.048524148762226105, -0.010347978211939335, -0.02134625054895878, -0.02838769555091858, -0.07118804007768631, -0.0011105583980679512, 0.016433585435152054, 0.08224150538444519, 0.11024076491594315, -0.013450153172016144, 0.10640890896320343, -0.07292719185352325, 0.06126890704035759, -0.055247943848371506, 0.031974971294403076, 0.001206150627695024, -0.10626255720853806, -0.08820132166147232, -0.05346548929810524, -0.012100035324692726, -0.0006844053859822452, -0.005452693905681372, 0.026338350027799606, -0.012515079230070114, 0.03960097208619118, -0.07330325990915298, 0.03817977011203766, -0.026006445288658142, -0.022073565050959587, 0.012753617018461227, -0.01975109800696373, -0.011463118717074394, 0.014134258031845093, -0.060215335339307785, -0.04115573689341545, -0.033304862678050995, 0.05939899757504463, 0.06408748775720596, -0.028469707816839218, -0.025994781404733658, -0.03844074904918671, 0.07594479620456696, 0.0013290714705362916, -0.00634453259408474, -0.0017964687431231141, -0.002401819685474038, 
-0.012808777391910553, -0.04043186455965042, 0.050200529396533966, 0.03893585875630379, 0.017597705125808716, -0.04384371265769005, 0.09622462093830109, 0.038134876638650894, 0.031135767698287964, -0.0758492648601532, 0.017818808555603027, -0.05012990161776543, 0.007719798944890499, -0.04694923013448715, -0.029028410091996193, 0.020347613841295242, 0.0519399493932724, 0.020630426704883575, -0.03577140346169472, 0.026522178202867508, 0.038080524653196335, -0.051192525774240494, -0.06872225552797318, -0.03543980047106743, 0.08038211613893509, -0.03772076219320297, -0.0019484650110825896, 0.037762049585580826, -0.10211655497550964, 0.04411964863538742, -0.07546437531709671, -0.040720872581005096, -0.01922151818871498, -0.02725924737751484, -0.08222347497940063, 0.003876697737723589, -0.061851996928453445, -0.031574055552482605, 0.024813687428832054, 0.004459897987544537, -0.06730470806360245, 0.07732278108596802, 2.7396774800164803e-33, -0.07120928168296814, 0.09144872426986694, -0.0604870580136776, 0.09686953574419022, 0.08657430112361908, -0.02067793533205986, 0.12994472682476044, 0.007910454645752907, -0.08014624565839767, 0.1802346408367157, -0.0012835395755246282, 0.03514110669493675, 0.042759232223033905, -0.032923728227615356, 0.03707115724682808, 0.015503033995628357, 0.06208204850554466, -0.0637759417295456, -0.00826842151582241, -0.031668417155742645, -0.09215668588876724, 0.12675628066062927, 0.05259893462061882, 0.06083960831165314, -0.06681504845619202, 0.041024234145879745, 0.023306608200073242, -0.0696190670132637, -0.0015632979338988662, -0.018594680353999138, 0.015709752216935158, 0.0037315862718969584, -0.06175531446933746, 0.006468655075877905, 0.031019357964396477, -0.013525241985917091, 0.1241416186094284, -0.03099038079380989, -0.053906723856925964, 0.08038649708032608, 0.014954712241888046, 0.11118161678314209, 0.11494535952806473, 0.10129859298467636, -0.008285170421004295, 0.018935466185212135, 0.018787886947393417, -0.09819530695676804, 
0.021266572177410126, 0.055501148104667664, -0.06605244427919388, -0.00862936582416296, 0.026670807972550392, 0.06042525917291641, -0.042968593537807465, 0.016999373212456703, -0.0363522469997406, -0.0009375516674481332, 0.024538744240999222, 0.01933225430548191, -0.08255282789468765, 0.07356218248605728, -0.03603252395987511, 0.04258878529071808, -0.0128817493095994, -0.022433863952755928, -0.07963509112596512, 0.07690770924091339, 0.040927913039922714, 0.010309535078704357, 0.0663880705833435, 0.03659835457801819, -0.13319484889507294, -0.05818798765540123, 0.06462288647890091, -0.09328170120716095, -0.04428894445300102, 0.006638913415372372, 0.02975262701511383, -0.043026696890592575, -0.04989458993077278, -0.07609459012746811, 0.003996340092271566, 0.034756213426589966, -0.07185260951519012, 0.09365355968475342, -0.02969425544142723, 0.032247986644506454, -0.04721519351005554, 0.026805879548192024, -0.0013494578888639808, -3.35526347043924e-05, -0.009545459412038326, -0.049362294375896454, -0.015678929165005684, -1.6930899349176798e-08, -0.009994355030357838, -0.049812640994787216, -0.005801562685519457, 0.011708877980709076, -0.030966216698288918, 0.07448813319206238, 0.04264145717024803, -0.053642142564058304, -0.05357576534152031, 0.005138280801475048, 0.09430532902479172, 0.0592377744615078, -0.06483539938926697, 0.047453366219997406, 0.0664837434887886, -0.08378928154706955, 0.013958822935819626, 0.001708249212242663, -0.010309797711670399, 0.07282484322786331, -0.09265510737895966, 0.029341895133256912, 0.030910231173038483, 0.040054284036159515, -0.01109660230576992, 0.056348562240600586, 0.025945425033569336, 0.08576096594333649, 0.003964757081121206, 0.024685867130756378, 0.055565573275089264, 0.06292921304702759, -0.040880974382162094, -0.05910543352365494, -0.02184230647981167, 0.062095899134874344, 0.03349875286221504, -0.03533301129937172, 0.0267812330275774, 0.01788383349776268, -0.10489799827337265, -0.005059277638792992, 0.0010323041351512074, 
0.04416537284851074, -0.05984349548816681, -0.04777807369828224, -0.07838761806488037, -0.02046036347746849, -0.030605660751461983, -0.05824664235115051, -0.03258360177278519, -0.005576286930590868, 0.04223371669650078, -0.020661145448684692, 0.011428107507526875, -0.04831383749842644, -0.036218300461769104, -0.010730714537203312, -0.08764959871768951, 0.036847177892923355, 0.11295994371175766, -0.016820374876260757, 0.09417641907930374, -0.04484279081225395], [-0.013409674167633057, 0.05422273650765419, -0.012761354446411133, -0.003200049512088299, 0.03609046712517738, -0.027999019250273705, 0.07846680283546448, -0.013265153393149376, 0.11536920070648193, -0.04115389660000801, 0.0166016835719347, 0.02461712248623371, -0.027169110253453255, -0.006817244458943605, -0.06958436220884323, 0.015025901608169079, 0.030989298596978188, -0.04469304904341698, -0.06639228016138077, 0.002000785432755947, 0.012641465291380882, 0.14097419381141663, -0.018916962668299675, 0.03220295533537865, 0.02342841774225235, 0.06915004551410675, -0.06172248348593712, 0.12996183335781097, 0.028342965990304947, 0.052997082471847534, -0.04949696362018585, 0.11866146326065063, 0.06798027455806732, 0.02636704407632351, 0.021692758426070213, -0.006293952465057373, -0.02747163362801075, 0.10144687443971634, -0.04329666122794151, 0.017107022926211357, -0.005475620273500681, -0.1162223294377327, -0.01428981963545084, 0.02461080811917782, 0.0023441272787749767, 0.029968956485390663, -0.07655058056116104, -0.040692903101444244, 0.03034655563533306, 0.012212011963129044, -0.03760629519820213, -0.04391976818442345, -0.053823985159397125, -0.024345140904188156, 0.04097140207886696, 0.1133681982755661, 0.04634343460202217, -0.04171735420823097, 0.061605993658304214, -0.03524138405919075, 0.03466030955314636, 0.07317081093788147, 0.01769956573843956, 0.04271915555000305, 0.11453939974308014, -0.009656463749706745, -0.050005584955215454, -0.0064838905818760395, -0.15254244208335876, 0.003397691296413541, 
0.07279632985591888, 0.029871443286538124, -0.021627379581332207, 0.02484232746064663, -0.03440726175904274, -0.024347681552171707, -0.04441449046134949, -0.051247738301754, -0.0069200098514556885, 0.012414912693202496, -0.045305293053388596, -0.018079020082950592, 0.014311539009213448, 0.008569728583097458, -0.08081069588661194, 0.034178510308265686, -0.027360523119568825, -0.06653930246829987, -0.00910444837063551, -0.0007475292659364641, 0.03391420096158981, 0.0058415113016963005, 0.09097747504711151, 0.007869628258049488, -0.07194098085165024, -0.0475233756005764, -0.11210166662931442, -0.022146044299006462, -0.06701599806547165, 0.13312245905399323, -0.03824518248438835, -0.006490915548056364, 0.009442590177059174, -0.015479057095944881, -0.04820578917860985, -0.10529038310050964, -0.0122987637296319, -0.00735439732670784, 0.026313068345189095, -0.057879652827978134, -0.03335835784673691, -0.007489229552447796, -0.09510820358991623, -0.01833377219736576, 0.04835982993245125, 0.027976877987384796, 0.06150103732943535, -0.0011439473601058125, 0.04285573959350586, 0.06363803893327713, -0.008691895753145218, -0.05781340226531029, -0.044351059943437576, 0.004791758488863707, -0.020956024527549744, -0.09381714463233948, 0.038750771433115005, -2.732741696004644e-33, -0.0026793337892740965, 0.013229786418378353, -0.025732608512043953, 0.02061227709054947, 0.06149739399552345, 0.03521405905485153, -0.04310207813978195, -0.04096528887748718, -0.04375782608985901, -0.11426932364702225, -0.0614008754491806, -0.04373839125037193, 0.03776666522026062, -0.07069312036037445, -0.01711208000779152, -0.049604203552007675, -0.03836197406053543, 0.1185254454612732, 0.07539859414100647, 0.032252222299575806, -0.03706010431051254, 0.1127210482954979, 0.028489885851740837, -0.05077983811497688, -0.01520354300737381, 0.031775761395692825, 0.03518928214907646, 0.00552782230079174, 0.018520168960094452, 0.025301096960902214, 0.012308867648243904, 0.05100933089852333, 
0.03741074353456497, -0.0235601793974638, 0.03964461386203766, 0.019408168271183968, 0.028130052611231804, -0.05097424238920212, -0.01400467287749052, -0.016896693035960197, 0.010420695878565311, -0.009123064577579498, -0.023002108559012413, -0.05087205395102501, 0.07496114820241928, 0.06271716207265854, 0.01702369935810566, -0.022203098982572556, -0.02816150337457657, -0.05796828866004944, -0.031894925981760025, 0.03728947788476944, 0.051321841776371, 0.006152238231152296, 0.015158654190599918, -0.003982344642281532, -0.005508152302354574, 0.0738743245601654, -0.004804818890988827, -0.006980707868933678, 0.003288507228717208, 0.02226649597287178, -0.0400947704911232, 0.014281767420470715, -0.023971589282155037, 0.01445505116134882, -0.016815539449453354, -0.017695611342787743, 0.039227165281772614, -0.026634477078914642, -0.06059538200497627, -0.018703101202845573, 0.02909553423523903, 0.024894120171666145, -0.009444073773920536, -0.07256387919187546, -0.0020334599539637566, 0.033345289528369904, 0.024411972612142563, 0.006007564719766378, -0.06679173558950424, -0.11798728257417679, -0.003403669223189354, 0.006011255085468292, -0.0025281149428337812, 0.03225762024521828, 0.02124553918838501, -0.0964810699224472, -0.0011573724914342165, 0.10348601639270782, -0.1366959810256958, 0.0394381619989872, -0.03272629529237747, 0.0059622968547046185, 0.03281940147280693, 9.2006503035387e-34, -0.048096396028995514, 0.06512829661369324, -0.04944799467921257, -0.021626846864819527, 0.0917009487748146, 0.025447316467761993, 0.05869617313146591, 0.010040703229606152, 0.013440764509141445, 0.06642237305641174, -0.026902498677372932, 0.05462001636624336, 0.09447095543146133, -0.05588965117931366, 0.03945977985858917, 0.03600950911641121, 0.1140575110912323, 0.02209986001253128, -0.003542839316651225, 0.026985537260770798, -0.047703567892313004, 0.04990284889936447, -0.051563311368227005, 0.0596298985183239, 0.030864978209137917, 0.015144355595111847, 0.05397241562604904, 
0.024021295830607414, -0.0475536473095417, -0.021129969507455826, 0.011311481706798077, -0.00578182702884078, 0.015052086673676968, -0.027789870277047157, 0.0034026377834379673, -0.018340326845645905, 0.09134574979543686, -0.04772049933671951, -0.14426125586032867, 0.08465246856212616, 0.0858253464102745, 0.014751513488590717, 0.025063764303922653, 0.0035462044179439545, -0.00598553754389286, -0.018565179780125618, -0.06410066038370132, -0.0381188802421093, 0.014023351483047009, 0.05712719261646271, -0.021907728165388107, -0.11641982942819595, -0.05747463181614876, -0.0027848200406879187, -0.0722687691450119, -0.002128751017153263, -0.0041648312471807, 0.01990804448723793, 0.02579299919307232, -0.07708142697811127, -0.010573351755738258, 0.03548162057995796, -0.08939314633607864, 0.07463721930980682, 0.09440936893224716, -0.05008842051029205, -0.03960919380187988, 0.04992206022143364, -0.03649460896849632, 0.0305943563580513, 0.0391831248998642, -0.08915091305971146, -0.16025546193122864, -0.03586980327963829, -0.012061011977493763, -0.023771710693836212, -0.005546705797314644, -0.015973707661032677, -0.09265945851802826, -0.007542621344327927, 0.11325205117464066, -0.045546334236860275, -0.004810459911823273, 0.07179175317287445, -0.014281295239925385, 0.03135373815894127, -0.05736076459288597, 0.014675918035209179, -0.010075701400637627, 0.004987342748790979, -0.07707076519727707, 0.033745113760232925, 0.010859940201044083, 0.07166961580514908, -0.02194942906498909, -1.7299866428288624e-08, 0.022551756352186203, -0.10146799683570862, -0.08867359906435013, -0.04218815639615059, 0.030636783689260483, 0.06206211820244789, 0.01646772399544716, -0.13293850421905518, -0.004454590380191803, -0.02087506651878357, 0.1039414331316948, 0.07758404314517975, 0.0017092915950343013, -0.004377019125968218, 0.024237168952822685, -0.006486197933554649, -0.0028265404980629683, -0.05986762046813965, 0.02342396415770054, -0.015133627690374851, 0.003996767569333315, 
0.04529254138469696, 0.002878433559089899, -0.006209454499185085, -0.03688574582338333, 0.06549371778964996, -0.021985584869980812, 0.014119823463261127, -0.030616212636232376, 6.007072897773469e-06, 0.06446658819913864, 0.05123062804341316, -0.04049473628401756, -0.05678323283791542, -0.005555553361773491, 0.010750820860266685, 0.055623818188905716, -0.005718015134334564, 0.07253670692443848, 0.031946249306201935, 0.02091604843735695, -0.01535823568701744, 0.0013207634910941124, 0.05684935301542282, 0.016686629503965378, 0.010076011531054974, 0.03392864018678665, -0.09180062264204025, 0.022942716255784035, -0.02175895683467388, 0.017377551645040512, -0.03734732046723366, 0.06608318537473679, 0.031051473692059517, 0.04631378874182701, -0.008197366259992123, 0.019672859460115433, 0.05536578968167305, -0.07264737784862518, 0.04302215576171875, 0.07741106301546097, -0.032122619450092316, 0.08090478926897049, -0.03179511800408363]]
You can also embed a single text:
text = "This is a test"
embeddings = embedding_model.embed(text)
print(embeddings)
[0.030612455680966377, 0.013831389136612415, -0.020843813195824623, 0.01632791757583618, -0.010231463238596916, -0.04798430949449539, -0.017313335090875626, 0.03728746995329857, 0.0458872951567173, 0.03440503776073456, -0.01995977759361267, -0.04465901851654053, -0.013102819211781025, 0.04284117370843887, -0.055393286049366, -0.05897996947169304, 0.013357817195355892, -0.04093948006629944, -0.046640243381261826, 0.030635887756943703, 0.034367453306913376, 0.060174837708473206, -0.059833962470293045, 0.01768563687801361, 0.006318147759884596, -0.011531705968081951, -0.05604183301329613, 0.02306288108229637, 0.035522907972335815, -0.0007312067900784314, -0.0045328824780881405, 0.057125385850667953, 0.06493885815143585, 0.022896159440279007, 0.039082884788513184, 0.015843145549297333, 0.07268378138542175, 0.047734133899211884, 0.008836441673338413, 0.03844047337770462, 0.0178163331001997, -0.09784718602895737, 0.019852813333272934, 0.026004815474152565, 0.004592681769281626, 0.05503907427191734, -0.04667206481099129, 0.03559368848800659, -0.06167946383357048, 0.0011078554671257734, -0.016140859574079514, -0.02640579268336296, -0.07949255406856537, -0.08490555733442307, 0.02612098678946495, -0.0003536582225933671, 0.03115987405180931, -0.02975994348526001, 0.07323474436998367, 0.05357594043016434, -0.015631122514605522, -0.016084633767604828, -0.02865142747759819, 0.03595297411084175, 0.09541890770196915, 0.027765456587076187, -0.04436489939689636, -0.08868944644927979, -0.012653685174882412, -0.05754895135760307, 0.00029446242842823267, 0.011904983781278133, 0.02711832895874977, 0.0427401065826416, 0.029987173154950142, 0.013047449290752411, -0.03863212466239929, -0.09747397154569626, 0.06669681519269943, 0.04018721356987953, -0.09090323001146317, -0.06058499589562416, -0.01149536482989788, 0.02387799136340618, -0.011324130930006504, 0.06687264144420624, 0.08630163967609406, -0.025462999939918518, -0.0988529697060585, -0.006116452161222696, -0.011836444959044456, 
0.055986322462558746, -0.06093982234597206, -0.0124645521864295, 0.016563139855861664, -0.021560925990343094, -0.027123969048261642, -0.0020143548026680946, -0.0597769059240818, 0.1048697829246521, -0.002733848989009857, 0.013900971971452236, -0.011673804372549057, 0.013032770715653896, -0.07930494844913483, -0.07446805387735367, 0.035116150975227356, -0.04340590909123421, 0.06082131341099739, 0.00016749520727898926, -0.0566425584256649, -0.026391491293907166, 0.061022572219371796, 0.05601632222533226, 0.017586346715688705, -0.02037360705435276, -0.11545398086309433, 0.04475245624780655, 0.029722360894083977, 0.033025164157152176, 0.08177473396062851, -0.007757130078971386, -0.0009870959911495447, -0.03362540155649185, 0.013657270930707455, -0.022546403110027313, 0.019304458051919937, -6.218771396084904e-33, -0.01287372037768364, -0.041750065982341766, 0.02567782998085022, 0.07849729806184769, -0.01629389449954033, 0.01920202374458313, -0.0353076346218586, 0.048524148762226105, -0.010347978211939335, -0.02134625054895878, -0.02838769555091858, -0.07118804007768631, -0.0011105583980679512, 0.016433585435152054, 0.08224150538444519, 0.11024076491594315, -0.013450153172016144, 0.10640890896320343, -0.07292719185352325, 0.06126890704035759, -0.055247943848371506, 0.031974971294403076, 0.001206150627695024, -0.10626255720853806, -0.08820132166147232, -0.05346548929810524, -0.012100035324692726, -0.0006844053859822452, -0.005452693905681372, 0.026338350027799606, -0.012515079230070114, 0.03960097208619118, -0.07330325990915298, 0.03817977011203766, -0.026006445288658142, -0.022073565050959587, 0.012753617018461227, -0.01975109800696373, -0.011463118717074394, 0.014134258031845093, -0.060215335339307785, -0.04115573689341545, -0.033304862678050995, 0.05939899757504463, 0.06408748775720596, -0.028469707816839218, -0.025994781404733658, -0.03844074904918671, 0.07594479620456696, 0.0013290714705362916, -0.00634453259408474, -0.0017964687431231141, -0.002401819685474038, 
-0.012808777391910553, -0.04043186455965042, 0.050200529396533966, 0.03893585875630379, 0.017597705125808716, -0.04384371265769005, 0.09622462093830109, 0.038134876638650894, 0.031135767698287964, -0.0758492648601532, 0.017818808555603027, -0.05012990161776543, 0.007719798944890499, -0.04694923013448715, -0.029028410091996193, 0.020347613841295242, 0.0519399493932724, 0.020630426704883575, -0.03577140346169472, 0.026522178202867508, 0.038080524653196335, -0.051192525774240494, -0.06872225552797318, -0.03543980047106743, 0.08038211613893509, -0.03772076219320297, -0.0019484650110825896, 0.037762049585580826, -0.10211655497550964, 0.04411964863538742, -0.07546437531709671, -0.040720872581005096, -0.01922151818871498, -0.02725924737751484, -0.08222347497940063, 0.003876697737723589, -0.061851996928453445, -0.031574055552482605, 0.024813687428832054, 0.004459897987544537, -0.06730470806360245, 0.07732278108596802, 2.7396774800164803e-33, -0.07120928168296814, 0.09144872426986694, -0.0604870580136776, 0.09686953574419022, 0.08657430112361908, -0.02067793533205986, 0.12994472682476044, 0.007910454645752907, -0.08014624565839767, 0.1802346408367157, -0.0012835395755246282, 0.03514110669493675, 0.042759232223033905, -0.032923728227615356, 0.03707115724682808, 0.015503033995628357, 0.06208204850554466, -0.0637759417295456, -0.00826842151582241, -0.031668417155742645, -0.09215668588876724, 0.12675628066062927, 0.05259893462061882, 0.06083960831165314, -0.06681504845619202, 0.041024234145879745, 0.023306608200073242, -0.0696190670132637, -0.0015632979338988662, -0.018594680353999138, 0.015709752216935158, 0.0037315862718969584, -0.06175531446933746, 0.006468655075877905, 0.031019357964396477, -0.013525241985917091, 0.1241416186094284, -0.03099038079380989, -0.053906723856925964, 0.08038649708032608, 0.014954712241888046, 0.11118161678314209, 0.11494535952806473, 0.10129859298467636, -0.008285170421004295, 0.018935466185212135, 0.018787886947393417, -0.09819530695676804, 
0.021266572177410126, 0.055501148104667664, -0.06605244427919388, -0.00862936582416296, 0.026670807972550392, 0.06042525917291641, -0.042968593537807465, 0.016999373212456703, -0.0363522469997406, -0.0009375516674481332, 0.024538744240999222, 0.01933225430548191, -0.08255282789468765, 0.07356218248605728, -0.03603252395987511, 0.04258878529071808, -0.0128817493095994, -0.022433863952755928, -0.07963509112596512, 0.07690770924091339, 0.040927913039922714, 0.010309535078704357, 0.0663880705833435, 0.03659835457801819, -0.13319484889507294, -0.05818798765540123, 0.06462288647890091, -0.09328170120716095, -0.04428894445300102, 0.006638913415372372, 0.02975262701511383, -0.043026696890592575, -0.04989458993077278, -0.07609459012746811, 0.003996340092271566, 0.034756213426589966, -0.07185260951519012, 0.09365355968475342, -0.02969425544142723, 0.032247986644506454, -0.04721519351005554, 0.026805879548192024, -0.0013494578888639808, -3.35526347043924e-05, -0.009545459412038326, -0.049362294375896454, -0.015678929165005684, -1.6930899349176798e-08, -0.009994355030357838, -0.049812640994787216, -0.005801562685519457, 0.011708877980709076, -0.030966216698288918, 0.07448813319206238, 0.04264145717024803, -0.053642142564058304, -0.05357576534152031, 0.005138280801475048, 0.09430532902479172, 0.0592377744615078, -0.06483539938926697, 0.047453366219997406, 0.0664837434887886, -0.08378928154706955, 0.013958822935819626, 0.001708249212242663, -0.010309797711670399, 0.07282484322786331, -0.09265510737895966, 0.029341895133256912, 0.030910231173038483, 0.040054284036159515, -0.01109660230576992, 0.056348562240600586, 0.025945425033569336, 0.08576096594333649, 0.003964757081121206, 0.024685867130756378, 0.055565573275089264, 0.06292921304702759, -0.040880974382162094, -0.05910543352365494, -0.02184230647981167, 0.062095899134874344, 0.03349875286221504, -0.03533301129937172, 0.0267812330275774, 0.01788383349776268, -0.10489799827337265, -0.005059277638792992, 0.0010323041351512074, 
0.04416537284851074, -0.05984349548816681, -0.04777807369828224, -0.07838761806488037, -0.02046036347746849, -0.030605660751461983, -0.05824664235115051, -0.03258360177278519, -0.005576286930590868, 0.04223371669650078, -0.020661145448684692, 0.011428107507526875, -0.04831383749842644, -0.036218300461769104, -0.010730714537203312, -0.08764959871768951, 0.036847177892923355, 0.11295994371175766, -0.016820374876260757, 0.09417641907930374, -0.04484279081225395]
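Because the embeddings are plain lists of floats, they can be compared without extra dependencies. Below is a minimal sketch of cosine similarity in pure Python. `cosine_similarity` is a helper defined here for illustration, not a bookacle function, and the short vectors are made-up stand-ins for real embeddings.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional vectors standing in for real embeddings.
vec_a = [1.0, 0.0, 1.0]
vec_b = [1.0, 0.0, 1.0]
vec_c = [0.0, 1.0, 0.0]

same = cosine_similarity(vec_a, vec_b)        # identical direction, close to 1
orthogonal = cosine_similarity(vec_a, vec_c)  # no shared direction, 0
print(same, orthogonal)
```

With a real model, the two arguments would be the results of `embedding_model.embed(...)` for the two texts being compared.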
To access the underlying tokenizer used by the model, use the tokenizer property:
print(embedding_model.tokenizer)
BertTokenizerFast(name_or_path='sentence-transformers/all-MiniLM-L6-v2', vocab_size=30522, model_max_length=256, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'unk_token': '[UNK]', 'sep_token': '[SEP]', 'pad_token': '[PAD]', 'cls_token': '[CLS]', 'mask_token': '[MASK]'}, clean_up_tokenization_spaces=True), added_tokens_decoder={
0: AddedToken("[PAD]", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
100: AddedToken("[UNK]", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
101: AddedToken("[CLS]", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
102: AddedToken("[SEP]", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
103: AddedToken("[MASK]", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}
You can also check the maximum length of a sequence supported by the model:
print(embedding_model.model_max_length)
256
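A common use of this value is splitting long inputs so that no piece exceeds the limit. The following is a minimal sketch, assuming the tokens are already available as a list. The whitespace split is a stand-in for a real tokenizer, and `split_into_windows` is a hypothetical helper, not part of bookacle.

```python
def split_into_windows(tokens: list[str], max_length: int) -> list[list[str]]:
    """Split tokens into consecutive windows of at most `max_length` items."""
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    return [tokens[i : i + max_length] for i in range(0, len(tokens), max_length)]


# Whitespace splitting stands in for the model's real tokenizer.
tokens = "one two three four five six seven".split()
windows = split_into_windows(tokens, max_length=3)
print(windows)
# [['one', 'two', 'three'], ['four', 'five', 'six'], ['seven']]
```

In practice, `max_length` would come from `embedding_model.model_max_length`, and the token count should be measured with the model's own tokenizer, since subword tokenizers usually produce more tokens than a whitespace split.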
GPU Inference
SentenceTransformerEmbeddingModel supports GPU inference. To use a GPU, set the use_gpu attribute to True when creating the embedding model.
Implement a custom embedding model
The following example implements a custom embedding model backed by OpenAI Embeddings. The class, OpenAIEmbeddingModel, implements the EmbeddingModelLike protocol:
Dependencies
This example requires the openai and tiktoken packages. Install them with either pip or uv:
$ python -m pip install openai tiktoken
$ uv add openai tiktoken
from typing import overload

import tiktoken
from openai import OpenAI
from tiktoken.core import Encoding


class OpenAIEmbeddingModel:
    def __init__(
        self, model_name: str, tokenizer_name: str = "", dimensions: int = 256
    ) -> None:
        self.model_name = model_name
        self.dimensions = dimensions
        self.tokenizer_name = tokenizer_name
        # The client reads the API key from the OPENAI_API_KEY environment variable
        self._client = OpenAI()
        # If the tokenizer is not provided, automatically fetch it for the model
        self._tokenizer = (
            tiktoken.encoding_for_model(model_name)
            if not tokenizer_name
            else tiktoken.get_encoding(tokenizer_name)
        )
        # See: https://platform.openai.com/docs/guides/embeddings/embedding-models
        self._model_max_length = 8191

    def __repr__(self) -> str:
        return f"{self.__class__.__name__}(model_name={self.model_name!r}, dimensions={self.dimensions!r}, tokenizer_name={self.tokenizer_name!r})"

    @property
    def tokenizer(self) -> Encoding:
        return self._tokenizer

    @property
    def model_max_length(self) -> int:
        return self._model_max_length

    @overload
    def embed(self, text: str) -> list[float]: ...

    @overload
    def embed(self, text: list[str]) -> list[list[float]]: ...

    def embed(self, text: str | list[str]) -> list[float] | list[list[float]]:
        # The API accepts a list of inputs, so wrap a single string in a list
        response = self._client.embeddings.create(
            input=[text] if isinstance(text, str) else text,
            model=self.model_name,
            dimensions=self.dimensions,
            encoding_format="float",
        )
        # Return a flat vector for a single string, one vector per item otherwise
        if isinstance(text, str):
            return response.data[0].embedding
        return [item.embedding for item in response.data]
Example usage:
embedding_model = OpenAIEmbeddingModel(model_name="text-embedding-3-small")
text = "This is a test"
embeddings = embedding_model.embed(text)
print(embeddings)
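When embedding many texts through a remote API, it can help to send them in smaller batches rather than in one request. The sketch below shows one way to do that around any function that maps a list of texts to a list of vectors. `embed_in_batches` and `fake_embed` are hypothetical names used for illustration; `fake_embed` stands in for a real call such as `embedding_model.embed` so that the example runs without network access or an API key.

```python
from typing import Callable


def embed_in_batches(
    embed: Callable[[list[str]], list[list[float]]],
    texts: list[str],
    batch_size: int = 2,
) -> list[list[float]]:
    """Embed `texts` in chunks of `batch_size`, preserving input order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    embeddings: list[list[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start : start + batch_size]
        embeddings.extend(embed(batch))
    return embeddings


def fake_embed(batch: list[str]) -> list[list[float]]:
    # Stand-in for a real embedding call: one 1-dimensional "vector" per text.
    return [[float(len(text))] for text in batch]


vectors = embed_in_batches(fake_embed, ["a", "bb", "ccc"], batch_size=2)
print(vectors)
# [[1.0], [2.0], [3.0]]
```

A suitable batch size depends on the provider's request limits, which this sketch does not attempt to model.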