Baseten

Software Development

San Francisco, CA 4,241 followers

Fast, scalable inference in our cloud or yours

About us

At Baseten, we provide all the infrastructure you need to deploy and serve ML models performantly, scalably, and cost-efficiently. Get started in minutes and avoid getting tangled in complex deployment processes. Deploy best-in-class open-source models, or bring your own and take advantage of optimized serving. Our horizontally scalable services take you from prototype to production, with light-speed inference on infrastructure that autoscales with your traffic. Best in class doesn't have to mean breaking the bank: run your models on the best infrastructure without running up costs by taking advantage of our scale-to-zero feature.

Website
https://www.baseten.co/
Industry
Software Development
Company size
11-50 employees
Headquarters
San Francisco, CA
Type
Privately Held
Specialties
developer tools and software engineering

Updates

🎉 We’re excited to announce Baseten Self-hosted for unparalleled control over AI model deployments! 👉🏻 Check out our announcement blog to learn more: https://lnkd.in/gVR6GhQ6

After working with countless AI builders across different industries, we consistently heard the need for a high-performance inference solution running in their VPC to:
• Meet strict data residency requirements
• Align with organizational and industry compliance standards
• Leverage existing cloud commitments and resources
• Customize hardware and GPU usage

Both Baseten Cloud and Baseten Self-hosted offer enterprise-grade security, performance, and reliability. Baseten Self-hosted is specifically designed for companies and enterprises needing enhanced control over infrastructure and data, while gaining the performance, reliability, and scale we specialize in.

🥇 Baseten Self-hosted enables you to run inference in your own VPC with the same user experience as our Cloud offering. Model inference inputs and outputs go directly to your compute—they never touch our premises.

💚 We love to support our customers with state-of-the-art AI inference. If Baseten Self-hosted can help you meet your security and compliance needs, provide necessary control over hardware, or leverage your existing resources, get in touch! https://lnkd.in/gSQWwH5m

LLM inference on GPUs hits bottlenecks at two stages:
– GPU compute (in FLOPS) during prefill, when the input is processed to generate the first token.
– GPU memory bandwidth (in GB/s) for the rest of inference, when the autoregressive model generates each subsequent token.

To prove this:
– Calculate the ops:byte ratio for a given GPU.
– Calculate the arithmetic intensity of each stage of LLM inference.
– Compare the two values to see where inference is compute-bound and where it is memory-bound.

Follow the math for yourself: https://lnkd.in/edpvU8PM
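
As a back-of-the-envelope illustration of that comparison, here is a minimal sketch in Python. It assumes published NVIDIA A100 SXM specs (312 TFLOPS dense FP16, ~2,039 GB/s HBM bandwidth) and a hypothetical 2,048-token prompt; swap in your own GPU's numbers.

```python
# Rough sketch: compare a GPU's ops:byte ratio to the arithmetic
# intensity of each inference stage. Constants assume an A100 SXM
# (312 TFLOPS dense FP16, ~2,039 GB/s HBM bandwidth).
PEAK_FLOPS = 312e12        # peak FP16 compute, ops/second
PEAK_BANDWIDTH = 2.039e12  # peak memory bandwidth, bytes/second

ops_to_byte = PEAK_FLOPS / PEAK_BANDWIDTH  # ~153 ops per byte moved

def bound(arithmetic_intensity: float) -> str:
    """Memory-bound if there isn't enough math per byte to keep compute busy."""
    return "compute-bound" if arithmetic_intensity > ops_to_byte else "memory-bound"

# During decode at batch size 1, each weight byte loaded from HBM is used for
# roughly one multiply-accumulate, so intensity is ~1 op/byte. During prefill,
# all prompt tokens share a single read of the weights, so intensity grows
# with prompt length.
prompt_tokens = 2048  # hypothetical prompt length

print(f"ops:byte ratio:               {ops_to_byte:.0f}")
print(f"decode  (~1 op/byte):         {bound(1)}")
print(f"prefill (~{prompt_tokens} ops/byte):      {bound(prompt_tokens)}")
```

With these assumptions, decode sits far below the ~153 ops/byte ratio while a long prefill sits far above it, matching the memory-bound/compute-bound split described above.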

A guide to LLM inference and performance (baseten.co)

How do you measure and maximize GPU utilization? Three stats to consider:
🏃 Compute usage: what percentage of the time is a GPU running a kernel vs. sitting idle?
📖 Memory usage: how much of the GPU’s VRAM is active during inference?
🚚 Memory bandwidth usage: how much of the available bandwidth is being used to send data to the compute cores?

Check out Marius Killinger and Philip Kiely's thoughts on optimizing usage to improve throughput and cost: https://lnkd.in/gP-Gmnwe
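
For reference, all three stats can be read programmatically. Here's a minimal sketch using NVIDIA's NVML Python bindings (pip install nvidia-ml-py); note that NVML reports bandwidth as a coarse activity percentage rather than GB/s, and device index 0 is an assumption.

```python
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # assumes GPU 0

# Percentage of time over the last sample period a kernel was running.
util = pynvml.nvmlDeviceGetUtilizationRates(handle)
print(f"Compute usage:          {util.gpu}%")

# Bytes of VRAM currently allocated vs. total.
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
print(f"Memory usage:           {mem.used / 1e9:.1f} / {mem.total / 1e9:.1f} GB")

# Percentage of time global memory was being read or written -- a coarse
# proxy for bandwidth usage, not an exact GB/s figure.
print(f"Memory bandwidth usage: {util.memory}%")

pynvml.nvmlShutdown()
```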

Launch Faster Whisper Small for rapid transcription from our model library! The Whisper models are state-of-the-art automatic speech recognition (ASR) models commonly used for transcription. With only 244M parameters, Whisper Small strikes a balance between speed and accuracy—and the Faster Whisper implementation runs up to 4x faster with no loss in accuracy. ⚡️

👀 Check out our audio transcription models: https://lnkd.in/e7KqeSuZ
🚀 Launch Faster Whisper Small from our model library: https://lnkd.in/ecR63E75
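
If you want to try the underlying open-source model locally first, here's a minimal sketch using the faster-whisper Python package; the audio path and CUDA device are placeholder assumptions.

```python
# Minimal local transcription sketch with the open-source faster-whisper
# package (pip install faster-whisper). "audio.mp3" is a placeholder path;
# drop device/compute_type to run on CPU instead.
from faster_whisper import WhisperModel

model = WhisperModel("small", device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.mp3")

print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```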

🏆 After considering metrics like visual output quality, prompt adherence, and accurate word generation, Philip Kiely put together a list of the best open-source image generation models. The contenders:
🏋♀️ FLUX.1 by Black Forest Labs
🏃♀️ Stable Diffusion 3 by Stability AI
🚴♂️ SDXL Lightning by ByteDance
🤺 Playground 2.5 by Playground

🥇 And the winner is... up to you! Let us know which images look best to you in the comments, and check the article for our thoughts.

Bland AI announced $22M of funding and launched on Product Hunt today! 🎊 We’re so pumped to support the future of AI phone calling that we decided to throw an end-of-summer party with them. 🍸 Check the comments for the registration link.

With Baseten, Bland reduced end-to-end call latency from 3 seconds to under 400 milliseconds and gained seamless traffic-based autoscaling to meet customer demand—with 50x growth in usage and 100% uptime to date. Check out the story, support their Product Hunt launch, and come celebrate with us!

toby founders Lucas Campa 🤌 and Vincent Wilmet 🤌 came to Baseten one week away from their startup’s Product Hunt launch. Their AI-powered real-time translation service allows people to have a live video call while speaking different languages.

After working with our engineers, Vincent and Lucas migrated from their development infrastructure to an ultra-low-latency, production-ready deployment on Baseten—and reached #3 on Product Hunt on launch day, with zero minutes of downtime. 🔥

Read their story: https://lnkd.in/efz2_DKb
