Baseten

Software Development

San Francisco, CA 4,241 followers

Fast, scalable inference in our cloud or yours

About us

At Baseten, we provide all the infrastructure you need to deploy and serve ML models performantly, scalably, and cost-efficiently. Get started in minutes and avoid getting tangled in complex deployment processes. Deploy best-in-class open-source models, or bring your own and take advantage of optimized serving. Our horizontally scalable services take you from prototype to production, with light-speed inference on infrastructure that autoscales with your traffic. Best in class doesn't have to mean breaking the bank: run your models on the best infrastructure without running up costs by taking advantage of our scale-to-zero feature.

Website
https://www.baseten.co/
Industry
Software Development
Company size
11-50 employees
Headquarters
San Francisco, CA
Type
Privately Held
Specialties
developer tools and software engineering

Updates

🎉 We’re excited to announce Baseten Self-hosted for unparalleled control over AI model deployments! 👉🏻 Check out our announcement blog to learn more: https://lnkd.in/gVR6GhQ6

After working with countless AI builders across different industries, we consistently heard the need for a high-performance inference solution running in their VPC to:
• Meet strict data residency requirements
• Align with organizational and industry compliance standards
• Leverage existing cloud commitments and resources
• Customize hardware and GPU usage

Both Baseten Cloud and Baseten Self-hosted offer enterprise-grade security, performance, and reliability. Baseten Self-hosted is specifically designed for companies and enterprises needing enhanced control over infrastructure and data, while gaining the performance, reliability, and scale we specialize in.

🥇 Baseten Self-hosted enables you to run inference in your own VPC with the same user experience as our Cloud offering. Model inference inputs and outputs go directly to your compute—they never touch our premises.

💚 We love to support our customers with state-of-the-art AI inference. If Baseten Self-hosted can help you meet your security and compliance needs, provide necessary control over hardware, or leverage your existing resources, get in touch! https://lnkd.in/gSQWwH5m

LLM inference on GPUs hits bottlenecks at two stages:
– GPU compute (in FLOPS) during prefill, when the input is processed to generate the first token.
– GPU memory bandwidth (in GB/s) for the rest of inference, when the autoregressive model generates each subsequent token.

To prove this:
– Calculate the ops:byte ratio for a given GPU.
– Calculate the arithmetic intensity of each stage of LLM inference.
– Compare the two values to see where inference is compute-bound and where it is memory-bound.

Follow the math for yourself: https://lnkd.in/edpvU8PM
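
As a back-of-the-envelope illustration of that comparison, here is a minimal sketch in Python. It assumes published NVIDIA A100 SXM specs (312 TFLOPS dense FP16, ~2,039 GB/s HBM bandwidth) and a hypothetical 2,048-token prompt; swap in your own GPU's numbers.

```python
# Rough sketch: compare a GPU's ops:byte ratio to the arithmetic
# intensity of each inference stage. Constants assume an A100 SXM
# (312 TFLOPS dense FP16, ~2,039 GB/s HBM bandwidth).
PEAK_FLOPS = 312e12        # peak FP16 compute, ops/second
PEAK_BANDWIDTH = 2.039e12  # peak memory bandwidth, bytes/second

ops_to_byte = PEAK_FLOPS / PEAK_BANDWIDTH  # ~153 ops per byte moved

def bound(arithmetic_intensity: float) -> str:
    """Memory-bound if there isn't enough math per byte to keep compute busy."""
    return "compute-bound" if arithmetic_intensity > ops_to_byte else "memory-bound"

# During decode at batch size 1, each weight byte loaded from HBM is used for
# roughly one multiply-accumulate, so intensity is ~1 op/byte. During prefill,
# all prompt tokens share a single read of the weights, so intensity grows
# with prompt length.
prompt_tokens = 2048  # hypothetical prompt length

print(f"ops:byte ratio:               {ops_to_byte:.0f}")
print(f"decode  (~1 op/byte):         {bound(1)}")
print(f"prefill (~{prompt_tokens} ops/byte):      {bound(prompt_tokens)}")
```

With these assumptions, decode sits far below the ~153 ops/byte ratio while a long prefill sits far above it, matching the memory-bound/compute-bound split described above.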

A guide to LLM inference and performance (baseten.co)

How do you measure and maximize GPU utilization? Three stats to consider:
🏃 Compute usage: what percentage of the time is a GPU running a kernel vs. sitting idle?
📖 Memory usage: how much of the GPU’s VRAM is active during inference?
🚚 Memory bandwidth usage: how much of the available bandwidth is being used to send data to the compute cores?

Check out Marius Killinger and Philip Kiely's thoughts on optimizing usage to improve throughput and cost: https://lnkd.in/gP-Gmnwe
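
For reference, all three stats can be read programmatically. Here's a minimal sketch using NVIDIA's NVML Python bindings (pip install nvidia-ml-py); note that NVML reports bandwidth as a coarse activity percentage rather than GB/s, and device index 0 is an assumption.

```python
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # assumes GPU 0

# Percentage of time over the last sample period a kernel was running.
util = pynvml.nvmlDeviceGetUtilizationRates(handle)
print(f"Compute usage:          {util.gpu}%")

# Bytes of VRAM currently allocated vs. total.
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
print(f"Memory usage:           {mem.used / 1e9:.1f} / {mem.total / 1e9:.1f} GB")

# Percentage of time global memory was being read or written -- a coarse
# proxy for bandwidth usage, not an exact GB/s figure.
print(f"Memory bandwidth usage: {util.memory}%")

pynvml.nvmlShutdown()
```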

Launch Faster Whisper Small for rapid transcription from our model library! The Whisper models are state-of-the-art automatic speech recognition (ASR) models commonly used for transcription. With only 244M parameters, Whisper Small strikes a balance between speed and accuracy—and the Faster Whisper implementation runs up to 4x faster with no loss in accuracy. ⚡️

👀 Check out our audio transcription models: https://lnkd.in/e7KqeSuZ
🚀 Launch Faster Whisper Small from our model library: https://lnkd.in/ecR63E75
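
If you want to try the underlying open-source model locally first, here's a minimal sketch using the faster-whisper Python package; the audio path and CUDA device are placeholder assumptions.

```python
# Minimal local transcription sketch with the open-source faster-whisper
# package (pip install faster-whisper). "audio.mp3" is a placeholder path;
# drop device/compute_type to run on CPU instead.
from faster_whisper import WhisperModel

model = WhisperModel("small", device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.mp3")

print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```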

🏆 After considering metrics like visual output quality, prompt adherence, and accurate word generation, Philip Kiely put together a list of the best open-source image generation models. The contenders:
🏋♀️ FLUX.1 by Black Forest Labs
🏃♀️ Stable Diffusion 3 by Stability AI
🚴♂️ SDXL Lightning by ByteDance
🤺 Playground 2.5 by Playground

🥇 And the winner is... up to you! Let us know which images look best to you in the comments, and check the article for our thoughts.

Bland AI announced $22M of funding and launched on Product Hunt today! 🎊 We’re so pumped to support the future of AI phone calling that we decided to throw an end-of-summer party with them. 🍸 Check the comments for the registration link.

With Baseten, Bland reduced end-to-end call latency from 3 seconds to under 400 milliseconds and gained seamless traffic-based autoscaling to meet customer demand—with 50x growth in usage and 100% uptime to date. Check out the story, support their Product Hunt launch, and come celebrate with us!

toby founders Lucas Campa 🤌 and Vincent Wilmet 🤌 came to Baseten one week away from their startup’s Product Hunt launch. Their AI-powered real-time translation service allows people to have a live video call while speaking different languages.

After working with our engineers, Vincent and Lucas migrated from their development infrastructure to an ultra-low-latency, production-ready deployment on Baseten—and reached #3 on Product Hunt on launch day, with zero minutes of downtime. 🔥

Read their story: https://lnkd.in/efz2_DKb
