About the Asisten Pintar Web App (Technical Documentation)

Asisten Pintar Technical Documentation

1. Introduction

Asisten Pintar is a web application that automates minute-taking for meetings, seminars, and discussions. It uses AI models served through the Groq API (a Whisper speech-recognition model and the Llama 4 Scout Large Language Model) to transcribe audio, extract text from images (OCR), and produce structured, context-aware summaries.

Key Features:

Multi-Source Audio Recording: Capable of recording from a microphone, system audio (e.g., from Zoom or Google Meet), or both simultaneously.
Audio-to-Text Transcription: Uses the Whisper Large v3 Turbo model via the Groq API to convert recorded speech into text with high accuracy.
Text Extraction from Images (OCR): Uses the Llama 4 Scout model via the Groq API to recognize and extract text from images or screenshots.
Smart & Contextual Summaries: Uses the Llama 4 Scout model via the Groq API to generate structured summaries tailored to the event's theme (meeting, seminar, interview, etc.), identifying key points, decisions, and action items.
Multi-Format Export: Allows users to save complete reports in .txt, .md, or print them as a .pdf. Additionally, all audio and image files can be downloaded as a .zip archive.
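This document does not describe how microphone and system audio are combined. The following is a minimal sketch under stated assumptions. It targets a Chromium-based browser and captures system or tab audio through getDisplayMedia, which only works if the user chooses to share audio. Browser support for this varies. The microphone and system audio are mixed with the Web Audio API, and MediaRecorder then records the mix. The function name captureAudio and its options are hypothetical and are not taken from index.html.

```javascript
// Sketch only: captureAudio() and its options are hypothetical names.
// Assumes a browser with getUserMedia, getDisplayMedia, Web Audio and MediaRecorder.
async function captureAudio({ mic = true, system = false } = {}) {
  if (!mic && !system) {
    throw new Error("Select at least one audio source");
  }
  const owned = []; // every stream opened here, so all devices can be released later
  const inputs = []; // audio-only streams fed into the mixer
  const stopAll = () => owned.forEach((s) => s.getTracks().forEach((t) => t.stop()));

  try {
    if (mic) {
      const micStream = await navigator.mediaDevices.getUserMedia({ audio: true });
      owned.push(micStream);
      inputs.push(micStream);
    }
    if (system) {
      // System/tab audio is exposed through screen sharing; most browsers require
      // video to be requested, and audio is present only if the user shared it.
      const display = await navigator.mediaDevices.getDisplayMedia({ video: true, audio: true });
      owned.push(display);
      if (display.getAudioTracks().length === 0) {
        throw new Error("No system audio was shared");
      }
      inputs.push(new MediaStream(display.getAudioTracks()));
    }
  } catch (err) {
    stopAll(); // do not leave the microphone or screen share running on failure
    throw err;
  }

  // Mix all inputs into one destination stream so a single MediaRecorder records them.
  const ctx = new AudioContext();
  const destination = ctx.createMediaStreamDestination();
  for (const stream of inputs) {
    ctx.createMediaStreamSource(stream).connect(destination);
  }

  const recorder = new MediaRecorder(destination.stream);
  const chunks = [];
  recorder.ondataavailable = (event) => {
    if (event.data.size > 0) chunks.push(event.data);
  };
  const finished = new Promise((resolve) => {
    recorder.onstop = () => resolve(new Blob(chunks, { type: recorder.mimeType }));
  });
  recorder.start();

  return {
    // Stops recording, releases the devices and resolves with the recorded audio Blob.
    async stop() {
      recorder.stop();
      const blob = await finished;
      stopAll();
      await ctx.close();
      return blob;
    },
  };
}
```

Usage: call const session = await captureAudio({ mic: true, system: true }); and later call const blob = await session.stop();. All sources are mixed into one AudioContext destination, so one MediaRecorder produces one file no matter how many sources are active.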

2. System Architecture

2.1 Architecture Diagram

[Frontend: Browser (index.html)]
 |
 |--- Media Recording (MediaRecorder, getUserMedia)
 |--- User Interaction (UI Events)
 |
 | REST API (HTTPS, JSON, FormData)
 v
[Backend: Node.js on Cloud Run (index.js)]
 |
 |--- 1. /api/transcribe
 |--- 2. /api/analyze-image
 |--- 3. /api/summarize
 |--- 4. /api/generate-zip
 |
 | Groq Cloud API
 v
[AI Models: Llama-4 Scout, Whisper V3 Large Turbo]

2.2 Main Components

Frontend (Client-Side): Built with index.html. Handles the entire user interface (UI), manages audio recording through Web APIs, handles image uploads and previews, and formats and displays the final output. Temporary session state is kept in the browser's localStorage.
Backend (Server-Side): Built with index.js (Node.js & Express.js). Acts as a bridge between the frontend and the Groq API, providing RESTful endpoints to process files and return the results.
AI Services (Groq API): Uses the whisper-large-v3-turbo model for transcription and meta-llama/llama-4-scout-17b-16e-instruct for OCR and summarization.
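The bridge role described above can be sketched for the transcription route. This is a minimal sketch, and several of its assumptions are not stated in this document:
- multer is configured with memoryStorage, so the upload is available as req.file.buffer.
- The handler replies with a JSON object { text }.
- The factory name createTranscribeHandler is hypothetical.

The groq-sdk client and its toFile helper are passed in as parameters. This keeps the handler testable without network access.

```javascript
// Sketch of a /api/transcribe handler. Assumptions (not from the source): multer
// memoryStorage, the { text } response shape, and the createTranscribeHandler name.
// `groq` is a groq-sdk client and `toFile` is groq-sdk's file helper; both are
// injected so the handler can be exercised with fakes.
function createTranscribeHandler({ groq, toFile, model = "whisper-large-v3-turbo" }) {
  return async function transcribeHandler(req, res) {
    if (!req.file) {
      return res.status(400).json({ error: "Missing audio in field 'file'" });
    }
    try {
      // Keep the original filename so the API can recognize the audio format.
      const file = await toFile(req.file.buffer, req.file.originalname || "recording.webm");
      const result = await groq.audio.transcriptions.create({ file, model });
      return res.json({ text: result.text });
    } catch (err) {
      console.error("Groq transcription failed:", err.message);
      return res.status(502).json({ error: "Transcription failed" });
    }
  };
}

// Possible wiring in index.js (requires: npm install express multer groq-sdk;
// check the import style against the installed groq-sdk version):
// const express = require("express");
// const multer = require("multer");
// const Groq = require("groq-sdk");
// const { toFile } = require("groq-sdk");
// const app = express();
// const upload = multer({ storage: multer.memoryStorage() });
// const groq = new Groq({ apiKey: process.env.GROQ_API_KEY_1 });
// app.post("/api/transcribe", upload.single("file"), createTranscribeHandler({ groq, toFile }));
```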

3. Technology Stack

Frontend: TailwindCSS, Vanilla JS (ES6 Modules), Web APIs (MediaRecorder, getUserMedia), browser-image-compression.
Backend: Node.js, Express.js, groq-sdk, multer, archiver.
AI Models: Whisper V3 Large Turbo, Llama 4 Scout 17B Instruct.

4. Backend API

4.1 Audio Transcription

Endpoint: POST /api/transcribe
Request: FormData (multipart/form-data) with the audio in a field named file.
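On the frontend, a recorded Blob could be posted to this endpoint as shown below. This is a minimal sketch:
- The helper names are hypothetical.
- baseUrl stands for the value of BACKEND_URL.
- The response format is not documented here, so the parsed JSON is returned unchanged.

```javascript
// Hypothetical frontend helpers; baseUrl stands for the BACKEND_URL value in index.html.
function buildTranscribeForm(audioBlob, filename = "recording.webm") {
  const form = new FormData();
  // The endpoint reads the audio from the field named "file".
  form.append("file", audioBlob, filename);
  return form;
}

async function transcribe(baseUrl, audioBlob) {
  // No manual Content-Type header: fetch sets multipart/form-data with the boundary.
  const res = await fetch(`${baseUrl}/api/transcribe`, {
    method: "POST",
    body: buildTranscribeForm(audioBlob),
  });
  if (!res.ok) {
    throw new Error(`Transcription failed: HTTP ${res.status}`);
  }
  // The response shape is not documented here, so the parsed JSON is returned as-is.
  return res.json();
}
```

The default filename is only a placeholder. Some browsers record in formats other than WebM, so a filename whose extension matches blob.type is safer.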

4.2 Image Analysis (OCR)

Endpoint: POST /api/analyze-image
Request: FormData (multipart/form-data) with the image in a field named file.
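For OCR, the backend has to pass the uploaded image to the vision model. This is a minimal sketch with the following assumptions:
- The image is available as a Node.js Buffer.
- It is sent inline as a base64 data URL in an OpenAI-compatible image_url content part, which Groq's chat completions API accepts for vision models.
- The prompt text and the name buildOcrRequest are hypothetical.

Groq limits the size of base64-encoded image requests. Compressing images in the browser beforehand (browser-image-compression is part of the stack) helps keep uploads within that limit.

```javascript
// Hypothetical helper: builds a Groq chat-completions request that asks the vision
// model to transcribe the text in an image sent as a base64 data URL.
const OCR_MODEL = "meta-llama/llama-4-scout-17b-16e-instruct";

function buildOcrRequest(imageBuffer, mimeType = "image/jpeg") {
  const dataUrl = `data:${mimeType};base64,${imageBuffer.toString("base64")}`;
  return {
    model: OCR_MODEL,
    temperature: 0, // favor literal transcription over creative output
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "Extract all readable text from this image. Return only the extracted text." },
          { type: "image_url", image_url: { url: dataUrl } },
        ],
      },
    ],
  };
}

// Server-side usage with a groq-sdk client and a multer upload (sketch):
// const completion = await groq.chat.completions.create(buildOcrRequest(req.file.buffer, req.file.mimetype));
// const extractedText = completion.choices[0].message.content;
```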

4.3 Minutes Summarization

Endpoint: POST /api/summarize
Request: JSON body with theme and content.
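A request to the summarization endpoint only needs a JSON body with the two documented fields. This is a minimal sketch; the helper names and the empty-content check are assumptions.

```javascript
// Hypothetical frontend helpers for POST /api/summarize.
// Only the field names theme and content come from the API description.
function buildSummarizeRequest(theme, content) {
  if (!content || !content.trim()) {
    throw new Error("Nothing to summarize"); // avoid spending a request on empty input
  }
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ theme, content }),
  };
}

async function summarize(baseUrl, theme, content) {
  const res = await fetch(`${baseUrl}/api/summarize`, buildSummarizeRequest(theme, content));
  if (!res.ok) {
    throw new Error(`Summarize failed: HTTP ${res.status}`);
  }
  return res.json(); // response shape not documented here
}
```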

4.4 Generate ZIP

Endpoint: POST /api/generate-zip
Request: FormData with zipFilename and multiple file fields.
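The ZIP request combines a plain text field with repeated file parts. This is a minimal sketch with the following assumptions:
- Every file is appended under the field name file.
- The response body is the ZIP archive itself.
- buildZipForm and downloadZip are hypothetical names.

```javascript
// Hypothetical helpers for POST /api/generate-zip.
// files: array of { blob, name }; each is sent under the field name "file" (assumption).
function buildZipForm(zipFilename, files) {
  const form = new FormData();
  form.append("zipFilename", zipFilename);
  for (const { blob, name } of files) {
    form.append("file", blob, name); // keep each name so it appears inside the archive
  }
  return form;
}

// Browser-only: posts the files and saves the returned archive.
async function downloadZip(baseUrl, zipFilename, files) {
  const res = await fetch(`${baseUrl}/api/generate-zip`, {
    method: "POST",
    body: buildZipForm(zipFilename, files),
  });
  if (!res.ok) {
    throw new Error(`ZIP generation failed: HTTP ${res.status}`);
  }
  const url = URL.createObjectURL(await res.blob());
  const link = document.createElement("a");
  link.href = url;
  link.download = zipFilename;
  link.click();
  // Give the browser a moment to start the download before releasing the Blob URL.
  setTimeout(() => URL.revokeObjectURL(url), 1000);
}
```

On the server, multer's upload.array("file") collects repeated parts with that field name into req.files. The zipFilename text field arrives in req.body.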

5. Configuration

Backend: Requires environment variables GROQ_API_KEY_1, GROQ_API_KEY_2, etc.
Frontend: The backend URL is set in the BACKEND_URL variable in index.html.
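The numbered key variables suggest that the backend can use several Groq keys. This is a minimal sketch, assuming the keys are numbered consecutively from GROQ_API_KEY_1 and used in round-robin order. The actual selection policy in index.js is not described here, and both function names are hypothetical.

```javascript
// Collects GROQ_API_KEY_1, GROQ_API_KEY_2, ... until the first missing number.
// A gap (e.g. _1 and _3 without _2) stops the scan at the gap.
function loadGroqKeys(env = process.env) {
  const keys = [];
  for (let i = 1; env[`GROQ_API_KEY_${i}`]; i++) {
    keys.push(env[`GROQ_API_KEY_${i}`]);
  }
  if (keys.length === 0) {
    throw new Error("Set at least GROQ_API_KEY_1");
  }
  return keys;
}

// Returns a function that hands out the keys in turn: k1, k2, ..., kN, k1, ...
function createKeyRotator(keys) {
  let next = 0;
  return () => {
    const key = keys[next];
    next = (next + 1) % keys.length;
    return key;
  };
}
```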

6. Deployment

6.1 Local Development

Clone Repository: git clone <repository_url>
Enter Backend Directory: cd <directory_name>
Install Dependencies: npm install
Set Environment Variable: export GROQ_API_KEY_1="gsk_..."
Run Server: node index.js
Set BACKEND_URL in index.html to the local server's address, then open index.html in a browser.

6.2 Deployment to Google Cloud Run

Create a Dockerfile in the backend root directory.
Build Container Image: docker build -t gcr.io/[PROJECT_ID]/asisten-pintar .
Push Image to GCR: docker push gcr.io/[PROJECT_ID]/asisten-pintar

Deploy to Cloud Run:

gcloud run deploy asisten-pintar-service \
  --image gcr.io/[PROJECT_ID]/asisten-pintar \
  --platform managed \
  --region asia-southeast1 \
  --allow-unauthenticated \
  --set-env-vars="GROQ_API_KEY_1=gsk_..."
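Cloud Run tells the container which port to listen on through the PORT environment variable, which defaults to 8080. The server should therefore not hard-code its port. This is a minimal sketch; resolvePort is a hypothetical helper, and using 8080 as the fallback for local runs is an assumption.

```javascript
// Returns the port Cloud Run assigns via PORT, or a fallback for local runs.
function resolvePort(env = process.env, fallback = 8080) {
  const port = Number.parseInt(env.PORT ?? "", 10);
  return Number.isInteger(port) && port > 0 ? port : fallback;
}

// In index.js (sketch, assuming an Express app named `app`):
// const port = resolvePort();
// app.listen(port, () => console.log(`Listening on port ${port}`));
```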

7. Testing

Manual testing can be performed by following the application's main workflow:

Open index.html in a browser.
Fill in the event information.
Start recording audio and ensure the transcript appears.
Upload an image and ensure the OCR result appears in the transcript.
Click "Buat Ringkasan" (Create Summary) and verify that a correctly formatted Markdown summary appears.
Test every download option (.txt, .md, .pdf, audio, .zip) and check that each file opens and contains the expected content.
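Part of the Markdown check can be automated. The helper below is a rough structural check based on two assumptions: headings and lists use CommonMark syntax, and unbalanced code fences indicate truncated output. It does not judge whether the summary is accurate, which still needs a human reader. checkSummaryMarkdown is a hypothetical name.

```javascript
// Rough structural check of a Markdown summary (heuristic, not a full parser).
function checkSummaryMarkdown(markdown) {
  const lines = markdown.split(/\r?\n/);
  // ATX headings such as "# Title" or "## Decisions".
  const headings = lines
    .filter((line) => /^#{1,6}\s+\S/.test(line))
    .map((line) => line.replace(/^#+\s+/, "").trim());
  // Bulleted ("-", "*", "+") or numbered ("1.") list items.
  const bulletCount = lines.filter((line) => /^\s*(?:[-*+]|\d+\.)\s+\S/.test(line)).length;
  // An odd number of code-fence lines usually means the model output was cut off.
  const fenceCount = lines.filter((line) => /^\s*`{3}/.test(line)).length;
  return {
    headings,
    hasHeadings: headings.length > 0,
    hasBullets: bulletCount > 0,
    fencesBalanced: fenceCount % 2 === 0,
  };
}
```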

8. Limitations & Notes

Input Quality: Transcription and OCR accuracy depends heavily on the inputs. Audio should have little background noise, and images should be sharp and legible.
Potential LLM Bias: AI-generated summaries may contain bias or hallucinations, so a human should always review them before they are considered final.
Cost Management: Groq API costs grow with usage (for the Llama model, with the number of tokens processed). An Indonesian-language detection step saves tokens by skipping translation when the content is already in Indonesian.
Timeouts: Transcription and summarization requests have time limits, so stalled or unusually long calls cannot keep running and accumulate cost.
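Two of the mechanisms above can be sketched in a few lines. Both are assumptions about how such features could work, not the code in index.js:
- looksIndonesian scores text by the share of common Indonesian function words. This is a crude heuristic with an arbitrary threshold.
- withTimeout rejects a pending call after a fixed time.

withTimeout only stops waiting; it does not cancel the underlying request. To actually abort a call, a client-level setting such as the timeout option of the groq-sdk client constructor, or an AbortSignal, is still needed.

```javascript
// Crude language check (assumption): share of words that are common Indonesian function words.
const ID_FUNCTION_WORDS = new Set([
  "yang", "dan", "di", "ke", "dari", "ini", "itu", "dengan", "untuk", "tidak",
  "akan", "pada", "adalah", "kita", "kami", "juga", "sudah", "dalam", "bisa", "ada",
]);

function looksIndonesian(text, threshold = 0.15) {
  const words = text.toLowerCase().match(/\p{L}+/gu) ?? [];
  if (words.length === 0) return false;
  const hits = words.filter((word) => ID_FUNCTION_WORDS.has(word)).length;
  return hits / words.length >= threshold;
}

// Rejects when `promise` has not settled within `ms` milliseconds; always clears the timer.
// It stops waiting but does not cancel the underlying request.
function withTimeout(promise, ms, label = "operation") {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```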

9. Development Roadmap

Multi-language support.
Integration with Google Docs / Notion.
Real-time integration with Zoom/Meet APIs.
Cloud-based session management.

10. Developer Contact

Ir. Isparmo

Website: www.isparmo.com

Email: mail@isparmo.com

Phone/WA: +62 812 108 3060